Sampling and Sampling Distributions – A Comprehensive Guide on Sampling and Sampling Distributions

Explore the fundamentals of sampling and sampling distributions in statistics. Dive deep into various sampling methods, from simple random to stratified, and uncover the significance of sampling distributions in detail.

Written by Jagdeesh | 7 min read

In this blog post, we will learn:

  1. What is Sampling?
  2. Why Sample?
  3. Types of Sampling Methods
    3.1. Simple Random Sampling (SRS)
    3.2. Stratified Sampling
    3.3. Cluster Sampling
    3.4. Systematic Sampling
    3.5. Convenience Sampling
    3.6. Quota Sampling
  4. Simple demonstration of different sampling methods using Python
  5. What is a Sampling Distribution?
    5.1. Simulate and visualize the sampling distribution of the sample mean using Python
    5.2. Key Concepts in Sampling Distributions
    5.3. Importance of Sampling Distributions
  6. Conclusion

1. What is Sampling?

Sampling refers to the process of selecting a subset (or a sample) from a larger set (often called a population). Instead of collecting data from every individual in the population (which can be time-consuming and costly), researchers typically collect data from a sample and then use that sample to make inferences about the larger population.

For example, if we wanted to know the average height of all adult men in a country, instead of measuring every single man, we could measure a sample of them and then estimate the average height for the entire group.
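
As a minimal sketch of this idea, the snippet below builds a synthetic population of heights and compares the mean of a random sample with the mean of the whole population. The population size, the mean of 175 cm and the standard deviation of 7 cm are made-up values for illustration, not real measurements.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 heights in cm (illustrative values only)
population = [random.gauss(175, 7) for _ in range(100_000)]

# Measure only a random sample of 500 people instead of everyone
sample = random.sample(population, k=500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.2f} cm")
print(f"Sample mean:     {sample_mean:.2f} cm")
print(f"Difference:      {abs(sample_mean - population_mean):.2f} cm")
```

The sample mean will usually land close to the population mean even though only 0.5% of the population was measured.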

2. Why Sample?

Sampling has a host of benefits:

  1. Cost-effective: It’s often cheaper to collect data from a sample than from an entire population.
  2. Time-saving: Sampling can save a considerable amount of time.
  3. Feasibility: In some cases, it’s virtually impossible to survey an entire population.
  4. Accuracy: If done correctly, sampling can provide accurate estimates of population parameters.

3. Types of Sampling Methods

3.1. Simple Random Sampling (SRS)

Definition: Every individual in the population has an equal chance of being selected, and every possible sample of a given size is equally likely to be drawn.

Example: Imagine a bowl containing 100 unique lottery tickets. If you were to close your eyes and pick out 10 tickets one at a time, you’re engaging in simple random sampling.

3.2. Stratified Sampling

Definition: The population is divided into non-overlapping groups (or strata) based on a particular characteristic, and then a random sample is taken from each group.

Example: Let’s say you’re researching study habits among high school students across freshmen, sophomores, juniors, and seniors. Instead of picking randomly from the whole school, you first divide students by grade level and then randomly pick an equal number from each grade. This ensures representation from all grades.

3.3. Cluster Sampling

Definition: The population is divided into clusters (often geographically), and then a random sample of clusters is chosen. Either all members of the selected clusters are surveyed, or a random sample of members is drawn from each selected cluster.

Example: Imagine you want to survey households in a large city. The city is divided into different neighborhoods (clusters). Instead of sampling households from the entire city, you randomly select a few neighborhoods and then survey all households (or a random sample of them) within those selected neighborhoods.

3.4. Systematic Sampling

Definition: Every $k$-th individual is selected from a list or sequence, where $k$ is the population size divided by the desired sample size.

Example: You have a list of 1,000 customers and want to select 50 for a survey. To do this, you might select every 20th customer from the list (1,000 divided by 50 equals 20). So you’d survey the 20th, 40th, 60th customer, and so on.
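
The customer example can be sketched in a few lines of Python. The customer IDs below are hypothetical, and the random starting position is an addition that the example above does not use (it always starts at the 20th customer). A random start is a common variant that gives every customer the same chance of being selected.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

customers = list(range(1, 1001))    # hypothetical customer IDs 1..1000
sample_size = 50
k = len(customers) // sample_size   # sampling interval: 1000 // 50 = 20

start = random.randrange(k)         # random starting position in 0..k-1
systematic_sample = customers[start::k]  # every k-th customer from the start

print(f"Interval k = {k}, starting position = {start}")
print("First five selected customers:", systematic_sample[:5])
print("Number selected:", len(systematic_sample))
```

Setting `start = k - 1` reproduces the 20th, 40th, 60th, ... selection described in the example.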

3.5. Convenience Sampling

Definition: The sample is chosen based on what is easy or convenient, rather than any systematic or random method.

Example: A street interviewer stops passers-by at a mall entrance to ask about their shopping preferences. Here, the sample consists of whoever happens to be at that particular entrance at that time – it’s convenient, but not necessarily representative of all shoppers.

3.6. Quota Sampling

Definition: The researcher ensures equal or proportionate representation of subjects depending on certain characteristics, but the selection within those categories might be non-random.

Example: If you’re surveying voters’ intentions before an election and you know the gender distribution is 50% male and 50% female, you might ensure that out of 100 surveyed individuals, 50 are male and 50 are female. However, how you select those 50 males and females might not be random.

4. Simple demonstration of different sampling methods using Python

python
import pandas as pd
import numpy as np

# Create a sample DataFrame for demonstration
data = {
    'ID': range(1, 101),  # IDs for 100 individuals
    'Age': np.random.randint(15, 65, 100),  # Random ages from 15 to 64 (the upper bound 65 is exclusive)
    'Grade': np.random.choice(['Freshman', 'Sophomore', 'Junior', 'Senior'], 100)  # Random school grades
}
df = pd.DataFrame(data)

df.head()
Output:
   ID  Age      Grade
0   1   48     Junior
1   2   23  Sophomore
2   3   62  Sophomore
3   4   24   Freshman
4   5   44     Junior
python
# 1. Simple Random Sampling (SRS)
srs_sample = df.sample(n=10)  # Get 10 random rows from the DataFrame

print("Simple Random Sampling (SRS) Sample:")
srs_sample
Output:
Simple Random Sampling (SRS) Sample:
    ID  Age     Grade
89  90   38  Freshman
98  99   21    Junior
76  77   37    Junior
97  98   37    Junior
28  29   18    Junior
6    7   16    Junior
32  33   44  Freshman
24  25   56    Junior
94  95   33    Senior
8    9   24    Senior
python
# 2. Stratified Sampling
strat_sample = df.groupby('Grade').sample(n=2).reset_index(drop=True)  # Draw 2 random rows from each grade (stratum); needs pandas 1.1 or later

print("\nStratified Sampling Sample:")
strat_sample
Output:
Stratified Sampling Sample:
   ID  Age      Grade
0  10   59   Freshman
1  97   52   Freshman
2  12   17     Junior
3  98   37     Junior
4  88   51     Senior
5  30   34     Senior
6  72   33  Sophomore
7  35   30  Sophomore
python
# 3. Cluster Sampling
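clusters = df.groupby(df.index // 10)  # Create 10 clusters of 10 consecutive rows each
# Keep each cluster independently with probability 0.2, so about 20% of the clusters
# are selected on average and the exact number varies from run to run
selected_clusters = clusters.apply(lambda x: x if np.random.rand() < 0.2 else None).dropna()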

print("\nCluster Sampling Sample:")
selected_clusters
Output:
Cluster Sampling Sample:
      ID  Age      Grade
5 50  51   22     Senior
  51  52   50  Sophomore
  52  53   33  Sophomore
  53  54   25   Freshman
  54  55   30     Senior
  55  56   46     Senior
  56  57   28   Freshman
  57  58   48     Senior
  58  59   26     Junior
  59  60   25     Junior
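
The snippet above keeps each cluster with probability 0.2, so the number of clusters selected changes between runs (in the run shown, only cluster 5 was kept). If a fixed number of clusters is wanted, one possible variant is to draw the cluster numbers directly. The sketch below rebuilds a DataFrame of the same shape so that it runs on its own. The seed and the choice of 2 clusters are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded generator so the run is repeatable

# Same shape of data as the demonstration DataFrame (values are random)
df = pd.DataFrame({
    'ID': range(1, 101),
    'Age': rng.integers(15, 65, size=100),
    'Grade': rng.choice(['Freshman', 'Sophomore', 'Junior', 'Senior'], size=100),
})

cluster_id = np.arange(len(df)) // 10            # 10 clusters of 10 consecutive rows
chosen = rng.choice(10, size=2, replace=False)   # pick exactly 2 of the 10 clusters
cluster_sample = df[np.isin(cluster_id, chosen)] # keep every row in the chosen clusters

print("Chosen clusters:", sorted(chosen.tolist()))
print(cluster_sample)
```
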
python
# 4. Systematic Sampling
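k = len(df) // 10  # Sampling interval: 100 rows / 10 samples = every 10th row
sys_sample = df.iloc[::k].head(10)  # Start at the first row, then take every k-th row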

print("\nSystematic Sampling Sample:")
sys_sample
Output:
Systematic Sampling Sample:
    ID  Age      Grade
0    1   48     Junior
10  11   46     Junior
20  21   41   Freshman
30  31   34     Junior
40  41   24  Sophomore
50  51   22     Senior
60  61   52   Freshman
70  71   28     Senior
80  81   60   Freshman
90  91   18   Freshman
python
# 5. Convenience Sampling
# Here, we'll just take the first 10 rows. In real-world scenarios, this would be akin to surveying whoever comes first.
conv_sample = df.head(10)

print("\nConvenience Sampling Sample:")
conv_sample
Output:
Convenience Sampling Sample:
   ID  Age      Grade
0   1   48     Junior
1   2   23  Sophomore
2   3   62  Sophomore
3   4   24   Freshman
4   5   44     Junior
5   6   52  Sophomore
6   7   16     Junior
7   8   50  Sophomore
8   9   24     Senior
9  10   59   Freshman
python
# 6. Quota Sampling
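# Let's say we have a quota to sample 3 individuals from each grade.
# Note: rows are picked at random within each grade here for simplicity, which makes this
# the same procedure as stratified sampling. In real quota sampling, the researcher
# usually fills each quota non-randomly (for example, with whoever is available first).
quota_sample = df.groupby('Grade').sample(n=3).reset_index(drop=True)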

print("\nQuota Sampling Sample:")
quota_sample
Output:
Quota Sampling Sample:
    ID  Age      Grade
0   47   36   Freshman
1   33   44   Freshman
2   10   59   Freshman
3   96   39     Junior
4    5   44     Junior
5   59   26     Junior
6   30   34     Senior
7   88   51     Senior
8   78   56     Senior
9   53   33  Sophomore
10   8   50  Sophomore
11  64   16  Sophomore
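
As defined in section 3.6, selection within each quota group may be non-random. One way to sketch that is to fill each quota with the first individuals encountered, treating row order as arrival order. The DataFrame is rebuilt here so the snippet runs on its own. The seed and the "first come, first served" rule are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # seeded generator so the run is repeatable

# Same shape of data as the demonstration DataFrame (values are random)
df = pd.DataFrame({
    'ID': range(1, 101),
    'Age': rng.integers(15, 65, size=100),
    'Grade': rng.choice(['Freshman', 'Sophomore', 'Junior', 'Senior'], size=100),
})

quota_per_grade = 3

# Non-random fill: take the first 3 rows seen for each grade, in row order
quota_sample_nonrandom = df.groupby('Grade').head(quota_per_grade)

print(quota_sample_nonrandom.sort_values('Grade'))
print(quota_sample_nonrandom['Grade'].value_counts())
```

Every quota is met, but the selected individuals all come from the top of the list, which is why quota samples can be unrepresentative even when the group proportions look right.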

5. What is a Sampling Distribution?

A sampling distribution is the distribution of a statistic (like the mean or proportion) based on all possible samples of a given size from a population. It tells us how much we would expect our sample statistic to vary from one sample to another.

For instance, if we were to repeatedly draw different samples of 100 men from our earlier example and calculate the average height for each sample, the distribution of those sample means would be the sampling distribution of the mean.
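
For a very small population, the phrase "all possible samples" can be taken literally. The sketch below uses a hypothetical population of five values, lists every possible sample of size 2 drawn without replacement, and tabulates the resulting sample means. That table is the exact sampling distribution of the mean for this toy case.

```python
from collections import Counter
from itertools import combinations
import statistics

population = [2, 4, 6, 8, 10]  # hypothetical tiny population
sample_size = 2

# Every possible sample of size 2 drawn without replacement
all_samples = list(combinations(population, sample_size))
sample_means = [statistics.mean(s) for s in all_samples]

# The sampling distribution: each possible sample mean and how often it occurs
distribution = Counter(sample_means)
for mean_value, count in sorted(distribution.items()):
    print(f"mean = {mean_value}: {count} of {len(all_samples)} samples")

print("Population mean:         ", statistics.mean(population))
print("Mean of all sample means:", statistics.mean(sample_means))
```

The mean of all the sample means equals the population mean, which is why the sample mean is called an unbiased estimator. For realistic population sizes the number of possible samples is far too large to enumerate, so the distribution is approximated by simulation instead.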

5.1. Simulate and visualize the sampling distribution of the sample mean using Python

In this example:

  1. We’ve created a population with a mean of 75 and a standard deviation of 15.
  2. We then repeatedly (1,000 times) drew random samples (each of size 100) from this population.
  3. For each sample, we computed its mean and stored it.
  4. Finally, we visualized the distribution of these sample means.
python
import numpy as np
import matplotlib.pyplot as plt

# Generating the population data: 10,000 values drawn from a normal distribution
# with mean 75 and standard deviation 15
np.random.seed(0)
population_data = np.random.randn(10000) * 15 + 75

# Simulate the sampling distribution of the sample mean
num_samples = 1000
sample_size = 100
sample_means = []

for _ in range(num_samples):
    sample = np.random.choice(population_data, size=sample_size, replace=False)
    sample_means.append(np.mean(sample))

# Plotting
plt.hist(sample_means, bins=30, edgecolor='k', alpha=0.7)
plt.title("Sampling Distribution of the Sample Mean")
plt.xlabel("Sample Mean")
plt.ylabel("Frequency")
plt.axvline(x=np.mean(sample_means), color='r', linestyle='dashed', linewidth=1)
plt.show()

5.2. Key Concepts in Sampling Distributions

Central Limit Theorem (CLT): For a sufficiently large sample size, the sampling distribution of the sample mean will be approximately normal, regardless of the shape of the population's distribution (provided the population has a finite variance). This is a powerful property that allows us to make statistical inferences even when the population itself is not normal.

To learn more about the Central Limit Theorem, refer to the blog post "Central Limit Theorem".
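
As a rough illustration of the CLT, the sketch below starts from a strongly right-skewed population and measures how skewed the sample means are for different sample sizes. The exponential population, the sample sizes and the hand-written `skewness` helper are all assumptions chosen for this example. A skewness near 0 indicates a symmetric, bell-like shape.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator so the run is repeatable

# Hypothetical, strongly right-skewed population (exponential with mean 10)
population = rng.exponential(scale=10, size=100_000)

def skewness(values):
    """Skewness: roughly 0 for a symmetric, bell-shaped distribution."""
    values = np.asarray(values)
    centered = values - values.mean()
    return (centered ** 3).mean() / values.std() ** 3

skew_by_n = {}
for n in (2, 30, 200):
    # 5,000 samples of size n, drawn with replacement for speed
    samples = rng.choice(population, size=(5000, n))
    sample_means = samples.mean(axis=1)
    skew_by_n[n] = skewness(sample_means)
    print(f"n = {n:>3}: skewness of sample means = {skew_by_n[n]:.2f}")

print(f"Skewness of the population itself = {skewness(population):.2f}")
```

The skewness of the sample means shrinks toward 0 as the sample size grows, even though the population stays heavily skewed.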

Standard Error (SE): The standard error is the standard deviation of a statistic's sampling distribution. It measures how much the statistic varies from one sample to the next. A smaller SE indicates that the sample statistic (like the mean) is more consistent across different samples and is therefore a more precise estimate.
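
For the sample mean, the standard error is commonly computed as $\sigma / \sqrt{n}$, where $\sigma$ is the population standard deviation and $n$ is the sample size. The sketch below checks this formula against simulation. The population parameters (mean 75, standard deviation 15) match the simulation in 5.1, while the sample sizes and the number of repetitions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator so the run is repeatable

mu, sigma = 75, 15  # assumed population mean and standard deviation

se_results = {}
for n in (25, 100, 400):
    # 10,000 samples of size n from a normal population, one mean per sample
    sample_means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
    empirical_se = sample_means.std(ddof=1)  # observed spread of the sample means
    theoretical_se = sigma / np.sqrt(n)      # sigma / sqrt(n)
    se_results[n] = (empirical_se, theoretical_se)
    print(f"n = {n:>3}: empirical SE = {empirical_se:.3f}, "
          f"sigma/sqrt(n) = {theoretical_se:.3f}")
```

Quadrupling the sample size halves the standard error, so precision improves with larger samples but with diminishing returns.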

5.3. Importance of Sampling Distributions

Sampling distributions are crucial for hypothesis testing and confidence interval estimation. Knowing how our sample statistic behaves (its distribution) under repeated sampling allows us to:

  1. Assess the likelihood of observing our sample results if some null hypothesis were true.
  2. Gauge the precision of our sample estimates.
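
Both uses can be sketched with a single sample. The example below relies on the normal approximation to the sampling distribution of the mean. The data, the null value of 70 and the 95% level are hypothetical choices for illustration, and a t-distribution would normally be preferred for small samples.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical sample of 100 measurements (illustrative data only)
sample = [random.gauss(75, 15) for _ in range(100)]

n = len(sample)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)  # estimated SE of the mean

# 1. How likely is a result at least this extreme if the true mean were 70?
null_mean = 70
z = (sample_mean - null_mean) / standard_error
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, normal approximation

# 2. How precise is the estimate? Approximate 95% confidence interval
margin = 1.96 * standard_error
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"Sample mean = {sample_mean:.2f}, SE = {standard_error:.2f}")
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
print(f"Approximate 95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```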

6. Conclusion

Sampling and sampling distributions provide the foundation for much of inferential statistics. By understanding these concepts, we are better equipped to make informed decisions based on sample data. As always, the key lies in choosing the right sampling method and ensuring that our sample is representative of the larger population.
