Statistics is fundamentally about inferring properties of a large population from a smaller sample. Understanding how sample statistics vary — their sampling distributions — is what makes that inference valid.
The Law of Large Numbers explains why averages stabilize. The Central Limit Theorem explains why the fluctuations of those averages around the true mean are approximately normally distributed, one of the most remarkable facts in all of mathematics.
These results are the engine behind confidence intervals and hypothesis tests: once you know the sampling distribution of a statistic, you can quantify exactly how confident you should be in your conclusions.
Population vs. sample
A population is the complete set of individuals or observations of interest. A parameter (e.g., $\mu$, $\sigma$, $p$) is a fixed numerical characteristic of the population: unknown and usually unknowable without a census.
A sample is a subset of the population. A statistic (e.g., $\bar{x}$, $s$, $\hat{p}$) is computed from sample data. Statistics are observable but vary from sample to sample, so they are random variables.
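A statistic $\hat{\theta}$ is an unbiased estimator of a parameter $\theta$ if $E[\hat{\theta}] = \theta$. The sample mean $\bar{X}$ is unbiased for $\mu$. The sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is unbiased for $\sigma^2$.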
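Unbiasedness can be checked empirically. The sketch below is a minimal simulation, assuming a standard normal population (so the true variance is 1) and a hypothetical helper name `average_variance_estimates`. It averages the $n-1$ divisor estimator and the $n$ divisor estimator over many small samples.

```python
import random
import statistics

def average_variance_estimates(n=5, trials=20000, seed=0):
    """Average two variance estimators over many samples from N(0, 1).

    The population, sample size, and trial count are illustrative assumptions.
    """
    rng = random.Random(seed)
    unbiased_total = 0.0
    biased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        unbiased_total += statistics.variance(sample)   # divides by n - 1
        biased_total += statistics.pvariance(sample)    # divides by n
    return unbiased_total / trials, biased_total / trials

unbiased_avg, biased_avg = average_variance_estimates()
print(f"average of (n-1)-divisor estimator: {unbiased_avg:.3f}")
print(f"average of n-divisor estimator:     {biased_avg:.3f}")
```

With $n = 5$, the first average should sit near the true variance of 1, while the second should sit near $(n-1)/n = 0.8$, which is the bias the $n-1$ divisor corrects.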
Sampling design: simple random sampling (SRS) gives every sample of size $n$ equal probability. Stratified sampling divides the population into groups (strata) and draws SRSs from each, improving precision. Cluster sampling and systematic sampling are used when SRS is impractical.
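As a rough illustration of the first two designs, the sketch below draws an SRS and a stratified sample from a made-up population of unit IDs. The function names, the strata labels, and the allocation of 8 and 2 units are all assumptions chosen for the example.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every size-n subset is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw an independent SRS of the requested size from each stratum."""
    return {
        name: rng.sample(units, n_per_stratum[name])
        for name, units in strata.items()
    }

rng = random.Random(42)

# Hypothetical population: unit IDs 0..99, split into two strata of unequal size.
population = list(range(100))
strata = {"A": population[:80], "B": population[80:]}

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(strata, {"A": 8, "B": 2}, rng)

print("SRS:", sorted(srs))
print("Stratified:", {name: sorted(units) for name, units in strat.items()})
```

The stratified draw guarantees that stratum B is represented by exactly 2 units, whereas an SRS of 10 could by chance contain none.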
Bias in sampling: a convenience sample (whoever is available) or voluntary response sample systematically misrepresents the population. No amount of statistical analysis can fix a biased design — always randomise.
💡Explain it simply
The population is the entire jar of marbles. The sample is the handful you grab. You want your handful to look like the full jar. Random sampling is the only way to make that likely — biased sampling is like always reaching to the top of the jar where red marbles float.
The Law of Large Numbers
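Weak Law of Large Numbers: for i.i.d. random variables $X_1, X_2, \ldots$ with mean $\mu$, the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ converges in probability to $\mu$: for any $\varepsilon > 0$, $P(|\bar{X}_n - \mu| > \varepsilon) \to 0$ as $n \to \infty$.
The Strong Law of Large Numbers guarantees almost sure convergence: $P\left(\lim_{n \to \infty} \bar{X}_n = \mu\right) = 1$. Almost every sequence of i.i.d. observations has a time-average that converges to the true mean.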
The LLN justifies using observed frequencies as probability estimates, trusting large studies over small ones, and the general principle that more data gives better estimates.
The LLN does not imply the 'gambler's fallacy.' After a run of coin flips all landing heads, the next flip of a fair coin still has $P(\text{heads}) = 1/2$. The LLN talks about the eventual average, not compensation of past outcomes.
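The stabilising behaviour can be seen in a short simulation. This is a minimal sketch, assuming a fair coin and an arbitrary seed; the checkpoint sizes and the function name `running_proportion_of_heads` are illustrative choices.

```python
import random

def running_proportion_of_heads(num_flips, seed=1):
    """Flip a simulated fair coin and record the proportion of heads at checkpoints."""
    rng = random.Random(seed)
    checkpoints = (10, 100, 1000, 10000, 100000)
    heads = 0
    proportions = {}
    for i in range(1, num_flips + 1):
        heads += rng.random() < 0.5   # True counts as 1
        if i in checkpoints:
            proportions[i] = heads / i
    return proportions

props = running_proportion_of_heads(100_000)
for n, p_hat in props.items():
    print(f"n = {n:>6}: proportion of heads = {p_hat:.4f}")
```

Each flip in the loop is generated independently of the ones before it, so the proportion settles near 0.5 because early deviations are diluted by later flips, not because later flips compensate for them.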
💡Explain it simply
Flip a fair coin a handful of times and the share of heads can easily land well away from one half. Flip it many thousands of times and the share will be very close to one half. The LLN says the average stabilises toward the truth as you collect more data. Past outcomes don't 'owe' you future results: each flip is still 50/50.
The Central Limit Theorem
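Central Limit Theorem (CLT): if $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$ and finite variance $\sigma^2$, then the standardised mean $Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \to N(0, 1)$ in distribution as $n \to \infty$. Equivalently, $\bar{X}_n \approx N(\mu, \sigma^2/n)$ for large $n$.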
This is remarkable because the result holds regardless of the population's shape, whether it is skewed, bimodal, or uniform, as long as the variance is finite. As $n$ grows, the averaging process irons out the non-normality.
The standard error of the mean is $\mathrm{SE}(\bar{X}) = \sigma/\sqrt{n}$. Doubling $n$ reduces the standard error by a factor of $\sqrt{2} \approx 1.41$, not $2$. To halve the standard error you need to quadruple the sample size.
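The square-root scaling is easy to confirm with arithmetic. The values of `sigma` and `n` below are hypothetical, picked only so the numbers come out cleanly.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean for population SD sigma and sample size n."""
    return sigma / math.sqrt(n)

sigma = 12.0   # hypothetical population standard deviation
n = 36         # hypothetical sample size

se_n = standard_error(sigma, n)        # 12 / 6 = 2.0
se_2n = standard_error(sigma, 2 * n)   # smaller by a factor of sqrt(2)
se_4n = standard_error(sigma, 4 * n)   # 12 / 12 = 1.0, half of se_n

print(f"SE at n:  {se_n:.3f}")
print(f"SE at 2n: {se_2n:.3f}")
print(f"SE at 4n: {se_4n:.3f}")
```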
Rule of thumb: $n \ge 30$ is often sufficient for the normal approximation to be good, but heavily skewed populations may require a considerably larger $n$. Always check with a histogram of the data.
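The CLT also applies to sums: $\sum_{i=1}^{n} X_i \approx N(n\mu, n\sigma^2)$ for large $n$. And to proportions: $\hat{p} \approx N\left(p, \frac{p(1-p)}{n}\right)$ when $np$ and $n(1-p)$ are both large enough (common rules of thumb ask for at least 5 or at least 10).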
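A simulation makes the theorem concrete. The sketch below assumes a strongly right-skewed population, the exponential distribution with rate 1 (mean 1, standard deviation 1), and checks whether sample means of size 50 behave as the normal approximation predicts. The population, sample size, repetition count, and function name are all illustrative assumptions.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=7):
    """Draw `reps` samples of size n from Exponential(1) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

n, reps = 50, 5000
means = simulate_sample_means(n, reps)

mu, sigma = 1.0, 1.0                 # mean and SD of Exponential(1)
se_theory = sigma / math.sqrt(n)     # CLT prediction for the SD of the sample mean

# Under the normal approximation, about 95% of means fall within 1.96 SE of mu.
coverage = sum(abs(m - mu) <= 1.96 * se_theory for m in means) / reps

print(f"mean of sample means: {statistics.fmean(means):.4f}")
print(f"SD of sample means:   {statistics.stdev(means):.4f} (theory {se_theory:.4f})")
print(f"share within 1.96 SE: {coverage:.3f}")
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though individual exponential draws are far from normal.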
💡Explain it simply
Here is the miracle: it hardly matters what shape the population has. Pull samples of size $n$, compute the average each time, plot all those averages, and for large enough $n$ they form an approximate bell curve. The CLT is why normal distribution tools work across virtually every scientific field.
CLT probability for a sample mean
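- Population: mean $\mu$, standard deviation $\sigma$; sample size $n$.
- By CLT: $\bar{X} \approx N(\mu, \sigma^2/n)$, so $\mathrm{SE} = \sigma/\sqrt{n}$.
- For a threshold $c$: $P(\bar{X} > c) \approx P\left(Z > \frac{c - \mu}{\sigma/\sqrt{n}}\right) = 1 - \Phi\left(\frac{c - \mu}{\sigma/\sqrt{n}}\right)$.
- Interpretation: this tail probability is the chance that a sample of size $n$ has a mean exceeding $c$ when the true mean is $\mu$.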
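A numerical version of this kind of calculation might look like the following. The values of `mu`, `sigma`, `n`, and `threshold` are made up for illustration and are not the figures from the example above.

```python
import math
from statistics import NormalDist

# Hypothetical inputs, chosen only to illustrate the steps.
mu = 100.0         # population mean
sigma = 15.0       # population standard deviation
n = 36             # sample size
threshold = 105.0  # value the sample mean might exceed

se = sigma / math.sqrt(n)                # standard error: 15 / 6 = 2.5
z = (threshold - mu) / se                # z-score of the threshold: 5 / 2.5 = 2.0
tail_prob = 1.0 - NormalDist().cdf(z)    # P(Z > 2.0), about 0.0228

print(f"SE = {se:.2f}, z = {z:.2f}, P(sample mean > {threshold}) = {tail_prob:.4f}")
```

Under these assumed inputs, roughly 2.3% of samples of size 36 would have a mean above 105 when the true mean is 100.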
Sampling distribution of the proportion
The sample proportion $\hat{p} = X/n$ (where $X$ counts successes in $n$ Bernoulli trials) estimates the population proportion $p$.
Mean: $E[\hat{p}] = p$ (unbiased). Standard error: $\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$.
By the CLT: $\hat{p} \approx N\left(p, \frac{p(1-p)}{n}\right)$ when $np$ and $n(1-p)$ are both sufficiently large.
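The sampling distribution of the difference $\hat{p}_1 - \hat{p}_2$ (from two independent samples) is approximately $N\left(p_1 - p_2, \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right)$. This is used for two-proportion $z$-tests and $z$-intervals.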
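As a sketch of how this distribution is used, the function below builds an approximate two-proportion $z$-interval. It plugs the sample proportions into the standard error formula (the unpooled form usually used for intervals; tests of equal proportions typically use a pooled estimate instead). The function name and the survey counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Approximate z-interval for p1 - p2 from two independent samples."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    diff = p1_hat - p2_hat
    # Unpooled standard error of the difference in sample proportions.
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return diff, se, (diff - z_star * se, diff + z_star * se)

# Hypothetical counts: 120 of 200 successes in group 1, 90 of 200 in group 2.
diff, se, (low, high) = two_proportion_interval(120, 200, 90, 200)
print(f"difference = {diff:.3f}, SE = {se:.4f}, 95% interval = ({low:.3f}, {high:.3f})")
```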
💡Explain it simply
If you survey $n$ people and count how many say yes, the proportion who say yes ($\hat{p}$) is your estimate of the true population proportion $p$. The sampling distribution tells you how much $\hat{p}$ will bounce around from survey to survey, and for large $n$ it bounces around in an approximately normal pattern.
Common Mistakes to Avoid
- Confusing $\sigma$ (population standard deviation) with $\sigma/\sqrt{n}$ (standard error of the mean). The SE is the standard deviation of the sampling distribution, always smaller than $\sigma$ when $n > 1$.
- Applying the CLT with a small $n$ from a heavily skewed population. Check the histogram before assuming normality.
- The gambler's fallacy: thinking past outcomes will 'balance out' future ones. The LLN applies to long-run averages, not short-run compensation.
- Forgetting the CLT applies to $\bar{X}$, not to individual observations. Individual values remain non-normal if the population is non-normal.
- Applying the proportion CLT approximation when $np$ or $n(1-p)$ is too small (below the rule-of-thumb threshold, commonly 5 or 10). Use exact binomial methods instead.
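The last mistake can be illustrated by comparing an exact binomial probability with its normal approximation when the expected number of successes is small. The values $n = 20$ and $p = 0.05$ (so $np = 1$) are assumptions chosen to make the failure visible, and the approximation here uses no continuity correction.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), without continuity correction."""
    return NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(k)

# Hypothetical case with np = 1, far below the usual thresholds.
n, p, k = 20, 0.05, 0
exact = binom_cdf(k, n, p)            # equals 0.95 ** 20
approx = normal_approx_cdf(k, n, p)

print(f"exact  P(X = 0) = {exact:.4f}")
print(f"normal approx   = {approx:.4f}")
```

The exact probability of zero successes is $0.95^{20} \approx 0.358$, and the normal approximation comes out far lower, which is why exact binomial methods are preferred in this regime.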