Estimation is the process of using sample data to infer the value of an unknown population parameter. The two main tools are point estimates (a single best guess) and interval estimates (a range of plausible values).
The sampling distribution tells us how a statistic varies from sample to sample. Understanding it is the key to constructing valid confidence intervals.
Maximum Likelihood Estimation (MLE) provides a principled general-purpose method for finding the best-fitting parameter value given observed data.
Sampling distributions and standard error
A sampling distribution is the probability distribution of a statistic (such as $\bar{X}$) computed from all possible random samples of size $n$ from a population. It describes how the statistic varies from sample to sample.
The standard error of the mean is $SE(\bar{X}) = \sigma/\sqrt{n}$, or estimated as $s/\sqrt{n}$ when $\sigma$ is unknown. It quantifies the precision of $\bar{X}$ as an estimator of $\mu$: larger $n$ gives smaller $SE$ and therefore more precise estimates.
By the Central Limit Theorem, $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$ for large $n$. This holds regardless of the shape of the population distribution, provided its variance is finite, making normal-based inference broadly applicable.
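The claim can be checked with a small simulation. The sketch below is illustrative only: the Exponential(1) population, the sample size of 50, and the helper name `simulate_sample_means` are assumptions chosen for the example, not part of the text. It draws many samples from a strongly skewed population and compares the spread of the sample means with $\sigma/\sqrt{n}$.

```python
import random
import statistics

def simulate_sample_means(n, reps, rate=1.0, seed=0):
    """Draw `reps` samples of size `n` from Exponential(rate); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(reps)
    ]

# Hypothetical setup: Exponential(1) has mu = 1 and sigma = 1 and is strongly skewed.
n = 50
means = simulate_sample_means(n=n, reps=5000)

theoretical_se = 1 / n ** 0.5            # sigma / sqrt(n)
empirical_se = statistics.stdev(means)   # spread of the simulated sample means

print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"empirical SE: {empirical_se:.3f}  theoretical SE: {theoretical_se:.3f}")
```

If the approximation holds, the two standard errors should agree closely and the sample means should centre on 1, even though the population itself is far from normal.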
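A statistic $\hat{\theta}$ is an unbiased estimator of a parameter $\theta$ if $E[\hat{\theta}] = \theta$. The sample mean $\bar{X}$ is unbiased for $\mu$. The sample variance $s^2$ (with $n-1$ in the denominator) is unbiased for $\sigma^2$.
Mean Squared Error: $MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = Var(\hat{\theta}) + [Bias(\hat{\theta})]^2$. A good estimator has a small MSE; because MSE combines variance and bias, there is a bias-variance trade-off.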
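To see the effect of the $n-1$ denominator, here is a minimal simulation sketch. The normal population with $\sigma = 2$, the sample size of 5, and the function name `average_variance_estimates` are assumptions made for illustration. It averages both versions of the variance estimate over many samples.

```python
import random
import statistics

def average_variance_estimates(n, reps, sigma=2.0, seed=1):
    """Average the (n - 1)-denominator and n-denominator variance estimates."""
    rng = random.Random(seed)
    unbiased, biased = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        unbiased.append(statistics.variance(sample))   # divides by n - 1
        biased.append(statistics.pvariance(sample))    # divides by n
    return statistics.fmean(unbiased), statistics.fmean(biased)

# Hypothetical setup: true variance is sigma^2 = 4, samples of size 5.
avg_unbiased, avg_biased = average_variance_estimates(n=5, reps=10000)
print(f"n - 1 denominator: {avg_unbiased:.2f}")
print(f"n denominator:     {avg_biased:.2f}")
```

The $n$-denominator version should average close to $\frac{n-1}{n}\sigma^2 = 3.2$ here, while the $n-1$ version should average close to the true value of 4.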
💡Explain it simply
Imagine repeatedly drawing samples of size $n$ and computing the mean each time. You would get a slightly different mean every time, but they would cluster around the true $\mu$. The standard error measures how tightly they cluster. The bigger the sample, the tighter the cluster.
Point estimation and MLE
A point estimate is a single numerical value used as a best guess for a population parameter. Common examples: $\bar{x}$ estimates $\mu$; $s$ estimates $\sigma$; $\hat{p}$ estimates $p$.
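Maximum Likelihood Estimation (MLE) finds the parameter value $\hat{\theta}$ that maximises the likelihood function $L(\theta)$: the probability (or density) of the observed data, viewed as a function of $\theta$.
In practice, we maximise $\ell(\theta) = \ln L(\theta)$ (the log-likelihood), which is easier to differentiate. Set $d\ell/d\theta = 0$ and solve. For a normal population, MLE gives $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ (note: MLE uses $n$, not $n-1$).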
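The closed-form normal MLE can be sanity-checked numerically. In this sketch the data values are hypothetical and the function names are chosen for the example. It computes the estimates and confirms that nudging either parameter away from them lowers the log-likelihood.

```python
import math

def normal_log_likelihood(data, mu, sigma2):
    """Log-likelihood of i.i.d. N(mu, sigma2) observations."""
    n = len(data)
    sum_sq = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sum_sq / (2 * sigma2)

def normal_mle(data):
    """Closed-form MLE for a normal population: sample mean and n-denominator variance."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n  # divides by n, not n - 1
    return mu_hat, sigma2_hat

data = [4.1, 5.3, 4.8, 6.0, 5.2]  # hypothetical observations
mu_hat, sigma2_hat = normal_mle(data)
best = normal_log_likelihood(data, mu_hat, sigma2_hat)

# Perturbing either parameter should never increase the log-likelihood.
for d_mu, d_s2 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert normal_log_likelihood(data, mu_hat + d_mu, sigma2_hat + d_s2) <= best

print(f"mu_hat = {mu_hat:.3f}, sigma2_hat = {sigma2_hat:.4f}")
```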
Under standard regularity conditions, MLE has excellent asymptotic properties: it is consistent, asymptotically efficient (its variance approaches the Cramér-Rao lower bound), and asymptotically normal. For large samples it is the go-to estimation method.
💡Explain it simply
MLE asks: given the data I observed, which parameter value would have made this data most likely? It's like tuning a radio — you turn the dial until the signal is clearest. The dial setting that gives the clearest signal is your maximum likelihood estimate.
Confidence intervals
A $100(1-\alpha)\%$ confidence interval provides a range of plausible values for an unknown parameter: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ for a known $\sigma$, or $\bar{x} \pm t_{\alpha/2,\,n-1}\,s/\sqrt{n}$ when $\sigma$ is estimated.
Correct interpretation: if we were to repeat the sampling procedure many times, approximately $100(1-\alpha)\%$ of the resulting intervals would contain the true parameter. The interval either contains $\mu$ or it does not; after construction, there is no probability to speak of.
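The repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes a normal population with known $\sigma$ (so the $z$-interval applies); the parameter values and the function name `coverage` are hypothetical. It counts how often the interval captures the true mean.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=10.0, sigma=3.0, n=25, level=0.95, reps=4000, seed=2):
    """Fraction of known-sigma z-intervals that contain the true mean `mu`."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

rate = coverage()
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The fraction should land close to the nominal level of 0.95. Each individual interval either contains $\mu$ or misses it; the percentage describes the procedure.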
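The margin of error $E = z_{\alpha/2}\,\sigma/\sqrt{n}$ is the half-width of the interval. To achieve a desired margin of error $E$, use $n = (z_{\alpha/2}\,\sigma/E)^2$, rounded up to the next whole number.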
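As a worked illustration of the sample-size formula, the sketch below uses hypothetical inputs ($\sigma = 15$, desired margin of 2, 95% confidence) and a helper name chosen for the example.

```python
import math
from statistics import NormalDist

def sample_size_for_margin(sigma, margin, level=0.95):
    """Smallest n whose z-interval half-width is at most `margin`, for known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical planning values.
n_needed = sample_size_for_margin(sigma=15.0, margin=2.0)
print(n_needed)
```

Because $n$ scales with $1/E^2$, halving the margin of error requires roughly four times as many observations.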
The $t$-distribution has heavier tails than the normal and is appropriate when $\sigma$ is estimated from data. As the degrees of freedom increase, the $t$-distribution converges to the standard normal. For $n > 30$ the difference is minimal in practice.
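For a proportion $p$: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. Valid when $n\hat{p}$ and $n(1-\hat{p})$ are both large enough (a common rule of thumb is at least 10).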
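A minimal sketch of the normal-approximation interval for a proportion follows. The counts (120 successes out of 400) are hypothetical, and `proportion_ci` is a name chosen for the example.

```python
from statistics import NormalDist

def proportion_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

# Hypothetical data: 120 successes in 400 trials, so p_hat = 0.30.
lower, upper = proportion_ci(successes=120, n=400)
print(f"95% CI for p: ({lower:.4f}, {upper:.4f})")
```

With these counts both $n\hat{p} = 120$ and $n(1-\hat{p}) = 280$ are large, so the normal approximation is reasonable.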
💡Explain it simply
A confidence interval is like saying: 'I used a method that catches the true value $95\%$ of the time.' It's like fishing with a net that lands fish $95\%$ of the time. This particular net either caught the fish or didn't; the $95\%$ refers to the reliability of the net, not this specific cast.
95% t-interval for a mean
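- Given: sample size $n$, sample mean $\bar{x}$, sample standard deviation $s$. Construct a $95\%$ CI.
- $SE = s/\sqrt{n}$.
- $t^* = t_{0.025,\,n-1}$ (from $t$-table with $df = n-1$).
- CI: $\bar{x} \pm t^* \cdot SE$.
- Interpretation: we are $95\%$ confident the true population mean falls between $\bar{x} - t^* \cdot SE$ and $\bar{x} + t^* \cdot SE$.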
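The same steps can be sketched in code. The inputs below ($n = 25$, $\bar{x} = 50$, $s = 10$) are hypothetical values chosen for illustration, not figures from this example. The critical value $t^* = 2.064$ for $df = 24$ is passed in by hand, as read from a $t$-table, because the Python standard library has no $t$-quantile function.

```python
import math

def t_interval(xbar, s, n, t_star):
    """t-interval for a mean, given a critical value looked up for df = n - 1."""
    se = s / math.sqrt(n)          # estimated standard error
    margin = t_star * se           # half-width of the interval
    return xbar - margin, xbar + margin

# Hypothetical inputs; t_star is t_{0.025, 24} taken from a t-table.
lower, upper = t_interval(xbar=50.0, s=10.0, n=25, t_star=2.064)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Here $SE = 10/\sqrt{25} = 2$, so the margin is $2.064 \times 2 = 4.128$.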
Type I and Type II errors
A Type I error (false positive, size $\alpha$) occurs when $H_0$ is true but we reject it. We control this by choosing $\alpha$ before testing.
A Type II error (false negative, probability $\beta$) occurs when $H_0$ is false but we fail to reject it. Power $= 1 - \beta$ is the probability of correctly detecting a real effect.
Factors that increase power: larger sample size (reduces $SE$, making it easier to detect real differences); larger true effect size (bigger signal); larger $\alpha$ (less strict rejection criterion); smaller $\sigma$ (less variability).
Sample size planning: choose $n$ to achieve a desired power (typically $0.80$ or $0.90$) at a specified effect size and significance level $\alpha$.
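One way to make the planning step concrete is the following sketch. It assumes a two-sided one-sample $z$-test with known $\sigma$, which is a simplification rather than the only possible setup; the function names and the inputs (a true shift of 0.5 with $\sigma = 1$) are hypothetical. The sample-size formula used is the standard approximation $n = \left((z_{\alpha/2} + z_{\beta})\,\sigma/\delta\right)^2$, where $\delta$ is the true shift.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_z_test(delta, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean shift is `delta`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(delta) * math.sqrt(n) / sigma   # shift of the test statistic under H1
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower_tail = STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail

def sample_size_for_power(delta, sigma, power=0.80, alpha=0.05):
    """Approximate n needed to reach the target power (ignores the far tail)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical planning values: detect a shift of 0.5 when sigma = 1.
n_required = sample_size_for_power(delta=0.5, sigma=1.0)
achieved = power_z_test(delta=0.5, sigma=1.0, n=n_required)
print(f"n = {n_required}, achieved power = {achieved:.3f}")
```

Calling `power_z_test` with a larger `n`, larger `delta`, larger `alpha`, or smaller `sigma` shows each of the power-increasing factors in turn.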
💡Explain it simply
Type I error: you shout 'fire!' when there is no fire (false alarm). Type II error: there is a fire but you say nothing (miss). To reduce false alarms, raise the bar for shouting fire ($\alpha$ gets smaller), but then you will miss more real fires. Only collecting more information (larger $n$) reduces both simultaneously.
Practical vs. statistical significance
Statistical significance ($p < \alpha$) means that data at least as extreme as those observed would be unlikely if $H_0$ were true. Practical significance means the effect is large enough to matter in the real world. These are distinct questions: large samples make tiny effects statistically significant; small samples may miss large effects.
Effect size measures the magnitude of an effect independently of sample size. Cohen's $d$ for means; $r$ or $r^2$ for correlations; odds ratio for proportions. Always report effect size alongside $p$-values.
Confidence intervals are more informative than $p$-values alone: they show the direction, magnitude, and uncertainty of an estimate, not merely a binary decision.
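As an illustration of an effect-size calculation, the sketch below computes Cohen's $d$ for two independent groups using the pooled standard deviation, which is one common convention. The data and the function name are hypothetical.

```python
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd

# Hypothetical scores for two groups.
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
d = cohens_d(treatment, control)
print(f"Cohen's d = {d:.3f}")
```

The group means differ by 2 and the pooled standard deviation is $\sqrt{2.5}$, so $d$ is about 1.26 regardless of how many observations each group has.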
💡Explain it simply
With a huge sample, even a change of a tiny fraction of a degree in temperature becomes 'statistically significant', but nobody cares about a change that small. Statistical significance just means 'data this extreme would be very unlikely under $H_0$.' Practical significance asks the more important question: 'Does this actually matter?'
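The temperature example can be made concrete with a short calculation. The sketch assumes a two-sided one-sample $z$-test with known $\sigma$; the shift of 0.01 degrees, the standard deviation of 1 degree, and the two sample sizes are hypothetical values chosen for illustration.

```python
import math

def two_sided_z_p_value(effect, sigma, n):
    """p-value of a two-sided one-sample z-test for an observed mean shift `effect`."""
    z = effect * math.sqrt(n) / sigma
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

effect, sigma = 0.01, 1.0  # hypothetical: 0.01-degree shift, SD of 1 degree
p_small_sample = two_sided_z_p_value(effect, sigma, n=100)
p_huge_sample = two_sided_z_p_value(effect, sigma, n=1_000_000)

print(f"n = 100:       p = {p_small_sample:.3f}")
print(f"n = 1,000,000: p = {p_huge_sample:.2e}")
```

The effect is identical in both runs; only the sample size changes, which is why a small $p$-value alone says nothing about whether the effect matters.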
Common Mistakes to Avoid
- Interpreting a $95\%$ CI as '$95\%$ probability the parameter is inside.' The parameter is fixed. The randomness is in the interval.
- Confusing 'fail to reject $H_0$' with 'accept $H_0$.' Failing to reject only means evidence is insufficient; it is not proof of $H_0$.
- Using a $z$-interval when $\sigma$ is unknown and $n$ is small. Use the $t$-interval with $n-1$ degrees of freedom.
- Ignoring effect size and reporting only the $p$-value. A small $p$-value with a trivially small effect size is not a scientifically important finding.
- Claiming a $p$-value just below $0.05$ is strong evidence and one just above $0.05$ is no evidence at all. The $p$-value is a continuous measure of evidence, not a binary threshold.