
Statistical Inference: Estimation

Point estimation, maximum likelihood estimation, and confidence intervals.

Estimation is the process of using sample data to infer the value of an unknown population parameter. The two main tools are point estimates (a single best guess) and interval estimates (a range of plausible values).

The sampling distribution tells us how a statistic varies from sample to sample. Understanding it is the key to constructing valid confidence intervals.

Maximum Likelihood Estimation (MLE) provides a principled general-purpose method for finding the best-fitting parameter value given observed data.

Sampling distributions and standard error

A sampling distribution is the probability distribution of a statistic (such as $\bar{X}$) computed from all possible random samples of size $n$ from a population. It describes how the statistic varies from sample to sample.

The standard error of the mean is $SE_{\bar{X}} = \sigma/\sqrt{n}$, or estimated as $s/\sqrt{n}$ when $\sigma$ is unknown. It quantifies the precision of $\bar{X}$ as an estimator of $\mu$: a larger $n$ gives a smaller $SE$ and therefore a more precise estimate.
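As a quick illustration of the estimated standard error $s/\sqrt{n}$, here is a minimal Python sketch. The function name `standard_error` and the data values are made up for this example.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean, s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample SD, uses the n - 1 divisor
    return s / math.sqrt(len(sample))

# Illustrative data (not from any real study)
data = [4.0, 6.0, 8.0, 10.0, 12.0]
se = standard_error(data)
print(f"SE of the mean: {se:.4f}")
```

For these values $s^2 = 10$ and $n = 5$, so the standard error is $\sqrt{10/5} = \sqrt{2} \approx 1.414$.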

By the Central Limit Theorem, $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$ for large $n$. This holds regardless of the shape of the population distribution, provided its variance is finite, which makes normal-based inference broadly applicable.
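The Central Limit Theorem can be checked by simulation. The sketch below assumes an exponential population with rate 1 (mean 1, variance 1, strongly right-skewed); the sample size, repetition count, and seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 50       # size of each sample
reps = 5000  # number of simulated samples

# Draw many samples from a skewed population and record each sample mean
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (theory: 1.000)")
print(f"SD of sample means:   {sd_of_means:.3f} (theory: {1 / n ** 0.5:.3f})")
```

The simulated mean and spread of $\bar{X}$ should land close to $\mu = 1$ and $\sigma/\sqrt{n} \approx 0.141$, even though the population itself is far from normal.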

A statistic $\hat{\theta}$ is an unbiased estimator of a parameter $\theta$ if $E(\hat{\theta}) = \theta$. The sample mean $\bar{X}$ is unbiased for $\mu$. The sample variance $s^2$ (with divisor $n-1$) is unbiased for $\sigma^2$.

Mean Squared Error: $\text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2$. A good estimator has low MSE. Because MSE combines both terms, accepting a little bias can be worthwhile when it reduces variance enough: this is the bias-variance trade-off.
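The trade-off can be seen by comparing the two usual variance estimators in a small simulation. This is only a sketch: it assumes a normal population with $\sigma^2 = 4$ and samples of size 5, and the helper `mse` is defined here for the example.

```python
import random
import statistics

random.seed(1)

sigma2 = 4.0  # true population variance (sigma = 2)
n = 5         # small samples make the bias easy to see
reps = 20000

unbiased, divide_by_n = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 2.0) for _ in range(n)]
    unbiased.append(statistics.variance(sample))      # divisor n - 1
    divide_by_n.append(statistics.pvariance(sample))  # divisor n

def mse(estimates, truth):
    """Average squared distance between the estimates and the true value."""
    return statistics.fmean([(e - truth) ** 2 for e in estimates])

mean_unbiased = statistics.fmean(unbiased)
mean_divide_by_n = statistics.fmean(divide_by_n)
mse_unbiased = mse(unbiased, sigma2)
mse_divide_by_n = mse(divide_by_n, sigma2)

print(f"divisor n-1: mean {mean_unbiased:.2f}, MSE {mse_unbiased:.2f}")
print(f"divisor n:   mean {mean_divide_by_n:.2f}, MSE {mse_divide_by_n:.2f}")
```

For normal data, theory gives a mean of $\sigma^2 = 4$ and an MSE of $2\sigma^4/(n-1) = 8$ for the $n-1$ version, against a mean of $(n-1)\sigma^2/n = 3.2$ and an MSE of $(2n-1)\sigma^4/n^2 = 5.76$ for the $n$ version. The biased estimator has the smaller MSE here.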


Point estimation and MLE

A point estimate is a single numerical value used as a best guess for a population parameter. Common examples: $\bar{x}$ estimates $\mu$; $s^2$ estimates $\sigma^2$; $\hat{p} = x/n$ estimates $p$.

Maximum Likelihood Estimation (MLE) finds the parameter value $\hat{\theta}$ that maximises the likelihood function $L(\theta) = \prod_{i=1}^n f(x_i;\theta)$, which for independent observations is the joint probability (or density) of the observed data, viewed as a function of $\theta$.

In practice, we maximise the log-likelihood $\ell(\theta) = \ln L(\theta)$, which is easier to differentiate. Set $d\ell/d\theta = 0$ and solve. For a normal population, MLE gives $\hat{\mu}=\bar{x}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum(x_i-\bar{x})^2$ (note: the MLE divides by $n$, not $n-1$, so it is slightly biased for $\sigma^2$).
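The closed-form normal MLE can be sanity-checked numerically. The sketch below uses made-up data and hypothetical helper names (`normal_log_likelihood`, `normal_mle`); it evaluates the log-likelihood at the MLE and at nearby parameter values.

```python
import math

def normal_log_likelihood(data, mu, sigma2):
    """Log-likelihood of i.i.d. N(mu, sigma2) observations."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

def normal_mle(data):
    """Closed-form MLE: the sample mean and the divisor-n variance."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, sigma2_hat

# Illustrative data (not from any real study)
data = [2.1, 3.4, 1.9, 5.0, 4.1]
mu_hat, sigma2_hat = normal_mle(data)
ll_at_mle = normal_log_likelihood(data, mu_hat, sigma2_hat)

print(f"mu_hat = {mu_hat:.3f}, sigma2_hat = {sigma2_hat:.3f}")
print("log-likelihood at MLE:    ", round(ll_at_mle, 4))
print("log-likelihood, mu + 0.1: ",
      round(normal_log_likelihood(data, mu_hat + 0.1, sigma2_hat), 4))
print("log-likelihood, 1.1*sigma2:",
      round(normal_log_likelihood(data, mu_hat, 1.1 * sigma2_hat), 4))
```

Moving either parameter away from the MLE lowers the log-likelihood, as expected at a maximum.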

MLE has excellent asymptotic properties: under standard regularity conditions it is consistent, asymptotically efficient (its variance attains the Cramér-Rao lower bound in the limit), and asymptotically normal. For large samples it is the go-to estimation method.


Confidence intervals

A $100(1-\alpha)\%$ confidence interval provides a range of plausible values for an unknown parameter: $\bar{x} \pm z_{\alpha/2}\cdot SE$ when $\sigma$ is known, or $\bar{x} \pm t_{\alpha/2,n-1}\cdot(s/\sqrt{n})$ when $\sigma$ is estimated.
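A minimal sketch of the known-$\sigma$ interval, using only the Python standard library. The function name `z_interval` and the input values are illustrative assumptions. The $t$-based interval has the same shape but needs a $t$ quantile, which the standard library does not provide (SciPy's `scipy.stats.t.ppf` is one option if it is installed).

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value z_{alpha/2}
    margin = z * sigma / math.sqrt(n)            # z times the standard error
    return xbar - margin, xbar + margin

# Illustrative inputs: sample mean 50, known sigma 10, n = 25
lo, hi = z_interval(xbar=50.0, sigma=10.0, n=25)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```

Here $SE = 10/\sqrt{25} = 2$ and $z_{0.025} \approx 1.96$, so the interval is roughly $50 \pm 3.92$.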

Correct interpretation: if we were to repeat the sampling procedure many times, approximately $100(1-\alpha)\%$ of the resulting intervals would contain the true parameter. Once computed, a particular interval either contains $\mu$ or it does not; the confidence level describes the procedure, not the probability that this one interval is correct.
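The repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes a normal population with $\mu = 10$ and known $\sigma = 2$, builds a 95% interval from each of many samples, and counts how often the interval contains $\mu$. All numeric settings are arbitrary.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(2)

mu, sigma, n = 10.0, 2.0, 20  # assumed population and sample size
reps = 4000                   # number of repeated samples

z = NormalDist().inv_cdf(0.975)    # critical value for 95% confidence
margin = z * sigma / math.sqrt(n)  # same half-width for every interval

hits = 0
for _ in range(reps):
    xbar = fmean([random.gauss(mu, sigma) for _ in range(n)])
    if xbar - margin <= mu <= xbar + margin:
        hits += 1

coverage = hits / reps
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The observed fraction should be close to 0.95. Each individual interval either captured $\mu$ or missed it.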

The margin of error $E = z^*\cdot SE$ determines the half-width of the interval. To achieve a desired margin of error $E$, use $n = (z^*\sigma/E)^2$, rounded up to the next whole number.
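The sample size formula is easy to wrap in a small helper. This is a sketch; the name `sample_size_for_margin` and the inputs are hypothetical, and $\sigma$ is treated as known (in practice it usually comes from a pilot study or prior data).

```python
import math
from statistics import NormalDist

def sample_size_for_margin(sigma, margin, confidence=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil((z * sigma / margin) ** 2)  # round up to a whole subject

n_needed = sample_size_for_margin(sigma=15.0, margin=2.0)
print("n for E = 2:", n_needed)
print("n for E = 1:", sample_size_for_margin(sigma=15.0, margin=1.0))
```

Halving the margin of error roughly quadruples the required sample size, because $n$ depends on $1/E^2$.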

The $t$-distribution has heavier tails than the normal and is appropriate when $\sigma$ is estimated from data. As the degrees of freedom $df = n-1$ increase, the $t$-distribution converges to the standard normal. For $n \geq 30$ the difference is small in practice.

For a proportion $p$: $\hat{p} \pm z^*\sqrt{\hat{p}(1-\hat{p})/n}$. The normal approximation behind this interval is reasonable when $n\hat{p}\geq 10$ and $n(1-\hat{p})\geq 10$.
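A short sketch of this interval, including the rule-of-thumb check. The function name `proportion_interval` and the counts are illustrative; raising an error when the check fails is a design choice made here, not something the formula requires.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation interval for a proportion."""
    failures = n - successes
    # n * p_hat equals the success count; n * (1 - p_hat) equals the failure count
    if successes < 10 or failures < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Illustrative counts: 40 successes out of 100 trials
lo, hi = proportion_interval(40, 100)
print(f"95% CI for p: ({lo:.3f}, {hi:.3f})")
```

With $\hat{p} = 0.4$ and $n = 100$ the standard error is $\sqrt{0.0024} \approx 0.049$, giving an interval of roughly $0.40 \pm 0.096$.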


Type I and Type II errors

A Type I error (false positive, probability $\alpha$) occurs when $H_0$ is true but we reject it. We control this by choosing $\alpha$ before testing.

A Type II error (false negative, probability $\beta$) occurs when $H_0$ is false but we fail to reject it. Power $= 1-\beta$ is the probability of correctly detecting a real effect.

Factors that increase power: larger sample size $n$ (reduces $SE$, making it easier to detect real differences); larger true effect size (bigger signal); larger $\alpha$ (less strict rejection criterion); smaller $\sigma$ (less variability).
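Each of these factors can be checked with the power formula for a simple case. The sketch below assumes an upper-tailed one-sample $z$-test with known $\sigma$, for which power $= 1 - \Phi(z_\alpha - \delta\sqrt{n}/\sigma)$, where $\delta$ is the true shift in the mean. The function name and baseline values are made up for illustration.

```python
import math
from statistics import NormalDist

def power_one_sided_z(delta, sigma, n, alpha=0.05):
    """Power of the upper-tailed z-test when the true mean is mu0 + delta."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)  # rejection cutoff on the z scale
    shift = delta * math.sqrt(n) / sigma       # true effect in standard-error units
    return 1.0 - std_normal.cdf(z_alpha - shift)

baseline = power_one_sided_z(delta=0.5, sigma=1.0, n=25)
larger_n = power_one_sided_z(delta=0.5, sigma=1.0, n=50)
larger_effect = power_one_sided_z(delta=0.8, sigma=1.0, n=25)
larger_alpha = power_one_sided_z(delta=0.5, sigma=1.0, n=25, alpha=0.10)
smaller_sigma = power_one_sided_z(delta=0.5, sigma=0.8, n=25)

for label, value in [("baseline", baseline), ("larger n", larger_n),
                     ("larger effect", larger_effect),
                     ("larger alpha", larger_alpha),
                     ("smaller sigma", smaller_sigma)]:
    print(f"{label:14s} power = {value:.3f}")
```

Every variation raises power above the baseline of about 0.80, matching the list above.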

Sample size planning: choose $n$ to achieve a desired power (typically $80\%$ or $90\%$) at a specified effect size and significance level $\alpha$.
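For a concrete planning calculation, the sketch below again assumes an upper-tailed one-sample $z$-test with known $\sigma$, where the required size is $n = \left((z_\alpha + z_\beta)\sigma/\delta\right)^2$ rounded up. The helper name `sample_size_for_power` and the inputs are hypothetical; other test designs need other formulas.

```python
import math
from statistics import NormalDist

def sample_size_for_power(delta, sigma, power=0.80, alpha=0.05):
    """Smallest n giving the requested power for an upper-tailed z-test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)  # controls the Type I error rate
    z_beta = std_normal.inv_cdf(power)         # controls the Type II error rate
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Illustrative target: detect a shift of half a standard deviation
n_80 = sample_size_for_power(delta=0.5, sigma=1.0, power=0.80)
n_90 = sample_size_for_power(delta=0.5, sigma=1.0, power=0.90)
print("n for 80% power:", n_80)
print("n for 90% power:", n_90)
```

Asking for more power at the same effect size and $\alpha$ requires a larger sample.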


Practical vs. statistical significance

Statistical significance ($p \leq \alpha$) means that data at least as extreme as those observed would be unlikely if $H_0$ were true. Practical significance means the effect is large enough to matter in the real world. These are separate questions: large samples can make tiny effects statistically significant, and small samples may miss large effects.
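Two contrasting cases make the distinction concrete. The sketch below assumes a two-sided one-sample $z$-test with the observed difference expressed in standard deviations; the function name and the numbers are chosen purely for illustration.

```python
import math
from statistics import NormalDist

def two_sided_p_value(effect_in_sd, n):
    """p-value of a two-sided z-test for an observed mean shift of effect_in_sd SDs."""
    z = effect_in_sd * math.sqrt(n)  # test statistic
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# A negligible effect (0.01 SD) measured on a very large sample
p_tiny_effect = two_sided_p_value(0.01, 100_000)
# A large effect (0.8 SD) measured on a very small sample
p_large_effect = two_sided_p_value(0.8, 5)

print(f"tiny effect, n = 100000: p = {p_tiny_effect:.4f}")
print(f"large effect, n = 5:     p = {p_large_effect:.4f}")
```

At $\alpha = 0.05$ the negligible effect comes out statistically significant while the large one does not, which is why the $p$-value alone says little about practical importance.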

Effect size measures the magnitude of an effect independently of sample size. Cohen's $d = (\bar{x}-\mu_0)/s$ for means; $r$ or $r^2$ for correlations; odds ratio for proportions. Always report effect size alongside $p$-values.
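A minimal sketch of the one-sample Cohen's $d$ given above. The function name `cohens_d_one_sample`, the data, and the reference value $\mu_0 = 6$ are invented for the example.

```python
import statistics

def cohens_d_one_sample(sample, mu0):
    """Cohen's d for one sample: (sample mean - mu0) / sample SD."""
    return (statistics.fmean(sample) - mu0) / statistics.stdev(sample)

# Illustrative data (not from any real study)
d = cohens_d_one_sample([4.0, 6.0, 8.0, 10.0, 12.0], mu0=6.0)
print(f"Cohen's d = {d:.3f}")
```

Here $\bar{x} = 8$ and $s = \sqrt{10}$, so $d = 2/\sqrt{10} \approx 0.632$. Unlike a test statistic, $d$ does not grow simply because $n$ grows.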

Confidence intervals are more informative than $p$-values alone: they show the direction, magnitude, and uncertainty of an estimate, not merely a binary decision.
