
Continuous Probability Distributions

PDFs, CDFs, normal, exponential, and uniform distributions.

Continuous random variables can take any value in an interval. Their distributions are described by probability density functions (PDFs): probabilities are areas under the curve, never the height at a point.

The normal distribution is the cornerstone of statistics. Its bell-curve shape emerges naturally when many small independent effects are added, and it underpins many classical inference methods.

The exponential and uniform distributions complete the toolkit for modeling waiting times, lifetimes, and situations where all outcomes are equally likely.

PDFs and CDFs

A probability density function (PDF) $f(x)$ satisfies $f(x) \geq 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Probabilities are areas under the curve: $P(a \leq X \leq b) = \int_a^b f(x)\,dx$. The height of the PDF is not a probability; it is a density (probability per unit length) and can exceed 1.
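
A quick numerical check of these definitions, as a sketch: the density $f(x) = 2x$ on $[0, 1]$ (an example chosen here, not taken from the text) integrates to 1, its probabilities are areas, and its height reaches 2, which shows that a density value is not a probability. The midpoint rule used for integration is an illustrative choice.

```python
def pdf(x: float) -> float:
    """Example density f(x) = 2x on [0, 1], zero elsewhere (illustrative choice)."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(f, a: float, b: float, n: int = 100_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


total_area = integrate(pdf, 0.0, 1.0)    # should be close to 1
p_upper_half = integrate(pdf, 0.5, 1.0)  # P(0.5 <= X <= 1) = 1 - 0.25 = 0.75
peak_height = pdf(1.0)                   # 2.0: a density, not a probability

print(f"total area   = {total_area:.6f}")
print(f"P(0.5<=X<=1) = {p_upper_half:.6f}")
print(f"f(1)         = {peak_height}")
```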

For a continuous distribution, $P(X = c) = \int_c^c f(x)\,dx = 0$ for any single point $c$. This is why $P(X \leq b) = P(X < b)$ for continuous distributions: adding or removing a single endpoint changes nothing.
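
To see why a single point carries zero probability, the sketch below shrinks an interval around $c = 0$ for a standard normal, using Python's standard-library `statistics.NormalDist` (our choice of tool, not the module's). The probability $P(c - h \leq X \leq c + h)$ shrinks towards 0 along with $h$.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal
c = 0.0

# P(c - h <= X <= c + h) = F(c + h) - F(c - h), for shrinking half-widths h
widths = [1.0, 0.1, 0.01, 0.001, 1e-6]
probs = [Z.cdf(c + h) - Z.cdf(c - h) for h in widths]

for h, p in zip(widths, probs):
    print(f"h = {h:<8g} P = {p:.8f}")
```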

The cumulative distribution function (CDF) is $F(x) = P(X \leq x) = \int_{-\infty}^x f(t)\,dt$. It is non-decreasing from $0$ to $1$, right-continuous, and gives the probability of being at or below $x$.
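
As a sketch of these properties, the exponential CDF $F(x) = 1 - e^{-\lambda x}$ for $x \geq 0$ (a standard closed form, used here as an example with an assumed rate $\lambda = 2$) can be checked for monotonicity and for its limits. Differences of CDF values give interval probabilities: $P(a < X \leq b) = F(b) - F(a)$.

```python
import math

LAM = 2.0  # assumed rate for the example


def exp_cdf(x: float, lam: float = LAM) -> float:
    """CDF of Exp(lam): 0 for x < 0, 1 - exp(-lam * x) otherwise."""
    return 0.0 if x < 0 else 1.0 - math.exp(-lam * x)


grid = [i / 10 for i in range(-10, 101)]  # x from -1.0 to 10.0
values = [exp_cdf(x) for x in grid]

non_decreasing = all(a <= b for a, b in zip(values, values[1:]))
p_1_to_2 = exp_cdf(2.0) - exp_cdf(1.0)    # P(1 < X <= 2) = e^{-2} - e^{-4}

print(non_decreasing, values[0], values[-1], round(p_1_to_2, 4))
```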

The relationship between PDF and CDF: $f(x) = F'(x)$ (the derivative of the CDF is the PDF) and $F(x) = \int_{-\infty}^x f(t)\,dt$ (the CDF is the antiderivative of the PDF).
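
Both directions of this relationship can be checked numerically. The sketch below assumes the standard normal from `statistics.NormalDist` as the example distribution, differentiates its CDF with a central difference, and integrates its PDF with the trapezoid rule from a finite lower limit ($-10$, standing in for $-\infty$).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1


def derivative(F, x: float, h: float = 1e-5) -> float:
    """Central-difference approximation to F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


def integral(f, a: float, b: float, n: int = 20_000) -> float:
    """Trapezoid-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


x = 0.7
pdf_from_cdf = derivative(Z.cdf, x)       # approximates f(0.7)
cdf_from_pdf = integral(Z.pdf, -10.0, x)  # approximates F(0.7); -10 stands in for -infinity

print(pdf_from_cdf, Z.pdf(x))
print(cdf_from_pdf, Z.cdf(x))
```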

Quantiles and percentiles: the $p$-th quantile $x_p$ satisfies $F(x_p) = p$. For the standard normal, $x_{0.975} \approx 1.96$: $97.5\%$ of the distribution lies below this point.
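
Quantiles invert the CDF. A minimal check with the standard library's `NormalDist.inv_cdf` (Python 3.8+):

```python
from statistics import NormalDist

Z = NormalDist()          # standard normal
x_975 = Z.inv_cdf(0.975)  # the 0.975 quantile
print(x_975)              # about 1.959964

# Round trip: applying the CDF to x_p should give back p
print(round(Z.cdf(x_975), 6))
```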


The normal distribution

The normal (Gaussian) distribution $N(\mu, \sigma^2)$ has PDF $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. It is symmetric about $\mu$; the inflection points are at $\mu \pm \sigma$.
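
The PDF formula can be coded directly. The sketch below (the parameters $\mu = 5$, $\sigma = 2$ are arbitrary) compares it with `statistics.NormalDist.pdf`, checks symmetry about $\mu$, and estimates the second derivative by finite differences to show its sign change at the inflection point $\mu + \sigma$.

```python
import math
from statistics import NormalDist

MU, SIGMA = 5.0, 2.0  # arbitrary example parameters


def normal_pdf(x: float, mu: float = MU, sigma: float = SIGMA) -> float:
    """N(mu, sigma^2) density written out from the formula."""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def second_derivative(f, x: float, h: float = 1e-3) -> float:
    """Central-difference approximation to f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2


ref = NormalDist(MU, SIGMA)
print(normal_pdf(6.3), ref.pdf(6.3))               # formula vs library
print(normal_pdf(MU - 1.5), normal_pdf(MU + 1.5))  # symmetric about mu

# f'' is negative just inside mu + sigma and positive just outside it
inside = second_derivative(normal_pdf, MU + SIGMA - 0.1)
outside = second_derivative(normal_pdf, MU + SIGMA + 0.1)
print(inside, outside)
```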

The empirical (68-95-99.7) rule: $P(\mu-\sigma \leq X \leq \mu+\sigma) \approx 0.6827$; $P(\mu-2\sigma \leq X \leq \mu+2\sigma) \approx 0.9545$; $P(\mu-3\sigma \leq X \leq \mu+3\sigma) \approx 0.9973$.
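
These three numbers follow from the standard normal CDF $\Phi$, since the rule is the same for every $\mu$ and $\sigma$. A short sketch using `statistics.NormalDist`:

```python
from statistics import NormalDist

Z = NormalDist()  # the rule does not depend on mu or sigma, so N(0, 1) suffices

# P(mu - k*sigma <= X <= mu + k*sigma) = Phi(k) - Phi(-k)
coverage = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} sigma: {p:.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```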

Standardisation: any $X\sim N(\mu,\sigma^2)$ is transformed to $Z = (X-\mu)/\sigma \sim N(0,1)$. Standard normal probabilities are read from $z$-tables or computed with software.
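
A worked example of standardisation, with hypothetical numbers: if $X \sim N(100, 15^2)$, then $P(X \leq 130) = P(Z \leq (130 - 100)/15) = P(Z \leq 2)$. The sketch computes it both ways with `statistics.NormalDist`.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # hypothetical parameters
x = 130.0

z = (x - mu) / sigma                   # standardised value: 2.0
direct = NormalDist(mu, sigma).cdf(x)  # P(X <= 130) on the original scale
via_z = NormalDist().cdf(z)            # P(Z <= 2) on the standard scale

print(z, round(direct, 4), round(via_z, 4))  # 2.0 0.9772 0.9772
```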

The normal distribution is additive: if $X\sim N(\mu_1,\sigma_1^2)$ and $Y\sim N(\mu_2,\sigma_2^2)$ are independent, then $X+Y\sim N(\mu_1+\mu_2,\sigma_1^2+\sigma_2^2)$.
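
A simulation sketch of additivity (parameters and seed are arbitrary): sums of independent draws from $N(1, 2^2)$ and $N(3, 4^2)$ should have mean about $4$ and variance about $2^2 + 4^2 = 20$. Python's `NormalDist` also supports `+` between two independent normals, which gives the exact result for comparison.

```python
import random
from statistics import NormalDist

rng = random.Random(42)  # fixed seed for reproducibility
N = 200_000
mu1, s1 = 1.0, 2.0       # X ~ N(1, 2^2)
mu2, s2 = 3.0, 4.0       # Y ~ N(3, 4^2)

sums = [rng.gauss(mu1, s1) + rng.gauss(mu2, s2) for _ in range(N)]
mean = sum(sums) / N
var = sum((v - mean) ** 2 for v in sums) / (N - 1)

exact = NormalDist(mu1, s1) + NormalDist(mu2, s2)  # assumes independence
print(round(mean, 2), round(var, 2))  # close to 4 and 20
print(exact.mean, exact.variance)     # 4.0 and, up to rounding, 20.0
```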

Why normal? By the Central Limit Theorem, the sum (or average) of many independent, identically distributed random variables with finite variance is approximately normal, regardless of the shape of the underlying distribution. This helps explain why the normal appears so often in nature and in statistics.
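
A small simulation illustrates the CLT (all settings are illustrative): each replicate sums $n = 30$ independent $U(0, 1)$ draws, which are far from normal individually. After standardising with mean $n/2$ and variance $n/12$, the fraction of sums within one standard deviation should be close to the normal's $0.6827$.

```python
import math
import random

rng = random.Random(0)  # fixed seed
n, reps = 30, 20_000

mean_sum = n / 2            # each U(0,1) has mean 1/2
sd_sum = math.sqrt(n / 12)  # each U(0,1) has variance 1/12

z_scores = [(sum(rng.random() for _ in range(n)) - mean_sum) / sd_sum
            for _ in range(reps)]
within_one_sd = sum(abs(z) <= 1 for z in z_scores) / reps

print(round(within_one_sd, 3))  # near 0.683
```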


The exponential and uniform distributions

The exponential distribution $\text{Exp}(\lambda)$: $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$. Mean $= 1/\lambda$, variance $= 1/\lambda^2$. It models the waiting time between events in a Poisson process at rate $\lambda$.
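
A simulation sketch (rate $\lambda = 2$ and seed chosen arbitrarily) using the standard library's `random.expovariate`, whose argument is the rate $\lambda$: the sample mean and variance should be near $1/\lambda = 0.5$ and $1/\lambda^2 = 0.25$.

```python
import random

rng = random.Random(1)
lam, N = 2.0, 200_000

draws = [rng.expovariate(lam) for _ in range(N)]  # expovariate takes the rate
mean = sum(draws) / N
var = sum((d - mean) ** 2 for d in draws) / (N - 1)

print(round(mean, 3), round(var, 3))  # near 0.5 and 0.25
```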

Memoryless property: $P(X > s+t \mid X > s) = P(X > t)$. The past waiting time provides no information about the future wait. This is the continuous analogue of the geometric distribution's memorylessness.
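
The property follows from the survival function $P(X > x) = e^{-\lambda x}$: $e^{-\lambda(s+t)}/e^{-\lambda s} = e^{-\lambda t}$. The sketch below checks it by simulation; $\lambda = 0.5$, $s = 1$, $t = 2$ and the seed are arbitrary choices.

```python
import math
import random

rng = random.Random(7)
lam, s, t, N = 0.5, 1.0, 2.0, 200_000

draws = [rng.expovariate(lam) for _ in range(N)]
survived_s = [x for x in draws if x > s]  # condition on X > s

# Estimate P(X > s + t | X > s) from the draws that already exceeded s
conditional = sum(x > s + t for x in survived_s) / len(survived_s)
unconditional = math.exp(-lam * t)  # exact P(X > t)

print(round(conditional, 3), round(unconditional, 3))  # both near 0.368
```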

The uniform distribution $U(a,b)$: $f(x) = 1/(b-a)$ for $a \leq x \leq b$. Mean $= (a+b)/2$, variance $= (b-a)^2/12$. Every sub-interval of $[a, b]$ with the same length is equally probable.
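
A simulation sketch for $U(2, 8)$ (endpoints and seed are arbitrary): the mean should be near $(2+8)/2 = 5$, the variance near $6^2/12 = 3$, and the sub-intervals $[2, 3]$ and $[6, 7]$, each of length 1, should each capture about $1/6$ of the draws.

```python
import random

rng = random.Random(3)
a, b, N = 2.0, 8.0, 200_000

draws = [rng.uniform(a, b) for _ in range(N)]
mean = sum(draws) / N
var = sum((d - mean) ** 2 for d in draws) / (N - 1)

# Two sub-intervals of equal length (1.0) in different places
p_low = sum(2.0 <= d <= 3.0 for d in draws) / N
p_high = sum(6.0 <= d <= 7.0 for d in draws) / N

print(round(mean, 2), round(var, 2), round(p_low, 3), round(p_high, 3))
```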

The chi-squared distribution $\chi^2_k$ is the distribution of the sum of $k$ squared independent standard normals. It is used in goodness-of-fit tests and in inference about variances. Mean $= k$, variance $= 2k$.
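
Building $\chi^2_k$ directly from its definition, as a simulation sketch with $k = 4$ and an arbitrary seed: the sample mean and variance should be near $k = 4$ and $2k = 8$.

```python
import random

rng = random.Random(11)
k, N = 4, 100_000

# Each draw is the sum of k squared independent standard normals
draws = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(N)]
mean = sum(draws) / N
var = sum((d - mean) ** 2 for d in draws) / (N - 1)

print(round(mean, 2), round(var, 2))  # near 4 and 8
```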

The $t$-distribution with $\nu$ degrees of freedom is the distribution of the ratio $Z/\sqrt{V/\nu}$, where $Z\sim N(0,1)$ and $V\sim\chi^2_\nu$ are independent. It is bell-shaped but heavier-tailed than the normal, converging to $N(0,1)$ as $\nu\to\infty$.
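
A simulation sketch of this construction with $\nu = 5$ (degrees of freedom and seed chosen arbitrarily), with $V$ built as a sum of squared standard normals as in the chi-squared definition. Heavier tails show up as a larger probability beyond $\pm 2$ than the normal's $P(|Z| > 2) \approx 0.0455$; for $\nu = 5$ it is roughly $0.10$.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(5)
nu, N = 5, 100_000


def draw_t() -> float:
    """One draw of Z / sqrt(V / nu) with Z ~ N(0,1) and V ~ chi^2_nu independent."""
    z = rng.gauss(0.0, 1.0)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / math.sqrt(v / nu)


t_tail = sum(abs(draw_t()) > 2.0 for _ in range(N)) / N
normal_tail = 2 * NormalDist().cdf(-2.0)  # P(|Z| > 2)

print(round(t_tail, 3), round(normal_tail, 4))
```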


Normal approximation to the binomial

When $n$ is large and $p$ is not extreme, $X\sim\text{Binomial}(n,p)$ can be approximated by $N(np,\, np(1-p))$.

Validity condition: $np \geq 10$ and $n(1-p) \geq 10$. When $p$ is close to $0$, use the Poisson approximation with mean $np$ instead; when $p$ is close to $1$, apply it to the number of failures $n - X$.
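
To see why the normal approximation struggles when $p$ is near 0, the sketch below takes $n = 1000$, $p = 0.002$ (so $np = 2$, well below 10; values chosen for illustration) and compares the exact $P(X = 0)$ with the Poisson approximation $e^{-np}$ and with a continuity-corrected normal approximation.

```python
import math
from statistics import NormalDist

n, p, k = 1000, 0.002, 0

exact = math.comb(n, k) * p ** k * (1 - p) ** (n - k)   # Binomial pmf
lam = n * p
poisson = math.exp(-lam) * lam ** k / math.factorial(k)  # Poisson(np) pmf

Y = NormalDist(n * p, math.sqrt(n * p * (1 - p)))        # N(np, np(1-p))
normal = Y.cdf(k + 0.5) - Y.cdf(k - 0.5)                 # continuity-corrected

rule_ok = n * p >= 10 and n * (1 - p) >= 10              # False here
print(f"exact {exact:.4f}  Poisson {poisson:.4f}  normal {normal:.4f}")
print("normal rule satisfied:", rule_ok)
```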

Continuity correction: because the binomial is discrete, $P(X = k)$ is approximated by $P(k-0.5 \leq Y \leq k+0.5)$, where $Y \sim N(np,\, np(1-p))$. For $P(X \leq k)$, use $P\!\left(Z \leq \frac{k+0.5-np}{\sqrt{np(1-p)}}\right)$.
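
A worked comparison with illustrative values $n = 50$, $p = 0.4$ (so $np = 20$ and $n(1-p) = 30$ both pass the validity check) and $k = 18$: the exact binomial $P(X \leq 18)$, computed with `math.comb`, is set against the normal approximation with and without the $+0.5$ correction.

```python
import math
from statistics import NormalDist

n, p, k = 50, 0.4, 18
mu, sd = n * p, math.sqrt(n * p * (1 - p))  # 20 and sqrt(12)

# Exact binomial CDF: sum of pmf values for 0..k
exact = sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

Phi = NormalDist().cdf
corrected = Phi((k + 0.5 - mu) / sd)  # with continuity correction
uncorrected = Phi((k - mu) / sd)      # without it

print(f"exact {exact:.4f}  corrected {corrected:.4f}  uncorrected {uncorrected:.4f}")
```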
