Continuous Probability Distributions — Free Statistics Lesson

Continuous random variables can take any value in an interval. Their distributions are described by probability density functions (PDFs): probabilities are areas under the curve, never the height at a point.

The normal distribution is the cornerstone of statistics. Its bell-curve shape emerges naturally when many small independent effects are added, and it underpins much of classical inference.

The exponential and uniform distributions complete the toolkit for modeling waiting times, lifetimes, and situations where all outcomes are equally likely.

PDFs and CDFs

A probability density function (PDF) $f(x)$ satisfies $f(x) \geq 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$ . Probabilities are areas under the curve: $P(a \leq X \leq b) = \int_a^b f(x)\,dx$ . The PDF height is not a probability — it is a density (probability per unit length).

For a continuous distribution, $P(X = c) = \int_c^c f(x)\,dx = 0$ for any single point $c$ . This is why $P(X \leq b) = P(X < b)$ for continuous distributions: adding or removing a single endpoint changes nothing.

The cumulative distribution function (CDF) is $F(x) = P(X \leq x) = \int_{-\infty}^x f(t)\,dt$ . It is non-decreasing from $0$ to $1$ , right-continuous, and gives the probability of being at or below $x$ .

The relationship between PDF and CDF: $f(x) = F'(x)$ (derivative of CDF is PDF) and $F(x) = \int_{-\infty}^x f(t)\,dt$ (antiderivative of PDF is CDF).
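The two directions of this relationship can be checked numerically. The sketch below is only an illustration: it assumes a standard normal (via Python's `statistics.NormalDist`), and the helper `midpoint_area`, the interval $[-1, 2]$ and the point $x = 0.7$ are arbitrary choices, not part of the lesson.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal, chosen only for illustration

def midpoint_area(pdf, a, b, n=10_000):
    """Approximate the integral of pdf over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(pdf(a + (i + 0.5) * h) for i in range(n))

a, b = -1.0, 2.0
area = midpoint_area(Z.pdf, a, b)   # area under the PDF over [a, b]
cdf_diff = Z.cdf(b) - Z.cdf(a)      # F(b) - F(a)

x, h = 0.7, 1e-5
slope = (Z.cdf(x + h) - Z.cdf(x - h)) / (2 * h)  # central-difference estimate of F'(x)

print(f"area under PDF on [{a}, {b}]: {area:.6f}")
print(f"F(b) - F(a):                {cdf_diff:.6f}")
print(f"estimated F'({x}):          {slope:.6f}")
print(f"f({x}):                     {Z.pdf(x):.6f}")
```

The first two printed numbers should agree (area equals a CDF difference), and so should the last two (the slope of the CDF equals the PDF height).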

Quantiles and percentiles: the $p$ -th quantile $x_p$ satisfies $F(x_p) = p$ . For the standard normal, $x_{0.975} \approx 1.96$ : this is the point below which $97.5\%$ of the distribution lies.
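A quantile is the CDF run backwards. As a small sketch, assuming Python's `statistics.NormalDist` (whose `inv_cdf` method is the inverse CDF), the $0.975$ quantile of the standard normal can be recovered like this:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

p = 0.975
x_p = Z.inv_cdf(p)     # the x with F(x) = p
round_trip = Z.cdf(x_p)  # applying F should give p back

print(f"quantile for p = {p}: {x_p:.4f}")
print(f"F(x_p) = {round_trip:.4f}")
```

The printed quantile should be close to $1.96$ , and feeding it back into the CDF should return $0.975$ .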

💡Explain it simply

A PDF is like a topographic map of probability. The height at any point shows how densely packed the probability is there. But the actual probability of a region is the area under the curve over that region, not the height at a single point.

The CDF is the running total: $F(x)$ tells you the total probability that has accumulated by the point $x$ . It starts at $0$ (nothing accumulated yet) and ends at $1$ (all probability accounted for).

The normal distribution

The normal (Gaussian) distribution $N(\mu, \sigma^2)$ has PDF $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ . It is symmetric about $\mu$ ; the inflection points are at $\mu \pm \sigma$ .

The empirical (68-95-99.7) rule: $P(\mu-\sigma \leq X \leq \mu+\sigma) \approx 0.6827$ ; $P(\mu-2\sigma \leq X \leq \mu+2\sigma) \approx 0.9545$ ; $P(\mu-3\sigma \leq X \leq \mu+3\sigma) \approx 0.9973$ .
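The rule holds for any $\mu$ and $\sigma$ , which is easy to confirm in software. The sketch below assumes Python's `statistics.NormalDist`; the parameters $\mu = 100$ , $\sigma = 15$ are arbitrary illustrative values.

```python
from statistics import NormalDist

X = NormalDist(mu=100.0, sigma=15.0)  # arbitrary illustrative parameters

probs = {}
for k in (1, 2, 3):
    lo = X.mean - k * X.stdev
    hi = X.mean + k * X.stdev
    probs[k] = X.cdf(hi) - X.cdf(lo)  # P(mu - k*sigma <= X <= mu + k*sigma)
    print(f"within {k} standard deviation(s): {probs[k]:.4f}")
```

Changing `mu` or `sigma` should leave the three probabilities unchanged, since they depend only on the number of standard deviations.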

Standardisation: any $X\sim N(\mu,\sigma^2)$ is transformed to $Z = (X-\mu)/\sigma \sim N(0,1)$ . Standard normal probabilities are read from $z$ -tables or computed with software.

The normal distribution is additive: if $X\sim N(\mu_1,\sigma_1^2)$ and $Y\sim N(\mu_2,\sigma_2^2)$ are independent, then $X+Y\sim N(\mu_1+\mu_2,\sigma_1^2+\sigma_2^2)$ .
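Python's `statistics.NormalDist` implements this rule directly: adding two `NormalDist` objects treats them as independent and returns the distribution of the sum. The parameters below are made up for illustration.

```python
from statistics import NormalDist

X = NormalDist(mu=3.0, sigma=4.0)   # variance 16
Y = NormalDist(mu=-1.0, sigma=3.0)  # variance 9

S = X + Y  # distribution of X + Y, assuming X and Y are independent

print(f"mean of X + Y:     {S.mean}")
print(f"variance of X + Y: {S.variance}")
print(f"sd of X + Y:       {S.stdev}")
```

Note that the variances add, not the standard deviations: the standard deviation of the sum is $\sqrt{16 + 9} = 5$ , not $4 + 3 = 7$ .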

Why normal? By the Central Limit Theorem, the sum (or average) of many independent, identically distributed random variables with finite variance is approximately normal, whatever the shape of the underlying distribution. This goes a long way toward explaining the normal's ubiquity in nature and statistics.
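A quick simulation makes the theorem concrete. The sketch below is an illustration under assumed settings (means of $n = 30$ draws from the skewed $\text{Exp}(1)$ distribution, $20{,}000$ repetitions, a fixed seed); none of these numbers come from the lesson.

```python
import random
from statistics import NormalDist, fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

n, trials = 30, 20_000
# Each trial: the mean of n draws from Exp(1), a strongly skewed distribution
means = [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

# Exp(1) has mean 1 and variance 1, so the sample mean is roughly N(1, 1/n)
approx = NormalDist(mu=1.0, sigma=(1.0 / n) ** 0.5)

lo = approx.mean - approx.stdev
hi = approx.mean + approx.stdev
within_one_sd = sum(lo <= m <= hi for m in means) / trials

print(f"simulated mean of the means: {fmean(means):.3f}")
print(f"simulated sd of the means:   {pstdev(means):.3f} (theory {approx.stdev:.3f})")
print(f"share within 1 sd:           {within_one_sd:.3f} (normal theory about 0.683)")
```

Even though single exponential draws are far from bell-shaped, the share of sample means within one standard deviation of the centre should land near the normal value.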

💡Explain it simply

The bell curve is the shape that emerges whenever many small, independent random factors add together. Heights, measurement errors, test scores — they all pile up near the average and thin out toward the extremes. That's the normal distribution, and it appears everywhere precisely because of this 'sum of many small effects' mechanism.

Normal probability using z-scores

Exam scores: $X\sim N(70, 10^2)$ . Find $P(60 \leq X \leq 85)$ .
Standardise: $z_1 = (60-70)/10 = -1$ , $z_2 = (85-70)/10 = 1.5$ .
$P(-1 \leq Z \leq 1.5) = \Phi(1.5) - \Phi(-1) = 0.9332 - 0.1587 = 0.7745$ .
About $77.5\%$ of students score between $60$ and $85$ .
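The same calculation can be done in software instead of with a $z$ -table. This is a minimal sketch using Python's `statistics.NormalDist`; it computes the answer both through the $z$ -scores and directly from $N(70, 10^2)$ .

```python
from statistics import NormalDist

scores = NormalDist(mu=70.0, sigma=10.0)  # exam scores X ~ N(70, 10^2)
Z = NormalDist()                          # standard normal

# Standardise the two endpoints
z1 = (60 - scores.mean) / scores.stdev
z2 = (85 - scores.mean) / scores.stdev

prob_via_z = Z.cdf(z2) - Z.cdf(z1)             # Phi(z2) - Phi(z1)
prob_direct = scores.cdf(85) - scores.cdf(60)  # same area, no standardising

print(f"z1 = {z1}, z2 = {z2}")
print(f"P(60 <= X <= 85) via z-scores: {prob_via_z:.4f}")
print(f"P(60 <= X <= 85) directly:     {prob_direct:.4f}")
```

Both routes should give the same probability, which is the point of standardisation: it changes the scale, not the area.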

The exponential and uniform distributions

The exponential distribution $\text{Exp}(\lambda)$ : $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ . Mean $= 1/\lambda$ , variance $= 1/\lambda^2$ . It models the waiting time between events in a Poisson process at rate $\lambda$ .

Memoryless property: $P(X > s+t \mid X > s) = P(X > t)$ . The past waiting time provides no information about the future wait. This is the continuous analogue of the geometric distribution's memorylessness.
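The property follows from $P(X > x) = e^{-\lambda x}$ , since $e^{-\lambda(s+t)}/e^{-\lambda s} = e^{-\lambda t}$ . A short numerical check, with a hypothetical helper `survival` and arbitrary values of $\lambda$ , $s$ and $t$ :

```python
import math

def survival(x, lam):
    """P(X > x) for X ~ Exp(lam), valid for x >= 0."""
    return math.exp(-lam * x)

lam, s, t = 0.5, 3.0, 2.0  # arbitrary illustrative values

conditional = survival(s + t, lam) / survival(s, lam)  # P(X > s+t | X > s)
unconditional = survival(t, lam)                       # P(X > t)

print(f"P(X > s+t | X > s) = {conditional:.6f}")
print(f"P(X > t)           = {unconditional:.6f}")
```

The two printed values should match for any choice of `s` and `t`.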

The uniform distribution $U(a,b)$ : $f(x) = 1/(b-a)$ for $a \leq x \leq b$ , and $0$ elsewhere. Mean $= (a+b)/2$ , variance $= (b-a)^2/12$ . Every sub-interval of $[a,b]$ of equal length is equally probable.
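These formulas can be sanity-checked by simulation. The sketch below assumes $U(2, 8)$ and a fixed seed purely for illustration; `uniform_prob` is a hypothetical helper, not a library function.

```python
import random
from statistics import fmean, pvariance

random.seed(1)  # fixed seed so the run is repeatable

a, b = 2.0, 8.0
draws = [random.uniform(a, b) for _ in range(100_000)]

def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X ~ U(a, b), with [c, d] clipped to [a, b]."""
    c, d = max(c, a), min(d, b)
    return max(d - c, 0.0) / (b - a)

print(f"sample mean:     {fmean(draws):.3f} (formula {(a + b) / 2:.3f})")
print(f"sample variance: {pvariance(draws):.3f} (formula {(b - a) ** 2 / 12:.3f})")
print(f"P(3 <= X <= 4) = {uniform_prob(3, 4, a, b):.4f}")
print(f"P(6 <= X <= 7) = {uniform_prob(6, 7, a, b):.4f}")
```

The last two probabilities are equal because both intervals have length $1$ and lie inside $[a, b]$ .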

The chi-squared distribution $\chi^2_k$ is the distribution of the sum of $k$ squared independent standard normals. It is used in goodness-of-fit tests and in inference about variances. Mean $= k$ , variance $= 2k$ .
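The definition translates directly into a simulation. As a rough sketch (with $k = 5$ , $50{,}000$ draws and a fixed seed, all chosen for illustration), summing squared standard normals should reproduce the stated mean and variance:

```python
import random
from statistics import fmean, pvariance

random.seed(2)  # fixed seed so the run is repeatable

k, trials = 5, 50_000
# Each sample: the sum of k squared independent N(0, 1) draws
samples = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(trials)]

sample_mean = fmean(samples)
sample_var = pvariance(samples)

print(f"sample mean:     {sample_mean:.3f} (theory {k})")
print(f"sample variance: {sample_var:.3f} (theory {2 * k})")
```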

The $t$ -distribution with $\nu$ degrees of freedom is the ratio $Z/\sqrt{V/\nu}$ where $Z\sim N(0,1)$ and $V\sim\chi^2_\nu$ independently. It is bell-shaped but heavier-tailed than the normal, converging to $N(0,1)$ as $\nu\to\infty$ .
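The heavier tails show up clearly in simulation. The sketch below builds $t$ draws from the ratio definition above, with $\nu = 5$ , a cutoff of $3$ and a fixed seed as illustrative assumptions; `draw_t` is a hypothetical helper.

```python
import random
from statistics import NormalDist

random.seed(3)  # fixed seed so the run is repeatable

nu, trials = 5, 100_000

def draw_t(nu):
    """One draw of Z / sqrt(V / nu), with Z ~ N(0, 1) and V ~ chi-squared(nu) independent."""
    z = random.gauss(0.0, 1.0)
    v = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / (v / nu) ** 0.5

tail_t = sum(abs(draw_t(nu)) > 3 for _ in range(trials)) / trials
tail_normal = 2 * (1 - NormalDist().cdf(3))  # P(|Z| > 3) for the standard normal

print(f"simulated P(|T| > 3), nu = {nu}: {tail_t:.4f}")
print(f"P(|Z| > 3):                   {tail_normal:.4f}")
```

With few degrees of freedom the $t$ tail probability should come out several times larger than the normal one; raising `nu` should bring the two closer together.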

💡Explain it simply

Exponential: you're waiting for a bus that comes randomly at an average rate of one every $10$ minutes. The exponential distribution models how long you wait. The memoryless property says that having already waited $5$ minutes tells you nothing: the remaining wait has the same distribution, with the same $10$ -minute average, as when you started.

Uniform: you pick a random number between $0$ and $1$ . Every tiny interval of the same width has the same probability. Completely flat.

Exponential waiting time

Calls arrive at rate $\lambda = 2$ per minute. What is $P(\text{wait} > 1\text{ min})$ ?
$P(X > 1) = 1 - F(1) = 1 - (1 - e^{-2}) = e^{-2} \approx 0.135$ .
About $13.5\%$ chance of waiting more than $1$ minute.
Mean wait: $1/\lambda = 0.5$ min.
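The worked example above can be reproduced in a few lines. This is a minimal sketch using only `math.exp`; `exp_cdf` is a hypothetical helper implementing $F(x) = 1 - e^{-\lambda x}$ .

```python
import math

def exp_cdf(x, lam):
    """CDF of Exp(lam): F(x) = 1 - exp(-lam * x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

lam = 2.0  # calls per minute

p_wait_over_1 = 1.0 - exp_cdf(1.0, lam)  # P(X > 1) = e^{-2}
mean_wait = 1.0 / lam                    # mean is 1/lambda, not lambda

print(f"P(wait > 1 min) = {p_wait_over_1:.4f}")
print(f"mean wait       = {mean_wait} min")
```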

Normal approximation to the binomial

When $n$ is large and $p$ is not extreme, $X\sim\text{Binomial}(n,p)$ can be approximated by $N(np,\, np(1-p))$ .

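Validity condition: $np \geq 10$ and $n(1-p) \geq 10$ . When $n$ is large but $p$ is close to $0$ , use the Poisson approximation with mean $np$ instead; when $p$ is close to $1$ , apply the same idea to the count of failures $n - X$ .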

Continuity correction: because the binomial is discrete, $P(X = k)$ is approximated by $P(k-0.5 \leq Y \leq k+0.5)$ , where $Y\sim N(np,\, np(1-p))$ is the approximating normal. For $P(X \leq k)$ , use $P(Y \leq k+0.5) = P(Z \leq (k+0.5-np)/\sqrt{np(1-p)})$ .
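To see how much the correction helps, the approximation can be compared with the exact binomial sum. The sketch below assumes illustrative values $n = 100$ , $p = 0.3$ , $k = 25$ (so $np = 30$ and $n(1-p) = 70$ ); `binom_pmf` is a hypothetical helper built on `math.comb`.

```python
import math
from statistics import NormalDist

n, p, k = 100, 0.3, 25  # illustrative values; np = 30 and n(1-p) = 70

def binom_pmf(j, n, p):
    """Exact binomial probability P(X = j)."""
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)

exact_cdf = sum(binom_pmf(j, n, p) for j in range(k + 1))  # exact P(X <= k)
exact_point = binom_pmf(k, n, p)                           # exact P(X = k)

approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
cdf_with_cc = approx.cdf(k + 0.5)                         # with continuity correction
cdf_without_cc = approx.cdf(k)                            # without it
point_with_cc = approx.cdf(k + 0.5) - approx.cdf(k - 0.5)

print(f"P(X <= {k}): exact {exact_cdf:.4f}, "
      f"with correction {cdf_with_cc:.4f}, without {cdf_without_cc:.4f}")
print(f"P(X = {k}):  exact {exact_point:.4f}, with correction {point_with_cc:.4f}")
```

In this setting the corrected value should sit noticeably closer to the exact probability than the uncorrected one.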

💡Explain it simply

For large $n$ , a binomial histogram starts to look like a smooth bell curve. The normal approximation exploits this: instead of summing many binomial terms, you compute one area under the normal curve. The continuity correction patches up the gap between the 'staircase' (binomial) and the 'smooth curve' (normal).

Common Mistakes to Avoid

Computing $P(X = c)$ for a continuous random variable. Point probabilities are always zero. Always specify an interval.
Reading the PDF height as a probability. $f(x)$ can exceed $1$ ; it is a density, not a probability.
Applying the normal approximation when $np < 10$ or $n(1-p) < 10$ . Use exact binomial or Poisson instead.
Forgetting that the standard $z$ -table gives left-tail probabilities $\Phi(z) = P(Z \leq z)$ . For $P(X > x)$ , use $1 - \Phi(z)$ . For $P(a < X < b)$ , use $\Phi(z_b) - \Phi(z_a)$ .
Confusing the rate $\lambda$ with the mean in the exponential distribution. Mean $= 1/\lambda$ , not $\lambda$ .
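The second mistake is easy to see with a concrete density. As a small illustration (assuming a narrow normal $N(0, 0.1^2)$ , chosen only because its peak is tall), the PDF height exceeds $1$ while every probability stays at or below $1$ :

```python
from statistics import NormalDist

narrow = NormalDist(mu=0.0, sigma=0.1)  # small sigma gives a tall, thin bell

height = narrow.pdf(0.0)                         # density at the peak
prob_near_peak = narrow.cdf(0.05) - narrow.cdf(-0.05)  # an actual probability
total = narrow.cdf(1.0) - narrow.cdf(-1.0)       # nearly all of the area

print(f"f(0) = {height:.3f}  (a density, allowed to exceed 1)")
print(f"P(-0.05 <= X <= 0.05) = {prob_near_peak:.4f}")
print(f"P(-1 <= X <= 1)       = {total:.6f}")
```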