
Discrete Probability Distributions

Random variables, PMFs, expected value, binomial, Poisson, and geometric distributions.

A discrete probability distribution assigns probabilities to a countable set of outcomes. The binomial and Poisson distributions model the most common real-world counting processes.

The foundation is the random variable: a function that assigns a number to each outcome. Expected value and variance summarize its center and spread.

Mastering discrete distributions is the first step toward understanding statistical inference: confidence intervals and hypothesis tests rest on assumptions about the distribution of the data.

Random variables and expected value

A random variable $X$ is a function from the sample space to the real numbers. Discrete random variables take a countable set of values (integers, fractions). Continuous random variables take any value in an interval.

The probability mass function (PMF) of a discrete $X$ gives $P(X=x_i)$ for each possible value. The PMF must satisfy $P(X=x_i)\geq 0$ and $\sum_i P(X=x_i) = 1$.

The expected value (population mean) is $\mu = E(X) = \sum_i x_i P(X=x_i)$. It is the long-run average over infinitely many independent repetitions. It does not need to be a possible value of $X$: a fair die has $E(X)=3.5$.

The variance of $X$ is $\sigma^2 = \text{Var}(X) = E[(X-\mu)^2] = E(X^2)-[E(X)]^2$. The standard deviation $\sigma = \sqrt{\text{Var}(X)}$ is in the same units as $X$.
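As a worked illustration of these two formulas, the sketch below computes the mean and variance of a fair six-sided die directly from its PMF. The function names are illustrative choices, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def expected_value(pmf):
    """E(X): sum of x * P(X = x) over all possible values."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) computed as E(X^2) - [E(X)]^2."""
    mean = expected_value(pmf)
    second_moment = sum(x**2 * p for x, p in pmf.items())
    return second_moment - mean**2

mu = expected_value(pmf)
var = variance(pmf)
print(mu, var)  # 7/2 35/12
```

The mean 7/2 = 3.5 is not a face of the die, which matches the remark that the expected value need not be an attainable value.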

Key properties: $E(aX+b)=aE(X)+b$ and $\text{Var}(aX+b) = a^2\text{Var}(X)$. For any two random variables, $E(X+Y)=E(X)+E(Y)$ always holds, whether or not they are independent. The variance rule $\text{Var}(X+Y)=\text{Var}(X)+\text{Var}(Y)$ is not general: it holds when $X$ and $Y$ are independent (more precisely, whenever they are uncorrelated).
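These rules can be checked numerically on a small example. The sketch below, with hypothetical helper names, builds the PMF of $3X+2$ for a fair die $X$ and the PMF of the sum of two independent dice, then compares means and variances. It assumes the scale factor is nonzero so that distinct values stay distinct.

```python
from fractions import Fraction

def mean(pmf):
    """E(X) from a PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    """Var(X) = E[(X - mu)^2]."""
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

def linear_transform(pmf, a, b):
    """PMF of aX + b, assuming a != 0."""
    return {a * x + b: p for x, p in pmf.items()}

def sum_of_independent(pmf_x, pmf_y):
    """PMF of X + Y when X and Y are independent."""
    out = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            # Independence lets us multiply the marginal probabilities.
            out[x + y] = out.get(x + y, 0) + px * py
    return out

die = {face: Fraction(1, 6) for face in range(1, 7)}
scaled = linear_transform(die, a=3, b=2)
two_dice = sum_of_independent(die, die)

print(mean(scaled) == 3 * mean(die) + 2)   # E(aX+b) = aE(X) + b
print(var(scaled) == 9 * var(die))         # Var(aX+b) = a^2 Var(X)
print(var(two_dice) == 2 * var(die))       # variances add under independence
```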


The binomial distribution

The binomial distribution models the number of successes $X$ in exactly $n$ independent trials, each with the same success probability $p$. It arises in coin flips, quality control, clinical trials, and polling.

Requirements (BINS): Binary outcomes only, Independent trials, Number of trials $n$ is fixed in advance, Same probability $p$ for each trial.

PMF: $P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}$, for $k=0,1,\ldots,n$. The $\binom{n}{k}$ term counts the number of ways to arrange exactly $k$ successes among $n$ trials.

Mean: $\mu = np$. Standard deviation: $\sigma = \sqrt{np(1-p)}$. When $p=0.5$ the distribution is symmetric. When $p<0.5$ the probability mass concentrates near $0$ with a longer tail toward $n$ (right-skewed); when $p>0.5$ it concentrates near $n$ with a longer tail toward $0$ (left-skewed).

Cumulative probabilities $P(X\leq k)$ are computed by summing the PMF or using tables/software. For $P(X\geq k)$, use the complement: $1-P(X\leq k-1)$.
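A minimal sketch of the PMF, the cumulative sum, and the complement rule, using only the Python standard library. The function names and the example values ($n=10$, $p=0.5$) are illustrative assumptions, not part of the lesson.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k), obtained by summing the PMF from 0 to k."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

n, p = 10, 0.5  # ten fair coin flips (example values)

p_exactly_3 = binom_pmf(3, n, p)
# Complement rule: P(X >= 3) = 1 - P(X <= 2).
p_at_least_3 = 1 - binom_cdf(2, n, p)

print(p_exactly_3)   # 0.1171875  (120 / 1024)
print(p_at_least_3)  # 0.9453125  (1 - 56 / 1024)
```

Summing the PMF directly is fine for small $n$; for large $n$ a statistics library is usually the more practical choice.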


The Poisson distribution

The Poisson distribution models the count of independent events in a fixed interval of time, area, or volume, when events occur at a constant average rate $\lambda > 0$.

PMF: $P(X=k) = e^{-\lambda}\lambda^k/k!$, for $k=0,1,2,\ldots$. The Poisson distribution has no upper bound: in principle any count is possible.

Both the mean and variance equal $\lambda$: $E(X)=\text{Var}(X)=\lambda$. A Poisson random variable is characterised entirely by $\lambda$.
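The equality of mean and variance can be checked numerically. The sketch below evaluates the PMF for an example rate of 4 and truncates the infinite sums at 100 terms, an assumption that is harmless here because the remaining tail probability is negligible.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 4.0            # example rate (events per interval)
ks = range(0, 100)   # truncation point: the tail beyond 99 is negligible for lam = 4

total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
second_moment = sum(k**2 * poisson_pmf(k, lam) for k in ks)
variance = second_moment - mean**2

# Up to floating-point error: total is 1, and mean and variance both equal lam.
print(total, mean, variance)
```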

Typical applications: calls arriving at a call centre per hour, photons hitting a detector per second, mutations in a DNA sequence per million base pairs, accidents at an intersection per month.

Poisson as a limit of the binomial: when $n$ is large, $p$ is small, and $np=\lambda$ is moderate, the binomial $B(n,p)$ is well-approximated by Poisson($\lambda$). This is the rare-event approximation.
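To see how close the approximation is, the sketch below compares the two PMFs side by side for example values $n=1000$ and $p=0.003$, so that $\lambda = 3$. These numbers are chosen for illustration only, and the comparison is limited to counts from 0 to 20.

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.003   # large n, small p (example values)
lam = n * p          # matching Poisson rate

for k in range(0, 8):
    print(k, round(binom_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))

# Largest pointwise difference between the two PMFs over k = 0..20.
max_gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(0, 21))
print(max_gap)
```

Repeating the comparison with a larger $p$ (say $n=10$, $p=0.3$, same $\lambda$) shows the gap widening, which is why the approximation is reserved for rare events.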


The geometric and negative binomial distributions

The geometric distribution models the number of trials until the first success, counting the successful trial itself. PMF: $P(X=k) = (1-p)^{k-1}p$, for $k=1,2,\ldots$. Mean $= 1/p$. The geometric distribution is memoryless: $P(X>m+n \mid X>m) = P(X>n)$ for non-negative integers $m$ and $n$.
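The memoryless property follows from the tail formula $P(X>k) = (1-p)^k$, which says the first $k$ trials all failed. The sketch below, with illustrative names and example values, checks the property and the mean numerically; the infinite sum for the mean is truncated at 2000 terms as a stated approximation.

```python
def geom_pmf(k, p):
    """P(X = k): k - 1 failures followed by one success, k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

def geom_tail(k, p):
    """P(X > k): the first k trials are all failures."""
    return (1 - p) ** k

p = 0.2        # example success probability
m, n = 3, 4    # example numbers of trials already failed / still to wait

# Conditional probability P(X > m + n | X > m) = P(X > m + n) / P(X > m).
conditional = geom_tail(m + n, p) / geom_tail(m, p)
unconditional = geom_tail(n, p)
print(conditional, unconditional)  # equal up to floating-point error

# Mean via a truncated sum; the exact value is 1 / p.
approx_mean = sum(k * geom_pmf(k, p) for k in range(1, 2001))
print(approx_mean)
```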

The negative binomial distribution generalises: it counts the number of trials needed to achieve exactly $r$ successes. Mean $= r/p$, variance $= r(1-p)/p^2$.
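The lesson does not state the PMF, so the sketch below assumes the standard form for the number of trials $k$ needed to reach the $r$-th success, $P(X=k) = \binom{k-1}{r-1}p^r(1-p)^{k-r}$ for $k = r, r+1, \ldots$, and uses it to check the stated mean and variance. The example values and the truncation at 500 trials are illustrative choices.

```python
from math import comb

def negbinom_pmf(k, r, p):
    """P(X = k): the r-th success occurs on trial k (k >= r).

    The first k - 1 trials contain exactly r - 1 successes,
    and trial k is itself a success.
    """
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)

r, p = 3, 0.4          # example values
ks = range(r, 501)     # truncated support; the remaining tail is negligible

mean = sum(k * negbinom_pmf(k, r, p) for k in ks)
second_moment = sum(k**2 * negbinom_pmf(k, r, p) for k in ks)
variance = second_moment - mean**2

print(mean, r / p)                     # both close to 7.5
print(variance, r * (1 - p) / p**2)    # both close to 11.25
```

Setting `r = 1` recovers the geometric distribution. Some libraries instead count the number of failures before the $r$-th success, which shifts the mean by $r$.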

The hypergeometric distribution models sampling without replacement from a finite population. It is used when the population is small enough that each draw changes the probabilities, for example when selecting $5$ cards from a deck without replacing them.
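As an illustration of the card example, the sketch below computes the distribution of the number of aces in a 5-card hand. It assumes the standard hypergeometric PMF, which the lesson does not state: the probability of $k$ marked items in a sample of size $n$ drawn from a population of $N$ items containing $K$ marked ones is $\binom{K}{k}\binom{N-K}{n-k}/\binom{N}{n}$. The parameter names are illustrative.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k marked items in a sample of n drawn without replacement
    from a population of N items, K of which are marked."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

N, K, n = 52, 4, 5   # deck size, number of aces, hand size

# Number of aces in the hand can be 0, 1, 2, 3, or 4.
pmf = {k: hypergeom_pmf(k, N, K, n) for k in range(0, min(n, K) + 1)}

for k, prob in pmf.items():
    print(k, float(prob))

mean = sum(k * prob for k, prob in pmf.items())
print(mean)  # 5/13, which equals n * K / N
```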


Choosing the right distribution

Binomial: fixed $n$ trials, each independent with probability $p$, counting the number of successes.

Poisson: counting rare, independent events in a fixed window; no upper bound on the count; mean and variance are equal.

Geometric: counting the number of trials until the first success; memoryless.

Hypergeometric: like the binomial but sampling without replacement from a finite population; used when the sample is a non-negligible fraction of the population, so that each draw changes the probabilities.

Decision guide: if $n$ is large, $p$ is small, and $np$ is moderate, the Poisson approximation to the binomial is accurate and simpler to compute.
