A discrete probability distribution assigns probabilities to a countable set of outcomes. The binomial and Poisson distributions model the most common real-world counting processes.
The foundation is the random variable: a function that assigns a number to each outcome. Expected value and variance summarize its center and spread.
Mastering discrete distributions is the first step toward understanding statistical inference: confidence intervals and hypothesis tests rest on distributional assumptions.
Random variables and expected value
A random variable is a function from the sample space to the real numbers. Discrete random variables take a countable set of values (integers, fractions). Continuous random variables take any value in an interval.
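The probability mass function (PMF) of a discrete random variable \(X\) gives \(P(X = x)\) for each possible value \(x\). The PMF must satisfy \(0 \le P(X = x) \le 1\) and \(\sum_x P(X = x) = 1\).
The expected value (population mean) is \(E(X) = \mu = \sum_x x \, P(X = x)\). It is the long-run average over infinitely many independent repetitions. It does not need to be a possible value of \(X\): a fair die has \(E(X) = 3.5\).
The variance of \(X\) is \(\operatorname{Var}(X) = \sigma^2 = E[(X - \mu)^2] = E(X^2) - \mu^2\). The standard deviation \(\sigma = \sqrt{\operatorname{Var}(X)}\) is in the same units as \(X\).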
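Key properties: \(E(aX + b) = aE(X) + b\), \(\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)\). For sums of two variables: \(E(X + Y) = E(X) + E(Y)\) always, whereas \(\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)\) is guaranteed when \(X\) and \(Y\) are independent (more generally, whenever they are uncorrelated).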
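The definitions and the linear-transformation rules above can be checked on the fair die. The following is a minimal sketch, assuming a PMF stored as a dictionary from values to probabilities; the helper names `expected_value` and `variance` and the constants `a` and `b` are illustrative choices, not part of the text.

```python
from fractions import Fraction


def expected_value(pmf):
    """E(X): sum of x * P(X = x) over all possible values x."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) computed as E(X^2) - [E(X)]^2."""
    mu = expected_value(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu ** 2


# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mu = expected_value(die)   # 7/2, i.e. 3.5
var = variance(die)        # 35/12

# Linear transformation Y = aX + b; a and b are arbitrary illustrative constants.
a, b = 3, 2
transformed = {a * x + b: p for x, p in die.items()}

print(mu, var)
print(expected_value(transformed), variance(transformed))
```

Exact fractions are used so that the identities \(E(aX + b) = aE(X) + b\) and \(\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)\) hold exactly rather than up to floating-point error.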
💡Explain it simply
The expected value is what you would get on average if you played the game an enormous number of times. Roll a fair die a million times and the average will be very close to 3.5, even though 3.5 is never an actual outcome. It's the 'center of gravity' of the distribution.
Variance measures how unpredictable the outcomes are. Low variance means the outcomes cluster tightly around the mean. High variance means you get wildly different outcomes each time.
Expected value and variance of a PMF
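- A game pays $10 with probability \(p_1\), $5 with probability \(p_2\), and $2 with probability \(p_3\), where \(p_1 + p_2 + p_3 = 1\).
- \(E(X) = 10p_1 + 5p_2 + 2p_3\).
- \(E(X^2) = 100p_1 + 25p_2 + 4p_3\).
- \(\operatorname{Var}(X) = E(X^2) - [E(X)]^2\). \(\sigma = \sqrt{\operatorname{Var}(X)}\).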
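To see the calculation with concrete numbers, here is a small sketch of the payout game. The payouts ($10, $5, $2) come from the example; the probabilities 0.2, 0.3 and 0.5 are hypothetical values chosen only for illustration and should be replaced with the actual ones.

```python
from fractions import Fraction
from math import sqrt

# Payouts are from the example; the probabilities are HYPOTHETICAL placeholders.
payout_pmf = {
    10: Fraction(1, 5),   # assumed P(X = 10) = 0.2
    5: Fraction(3, 10),   # assumed P(X = 5)  = 0.3
    2: Fraction(1, 2),    # assumed P(X = 2)  = 0.5
}

# A valid PMF must sum to 1.
assert sum(payout_pmf.values()) == 1

mean = sum(x * p for x, p in payout_pmf.items())               # E(X)
second_moment = sum(x ** 2 * p for x, p in payout_pmf.items())  # E(X^2)
var = second_moment - mean ** 2                                 # Var(X)
sd = sqrt(var)                                                  # standard deviation

print(float(mean), float(second_moment), float(var), sd)
```

With these assumed probabilities the expected payout is $4.50 and the variance is 9.25, so a typical play deviates from the mean by roughly $3.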
The binomial distribution
The binomial distribution models the number of successes in exactly \(n\) independent trials, each with the same success probability \(p\). It arises in coin flips, quality control, clinical trials, and polling.
Requirements (BINS): Binary outcomes only, Independent trials, Number of trials is fixed in advance, Same probability for each trial.
PMF: \(P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\), for \(k = 0, 1, \dots, n\). The term \(\binom{n}{k}\) counts the number of ways to arrange exactly \(k\) successes among \(n\) trials.
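Mean: \(\mu = np\). Standard deviation: \(\sigma = \sqrt{np(1 - p)}\). When \(p = 0.5\) the distribution is symmetric; when \(p \ne 0.5\) it is skewed: right-skewed, with the longer tail toward \(n\), if \(p < 0.5\), and left-skewed, with the longer tail toward \(0\), if \(p > 0.5\).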
Cumulative probabilities \(P(X \le k)\) are computed by summing the PMF or using tables/software. For \(P(X \ge k)\), use the complement: \(P(X \ge k) = 1 - P(X \le k - 1)\).
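The PMF, the summed cumulative probability, and the complement rule can be sketched in a few lines. The function names and the values \(n = 10\), \(p = 0.5\) below are assumptions made for illustration only.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k), obtained by summing the PMF from 0 to k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


# Hypothetical setting: 10 flips of a fair coin.
n, p = 10, 0.5

prob_exactly_5 = binomial_pmf(5, n, p)        # C(10, 5) / 2^10
prob_at_least_3 = 1 - binomial_cdf(2, n, p)   # complement: 1 - P(X <= 2)

mean = n * p                    # np
sd = sqrt(n * p * (1 - p))      # sqrt(np(1 - p))

print(prob_exactly_5, prob_at_least_3, mean, sd)
```

Using the complement means only three PMF terms are summed for \(P(X \ge 3)\) instead of eight.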
💡Explain it simply
The binomial distribution counts 'how many times did I succeed out of \(n\) tries?' Each try is like a coin flip: it either works or it doesn't, with the same probability each time. The binomial formula accounts for all the different orders in which those successes could have occurred.
Binomial probability: exactly \(k\) successes
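- A fair coin is flipped \(n\) times. Find \(P(X = k)\), where \(X\) is the number of heads.
- \(n\) flips, \(p = 0.5\), \(k\) heads.
- \(P(X = k) = \binom{n}{k} (0.5)^k (0.5)^{n - k} = \binom{n}{k} / 2^n\).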
- Also: , .
The Poisson distribution
The Poisson distribution models the count of independent events in a fixed interval of time, area, or volume, when events occur at a constant average rate \(\lambda\).
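PMF: \(P(X = k) = \dfrac{\lambda^k e^{-\lambda}}{k!}\), for \(k = 0, 1, 2, \dots\). The Poisson distribution has no upper bound: in principle any count is possible.
Both the mean and variance equal \(\lambda\): \(E(X) = \operatorname{Var}(X) = \lambda\). A Poisson random variable is characterised entirely by \(\lambda\).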
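As a numerical sanity check, the sketch below evaluates the Poisson PMF and confirms that the mean and variance both come out equal to \(\lambda\). The rate \(\lambda = 4\) is a hypothetical value, and the infinite sum is truncated at 60 terms, which is an approximation that is adequate only for small rates like this one.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam ** k * exp(-lam) / factorial(k)


lam = 4.0          # hypothetical rate, chosen for illustration
ks = range(60)     # truncation of the infinite support; the tail is negligible here

total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum(k * k * poisson_pmf(k, lam) for k in ks) - mean ** 2

print(total, mean, var)
```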
Typical applications: calls arriving at a call centre per hour, photons hitting a detector per second, mutations in a DNA sequence per million base pairs, accidents at an intersection per month.
Poisson as a limit of the binomial: when \(n\) is large, \(p\) is small, and \(np\) is moderate, the binomial is well-approximated by Poisson(\(\lambda = np\)). This is the rare-event approximation.
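The quality of the rare-event approximation can be inspected by comparing the two PMFs side by side. This is a sketch under assumed values (\(n = 1000\), \(p = 0.003\), so \(np = 3\)); the function names are illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam ** k * exp(-lam) / factorial(k)


# Hypothetical rare-event setting: many trials, small success probability.
n, p = 1000, 0.003
lam = n * p   # matching Poisson rate

for k in range(7):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))

# Largest pointwise difference between the two PMFs over k = 0..20.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21))
print(max_gap)
```

Repeating the comparison with a larger \(p\) (say \(n = 10\), \(p = 0.3\), same \(np\)) shows the gap widening, which is why the approximation is reserved for small \(p\).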
💡Explain it simply
The Poisson distribution answers: 'how many times will a rare event happen in a fixed window?' If buses arrive randomly at a rate of \(\lambda\) per hour, the Poisson(\(\lambda\)) distribution gives you the probability of seeing \(k\) buses in the next hour, for any count \(k = 0, 1, 2, \dots\).
Poisson probability
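- Emails arrive at rate \(\lambda\) per hour. Find \(P(X = k)\), the probability of exactly \(k\) emails in one hour.
- \(P(X = k) = \dfrac{\lambda^k e^{-\lambda}}{k!}\).
- Also: \(P(X = 0) = e^{-\lambda}\), the chance of zero emails in the hour.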
The geometric and negative binomial distributions
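The geometric distribution models the number of trials until the first success. \(P(X = k) = (1 - p)^{k - 1} p\), for \(k = 1, 2, 3, \dots\). Mean \(E(X) = 1/p\). The geometric distribution is memoryless: \(P(X > s + t \mid X > s) = P(X > t)\).
The negative binomial distribution generalises: it counts the number of trials needed to achieve exactly \(r\) successes. Mean \(r/p\), variance \(r(1 - p)/p^2\).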
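The memoryless property and the stated moments can be verified numerically. The sketch below assumes \(p = 0.2\), \(r = 3\), \(s = 3\), \(t = 4\) (all hypothetical), uses the trial-counting form of the negative binomial PMF, \(P(X = k) = \binom{k - 1}{r - 1} p^r (1 - p)^{k - r}\) for \(k \ge r\), and truncates the infinite sums at 2000 terms.

```python
from math import comb


def geometric_pmf(k, p):
    """P(X = k): first success occurs on trial k (k = 1, 2, ...)."""
    return (1 - p) ** (k - 1) * p


def geometric_tail(k, p):
    """P(X > k): the first k trials are all failures."""
    return (1 - p) ** k


def negative_binomial_pmf(k, r, p):
    """P(X = k): the r-th success occurs on trial k (k = r, r + 1, ...)."""
    return comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)


p, r = 0.2, 3   # hypothetical success probability and target number of successes
s, t = 3, 4     # hypothetical waiting times for the memoryless check

# Memorylessness: P(X > s + t | X > s) equals P(X > t).
conditional = geometric_tail(s + t, p) / geometric_tail(s, p)
unconditional = geometric_tail(t, p)

# Moments via truncated sums (the neglected tails are negligible for p = 0.2).
geo_mean = sum(k * geometric_pmf(k, p) for k in range(1, 2000))
nb_mean = sum(k * negative_binomial_pmf(k, r, p) for k in range(r, 2000))
nb_var = sum(k * k * negative_binomial_pmf(k, r, p) for k in range(r, 2000)) - nb_mean ** 2

print(conditional, unconditional)
print(geo_mean, nb_mean, nb_var)
```

Setting `r = 1` in `negative_binomial_pmf` reproduces the geometric PMF, which is the sense in which the negative binomial generalises it.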
The hypergeometric distribution models sampling without replacement from a finite population. Used when the population is small enough that each draw changes the probabilities — for example, selecting cards from a deck without replacing them.
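For the card example, the standard hypergeometric PMF is \(P(X = k) = \binom{K}{k}\binom{N - K}{n - k} / \binom{N}{n}\), where \(N\) is the population size, \(K\) the number of success items in it, and \(n\) the number of draws. The sketch below applies it to a hypothetical question, the number of aces in a 5-card hand from a standard 52-card deck; the parameter names are illustrative.

```python
from math import comb


def hypergeometric_pmf(k, population, successes, draws):
    """P(X = k): exactly k success items in `draws` draws without replacement."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )


# Hypothetical question: aces in a 5-card hand from a 52-card deck (4 aces).
probs = [hypergeometric_pmf(k, 52, 4, 5) for k in range(5)]
prob_two_aces = probs[2]

print(probs)
print(prob_two_aces)
```

Because each card drawn changes the composition of the remaining deck, a binomial with \(p = 4/52\) would only approximate these probabilities.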
💡Explain it simply
The geometric distribution answers: 'how many times do I have to try until I succeed?' If each attempt has success probability \(p\), the geometric distribution tells you the probabilities of succeeding on the first try, second try, third try, and so on.
Choosing the right distribution
Binomial: fixed number of trials \(n\), each independent with success probability \(p\), counting the number of successes.
Poisson: counting rare, independent events in a fixed window; no upper bound on the count; mean and variance are equal.
Geometric: counting the number of trials until the first success; memoryless.
Hypergeometric: like the binomial but sampling without replacement from a finite population; used when the sample is a non-negligible fraction of the population, so each draw changes the remaining probabilities.
Decision guide: if \(n\) is large, \(p\) is small, and \(np\) is moderate, the Poisson approximation to the binomial is accurate and simpler to compute.
💡Explain it simply
Binomial = 'I flip this coin exactly \(n\) times, how many heads?' Poisson = 'Calls arrive randomly, how many in the next hour?' Geometric = 'I keep flipping until I get a head, how many flips?' Each distribution is the right tool for a specific type of counting question.
Common Mistakes to Avoid
- Applying the binomial formula when trials are not independent or the sample is drawn without replacement from a small population (use hypergeometric instead).
- Forgetting that Poisson requires events to be independent and the rate to be constant — a traffic jam violates both.
- Confusing \(\lambda\) (Poisson rate) with \(p\) (binomial success probability). They measure different things even though both parameterise how often something happens.
- Computing \(P(X \ge k)\) directly when it is easier to use \(1 - P(X \le k - 1)\).
- Using the binomial formula for continuous data. The binomial is for counts; for continuous data use normal or other continuous distributions.