Probability quantifies uncertainty. It assigns a number between $0$ and $1$ to each possible outcome, where $0$ means impossible and $1$ means certain.
Probability theory gives us the language to reason about randomness, and it is the foundation that all of statistical inference is built upon.
We start with basic rules, then build up to conditional probability, independence, and Bayes' theorem — tools used in everything from medical testing to spam filtering.
Basic probability rules
The sample space $\Omega$ is the set of all possible outcomes. An event $A$ is any subset of $\Omega$. The probability of $A$ must satisfy $0 \le P(A) \le 1$, $P(\Omega) = 1$, and $P(A \cup B) = P(A) + P(B)$ for mutually exclusive events $A$ and $B$.
For equally likely outcomes: $P(A) = |A| / |\Omega|$, the number of outcomes in $A$ divided by the total number of outcomes. This classical definition only applies when all outcomes are equally probable.
The complement rule: $P(A^c) = 1 - P(A)$. Often it is easier to compute the probability of the complement and subtract. For example, the probability of at least one success is $1 - P(\text{no successes})$.
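The "at least one success" case can be sketched in a few lines. This is a minimal illustration, assuming the trials are independent and share the same success probability `p`; the function name `prob_at_least_one` and the die example are hypothetical choices made here.

```python
from fractions import Fraction


def prob_at_least_one(p, n):
    """Return P(at least one success in n trials) via the complement rule.

    Computes 1 - P(no successes) = 1 - (1 - p)**n.
    Assumption of this sketch: the n trials are independent and each
    succeeds with the same probability p.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1 - (1 - p) ** n


# Hypothetical example: at least one six in four rolls of a fair die.
at_least_one_six = prob_at_least_one(Fraction(1, 6), 4)
print(at_least_one_six, float(at_least_one_six))
```

Using `Fraction` keeps the arithmetic exact: the result here is 671/1296, a little over one half. Passing a float for `p` works the same way if exactness is not needed.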
The addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. The intersection is subtracted because it is counted twice. For mutually exclusive events $P(A \cap B) = 0$, so $P(A \cup B) = P(A) + P(B)$.
The multiplication rule for independent events: $P(A \cap B) = P(A)\,P(B)$. For dependent events the correct rule is $P(A \cap B) = P(A)\,P(B \mid A)$.
💡Explain it simply
Probability is like assigning percentages to outcomes that must all add up to $100\%$. The complement rule says: if there is a chance $p$ of rain, there is a chance $1 - p$ of no rain, because the two must sum to $100\%$.
The addition rule corrects for double-counting. If a fraction $a$ of people like cats, a fraction $b$ like dogs, and a fraction $c$ like both, the fraction liking cats or dogs is $a + b - c$.
Applying the addition rule
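- A card is drawn from a standard deck. Find $P(A \cup B)$ for two events $A$ and $B$.
- Compute $P(A)$, $P(B)$, and $P(A \cap B)$ by counting cards out of $52$.
- Then $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.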
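The addition rule can be checked by brute-force enumeration of a 52-card deck. The sketch below assumes, purely for illustration, that the two events are "the card is a king" and "the card is a heart"; those events and the helper `prob` are choices made here, not part of the exercise above.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) pairs


def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for card in DECK if event(card))
    return Fraction(favourable, len(DECK))


def is_king(card):
    return card[0] == "K"


def is_heart(card):
    return card[1] == "hearts"


p_king = prob(is_king)                                  # 4/52
p_heart = prob(is_heart)                                # 13/52
p_both = prob(lambda c: is_king(c) and is_heart(c))     # king of hearts: 1/52

# Addition rule versus a direct count of "king or heart".
p_union_rule = p_king + p_heart - p_both
p_union_direct = prob(lambda c: is_king(c) or is_heart(c))
print(p_union_rule, p_union_direct)
```

Both routes give 16/52 = 4/13. Without subtracting `p_both`, the king of hearts would be counted twice and the sum would come out as 17/52.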
Conditional probability and independence
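The conditional probability of $A$ given $B$ is $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, defined when $P(B) > 0$. It re-scales probability to the reduced sample space where $B$ is known to have occurred.
Events $A$ and $B$ are independent if $P(A \mid B) = P(A)$: knowing $B$ gives no information about $A$. Equivalently, $P(A \cap B) = P(A)\,P(B)$.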
Dependence arises when knowing the outcome of one event changes the probability of another. Drawing two cards without replacement: the second draw depends on the first because the deck has changed.
Independence is a statement about probability, not causation. Two unrelated events (coin flip, stock price) can be modelled as independent when there is no physical connection between them. Two related events might nevertheless be statistically independent.
For multiple mutually independent events: $P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2) \cdots P(A_n)$. This is the basis for computing probabilities of sequences of independent trials.
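The two-card example can be made concrete by enumerating every ordered pair of draws. This is a sketch under stated assumptions: the events "first card is an ace" and "second card is an ace" are chosen here for illustration, and `prob` and `conditional` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import permutations, product

# 52 cards represented by rank only; rank 1 stands for an ace (4 of each rank).
DECK = [rank for rank in range(1, 14) for _suit in range(4)]


def prob(outcomes, a):
    """P(a) over a list of equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if a(o)), len(outcomes))


def conditional(outcomes, a, b):
    """P(a | b) = P(a and b) / P(b); requires P(b) > 0."""
    n_b = sum(1 for o in outcomes if b(o))
    n_ab = sum(1 for o in outcomes if a(o) and b(o))
    return Fraction(n_ab, n_b)


def first_ace(draw):
    return draw[0] == 1


def second_ace(draw):
    return draw[1] == 1


# Without replacement: ordered pairs of distinct cards (52 * 51 outcomes).
without = list(permutations(DECK, 2))
# With replacement: the first card is returned before the second draw.
with_repl = list(product(DECK, repeat=2))

for label, outcomes in [("without", without), ("with", with_repl)]:
    p_b = prob(outcomes, second_ace)
    p_b_given_a = conditional(outcomes, second_ace, first_ace)
    print(label, p_b, p_b_given_a, "independent:", p_b == p_b_given_a)
```

Without replacement, conditioning on a first ace lowers the chance of a second ace from 1/13 to 3/51 = 1/17, so the events are dependent. With replacement both values are 1/13, which matches the definition of independence.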
💡Explain it simply
Conditional probability is 'narrowing the world.' $P(\text{rain} \mid \text{cloudy})$ means: among all days that are cloudy, what fraction also gets rain? You have narrowed your focus to cloudy days only.
Independence means one event tells you nothing about the other. Coin flips are independent — the fifth flip is no more likely to be heads because the last four were tails. The coin has no memory.
Bayes' theorem
Bayes' theorem reverses conditional probabilities: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$. The denominator $P(B)$ is computed via the law of total probability.
In Bayesian terminology: $P(A)$ is the prior (your belief before seeing data), $P(B \mid A)$ is the likelihood (how probable the data is if $A$ is true), and $P(A \mid B)$ is the posterior (updated belief after seeing data).
Medical testing: even a highly accurate test has a low positive predictive value when the disease is rare. If the prevalence is well below the false positive rate, only a small fraction of positive tests are true positives. This counter-intuitive result follows directly from Bayes' theorem.
Bayes' theorem is the foundation of spam filters, medical diagnosis systems, and Bayesian machine learning. Any time you want to update a probability based on new evidence, you are implicitly applying Bayes.
💡Explain it simply
Bayes' theorem asks: 'Given that I've seen this evidence, how should I update my beliefs?' A test says you have a rare disease — but most people who test positive on a rare-disease test are actually healthy (because the disease is so rare). Bayes tells you exactly how to account for that.
Medical test — positive predictive value
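- Disease prevalence: $P(D)$. Sensitivity (true positive rate): $P(+ \mid D)$. False positive rate: $P(+ \mid D^c)$.
- Total probability of a positive test: $P(+) = P(+ \mid D)\,P(D) + P(+ \mid D^c)\,P(D^c)$.
- $P(D \mid +) = \dfrac{P(+ \mid D)\,P(D)}{P(+)}$.
- Only a small fraction of positive tests are true positives, because the disease is so rare that false positives swamp true positives.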
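The positive predictive value calculation can be sketched as a small function. The numbers used below (1% prevalence, 99% sensitivity, 5% false positive rate) are illustrative assumptions chosen here, not values from the example above, and the function name is hypothetical.

```python
from fractions import Fraction


def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Return P(disease | positive test) using Bayes' theorem.

    The denominator P(positive) comes from the law of total probability:
    true positives among the sick plus false positives among the healthy.
    """
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    p_positive = true_pos + false_pos
    if p_positive == 0:
        raise ValueError("P(positive) is zero; the posterior is undefined")
    return true_pos / p_positive


# Assumed illustrative inputs: 1% prevalence, 99% sensitivity, 5% false positives.
ppv = positive_predictive_value(
    prevalence=Fraction(1, 100),
    sensitivity=Fraction(99, 100),
    false_positive_rate=Fraction(5, 100),
)
print(ppv, float(ppv))
```

With these assumed inputs the result is exactly 1/6: out of every six positive tests, about five come from healthy people. Raising the prevalence argument shows how quickly the predictive value improves when the disease is more common.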
Counting principles
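The multiplication principle: if a procedure consists of $k$ steps, step $i$ has $n_i$ choices, and the choices are independent, then the total number of outcomes is $n_1 \times n_2 \times \cdots \times n_k$.
Permutations count ordered arrangements of $k$ objects chosen from $n$: $P(n, k) = \dfrac{n!}{(n-k)!}$. The order of selection matters: $AB$ is different from $BA$.
Combinations count unordered selections: $\dbinom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$. The order does not matter: $\{A, B\}$ is the same as $\{B, A\}$.
The key question: does order matter? Arranging people in seats calls for permutations. Choosing a committee calls for combinations.
The binomial coefficient $\binom{n}{k}$ appears in the binomial theorem: $(a + b)^n = \sum_{k=0}^{n} \binom{n}{k}\, a^k b^{n-k}$, and directly in the binomial probability formula $P(X = k) = \binom{n}{k}\, p^k (1 - p)^{n-k}$.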
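Python's standard library exposes both counts directly as `math.perm` and `math.comb` (available from Python 3.8). The sketch below uses them on a few made-up sizes (8 runners, a committee of 3 from 10, 5 coin flips); those numbers and the helper `binomial_pmf` are assumptions for illustration.

```python
from fractions import Fraction
from math import comb, perm


def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent trials with success prob p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Ordered: ways to fill first, second, third place from 8 runners (8 * 7 * 6).
podium_orders = perm(8, 3)

# Unordered: ways to choose a 3-person committee from 10 people.
committees = comb(10, 3)

# Each unordered selection of 3 corresponds to 3! = 6 orderings.
assert perm(10, 3) == comb(10, 3) * 6

# Exactly 2 heads in 5 fair coin flips, and a check that the pmf sums to 1.
two_heads_in_five = binomial_pmf(2, 5, Fraction(1, 2))
total = sum(binomial_pmf(k, 5, Fraction(1, 2)) for k in range(6))
print(podium_orders, committees, two_heads_in_five, total)
```

The relation `perm(n, k) == comb(n, k) * factorial(k)` is the practical form of the "does order matter?" question: permutations count every reordering of a selection separately, combinations count it once.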
💡Explain it simply
Permutations: how many ways can $n$ runners finish first, second, and third? Order matters: silver and bronze are different. Combinations: how many ways can you pick $k$ friends from $n$ for a group photo? Order doesn't matter: the same friends form the same group regardless of who steps in front.
Law of total probability
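If events $B_1, B_2, \ldots, B_n$ partition $\Omega$ (mutually exclusive and exhaustive), then for any event $A$: $P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$.
This is the denominator in Bayes' theorem. Computing $P(A)$ by conditioning on an exhaustive set of cases makes many calculations tractable.
Example: a factory has three machines producing fractions $w_1$, $w_2$, and $w_3$ of output (with $w_1 + w_2 + w_3 = 1$), with defect rates $d_1$, $d_2$, and $d_3$. The overall defect rate is $w_1 d_1 + w_2 d_2 + w_3 d_3$.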
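The factory calculation is a weighted sum, which the following sketch makes explicit. The output shares (50%, 30%, 20%) and defect rates (1%, 2%, 3%) are assumed values picked here for illustration, and `total_probability` is a hypothetical helper name.

```python
from fractions import Fraction


def total_probability(weights, conditionals):
    """Return sum of P(A | B_i) * P(B_i) over a partition B_1..B_n.

    weights[i] is P(B_i) and conditionals[i] is P(A | B_i).
    The weights must sum to 1, otherwise the cases are not a partition.
    """
    if len(weights) != len(conditionals):
        raise ValueError("need one conditional probability per case")
    if sum(weights) != 1:
        raise ValueError("case probabilities must sum to 1")
    return sum(w * c for w, c in zip(weights, conditionals))


# Assumed illustrative values: share of output and defect rate per machine.
shares = [Fraction(50, 100), Fraction(30, 100), Fraction(20, 100)]
defect_rates = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

overall = total_probability(shares, defect_rates)

# Bayes' theorem reuses the same total as its denominator:
# P(machine 3 | defective) = P(defective | machine 3) * P(machine 3) / P(defective).
p_machine3_given_defect = defect_rates[2] * shares[2] / overall
print(overall, p_machine3_given_defect)
```

With these assumed inputs the overall defect rate is 17/1000, and a defective item came from the third machine with probability 6/17. The exact `sum(weights) != 1` check relies on `Fraction` inputs; with floats a tolerance such as `math.isclose` would be the safer comparison.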
💡Explain it simply
The law of total probability says: to find the overall probability of $A$, break the world into non-overlapping, exhaustive cases ($B_1, B_2, \ldots, B_n$), find the probability of $A$ in each case, then weight them by how likely each case is. Like computing an average grade by weighting each section's score by its proportion of the class.
Common Mistakes to Avoid
- Forgetting to subtract the overlap $P(A \cap B)$ in $P(A \cup B)$ when events are not mutually exclusive.
- Confusing mutually exclusive with independent. Two mutually exclusive events with positive probability are never independent — if one occurs, the other cannot.
- Swapping $P(A \mid B)$ and $P(B \mid A)$. These are generally different. Bayes' theorem is precisely the tool to convert between them.
- Applying $P(A \cap B) = P(A)\,P(B)$ without checking independence. For dependent events, use $P(A \cap B) = P(A)\,P(B \mid A)$.
- Using permutations when combinations are needed. Always ask: does swapping the order of selection produce a different outcome?