
Probability

Sample spaces, probability rules, conditional probability, and Bayes' theorem.

Probability quantifies uncertainty. It assigns a number between $0$ and $1$ to each possible outcome, where $0$ means impossible and $1$ means certain.

Probability theory gives us the language to reason about randomness, and it is the foundation that all of statistical inference is built upon.

We start with basic rules, then build up to conditional probability, independence, and Bayes' theorem — tools used in everything from medical testing to spam filtering.

Basic probability rules

The sample space $S$ is the set of all possible outcomes. An event $A$ is any subset of $S$. The probability of $A$ must satisfy $0 \leq P(A) \leq 1$, $P(S) = 1$, and for mutually exclusive events $P(A_1 \cup A_2 \cup \cdots) = P(A_1)+P(A_2)+\cdots$.

For equally likely outcomes: $P(A) = |A|/|S|$, the number of outcomes in $A$ divided by the total number of outcomes. This classical definition only applies when all outcomes are equally probable.
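As a small illustration of the classical definition, the sketch below enumerates the sample space for two fair six-sided dice and counts the outcomes whose faces sum to 7. The dice example and the names `sample_space` and `event` are assumptions chosen for illustration; they do not come from the module itself.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all ordered pairs from two fair six-sided dice (assumed example).
sample_space = list(product(range(1, 7), repeat=2))

# Event A: the two faces sum to 7.
event = [outcome for outcome in sample_space if sum(outcome) == 7]

# Classical definition: P(A) = |A| / |S|, valid because all outcomes are equally likely.
p_event = Fraction(len(event), len(sample_space))
print(p_event)  # 1/6
```

Using `Fraction` keeps the result exact, which makes it easy to compare against a hand calculation.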

The complement rule: $P(A^c) = 1 - P(A)$. Often it is easier to compute the probability of the complement and subtract. For example, the probability of at least one success is $1 - P(\text{zero successes})$.
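One way to see the complement rule at work is the probability of rolling at least one six in four rolls of a fair die. This scenario and the variable names are hypothetical, and the sketch assumes the rolls are independent so that the probability of no six is $(5/6)^4$. A brute-force count over all sequences is included as a cross-check.

```python
from fractions import Fraction
from itertools import product

n_rolls = 4  # hypothetical number of rolls

# Complement rule: P(at least one six) = 1 - P(no six in any roll).
p_no_six = Fraction(5, 6) ** n_rolls
p_at_least_one = 1 - p_no_six

# Cross-check by counting over all 6**n_rolls equally likely sequences.
rolls = list(product(range(1, 7), repeat=n_rolls))
p_direct = Fraction(sum(1 for r in rolls if 6 in r), len(rolls))

print(p_at_least_one, p_direct)  # 671/1296 671/1296
```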

The addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. The intersection is subtracted because it is counted twice. For mutually exclusive events $A \cap B = \emptyset$, so $P(A \cup B) = P(A) + P(B)$.
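The addition rule can be checked directly with Python sets. The sketch below assumes a single fair die, with $A$ the event "even face" and $B$ the event "face greater than 3"; both events and the helper `prob` are illustrative choices, not part of the module.

```python
from fractions import Fraction

S = set(range(1, 7))               # one fair die (assumed example)
A = {x for x in S if x % 2 == 0}   # even face: {2, 4, 6}
B = {x for x in S if x > 3}        # face greater than 3: {4, 5, 6}

def prob(event):
    """Classical probability of an event within the sample space S."""
    return Fraction(len(event), len(S))

lhs = prob(A | B)                       # P(A union B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)   # addition rule
print(lhs, rhs)  # 2/3 2/3
```

Without the subtraction, the outcomes 4 and 6 would be counted twice and the sum would come out as 1 rather than 2/3.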

The multiplication rule for independent events: $P(A \cap B) = P(A) \cdot P(B)$. The general rule, which also covers dependent events, is $P(A \cap B) = P(A) \cdot P(B|A)$; it reduces to the first form when $P(B|A) = P(B)$.
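For dependent events, a standard illustration is drawing two aces in a row from a 52-card deck without replacement. The sketch below is an assumed example: it applies $P(A \cap B) = P(A) \cdot P(B|A)$ and then confirms the answer by enumerating every ordered pair of distinct cards.

```python
from fractions import Fraction
from itertools import permutations

# A simplified deck: 4 aces ("A") and 48 other cards ("x").
deck = ["A"] * 4 + ["x"] * 48

# Multiplication rule: P(first ace) * P(second ace | first ace).
p_rule = Fraction(4, 52) * Fraction(3, 51)

# Enumerate all ordered draws of two distinct card positions.
pairs = list(permutations(range(len(deck)), 2))
both_aces = sum(1 for i, j in pairs if deck[i] == "A" and deck[j] == "A")
p_enum = Fraction(both_aces, len(pairs))

print(p_rule, p_enum)  # 1/221 1/221
```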


Conditional probability and independence

The conditional probability of $A$ given $B$ is $P(A|B) = P(A \cap B)/P(B)$, defined when $P(B) > 0$. It re-scales probability to the reduced sample space where $B$ is known to have occurred.
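The "reduced sample space" view can be made concrete with two dice. In the hypothetical example below, $B$ is "the sum is at least 10" and $A$ is "the first die shows 6". The sketch computes $P(A|B)$ both from the definition and by counting inside $B$ alone.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two fair dice (assumed example)

B = [o for o in S if o[0] + o[1] >= 10]    # sum is at least 10
A_and_B = [o for o in B if o[0] == 6]      # first die is 6, and B holds

# Definition: P(A|B) = P(A and B) / P(B).
p_b = Fraction(len(B), len(S))
p_a_and_b = Fraction(len(A_and_B), len(S))
p_a_given_b = p_a_and_b / p_b

# Reduced sample space: count outcomes of A inside B only.
p_reduced = Fraction(len(A_and_B), len(B))

print(p_a_given_b, p_reduced)  # 1/2 1/2
```

Unconditionally, the first die shows 6 with probability 1/6, so learning that the sum is at least 10 raises it to 1/2.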

Events $A$ and $B$ are independent if $P(A|B) = P(A)$: knowing $B$ gives no information about $A$. Equivalently, $P(A \cap B) = P(A)\cdot P(B)$.
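The product criterion gives a mechanical test for independence. The sketch below, using two dice as an assumed example, checks whether "first die is even" is independent of "sum is 7" and of "sum is 8". The helper names `prob` and `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two fair dice (assumed example)

def prob(pred):
    """Probability of the event described by the predicate pred."""
    return Fraction(sum(1 for o in S if pred(o)), len(S))

def independent(pred_a, pred_b):
    """Check P(A and B) == P(A) * P(B) exactly."""
    joint = prob(lambda o: pred_a(o) and pred_b(o))
    return joint == prob(pred_a) * prob(pred_b)

def first_even(o):
    return o[0] % 2 == 0

def sum_is_7(o):
    return o[0] + o[1] == 7

def sum_is_8(o):
    return o[0] + o[1] == 8

print(independent(first_even, sum_is_7))  # True
print(independent(first_even, sum_is_8))  # False
```

Both sums are physically determined by the first die, yet only one of them is statistically dependent on its parity, which shows that independence has to be checked from the probabilities rather than assumed from the setup.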

Dependence arises when knowing the outcome of one event changes the probability of another. Drawing two cards without replacement: the second draw depends on the first because the deck has changed.

Independence is a statement about probability, not causation. Two unrelated events (a coin flip and a stock price) are naturally modelled as independent because there is no physical connection between them. Conversely, two events that are physically related might nevertheless be statistically independent.

For multiple mutually independent events: $P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\cdot P(A_2)\cdots P(A_n)$. This is the basis for computing probabilities of sequences of independent trials.


Bayes' theorem

Bayes' theorem reverses conditional probabilities: $P(B|A) = \frac{P(A|B)\cdot P(B)}{P(A)}$. The denominator $P(A)$ is computed via the law of total probability.

In Bayesian terminology: $P(B)$ is the prior (your belief before seeing data), $P(A|B)$ is the likelihood (how probable the data is if $B$ is true), and $P(B|A)$ is the posterior (updated belief after seeing data).

Medical testing: even a highly accurate test has a low positive predictive value when the disease is rare. If prevalence is $1\%$ and the false positive rate is $5\%$, then, assuming the test catches essentially every true case (sensitivity close to $100\%$), only about $17\%$ of positive tests are true positives. This counter-intuitive result follows directly from Bayes' theorem.
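The medical-testing figure can be reproduced with a few lines of arithmetic. The text gives the prevalence and the false positive rate but not the sensitivity, so the sketch below assumes a sensitivity of 100%; that value is an assumption chosen because it reproduces the figure of roughly 17%. The function name `positive_predictive_value` is also hypothetical.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem.

    The denominator is P(positive), expanded with the law of total
    probability over the two cases 'disease' and 'no disease'.
    """
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Prevalence and false positive rate are from the text; sensitivity is assumed.
ppv = positive_predictive_value(
    prevalence=0.01, sensitivity=1.0, false_positive_rate=0.05
)
print(f"{ppv:.3f}")  # 0.168
```

A slightly lower assumed sensitivity, such as 95%, gives about 16%, so the qualitative conclusion does not hinge on the exact value.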

Bayes' theorem is the foundation of spam filters, medical diagnosis systems, and Bayesian machine learning. Any time you want to update a probability based on new evidence, you are implicitly applying Bayes.


Counting principles

The multiplication principle: if a procedure consists of $k$ steps, step $i$ has $n_i$ choices, and the number of choices at each step does not depend on the earlier choices, then the total number of outcomes is $n_1\times n_2\times\cdots\times n_k$.

Permutations count ordered arrangements of $k$ objects chosen from $n$: $P(n,k) = n!/(n-k)!$. The order of selection matters: $ABC$ is different from $BAC$.

Combinations count unordered selections: $\binom{n}{k} = n!/(k!(n-k)!)$. The order does not matter: $\{A,B,C\}$ is the same as $\{C,A,B\}$.

The key question: does order matter? Arranging people in seats $\to$ permutations. Choosing a committee $\to$ combinations.
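The two counting formulas can be compared side by side using Python's standard library (`math.perm` and `math.comb`, available from Python 3.8). The sketch below assumes a hypothetical group of five people labelled A to E, from which three are chosen, and cross-checks the formulas by listing the arrangements explicitly.

```python
import math
from itertools import combinations, permutations

people = "ABCDE"   # hypothetical group of n = 5 people
n, k = len(people), 3

n_perm = math.perm(n, k)   # ordered: seat 3 of the 5 people in a row
n_comb = math.comb(n, k)   # unordered: choose a committee of 3

# Cross-check the formulas by explicit enumeration.
listed_perms = list(permutations(people, k))
listed_combs = list(combinations(people, k))

print(n_perm, len(listed_perms))  # 60 60
print(n_comb, len(listed_combs))  # 10 10
```

Each committee of three can be seated in $3! = 6$ orders, which is why the permutation count is exactly six times the combination count.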

The binomial coefficient $\binom{n}{k}$ appears in the binomial theorem: $(a+b)^n = \sum_{k=0}^n \binom{n}{k} a^k b^{n-k}$, and directly in the binomial probability formula $P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$.
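A minimal sketch of the binomial probability formula follows, assuming a hypothetical experiment of $n = 10$ fair coin flips. The function name `binomial_pmf` is illustrative. Summing the probabilities over all $k$ gives 1, which is the binomial theorem with $a = p$ and $b = 1 - p$.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5   # hypothetical: ten fair coin flips
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(pmf[3])     # P(exactly 3 heads) = 120/1024
print(sum(pmf))   # total probability over k = 0..n
```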


Law of total probability

If events $B_1,B_2,\ldots,B_n$ partition $S$ (mutually exclusive and exhaustive), then for any event $A$: $P(A) = \sum_{i=1}^n P(A|B_i)P(B_i)$.

This is the denominator in Bayes' theorem. Computing $P(A)$ by conditioning on an exhaustive set of cases makes many calculations tractable.

Example: a factory has three machines producing $50\%$, $30\%$, and $20\%$ of output, with defect rates $2\%$, $3\%$, and $5\%$ respectively. The overall defect rate is $P(\text{defect}) = 0.02(0.50)+0.03(0.30)+0.05(0.20) = 0.01+0.009+0.01 = 0.029 = 2.9\%$.
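The factory calculation translates directly into code. In the sketch below, the machine labels `M1`, `M2`, and `M3` are hypothetical names for the three machines, and the numbers are the ones from the example. As a follow-on, it also applies Bayes' theorem to ask which machine a defective item most likely came from, using the overall defect rate as the denominator.

```python
# Output shares P(B_i) and defect rates P(defect | B_i); labels are hypothetical.
shares = {"M1": 0.50, "M2": 0.30, "M3": 0.20}
defect_rates = {"M1": 0.02, "M2": 0.03, "M3": 0.05}

# Law of total probability: P(defect) = sum of P(defect | B_i) * P(B_i).
p_defect = sum(defect_rates[m] * shares[m] for m in shares)

# Bayes' theorem: P(B_i | defect) = P(defect | B_i) * P(B_i) / P(defect).
posterior = {m: defect_rates[m] * shares[m] / p_defect for m in shares}

print(round(p_defect, 4))  # 0.029
for machine, prob in posterior.items():
    print(machine, round(prob, 3))
```

Under these numbers the first and third machines are equally likely sources of a defect (about 34.5% each), even though the third produces far less output, because its higher defect rate offsets its smaller share.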
