Non-Parametric Statistics

Chi-square tests, Mann-Whitney U, and Wilcoxon signed-rank tests.

Non-parametric tests do not assume that the population follows a particular parametric distribution, such as the normal. They are the tool of choice when data is ordinal, heavily skewed, or when normality cannot be assumed.

The chi-square tests handle categorical data: the goodness-of-fit test checks whether observed frequencies match a theoretical distribution; the test for independence checks whether two categorical variables are associated.

Rank-based tests (Mann-Whitney U, Wilcoxon signed-rank) are non-parametric alternatives to $t$-tests. By converting data to ranks, they become robust to outliers and far less dependent on distributional assumptions.

Chi-square goodness-of-fit test

The goodness-of-fit test checks whether observed category frequencies match a hypothesised distribution. For $k$ categories with observed counts $O_i$ and expected counts $E_i = n\cdot p_i$ (where $p_i$ are the hypothesised probabilities), the test statistic is $\chi^2 = \sum_{i=1}^k \frac{(O_i-E_i)^2}{E_i}$.

Under $H_0$, $\chi^2 \sim \chi^2_{k-1}$ (degrees of freedom $= k-1$; subtract one more for each parameter estimated from the data, e.g., if $\mu$ and $\sigma$ are estimated for a normal fit).

The test is always right-tailed: a large $\chi^2$ means the observed counts deviate substantially from expected. A small $\chi^2$ (not significant) means the data is consistent with $H_0$.

Validity condition: all expected counts $E_i \geq 5$. If some are too small, merge adjacent categories or use the exact multinomial test.

Applications: testing whether a die is fair, whether genetic ratios follow Mendelian predictions, or whether a sample comes from a specified distribution.
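
The fair-die case can be sketched in a few lines of plain Python. The counts below are hypothetical, the helper name `chi_square_gof` is made up for illustration, and the decision uses the tabulated $5\%$ critical value for $5$ degrees of freedom ($11.070$) instead of computing a p-value.

```python
def chi_square_gof(observed, probs):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in probs]  # E_i = n * p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1     # no parameters estimated from the data


# Hypothetical counts for faces 1..6 from 60 rolls of a die
observed = [8, 9, 12, 11, 6, 14]
stat, df = chi_square_gof(observed, [1 / 6] * 6)

CRITICAL_5PCT_DF5 = 11.070  # tabulated upper 5% point of chi-square with 5 df
print(f"chi2 = {stat:.2f}, df = {df}, reject H0: {stat > CRITICAL_5PCT_DF5}")
# chi2 = 4.20, df = 5, reject H0: False
```

Every expected count here is $10$, so the validity condition above is met.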

Chi-square test for independence

A two-way contingency table displays the joint frequencies of two categorical variables. The test of independence asks $H_0$: the two variables are independent (knowing one gives no information about the other).

Under independence, the expected count in cell $(i,j)$ is $E_{ij} = (\text{row}_i \text{ total}) \times (\text{col}_j \text{ total})/n$. The test statistic is $\chi^2 = \sum_{i,j} (O_{ij}-E_{ij})^2/E_{ij}$ with $(r-1)(c-1)$ degrees of freedom.
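
As a rough illustration, the expected counts and the statistic can be computed directly from a table of counts. The $2\times 2$ table and the function name `chi_square_independence` are hypothetical, and no continuity correction is applied.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical table: rows = two groups, columns = outcome yes / no
table = [[30, 20],
         [20, 30]]
stat, df = chi_square_independence(table)
print(f"chi2 = {stat:.2f}, df = {df}")
# chi2 = 4.00, df = 1
```

With $1$ degree of freedom the tabulated $5\%$ critical value is $3.841$, so this hypothetical table would be judged significant at that level.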

The chi-square test detects association but not direction or causation. For a $2\times 2$ table, the odds ratio or relative risk quantifies the strength of association beyond just significance.

Fisher's exact test: for small samples where $\chi^2$ is unreliable (expected counts $< 5$), compute the exact probability, with the row and column totals held fixed, of the observed table and all more extreme ones. It is exact regardless of sample size.
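
A minimal sketch of the idea for a $2\times 2$ table, using only the standard library. It assumes one common two-sided convention: with the margins fixed, sum the hypergeometric probabilities of every table that is no more likely than the observed one. The data and the name `fisher_exact_2x2` are hypothetical.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for [[a, b], [c, d]] with fixed margins."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)  # feasible top-left values
    p_obs = prob(a)
    # "More extreme" here means: no more probable than the observed table
    # (small relative tolerance guards against floating-point ties)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical small-sample table
p = fisher_exact_2x2([[3, 1],
                      [1, 3]])
print(f"p = {p:.4f}")
# p = 0.4857
```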

For a $2\times 2$ table, the $\chi^2$ test for independence (without continuity correction) is equivalent to the two-sided $z$-test for comparing two proportions with a pooled standard error: $z^2 = \chi^2$.
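
The identity can be checked numerically. The sketch below assumes the pooled-proportion form of the $z$ statistic and no continuity correction, and uses hypothetical counts.

```python
from math import sqrt

# Hypothetical data: successes / failures in two groups of 50
table = [[30, 20],
         [20, 30]]
(a, b), (c, d) = table
n1, n2 = a + b, c + d

# Two-proportion z statistic with pooled standard error
p1, p2 = a / n1, c / n2
pooled = (a + c) / (n1 + n2)
z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# Chi-square statistic for the same table
n = n1 + n2
col_totals = [a + c, b + d]
chi2 = sum(
    (obs - row_total * col_total / n) ** 2 / (row_total * col_total / n)
    for row, row_total in zip(table, (n1, n2))
    for obs, col_total in zip(row, col_totals)
)

print(f"z^2 = {z ** 2:.4f}, chi2 = {chi2:.4f}")
# z^2 = 4.0000, chi2 = 4.0000
```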

Rank-based tests: Mann-Whitney and Wilcoxon

The Mann-Whitney U test (Wilcoxon rank-sum test) is the non-parametric alternative to the two-sample $t$-test. It tests whether one group tends to produce larger values than the other; formally, whether $P(X_1 > X_2) = 0.5$.

Procedure: combine both groups, rank all observations, sum the ranks for each group. The test statistic is $U = R_1 - n_1(n_1+1)/2$, where $R_1$ is the rank sum for group $1$.
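
The procedure can be sketched as follows. The helper `midranks` (an illustrative name) gives tied values the average of the ranks they span, which is a common convention assumed here; the data are hypothetical.

```python
def midranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    s = sorted(values)
    # s.index(v) is the 0-based first position of v; its ties span count(v) slots
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def mann_whitney_u(x, y):
    """Return (U1, U2): U for group x and the complementary U for group y."""
    ranks = midranks(list(x) + list(y))       # rank the combined sample
    r1 = sum(ranks[:len(x)])                  # rank sum for group 1
    u1 = r1 - len(x) * (len(x) + 1) / 2
    u2 = len(x) * len(y) - u1                 # U1 + U2 = n1 * n2
    return u1, u2


# Hypothetical measurements from two independent groups
x = [1.1, 2.3, 3.5, 4.8]
y = [2.9, 5.2, 6.0, 7.4, 8.1]
u1, u2 = mann_whitney_u(x, y)
print(u1, u2)
# 2.0 18.0
```

Here $U_1 = 2$ counts the pairs in which a value from the first group exceeds a value from the second ($3.5 > 2.9$ and $4.8 > 2.9$).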

The Wilcoxon signed-rank test is the paired version. Compute differences $d_i = x_i - y_i$, rank their absolute values, and compare the sum of positive ranks to the sum of negative ranks. It is the non-parametric alternative to the paired $t$-test.
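
A small sketch of the signed-rank computation, with hypothetical paired data. It assumes two common conventions that the text above does not specify: zero differences are dropped, and tied absolute differences share an average rank.

```python
def midranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def wilcoxon_signed_rank(x, y):
    """Return (W+, W-): rank sums of the positive and negative differences."""
    d = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    ranks = midranks([abs(v) for v in d])         # rank the absolute differences
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    return w_plus, w_minus


# Hypothetical before / after readings on the same six subjects
before = [125, 130, 118, 140, 135, 128]
after = [120, 132, 110, 133, 135, 124]
w_plus, w_minus = wilcoxon_signed_rank(before, after)
print(w_plus, w_minus)
# 14.0 1.0
```

The two sums always add up to $m(m+1)/2$ for $m$ non-zero differences (here $15$), so a very lopsided split such as $14$ versus $1$ is what points away from $H_0$.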

Rank-based tests are resistant to outliers because they use only the ordering of observations, not their actual values. However extreme an outlier is, it can occupy at most the highest or lowest rank, so its influence on the test statistic is bounded.

Efficiency: when the normal distribution holds, the Mann-Whitney U test has about $95.5\%$ efficiency relative to the $t$-test (i.e., it needs about $5\%$ more observations to achieve the same power). For non-normal data, it can be more powerful than the $t$-test.
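
For reference, the $95.5\%$ figure is the large-sample (asymptotic) relative efficiency under normality, and the sample-size ratio follows from its reciprocal. The subscripts $U$ and $t$ below are labels introduced here for the two tests.

```latex
\mathrm{ARE}(U, t) = \frac{3}{\pi} \approx 0.955,
\qquad
\frac{n_U}{n_t} \approx \frac{\pi}{3} \approx 1.047
```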

Kruskal-Wallis and Spearman correlation

The Kruskal-Wallis test is the non-parametric equivalent of one-way ANOVA. It tests whether $k$ independent groups come from the same distribution; it can be read as a test of equal medians only when the group distributions have the same shape and spread. It ranks all $N$ observations and compares rank sums across groups.

Test statistic: $H = \frac{12}{N(N+1)}\sum_{j=1}^k \frac{R_j^2}{n_j} - 3(N+1)$, where $R_j$ is the rank sum for group $j$. Under $H_0$, $H \approx \chi^2_{k-1}$.
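
The statistic can be sketched directly from the formula. This version applies no tie correction (an assumption of the sketch, adequate when ties are few), and the three groups are hypothetical.

```python
def midranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [v for g in groups for v in g]
    ranks = midranks(pooled)                  # rank all N observations together
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        r_j = sum(ranks[start:start + len(g)])  # rank sum R_j for this group
        total += r_j ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical, fully separated groups
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
print(f"H = {h:.2f}, df = {len(groups) - 1}")
# H = 7.20, df = 2
```

The tabulated $5\%$ critical value of $\chi^2_2$ is $5.991$. With only three observations per group the chi-square approximation is crude, so the comparison is indicative only.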

If Kruskal-Wallis is significant, use pairwise Mann-Whitney tests with Bonferroni correction for post-hoc comparisons.

Spearman's rank correlation $r_s$: compute ranks of $x$ and $y$ separately, then apply the Pearson correlation formula to the ranks. It measures monotone (not just linear) relationships and is robust to outliers and non-normality.
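
A short sketch of this recipe, using hypothetical data with a monotone but non-linear relationship ($y = x^3$). The function names are illustrative.

```python
from math import sqrt


def midranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    s = sorted(values)
    return [s.index(v) + (s.count(v) + 1) / 2 for v in values]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def spearman(x, y):
    """Spearman's r_s: Pearson correlation applied to the ranks."""
    return pearson(midranks(x), midranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotone increasing, clearly non-linear

print(f"Pearson r = {pearson(x, y):.3f}, Spearman r_s = {spearman(x, y):.3f}")
# Pearson r = 0.943, Spearman r_s = 1.000
```

Because the ordering of $y$ matches the ordering of $x$ exactly, $r_s = 1$ even though the relationship is not linear.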

When to use non-parametric tests

Use non-parametric tests when: the data is ordinal (e.g., satisfaction ratings on a $1$–$5$ scale); the sample size is small and normality cannot be assumed; there are severe outliers that resist transformation; or the outcome is inherently rank-based.

With large samples and continuous data, parametric tests are usually robust by the CLT. Non-parametric tests are most valuable for small $n$ with non-normal data.

Power trade-off: when parametric assumptions hold, non-parametric tests have somewhat lower power (need larger $n$ to detect the same effect). When assumptions are violated, non-parametric tests are often more powerful.

Parametric vs. non-parametric summary: $t$-test $\leftrightarrow$ Mann-Whitney U; paired $t$-test $\leftrightarrow$ Wilcoxon signed-rank; one-way ANOVA $\leftrightarrow$ Kruskal-Wallis; Pearson $r$ $\leftrightarrow$ Spearman $r_s$.
