Non-parametric tests make no assumption about the shape of the population distribution. They are the tool of choice when the data is ordinal or heavily skewed, or when normality cannot be assumed.
The chi-square tests handle categorical data: the goodness-of-fit test checks whether observed frequencies match a theoretical distribution; the test for independence checks whether two categorical variables are associated.
Rank-based tests (Mann-Whitney U, Wilcoxon signed-rank) are non-parametric alternatives to $t$-tests. By converting data to ranks, they become robust to outliers and to violations of distributional assumptions.
Chi-square goodness-of-fit test
The goodness-of-fit test checks whether observed category frequencies match a hypothesised distribution. For $k$ categories with observed counts $O_i$ and expected counts $E_i = n p_i$ (where $p_i$ are the hypothesised probabilities and $n$ is the sample size), the test statistic is $\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$.
Under $H_0$, $\chi^2 \sim \chi^2_{k-1}$ approximately (degrees of freedom $df = k - 1$; subtract one more for each parameter estimated from the data, e.g., $df = k - 3$ if $\mu$ and $\sigma$ are estimated for a normal fit).
The test is always right-tailed: a large $\chi^2$ means the observed counts deviate substantially from expected. A small $\chi^2$ (not significant) means the data is consistent with $H_0$.
Validity condition: all expected counts $E_i \geq 5$. If some are too small, merge adjacent categories or use the exact multinomial test.
Applications: testing whether a die is fair, whether genetic ratios follow Mendelian predictions, or whether a sample comes from a specified distribution.
💡Explain it simply
Roll a die $n$ times. If it's fair, you expect $n/6$ of each face. The $\chi^2$ statistic sums up how far off each actual count is from $n/6$ (squared), scaled by $n/6$. A large sum means the die is probably loaded; a small sum means the counts are consistent with fairness.
Goodness-of-fit for a fair die
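- Roll a die $n$ times. Observed counts: $O_1, \dots, O_6$. $H_0$: fair die ($p_i = 1/6$ for all $i$).
- Expected: $E_i = n/6$ for each face.
- $\chi^2 = \sum_{i=1}^{6} (O_i - E_i)^2 / E_i$. $df = 6 - 1 = 5$.
- Critical value: $\chi^2_{\alpha,5}$. Since $\chi^2 < \chi^2_{\alpha,5}$, fail to reject $H_0$. The data is consistent with a fair die.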
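The goodness-of-fit computation can be sketched in a few lines of Python. This is a minimal illustration only: the counts for 60 rolls are hypothetical (they are not the data of the example above), the function name `chi_square_gof` is invented for this sketch, and the 5% significance level with its tabulated critical value 11.070 for 5 degrees of freedom is an assumed choice.

```python
def chi_square_gof(observed, probs, estimated_params=0):
    """Return (statistic, df, expected) for a chi-square goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # One degree of freedom is lost for the fixed total, plus one per estimated parameter.
    df = len(observed) - 1 - estimated_params
    return statistic, df, expected

# Hypothetical counts for 60 rolls of a die (illustrative only).
observed = [8, 12, 9, 11, 13, 7]
probs = [1 / 6] * 6

stat, df, expected = chi_square_gof(observed, probs)

# Validity condition: every expected count should be at least 5.
valid = all(e >= 5 for e in expected)

# Assumed significance level 0.05; 11.070 is the tabulated upper 5% point for df = 5.
CRITICAL_5PCT_DF5 = 11.070
reject = stat > CRITICAL_5PCT_DF5

print(f"chi2 = {stat:.3f}, df = {df}, valid = {valid}, reject H0 = {reject}")
```

With these made-up counts the statistic is well below the critical value, so the sketch reports that the fair-die hypothesis is not rejected.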
Chi-square test for independence
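A two-way contingency table displays the joint frequencies of two categorical variables. The test of independence tests $H_0$: the two variables are independent (knowing one gives no information about the other).
Under independence, the expected count in cell $(i, j)$ is $E_{ij} = \frac{R_i C_j}{n}$, where $R_i$ is the total of row $i$, $C_j$ is the total of column $j$, and $n$ is the grand total. The test statistic is $\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ with $(r-1)(c-1)$ degrees of freedom for a table with $r$ rows and $c$ columns.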
The chi-square test detects association but not direction or causation. For a $2 \times 2$ table, the odds ratio or relative risk quantifies the strength of association beyond just significance.
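As a rough sketch of the two effect-size measures just mentioned, the snippet below computes the odds ratio and the relative risk for a 2 by 2 table. The layout (rows are groups, columns are outcome yes/no), the function name, and the counts are all assumptions made for illustration.

```python
def odds_ratio_and_relative_risk(a, b, c, d):
    """2x2 table [[a, b], [c, d]]: rows are groups, columns are outcome yes / no."""
    odds_ratio = (a * d) / (b * c)                  # odds of "yes" in row 1 vs row 2
    relative_risk = (a / (a + b)) / (c / (c + d))   # proportion "yes" in row 1 vs row 2
    return odds_ratio, relative_risk

# Hypothetical counts, for illustration only.
odds_ratio, relative_risk = odds_ratio_and_relative_risk(30, 20, 15, 35)
print(f"odds ratio = {odds_ratio:.2f}, relative risk = {relative_risk:.2f}")
```

Both measures equal 1 when the two groups have the same outcome rate; the further from 1, the stronger the association.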
Fisher's exact test: for small samples where the $\chi^2$ approximation is unreliable (expected counts $< 5$), compute the exact probability of the observed table and all more extreme ones. It is exact regardless of sample size.
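A minimal sketch of Fisher's exact test for a 2 by 2 table follows. It assumes the usual setup in which both margins are held fixed, so the top-left cell follows a hypergeometric distribution, and it uses one common two-sided convention: "more extreme" means any table whose probability is no larger than that of the observed table. The function name and the small table are illustrative assumptions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum over tables at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Small illustrative table (hypothetical counts).
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided p = {p_value:.4f}")
```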
The $\chi^2$ test for independence is equivalent to the two-sided $z$-test for comparing two proportions in a $2 \times 2$ table: $\chi^2 = z^2$.
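The equivalence can be checked numerically. The sketch below assumes the pooled two-proportion z statistic and the chi-square statistic without a continuity correction; the function names and the counts (30 of 50 versus 20 of 50) are hypothetical.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """Two-proportion z statistic using the pooled proportion under H0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square statistic (no continuity correction) for the same data as a 2x2 table."""
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    n = n1 + n2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# Hypothetical data: 30 successes out of 50 versus 20 out of 50.
z = pooled_z(30, 50, 20, 50)
chi2 = chi_square_2x2(30, 50, 20, 50)
print(f"z^2 = {z ** 2:.6f}, chi2 = {chi2:.6f}")
```

The two printed values agree up to floating-point rounding, which is the identity stated above.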
💡Explain it simply
You survey men and women about their drink preference (tea or coffee). A contingency table records the counts. The $\chi^2$ test asks: does gender have anything to do with drink preference? If men and women prefer coffee at the same rate, the variables are independent and $\chi^2$ will be small.
Chi-square test for independence
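- Contingency table (Gender vs. Preference): Coffee: Men $O_{11}$, Women $O_{12}$; Tea: Men $O_{21}$, Women $O_{22}$. Total $n$.
- Row totals: Coffee $R_1 = O_{11} + O_{12}$, Tea $R_2 = O_{21} + O_{22}$. Column totals: Men $C_1 = O_{11} + O_{21}$, Women $C_2 = O_{12} + O_{22}$.
- Expected counts: $E_{11} = R_1 C_1 / n$; $E_{12} = R_1 C_2 / n$; $E_{21} = R_2 C_1 / n$; $E_{22} = R_2 C_2 / n$.
- $\chi^2 = \sum_{i,j} (O_{ij} - E_{ij})^2 / E_{ij}$.
- $df = (2-1)(2-1) = 1$. Critical value: $\chi^2_{\alpha,1}$. Since $\chi^2 > \chi^2_{\alpha,1}$, reject $H_0$. Gender and drink preference are associated.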
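The steps of this example can be sketched for a table of any size. The counts below are hypothetical (they are not the example's data), the function name is invented, and the 5% level is an assumed choice. The p-value line uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so it applies only when df is 1.

```python
import math

def chi_square_independence(table):
    """Return (statistic, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum((o - e) ** 2 / e
                    for obs_row, exp_row in zip(table, expected)
                    for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

# Hypothetical counts: rows = preference (Coffee, Tea), columns = gender (Men, Women).
table = [[40, 25], [20, 35]]
stat, df, expected = chi_square_independence(table)

# For df = 1 only: P(chi2 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2))

# Assumed significance level 0.05; 3.841 is the tabulated upper 5% point for df = 1.
reject = stat > 3.841
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.4f}, reject H0 = {reject}")
```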
Rank-based tests: Mann-Whitney and Wilcoxon
The Mann-Whitney U test (Wilcoxon rank-sum test) is the non-parametric alternative to the two-sample $t$-test. It tests whether one group tends to produce larger values than the other; formally, whether $P(X > Y) \neq 1/2$.
Procedure: combine both groups, rank all observations, sum the ranks for each group. The test statistic is $U_1 = R_1 - \frac{n_1(n_1+1)}{2}$, where $R_1$ is the rank sum for group $1$ and $n_1$ is its size.
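The procedure can be sketched as follows. This is an illustration under stated assumptions: tied values receive the average of their ranks (midranks), U is taken as the group's rank sum minus n1(n1+1)/2 (one common convention), and the helper names and data are hypothetical. It computes the statistic only, not a p-value.

```python
def midranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U1, U2) computed from the rank sum of the first group."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    r1 = sum(ranks[:n1])              # rank sum of group 1 in the pooled ranking
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1                 # U1 + U2 always equals n1 * n2
    return u1, u2

# Hypothetical scores for two groups.
x = [12, 15, 19, 22]
y = [10, 11, 14, 16, 18]
u1, u2 = mann_whitney_u(x, y)
print(f"U1 = {u1}, U2 = {u2}")
```

Here U1 counts the pairs in which the value from the first group exceeds the value from the second, which is why it estimates how often one group "wins".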
The Wilcoxon signed-rank test is the paired version. Compute differences $d_i = x_i - y_i$, rank their absolute values, and compare the sum of positive ranks to the sum of negative ranks. It is the non-parametric alternative to the paired $t$-test.
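A short sketch of the signed-rank computation is given below. It assumes two conventions that are common but not the only ones: zero differences are dropped before ranking, and tied absolute differences share their average rank. The function names and the paired data are hypothetical, and only the two rank sums are computed, not a p-value.

```python
def midranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus): rank sums of positive and negative differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # assumed: drop zero differences
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical paired measurements (after, before).
after = [13, 11, 14, 15, 17]
before = [10, 12, 9, 15, 11]
w_plus, w_minus = wilcoxon_signed_rank(after, before)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

If the paired differences were centred on zero, the two sums would be roughly equal; a lopsided split suggests a systematic shift.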
Rank-based tests are resistant to outliers because they use only the ordering of observations, not their actual values. However extreme an outlier is, it can do no more than take the top or bottom rank.
Efficiency: when the normal distribution holds, the Mann-Whitney U test has about $95\%$ efficiency relative to the $t$-test (asymptotically $3/\pi \approx 0.955$; i.e., it needs about $5\%$ more observations to achieve the same power). For non-normal data, it can be more powerful than the $t$-test.
💡Explain it simply
Instead of comparing actual test scores between two teaching methods, rank all students from $1$ (lowest score) to $N$ (highest score). Then ask: do students from Method A tend to have higher ranks? By using ranks instead of raw values, you don't need to assume anything about the distribution of scores.
Kruskal-Wallis and Spearman correlation
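The Kruskal-Wallis test is the non-parametric equivalent of one-way ANOVA. It tests whether $k$ independent groups have the same distribution (and, if the distributions share a common shape, the same median). It ranks all observations and compares rank sums across groups.
Test statistic: $H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$, where $R_j$ is the rank sum for group $j$, $n_j$ is its size, and $N = \sum_j n_j$ is the total sample size. Under $H_0$, $H \sim \chi^2_{k-1}$ approximately.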
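As a sketch, the standard Kruskal-Wallis statistic H = 12 / (N(N+1)) * sum(R_j^2 / n_j) - 3(N+1) can be computed as below. The code assumes midranks for ties and applies no tie correction, so it is exact only for untied data; the function names and the three small groups are hypothetical.

```python
def midranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return (H, df) for a list of groups; no tie correction is applied."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = midranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_j for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return h, len(groups) - 1

# Hypothetical measurements for three groups.
groups = [[27, 31, 35], [22, 29, 33], [40, 42, 45]]
h, df = kruskal_wallis_h(groups)
print(f"H = {h:.3f}, df = {df}")
```

With groups this small the chi-square approximation is rough, so the value is best read as an illustration of the arithmetic rather than a basis for a decision.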
If Kruskal-Wallis is significant, use pairwise Mann-Whitney tests with Bonferroni correction for post-hoc comparisons.
Spearman's rank correlation $r_s$: compute ranks of $x$ and $y$ separately, then apply the Pearson correlation formula to the ranks. It measures monotone (not just linear) relationships and is robust to outliers and non-normality.
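The "Pearson on ranks" recipe can be sketched directly. The helper names are invented for this illustration, ties are assumed to receive average ranks, and the data (a cubic relationship) are made up to show a monotone but non-linear pattern.

```python
import math

def midranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(midranks(x), midranks(y))

# Hypothetical data: y = x**3 is perfectly monotone but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(f"Pearson = {pearson(x, y):.3f}, Spearman = {spearman(x, y):.3f}")
```

On this data the Spearman coefficient is 1 (up to rounding) because the ordering is preserved exactly, while the Pearson coefficient falls short of 1 because the relationship is curved.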
💡Explain it simply
Kruskal-Wallis is ANOVA with ranks: instead of comparing the actual group means, it compares the average rank positions of each group. Spearman correlation is Pearson correlation with ranks: instead of asking 'do $x$ and $y$ increase together linearly?', it asks 'do the rankings of $x$ and $y$ increase together?'
When to use non-parametric tests
Use non-parametric tests when: the data is ordinal (e.g., satisfaction ratings on a numeric scale); the sample size is small and normality cannot be assumed; there are severe outliers that resist transformation; or the outcome is inherently rank-based.
With large samples and continuous data, parametric tests are usually robust by the CLT. Non-parametric tests are most valuable for small $n$ with non-normal data.
Power trade-off: when parametric assumptions hold, non-parametric tests have somewhat lower power (need larger $n$ to detect the same effect). When assumptions are violated, non-parametric tests are often more powerful.
Parametric vs. non-parametric summary: $t$-test $\leftrightarrow$ Mann-Whitney U; paired $t$-test $\leftrightarrow$ Wilcoxon signed-rank; one-way ANOVA $\leftrightarrow$ Kruskal-Wallis; Pearson $r$ $\leftrightarrow$ Spearman $r_s$.
💡Explain it simply
Non-parametric tests are your fall-back when you can't trust the bell-curve assumption. They're less picky — they work with ranks and counts rather than raw values — but they pay a small price in power when the normal assumption actually holds.
Common Mistakes to Avoid
- Using chi-square when expected cell counts are below $5$. Merge adjacent categories or use Fisher's exact test.
- Confusing the chi-square test for independence with the Pearson correlation. Chi-square tests association between categorical variables; correlation measures linear association between quantitative ones.
- Thinking the Mann-Whitney U test tests the same null hypothesis as the $t$-test. It tests stochastic dominance (which group tends to be larger), not equality of means.
- Over-using non-parametric tests. When normality holds and $n$ is large, parametric tests are more powerful. Reserve non-parametric tests for situations where parametric assumptions clearly fail.
- Applying the Kruskal-Wallis test without post-hoc comparisons after a significant result. Like ANOVA, a significant $H$ only says the groups differ; you need pairwise tests to say which ones.