Orthogonality — the generalisation of perpendicularity to any number of dimensions — is one of the most powerful structures in linear algebra. Orthogonal vectors carry independent information; orthonormal bases make coordinate computations trivial.
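The central application is the least-squares problem: when a system $A\mathbf{x} = \mathbf{b}$ has no solution (typically because there are more equations than unknowns), find the $\hat{\mathbf{x}}$ that minimises $\|\mathbf{b} - A\hat{\mathbf{x}}\|$. The minimiser is characterised by $A\hat{\mathbf{x}}$ being the projection of $\mathbf{b}$ onto the column space of $A$.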
Gram-Schmidt orthogonalisation and the QR decomposition are the computational engines. They appear in numerical analysis, statistics (regression), signal processing, and machine learning.
Inner product, length, and orthogonality
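In $\mathbb{R}^n$, the inner product (dot product) is $\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\mathbf{v} = u_1 v_1 + \cdots + u_n v_n$. The length is $\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}}$.
Two vectors are orthogonal if $\mathbf{u} \cdot \mathbf{v} = 0$. A unit vector has $\|\mathbf{v}\| = 1$. An orthonormal set is an orthogonal set in which every vector has unit length.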
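The orthogonal complement of a subspace $W$ is $W^\perp = \{\mathbf{v} \in \mathbb{R}^n : \mathbf{v} \cdot \mathbf{w} = 0 \text{ for all } \mathbf{w} \in W\}$. Key relationship: $(\operatorname{Row} A)^\perp = \operatorname{Nul} A$ and $(\operatorname{Col} A)^\perp = \operatorname{Nul} A^T$.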
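Every $\mathbf{y} \in \mathbb{R}^n$ can be uniquely split as $\mathbf{y} = \hat{\mathbf{y}} + \mathbf{z}$ where $\hat{\mathbf{y}} \in W$ and $\mathbf{z} \in W^\perp$. This is the orthogonal decomposition theorem.
The Pythagorean theorem in $\mathbb{R}^n$: if $\mathbf{u} \cdot \mathbf{v} = 0$, then $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$. Orthogonality and right angles are the same idea in any dimension.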
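These definitions are easy to check numerically. The following is a minimal sketch in plain Python; the helper names `dot` and `norm` and the sample vectors are illustrative assumptions, not part of the text above.

```python
import math

def dot(u, v):
    """Inner product u . v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Euclidean length ||v|| = sqrt(v . v)."""
    return math.sqrt(dot(v, v))

# Sample vectors chosen for illustration only.
u = [1.0, 2.0, 2.0]
v = [2.0, 1.0, -2.0]

print(dot(u, v))   # 0.0, so u and v are orthogonal
print(norm(u))     # 3.0

# Pythagorean theorem: ||u + v||^2 equals ||u||^2 + ||v||^2 when u . v = 0.
w = [a + b for a, b in zip(u, v)]
print(math.isclose(norm(w) ** 2, norm(u) ** 2 + norm(v) ** 2))  # True
```

The last comparison uses `math.isclose` rather than `==` because squaring a square root can introduce a tiny floating-point error.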
💡Explain it simply
Orthogonal vectors are like perpendicular compass directions: north and east carry completely independent information. Orthonormal vectors add the requirement that each direction has length exactly $1$. Working in an orthonormal basis is like having a perfectly square grid, where every coordinate reads off cleanly with a single dot product.
Orthogonal sets and projections
An orthogonal set of nonzero vectors is automatically linearly independent — each vector points in a direction not reachable by combining the others.
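The reasoning behind this fact fits in one line. Writing the set as $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ (notation assumed here for illustration), take any linear combination that equals zero and dot it with one of the vectors $\mathbf{u}_i$; orthogonality kills every term except the $i$-th:

```latex
c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p = \mathbf{0}
\;\Longrightarrow\;
0 = (c_1\mathbf{u}_1 + \cdots + c_p\mathbf{u}_p)\cdot\mathbf{u}_i = c_i\,\|\mathbf{u}_i\|^2
\;\Longrightarrow\;
c_i = 0 \quad (\text{since } \mathbf{u}_i \neq \mathbf{0}).
```

Since this holds for every $i$, only the trivial combination gives zero, which is the definition of linear independence.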
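The orthogonal projection of $\mathbf{y}$ onto a subspace $W$, written $\hat{\mathbf{y}} = \operatorname{proj}_W \mathbf{y}$, is the unique closest point in $W$ to $\mathbf{y}$. The error is perpendicular to $W$: $\mathbf{y} - \hat{\mathbf{y}} \in W^\perp$.
If $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ is an orthonormal basis for $W$, the projection formula simplifies to $\hat{\mathbf{y}} = (\mathbf{y} \cdot \mathbf{u}_1)\mathbf{u}_1 + \cdots + (\mathbf{y} \cdot \mathbf{u}_p)\mathbf{u}_p$. Each coordinate is just a dot product, with no system of equations to solve.
The projection matrix onto $W$ (when $A$ has columns forming a basis for $W$) is $P = A(A^TA)^{-1}A^T$. Note: $P^2 = P$ (idempotent) and $P^T = P$ (symmetric).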
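Both properties can be verified directly. Taking the standard formula $P = A(A^TA)^{-1}A^T$, where the columns of $A$ form a basis for the subspace so that $A^TA$ is invertible, a short derivation runs as follows:

```latex
P^2 = A(A^TA)^{-1}\underbrace{A^TA\,(A^TA)^{-1}}_{I}A^T = A(A^TA)^{-1}A^T = P,
\qquad
P^T = A\bigl((A^TA)^{-1}\bigr)^T A^T = A\bigl((A^TA)^T\bigr)^{-1}A^T = A(A^TA)^{-1}A^T = P.
```

Idempotence matches the geometry: projecting a vector that already lies in the subspace leaves it unchanged.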
💡Explain it simply
Projecting onto $W$ is like finding your shadow on a surface when the sun is directly overhead. The shadow is the point on the surface closest to you. The vector from shadow to you (the error) is perpendicular to the surface.
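Projection onto a plane in $\mathbb{R}^3$
- Project $\mathbf{y} = (y_1, y_2, y_3)$ onto $W = \operatorname{Span}\{\mathbf{u}_1, \mathbf{u}_2\}$ with $\mathbf{u}_1 = (1, 0, 0)$, $\mathbf{u}_2 = (0, 1, 0)$ (the $xy$-plane).
- $\hat{\mathbf{y}} = (\mathbf{y} \cdot \mathbf{u}_1)\mathbf{u}_1 + (\mathbf{y} \cdot \mathbf{u}_2)\mathbf{u}_2 = (y_1, y_2, 0)$.
- Error: $\mathbf{y} - \hat{\mathbf{y}} = (0, 0, y_3)$. Check orthogonality to $W$: $(0, 0, y_3) \cdot \mathbf{u}_1 = 0$ ✓, $(0, 0, y_3) \cdot \mathbf{u}_2 = 0$ ✓.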
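The orthonormal-basis projection formula can be sketched in a few lines of plain Python. The function name `project`, the vector `y`, and the choice of the standard basis of the $xy$-plane are hypothetical values picked for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project(y, orthonormal_basis):
    """Orthogonal projection of y onto the span of an orthonormal basis."""
    proj = [0.0] * len(y)
    for u in orthonormal_basis:
        coeff = dot(y, u)  # coordinate of y along u: a single dot product
        proj = [p + coeff * ui for p, ui in zip(proj, u)]
    return proj

# Hypothetical data: y and the standard basis of the xy-plane in R^3.
y = [3.0, -1.0, 4.0]
e1 = [1.0, 0.0, 0.0]
e2 = [0.0, 1.0, 0.0]

y_hat = project(y, [e1, e2])
error = [a - b for a, b in zip(y, y_hat)]

print(y_hat)                           # [3.0, -1.0, 0.0]
print(error)                           # [0.0, 0.0, 4.0]
print(dot(error, e1), dot(error, e2))  # 0.0 0.0
```

Note that `project` is only correct when the basis passed in is orthonormal; for a merely orthogonal basis each coefficient would also need dividing by `dot(u, u)`.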
The Gram-Schmidt process
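Gram-Schmidt converts any linearly independent set $\{\mathbf{x}_1, \ldots, \mathbf{x}_p\}$ into an orthonormal basis $\{\mathbf{u}_1, \ldots, \mathbf{u}_p\}$ for the same subspace.
Procedure: for each new vector $\mathbf{x}_k$, subtract its projections onto all previously constructed $\mathbf{u}_1, \ldots, \mathbf{u}_{k-1}$, then normalise. Formally: $\mathbf{v}_k = \mathbf{x}_k - \sum_{j=1}^{k-1} (\mathbf{x}_k \cdot \mathbf{u}_j)\,\mathbf{u}_j$, then $\mathbf{u}_k = \mathbf{v}_k / \|\mathbf{v}_k\|$.
The QR decomposition: if the columns of $A = [\mathbf{x}_1 \; \cdots \; \mathbf{x}_n]$ are linearly independent, Gram-Schmidt produces $A = QR$ where $Q$ has orthonormal columns and $R$ is upper triangular. QR is the backbone of modern numerical eigenvalue algorithms.
Why subtract projections? Each new vector $\mathbf{v}_k$ must be orthogonal to all previous $\mathbf{u}_j$. The projection $(\mathbf{x}_k \cdot \mathbf{u}_j)\,\mathbf{u}_j$ is the component of $\mathbf{x}_k$ that lies in the direction of $\mathbf{u}_j$. Subtracting it removes all overlap.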
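The procedure translates almost line for line into code. The following is a minimal sketch of classical Gram-Schmidt in plain Python; the function name `gram_schmidt`, the tolerance `tol`, and the input vectors are assumptions made for illustration.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis for the span of linearly independent vectors."""
    basis = []
    for x in vectors:
        v = list(x)
        # Subtract the projection of x onto every previously built unit vector.
        for u in basis:
            coeff = dot(x, u)
            v = [vi - coeff * ui for vi, ui in zip(v, u)]
        length = math.sqrt(dot(v, v))
        if length < tol:
            # Nothing is left after removing the overlap: x was dependent.
            raise ValueError("input vectors are linearly dependent")
        basis.append([vi / length for vi in v])
    return basis

# Hypothetical input: two independent vectors in R^3.
u1, u2 = gram_schmidt([[3.0, 0.0, 4.0], [1.0, 1.0, 0.0]])
print(math.isclose(dot(u1, u2), 0.0, abs_tol=1e-12))                   # True
print(math.isclose(dot(u1, u1), 1.0), math.isclose(dot(u2, u2), 1.0))  # True True
```

This sketch computes each coefficient from the original vector `x`, which is the classical variant. In floating-point arithmetic the modified variant, which uses the partially reduced `v` in `dot(v, u)` instead, tends to lose orthogonality more slowly.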
💡Explain it simply
Gram-Schmidt builds a clean coordinate system one axis at a time. The first axis is just the first vector, normalised. The second axis is the second vector with its shadow onto the first axis subtracted — that makes it perpendicular. The third strips out shadows on both previous axes. Each step removes overlap until you have perfectly perpendicular, unit-length axes.
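Gram-Schmidt in $\mathbb{R}^n$
- Orthogonalise $\mathbf{x}_1$, $\mathbf{x}_2$.
- Step 1: $\mathbf{v}_1 = \mathbf{x}_1$. Normalise: compute $\|\mathbf{v}_1\|$, then $\mathbf{u}_1 = \mathbf{v}_1 / \|\mathbf{v}_1\|$.
- Step 2: compute the coefficient $\mathbf{x}_2 \cdot \mathbf{u}_1$.
- $\mathbf{v}_2 = \mathbf{x}_2 - (\mathbf{x}_2 \cdot \mathbf{u}_1)\,\mathbf{u}_1$.
- Normalise: compute $\|\mathbf{v}_2\|$, then $\mathbf{u}_2 = \mathbf{v}_2 / \|\mathbf{v}_2\|$.
- Check: $\mathbf{u}_1 \cdot \mathbf{u}_2 = 0$ ✓.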
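To see the two steps produce a QR factorisation, here is a small numerical sketch in plain Python. The columns `x1` and `x2` are hypothetical values chosen so the arithmetic is easy to follow by hand; they are not the vectors from the example above.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical columns of a 2x2 matrix A.
x1 = [3.0, 4.0]
x2 = [1.0, 0.0]

# Step 1: normalise x1.
r11 = math.sqrt(dot(x1, x1))
u1 = [a / r11 for a in x1]

# Step 2: remove the component of x2 along u1, then normalise.
r12 = dot(x2, u1)
v2 = [a - r12 * b for a, b in zip(x2, u1)]
r22 = math.sqrt(dot(v2, v2))
u2 = [a / r22 for a in v2]

# Q has columns u1, u2; R collects the coefficients and is upper triangular.
R = [[r11, r12],
     [0.0, r22]]

# Rebuild the columns of A from Q and R to confirm A = QR.
col1 = [R[0][0] * a for a in u1]
col2 = [R[0][1] * a + R[1][1] * b for a, b in zip(u1, u2)]

print(math.isclose(dot(u1, u2), 0.0, abs_tol=1e-12))  # True
print(all(math.isclose(a, b, abs_tol=1e-12)
          for a, b in zip(col1 + col2, x1 + x2)))     # True
```

Here $\mathbf{u}_1 \approx (0.6, 0.8)$ and $\mathbf{u}_2 \approx (0.8, -0.6)$, and the entries of `R` are approximately $5$, $0.6$, and $0.8$, up to floating-point rounding.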
Least-squares problems
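When $A\mathbf{x} = \mathbf{b}$ is inconsistent (no exact solution), the least-squares solution $\hat{\mathbf{x}}$ minimises the residual $\|\mathbf{b} - A\hat{\mathbf{x}}\|$. It is not an exact solution; it is the best approximation.
Geometric insight: the closest point in $\operatorname{Col} A$ to $\mathbf{b}$ is $\hat{\mathbf{b}} = A\hat{\mathbf{x}}$, the projection of $\mathbf{b}$ onto $\operatorname{Col} A$. The residual $\mathbf{b} - A\hat{\mathbf{x}}$ must be orthogonal to $\operatorname{Col} A$, giving $A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$.
Normal equations: $A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$. If $A$ has linearly independent columns ($\operatorname{rank} A = n$), then $A^TA$ is invertible and $\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b}$.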
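Linear regression: fitting $y = \beta_0 + \beta_1 x$ to data points $(x_1, y_1), \ldots, (x_m, y_m)$ is a least-squares problem with $A = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix}$ and $\mathbf{b} = (y_1, \ldots, y_m)^T$. The least-squares solution $\hat{\mathbf{x}} = (\beta_0, \beta_1)^T$ gives the best-fit line.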
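For a straight-line fit the normal equations are only a $2 \times 2$ system, so they can be solved by hand. The sketch below does this in plain Python with Cramer's rule; the function name `fit_line`, the coefficient names `b0` and `b1`, and the data points are hypothetical choices made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1*x via the 2x2 normal equations."""
    m = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # A^T A = [[m, sx], [sx, sxx]] and A^T b = [sy, sxy]; solve by Cramer's rule.
    det = m * sxx - sx * sx
    if det == 0:
        raise ValueError("columns of A are dependent (all x values equal)")
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (m * sxy - sx * sy) / det
    return b0, b1

# Hypothetical data points.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 4.0, 4.0]
b0, b1 = fit_line(xs, ys)
print(b0, b1)  # 1.5 1.0

# The residual b - A x_hat is nonzero but orthogonal to both columns of A.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(residuals)                                                  # [-0.5, 0.5, 0.5, -0.5]
print(sum(residuals), sum(x * r for x, r in zip(xs, residuals)))  # 0.0 0.0
```

The two zeros in the last line are the normal equations in action: the residual is orthogonal to the column of ones and to the column of $x$ values, even though no line passes through all four points.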
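The pseudo-inverse: $A^+ = (A^TA)^{-1}A^T$ (when columns are independent) satisfies $A^+A = I$ and $AA^+ = P$, the projection matrix onto $\operatorname{Col} A$, so that $\hat{\mathbf{x}} = A^+\mathbf{b}$. The pseudo-inverse generalises the matrix inverse to non-square systems.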
💡Explain it simply
Least squares asks: if I can't hit the target exactly, where should I aim to get as close as possible? The answer is the point in the column space of $A$ closest to $\mathbf{b}$. The normal equations are the algebraic condition that says 'the error vector points straight away from the column space', i.e., it is perpendicular to it.
Common Mistakes to Avoid
- Confusing orthogonal with orthonormal. Orthogonal means perpendicular ($\mathbf{u} \cdot \mathbf{v} = 0$). Orthonormal adds the unit-length requirement. The simplified projection formula only works for orthonormal bases.
- Forgetting to subtract all previous projections in Gram-Schmidt. Each new vector must be made orthogonal to every previously constructed vector, not just the previous one.
- Writing the normal equations as $AA^T\hat{\mathbf{x}} = A\mathbf{b}$ or similar. The correct form is $A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$.
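- Trying to apply $\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b}$ when $A$ does not have full column rank. $A^TA$ is invertible iff the columns of $A$ are linearly independent.
- Thinking the least-squares solution satisfies $A\hat{\mathbf{x}} = \mathbf{b}$. When the system is inconsistent it does not; it only satisfies the normal equations $A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$.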