← Back to modules

Symmetric Matrices and Quadratic Forms

Spectral theorem, quadratic forms, SVD, and PCA.

Symmetric matrices are the most well-behaved matrices in linear algebra. The Spectral Theorem guarantees that every real symmetric matrix is orthogonally diagonalisable — its eigenvectors always form an orthonormal basis, and all its eigenvalues are real.

Quadratic forms — expressions like Q(x)=xTAxQ(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} — arise in optimisation, statistics (covariance), and physics (kinetic energy). Classifying them as positive definite, indefinite, etc. determines the shape of level sets and the nature of critical points.

The Singular Value Decomposition (SVD) extends diagonalisation to any matrix, not just square or symmetric ones. It underlies the most important algorithms in modern data science, including PCA, compressed sensing, and recommender systems.

The Spectral Theorem

Spectral Theorem: every real symmetric matrix AA (A=ATA=A^T) is orthogonally diagonalisable. That is, A=QΛQTA=Q\Lambda Q^T where QQ is orthogonal (Q1=QTQ^{-1}=Q^T) and Λ=diag(λ1,,λn)\Lambda=\text{diag}(\lambda_1,\ldots,\lambda_n). The columns of QQ are orthonormal eigenvectors of AA.

Three crucial consequences: (1) all eigenvalues of a real symmetric matrix are real; (2) eigenvectors corresponding to distinct eigenvalues are automatically orthogonal; (3) even with repeated eigenvalues, a full orthonormal eigenvector basis always exists.

These facts hold for every real symmetric matrix without exception — they do not hold for general matrices, which may have complex eigenvalues, non-orthogonal eigenvectors, or insufficient independent eigenvectors.

Note that A=QΛQTA=Q\Lambda Q^T means Aqk=λkqkA\mathbf{q}_k=\lambda_k\mathbf{q}_k for each column qk\mathbf{q}_k of QQ. The decomposition is also written as A=k=1nλkqkqkTA=\sum_{k=1}^n \lambda_k \mathbf{q}_k\mathbf{q}_k^T — a sum of rank-one symmetric matrices scaled by eigenvalues.

Helpful?

Quadratic forms

A quadratic form is Q(x)=xTAxQ(\mathbf{x})=\mathbf{x}^TA\mathbf{x} where AA is n×nn\times n symmetric. In two variables: Q(x1,x2)=ax12+2bx1x2+cx22Q(x_1,x_2)=ax_1^2+2bx_1x_2+cx_2^2 corresponding to A=(abbc)A=\begin{pmatrix}a&b\\b&c\end{pmatrix}.

Classification by eigenvalues: (1) positive definite: all λk>0\lambda_k>0, so Q(x)>0Q(\mathbf{x})>0 for all x0\mathbf{x}\neq\mathbf{0}; (2) positive semidefinite: all λk0\lambda_k\geq 0; (3) negative definite: all λk<0\lambda_k<0; (4) negative semidefinite: all λk0\lambda_k\leq 0; (5) indefinite: mixed signs.

The principal axes theorem: by changing to eigenvector coordinates (x=Qy\mathbf{x}=Q\mathbf{y}), the quadratic form simplifies to Q=λ1y12++λnyn2Q=\lambda_1 y_1^2+\cdots+\lambda_n y_n^2 — a pure sum of squares with no cross terms. The eigenvalues determine the shape; the eigenvectors give the principal axes.

In optimisation: at a critical point x0\mathbf{x}_0 of a smooth function ff, the Hessian HH (which is symmetric) determines the nature of the critical point. Positive definite HH means local minimum; negative definite means local maximum; indefinite means saddle point.

Positive definite test without computing eigenvalues: use Sylvester's criterion — AA is positive definite iff all leading principal minors (determinants of upper-left k×kk\times k submatrices) are positive.

Helpful?

Singular Value Decomposition

Every m×nm\times n matrix AA (any shape, any rank) has an SVD: A=UΣVTA=U\Sigma V^T, where UU is m×mm\times m orthogonal, VV is n×nn\times n orthogonal, and Σ\Sigma is m×nm\times n with nonneg diagonal entries σ1σ2σr>0\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r>0 (the singular values), where r=rank(A)r=\text{rank}(A).

Computing the SVD: the singular values are σi=λi(ATA)\sigma_i=\sqrt{\lambda_i(A^TA)}; the columns of VV are orthonormal eigenvectors of ATAA^TA; the columns of UU are orthonormal eigenvectors of AATAA^T.

The four fundamental subspaces from the SVD: Col(A)=span{u1,,ur}\text{Col}(A)=\text{span}\{\mathbf{u}_1,\ldots,\mathbf{u}_r\}, Null(A)=span{vr+1,,vn}\text{Null}(A)=\text{span}\{\mathbf{v}_{r+1},\ldots,\mathbf{v}_n\}, etc.

Best rank-kk approximation: truncate to Ak=i=1kσiuiviTA_k=\sum_{i=1}^k\sigma_i\mathbf{u}_i\mathbf{v}_i^T. By the Eckart-Young theorem, this is the closest rank-kk matrix to AA in any unitarily invariant norm. Used for image compression, latent semantic analysis, and noise reduction.

The condition number κ(A)=σ1/σr\kappa(A)=\sigma_1/\sigma_r measures numerical stability. A large condition number means small perturbations in b\mathbf{b} can cause large changes in the solution Ax=bA\mathbf{x}=\mathbf{b}.

Helpful?

Principal Component Analysis

PCA finds the directions of maximum variance in a dataset. Given a centered data matrix XX (mm observations, nn features, mean subtracted), the sample covariance matrix is C=1m1XTXC=\frac{1}{m-1}X^TX, which is symmetric and positive semidefinite.

The eigenvectors of CC (the principal components) are the directions of maximum variance. Eigenvalue λk\lambda_k equals the variance explained by the kk-th component. Projecting the data onto the top kk eigenvectors gives the best kk-dimensional linear approximation.

Connection to SVD: if X=UΣVTX=U\Sigma V^T, the principal components are the columns of VV, and the explained variances are σi2/(m1)\sigma_i^2/(m-1). The SVD gives a numerically stable way to perform PCA.

Dimensionality reduction: retain only the top kk principal components (those with the largest eigenvalues). Choose kk to capture a target percentage (e.g., 95%95\%) of total variance: i=1kλi/i=1nλi0.95\sum_{i=1}^k\lambda_i / \sum_{i=1}^n\lambda_i \geq 0.95.

Helpful?