Eigendecomposition explained – Data Science Math


The general concept is a bit similar to the idea of decomposing an integer into its prime factors, where we learn some interesting properties that are inherent to the number or in this case matrix.

Instead of into prime factors, we decompose the matrix into its eigenvectors and eigenvalues. Let me give you the “official” definition of these first:

Eigenvalue and Eigenvector

Definition: An eigenvector of a square matrix $A$ of size $n \times n$ is a nonzero vector $v$ of length $n$ such that $Av = \lambda v $ for some scalar $\lambda $. This scalar $\lambda $ is then called the eigenvalue of $A$ corresponding to the eigenvector $v$.

A common way of presenting an eigenvector is in its unit form (which means the norm is $||v||_2 = 1 $). This is because if $v$ is an eigenvector of $A$, then so is $sv$ for $ s \in \mathbb{R}\setminus \{0\} $, as you can easily verify by inserting it into the above formula.
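This scaling property is easy to check numerically. A minimal sketch in NumPy, using a small symmetric matrix chosen for illustration:

```python
import numpy as np

# A small example matrix with a known eigenpair (chosen for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# v = (1, 1) is an eigenvector of A with eigenvalue 3: A @ v = 3 * v.
v = np.array([1.0, 1.0])
assert np.allclose(A @ v, 3 * v)

# Any nonzero scalar multiple s*v is an eigenvector for the same eigenvalue.
s = -2.5
assert np.allclose(A @ (s * v), 3 * (s * v))

# The unit-form representative has Euclidean norm 1.
v_unit = v / np.linalg.norm(v)
print(np.linalg.norm(v_unit))  # 1.0
```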

The decomposition

Now suppose a matrix $ A $ has $n$ linearly independent eigenvectors $v_1, v_2, …, v_n$ with corresponding eigenvalues $\lambda_1, …, \lambda_n$. Form a matrix $V$ consisting of columns $v_1, …, v_n$, then the eigendecomposition of $A$ is given by:

A = V \Lambda V^{-1} = V \,\mathrm{diag}(\lambda_1, \dots, \lambda_n)\, V^{-1} \\
\mathrm{diag}(\lambda_1, \dots, \lambda_n) = \begin{bmatrix}
  \lambda_{1} & 0 & \cdots & 0 \\
  0 & \lambda_{2} & \cdots & 0 \\
  \vdots & \vdots & \ddots & \vdots \\
  0 & 0 & \cdots & \lambda_{n}
\end{bmatrix}
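NumPy can compute this decomposition directly. A sketch with `np.linalg.eig`, which returns the eigenvalues and the matrix $V$ whose columns are the corresponding eigenvectors:

```python
import numpy as np

# A diagonalizable (here: non-symmetric) matrix with eigenvalues 5 and 2.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# eig returns the eigenvalues and a matrix V whose columns are eigenvectors.
eigvals, V = np.linalg.eig(A)

# Rebuild A = V diag(lambda) V^{-1}.
Lambda = np.diag(eigvals)
A_rebuilt = V @ Lambda @ np.linalg.inv(V)
assert np.allclose(A, A_rebuilt)
```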

Existence & Uniqueness

Not every matrix has $n$ linearly independent eigenvectors, which means not every matrix can be decomposed in this way. This is where the more general Singular Value Decomposition comes in, see Wikipedia for an overview on that (I plan to cover that topic soon).

However, every real symmetric matrix can be decomposed into such an expression, and on top of that, its eigenvalues and eigenvectors are guaranteed to be real-valued as well. Even then, the decomposition is not unique unless every eigenvalue belongs to exactly one eigenvector (up to scaling). If two or more linearly independent eigenvectors share the same eigenvalue, then every vector in their span (all linear combinations of these vectors) is an eigenvector too, so we have free choice of basis within that span. In this real symmetric case, we write

A = Q \Lambda Q^T

And $Q$ is orthogonal, which means $Q^T = Q^{-1}$.

Furthermore, it’s customary to sort the eigenvalues in descending order in the diagonal matrix $\Lambda$.
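For symmetric matrices, NumPy offers the specialized routine `np.linalg.eigh`. A sketch that also flips the eigenvalues into the customary descending order (eigh returns them ascending):

```python
import numpy as np

# A real symmetric matrix with eigenvalues 4 and 2.
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# eigh is specialized for symmetric (Hermitian) matrices: it returns real
# eigenvalues in ascending order and orthonormal eigenvectors as columns of Q.
eigvals, Q = np.linalg.eigh(A)

# Flip to the customary descending order.
eigvals, Q = eigvals[::-1], Q[:, ::-1]

# Q is orthogonal: Q^T equals Q^{-1}.
assert np.allclose(Q.T @ Q, np.eye(2))

# A = Q diag(lambda) Q^T.
assert np.allclose(A, Q @ np.diag(eigvals) @ Q.T)
```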

Applications & Usefulness

The eigenvalues themselves can give us different interesting insights, however the specific way of writing $A$ in this decomposition can also be helpful.

One example I found is that it is computationally useful for computing powers of $A$. Many machine learning algorithms consist of applying the same linear transformation (which, in finite dimensions, is represented by a matrix) again and again in each iteration.

Now imagine we need to calculate $A^8$:

\begin{align}
A^2 &= A\,A \\
A^4 &= A^2 A^2 \\
A^8 &= A^4 A^4
\end{align}

This means we need to compute only one new matrix product per line. In general, for $A^n$ with $n$ a power of two, this takes $\log_2(n)$ multiplications (and $O(\log n)$ for arbitrary $n$, by squaring and multiplying according to the binary representation of $n$).
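The repeated-squaring scheme is short to sketch in NumPy; the matrix here is just an example:

```python
import numpy as np

# Exponentiation by repeated squaring: A^8 via three matrix products.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

A2 = A @ A    # 1st product
A4 = A2 @ A2  # 2nd product
A8 = A4 @ A4  # 3rd product

# Matches NumPy's built-in matrix power, which uses the same idea.
assert np.allclose(A8, np.linalg.matrix_power(A, 8))
print(A8)  # [[1. 8.], [0. 1.]]
```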

With the decomposition however, this becomes:

\begin{align}
A^8 &= Q\Lambda Q^T Q\Lambda Q^T Q\Lambda Q^T Q\Lambda Q^T Q\Lambda Q^T Q\Lambda Q^T Q\Lambda Q^T Q\Lambda Q^T \\
    &= Q \Lambda^8 Q^T
\end{align}

This works because all the inner $Q^T Q$ factors cancel to the identity, since $Q^T = Q^{-1}$ for an orthogonal matrix $Q$, as mentioned above. Now you might think “Okay, but we still need to compute $\Lambda^8$, so what did we gain?”

Well, computing powers of a diagonal matrix like $\Lambda$ is way easier and faster than for general matrices: we just raise every entry on the diagonal to the given power, and that’s it. No matrix multiplications needed. So this way of representing $A^n$ requires only 2 matrix multiplications, no matter how large the exponent gets. Of course, we do need to determine the eigenvalues and eigenvectors first, but this is a one-time fixed cost that does not grow with the exponent, so for large exponents (or for reusing the same $A$ across many powers) it pays off.
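Putting it together, a sketch of computing $A^8$ through the decomposition of a symmetric example matrix, checked against NumPy’s direct matrix power:

```python
import numpy as np

# A real symmetric matrix, so A = Q diag(lambda) Q^T with orthogonal Q.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, Q = np.linalg.eigh(A)

# A^8 = Q diag(lambda^8) Q^T: the diagonal entries are raised to the power
# elementwise, leaving only two matrix products regardless of the exponent.
A8 = Q @ np.diag(eigvals ** 8) @ Q.T

assert np.allclose(A8, np.linalg.matrix_power(A, 8))
```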

Further concepts

As I mentioned above, a more general version of this concept of decomposition is the Singular Value Decomposition.

And to calculate the eigenvectors and -values, you should take a look at the determinant of a matrix and the characteristic polynomial.
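For small matrices, the characteristic-polynomial route can be illustrated in NumPy: `np.poly` returns the coefficients of $\det(\lambda I - A)$, and its roots are the eigenvalues. A sketch with an example matrix:

```python
import numpy as np

# Example matrix with characteristic polynomial lambda^2 - 4*lambda + 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.poly(A) gives the coefficients of det(lambda*I - A);
# its roots are exactly the eigenvalues of A.
coeffs = np.poly(A)
eigvals = np.roots(coeffs)

assert np.allclose(sorted(eigvals), [1.0, 3.0])
```

This is mainly pedagogical: for anything beyond tiny matrices, numerical routines like `np.linalg.eig` are far more stable than root-finding on the characteristic polynomial.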
