On the many faces of matrix multiplication
It’s no exaggeration to say that our world runs on linear algebra. In fact, much of machine learning, at least the parts used most often, is essentially just a series of matrix multiplications.
If you hold these assertions to be true, then understanding matrix multiplication is essential for understanding ML, data analysis, and the world (okay, that last one might be a bit of a stretch). And yet, schools teach matrix multiplication in the worst way imaginable. It is introduced without any context or reference to its usefulness. Furthermore, the algorithm taught to perform matrix multiplication seems arbitrary and unmotivated. I remember feeling absolutely bewildered by it and, being a lazy bastard, felt no inclination to remember it. I was pleasantly surprised when I realised that matrix multiplication is, in fact, very intuitive, which is great because it's ubiquitous.
First, we have to think of matrices in terms of rows and columns. This may sound obvious to many, but it wasn't to me for a very long time. In the world of intuitive matrix multiplication, the numbers inside the parent matrices—let's call them A and B—have no meaning except as part of a row or a column.
Second, the matrix product A×B is only defined if A has dimensions m×n and B has dimensions n×l. In other words, the number of columns in A must equal the number of rows in B. In this case, the product matrix will have dimensions m×l. m and l can take any value; they determine the dimensions of the product matrix but have no bearing on whether the multiplication is defined.
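As a quick sanity check, here is a minimal NumPy sketch (the shapes are purely illustrative) showing that the inner dimensions must match and that the product comes out m×l:

```python
import numpy as np

# A is m×n, B is n×l: the inner dimensions must match.
m, n, l = 2, 3, 4
A = np.arange(m * n).reshape(m, n)   # 2×3
B = np.arange(n * l).reshape(n, l)   # 3×4

C = A @ B
print(C.shape)  # (2, 4): the product is m×l
```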
With that in mind, there are 4 ways to think of matrix multiplication:
- Row with column: Rows of A with columns of B
- Column with row: Columns of A with rows of B
- Column with column: Columns of A with columns of B
- Row with row: Rows of A with rows of B
Of course, all these interpretations of matrix multiplication are mathematically equivalent. The value in knowing them is that you can choose the perspective that best fits a situation, making that particular mathematical operation clearer and more intuitive.
All the videos in this post were created using the Manim library, and the code can be found here.
Row-Column multiplication
In this view, the product matrix is filled element by element. The element at position (i, j) is the dot product (or inner product) of the i-th row of matrix A and the j-th column of matrix B. The inner product is a foundational concept in linear algebra: it is large and positive when the two vectors point in roughly the same direction, near zero when they are orthogonal, and negative when they point in opposing directions. For mean-centred vectors, the inner product is proportional to their covariance. This is why the row–column perspective of matrix multiplication connects naturally to computing a covariance matrix, which (after centring) is obtained by multiplying a matrix by its transpose: A × Aᵀ.
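To make this concrete, here is a small NumPy sketch (matrix sizes are arbitrary) that fills the product element by element with dot products, and then uses the same idea to build a covariance matrix from mean-centred rows:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))

# Element (i, j) of the product is the dot product of
# row i of A with column j of B.
C = np.empty((A.shape[0], B.shape[1]))
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        C[i, j] = A[i, :] @ B[:, j]
assert np.allclose(C, A @ B)

# The same idea gives a covariance matrix: centre each row (variable),
# then A_c @ A_c.T collects all the pairwise inner products.
A_c = A - A.mean(axis=1, keepdims=True)
cov = A_c @ A_c.T / (A.shape[1] - 1)
assert np.allclose(cov, np.cov(A))
```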
Column-Row multiplication
In this view, the product matrix is seen as the sum of several component matrices. Each component matrix is formed by taking the outer product of the i-th column of matrix A with the i-th row of matrix B; only columns and rows with the same index are paired in this way. Since every outer product is a rank-1 matrix, the product is a sum of rank-1 matrices.
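A minimal NumPy sketch of this view (again with illustrative shapes): the product is assembled as a sum of rank-1 outer products, one per shared index:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))

# The product is the sum of outer products of A's i-th column
# with B's i-th row: one rank-1 matrix per shared index i.
C = sum(np.outer(A[:, i], B[i, :]) for i in range(A.shape[1]))

assert np.allclose(C, A @ B)
```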
This column–row perspective makes the idea behind Principal Component Analysis (PCA) much easier to grasp. In PCA, the goal is to reconstruct the original data as the product of two matrices, A and B, where the number of columns in A is far smaller than the original dimensionality of the data. So, if you decide to keep 2 components, you are effectively approximating your data as the sum of two rank-1 matrices.
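As a hedged illustration (the variable names and sizes here are my own), this is roughly how a rank-2 reconstruction might look using the SVD, which is the standard computational route to PCA:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))   # 100 samples, 10 features
Xc = X - X.mean(axis=0)              # PCA works on centred data

# The SVD writes Xc as a sum of rank-1 outer products, ordered by importance.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2  # keep two components
X_rank2 = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))

# Equivalently, as a product of two thin matrices A (100×2) and B (2×10):
A = U[:, :k] * s[:k]   # broadcasting scales each kept column of U by s
B = Vt[:k, :]
assert np.allclose(X_rank2, A @ B)
```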
Column-Column multiplication
In this view, the product matrix is populated column by column. Each column of the product matrix is formed by taking a weighted sum of the columns of matrix A. For the i-th column of the product, take each column j of A, multiply it by the element B(j, i), and then add up all these scaled columns. This view is useful when your original data is in matrix A. In that case, matrix B represents an operation applied to A that scales and combines its columns.
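Here is the same idea as a short NumPy sketch (shapes illustrative): each column of the product is built as a weighted sum of A's columns, with the weights read off the matching column of B:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))

# Column i of the product is a weighted sum of A's columns,
# with the weights taken from column i of B.
C = np.empty((A.shape[0], B.shape[1]))
for i in range(B.shape[1]):
    C[:, i] = sum(B[j, i] * A[:, j] for j in range(A.shape[1]))

assert np.allclose(C, A @ B)
```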
Row-Row multiplication
In this view, the product matrix is populated row by row. Each row of the product matrix is formed by taking a weighted sum of the rows of matrix B. For the i-th row of the product, take each row j of B, multiply it by the element A(i, j), and then add up all these scaled rows. This view is useful when your original data is in matrix B. In that case, matrix A represents an operation applied to B that scales and combines its rows.
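And the mirror-image sketch for this view (shapes illustrative once more): each row of the product is a weighted sum of B's rows, with the weights read off the matching row of A:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))

# Row i of the product is a weighted sum of B's rows,
# with the weights taken from row i of A.
C = np.empty((A.shape[0], B.shape[1]))
for i in range(A.shape[0]):
    C[i, :] = sum(A[i, j] * B[j, :] for j in range(A.shape[1]))

assert np.allclose(C, A @ B)
```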