An intro to vectors, matrices, and what they mean in AI
Learn linear algebra visually, and AI concepts will all fall into place.
As AI gets more and more powerful, people are naturally getting scared that it will “replace their jobs”.
In my view, if you have the capacity to learn and the curiosity to constantly explore new things, nothing can stop you.
This is the first blog of a two-part series about the math of ML.
Today, we explore vectors and matrices: mathematical objects that are powering AI systems today.
If you like learning AI concepts through easy-to-understand diagrams, I’ve created a free resource that organises all my work in one place — feel free to check it out!
What actually is a vector?
You might’ve learnt about the most basic vector definition in your physics class.
Vector = Direction + Magnitude
That’s the geometric view.
But there’s another way to think about vectors that’s way more powerful for AI.
In the 1600s, René Descartes had a brilliant idea: what if we could represent geometry using numbers and algebra?
This became the birthplace of linear algebra.
Let’s say we have a point in 2D space.
In Descartes’ analytical view, this point is simply two numbers that describe exactly where on the grid it lies.
We can represent that point as a vector v from the origin to the point.
We can represent points in 3D too, but this time using three numbers.
But here’s where it gets interesting.
Since we’re representing vectors in this analytical form, we can pretty much go up to any number of dimensions we want just by adding more numbers to the vector.
Essentially, a vector is just a group of numbers.
Since our world is 3D, we can’t visualise higher-dimensional vectors, but the same mathematical properties still apply, so we can analyse them all the same.
This matters because vectors in AI systems can have thousands of dimensions.
For example, in an LLM like ChatGPT, a hidden state may have 4096 dimensions. No matter how many dimensions a vector has, the same rules apply.
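To make this concrete, here’s a small sketch in plain Python (the `magnitude` helper is my own illustration, not from any library): the exact same code handles a 2D vector and a 4096-dimensional one.

```python
import math

# A vector is just a list of numbers: the same code works in 2, 3, or 4096 dimensions.
def magnitude(v):
    """Euclidean length of a vector of any dimension."""
    return math.sqrt(sum(x * x for x in v))

v2 = [3.0, 4.0]         # a 2D vector
v4096 = [0.01] * 4096   # a toy vector at "hidden state" scale

print(magnitude(v2))                # → 5.0
print(round(magnitude(v4096), 4))   # → 0.64
```

The dimension never appears in the formula, which is exactly why the geometric intuition from 2D carries over to thousands of dimensions.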
How do you add two vectors?
There are two ways to think about this.
The physics interpretation:
Take a step in direction 1, then take a step in direction 2. The net effect is the sum of the two vectors.
The analytical interpretation:
Just add the numbers element-wise.
The beautiful thing is that both interpretations give you exactly the same result.
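As a quick sketch of the analytical interpretation (plain Python, with an illustrative `add` helper of my own):

```python
def add(u, v):
    """Element-wise vector addition -- the analytical interpretation."""
    assert len(u) == len(v), "vectors must have the same dimension"
    return [a + b for a, b in zip(u, v)]

step1 = [2.0, 1.0]   # walk right 2, up 1
step2 = [1.0, 3.0]   # then right 1, up 3

print(add(step1, step2))   # → [3.0, 4.0], the net displacement
```

The result is the same vector you’d get by placing the two arrows head-to-tail on paper, which is the physics interpretation in action.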
Linear transformations
In LLMs, vectors are transformed from one form to another so that the input is eventually turned into the output.
But what are these “transformations” that are applied to change vectors from one form to another?
Think of it as taking a vector and changing its direction, magnitude, or both in a systematic way.
These transformations can be achieved by multiplying the vector by a “matrix”.
For instance, to rotate a vector by 90 degrees, you might multiply it by some matrix Mrot.
Which matrix achieves this 90-degree rotation? The rotation matrix is shown below:
Here’s what that matrix-vector multiplication looks like:
For a skewing transformation, that would look like this:
Again, the skewing matrix looks like this:
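Here’s a minimal sketch of both transformations in plain Python. The rotation matrix follows from the diagrams above; the 0.5 in the skew matrix is just an example value I chose, and `mat_vec` is my own helper, not a library function.

```python
def mat_vec(M, v):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# 90-degree counter-clockwise rotation.
M_rot = [[0.0, -1.0],
         [1.0,  0.0]]

# A horizontal skew that pushes the top of the grid sideways (0.5 is arbitrary).
M_skew = [[1.0, 0.5],
          [0.0, 1.0]]

v = [1.0, 2.0]
print(mat_vec(M_rot, v))    # → [-2.0, 1.0]  (v rotated 90 degrees)
print(mat_vec(M_skew, v))   # → [2.0, 2.0]   (v skewed to the right)
```

Each output component is a dot product of one matrix row with the vector, which is all matrix-vector multiplication really is.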
How did we figure out these matrices?
Consider î and ĵ as two special vectors:
For a 90-degree rotation:
î becomes [0, 1] (now points up)
ĵ becomes [-1, 0] (now points left)
The first column of the rotation matrix is the new position of î, and the second column is the new position of ĵ.
In fact, you can prove this analytically.
Writing the columns of Mrot as c1 and c2, multiplying out shows that Mrot î = c1 and Mrot ĵ = c2, so the columns really are the new positions of î and ĵ.
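You can check this directly in code (reusing the same toy `mat_vec` helper as before): multiplying the rotation matrix by î picks out its first column, and multiplying by ĵ picks out its second.

```python
def mat_vec(M, v):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

M_rot = [[0.0, -1.0],
         [1.0,  0.0]]

i_hat = [1.0, 0.0]
j_hat = [0.0, 1.0]

print(mat_vec(M_rot, i_hat))   # → [0.0, 1.0]   (î now points up)
print(mat_vec(M_rot, j_hat))   # → [-1.0, 0.0]  (ĵ now points left)
```

This is why you can read a matrix column by column: each column tells you where one basis vector lands.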
Extending to higher dimensions
We can visualise transformations in 2D and 3D, but not in larger dimensions.
But with the power of math, we can extend to as many dimensions as we want.
This is especially relevant in LLMs.
Words are represented as large-dimensional vectors, and they go through a series of transformations through the LLM before it answers your question.
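As a loose sketch of that idea (plain Python, toy sizes, random matrices standing in for learned weights; real transformer layers also apply attention and nonlinearities, which are omitted here):

```python
import random

random.seed(0)

def mat_vec(M, v):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

dim = 8  # a stand-in for the thousands of dimensions in a real LLM
# Three toy "layers", each just a dim-by-dim matrix of random weights.
layers = [[[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
          for _ in range(3)]

h = [1.0] * dim  # a toy hidden-state vector
for M in layers:
    h = mat_vec(M, h)  # each layer transforms the vector into a new one

print(len(h))  # still a dim-dimensional vector after every transformation
```

The point is structural rather than numerical: a vector goes in, each layer’s matrix transforms it, and a vector of the same shape comes out, ready for the next layer.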
Summary
In this blog, we’ve covered the fundamental mathematical building blocks that power every AI system you’ve ever used.
We started with vectors, explored vector addition, and dove into matrix transformations, the key to understanding what’s going on inside transformer architectures, how these numbers are manipulated, and how AI researchers visualise them.
What’s next?
Later this week, I’ll talk more about composing different transformations, matrix multiplication, and eigenvectors and eigenvalues, and we’ll finally explore how all of that links to the computations happening inside modern-day LLMs.
Honestly, understanding these mathematical foundations is what separates people who can truly work with AI from those who just use it as a black box.
If you like learning AI concepts through easy-to-understand diagrams, I’ve created a free resource that organises all my work in one place — feel free to check it out!
Acknowledgements
All diagrams were made by me on Canva.