FastRP
The core goal of FastRP is to transform a graph’s high-dimensional structural information into a lower-dimensional space, while preserving the relative similarity between nodes. Functionally, it converts a graph into compact, dense vectors where each dimension captures some aspect of the node’s position or role in the graph. But all of that honestly sounds a bit like a bunch of nothing. So how does it work?
At its heart, FastRP — which stands for Fast Random Projection — uses something called the Johnson-Lindenstrauss lemma to accomplish its goal. But what exactly is that?
Imagine you wanted to compare 1,000 people who filled out surveys on how much they liked 100 different media figures, resulting in each person having 100 dimensions of media figure preference. It would be really time consuming to compare each of these surveys to each other to see who had the most similar taste.
In fact, if you wanted to measure how similar these people are to each other, you’d need to compute distances between all 1,000 points in this 100D space — which gets computationally expensive.
Instead, we could represent the results as a matrix with 1,000 "rows" (one for each person), and 100 columns (one for each media figure). If you multiply this matrix by a randomly generated matrix with 100 rows and, say, 10 columns, you’ll end up with a new matrix that compresses each person’s data into just 10 numbers.
Even though this transformation uses random numbers, it’s the same random projection matrix applied to everyone — so the relationships between people are preserved. In fact, the relative distances between people stay mostly the same, even though we’ve reduced the number of dimensions. This is because the random fluctuations that you get when multiplying by a random number will tend to average out to zero when you multiply enough of them across all the dimensions.
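Here is a minimal sketch of that idea in NumPy; the survey scores and the projection values are randomly generated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# 1,000 people, each scoring 100 media figures (made-up data)
surveys = rng.random((1000, 100))

# One shared random projection matrix: 100 dimensions -> 10 dimensions
projection = rng.normal(size=(100, 10)) / np.sqrt(10)

compressed = surveys @ projection   # shape: (1000, 10)

# Pairwise distance between the first two people, before and after projecting
print(np.linalg.norm(surveys[0] - surveys[1]))
print(np.linalg.norm(compressed[0] - compressed[1]))  # roughly the same magnitude
```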
In our example below, we are going to be working with a really small graph so the numbers stay easy to follow. Our graph will look like this:
A → B
A → C
B → C
One-Hot to Random Projection
Remember, our goal here is to transform our graph structure into a low dimensional space. But how can we represent our graph as vectors to begin with?
We could use something called one-hot vectors. Since we only have three nodes, we can use a vector of length three to represent them. Each node gets a 1 in its own "spot" in the vector and a 0 everywhere else:
Node | One-hot |
---|---|
A | [1, 0, 0] |
B | [0, 1, 0] |
C | [0, 0, 1] |
This, of course, can be represented as a matrix:
Notice how there is a ton of zeros in our matrix? That means it is sparse: most of the space is wasted on zeros, which is computationally inefficient. We will use a random projection matrix, which is just a matrix filled with random numbers, to reduce our dimensions down.
We have three nodes here, so we will need a matrix with three "rows". And we want to reduce it down to two dimensions, so we will use a matrix with two "columns":
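As a sketch in NumPy, that setup looks like this; the specific values in R are invented for illustration, since any random numbers will do:

```python
import numpy as np

# One-hot matrix H: one row per node (A, B, C), one column per "spot"
H = np.array([
    [1, 0, 0],   # A
    [0, 1, 0],   # B
    [0, 0, 1],   # C
])

# Random projection matrix R: three rows (one per node), two columns (our target dimensions).
# These particular values are made up so the later arithmetic is easy to follow.
R = np.array([
    [ 0.3, -0.5],   # row for A
    [ 0.8,  0.1],   # row for B
    [-0.2,  0.6],   # row for C
])

print(H.shape, R.shape)   # (3, 3) (3, 2)
```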
We are going to multiply these two matrices together. But I am going to assume you are not familiar with how this works (since this is a guide).
We multiply each row in H by the entire matrix R. But since each row of H is a one-hot vector, the result is just the corresponding row of R.
So let's start by looking at Node A, which corresponds to the first row in our matrix. Basically, we are going to take the top row of H, multiply each of its entries by the corresponding row of R, and add the scaled rows together:
When we do that for each row, we get something like this:
Notice how this output is the same as our random projection matrix? That’s because each row of our one-hot matrix simply “selects” the corresponding row in the random projection. Since only one element in each row of H is 1 (and the rest are 0), the matrix multiplication just returns the matching row from R.
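As a quick sanity check, here is a sketch showing that the multiplication really does just return R:

```python
import numpy as np

H = np.eye(3)   # the one-hot rows for A, B, and C
R = np.random.default_rng(0).normal(size=(3, 2))   # any random 3 x 2 matrix

initial_embeddings = H @ R

# Each row of H "selects" the matching row of R, so the product is just R itself
print(np.allclose(initial_embeddings, R))   # True
```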
For this reason, FastRP doesn’t actually construct one-hot matrices or perform this multiplication in practice — it just directly assigns a random dense vector to each node.
However, this is exactly how the algorithm works conceptually, and it’s important to understand the logic before moving to more complex steps like neighbor aggregation.
Step 2: 1-Hop Neighborhood Aggregation
Remember our graph:
A → B
A → C
B → C
We can represent these connections using something called an adjacency matrix, which shows which nodes are connected to which. Before looking at the matrix form, here’s a table representation:
Source | A | B | C |
---|---|---|---|
A | 0 | 1 | 1 |
B | 0 | 0 | 1 |
C | 0 | 0 | 0 |
Each row is a source node, and each column is a target node. If there’s a 1 in row A, column B, that means there’s an edge from A to B. Since we’re dealing with directed edges, the reverse (B to A) would only be true if explicitly stated.
We can store the same information in a matrix.
This matrix tells us how nodes are connected. We’ll use it to aggregate neighbor embeddings — by multiplying it with the initial embeddings from Step 1.
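In NumPy form, that adjacency matrix is just:

```python
import numpy as np

# Adjacency matrix for A -> B, A -> C, B -> C
# (rows are source nodes, columns are target nodes)
A = np.array([
    [0, 1, 1],   # A points to B and C
    [0, 0, 1],   # B points to C
    [0, 0, 0],   # C has no outbound edges
])
```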
Let's start with Node A since it has the most outbound relationships. I am really going to write everything out to make this as clear as possible, even if it's not the correct notation.
In the first step, we multiply each entry in A's row, [0, 1, 1], by its corresponding row in our embedding matrix. That gives us the three vectors that we then add together to get our final embedding. Notice how this reduces A's three adjacency values down to just two numbers?
Next let's do Node B:
Because Node C has no outbound relationships, we will get all zeros:
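To tie the node-by-node work back together, here is a sketch of the full aggregation as one matrix multiplication, using the same made-up embedding values as R from earlier:

```python
import numpy as np

# Adjacency matrix from above
A = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
])

# Initial embeddings from Step 1 (the same made-up values used for R earlier)
E0 = np.array([
    [ 0.3, -0.5],   # A
    [ 0.8,  0.1],   # B
    [-0.2,  0.6],   # C
])

# One-hop aggregation: each node's new vector is the sum of the vectors
# of the nodes it points to
E1 = A @ E0
print(E1)
# Node A: E0[B] + E0[C] = [ 0.6,  0.7]
# Node B: E0[C]         = [-0.2,  0.6]
# Node C: no outbound edges, so all zeros
```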
Step 3: Reprojection and Weighted Sum
Let’s pause and zoom out for a moment. We’re about to multiply the matrix from Step 2 by a random projection matrix — but what exactly does that mean, and how do we know what size the projection matrix should be?
Our current matrix (from Step 2) has 2 columns, meaning each node is represented by a 2-dimensional vector. If we want to keep the output in the same 2D space, we need a 2 × 2 random projection matrix — like this:
When multiplying matrices, the number of columns in the first matrix must match the number of rows in the second.
In our case, the matrix from Step 2 has a shape of n × 2, meaning we have one 2-dimensional vector per node (e.g., 3 nodes × 2 features). To apply a projection, the matrix we multiply it by must therefore have a shape of 2 × k, where k is the desired output dimension.
If we want to keep the output in the same 2D space, we set k = 2, giving us a 2 × 2 projection matrix.
This projection transforms each node's vector by mixing the values within it through a linear combination. If we use a 2 × 2 projection matrix, the dimensionality remains unchanged, but the output vectors reflect a new view of the graph's structure. This process allows us to inject randomness, reduce correlation between features, and retain efficiency, all while keeping the embedding space compact and meaningful.
Once we multiply these matrices together, we will get something like this:
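Here is a sketch of that reprojection, using made-up values for both the Step 2 matrix and the 2 × 2 projection matrix:

```python
import numpy as np

# Aggregated one-hop embeddings from Step 2 (made-up values, shape 3 x 2)
E1 = np.array([
    [ 0.6,  0.7],   # A
    [-0.2,  0.6],   # B
    [ 0.0,  0.0],   # C
])

# A 2 x 2 random projection matrix (again, the values are invented for illustration)
R2 = np.array([
    [ 0.5, -0.4],
    [ 0.9,  0.2],
])

# Reproject: each node's 2-dimensional vector becomes a new mix of its two values
E1_projected = E1 @ R2
print(E1_projected)
# Node A: [ 0.93, -0.10]
# Node B: [ 0.44,  0.20]
# Node C: [ 0.00,  0.00]
```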
This represents the embeddings for the one-hop neighborhood of each node. We can then combine this with our original set of embeddings using a weighted sum. We will assign our first set of embeddings, which represents the original position of the nodes, a weight of 1, and our second set of embeddings, which represents the direct neighbors, a weight of 0.8.
That will give us the following:
Node A:
Node B:
Node C:
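Carrying forward the made-up numbers from the earlier sketches, the weighted sum works out roughly like this:

```python
import numpy as np

# Initial embeddings (weight 1.0) and reprojected one-hop embeddings (weight 0.8);
# both sets of numbers are the illustrative values from the earlier steps
E0 = np.array([
    [ 0.3, -0.5],   # A
    [ 0.8,  0.1],   # B
    [-0.2,  0.6],   # C
])
E1_projected = np.array([
    [ 0.93, -0.10],
    [ 0.44,  0.20],
    [ 0.00,  0.00],
])

final_embeddings = 1.0 * E0 + 0.8 * E1_projected
print(final_embeddings)
# Node A: [ 1.044, -0.58]
# Node B: [ 1.152,  0.26]
# Node C: [-0.2  ,  0.6 ]
```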
These are our refined embeddings that now encode not just each node’s identity, but also its immediate neighbors.
What Else?
We could continue down this path by following the same steps, but instead of looking at one-hop neighbors, we would look at two-hop neighbors; at that point, though, it is largely the same practice. In fact, how many hops away the algorithm looks is a parameter that you can tune.
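Putting all of the steps together, here is a rough end-to-end sketch; the weights, embedding dimension, and random values are illustrative choices rather than the defaults of any particular implementation, and the length of the weights tuple plays the role of the "how many hops" knob mentioned above:

```python
import numpy as np

def fastrp_sketch(adjacency, dim=2, weights=(1.0, 0.8, 0.5), seed=42):
    """A rough FastRP-style embedding: random initial vectors, then a weighted
    sum of repeated neighbor aggregation and random reprojection."""
    rng = np.random.default_rng(seed)
    n = adjacency.shape[0]

    # Step 1: give every node a random dense vector (no one-hot matrix needed)
    current = rng.normal(size=(n, dim))
    result = weights[0] * current

    # One pass per "hop": aggregate neighbors, reproject, add with a weight
    for w in weights[1:]:
        current = adjacency @ current
        current = current @ (rng.normal(size=(dim, dim)) / np.sqrt(dim))
        result = result + w * current
    return result

A = np.array([
    [0, 1, 1],   # A -> B, A -> C
    [0, 0, 1],   # B -> C
    [0, 0, 0],   # C
])
print(fastrp_sketch(A))
```

Real implementations typically layer normalization and degree-based scaling on top of this skeleton, but the overall shape of the computation is the same.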
Let's get back to our original mission: to take something as complex as a graph and reduce it down to fewer dimensions. Through random projections, we were able to assign a compact numeric vector to each node's position, and to further encode its neighbors' positions as well.