pull down to refresh

The Math Behind SN's Web of Trust

Introduction

SN's trust algorithm is based on TrustRank, which is itself a modified PageRank algorithm. The purpose of TrustRank is to calculate overall trust scores for a vector of nodes (i.e. users).
The TrustRank algorithm requires two objects as inputs: the trust graph M, and the seed vector \mathbf{v_0}. It returns a vector of trust scores as output, \mathbf{v}.

The Trust Graph

The trust graph, M, is a matrix in which each element m_{ij} describes how much node i trusts node j. The rows of M add up to 1, so it's as if each node i has one unit of trust to distribute among the other nodes.

The Seed Vector

The seed vector, \mathbf{v_0}, describes a baseline level of initial trust for each node. Typically, the seed trust for most nodes is set to zero, but a handful of known, trustworthy nodes are seeded with a trust of 1.
The seed vector influences the final trust scores by pulling the scores closer to itself. The strength of the seed vector's influence is determined by a parameter, \alpha, called the seed weight.

The TrustRank Formula

Given the trust graph, M, the seed vector \mathbf{v_0}, and the seed weight \alpha, the final trust scores are given by the vector \mathbf{v} that solves:
\mathbf{v} = (1-\alpha) M^T \mathbf{v} + \alpha \mathbf{v}_0
In SN's case, the nodes represent users, the trust graph represents the users' degree of similarity in preferences, and the seed vector represents a-priori trusted users--mainly @k00b, @ek, and territory founders (when calculating trust inside a territory.)

Diffusion Analogy and Example

The TrustRank algorithm is best understood as a diffusion process. Imagine the nodes to be three countries, and the trust scores to be the stable distribution of population in the three countries.
The algorithm describes the following population diffusion process:
  • In each period, a fraction \alpha of the entire global population simply teleports to each country at rates specified by \mathbf{v_0}.
  • The other fraction, 1-\alpha, of the global population migrates directionally according to rates specified by M. m_{ij} is the rate at which population in country i migrates to country j.
Let \mathbf{v_t} be the population distribution at time t. Based on this diffusion process, the population distribution at time t+1 is:
\mathbf{v_{t+1}} = (1-\alpha) M^T \mathbf{v_t} + \alpha \mathbf{v_0}
Repeated iteration of this diffusion process results in a stable distribution of population across countries. This stable distribution \mathbf{v} satisfies:
\mathbf{v} = (1-\alpha) M^T \mathbf{v} + \alpha \mathbf{v}_0
This is the solution to the TrustRank algorithm. Thus, one can think of the TrustRank algorithm as giving us the stable distribution of trusts when trust is allowed to repeatedly diffuse across nodes according to the trust graph and seed vector.

Example

Let the initial population of countries be \mathbf{v_1} = (0.3, 0.3, 0.4) and let the seed vector be \mathbf{v_0} = (1, 0, 0) (thus, when people teleport they always teleport to country 1). Let the seed weight be \alpha = 0.85.
Finally, let M be:
M = \left( 
\begin{array} ~
0 & 0.5 & 0.5 \\
0.5 & 0 & 0.5 \\
0 & 1 & 0
\end{array} 
\right)
This says that people in country 1, when getting a chance to migrate, will choose country 2 with 50% probability and country 3 with 50% probability. People in country 2 are 50/50 with countries 1 and 3. And people in country 3, when migrating, always move to country 2.
After the first iteration:
v_2 = 0.15 \left( 
\begin{array} ~
0 & 0.5 & 0.5 \\
0.5 & 0 & 0.5 \\
0 & 1 & 0
\end{array} 
\right)^T
\left[
\begin{array} ~
0.3 \\ 0.3 \\ 0.4
\end{array}
\right] + 
0.85 \left[
\begin{array} ~
1 \\ 0 \\ 0
\end{array}
\right]
= 
\left[
\begin{array} ~
0.8725 \\ 0.0825 \\ 0.045
\end{array}
\right]
After the second iteration:
v_3 = 0.15 \left( 
\begin{array} ~
0 & 0.5 & 0.5 \\
0.5 & 0 & 0.5 \\
0 & 1 & 0
\end{array} 
\right)^T
\left[
\begin{array} ~
0.8725 \\ 0.0825 \\ 0.045
\end{array}
\right] + 
0.85 \left[
\begin{array} ~
1 \\ 0 \\ 0
\end{array}
\right]
= 
\left[
\begin{array} ~
0.8562 \\ 0.0722 \\ 0.0716
\end{array}
\right]
If you kept going, or if you solved for the exact solution, you would get the stable population distribution:
v = 
\left[
\begin{array} ~
0.8556 \\ 0.0746 \\ 0.0698
\end{array}
\right]

What seed vector and seed weight does SN use?

SN does the trust calculations separately for each territory. For territories with at least 50 users who have ever zapped, downzapped, or posted, the seed vector is 1 for the territory founder and 0 for everyone else. For territories with less than 50 users who ever zapped, downzapped, or posted, the seed vector is 1 for @k00b, @ek, and the territory founder, and 0 for everyone else.
The seed weight is always \alpha = 0.83.

How does SN calculate its trust graph?

This is the most complicated part to explain. The trust graph is calculated on a per-territory basis. It is also calculated separately for posts and for comments.
To calculate the trust graph, first define the following metrics for each user i and item k:
  • z_{ik}: the total amount of sats that user i zapped item k with (negative if downzapping)
  • n_{ik}: the total number of times that user i zapped item k
  • T_{ik}: the timestamp when user i first zapped or downzapped item k
Now define the following second order metrics for users i and j:
before_{ij} = \sum_{k : T_{ik} > T_{jk}} 
1\left[ z_{ik} z_{jk} > 0 \right] \frac{
\min( |z_{ik}|, |z_{jk}| ) }{
\max( |z_{ik}|, |z_{jk}| ) }
after_{ij} = \sum_{k : T_{ik} < T_{jk}} 
1\left[ z_{ik} z_{jk} > 0 \right] \frac{
\min( |z_{ik}|, |z_{jk}| ) }{
\max( |z_{ik}|, |z_{jk}| ) }
disagree_{ij} = \sum_k 1\left[ z_{ik} z_{jk} < 0 \right]
total_{j} = \sum_k n_{jk}
  • before_{ij} is a measure of the correlation between i and j's zaps for items in which user j zapped before user i. It is used as measure for how much i trusts j.
  • after_{ij} is a measure of the correlation between i and j's zaps for items in which user j zapped after user i. It is removed from the denominator for trust calculations.
  • disagree_{ij} is a measure of how many items i and j had disagreements on (one up-zapped while the other down-zapped).
  • total_{j} is simply the total number of times user j zapped any items.
Roughly speaking, the idea is to measure the trust that user i has for user j as:
\frac{ before_{ij} - disagree_{ij} }{ total_{j} - after_{ij}} 
That is, out of all the times user i voted on items after user j, in what fraction did they agree vs. disagree?1
The actual calculation is a fair bit more complicated. Trust from i to j is actually calculated using the equation below, then normalized to sum up to 1 row-wise:
m_{ij} = H\left( before_{ij} - disagree_{ij}, ~ total_{j} - after_{ij}, ~ z \right)
Here, H(x,y,z) is the lower bound of the binomial proportion confidence interval, for an experiment with y trials and x successes, for a confidence level given by the Gaussian z-score z. The z score is chosen for a confidence level of 99.9999999%.
To explain why the lower bound of a binomial proportion confidence interval is used, we can think of each item in which user i zapped after user j as a trial: did user i disagree or agree with user j's zaps? The estimated probability that they agree is simply the number of agrees divided by the number of such items.
However, if the number of items is small, we can't be confident that user i and user j are actually similar in preferences. For example, if user i only zapped two items after user j, and both are agreements, then the number of trials is 2 and the number of successes is 2. The best estimate of the similarity between i and j is 100%, but because the sample size is so small, we can't actually be very confident of that estimate. The lower bound of the confidence interval would be much lower than 100%. Therefore we use that lower bound to calculate trust, in order to shade conservative in the trust scores.
The above presentation is greatly simplified. There are many technical details that are glossed over, like thresholds for which zaps and downzaps are included in the calculation, and nuances in the code for how before and after are calculated. Moreover, calculating the exact TrustRank solution by matrix inversion is computationally expensive, so an iterative approximate solution is used instead. For the actual code, one can take a look at worker/trust.js in the SN Github Repo.

How can I boost my trust score?

There is no straightforward way to boost your own trust score. This is because other users' trust of you is calculated based on their behavior after you already zapped an item. Thus, zapping items that other people already zapped doesn't directly boost your own trust score, it actually increases your trust in them.
The best way to increase your own trust score is to zap content early that you think other users will also zap in the future. Zapping content that you think @k00b, @ek, and territory founders will like is especially helpful, since these users are seeded with higher trust--though @k00b and @ek's influence only extends to the less active territories. In more active territories, only the territory founder is seeded with initial trust. So, if you want @Undisciplined to trust you, consider zapping libertarian opinions early in the ~econ territory!
Now that you have a deeper understanding of how SN's Web of Trust works, calibrate your behavior accordingly and build that trust! Happy stacking and happy zapping.
p.s. @k00b, @ek, let me know if I got anything wrong. I'll also follow up with more technical feedback in a GitHub issue.

Footnotes

  1. This is an over-simplification. In practice, it's not just the number of agreements that's counted, but the number of agreements weighted by the ratio of the zap amounts. The number of disagreements is also subtracted, which could result in a negative numerator. The denominator also cannot be strictly interpreted as the number of times user i voted after user j, both due to the weighting of the agreements by zap ratio, but also due to nuances in the code about how before/after is calculated.
I knew when I saw those big crooked "E" type symbols that it was time for me to look for the Web Of Trust For Dummies book.
reply
Hey, you got pretty far if you made it all the way to the \sum!
To put it more simply, think of the trust graph as a series of pipes between each user. Trust is a liquid that flows across these pipes. Some pipes have stronger flow than others, based on the similarity of the two users' zapping behavior. If you let the trust keep flowing across these pipes, it eventually settles on a stable amount of liquid in each person's tank. That's pretty much how it works.
reply
Hhahahaaa, saaame. Too long since i did math to even follow. Nooo thanks!
reply
Didn't you go to some hoity-toity university like Oxford or something?
reply
Yup, guilty.
reply
46 sats \ 1 reply \ @nout 18 Mar
crooked "E" means you just add up everything that's inside of it.
reply
Thank you. That's a nice, simple explanation
reply
Neat, so I got a trust boost within ~econ?
I talked to @k00b about allowing the components of v to be negative. Do you think that would help address anything? IIRC, I brought it up in the context of outlawing content from stackers with negative trust scores.
if you want @Undisciplined to trust you, consider zapping libertarian opinions early in the ~econ territory!
Good luck zapping anything in ~econ before me! Btw, I zap everything that's even remotely thoughtful. For things I find interesting, I zap more and for things I find interesting that took a decent amount of work, I zap a lot.
reply
Yep, you get a trust boost in ~econ, and your zaps are relatively more influential in building other peoples' trust scores within ~econ.
Based on my understanding of the current trust graph, I think trust scores can be negative, but this would be rare because I think downzaps are pretty rare. I don't know if anyone has a negative trust score, but I think it's theoretically possible.
Currently, I wouldn't recommend outlawing people with negative trust scores yet, because I think there are a few technical issues to be worked out first. (There's a little bit of jankiness with the current trust graph calculations--I think they're directionally correct but the scales are hard to interpret and there are a couple of issues that I brought up in the github)
reply
I know k00b told me that trust can't be negative. I'm also pretty sure he said that trust scores only ever increase.
Maybe the outlawing thing doesn't make sense, since that's supposed to be for people who post stuff that tends to get downzapped, rather than people who zap antagonistically to the rest of us.
reply
Ah, I think that's right. Trust can't get negative because of a post-processing step in which the final trust scores are normalized between 0 and 1. I think the raw, unnormalized trust scores could get negative though.
I don't know about trust scores only ever increasing though. I don't think I see that from the code.
If you're interested in some of these details, here's a link to the github discussion that I started about this.
reply
Cool, I'll check it out.
reply
Interesting discussion. Most of those points seem pretty easy to fix. I like the idea of grounding it all in a behavioral model.
The point about long histories is interesting to me. I think your suggestion about only including items that both stackers are probably aware of would help a lot. Otherwise, my trust with new users (and theirs with me) will be weird.
I just get the feeling that people are overthinking the trust score. Just keep doing what you are doing. Nothing has changed. At the end of the day, does the trust score really affect your life outside of SN?
reply
I agree. I hope this post doesn't make people overthink their trust score. This is just meant to make it more transparent for those who would be interested in how it works under the hood
reply
30 sats \ 0 replies \ @gmd 18 Mar
hmmmm... I see... yes, please do go on....
reply
The best way to increase your own trust score is to zap content early that you think other users will also zap in the future
This is lame. So your trust can suck because you don’t zap popular posts?
reply
I think you either have to make popular posts or zap popular posts to build trust. (When you make a post and pay the fee to post, it's treated like you zapping your own post in the trust graph calculations).
I do think that's appropriate though. Trust is mainly used to rank posts, so it makes sense to give a higher trust to someone who tends to zap things other people like. And to the extent that daily rewards are meant to incentivize content discovery, it also makes sense to reward those who zap content that other people like.
That being said, I can see how this might be unfair to a good SN contributor who happens to have idiosyncratic preferences. If you have thoughts to share about how to improve the WoT, I'm sure k00b and ek are open to feedback.
reply
But then you get into the debate popular vs. good. I Zap all kinds of stackers/posts/comments I zap new stackers just because. But now I know I am destroying my trust by zapping things that aren’t popular.
For example posting content that’s not Bitcoin related doesn’t get that many zaps. So if come to this site and I don’t care about bitcoin your trust can suffer.
reply
0 sats \ 0 replies \ @ek 18 Mar
For example posting content that’s not Bitcoin related doesn’t get that many zaps.
You don't lose trust for posting unpopular stuff
reply
These are fair points. If I'm reading the code correctly, your trust can go down from zapping posts that no one else zaps... but I think the effect seems like it would be very small. Still, I wonder if it's unintentional behavior. I do think there's room for an improved trust algorithm that rewards some of the behavior you describe.
For example posting content that’s not Bitcoin related doesn’t get that many zaps. So if come to this site and I don’t care about bitcoin your trust can suffer.
This part isn't true anymore, because all trust is territory specific.
reply
Hey bud, I enjoy your content. Dont worry about your trust score, I will keep reading your content as long as it is the same quality.
reply
i knew k00b and ek were in god mod! ahahah great post.
reply
If ever I could understand all that Math! Whatever you've mentioned is pretty technical stuff but by experience I've learnt to zap early (as a confession). However I don't know whether it increased my trust score.
Thanks for the last simplified part. That'll help.
reply
Zapping good content early should definitely increase your trust score!
Remember that it's territory-by-territory now, so you could have high trust in one territory but low trust in another. It all depends on how often you zap content in the territory that others subsequently also zap.
reply
Thanks. I'll remember it. There's a big drop in number of zappers in recent times. What do you think has been costing this?
reply
Almost certainly the move to non-custodial. Attaching wallets proved to be a big friction for a lot of users. Then, earning cowboy credits wasn't as fun as earning real sats.
reply
It's so damn true.
reply
Stacker news.
Where we trustlessly build trust through...
Cowboy credits?
reply