The Math Behind SN's Web of Trust
- Introduction
- Diffusion Analogy and Example
- What seed vector and seed weight does SN use?
- How does SN calculate its trust graph?
- How can I boost my trust score?
Introduction
SN's trust algorithm is based on TrustRank, which is itself a modified PageRank algorithm. The purpose of TrustRank is to calculate overall trust scores for a vector of nodes (i.e. users).
The TrustRank algorithm requires two objects as inputs: the trust graph
M
, and the seed vector \mathbf{v_0}
. It returns a vector of trust scores as output, \mathbf{v}
.The Trust Graph
The trust graph,
M
, is a matrix in which each element m_{ij}
describes how much node i
trusts node j
. The rows of M
add up to 1, so it's as if each node i
has one unit of trust to distribute among the other nodes.The Seed Vector
The seed vector,
\mathbf{v_0}
, describes a baseline level of initial trust for each node. Typically, the seed trust for most nodes is set to zero, but a handful of known, trustworthy nodes are seeded with a trust of 1.The seed vector influences the final trust scores by pulling the scores closer to itself. The strength of the seed vector's influence is determined by a parameter,
\alpha
, called the seed weight.The TrustRank Formula
Given the trust graph,
M
, the seed vector \mathbf{v_0}
, and the seed weight \alpha
, the final trust scores are given by the vector \mathbf{v}
that solves:\mathbf{v} = (1-\alpha) M^T \mathbf{v} + \alpha \mathbf{v}_0
In SN's case, the nodes represent users, the trust graph represents the users' degree of similarity in preferences, and the seed vector represents a-priori trusted users--mainly @k00b, @ek, and territory founders (when calculating trust inside a territory.)
Diffusion Analogy and Example
The TrustRank algorithm is best understood as a diffusion process. Imagine the nodes to be three countries, and the trust scores to be the stable distribution of population in the three countries.
The algorithm describes the following population diffusion process:
- In each period, a fraction
\alpha
of the entire global population simply teleports to each country at rates specified by\mathbf{v_0}
. - The other fraction,
1-\alpha
, of the global population migrates directionally according to rates specified byM
.m_{ij}
is the rate at which population in countryi
migrates to countryj
.
Let
\mathbf{v_t}
be the population distribution at time t
. Based on this diffusion process, the population distribution at time t+1
is:\mathbf{v_{t+1}} = (1-\alpha) M^T \mathbf{v_t} + \alpha \mathbf{v_0}
Repeated iteration of this diffusion process results in a stable distribution of population across countries. This stable distribution
\mathbf{v}
satisfies:\mathbf{v} = (1-\alpha) M^T \mathbf{v} + \alpha \mathbf{v}_0
This is the solution to the TrustRank algorithm. Thus, one can think of the TrustRank algorithm as giving us the stable distribution of trusts when trust is allowed to repeatedly diffuse across nodes according to the trust graph and seed vector.
Example
Let the initial population of countries be
\mathbf{v_1} = (0.3, 0.3, 0.4)
and let the seed vector be \mathbf{v_0} = (1, 0, 0)
(thus, when people teleport they always teleport to country 1). Let the seed weight be \alpha = 0.85
.Finally, let
M
be:M = \left(
\begin{array} ~
0 & 0.5 & 0.5 \\
0.5 & 0 & 0.5 \\
0 & 1 & 0
\end{array}
\right)
This says that people in country 1, when getting a chance to migrate, will choose country 2 with 50% probability and country 3 with 50% probability. People in country 2 are 50/50 with countries 1 and 3. And people in country 3, when migrating, always move to country 2.
After the first iteration:
v_2 = 0.15 \left(
\begin{array} ~
0 & 0.5 & 0.5 \\
0.5 & 0 & 0.5 \\
0 & 1 & 0
\end{array}
\right)^T
\left[
\begin{array} ~
0.3 \\ 0.3 \\ 0.4
\end{array}
\right] +
0.85 \left[
\begin{array} ~
1 \\ 0 \\ 0
\end{array}
\right]
=
\left[
\begin{array} ~
0.8725 \\ 0.0825 \\ 0.045
\end{array}
\right]
After the second iteration:
v_3 = 0.15 \left(
\begin{array} ~
0 & 0.5 & 0.5 \\
0.5 & 0 & 0.5 \\
0 & 1 & 0
\end{array}
\right)^T
\left[
\begin{array} ~
0.8725 \\ 0.0825 \\ 0.045
\end{array}
\right] +
0.85 \left[
\begin{array} ~
1 \\ 0 \\ 0
\end{array}
\right]
=
\left[
\begin{array} ~
0.8562 \\ 0.0722 \\ 0.0716
\end{array}
\right]
If you kept going, or if you solved for the exact solution, you would get the stable population distribution:
v =
\left[
\begin{array} ~
0.8556 \\ 0.0746 \\ 0.0698
\end{array}
\right]
What seed vector and seed weight does SN use?
SN does the trust calculations separately for each territory. For territories with at least 50 users who have ever zapped, downzapped, or posted, the seed vector is 1 for the territory founder and 0 for everyone else. For territories with less than 50 users who ever zapped, downzapped, or posted, the seed vector is 1 for @k00b, @ek, and the territory founder, and 0 for everyone else.
The seed weight is always
\alpha = 0.83
.How does SN calculate its trust graph?
This is the most complicated part to explain. The trust graph is calculated on a per-territory basis. It is also calculated separately for posts and for comments.
To calculate the trust graph, first define the following metrics for each user
i
and item k
:z_{ik}
: the total amount of sats that useri
zapped itemk
with (negative if downzapping)n_{ik}
: the total number of times that useri
zapped itemk
T_{ik}
: the timestamp when useri
first zapped or downzapped itemk
Now define the following second order metrics for users
i
and j
:before_{ij} = \sum_{k : T_{ik} > T_{jk}}
1\left[ z_{ik} z_{jk} > 0 \right] \frac{
\min( |z_{ik}|, |z_{jk}| ) }{
\max( |z_{ik}|, |z_{jk}| ) }
after_{ij} = \sum_{k : T_{ik} < T_{jk}}
1\left[ z_{ik} z_{jk} > 0 \right] \frac{
\min( |z_{ik}|, |z_{jk}| ) }{
\max( |z_{ik}|, |z_{jk}| ) }
disagree_{ij} = \sum_k 1\left[ z_{ik} z_{jk} < 0 \right]
total_{j} = \sum_k n_{jk}
-
before_{ij}
is a measure of the correlation betweeni
andj
's zaps for items in which userj
zapped before useri
. It is used as measure for how muchi
trustsj
. -
after_{ij}
is a measure of the correlation betweeni
andj
's zaps for items in which userj
zapped after useri
. It is removed from the denominator for trust calculations. -
disagree_{ij}
is a measure of how many itemsi
andj
had disagreements on (one up-zapped while the other down-zapped). -
total_{j}
is simply the total number of times userj
zapped any items.
Roughly speaking, the idea is to measure the trust that user
i
has for user j
as:\frac{ before_{ij} - disagree_{ij} }{ total_{j} - after_{ij}}
That is, out of all the times user
i
voted on items after user j
, in what fraction did they agree vs. disagree?1The actual calculation is a fair bit more complicated. Trust from
i
to j
is actually calculated using the equation below, then normalized to sum up to 1 row-wise:m_{ij} = H\left( before_{ij} - disagree_{ij}, ~ total_{j} - after_{ij}, ~ z \right)
Here,
H(x,y,z)
is the lower bound of the binomial proportion confidence interval, for an experiment with y
trials and x
successes, for a confidence level given by the Gaussian z-score z
. The z
score is chosen for a confidence level of 99.9999999%.To explain why the lower bound of a binomial proportion confidence interval is used, we can think of each item in which user
i
zapped after user j
as a trial: did user i
disagree or agree with user j
's zaps? The estimated probability that they agree is simply the number of agrees divided by the number of such items.However, if the number of items is small, we can't be confident that user
i
and user j
are actually similar in preferences. For example, if user i
only zapped two items after user j
, and both are agreements, then the number of trials is 2 and the number of successes is 2. The best estimate of the similarity between i
and j
is 100%, but because the sample size is so small, we can't actually be very confident of that estimate. The lower bound of the confidence interval would be much lower than 100%. Therefore we use that lower bound to calculate trust, in order to shade conservative in the trust scores.The above presentation is greatly simplified. There are many technical details that are glossed over, like thresholds for which zaps and downzaps are included in the calculation, and nuances in the code for how before and after are calculated. Moreover, calculating the exact TrustRank solution by matrix inversion is computationally expensive, so an iterative approximate solution is used instead. For the actual code, one can take a look at
worker/trust.js
in the SN Github Repo.How can I boost my trust score?
There is no straightforward way to boost your own trust score. This is because other users' trust of you is calculated based on their behavior after you already zapped an item. Thus, zapping items that other people already zapped doesn't directly boost your own trust score, it actually increases your trust in them.
The best way to increase your own trust score is to zap content early that you think other users will also zap in the future. Zapping content that you think @k00b, @ek, and territory founders will like is especially helpful, since these users are seeded with higher trust--though @k00b and @ek's influence only extends to the less active territories. In more active territories, only the territory founder is seeded with initial trust. So, if you want @Undisciplined to trust you, consider zapping libertarian opinions early in the ~econ territory!
Now that you have a deeper understanding of how SN's Web of Trust works, calibrate your behavior accordingly and build that trust! Happy stacking and happy zapping.
p.s. @k00b, @ek, let me know if I got anything wrong. I'll also follow up with more technical feedback in a GitHub issue.
Footnotes
-
This is an over-simplification. In practice, it's not just the number of agreements that's counted, but the number of agreements weighted by the ratio of the zap amounts. The number of disagreements is also subtracted, which could result in a negative numerator. The denominator also cannot be strictly interpreted as the number of times user
i
voted after userj
, both due to the weighting of the agreements by zap ratio, but also due to nuances in the code about how before/after is calculated. ↩
\sum
!