SN release preview: Web of Trust ranking algorithm

1358 sats \ 29 comments \ @k00b 18 Jan 2022 meta

Overview

I shipped the Web of Trust ranking algorithm. You can view it at https://stacker.news/wot. You'll either have to click that link or manually type /wot into the URL bar. Eventually this ranking algo will go live on the main feed, but for now I'll just be evaluating its performance to make sure it's strictly increasing signal.

Why

For the main feed, we need more defense against Sybils than sats at reasonable amounts can provide. As described in [RFC] Ranking SN posts, we either need to quantify trust or develop a model that's trustless. In the end we felt like a trust-based model was better for a community than a trustless one, although in the future some subs (job boards, product hunts, etc.) might be given trustless ranking algorithms.

How it works

Each user is a node in a graph, and directed edges in the graph represent how much a user, Alice, trusts another user, Bob, on a scale of 0-0.9. For now, we determine this edge value based on how many upvotes Alice has given Bob. We then use the model described in Modelling a Public-Key Infrastructure (1996) to assign each user a single trust score of 0-0.9, where the source node is the moderator of the sub (i.e. me on the main feed). We compute these trust scores daily.
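To make the edge idea concrete, here is a minimal sketch of turning upvote counts into an edge value on the 0-0.9 scale. The post doesn't say how upvotes map to trust, so the linear ramp, the `saturation` parameter and the `edge_trust` name are all assumptions for illustration, not what SN actually runs.

```python
MAX_EDGE_TRUST = 0.9  # top of the 0-0.9 scale described in the post


def edge_trust(upvotes, saturation=10):
    """Map the number of upvotes Alice gave Bob to an edge value in [0, 0.9].

    Assumption: trust grows linearly with upvotes and stops growing once
    `saturation` upvotes are reached. The real mapping is not specified.
    """
    if upvotes <= 0:
        return 0.0
    return MAX_EDGE_TRUST * min(upvotes, saturation) / saturation


# Hypothetical upvote counts: {voter: {author: upvotes given}}
upvotes_given = {"alice": {"bob": 5}, "mod": {"alice": 25}}

# Directed trust graph: {truster: {trustee: edge value}}
graph = {
    voter: {author: edge_trust(n) for author, n in given.items()}
    for voter, given in upvotes_given.items()
}
```

With these made-up numbers, alice's edge to bob is about half the maximum and mod's edge to alice is capped at 0.9.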

That trust score is then used to weight a user's upvote in the ranking algorithm. For instance, if Alice has a trust of 0.9 and Bob has a trust of 0.45, Bob's upvotes count for half as much as Alice's in ranking. Carol, who is a Sybil created by Bob and hasn't received any upvotes, has a trust of 0, and consequently her upvotes don't count at all.
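The weighting above can be sketched in a few lines. This only shows the "upvotes count in proportion to trust" part. The function name and data shapes are hypothetical, and the real ranking formula likely includes other inputs that aren't covered here.

```python
def weighted_upvotes(voters, trust):
    """Sum the trust of every user who upvoted an item.

    `voters` is a list of usernames and `trust` maps username -> trust score.
    Users with no trust score (e.g. fresh Sybils) contribute nothing.
    """
    return sum(trust.get(voter, 0.0) for voter in voters)


# Trust values from the example: Alice 0.9, Bob 0.45, Carol has none.
trust = {"alice": 0.9, "bob": 0.45}

score = weighted_upvotes(["alice", "bob", "carol"], trust)
sybil_only = weighted_upvotes(["carol", "carol2"], trust)
```

Here Bob's vote adds half of what Alice's does, and an item upvoted only by untrusted accounts gets a weighted score of 0.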

How this might evolve

As the site grows, eventually logged in users will have their own WoT and not rely solely on the mod's. The consequence of this is that a user's feed won't be a commons subject to its tragedies - it will be "private property" which we know scales much better.

We might also make the model more elaborate with time, taking into account more than just upvotes between users.

We can use this for more than just ranking

- active, trusted user airdrops
  - we can take all the Sybil fees/revenue subs earn at the end of each day and give the sats back to the most trusted, active users
- feature unlocks
  - e.g. only trusted users can boost or report spam

Additional research

It occurred to me while implementing the above model that this is basically PageRank, Google's search algorithm, but for trust. Instead of the random surfer choosing a webpage based on backlink quality and quantity, our random surfer is a user choosing a post based on upvote quality and quantity! Unfortunately, as I set out to implement this, I found this patented paper using this exact Markov model approach.

Again, it's sadly patented. I'm not sure it'd hold up in court as it is just an application of PageRank, whose patent has long expired, but I don't want to poke a patent troll. Our model works well enough anyway and their patent will expire in a couple of years; we can probably license it in the meantime if we want to (afaik no one is using it).

For the nerds

Once we build the web of trust, we basically do a breadth first search from the mod user (me), building up all disjoint, acyclic paths (and their accumulated trust) from me to every other user. We then take these trust values and "compress" them into a single trust value.

I believe the resulting algorithm is O(N log N) where N is the number of users in the graph with inbound edges. We do not consider cycles or overlapping paths. Intuitively, overlapping trust relationships have a marginal impact on a user's trust relative to independent ones. Additionally, we limit the depth of the BFS to 6 because that should be far enough to capture every user. We also don't compute the single trust value using the inclusion-exclusion principle, which is O(k!) where k is the number of terms or inbound paths to a particular node. Instead we approximate it.
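A rough sketch of the shape of that computation, under stated assumptions: it walks acyclic paths breadth first from the mod up to depth 6, multiplies edge values along each path, then "compresses" a user's path values by treating the paths as independent (1 minus the product of each path's distrust). The post doesn't say which approximation SN uses, so that combination rule is my stand-in. The sketch also doesn't filter out overlapping paths or cap scores at 0.9, and all names are hypothetical.

```python
from collections import deque

MAX_DEPTH = 6  # depth limit mentioned in the post


def trust_scores(graph, source, max_depth=MAX_DEPTH):
    """Return {user: trust} for users reachable from `source`.

    `graph` maps truster -> {trustee: edge value in [0, 0.9]}.
    """
    path_trusts = {}  # user -> list of accumulated trust, one per path
    queue = deque([(source, 1.0, frozenset([source]), 0)])
    while queue:
        node, acc, seen, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt, weight in graph.get(node, {}).items():
            if nxt in seen:  # skip cycles: never revisit a user on a path
                continue
            path_trust = acc * weight
            path_trusts.setdefault(nxt, []).append(path_trust)
            queue.append((nxt, path_trust, seen | {nxt}, depth + 1))

    scores = {}
    for user, paths in path_trusts.items():
        # Assumed approximation: treat paths as independent evidence.
        distrust = 1.0
        for p in paths:
            distrust *= 1.0 - p
        scores[user] = 1.0 - distrust
    return scores


graph = {
    "mod": {"alice": 0.9, "bob": 0.5},
    "alice": {"bob": 0.5},
    "bob": {"mod": 0.9},  # edge back to the mod forms a cycle and is ignored
}
scores = trust_scores(graph, "mod")
```

In this toy graph bob is reached by two paths (directly, and through alice), so his score ends up higher than either path alone. A Sybil with no inbound edges never shows up in `scores` at all.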

It's all subject to change but that's the bulk of it.

What's next

I'll continue to monitor the algorithm and it'll go live if it seems to perform well. I'll simultaneously begin working on some of the smaller items from the gh issue backlog and probably take a swing at search starting next week.

10 sats \ 0 replies \ @el_zonte 23 Jan 2022

interesting to hear about this algorithm - thanks

3 sats \ 0 replies \ @phaedrus 19 Jan 2022

Great work!

2 sats \ 0 replies \ @JustSomePleb 18 Jan 2022

Wow, this sounds amazing! Great work again. It sounds really interesting.

2 sats \ 0 replies \ @relc 18 Jan 2022

i really like the way you approach this! keep on testing!

1 sat \ 0 replies \ @And1 19 Jan 2022 freebie

Thanks for the insights. Love to read posts like that!

1 sat \ 13 replies \ @phaedrus 19 Jan 2022

I’ve started to notice duplicate posts recently. Often I think it’s non-intentional, but I wonder if deduping would help the ranking/quality?

12 sats \ 12 replies \ @k00b OP 19 Jan 2022

When users post, they’re alerted that there are duplicates. They seem to be ignoring it.

I’ll try to figure something else out. @nout suggested I charge more when a duplicate is posted.

3 sats \ 1 reply \ @jeff 19 Jan 2022

Charge more based on age too. Not sure if that's possible or not. I wonder if there is some API out there that can tell when a site was last modified somehow or scan the page for a publication date. Maybe a hard problem.

2 sats \ 0 replies \ @k00b OP 19 Jan 2022

That’s an interesting idea. I found a python library that makes a decent attempt at grabbing a publication date.

2 sats \ 1 reply \ @cameri 20 Jan 2022

If a link was already posted by you or someone else, perhaps suggest boosting that link instead in the UI.

0 sats \ 0 replies \ @k00b OP 20 Jan 2022

That’s an interesting idea!

1 sat \ 3 replies \ @nout 19 Jan 2022

"Undesirables" should cost more and sats should go to SN :)

0 sats \ 2 replies \ @kr 19 Jan 2022

Agree that duplicate posts should cost more. Not because they're undesirable though, just because they should come with a higher burden of proof.

I purposely posted a duplicate the other day, simply because the OP chose a poor title. My post should have definitely cost more to create, but the fact that I earned more from my post (with a better title) should be a signal that my duplicate wasn't necessarily undesirable... one could even argue that the first post was the one that was undesirable.

0 sats \ 1 reply \ @nout 19 Jan 2022

I agree that your post was desirable in this situation. I mean it more that it's undesirable to have duplicate posts from SN perspective.

0 sats \ 0 replies \ @kr 19 Jan 2022

Yup, makes sense.

1 sat \ 3 replies \ @deleted231216 19 Jan 2022

deleted by author

1 sat \ 1 reply \ @nout 19 Jan 2022

I think you should still be able to post it again, it would just cost you extra sats (e.g. 21 sats) to post it. If the post gains popularity and votes as you are predicting you would still be doing ok.

0 sats \ 0 replies \ @deleted231216 19 Jan 2022

deleted by author

1 sat \ 0 replies \ @jeff 19 Jan 2022

I think "valuable" re-posts, belong in a wiki somewhere. It's not news. It's reference material. Maybe an entirely new site. "https://stacker.reference" where the voting caters to that.

1 sat \ 1 reply \ @k00b OP 18 Jan 2022

I should've mentioned: we don't really need this on a day to day basis right now but 10x from now this will likely make a huge difference. It should simply increase our confidence we can grow without the feed becoming polluted.

2 sats \ 0 replies \ @cointastical 20 Jan 2022

It's only a matter of time before scammers and altcoin shills (but I repeat myself) discover SN. Right now, below their radar, but as soon as they discover that 10 sats gets them a front page, and with that a Tweet (on @StackerNewsFeed), ... that WoT ranking and/or other measures will be crucial.

1 sat \ 0 replies \ @melvincarvalho 18 Jan 2022

Great post! Hopefully some of this can be reused in nostr, too

1 sat \ 2 replies \ @cameri 18 Jan 2022

Thanks for the write-up! Hopefully this works wonderfully and inspires others to build a similar system.

Is there any thought on raising the minimum say for untrustworthy user from commenting on one's posts?

Will our trust score (and others) be visible and transparent?

1 sat \ 1 reply \ @k00b OP 18 Jan 2022

I hadn't thought of preventing untrustworthy users from doing certain things. That's certainly an option.

Trust scores might eventually be visible once we really refine the model. For now it's kind of crude and would probably just be confusing to the users.

1 sat \ 0 replies \ @cameri 19 Jan 2022

If Michael Saylor were here, he could have an inbox. The only thing you need is to set the bar based on that trust score.

0 sats \ 0 replies \ @cointastical 4 Feb 2022

And, it's live (for the front page, and comments)!

wot is now the ranking mechanism we use on the homepage and in comments

SN release: wot ranking is live, search enhancements, top spenders, bug fixes #10036

0 sats \ 1 reply \ @el_zonte 23 Jan 2022

notifying a user that their post is a duplicate would be a simple start; should be easy for a URL post

i say this because i have posted at least one duplicate without realising!

0 sats \ 0 replies \ @k00b OP 23 Jan 2022

When you post a duplicate it shows a list of the duplicates, but perhaps there needs to be something more obvious.

0 sats \ 0 replies \ @cointastical 20 Jan 2022

[deleted]