
Overview

I shipped the Web of Trust ranking algorithm. You can view it at https://stacker.news/wot. You'll either have to click that link or manually type /wot into the URL bar. Eventually this ranking algo will go live on the main feed, but for now I'll just be evaluating its performance to make sure it's strictly increasing signal.

Why

For the main feed, we need more defense against Sybils than sats at reasonable amounts can provide. As described in [RFC] Ranking SN posts, we either need to quantify trust or develop a model that's trustless. In the end we felt like a trust-based model was better for a community than a trustless one - although in the future some subs (job boards, product hunts, etc.) might be given trustless ranking algorithms.

How it works

Each user is a node in a graph, and directed edges in the graph represent how much a user, Alice, trusts another user, Bob, on a scale of 0-0.9. For now, we determine this edge value based on how many upvotes Alice has given Bob. We then use the model described in Modelling a Public-Key Infrastructure (1996) to assign each user a single trust score of 0-0.9, where the source node is the moderator of the sub (i.e. me on the main feed). We compute these trust scores daily.
That trust score is then used to weight a user's upvote in the ranking algorithm. For instance, if Alice has a trust of 0.9 and Bob has a trust of 0.45, Bob's upvotes count for half as much as Alice's in ranking. Carol, who is a Sybil created by Bob and hasn't received any upvotes, has a trust of 0 and consequently their upvotes don't count at all.
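To make the weighting concrete, here's a minimal sketch in Python of the Alice/Bob/Carol example above. The function name `weighted_votes` and the rule that users missing from the trust table count as 0 are my assumptions for illustration; this is not the SN code, and the real ranking has more going on than a plain sum.

```python
def weighted_votes(upvoters, trust):
    """Sum a post's upvotes, each weighted by the voter's trust score (0 to 0.9)."""
    # Assumption: a user with no entry in the trust table is treated as untrusted.
    return sum(trust.get(user, 0.0) for user in upvoters)


# Trust scores from the example in the text.
trust = {"alice": 0.9, "bob": 0.45, "carol": 0.0}

alice_only = weighted_votes(["alice"], trust)
bob_only = weighted_votes(["bob"], trust)
carol_only = weighted_votes(["carol"], trust)

print(alice_only, bob_only, carol_only)
```

Bob's single upvote contributes half of what Alice's does, and Carol's contributes nothing, no matter how many Sybil accounts Bob spins up.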

How this might evolve

As the site grows, eventually logged in users will have their own WoT and not rely solely on the mod's. The consequence of this is that a user's feed won't be a commons subject to its tragedies - it will be "private property" which we know scales much better.
We might also make the model more elaborate with time, taking into account more than just upvotes between users.

We can use this for more than just ranking

  1. active, trusted user airdrops
    • we can take all the Sybil fees/revenue subs earn at the end of each day and give the sats back to the most trusted, active users
  2. feature unlocks
    • e.g. only trusted users can boost or report spam
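
As a rough illustration of the airdrop idea, here's a sketch that splits a day's pool among active users in proportion to their trust. The proportional rule, the function name `airdrop`, and rounding down to whole sats are all assumptions on my part; nothing here is decided or implemented.

```python
def airdrop(pool_sats, trust, active_users):
    """Split pool_sats among active users in proportion to their trust score."""
    # Only active users with non-zero trust are eligible (an assumption).
    eligible = {u: trust[u] for u in active_users if trust.get(u, 0.0) > 0}
    total = sum(eligible.values())
    if total == 0:
        return {}
    # Round down to whole sats; any remainder simply stays in the pool.
    return {u: int(pool_sats * t / total) for u, t in eligible.items()}


trust = {"alice": 0.9, "bob": 0.45, "carol": 0.0}
payouts = airdrop(1000, trust, ["alice", "bob", "carol"])
print(payouts)
```

With these numbers Alice gets twice what Bob gets and Carol, the Sybil, gets nothing.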

Additional research

It occurred to me while implementing the above model that this is basically PageRank, Google's search algorithm, but for trust. Instead of the random surfer choosing a webpage based on backlink quality and quantity, our random surfer is a user choosing a post based on upvote quality and quantity! Unfortunately, as I set out to implement this, I found this patented paper using this exact Markov model approach.
Again, it's sadly patented. I'm not sure it'd hold up in court as it is just an application of PageRank - whose patent has long expired - but I don't want to poke a patent troll. Our model works well enough anyway and their patent will expire in a couple years; we can probably license it too in the meantime if we want to (afaik no one is using it).
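
For anyone curious what the random surfer looks like in code, here's textbook PageRank as a power iteration over an upvote graph. To be clear, this is the generic algorithm, not the method from the patented paper and not what SN runs. The graph shape (`{voter: {author: upvote_count}}`), the damping factor of 0.85, and the iteration count are assumptions for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Generic PageRank by power iteration on a weighted directed graph."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets the "teleport" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for u in nodes:
            outs = graph.get(u, {})
            total = sum(outs.values())
            if total == 0:
                # A user who upvoted nobody spreads their rank evenly.
                for node in nodes:
                    new_rank[node] += damping * rank[u] / n
            else:
                # Otherwise rank flows along edges in proportion to upvotes given.
                for v, weight in outs.items():
                    new_rank[v] += damping * rank[u] * weight / total
        rank = new_rank
    return rank


# Hypothetical upvote counts: alice upvoted bob 3 times, and so on.
upvotes = {"alice": {"bob": 3}, "carol": {"bob": 1, "alice": 1}}
ranks = pagerank(upvotes)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this toy graph Bob receives the most upvote "flow" and Carol receives none, so the ordering comes out Bob, Alice, Carol.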

For the nerds

Once we build the web of trust, we basically do a breadth first search from the mod user (me), building up all disjoint, acyclic paths (and their accumulated trust) from me to every other user. We then take these trust values and "compress" them into a single trust value.
I believe the resulting algorithm is O(N log N), where N is the number of users in the graph with inbound edges. We do not consider cycles or overlapping paths. Intuitively, overlapping trust relationships have a marginal impact on a user's trust relative to independent ones. Additionally, we limit the depth of the BFS to 6 because that should be far enough to capture every user. We also don't compute the single trust value using the inclusion-exclusion principle, which is O(k!) where k is the number of terms or inbound paths to a particular node. Instead we approximate it.
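Here's a rough sketch of the shape of that computation: a depth-limited BFS from the mod that multiplies edge trust along each acyclic path, then a "compress" step. Several things are assumptions: the names `path_trusts` and `combine`, the graph shape (`{user: {trusted_user: edge_trust}}`), and the compress step, which uses 1 minus the product of (1 - path trust) as a stand-in because the exact approximation isn't spelled out above. For brevity it also doesn't filter out overlapping paths or cap the result at 0.9, so treat it as an illustration rather than the real thing.

```python
from collections import deque

MAX_DEPTH = 6  # depth limit mentioned in the text


def path_trusts(graph, source, max_depth=MAX_DEPTH):
    """BFS from source, returning {user: [accumulated trust of each path found]}."""
    found = {}
    # Each queue entry: (node, trust accumulated so far, depth, nodes on this path).
    queue = deque([(source, 1.0, 0, frozenset([source]))])
    while queue:
        node, acc, depth, seen = queue.popleft()
        if depth == max_depth:
            continue
        for nxt, edge in graph.get(node, {}).items():
            if nxt in seen:  # skip cycles so every path stays acyclic
                continue
            trust = acc * edge  # trust decays multiplicatively along a path
            found.setdefault(nxt, []).append(trust)
            queue.append((nxt, trust, depth + 1, seen | {nxt}))
    return found


def combine(paths):
    """Compress several path trusts into one value (assumed stand-in formula)."""
    remaining_doubt = 1.0
    for trust in paths:
        remaining_doubt *= 1.0 - trust
    return 1.0 - remaining_doubt


# Toy graph: the mod trusts alice and bob, alice trusts bob, bob trusts the mod.
graph = {
    "mod": {"alice": 0.9, "bob": 0.5},
    "alice": {"bob": 0.5},
    "bob": {"mod": 0.9},
}
found = path_trusts(graph, "mod")
scores = {user: combine(paths) for user, paths in found.items()}
print(scores)
```

Bob is reached twice here (directly, and through Alice), and the two paths together give him more trust than either alone. A Sybil that nobody has upvoted never shows up in `found`, which is how it ends up with a trust of 0.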
It's all subject to change but that's the bulk of it.

What's next

I'll continue to monitor the algorithm and it'll go live if it seems to perform well. I'll simultaneously begin working on some of the smaller items from the gh issue backlog and probably take a swing at search starting next week.
interesting to hear about this algorithm - thanks
reply
Great work!
reply
Wow, this sounds amazing! Great work again. It sounds really interesting.
reply
i really like the way you approach this! keep on testing!
reply
I’ve started to notice duplicate posts recently. Often I think it’s non-intentional, but I wonder if deduping would help the ranking/quality?
reply
When users post, they're alerted that there are duplicates. They seem to be ignoring it.
I’ll try to figure something else out. @nout suggested I charge more when a duplicate is posted.
reply
Charge more based on age too. Not sure if that's possible or not. I wonder if there is some API out there that can tell when a site was last modified somehow or scan the page for a publication date. Maybe a hard problem.
reply
That’s an interesting idea. I found a python library that makes a decent attempt at grabbing a publication date.
reply
If a link was already posted by you or someone else, perhaps suggest boosting that link instead in the UI.
reply
That’s an interesting idea!
reply
What do you suggest if you want to share a useful resource (e.g. a KYC-free Bitcoin freelance website) but it was already posted e.g. in September with 0 comments/upvotes? If you comment/upvote the OP in September, would it display on the homepage again?
reply
I think you should still be able to post it again, it would just cost you extra sats (e.g. 21 sats) to post it. If the post gains popularity and votes as you are predicting you would still be doing ok.
reply
That would be fine. For me, it's less about earning sats and more about trying to share & increase adoption of KYC-free services (e.g. Microlancer, PayWithMoon, Azteco, SilentLink). These platforms are great and work well, but sadly not many people know about them.
reply
I think "valuable" re-posts belong in a wiki somewhere. It's not news. It's reference material. Maybe an entirely new site, "https://stacker.reference", where the voting caters to that.
reply
"Undesirables" should cost more and sats should go to SN :)
reply
Agree that duplicate posts should cost more. Not because they're undesirable though, just because they should come with a higher burden of proof.
I purposely posted a duplicate the other day, simply because the OP chose a poor title. My post should have definitely cost more to create, but the fact that I earned more from my post (with a better title) should be a signal that my duplicate wasn't necessarily undesirable... one could even argue that the first post was the one that was undesirable.
reply
I agree that your post was desirable in this situation. I mean it more that it's undesirable to have duplicate posts from SN perspective.
reply
Yup, makes sense.
reply
I should've mentioned: we don't really need this on a day to day basis right now but 10x from now this will likely make a huge difference. It should simply increase our confidence we can grow without the feed becoming polluted.
reply
It's only a matter of time before scammers and altcoin shills (but I repeat myself) discover SN. Right now SN is below their radar, but as soon as they discover that 10 sats gets them a front page spot, and with that a tweet (on @StackerNewsFeed), WoT ranking and/or other measures will be crucial.
reply
Great post! Hopefully some of this can be reused in nostr, too
reply
Thanks for the write-up! Hopefully this works wonderfully and inspires others to build a similar system.
Is there any thought on raising the minimum, say, for untrustworthy users commenting on one's posts?
Will our trust score (and others) be visible and transparent?
reply
I hadn't thought of preventing untrustworthy users from doing certain things. That's certainly an option.
Trust scores might eventually be visible once we really refine the model. For now it's kind of crude and would probably just be confusing to the users.
reply
If Michael Saylor were here, he could have an inbox. The only thing you need is to set the bar based on that trust score.
reply
Thanks for the insights. Love to read posts like that!
And, it's live (for the front page, and comments)!
wot is now the ranking mechanism we use on the homepage and in comments
SN release: wot ranking is live, search enhancements, top spenders, bug fixes #10036
reply
notifying a user that their post is a duplicate would be a simple start. it should be easy for a URL post
i say this because i have posted at least one duplicate without realising!
reply
When you post a duplicate it shows a list of the duplicates, but perhaps there needs to be something more obvious
reply