100 sats \ 1 reply \ @nikicat 7 Dec 2022 \ parent \ on: Building a new Web Search Engine. Just for you, Stackers! bitcoin
This is a bit strange, because both click-factor and link-factor are based on user behavior. The difference is that users post links to webpages much more rarely than they search for those same webpages. Moreover, click-factor provides information relating the search query to the page, and links do not.
So, in my opinion, PageRank is a good tool for a first draft of a ranking algorithm, and as soon as you have a large enough audience you should use user behavior data as much as possible. But it's a classic chicken-and-egg problem.
What do you think about buying user data from companies that collect it or directly from users willing to sell it?
Links have anchors (link text), and the anchor text relates the target page to a query.
I think the catch is with cost - it's very easy to abuse click-factor, and the only way to fight that is to track and filter very aggressively and still not be sure how much you were 'gamed'. Link-factor is much costlier - you have to buy a domain, a server, get your own site promoted first, before you can signal anything. Link farms that don't have trusted in-links are worthless, no matter how big.
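To illustrate the point about link farms, here is a minimal sketch of a PageRank variant in which the random jump lands only on a hand-picked set of trusted pages. This is an assumption made for illustration, not a description of how any real engine ranks pages; the function name `trusted_pagerank`, the `.example` domains, and the damping value are all hypothetical.

```python
def trusted_pagerank(links, trusted, damping=0.85, iterations=50):
    """Power iteration where the random jump lands only on trusted pages.

    links:   dict mapping a page to the list of pages it links to
    trusted: set of seed pages we assume are trustworthy (hypothetical input)
    """
    pages = set(links) | {t for targets in links.values() for t in targets} | set(trusted)
    # Jump distribution: uniform over trusted seeds, zero everywhere else.
    jump = {p: (1.0 / len(trusted) if p in trusted else 0.0) for p in pages}
    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: hand its rank back to the trusted seeds.
                for p in pages:
                    new_rank[p] += damping * rank[page] * jump[p]
        rank = new_rank
    return rank


# A tiny made-up graph: two honest pages and a three-page link farm
# that nobody trusted links to.
links = {
    "trusted.example": ["blog.example"],
    "blog.example": ["trusted.example"],
    "farm1.example": ["farm2.example", "spam.example"],
    "farm2.example": ["farm1.example", "spam.example"],
    "spam.example": ["farm1.example"],
}
scores = trusted_pagerank(links, trusted={"trusted.example"})
```

In this toy graph the farm pages link to each other heavily, but since rank only ever flows out of the trusted seed, none of it reaches them. Adding more farm pages changes nothing until a trusted page links in.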
In general, 'user data' might be costly and thus higher quality, or cheap and thus noise, so whether it's worth buying depends on which kind it is. Anonymous free click streams are noise. If we get to scale with an anonymous click stream paid for with LN, then we might get a good signal, but I'm not sure about that yet.
Google probably started paying attention to click streams only once they had managed to get many people logged in to their Gmail accounts in their browsers. At that point they could at least filter out anonymous traffic and normalize the effect of 'heavy-clickers'. Privacy suffered though. I hope that an anonymous paid click stream could work too, but we'll see.
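A rough sketch of what normalizing 'heavy-clickers' could look like, assuming each click can be attributed to some user identifier: every user gets a total vote weight of 1, spread evenly over all of their clicks. This is only one possible scheme and not a claim about what Google does; the function name `normalized_click_scores` and the sample data are made up.

```python
from collections import Counter, defaultdict


def normalized_click_scores(clicks):
    """Score (query, url) pairs so that each user contributes a total weight of 1.

    clicks: iterable of (user_id, query, url) tuples (hypothetical log format)
    """
    clicks = list(clicks)
    # Count how many clicks each user made in total.
    per_user = Counter(user for user, _, _ in clicks)
    scores = defaultdict(float)
    for user, query, url in clicks:
        # A user with N clicks adds only 1/N per click.
        scores[(query, url)] += 1.0 / per_user[user]
    return dict(scores)


# Two ordinary users click one result once each; a bot clicks another 100 times.
sample_clicks = [
    ("alice", "btc wallet", "a.example"),
    ("bob", "btc wallet", "a.example"),
] + [("bot", "btc wallet", "spam.example")] * 100

click_scores = normalized_click_scores(sample_clicks)
```

With raw counts the bot's page would win 100 to 2; with per-user normalization the two ordinary users together outweigh it. The scheme depends entirely on telling users apart, which is exactly where the privacy cost comes in, and where a per-click payment might act as a substitute for identity.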