Building a new Web Search Engine. Just for you, Stackers! \ stacker news

14.4k sats \ 47 comments \ @brugeman 5 Dec 2022 bitcoin

zaps forwarded to @dergigi (100%)

Hello everyone!

Many of us hate the idea of our attention being used as a currency.

How about we experiment with our own web search engine, paid for with LN, without ads?

I've got a prototype you could play with here: https://realsearch.cc

What are your thoughts on micropayment-based web search?

Could Google look different today if LN existed long ago?

Your 2 cents (sats?) are much appreciated!

11 sats \ 1 reply \ @gandlaf21 12 Dec 2022

Is the code open source?

1 sat \ 0 replies \ @brugeman OP 12 Dec 2022

No it's not, no plan for that atm

101 sats \ 1 reply \ @Scholarhacker 9 Dec 2022

I like it. One thing that Stacker News has shown me, and your search engine is very similar, is that even micropayments (a few sats) are enough of a cost to deter unwarranted use. I really think twice about posting something that might not get rewarded on SN because then I'm at a net loss. With your search engine too, I don't want to spend sats on useless, frivolous, energy-draining searches. Having to pay keeps that usage focused and in check. Using the lightning network is really a remarkable way for developers to build a better internet experience. My hat's off to all of you pioneering that work.

2 sats \ 0 replies \ @brugeman OP 9 Dec 2022

Thank you for your kind feedback! I totally agree that paying for things is a much better way to stay mindful of what you're doing. I guess the problem of time wasted on useless behavior is more prominent on social media, not search, but the general principle should still apply.

101 sats \ 1 reply \ @apxhard 6 Dec 2022

Is it possible to build this as a totally decentralized project running on LN rather than as a single centralized site?

Lightning nodes could all host and expose 'indices', which could link to each other.

You could imagine people competing to run the best index on various topics (i.e. a jazz index, a beer index, a scifi index, etc), and then various 'metaindices' competing to store indices of indices.

So when you did a search, your own local node would consult its index and the index of everyone you were connected to, etc. People could add or drop nodes that gave results they did, or didn't like. Censorship would become basically impossible, and the search engine would learn people's preferences over time, locally, on your own machine. Personalization without surveillance.

API requests could contain tiny lightning invoices for payment, and API operators could decide what sort of payment they requested.

11 sats \ 0 replies \ @brugeman OP 6 Dec 2022

That's a great question, I've thought about it, and I don't really have the answer :(

There exists a distributed search engine, it's called YaCy. Here is their public demo web UI (no need to set up the client): https://yacy.searchlab.eu/

As far as I can see, after a decade of work on this thing, its search results are still really bad. Even my toy project seems to do better. Why is that? I'm not sure.

My guess is that the most important thing with search is fighting abuse (spam, etc). With many servers operated by independent entities, each having their own view on quality or shilling different sites, getting consistently good results might be a really big problem.

I'd say we should start with our goals. Why decentralize it?

To avoid censorship - that's a very broad idea, must dig very deep into what, who, how etc.
To get personalization without surveillance - that can be done with a local app (like a browser extension), that will do query-rewriting and post-filtering of the non-personalized search result on the client, without decentralizing the engine.

I agree this is an attractive direction to think about, but at this point I'm not ready to invest in experiments there :)

11 sats \ 1 reply \ @go 6 Dec 2022

Can this be implemented into SN?

1 sat \ 0 replies \ @brugeman OP 7 Dec 2022

Can you please elaborate? Do you mean a search feature?

101 sats \ 1 reply \ @pi 6 Dec 2022

I very much like what you are building, I like the plan model too, but your implementation of it adds unnecessary friction, in my humble opinion.

I think the data should be stored on the server, to avoid having to export and import from one client to another, which is not practical, my wife would never use it in this form.

1 sat \ 0 replies \ @brugeman OP 6 Dec 2022

Thank you for your feedback! I totally agree that this adds friction, but creating an account to store your data would mean I'd be tracking your activity. I guess I'm gonna have to allow for this account-less option for a while.

1044 sats \ 4 replies \ @brugeman OP 5 Dec 2022

My personal favorite ideas:

Revenue must be shared with the content providers. No marketplace takes 100% of merchants' revenues, web-search shouldn't either.
People should be able to influence the search results. LN should help fight the sybil attacks. This whole AI thing makes me feel humans should have more to say about what comes up in search results. Not sure about the exact mechanism though.

649 sats \ 3 replies \ @bumi 5 Dec 2022

yes, sharing the revenue with the publishers would be a nice USP and solves some of the criticism of how it currently works.
maybe the user even only pays when they actually click the link - assuming this means the result looks relevant to them.

1 sat \ 2 replies \ @brugeman OP 6 Dec 2022

Thanks for the feedback!
Regarding 'user only pays when they click' - I need to think about that. It sounds nice, but I see at least one problem: search will then be abused by bots that load search results for free but don't 'click', as Google is abused today. Anyways, thanks a lot!

31 sats \ 1 reply \ @bumi 7 Dec 2022

What would bots do with the search results? bots are generally a problem I guess?

1 sat \ 0 replies \ @brugeman OP 8 Dec 2022

Bots do DDoS, they use search to provide various services for marketers and the SEO industry (without sharing any revenue with the search engine), they try to influence the search engine's internal metrics, etc.

I think any action should have a proper cost, which should solve or at least heavily reduce various unwanted behaviors.

I like the idea of taking sats for a search, and then redistributing these sats to both those who received views and those who received clicks. In what proportion, and how to do the accounting properly, is the question - views happen right after you pay for search, but clicks on that search might happen days later, so it's not trivial to set up accounting for this.
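
One way to picture the accounting described above is to settle each search only after a fixed waiting window, so that late clicks are already known when the fee is split. The sketch below is only an illustration under that assumption: the function name `split_search_fee`, the 50/50 default between views and clicks, and the use of millisats are hypothetical choices, not how realsearch.cc works.

```python
from collections import Counter


def split_search_fee(fee_msat, viewed_urls, clicked_urls, click_share_pct=50):
    """Split one search fee (in millisats) between viewed and clicked URLs.

    Hypothetical sketch: meant to be called once per search, after a
    settlement window has passed, so late clicks are included.
    """
    payouts = Counter()
    # With no clicks, the whole fee goes to the pages that were viewed.
    click_pool = fee_msat * click_share_pct // 100 if clicked_urls else 0
    view_pool = fee_msat - click_pool
    for pool, urls in ((view_pool, viewed_urls), (click_pool, clicked_urls)):
        if not urls:
            continue  # nothing to pay out from this pool
        per_url = pool // len(urls)  # integer division, remainder is not paid out
        for url in urls:
            payouts[url] += per_url
    return dict(payouts)


# Four results were shown, one of them was clicked later.
print(split_search_fee(10_000, ["a", "b", "c", "d"], ["b"]))
```

Any rounding remainder simply stays with the engine here; the proportion between views and clicks is the open question from the comment and is left as a parameter.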

380 sats \ 3 replies \ @Kaffi 5 Dec 2022

Pretty cool, but kind of weird that I have to buy a "plan" to search like I don't want to be working and have to re-up my account. You should probably use the stacker news model and just let me fill up a wallet and then you can just charge 10 sats per search and if I don't like the product I can withdraw my sats at any time. Sent you some sats though because I have decided to always gift sats to any builders.

11 sats \ 2 replies \ @brugeman OP 5 Dec 2022

Thanks a lot for your support! I agree that the payment model isn't perfect. The 'plan' you buy is a way to 'fill up' and then run searches at the specific price. There is no withdraw feature atm, but that sounds nice, gonna add that! I didn't want to force you into creating an account on a search engine that doesn't track you :)

687 sats \ 1 reply \ @Kaffi 5 Dec 2022

I mean I would just log in using LNURL right so while it creates an account it wouldn't be the normal kind? based on your other comments though seems like you don't want to manage accounts. which makes sense but seems very limiting

1 sat \ 0 replies \ @brugeman OP 6 Dec 2022

Not that I'm totally against accounts, but people should at least be able to use this without an account to avoid being tracked. I'll think about that, maybe having an LNURL-based account would allow me to build some unique features, like personal filters to remove mass-media sites etc... Thank you!

321 sats \ 6 replies \ @Jon_Hodl 5 Dec 2022

What I really want from a search engine is to be able to omit and prioritize certain websites from the search results.

For example, I don't ever want to see search results from mainstream media outlets and I want to prioritize certain other sites.

If you can build that, I will use it.

1 sat \ 5 replies \ @brugeman OP 6 Dec 2022

Thanks for your input @Jon_Hodl! For that, I'd have to let you create an account and have your personal settings stored there. That would mean I'd be tracking your search history. Your acc would be anonymous of course (LN-based), but still. Would that be ok with you?

21 sats \ 4 replies \ @Jon_Hodl 6 Dec 2022

I'm not following why there is a need to track anyone's search history.

For blocking a URL, all there would need to be is a row of boxes that I paste a URL into and the search engine never shows me those sites.

For prioritized sites, another row of boxes I enter sites that I want to be shown at the top if they have content related to my search queries.

When I search, the site searches the prioritized sites first and if any of the prohibited sites have content related to my search query, they wouldn't be shown to me.

Does that make sense?
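
To make the two lists concrete, here is a rough sketch of how a ranked list of result URLs could be filtered and re-ordered. It assumes matching is done per domain (including subdomains) and that results arrive as plain URLs in ranked order; the function names are made up for illustration, and the same logic could run on the server or on the client.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def matches(host, domains):
    """True if host equals one of the domains or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def personalize(results, blocked, prioritized):
    """Drop blocked sites, then move prioritized sites to the top.

    sorted() is stable, so the engine's original order is kept
    within the prioritized group and within the rest.
    """
    kept = [u for u in results if not matches(host_of(u), blocked)]
    return sorted(kept, key=lambda u: not matches(host_of(u), prioritized))


results = [
    "https://news.example.com/a",
    "https://blog.example.org/b",
    "https://docs.example.net/c",
]
print(personalize(results, blocked={"example.com"}, prioritized={"example.net"}))
```

Note that this only re-orders what the engine already returned; actually searching the prioritized sites first would need support on the engine side.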

1 sat \ 3 replies \ @brugeman OP 6 Dec 2022

This makes total sense!
I understand the idea of a whitelist and a blacklist of sites. You'd have to create an account on my server, I would store your lists there, and for every query you make I look into those lists to filter the results for you. You get better results, but I get the full list of your searches, all linked to your account.

I wonder if some perceived loss of privacy here is actually negligible compared to the positive effects this would give.

101 sats \ 2 replies \ @Jon_Hodl 8 Dec 2022

Is there a way to keep the whitelist/blacklist client side?

1 sat \ 1 reply \ @brugeman OP 8 Dec 2022

Need to experiment here, not sure atm. Keeping the list is simple, filtering on the client means much more data has to be served to the client, that might be an issue. OTOH, if I make the price higher for this feature, then I might get properly compensated. Thank you for pushing on this!

1 sat \ 0 replies \ @gandlaf21 12 Dec 2022

You could store templates (lists of blacklisted sites) on the server, which the user can download onto his client, or just simply let the user add his own entries. You can store the list on the client, inside local storage. You can do the filtering client-side too: just match the result set from the server against the URLs in local storage, and throw out any matches. No need for the server to know anything.

I think this is a niche feature though

301 sats \ 3 replies \ @Neo 5 Dec 2022

This article contains some great ideas, how we could potentially disrupt google with lightning etc. https://hivemind.vc/howtodisruptgoogle/

1 sat \ 2 replies \ @brugeman OP 6 Dec 2022

Wow, that's a lot to unpack! Thank you very much for this!

51 sats \ 1 reply \ @Neo 6 Dec 2022

Yes, a lot of great ideas in the article. I hope they give you some useful inspirations, really excited to see how your search engine will evolve. Keep up the great work.

11 sats \ 0 replies \ @brugeman OP 6 Dec 2022

Thanks, I really appreciate your kind words!

231 sats \ 1 reply \ @bumi 5 Dec 2022

I would like to drop sats for each search or maybe also more sats for some special features.
To me, buying a subscription and topping up an account is something that can also be done with a card, and it's a big barrier to entry for me.

with https://www.webln.guide/ you could simply request sats from the user onclick.

1 sat \ 0 replies \ @brugeman OP 6 Dec 2022

Thanks @bumi, gonna look into webln and a way to pay-per-search without topping it up. And it's not a 'subscription' right now, it's a fixed amount of sats that you buy and then spend on searches. Still, thank you for this perspective!

31 sats \ 3 replies \ @kr 5 Dec 2022

great idea, awesome to see new variations on search engines being built on lightning.

i agree with @brugeman that revenue should be shared with content creators over time.

perhaps one way to do this is to detect whether a website that is surfaced on search results has a lightning address in their site header.

192 sats \ 2 replies \ @brugeman OP 5 Dec 2022

Thanks @kr! I'm brugeman, building this thing :)
For now I simply log which urls received views in the top 10 of search results. My plan is to reach out to websites that accumulate a non-trivial amount of sats and have them publish their lightning address, one way or the other (http-header, or meta, or a separate file). Then I'd be able to stream sats their way.
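
As a sketch of what the discovery step might look like on the crawler side, the snippet below checks a response header first and then a meta tag. No convention is settled in this thread, so the header name `X-Lightning-Address` and the meta name `lightning` are purely hypothetical placeholders.

```python
from html.parser import HTMLParser


class LightningMetaParser(HTMLParser):
    """Collects the content of the first <meta name="lightning"> tag.

    The meta name is a hypothetical convention used for illustration.
    """

    def __init__(self, meta_name="lightning"):
        super().__init__()
        self.meta_name = meta_name
        self.address = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.address is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == self.meta_name:
            self.address = attr.get("content")


def find_lightning_address(html, headers=None, header_name="X-Lightning-Address"):
    """Return a published lightning address, or None if the site has none.

    Checks the (hypothetical) HTTP header first, then the meta tag.
    """
    for key, value in (headers or {}).items():
        if key.lower() == header_name.lower():
            return value.strip()
    parser = LightningMetaParser()
    parser.feed(html)
    return parser.address


page = '<html><head><meta name="lightning" content="tips@example.com"></head></html>'
print(find_lightning_address(page))
```

The 'separate file' option mentioned above is left out, since it would need a network request to a location that has not been defined here.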

11 sats \ 1 reply \ @kr 5 Dec 2022

lol i didn’t catch your username at the top 😅

192 sats \ 0 replies \ @brugeman OP 5 Dec 2022

Thanks for your feedback! :)

11 sats \ 5 replies \ @nikicat 5 Dec 2022

Good intention, but web search works well only if you already have A LOT OF users to train its ranking algorithm on their data. So I don't believe that general web search is possible this way. To make it possible, we as a community first need to start sharing our browsing histories (anonymized, of course) with some public service, so that projects like this could use them to build a good algorithm.

1 sat \ 4 replies \ @brugeman OP 6 Dec 2022

Thank you for your feedback @nikicat! I will disagree though. There are at least two proxies for 'browsing histories' - web links and social mentions (the more external links point to a page, the more 'popular' it is). Google used link data from the very beginning (PageRank), and I already use it too. In fact, without the link data, ranking the web is plain impossible - way too much spam. Of course, Google now has access to huge volumes of clickstream data, but having worked in the SEO industry for many years I can tell that PageRank is still by far the biggest ranking factor.
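
For readers who haven't seen it, the link-based idea can be sketched with the textbook PageRank power iteration below. This is the classic published formulation with the usual 0.85 damping factor, not the ranking code of this search engine or of Google today; the tiny three-page graph is made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page passes an equal share of its rank along every out-link.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Pages "a" and "b" both link to "c"; "c" links back to "a".
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(scores, key=scores.get, reverse=True))
```

Page "b" has no in-links, so it ends up with only the baseline share, which is the sense in which links act as a popularity signal.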

21 sats \ 3 replies \ @nikicat 6 Dec 2022

As far as I know, Google as well as other search engines stopped using PageRank like 10 years ago (or significantly lowered this factor's weight), because it's pretty easy to fake.

1 sat \ 2 replies \ @brugeman OP 6 Dec 2022

They don't use the exact PageRank formula, that's for sure, and they stopped publishing PageRank values, but links are still the major factor. And they're not that easy to game: all the SEO link farms are trivial to detect, and real impact is impossible without actual content marketing and promotion. The whole 'we use 200 ranking factors' is mostly a public relations story - an attempt to mislead the SEO industry and make search look more defensible as a business. The number of factors doesn't matter, their weight matters, and links are still at the top.

101 sats \ 1 reply \ @nikicat 7 Dec 2022

This is a bit strange, because both click-factor and link-factor are user-behavior-based factors. The difference is that users post links to webpages much more rarely than they search for the same webpages and, moreover, click-factor provides information about the relation between the search query and the page, and links do not.
So, in my opinion, PageRank is a good tool to make an initial draft of a ranking algorithm, and as soon as you have a large enough audience you should use user behavior data as much as possible, but it's a classic chicken-and-egg problem.
What do you think about buying user data from companies that collect it or directly from users willing to sell it?

1 sat \ 0 replies \ @brugeman OP 7 Dec 2022

Links have anchors (link text), which relate the target page to a query.

I think the catch is with cost - it's very easy to abuse click-factor, and the only way to fight that is to track and filter very aggressively and still not be sure how much you were 'gamed'. Link-factor is much costlier - you have to buy a domain, a server, get your own site promoted first, before you can signal anything. Link farms that don't have trusted in-links are worthless, no matter how big.

In general, 'user data' might be costly and thus higher quality, or cheap - noise, so whether it's worth buying depends on that. Anonymous free click streams are noise. If we get to scale with an anonymous click-stream paid for with LN then maybe we might get a good signal, but I'm not sure about that yet.

Google probably started paying attention to click-stream only once they had managed to get many people to log in to their gmail accounts in their browsers. At least now they could filter out anonymous stuff and also normalize the effects of 'heavy-clickers'. Privacy suffered though. I hope that anonymous paid click stream could work too, but we'll see.

11 sats \ 1 reply \ @Tristan 5 Dec 2022

This is pretty cool. Hope this does well

1 sat \ 0 replies \ @brugeman OP 6 Dec 2022

Thanks!

11 sats \ 1 reply \ @siggy47 5 Dec 2022

I agree with the revenue comments below. It must be structured using a model such as fountain or brave. Searchers shouldn't be charged to search. They should be paid to search. Perhaps mimic the brave model substituting lightning for the shitcoin.

11 sats \ 0 replies \ @brugeman OP 5 Dec 2022

I'm not sure the 'get paid to view ads' model can work. Do you see a way to do that without massive tracking effort, to prevent people from abusing the system and using bots to get paid?

0 sats \ 0 replies \ @ThereIsNoSecondBest 6 Dec 2022 freebie

Guess you're working on it cause I'm getting a "Search failed: Service Unavailable" error :) Will try again later.