Not give the model access to everything, but instead allow everything to access the model. This may feel counter-intuitive, but I see it as the "multi-modal LLM" being a (permissioned) API with a service worker behind it.
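To make the "everything accesses the model through a permissioned API" idea concrete, here is a minimal sketch. All names (`LlmGateway`, `grant`, `AppId`) are hypothetical, and a plain class stands in for the actual service worker; the point is only that apps never hold the model, they hold a capability that the gateway checks.

```typescript
type AppId = string;

interface Model {
  complete(prompt: string): string;
}

class LlmGateway {
  private grants = new Set<AppId>();

  constructor(private model: Model) {}

  // The user (or OS) grants LLM access per app, like camera or mic access.
  grant(app: AppId): void {
    this.grants.add(app);
  }

  revoke(app: AppId): void {
    this.grants.delete(app);
  }

  // Apps never touch the model directly; every call passes this check.
  complete(app: AppId, prompt: string): string {
    if (!this.grants.has(app)) {
      throw new Error(`LLM permission not granted to ${app}`);
    }
    return this.model.complete(prompt);
  }
}
```

In a real deployment the gateway would sit in a service worker (or an OS service), so revoking a grant cuts an app off immediately without touching the app itself.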
Interesting.
I mean both approaches would be behind a "safe" API. But I hadn't thought about whether it would be nicer if the application uses the LLM as an API or the LLM retrieves information through an API.
Is one of these inherently more powerful than the other? Is one inherently safer than the other? If so, which way and why?
It feels to me like the llm-to-app interface is both more powerful and riskier than app-to-llm, but app-to-llm is easier to both standardize and optimize. I think it really depends on what you want to achieve.
There was a nice post coming in via HN this Monday, #1057610, that basically argues that chatbot interfaces suck. I subscribe to that thought and feel that prompt writing equals inefficiency, but that's how LLMs are trained: to be a chatbot, a companion.
However, like the author of that article, I believe the better application of the technology is not interactive, but a background task incorporated into the process rather than running beside it.
If you want a chatbot, the mechanism I propose will probably hinder adoption, because it requires per-app adoption. It's always cheaper to just circumvent everything and not ask for permission, but then you quickly run into shenanigans like #1052744. I really wouldn't want any unchecked capability that can do this on any of my devices, so the slower adoption is imho worth it. 1
Footnotes
- One of my favorite things nowadays is that I get a "DCL attempted by <bad app> and prevented" message from GrapheneOS, just as I've always loved SELinux despite its complexity. It's always nice to have OS-level (and hardware) protections against naughty software. ↩
camera or microphone.
Amber doesn't need the LLM, so it doesn't need the permission. Obsidian could use the LLM, so it does need the permission, optionally, and when I enable it, it will use it.
knowledge cache in the same way, so that an app (not a centralized process) can submit new knowledge (for processing and then caching) and query it, much like your "second-brain" idea:
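A minimal sketch of such a per-app knowledge cache, under my own assumptions (the `KnowledgeCache` name and the naive substring search are illustrative; real processing would happen via the model before caching):

```typescript
interface Entry {
  app: string;  // which app submitted this knowledge
  text: string; // the processed, cached knowledge itself
}

class KnowledgeCache {
  private entries: Entry[] = [];

  // Any app can submit knowledge; no centralized ingestion process.
  submit(app: string, text: string): void {
    this.entries.push({ app, text });
  }

  // Naive case-insensitive substring match stands in for semantic retrieval.
  query(term: string): string[] {
    const needle = term.toLowerCase();
    return this.entries
      .filter(e => e.text.toLowerCase().includes(needle))
      .map(e => e.text);
  }
}
```

The same permission gate as for the LLM itself would sit in front of `submit` and `query`, so an app without the grant can neither feed nor read the shared knowledge.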