17 sats \ 3 replies \ @OT 25 Mar

Hmmm... Unauthorized?

reply

Think the issue here might have been (erroneously) allowing 8192 output tokens on a smaller context window model (a bug on my part). Should be fixed now if you want to give it another shot. Sending some sats your way too.

reply
140 sats \ 1 reply \ @OT 25 Mar

This time it worked!

How do you think about workflow when you have to pay an invoice for every query? Consider using NWC, possibly with a limit on the amount spent. That way you won't need to leave the webpage.

Also, I'm a bit confused by the box with all the models, and next to it another box with ollama or openai.

reply

Great feedback, thanks for trying it again!

On workflow, for programmatic use, lnget from Lightning Labs handles the full L402 flow automatically (pays the invoice and retries with credentials). The browser demo is really just a proof-of-concept to show the L402 protocol in action with a real LLM. The actual target is agents and developer tools that handle the pay-and-retry loop natively. That said, NWC auto-pay in the browser for human use is an interesting idea worth exploring.
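For anyone curious what "the full L402 flow" means under the hood: the server answers with `402 Payment Required` plus a `WWW-Authenticate: L402` challenge containing a macaroon and a Lightning invoice; the client pays the invoice, keeps the payment preimage, and retries with an `Authorization: L402 <macaroon>:<preimage>` header. A minimal sketch of the header handling (the network and payment steps are only outlined in comments, since they depend on your node/wallet):

```python
import re

def parse_l402_challenge(www_authenticate: str):
    """Extract the macaroon and invoice from an L402 challenge header.

    Expects the form: L402 macaroon="...", invoice="lnbc..."
    """
    mac = re.search(r'macaroon="([^"]+)"', www_authenticate)
    inv = re.search(r'invoice="([^"]+)"', www_authenticate)
    if not (mac and inv):
        raise ValueError("not a valid L402 challenge")
    return mac.group(1), inv.group(1)

def l402_authorization(macaroon: str, preimage_hex: str) -> str:
    """Build the Authorization header sent on the retry, after paying."""
    return f"L402 {macaroon}:{preimage_hex}"

# Typical flow (outline):
# 1. POST /api/generate -> 402, WWW-Authenticate: L402 macaroon="...", invoice="..."
# 2. Pay the invoice with your node/wallet; keep the payment preimage.
# 3. Retry the request with Authorization: L402 <macaroon>:<preimage>.
```

This is the loop tools like lnget (and agent frameworks) automate, so a script never sees the invoice at all.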

The system is stateless by design. No sessions, no memory between calls. You pay for a single inference, get the response, and that's it. No accounts, no server-side conversation history. Building a proper chat experience for humans where context and conversation history carry over would need a stateful layer on top, which is a different product. Interesting vertical though.

On the dropdown: it toggles between two API formats. /api/generate takes a plain prompt string, while /api/chat takes a messages array. Both do the same thing for the demo. For programmatic use, pick whichever matches your client.
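To make the difference concrete, here's the same question in both request shapes. Field names follow the Ollama-style endpoints; the model name is just a placeholder, and your client would attach the L402 Authorization header once the invoice is paid:

```python
import json

# /api/generate: a single plain prompt string.
generate_body = {
    "model": "llama3",  # placeholder model name
    "prompt": "What is an L402?",
}

# /api/chat: a messages array (OpenAI-style roles).
chat_body = {
    "model": "llama3",
    "messages": [
        {"role": "user", "content": "What is an L402?"},
    ],
}

print(json.dumps(chat_body, indent=2))
```

Same inference either way; the messages form just carries role structure if your client already speaks the chat format.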

Appreciate the feedback!

reply

Just checked out https://ppq.ai and https://cypherflow.ai - these look awesome! Appreciate the info.

reply