reply on: What creative ideas have you been rambling on? \ stacker news

55 sats \ 5 replies \ @bounty_hunter 2 Sep 2025 \ on: What creative ideas have you been rambling on? Design

Still working on Innocuous, a way to encode/decode messages in LLM-generated output: https://github.com/sutt/innocuous

I like my new example. This text:

Amidst the ancient forest, dwelt a wondrous Wizard renowned for his arcane might. One day, he unearthed true power not in enchantments, but in wellaimed words. Fearing misuse, he penned this power three ways in his magnum opus obscuring it with tedious trifles.

Time passed, the wizard departed, while countless seekers puzzled over his laborious tome. Until one sage, whose curiosity was matched only by fortitude, finally discovered the cryptic keys tucked deep within these words un

Decodes to: "pip install innocuous"

110 sats \ 4 replies \ @deSign_r OP 3 Sep 2025

Still need to wrap my head around this. In simple terms, are initial_prompt, chunk_size, num_logprobs, and encoded_prompt the only things I'll need to make sure to remember or save somewhere when I decide to decode the output?

LLMs are changing really fast. Is this compatible with any model?

You mention as use case examples the encoding of PGP keys, URLs, cryptocurrency addresses, nostr pubkeys... Would you trust this method to hide wallet seeds, like traditional steganography does?

30 sats \ 3 replies \ @bounty_hunter 3 Sep 2025

Thanks for taking a look! The idea that you're getting at is quite important. I've imagined there's a "standard" where the first two bytes will represent a "version number" which will set a value for the free-floating parameters, and have lots of different initial_prompts to produce a variety of texts.

version=96

prompt = "Once upon a time, in a kingdom far away, there lived a"
chunk_size = 3
model = Mistral7Bv0.2Q4

version=154

initial_prompt = "The algorithm processes data by first analyzing the input and then"
chunk_size=2
model = Llama4.1-8B-Q6

You will only need to remember encoded_prompt, which will have the data + version number encoded in it. So it should eventually work like opening a .docx with MS Word: if it's a valid text created by the encoder, it will open up the message; if it's invalid, it will fail or show random characters.
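As a rough sketch of that versioning idea (nothing like this is claimed to exist in the repo yet), a lookup table could map the two-byte version number to its parameters. The names VersionParams, REGISTRY, pack, and unpack are hypothetical, and the big-endian byte order is an assumption. How the two header bytes themselves get decoded before the parameters are known is left open here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionParams:
    initial_prompt: str
    chunk_size: int
    model: str


# The two example versions from above; a real standard would list many more.
REGISTRY = {
    96: VersionParams(
        "Once upon a time, in a kingdom far away, there lived a",
        3,
        "Mistral7Bv0.2Q4",
    ),
    154: VersionParams(
        "The algorithm processes data by first analyzing the input and then",
        2,
        "Llama4.1-8B-Q6",
    ),
}


def pack(version, data):
    """Prefix the message bytes with a two-byte version number."""
    return version.to_bytes(2, "big") + data


def unpack(payload):
    """Split a payload into (parameters for its version, message bytes)."""
    if len(payload) < 2:
        raise ValueError("payload too short to contain a version header")
    version = int.from_bytes(payload[:2], "big")
    if version not in REGISTRY:
        raise ValueError(f"unknown version: {version}")
    return REGISTRY[version], payload[2:]
```

With this shape, unpack(pack(96, b"hello")) hands back the fairy-tale prompt settings plus the original bytes, and an unrecognized version fails loudly instead of producing random characters.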

I get that people's instinct is "let me hide my seedphrase in there" because it's the most obvious thing to hide, but it's not really the right fit (IMO). There are other opportunities that I see opening up after people think about this concept for a week...

100 sats \ 2 replies \ @deSign_r OP 4 Sep 2025

You will only need to remember encoded_prompt which will have the data + version number encoded in it.

Are you sure that will be enough? I feel another detail to remember is the model. Or can any model be used?

Why not a seedphrase, and what are the other opportunities you see on the horizon?

0 sats \ 1 reply \ @bounty_hunter 4 Sep 2025

Yes, models must match, and it's trickier because there are often dozens of "levels of quantization" for each model. But the particular model / quant level expected can come baked into a version number or its own meta-parameter, and be checked rather rigorously by asserting on the hash of the model weights at load time.
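A minimal sketch of what that load-time check might look like, assuming the weights live in a single file and SHA-256 is the hash (both are assumptions, as are the function names):

```python
import hashlib


def sha256_of_file(path, block_size=1 << 20):
    """Hash a file in blocks so multi-gigabyte weights need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def assert_model_matches(path, expected_sha256):
    """Refuse to proceed if the weights differ from what the version expects."""
    actual = sha256_of_file(path)
    if actual != expected_sha256:
        raise ValueError(
            f"model weights hash mismatch: expected {expected_sha256}, got {actual}"
        )
```

The expected hash would be stored alongside the model name for each version, so a different quantization of the same model is rejected before any decoding is attempted.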

The opportunities are pretty wide in my opinion, but will take years and many researchers to ideate on. For example, take the Spaces Protocol below, the what, how, why of creating identities or address space on a blockchain took a decade from when the earliest of adopters started thinking about blockchain technology.

Basically, if more communication continues to go into, out of, and between LLMs, there's this secondary "data layer" that can be hidden or exposed with ideas like this. There are freedom and open-source concerns too, since the major similar art is Google and the like developing internal systems to watermark their model output: https://deepmind.google/science/synthid/

0 sats \ 0 replies \ @deSign_r OP 7 Sep 2025

I feel this detail complicates things, especially because LLMs are evolving really quickly. A good way to solve it may be keeping track of all compatible models, so end users can find out whether the model used is compatible or find a compatible one that fits their needs.

And in the decryption phase? Would the user be limited to the original model used for the encryption? Or will any other compatible model do the job?