The sudden concern with illegal data in blocks and mempools feels like pearl clutching to me (#1263156), but if it is of concern to folks, here is an interesting proposal by Roasbeef to use fountain codes to create a new kind of pruned node that can still help other nodes with IBD.

What are fountain codes?

Fountain codes provide a way for data to be transmitted across a lossy network connection from a single source to many users. The name "fountain code" arises because fountain codes behave analogously to a fountain. A file being transmitted is analogous to a glass of water from the fountain. To fill the glass (or reconstruct the file), you need to catch enough droplets from the fountain such that your glass becomes full. It isn't important which droplets of water you catch in your glass; once you have enough, the glass is full (and the file can be reconstructed). Fountain codes work in the same way, but with information instead of water. source
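To make the analogy concrete, here is a minimal sketch of a fountain code in Python (my own toy illustration, not from the proposal): each "droplet" is the XOR of a pseudo-random subset of the file's chunks, identified only by a seed, and a peeling decoder reconstructs the file from any sufficiently large set of droplets. Real LT codes use a robust soliton degree distribution rather than the uniform one used here.

```python
import random

def encode_droplet(chunks, seed):
    """Produce one droplet: the XOR of a pseudo-random subset of chunks.

    The seed alone determines which chunks were mixed, so only
    (seed, payload) needs to be transmitted."""
    rng = random.Random(seed)
    degree = rng.randint(1, len(chunks))
    indices = rng.sample(range(len(chunks)), degree)
    payload = bytes(len(chunks[0]))  # all-zero starting point
    for i in indices:
        payload = bytes(a ^ b for a, b in zip(payload, chunks[i]))
    return seed, payload

def decode(droplets, num_chunks):
    """Peeling decoder: repeatedly find a droplet covering exactly one
    still-unknown chunk, recover that chunk, and continue."""
    recovered = {}
    pending = []
    for seed, payload in droplets:
        # Re-derive each droplet's chunk set from its seed.
        rng = random.Random(seed)
        degree = rng.randint(1, num_chunks)
        pending.append((set(rng.sample(range(num_chunks), degree)), payload))
    progress = True
    while progress and len(recovered) < num_chunks:
        progress = False
        for indices, payload in pending:
            unknown = indices - recovered.keys()
            if len(unknown) == 1:
                (i,) = unknown
                # XOR out the already-known chunks to isolate chunk i.
                for j in indices - {i}:
                    payload = bytes(a ^ b for a, b in zip(payload, recovered[j]))
                recovered[i] = payload
                progress = True
    return recovered
```

Note that the decoder never cares *which* droplets arrived, only that enough distinct ones did, exactly like catching enough droplets to fill the glass.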

How could they work in Bitcoin?

In a Bitcoin context this means a node could specify a buffer number of the most recent blocks that it will always keep. For blocks prior to these, the node could assemble them into chunks and encode these blocks into "droplets." After encoding the node could delete the actual blocks, only keeping the encoded droplets. It would repeat this all the way back to genesis.
- droplet nodes save space by storing the encoded chain, while still being able to help other nodes with IBD
- each droplet node uses a random seed to select d of the k blocks in each epoch to encode, so no coordination is needed and the process stays decentralized
- bucket nodes are new nodes joining the network; by sampling roughly K droplet nodes they can recover the original chain with overwhelming probability
- γ is a system parameter: each node keeps roughly 1/γ of the chain's data
- if we set γ to 100, required storage drops from ~700 GB to ~7 GB, but we need to contact ~110 nodes to do IBD
- if we set γ to 1000, each node stores ~700 MB, but you need ~1k nodes to sync
- γ can therefore be tuned to trade storage savings against the number of bootstrapping nodes required
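The arithmetic behind the γ trade-off can be sketched in a few lines. The ~10% overhead factor is my assumption, chosen to match the 110-nodes-for-γ=100 figure in the post (fountain codes need slightly more droplets than the data they encode):

```python
CHAIN_GB = 700  # approximate full chain size from the post

def storage_per_node_gb(gamma):
    # Each node keeps roughly 1/γ of the encoded chain.
    return CHAIN_GB / gamma

def nodes_needed(gamma, overhead=1.1):
    # Assumed ~10% droplet overhead for successful decoding.
    return int(gamma * overhead)

print(storage_per_node_gb(100), nodes_needed(100))    # ~7 GB per node, ~110 nodes
print(storage_per_node_gb(1000), nodes_needed(1000))  # ~0.7 GB per node, ~1100 nodes
```

So the knob is linear in both directions: divide per-node storage by γ, multiply the number of peers you must contact by roughly the same factor.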
There is a paper about this from 2019 and a GMax post on the Bitcoin Wiki from 2015.
I don't know how practical such a scheme would be, but perhaps the pearl clutchers would do well to spend some time on ideas like this rather than screaming on X.
from the thread
phroo: Can this be combined with utreexo? Say a node has a slider option how much percent of random data it prunes. If the slider is at 100% then you have a compact utreexo node, if it's at 0% you have a fat utreexo (with full accumulator data). Anything in between is fountain coding.
roasbeef: so the chunks you encode can actually be blocks with the full utreexo inclusion data
I think this is complementary to utreexo, this lets you store less block data
utreexo lets you not store the entire utxo set
phroo: Great. State pruning while able to validate + history pruning while able to (partially) bootstrap. Sounds like a good combo.
interesting! I didn't see that part of the thread