Block released this today:
As part of the goose project, we wanted to let people try more open models, but many didn't have the capacity on their own. Open models continue to improve apace, so it makes sense to make them easy to host and share as they get more capable and larger. That is what this experiment is about.
⚡ Automatic distribution
Model fits on one machine? Solo mode, full speed. Too big? Dense models are pipeline-split by layers across nodes. MoE models (Qwen3, GLM, Mixtral, DeepSeek) are split by experts, auto-detected from GGUF metadata, zero config. Splits are latency-aware: low-RTT peers are preferred for tighter coordination.
🧩 MoE expert sharding

Each node gets the full trunk plus an overlapping expert shard. Critical experts are replicated everywhere; the rest are distributed uniquely. Each node runs its own llama-server: zero cross-node traffic during inference.
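The overlapping-shard idea can be sketched as below. The round-robin spread and the `assign_experts` helper are illustrative assumptions; the post doesn't specify how the unique experts are actually dealt out.

```python
# Hypothetical sketch of overlapping expert shards: "critical" experts are
# replicated on every node, and the remaining experts are dealt out
# round-robin so each appears on exactly one node.

def assign_experts(num_experts: int, critical: set[int], nodes: list[str]) -> dict[str, set[int]]:
    shards = {n: set(critical) for n in nodes}   # critical experts everywhere
    rest = [e for e in range(num_experts) if e not in critical]
    for i, e in enumerate(rest):                 # unique round-robin spread
        shards[nodes[i % len(nodes)]].add(e)
    return shards

shards = assign_experts(num_experts=8, critical={0, 1}, nodes=["a", "b", "c"])
# Every node holds experts 0 and 1; experts 2..7 each live on exactly one node.
```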
Multi-model routing

Different nodes serve different models. The API proxy routes by the model field. Nodes are auto-assigned based on what's needed and what's on disk.
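Routing on the model field amounts to a table lookup in the proxy. A minimal sketch, with an invented routing table and backend addresses:

```python
# Hypothetical sketch of the proxy's routing step: pick a serving backend
# from a table keyed by the request's "model" field. Table contents,
# model names, and addresses are all illustrative.
import random

routing_table = {
    "qwen3-30b": ["http://10.0.0.2:8080", "http://10.0.0.3:8080"],
    "glm-4":     ["http://10.0.0.4:8080"],
}

def route(request: dict) -> str:
    model = request.get("model")
    backends = routing_table.get(model)
    if not backends:
        raise LookupError(f"no node currently serves {model!r}")
    return random.choice(backends)  # spread load across serving nodes
```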
Demand-aware rebalancing

A unified demand map propagates across the mesh via gossip. Standby nodes promote themselves to serve unserved or hot models. Dead hosts are replaced within 60 seconds.
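The rebalancing step can be sketched as merging gossiped demand counts and then promoting standbys to cover the hottest unserved models. `merge_demand` and `promotions` are hypothetical names for illustration only:

```python
# Hypothetical sketch of demand-aware rebalancing: merge gossiped per-node
# demand maps, then assign standby nodes to the hottest unserved models.

def merge_demand(maps: list[dict[str, int]]) -> dict[str, int]:
    merged: dict[str, int] = {}
    for m in maps:
        for model, hits in m.items():
            merged[model] = merged.get(model, 0) + hits
    return merged

def promotions(demand: dict[str, int], serving: dict[str, str],
               standby: list[str]) -> dict[str, str]:
    """Pair standby nodes with unserved models, hottest first."""
    unserved = sorted((m for m in demand if m not in serving),
                      key=lambda m: demand[m], reverse=True)
    return dict(zip(unserved, standby))

demand = merge_demand([{"qwen3": 5}, {"qwen3": 2, "glm-4": 9}])
plan = promotions(demand, serving={"qwen3": "node-a"}, standby=["node-b"])
# glm-4 is in demand but unserved, so the standby node picks it up.
```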
📡 Nostr discovery

Publish your mesh to Nostr relays. Others find it with --auto. Smart scoring: region match, VRAM, health probe before joining.
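The scoring described above might combine those signals roughly like this. The weights and fields are invented for the sketch, and the health probe is stubbed where the real thing would be a network check:

```python
# Hypothetical scoring sketch for picking a mesh discovered via Nostr:
# reject anything failing a health probe, then prefer a region match and
# more free VRAM. Weights and field names are illustrative.

def score_mesh(mesh: dict, my_region: str, healthy: bool) -> float:
    if not healthy:
        return float("-inf")        # never join an unhealthy mesh
    score = 0.0
    if mesh.get("region") == my_region:
        score += 100.0              # illustrative weight for locality
    score += mesh.get("free_vram_gb", 0)
    return score

candidates = [
    {"name": "eu-mesh", "region": "eu", "free_vram_gb": 48},
    {"name": "us-mesh", "region": "us", "free_vram_gb": 80},
]
best = max(candidates, key=lambda m: score_mesh(m, my_region="eu", healthy=True))
# The region match outweighs the extra VRAM, so "eu-mesh" wins.
```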
Zero-transfer loading

Weights are read from local GGUF files, not sent over the network. Model load: 111s → 5s. Per-token RPC round-trips: 558 → 8.
Scales passively

GPU nodes gossip. Clients use lightweight routing tables: zero per-client server state. Event-driven: cost is proportional to topology changes, not node count.
🎯 Speculative decoding

A draft model runs locally and proposes tokens that are verified in one batched pass. +38% throughput on code. Auto-detected from the catalog.
💻 Web console

Live topology, VRAM bars, model picker, built-in chat. API-driven: everything the console shows comes from JSON endpoints.
Works with agents

OpenAI-compatible API on localhost:9337. Use it with goose, pi, opencode, or any tool that supports custom OpenAI endpoints.
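Because the endpoint is OpenAI-compatible, any OpenAI-style client should work against it. A minimal stdlib-only sketch; the /v1/chat/completions path follows the OpenAI API convention and the model name is illustrative:

```python
# Build a minimal OpenAI-style chat request against the local proxy.
# Model name is illustrative; the path follows the OpenAI API convention.
import json
import urllib.request

def chat(prompt: str, model: str = "qwen3-30b") -> urllib.request.Request:
    payload = {
        "model": model,   # the proxy routes on this field
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "http://localhost:9337/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat("Explain MoE expert sharding in one sentence.")
# To actually send it (requires a running mesh):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```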
Agent gossip

Have your agents gossip over the mesh: share status, findings, and questions via CLI or MCP. Works standalone with any LLM setup, no GPU needed. Learn more →
honestly I'm disappointed you didn't link to it directly
thanks
magnet:???

Before the amateur hecklers complain about the pitfalls of mixing bittorrent and git ... go read about forward compatibility, and maybe troll someone who reads notifications more frequently than once per zap.