
Block released this today:

As part of the goose project, we wanted to let people try more open models, but many didn't have the capacity to run them on their own. Open models continue to improve apace, so it makes sense to make them easy to host and share as they get more capable and larger. That is what this experiment is about.

⚡ Automatic distribution

Model fits on one machine? Solo mode, full speed. Too big? Dense models are pipeline-split by layers across nodes. MoE models (Qwen3, GLM, Mixtral, DeepSeek) are split by experts, auto-detected from GGUF metadata with zero config. Splits are latency-aware: low-RTT peers are preferred for tighter coordination.
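A minimal sketch of the placement decision described above. The function name, `Node` fields, and thresholds are all illustrative assumptions, not the project's actual API; it only shows the solo-vs-split logic and the low-RTT preference.

```python
# Hypothetical placement sketch: solo if any node fits the model,
# otherwise split across the lowest-RTT peers until capacity suffices.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    vram_gb: float
    rtt_ms: float  # round-trip time to this peer

def choose_placement(model_gb: float, is_moe: bool, nodes: list[Node]) -> dict:
    solo = [n for n in nodes if n.vram_gb >= model_gb]
    if solo:
        return {"mode": "solo", "nodes": [solo[0].name]}
    # Latency-aware: prefer low-RTT peers for tighter coordination.
    picked, capacity = [], 0.0
    for n in sorted(nodes, key=lambda n: n.rtt_ms):
        picked.append(n.name)
        capacity += n.vram_gb
        if capacity >= model_gb:
            break
    mode = "expert-split" if is_moe else "pipeline-split"
    return {"mode": mode, "nodes": picked}
```

With three 24 GB nodes at 5, 80, and 12 ms RTT, a 40 GB dense model lands on the two low-RTT peers; a 16 GB model runs solo.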

🧩 MoE expert sharding

Each node gets the full trunk plus an overlapping expert shard. Critical experts are replicated everywhere; the rest are distributed uniquely. Each node runs its own llama-server, so there is zero cross-node traffic during inference.
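The overlapping-shard idea can be sketched in a few lines. How the project decides which experts are "critical" isn't stated, so that set is an input here, and round-robin for the rest is an assumption.

```python
# Illustrative expert sharding: critical experts replicated on every
# node, remaining experts distributed uniquely (round-robin here).
def shard_experts(num_experts: int, critical: set[int],
                  nodes: list[str]) -> dict[str, set[int]]:
    shards = {n: set(critical) for n in nodes}  # replicate critical everywhere
    rest = [e for e in range(num_experts) if e not in critical]
    for i, e in enumerate(rest):                # unique assignment for the rest
        shards[nodes[i % len(nodes)]].add(e)
    return shards
```

Because every node holds the trunk plus its shard, each can run inference locally; the overlap on critical experts is what keeps cross-node traffic at zero.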

🔀 Multi-model routing

Different nodes serve different models. The API proxy routes by the `model` field. Nodes are auto-assigned based on what's needed and what's on disk.
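The routing step itself is simple to picture; this sketch assumes a serving map from model name to node list, which is invented for illustration.

```python
# Hypothetical proxy routing: dispatch on the request's "model" field.
def route(request: dict, serving: dict[str, list[str]]) -> str:
    """Return a node currently serving the requested model."""
    nodes = serving.get(request["model"])
    if not nodes:
        raise LookupError(f"no node is serving {request['model']!r}")
    return nodes[0]  # a real proxy would load-balance across the list
```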

📊 Demand-aware rebalancing

A unified demand map propagates across the mesh via gossip. Standby nodes promote themselves to serve unserved or hot models. Dead hosts are replaced within 60 seconds.

📑 Nostr discovery

Publish your mesh to Nostr relays. Others find it with --auto. Smart scoring: region match, VRAM, health probe before joining.
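The scoring criteria named above (region match, VRAM, health probe) could combine along these lines; the weights and field names are invented for illustration.

```python
# Hypothetical mesh scoring for --auto: health probe gates joining,
# then spare VRAM and region match rank the candidates.
def score_mesh(mesh: dict, my_region: str, healthy: bool) -> float:
    if not healthy:
        return float("-inf")          # never join a mesh that fails the probe
    score = mesh["free_vram_gb"]      # more spare VRAM is better
    if mesh["region"] == my_region:
        score += 50                   # strong preference for low-latency region
    return score
```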

🚀 Zero-transfer loading

Weights read from local GGUF files, not sent over the network. Model load: 111s → 5s. Per-token RPC round-trips: 558 → 8.

📈 Scales passively

GPU nodes gossip. Clients use lightweight routing tables with zero per-client server state. Event-driven: cost is proportional to topology changes, not node count.

🎯 Speculative decoding

The draft model runs locally, proposing tokens that are verified in one batched pass. +38% throughput on code. Auto-detected from the catalog.
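A toy illustration of the accept/reject loop at the heart of speculative decoding. Both "models" are stand-in callables, not the project's actual interfaces: the draft proposes k tokens cheaply, the target verifies them in one batched pass, and the longest agreeing prefix is kept (plus the target's first correction).

```python
# Toy speculative-decoding step with stand-in model functions.
def speculative_step(prefix: list[str], draft, target, k: int = 4) -> list[str]:
    proposed, ctx = [], list(prefix)
    for _ in range(k):                    # draft proposes k tokens cheaply
        t = draft(ctx)
        proposed.append(t)
        ctx.append(t)
    verified = target(prefix, proposed)   # one batched verification pass
    accepted = []
    for p, v in zip(proposed, verified):
        if p != v:
            accepted.append(v)            # keep the target's correction, stop
            break
        accepted.append(p)
    return prefix + accepted
```

When the draft agrees with the target on most tokens (common for code), each target pass yields several tokens instead of one, which is where the throughput gain comes from.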

💻 Web console

Live topology, VRAM bars, model picker, built-in chat. API-driven: everything the console shows comes from JSON endpoints.

🤖 Works with agents

OpenAI-compatible API on localhost:9337. Use with goose, pi, opencode, or any tool that supports custom OpenAI endpoints.
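"OpenAI-compatible" means standard tooling just works. This sketch builds (without sending) a request against the conventional chat-completions path; the path and model name are assumptions based on the OpenAI API shape, only the host and port come from the post.

```python
import json
import urllib.request

# Build a chat-completions request for the local endpoint; the
# /v1/chat/completions path follows the OpenAI convention and the
# model name is illustrative.
def chat_request(model: str, prompt: str) -> urllib.request.Request:
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "http://localhost:9337/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
```

Any tool that lets you set a custom OpenAI base URL can be pointed at `http://localhost:9337` the same way.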

πŸ“ Agent gossipπŸ“ Agent gossip

Have your agents gossip over the mesh: share status, findings, and questions via CLI or MCP. Works standalone with any LLM setup, no GPU needed. Learn more →
15 sats \ 0 replies \ @adlai 4 Apr

honestly I'm disappointed you didn't link to it directly

15 sats \ 0 replies \ @adlai 3 Apr

thanks

magnet:???


before the amateur hecklers complain about the pitfalls of drinking both bittorrent and git ... go read about forward-compatibility and maybe troll someone who reads notifications more frequently than once per zap.
