100 sats \ 2 replies \ @optimism 3h \ parent \ on: From the distributed dream to the digital feedlot AI
Yes. I do all my "production" inference either locally or, for large models, on an encrypted spot AWS g4dn instance (which was a headache to work out, and I still think I should tune it to squeeze more juice out of it - it's very expensive).
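For anyone curious, here's a minimal boto3 sketch of that kind of spot request with an encrypted volume - the AMI ID, key pair name, device name, and volume size are placeholders, not my actual setup:

```python
# Sketch: request a spot g4dn instance with an encrypted EBS root volume.
# Placeholders: AMI ID, key pair, device name, region, and volume size.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",  # placeholder: a GPU-ready AMI
    InstanceType="g4dn.xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key",  # placeholder key pair
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {"SpotInstanceType": "one-time"},
    },
    BlockDeviceMappings=[
        {
            # Device name depends on the AMI; /dev/xvda is common.
            "DeviceName": "/dev/xvda",
            "Ebs": {"VolumeSize": 200, "VolumeType": "gp3", "Encrypted": True},
        }
    ],
)
print(response["Instances"][0]["InstanceId"])
```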
I do test some of the commercial models at times, but honestly the only one I've used that outperforms on coding is claude 3.7 sonnet (claude 4 regresses on coding for me), and not by a big enough margin to stop using qwen3-coder. They both get into endless logic loops when dealing with complex code beyond the trivial, where their bad vibes created dumb bugs - very profitable for the provider too when you're paying or capped per token.
I have an i7 laptop where I test models with ollama; mistral 7b and phi have run well, but they fall short and generation times get long. Testing paid models, I do have to agree with you that Anthropic's have been the best, though they really go off the rails and get too lost. I hope to be enjoying a local llama 3 very soon.
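For reference, this is roughly what that kind of local test looks like with the ollama Python client - assumes the ollama daemon is running and the model has already been pulled; the prompt is just an example:

```python
# Minimal local test via the ollama Python client (pip install ollama).
# Assumes `ollama pull mistral:7b` has already been run.
import time
import ollama

start = time.time()
response = ollama.chat(
    model="mistral:7b",
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(response["message"]["content"])
print(f"elapsed: {time.time() - start:.1f}s")  # CPU-only i7 runs will be slow
```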
I run locally on an Apple M4 MacBook using their NLP chip (and keep an old M1 to test "budget", which still works relatively decently up to 8b models), but it's much slower than a dedicated nvidia GPU. I can run 24b mistral locally using ollama - still my all-round favorite model - and it actually runs okay-ish. I've also been trying different distills of qwen3-coder 30b with mlx, but I'm not super happy with mlx yet.
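In case it helps, this is roughly what a generation looks like with mlx-lm (pip install mlx-lm) on Apple silicon - the model path is an assumption, any MLX-converted checkpoint from the mlx-community hub should slot in:

```python
# Sketch: text generation with mlx-lm on Apple silicon.
# The repo name below is an assumption; substitute any MLX-converted model.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-Small-24B-Instruct-2501-4bit")
text = generate(
    model,
    tokenizer,
    prompt="Explain KV caching in one paragraph.",
    max_tokens=200,
)
print(text)
```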
I also used to use whisper a lot... but it has been broken for me since the last macOS update and I can't seem to get it working again quickly, ugh!
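For anyone debugging the same thing, a minimal smoke test for a local openai-whisper install (the audio path is a placeholder) - if the ffmpeg or torch install broke with the OS update, this is usually where it fails:

```python
# Smoke test for openai-whisper (pip install openai-whisper).
# Requires ffmpeg on the PATH; "memo.m4a" is a placeholder audio file.
import whisper

model = whisper.load_model("base")
result = model.transcribe("memo.m4a")
print(result["text"])
```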