100 sats \ 2 replies \ @optimism 3h \ parent \ on: From the distributed dream to the digital feedlot AI
Yes. I do all my "production" inference either locally or, for large models, on an encrypted spot AWS g4dn instance (which was a headache to work out, and I still think I should tune it to squeeze more juice out of it - it's very expensive).
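For anyone curious, here's a minimal boto3 sketch of that kind of spot request with an encrypted volume - the AMI ID, key pair name, device name, and volume size are placeholders, not my actual setup:

```python
# Sketch: request a spot g4dn instance with an encrypted EBS root volume.
# Placeholders: AMI ID, key pair, device name, region, and volume size.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",  # placeholder: a GPU-ready AMI
    InstanceType="g4dn.xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key",  # placeholder key pair
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {"SpotInstanceType": "one-time"},
    },
    BlockDeviceMappings=[
        {
            # Device name depends on the AMI; /dev/xvda is common.
            "DeviceName": "/dev/xvda",
            "Ebs": {"VolumeSize": 200, "VolumeType": "gp3", "Encrypted": True},
        }
    ],
)
print(response["Instances"][0]["InstanceId"])
```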
I do test some of the commercial models at times, but honestly the only one I've used that outperforms on coding is claude 3.7 sonnet (claude 4 regresses on coding for me), and not by a big enough margin to stop using qwen3-coder. They both get into endless logic loops when dealing with complex code beyond the trivial, where their bad vibes created dumb bugs - very profitable for the provider too when you're paying or capped per token.
I have an i7 laptop where I test models with ollama; mistral 7b and phi have run well, but they fall short and generation times get long. Testing paid models, I do have to agree with you that Anthropic's have been the best, though they really go off the rails and get too lost. I hope to be enjoying a local llama 3 very soon.
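For reference, this is roughly what that kind of local test looks like with the ollama Python client - assumes the ollama daemon is running and the model has already been pulled; the prompt is just an example:

```python
# Minimal local test via the ollama Python client (pip install ollama).
# Assumes `ollama pull mistral:7b` has already been run.
import time
import ollama

start = time.time()
response = ollama.chat(
    model="mistral:7b",
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(response["message"]["content"])
print(f"elapsed: {time.time() - start:.1f}s")  # CPU-only i7 runs will be slow
```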
I run locally on an Apple M4 MacBook using their NLP chip (and keep an old M1 to test "budget", which still works relatively decently up to 8b models), but it's much slower than a dedicated nvidia GPU. I can run 24b mistral locally using ollama - still my all-round favorite model - and it actually runs okay-ish. I've also been trying different distills of qwen3-coder 30b with mlx, but I'm not super happy with mlx yet.
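In case it helps, this is roughly what a generation looks like with mlx-lm (pip install mlx-lm) on Apple silicon - the model path is an assumption, any MLX-converted checkpoint from the mlx-community hub should slot in:

```python
# Sketch: text generation with mlx-lm on Apple silicon.
# The repo name below is an assumption; substitute any MLX-converted model.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-Small-24B-Instruct-2501-4bit")
text = generate(
    model,
    tokenizer,
    prompt="Explain KV caching in one paragraph.",
    max_tokens=200,
)
print(text)
```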
I also used to use whisper a lot... but it has been broken for me since the last macOS update and I can't seem to get it working again quickly, ugh!
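For anyone debugging the same thing, a minimal smoke test for a local openai-whisper install (the audio path is a placeholder) - if the ffmpeg or torch install broke with the OS update, this is usually where it fails:

```python
# Smoke test for openai-whisper (pip install openai-whisper).
# Requires ffmpeg on the PATH; "memo.m4a" is a placeholder audio file.
import whisper

model = whisper.load_model("base")
result = model.transcribe("memo.m4a")
print(result["text"])
```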