
I run locally on an Apple M4 MacBook using the on-chip Neural Engine/GPU (and keep an old M1 around to test the "budget" case, which still works reasonably well up to 8B models), but it's much slower than a dedicated Nvidia GPU. I can run 24B Mistral locally with ollama, still my all-round favorite model, and it actually runs okay-ish. I've also been trying different distills of qwen3-coder 30B with mlx, but I'm not super happy with mlx yet.
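If anyone wants to try the same two paths, here's a minimal sketch in Python: the ollama client for the Mistral model and mlx-lm for a qwen3-coder distill. The model tags (mistral-small:24b, the mlx-community repo id) are assumptions on my part; swap in whatever you actually have pulled.

```python
# Minimal sketch: querying a local ~24B Mistral via ollama and a
# qwen3-coder distill via mlx-lm. Model tags are assumptions;
# replace them with the ones you actually have installed.
# pip install ollama mlx-lm   (mlx-lm runs on Apple silicon only)

import ollama                      # talks to a running `ollama serve`
from mlx_lm import load, generate  # Apple MLX runtime

# --- ollama path (assumes you've already run `ollama pull mistral-small:24b`) ---
resp = ollama.chat(
    model="mistral-small:24b",     # hypothetical tag, adjust to yours
    messages=[{"role": "user", "content": "Summarize MLX in one sentence."}],
)
print(resp["message"]["content"])

# --- mlx path (assumes a 4-bit community conversion of the 30B coder model) ---
model, tokenizer = load("mlx-community/Qwen3-Coder-30B-A3B-Instruct-4bit")  # hypothetical repo id
print(generate(model, tokenizer, prompt="Write a Python hello world.", max_tokens=64))
```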
I also used to use whisper a lot... but it has been broken for me since the last macOS update and I can't seem to get it working again quickly, ugh!
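For anyone debugging a similar breakage, the basic call I'd use as a sanity check is below; it's a sketch assuming the openai-whisper Python package (not whisper.cpp or a GUI wrapper), and "audio.m4a" is just a placeholder path.

```python
# Sanity-check sketch for local transcription with the openai-whisper package.
# Assumptions: this is the whisper variant in use, and ffmpeg is installed.
# pip install openai-whisper

import whisper

model = whisper.load_model("base")      # small model, loads quickly
result = model.transcribe("audio.m4a")  # placeholder file; ffmpeg decodes it under the hood
print(result["text"])
```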
I'll keep testing models as I'm able to. My main laptop is an M2, but I've never downloaded any LLM onto it; I'll venture to try. With 16 GB of RAM I can test up to 13B models.
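A rough back-of-the-envelope for what fits in 16 GB: weights take roughly params × bits / 8, so a 13B model at 4-bit quantization is about 6.5 GB before KV cache and runtime overhead. A tiny sketch of that arithmetic (the numbers are estimates, not measurements):

```python
# Back-of-the-envelope memory estimate for quantized LLM weights.
# Assumption: weights dominate; KV cache and runtime overhead add a few GB on top.

def weight_gb(params_billion: float, bits: int) -> float:
    """Approximate weight size in GB for a model quantized to `bits` per parameter."""
    return params_billion * 1e9 * bits / 8 / 1e9

for size in (8, 13, 24, 30):
    print(f"{size}B @ 4-bit ~ {weight_gb(size, 4):.1f} GB, @ 8-bit ~ {weight_gb(size, 8):.1f} GB")
```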