
Jan-v3-4B-base-instruct is a 4B-parameter model obtained via post-training distillation from a larger teacher, transferring capabilities while preserving general-purpose performance on standard benchmarks. The result is a compact, ownable base that is straightforward to fine-tune, broadly applicable, and minimizes the usual capacity–capability trade-offs.

Building on this base, Jan-Code, a code-tuned variant, will be released soon.
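
The blurb doesn't spell out the distillation recipe, but the textbook version of logit distillation (training the student to match a softened copy of the teacher's output distribution) looks roughly like this. Purely illustrative PyTorch; the temperature and toy shapes are arbitrary placeholders, not anything Jan has published:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KL loss: push the student's distribution toward the teacher's."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # The t^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Toy shapes: a batch of 4 token positions over a 32k-entry vocabulary.
student_logits = torch.randn(4, 32000, requires_grad=True)
teacher_logits = torch.randn(4, 32000)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```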

Last week I needed something to replace Qwen3-8B in a low-memory, local-only RAG workflow I was developing, and while A/B testing multiple alternatives I found that this had recently been released. I had been using its predecessor, Jan-v2, for MCP since mid last year (#1016343), before the second Qwen3 update made Qwen3 the better choice.

It beat other, larger models like Nemotron-3 in consistency and instruction following (areas Nvidia advertises it to be great at). Where most of the smaller open-weight models (including Qwen3, Nemotron-3, gpt-oss, Mistral 3.1 small, and so on) have a pretty awful habit of making stuff up when quoting and editing source text (which is unacceptable in zero-fail RAG), Jan-v3 shone. I can even run it on my old MacBook M1 with a 20k context window, without it needing any memory compression.
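
If you want to try a similar setup, a minimal sketch with llama-cpp-python looks something like this. The GGUF filename, quant, and exact context size are placeholders; adjust them to whatever build you pulled:

```python
# Minimal sketch: load a local GGUF of Jan-v3-4B with a ~20k context window.
from llama_cpp import Llama

llm = Llama(
    model_path="./jan-v3-4b-base-instruct-Q4_K_M.gguf",  # placeholder filename/quant
    n_ctx=20480,        # ~20k-token context window
    n_gpu_layers=-1,    # offload all layers (Metal on an M1)
)

# Ask it to answer only from retrieved context, quoting verbatim.
resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Answer only from the provided context. Quote verbatim."},
        {"role": "user", "content": "Context:\n<retrieved chunks here>\n\nQuestion: ..."},
    ],
    temperature=0.0,
)
print(resp["choices"][0]["message"]["content"])
```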

Thus far, this model has been a blessing to my project, but I want to test it further. Perhaps it can serve as a pleb-claude?

Wouldn't it be awesome if "we can actually have nice things"?