
I really need to stop being so lazy and figure out how to run these myself and not just wait for ollama to implement it
100 sats \ 0 replies \ @optimism 18h
what I do to quickly test a safetensors model through hf/torch (though this one is huge, so you need proper hardware):
env, prereqs and model download:
uv venv
. .venv/bin/activate
uv pip install torch transformers accelerate
# optional: hf auth login
hf download "<org/repo>" # e.g. "google/gemma-3-270m-it"
example usage:
import torch
from transformers import pipeline

model_name = "google/gemma-3-270m-it" # org/repo format as used in hf download

chat = [
  {"role": "system", "content": "You're a helpful assistant."},
  {"role": "user", "content": "Explain consciousness in simple, concise terms."},
]

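# use a separate name so the imported pipeline() function isn't shadowed
pipe = pipeline(task="text-generation", model=model_name, device_map="auto")
response = pipe(chat, max_new_tokens=512)
# with chat input, generated_text is the full message list; the last entry is the model's reply
print(response[0]["generated_text"][-1]["content"])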
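side note on the indexing in that last print: when you pass a list of chat messages, the text-generation pipeline returns one result per input, and "generated_text" holds the whole conversation with the new assistant message appended at the end. a minimal sketch of that shape, where last_reply and mock_response are made-up names and the mock is a hand-written stand-in, not real model output:

```python
def last_reply(response):
    """Return the text of the final message in a chat-style pipeline result."""
    messages = response[0]["generated_text"]  # full conversation, as a list of dicts
    return messages[-1]["content"]            # last entry is the newly generated reply


# hand-written stand-in for what the pipeline returns (hypothetical content)
mock_response = [
    {
        "generated_text": [
            {"role": "system", "content": "You're a helpful assistant."},
            {"role": "user", "content": "Hi"},
            {"role": "assistant", "content": "Hello!"},
        ]
    }
]

print(last_reply(mock_response))
```

this runs without torch or a model download, so it's a cheap way to check parsing code before loading anything big.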
run it with uv run yourfile.py; example output:
% uv run test.py
Consciousness is the state of being aware of yourself and your surroundings. 
It's like having a personal identity and internal world.
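on the "proper hardware" point: a rough way to guess whether a model fits is parameter count times bytes per parameter. the sketch below only covers the weights (activations and the KV cache need extra on top), and estimate_weight_gib is a made-up helper. the 270M figure just takes the "270m" in the example model name at face value, and the 70B figure is a hypothetical large model for comparison:

```python
# approximate storage per parameter for common dtypes, in bytes
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def estimate_weight_gib(num_params: int, dtype: str = "bfloat16") -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# 270M: assumed from the example model name; 70B: hypothetical large model
for n in (270_000_000, 70_000_000_000):
    print(f"{n:,} params @ bfloat16: ~{estimate_weight_gib(n):.1f} GiB")
```

if the estimate is already near your RAM/VRAM, expect trouble, since the real footprint will be higher than this.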