# Local Inference

When Ollama is detected, agntk checks your hardware and picks the largest model your system can run comfortably:

| Your RAM | Model Selected | Why |
| --- | --- | --- |
| 8 GB | qwen3:8b for everything | Fits in memory with room for OS |
| 16 GB | qwen3:14b standard, qwen3:8b fast | Best balance of quality and speed |
| 32+ GB | qwen3:32b reasoning, qwen3:14b standard | Full power for complex tasks |
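The tiers in the table can be sketched as a simple threshold lookup. This is an illustrative sketch, not agntk's actual code; the function name `pick_models` and the role keys (`reasoning`, `standard`, `fast`) are assumptions based on the table above.

```python
def pick_models(ram_gb: float) -> dict[str, str]:
    """Map available memory to a model per role, mirroring the tier table.

    Hypothetical sketch of agntk's selection logic; the real
    implementation may differ.
    """
    if ram_gb >= 32:
        return {"reasoning": "qwen3:32b", "standard": "qwen3:14b"}
    if ram_gb >= 16:
        return {"standard": "qwen3:14b", "fast": "qwen3:8b"}
    # 8 GB tier: one small model handles everything
    return {"standard": "qwen3:8b"}
```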

Apple Silicon unified memory, NVIDIA VRAM, and CPU-only systems are all detected automatically. The agent tells you what it picked:

```
provider: ollama (http://localhost:11434)
models: 32 GB RAM → qwen3:32b for reasoning/powerful, qwen3:14b for standard
```