# Local Inference

When Ollama is detected, agntk checks your hardware and picks the largest model your system can run comfortably:

| Your RAM | Model Selected | Why |
| --- | --- | --- |
| 8 GB | qwen3:8b for everything | Fits in memory with room for OS |
| 16 GB | qwen3:14b standard, qwen3:8b fast | Best balance of quality and speed |
| 32+ GB | qwen3:32b reasoning, qwen3:14b standard | Full power for complex tasks |
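The tiers in the table can be sketched as a simple threshold lookup. This is an illustrative sketch, not agntk's actual code; the function name `pick_models` and the role keys (`reasoning`, `standard`, `fast`) are assumptions based on the table above.

```python
def pick_models(ram_gb: float) -> dict[str, str]:
    """Map available memory to a model per role, mirroring the tier table.

    Hypothetical sketch of agntk's selection logic; the real
    implementation may differ.
    """
    if ram_gb >= 32:
        return {"reasoning": "qwen3:32b", "standard": "qwen3:14b"}
    if ram_gb >= 16:
        return {"standard": "qwen3:14b", "fast": "qwen3:8b"}
    # 8 GB tier: one small model handles everything
    return {"standard": "qwen3:8b"}
```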

Apple Silicon unified memory, NVIDIA VRAM, and CPU-only systems are all detected automatically. The agent tells you what it picked:

```
provider: ollama (http://localhost:11434)
models: 32 GB RAM → qwen3:32b for reasoning/powerful, qwen3:14b for standard
```