The open-weight field is large; this page narrows it to models that actually run well on real hardware. Picks are organized by VRAM tier and use case, so you skip straight to what works on your machine.
🥇 Kimi
🥈 Xiaomi
🥉 DeepSeek
Local LLM choice is mostly a hardware question first. Start with memory, then optimize for coding, reasoning, or general use.
M-series laptops, RTX 4060/4070, compact desktops
RTX 3090/4090, M2 Pro/Max, stronger workstation builds
Large local rigs, Mac Studio Ultra, server-class setups
If you care most about coding locally, prioritize LiveCodeBench and community deployment support.
| Model | LiveCodeBench | Quality | Best fit |
|---|---|---|---|
| Kimi K2.5 (Reasoning), Kimi | 85% | 46.8 | Best local coding pick overall |
First, size the model to your memory budget. Then check benchmark fit for your use case. After that, prioritize community support, quantization quality, and the maturity of the deployment path you want to use.
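The first two steps above (filter by memory, then rank by the benchmark that matches your use case) can be sketched in a few lines. This is a minimal illustration, not the site's ranking method: the `Candidate` type, the `pick` helper, and every entry in `shortlist` are hypothetical placeholders, and the numbers are made up to show the mechanics.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    memory_gb: float   # estimated footprint at the quantization you plan to use
    benchmark: float   # score on the benchmark that matches your use case


def pick(candidates: list[Candidate], budget_gb: float) -> list[Candidate]:
    """Keep models that fit the memory budget, best benchmark score first."""
    fitting = [c for c in candidates if c.memory_gb <= budget_gb]
    return sorted(fitting, key=lambda c: c.benchmark, reverse=True)


# Placeholder entries for illustration only; not real models or measured scores.
shortlist = [
    Candidate("example-small", memory_gb=5.0, benchmark=40.0),
    Candidate("example-medium", memory_gb=20.0, benchmark=55.0),
    Candidate("example-large", memory_gb=45.0, benchmark=70.0),
]

for c in pick(shortlist, budget_gb=24.0):
    print(c.name, c.benchmark)
```

Community support, quantization quality, and deployment maturity are harder to score numerically, so they are left as a manual check on whatever survives the filter.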
For local coding, smaller models with stronger code benchmarks often beat bigger generalist models. For research or document work, context length and long-context reasoning matter more.
Ollama is the easiest local entry point for most developers. It is ideal for trying models quickly, iterating on prompts, and validating whether a model is good enough before you build a more production-grade serving stack.
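For quick validation, a short script against a locally running Ollama server is often enough. The sketch below assumes Ollama is listening on its default port (11434) and uses its `/api/generate` endpoint with streaming disabled; the helper names `build_request` and `generate` are illustrative, and the model tag you pass is whatever you have already pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the model's text reply."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate(model_tag, "Write a function that reverses a string.")` with a pulled model's tag returns the reply as a string, which makes it easy to loop over a handful of prompts and compare models side by side.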
If your primary intent is specifically Ollama, use Best Ollama Models for the dedicated page.
The best local model depends on your memory budget. The top overall open-weight model is not always the best choice for a 16GB or 24GB machine. Use the hardware tier guide above to find the right fit.
Qwen2.5-Coder 32B and DeepSeek Coder V2 are strong local coding baselines. Check the coding table above and size to your VRAM; smaller setups should target the 7B–16B range.
Yes, Apple Silicon machines (M1/M2/M3/M4) handle 7B to 32B class local models well, especially with quantized weights. Because the GPU shares unified memory with the CPU, more of the machine's RAM is available for model weights than on a discrete GPU limited to its own fixed VRAM.
Self-host when privacy, data sovereignty, or predictable cost at scale matters. Use APIs when you want zero ops overhead, instant access to the latest models, or higher raw quality than local hardware can support.
8–16GB covers 7B class models. 24GB opens up the 32B range for serious coding. 40GB+ with quantization lets you run 70B models for near-frontier local quality. See the hardware tier section above for specific picks.
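These tiers follow from simple arithmetic: weight memory is roughly the parameter count times bits per weight, divided by eight. A minimal sketch, assuming 4-bit quantization and a flat 20% allowance for KV cache and runtime buffers (the overhead figure and the `estimate_memory_gb` helper are illustrative assumptions, not measured values):

```python
def estimate_memory_gb(params_billions: float, bits_per_weight: float = 4.0,
                       overhead: float = 0.20) -> float:
    """Rough memory estimate in GB for running a model locally.

    overhead is an assumed allowance for KV cache and runtime buffers;
    real usage depends on context length and the inference runtime.
    """
    weight_gb = params_billions * bits_per_weight / 8  # 1B params at 8 bits is about 1 GB
    return weight_gb * (1 + overhead)


for size in (7, 32, 70):
    print(f"{size}B at 4-bit: ~{estimate_memory_gb(size):.1f} GB")
```

Under these assumptions a 7B model needs about 4.2 GB, a 32B model about 19.2 GB, and a 70B model about 42 GB, which is roughly in line with the 8–16GB, 24GB, and 40GB+ tiers. Long contexts push the real figure higher.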
Ollama is the easiest starting point for most developers. Once you have validated a model and need production throughput, move to vLLM or TGI for more advanced serving.