Every pick below is matched to a specific use case and VRAM tier โ so you can go straight from decision to a running model without wading through specs that don't apply to your hardware.
Best Ollama model for coding
Best blend of local coding strength, community support, and realistic hardware requirements.
Best Ollama model for general use
Strong general-purpose local model when you have enough memory for a serious setup.
Best small Ollama model
A practical choice for smaller laptops and lower-memory machines.
Best Ollama model for reasoning
One of the strongest local reasoning-oriented models if you can afford the memory footprint.
Match the model to the box you actually own. Ollama works best when you avoid oversized models that turn every response into a latency test.
Best when you want fast experimentation on mainstream hardware.
Best for local coding assistants and higher-quality daily use.
Best when you want near-frontier local quality and can pay the memory cost.
Start with use case and memory budget. If you want local coding, pick a code-specialized model first. If you want a general local assistant, pick a model with stronger overall quality and good community support.
Then optimize for speed. A slightly smaller model that runs well locally is often better than a much larger model you avoid using because it is too slow.
Ollama is the easiest local runtime for exploration, prototyping, and small-scale daily use. It removes a lot of friction compared with heavier local stacks.
For broader self-hosted rankings, use Best Local LLM. For overall open-weight quality, use Best Open Source LLM.
Ollama is one deployment path. The broader open-weight quality leader on WhatLLM right now is Kimi K2.6, which is why you should still check the full open-source ranking before deciding whether a local/Ollama-first compromise is worth it.
Model rankings
Browse the latest ranking pages for overall models, coding, open source, Ollama, long context, and agentic workflows.
Live ranking of the best overall AI models by quality, price, speed, and context window.
Current coding leaderboard using LiveCodeBench, Terminal-Bench, and SciCode.
Top open-weight models for self-hosting, Ollama, and low-cost API use.
Best local AI models by hardware tier for self-hosting on Macs, RTX GPUs, and workstations.
Best long-context models for large documents, codebases, and retrieval-heavy workflows.
Rankings for tool use, multi-step execution, and autonomous agent workflows.
Top picks include Qwen2.5-Coder 32B for coding, Llama 3.3 70B for general use on larger setups, Gemma 3 4B for small hardware, and DeepSeek R1 distills for local reasoning workflows.
Qwen2.5-Coder 32B is one of the best local coding picks for Ollama if you have enough memory. Smaller setups can use Qwen2.5 7B or DeepSeek Coder V2 16B.
Llama 3.3 70B is a strong general-purpose Ollama choice when you can support it locally. Smaller machines should bias toward smaller Gemma or Qwen options.
Stay in the smallest tier. Gemma 3 4B and Qwen2.5 7B are much more realistic than trying to force larger models onto underpowered hardware.
Yes. Apple Silicon Macs (M1/M2/M3/M4) are excellent for Ollama. The unified memory architecture means you can run larger models than you could on a comparable GPU-only machine.
Use Ollama for convenience and fast local iteration. Switch to vLLM or TGI when you need production-grade batching, higher throughput, or multi-GPU serving.