Self-host, fine-tune, and deploy without restrictions. Open source models now match proprietary alternatives—hitting 90% on LiveCodeBench and 97% on AIME 2025.
Keep all data on your infrastructure. No API calls to third parties. Critical for healthcare, legal, and enterprise applications.
Fine-tune for your specific use case. Modify behavior, remove guardrails, or train on proprietary data. No terms of service limitations.
At high volumes, self-hosting becomes dramatically cheaper. No per-token fees—just infrastructure costs.
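Where that crossover lands depends on your token volume. A minimal break-even sketch, assuming illustrative prices (roughly $0.50 per million tokens hosted vs. a $1,500/month GPU server; both numbers are assumptions, not quotes):

```python
def api_cost_usd(tokens: int, price_per_million: float) -> float:
    """Monthly API spend at a flat per-token price."""
    return tokens / 1_000_000 * price_per_million

def breakeven_tokens(gpu_monthly_usd: float, price_per_million: float) -> int:
    """Monthly token volume at which self-hosting matches API spend."""
    return int(gpu_monthly_usd / price_per_million * 1_000_000)

# Illustrative: at 5B tokens/month and $0.50/M, hosted APIs cost $2,500/month,
# while the break-even against a $1,500/month server sits at 3B tokens/month.
print(api_cost_usd(5_000_000_000, 0.50))
print(breakeven_tokens(1500, 0.50))
```

Below the break-even volume the hosted API wins; above it, the fixed infrastructure cost amortizes away and self-hosting pulls ahead.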
| Rank | Model | Provider | Quality Index | LiveCodeBench | AIME 2025 | MMLU-Pro |
|---|---|---|---|---|---|---|
| 1 | GLM-5 (Reasoning) | Z AI | 49.64 | - | - | - |
| 2 | Kimi K2.5 (Reasoning) | Kimi | 46.73 | 85% | 96% | - |
| 3 | MiniMax-M2.5 | MiniMax | 41.97 | - | - | - |
| 4 | GLM-4.7 (Thinking) | Z AI | 41.7 | 89% | 95% | 86% |
| 5 | DeepSeek V3.2 | DeepSeek | 41.2 | 86% | 92% | 86% |
| 6 | Kimi K2 Thinking | Kimi | 40.3 | 85% | 95% | 85% |
| 7 | MiniMax-M2.1 | MiniMax | 39.3 | 81% | 83% | 88% |
| 8 | MiMo-V2-Flash | Xiaomi | 39 | 87% | 96% | 84% |
| 9 | Llama Nemotron Ultra | NVIDIA | 38 | 64% | 64% | 83% |
| 10 | MiniMax-M2 | MiniMax | 35.7 | 83% | 78% | 82% |
| 11 | DeepSeek V3.2 Speciale | DeepSeek | 34.1 | 90% | 97% | 86% |
| 12 | DeepSeek V3.1 Terminus | DeepSeek | 33.4 | 80% | 90% | 85% |
The verdict: for coding tasks, open source models now match or exceed proprietary alternatives; DeepSeek V3.2 Speciale posts 90% on LiveCodeBench, the top coding score in our table. The gap has effectively closed for most practical applications.
Use hosted APIs for open models. Get the benefits of open source with the ease of SaaS.
- **Together.ai** — Competitive pricing, great selection
- **Fireworks.ai** — Fast inference, good for production
- **Groq** — Fastest inference available
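Hosted providers like these typically expose an OpenAI-compatible `/chat/completions` endpoint, so switching between them is mostly a base-URL change. A minimal sketch using only the standard library; the base URL and model name are placeholders, so check your provider's docs for the real values:

```python
import json
import urllib.request

# Hypothetical endpoint; substitute your provider's OpenAI-compatible base URL.
BASE_URL = "https://api.example-provider.com/v1"

def build_chat_request(model: str, prompt: str, api_key: str):
    """Assemble an OpenAI-style chat completion request (url, headers, body)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return f"{BASE_URL}/chat/completions", headers, body

def chat(model: str, prompt: str, api_key: str) -> str:
    """Send the request and extract the assistant's reply."""
    url, headers, body = build_chat_request(model, prompt, api_key)
    req = urllib.request.Request(url, data=json.dumps(body).encode(), headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the request shape is the same everywhere, you can benchmark several providers against the same prompts before committing to one.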
Run models on your own infrastructure for maximum control and privacy.
- **Ollama** — Easiest local setup, great for development
- **vLLM** — Production-grade serving, excellent throughput
- **Text Generation Inference** — HuggingFace's production server
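For local development, Ollama serves an HTTP API on `localhost:11434` once a model is pulled. A minimal sketch of calling it from Python (the model name is an example; use whatever you have pulled locally):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate route; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST the prompt to the local Ollama server and return the completion text."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Assumes the model was pulled first, e.g. with `ollama pull llama3.3`:
# print(generate("llama3.3", "Summarize the key risks in this clause: ..."))
```

Nothing leaves your machine: the prompt and the completion stay on localhost, which is the point of self-hosting for sensitive workloads.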
Use our interactive tool to compare benchmarks, parameters, and provider pricing for all 12 open source models.
As of January 2026, GLM-5 (Reasoning) leads our open source rankings with the highest Quality Index (49.64). It's completely free to download and use under an open license.
The best free AI models in 2026 are GLM-5 (Reasoning) (Quality Index 49.64), Kimi K2.5 (Reasoning) (46.73), and MiniMax-M2.5 (41.97). All can be self-hosted without licensing fees. For API access without self-hosting, providers like Together.ai and Fireworks.ai offer competitive pricing starting at $0.20-0.50 per million tokens.
Yes, for most tasks. Top open models such as DeepSeek V3.2 Speciale reach 90% on LiveCodeBench, competitive with GPT-5 on coding tasks. The gap has closed dramatically: open models now trail proprietary ones by only 5-7 quality index points on average.
For Ollama users, we recommend: Qwen2.5-72B for general tasks, DeepSeek-Coder-V2 for coding, Llama-3.3-70B for reasoning, and Mixtral-8x22B for cost-effective performance. All run efficiently with Ollama's quantization.
7B-13B models run on consumer GPUs (16GB+ VRAM, e.g. an RTX 4090). 70B+ models need enterprise GPUs (A100/H100) or multi-GPU setups. For cost-effective local deployment, use quantized versions (GGUF, AWQ), which reduce memory by 50-75% with minimal quality loss.
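The memory math behind those numbers is simple: weight memory is roughly parameter count times bytes per parameter, plus runtime overhead for the KV cache and activations. A back-of-envelope estimator (the 10% overhead factor is an assumption; real usage varies with context length and serving stack):

```python
# Bytes per parameter at common precisions; 4-bit is ~75% smaller than fp16,
# matching the 50-75% savings quoted for GGUF/AWQ quantization.
BYTES_PER_PARAM = {
    "fp16": 2.0,   # half-precision weights
    "q8": 1.0,     # 8-bit quantization
    "q4": 0.5,     # 4-bit quantization (e.g. GGUF Q4, AWQ)
}

def estimate_vram_gb(params_billions: float, quant: str, overhead: float = 1.1) -> float:
    """Approximate VRAM (decimal GB) for weights plus ~10% runtime overhead."""
    return params_billions * BYTES_PER_PARAM[quant] * overhead

for quant in ("fp16", "q8", "q4"):
    print(f"70B @ {quant}: ~{estimate_vram_gb(70, quant):.0f} GB")
```

Running this shows why a 70B model needs multi-GPU hardware at fp16 (~150 GB) but squeezes toward a two-GPU or single-H100 setup once quantized to 4-bit (~40 GB).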
Open weights means you can download and use the model, but the license may carry restrictions. Fully open source releases (like Llama 3.1 and Qwen 2.5) include training code and permissive licenses. All models in our ranking allow commercial use; check specific licenses for fine-tuning and redistribution terms.
Llama 4 Scout supports up to 10 million tokens of context. MiniMax-Text-01 offers 4 million tokens. For practical use, most models like GLM-5 (Reasoning) offer 128K-256K tokens, which handles most real-world document processing needs.
Data sources: Rankings based on the Artificial Analysis Intelligence Index. Explore all models in our interactive explorer or compare models side-by-side.