
The definitive ranking of open-weight AI models you can self-host, fine-tune, and deploy without restrictions. This page now uses the same live dataset and open-source filtering logic as the Explore leaderboard, so the ordering matches the table view across the app.
Top three by Quality Index: Qwen3.7 Max (Alibaba) at 56.584, Kimi K2.6 (Kimi) at 53.905, and MiMo-V2.5-Pro (Xiaomi) at 53.829.
| Rank | Model | Developer | Quality | Best Price | Top Speed | Max Context | Providers |
|---|---|---|---|---|---|---|---|
| 1 | Qwen3.7 Max | Alibaba | 56.584 | $3.75/M | 195 tok/s | 1M | 3 |
| 2 | Kimi K2.6 | Kimi | 53.905 | $1.44/M | 339 tok/s | 262K | 15 |
| 3 | MiMo-V2.5-Pro | Xiaomi | 53.829 | $0.54/M | 89 tok/s | 1M | 4 |
| 4 | Qwen3.6 Max Preview | Alibaba | 51.814 | $2.92/M | 37 tok/s | 256K | 1 |
| 5 | DeepSeek V4 Pro | DeepSeek | 51.509 | $0.54/M | 155 tok/s | 1M | 11 |
| 6 | GLM-5.1 (Reasoning) | Z AI | 51.408 | $1.66/M | 159 tok/s | 205K | 10 |
| 7 | Qwen3.6 Plus | Alibaba | 49.985 | $1.13/M | 52 tok/s | 1M | 1 |
| 8 | GLM-5 (Reasoning) | Z AI | 49.77 | $1.24/M | 179 tok/s | 203K | 8 |
| 9 | MiniMax-M2.7 | MiniMax | 49.615 | $0.52/M | 432 tok/s | 205K | 6 |
| 10 | MiMo-V2-Pro | Xiaomi | 49.202 | $1.50/M | 55 tok/s | 131K | 1 |
| 11 | MiMo-V2.5 | Xiaomi | 49.034 | $0.18/M | 242 tok/s | 1M | 3 |
| 12 | Kimi K2.5 (Reasoning) | Kimi | 46.813 | $0.90/M | 405 tok/s | 262K | 13 |
Ollama makes it easy to run open-weight models locally. Here are the top picks by hardware tier:
RTX 3070 · M2 MacBook Air
RTX 3090/4090 · M2 Pro/Max
2× RTX 4090 · Mac Studio M2 Ultra
Tip: Use 4-bit quantization (Q4_K_M) to cut VRAM requirements to roughly a third of FP16, or about half of 8-bit, with minimal quality loss. For example, Llama 3.3 70B at Q4_K_M runs in ~40GB.
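The ~40GB figure in the tip can be sanity-checked with back-of-the-envelope arithmetic: weight memory is approximately parameter count times bits per weight, divided by 8. The sketch below is an illustration, not a measurement. The function name and the bits-per-weight values are assumptions (Q4_K_M mixes block formats, so ~4.8 bits is an approximate average), and the estimate covers weights only, excluding KV cache and runtime overhead.

```python
def estimate_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes).

    Excludes KV cache, activations, and runtime overhead, so real usage is higher.
    """
    return params_billions * bits_per_weight / 8


# Assumed average bits per weight for each format (approximate values).
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.8,
}

if __name__ == "__main__":
    for fmt, bits in BITS_PER_WEIGHT.items():
        gb = estimate_weight_memory_gb(70, bits)
        print(f"70B at {fmt}: ~{gb:.0f} GB for weights")
```

Under these assumptions a 70B model needs about 140 GB at FP16 and about 42 GB at Q4_K_M, which is consistent with the ~40GB figure. Leave headroom beyond the estimate for context length, since the KV cache grows with it.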
Only models with openly available weights are included. This page applies the same grouped Explore dataset and open-source filter as the live leaderboard, then orders the resulting model families by Artificial Analysis Quality Index. Individual benchmark scores are also shown when they are available:
Comprehensive knowledge benchmark across 14 domains. Tests breadth of model capability.
Competition math that tests advanced reasoning. Best signal for math and science tasks.
Contamination-free code generation. Best signal for software development capability.
See live pricing from self-hosting providers, latency, and full benchmark scores for all open source models.
Qwen3.7 Max leads the live open-source ranking with a Quality Index of 56.584. For price-sensitive deployments, MiMo-V2.5 is the current budget leader at $0.18/M.
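The two picks above follow from a filter and two orderings: keep open-weight models, sort by Quality Index for the leader, and take the minimum price for the budget pick. The sketch below is a minimal illustration of that logic, not the site's actual implementation. The `ModelRow` type, the `open_weights` flag, and the closed-model placeholder row are hypothetical, while the other rows are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    name: str
    quality: float       # Quality Index
    price_per_m: float   # "Best Price" column, in USD
    open_weights: bool   # hypothetical flag standing in for the open-source filter


ROWS = [
    ModelRow("MiMo-V2.5", 49.034, 0.18, True),
    ModelRow("Qwen3.7 Max", 56.584, 3.75, True),
    ModelRow("Kimi K2.6", 53.905, 1.44, True),
    ModelRow("MiniMax-M2.7", 49.615, 0.52, True),
    # Hypothetical placeholder, not a real model: shows the filter excluding a row.
    ModelRow("hypothetical-closed-model", 60.0, 5.00, False),
]


def rank_open_models(rows):
    """Keep open-weight rows, then order by Quality Index, highest first."""
    return sorted(
        (r for r in rows if r.open_weights),
        key=lambda r: r.quality,
        reverse=True,
    )


def budget_leader(rows):
    """Return the cheapest open-weight row."""
    return min((r for r in rows if r.open_weights), key=lambda r: r.price_per_m)


ranked = rank_open_models(ROWS)

if __name__ == "__main__":
    print("Ranking:", [r.name for r in ranked])
    print("Budget leader:", budget_leader(ROWS).name)
```

The placeholder row has the highest score in the list but never appears in the ranking, because the filter runs before the sort.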
For Ollama and other self-hosted setups, start from the strongest live open-weight models here, then narrow by hardware tier. MiMo-V2.5-Pro is the best option here when context size matters most.
For most tasks, yes. The top open source models in 2026 trail proprietary leaders by only a small Quality Index margin. The main gaps remain in instruction-following polish, multimodal capability, and very long contexts.
As a rule of thumb, 8GB VRAM is enough for smaller 7B to 8B models, 24GB VRAM is a more practical floor for 30B-class models, and 40GB+ is usually required once you move into 70B territory unless you quantize aggressively. Apple Silicon is also viable for smaller and mid-sized open-weight models when unified memory is high enough.
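The rule of thumb above can be written as a simple threshold lookup. This is a rough sketch of the stated thresholds only: the function name and return labels are made up for illustration, and it ignores quantization level and context length, both of which shift the boundaries in practice.

```python
def largest_model_class(vram_gb: float) -> str:
    """Map available VRAM (or unified memory) to the largest model class
    the rule of thumb suggests. Thresholds come from the text: 8, 24, 40 GB."""
    if vram_gb >= 40:
        return "70B-class"
    if vram_gb >= 24:
        return "30B-class"
    if vram_gb >= 8:
        return "7B to 8B"
    return "below the smallest tier"


if __name__ == "__main__":
    for vram in (8, 24, 48):
        print(f"{vram} GB -> {largest_model_class(vram)}")
```

For example, a single 24 GB card maps to the 30B class, while a 48 GB setup clears the 40 GB threshold for 70B-class models.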
Qwen3.7 Max is the strongest coding-oriented open source model on this page. For cheap API access, MiMo-V2.5 is the most economical current pick. See the full coding LLM ranking for more detail.
Data sources: Rankings use the live Explore dataset and the open-source filter described on the methodology page. Explore the filtered table in our interactive leaderboard, or compare models side by side.