
The top 30 models ranked by quality. Updated live from Artificial Analysis data. Last updated: April 2026
Top overall: GPT-5.5 (xhigh), QI 60.238
Best value: DeepSeek V4 Flash, $0.01/M
Fastest: Llama 3.1 Instruct 8B, 2503 tok/s
Largest context: Grok 4.20 0309 v2 (Reasoning), 2.0M
| # | Model | Provider | Quality (QI) | Price / 1M tokens | Speed | Context |
|---|---|---|---|---|---|---|
| 1 | GPT-5.5 (xhigh) | OpenAI | 60.238 | $11.3 | 64 tok/s | 1.1M |
| 2 | GPT-5.5 (high) | OpenAI | 58.868 | $11.3 | 62 tok/s | 1.1M |
| 3 | Claude Opus 4.7 | Anthropic | 57.277 | $10.0 | 77 tok/s | 1.0M |
| 4 | Gemini 3.1 Pro Preview | Google | 57.175 | $4.5 | 138 tok/s | 1.0M |
| 5 | GPT-5.4 (xhigh) | OpenAI | 56.8 | $5.6 | 81 tok/s | 1.1M |
| 6 | GPT-5.5 (medium) | OpenAI | 56.711 | $11.3 | 55 tok/s | 1.1M |
| 7 | Gemini 3.5 Flash (high) | Google | 55.329 | $3.4 | 232 tok/s | 1.0M |
| 8 | Kimi K2.6 | Kimi | 53.905 | $1.4 | 253 tok/s | 262K |
| 9 | MiMo-V2.5-Pro | Xiaomi | 53.829 | $1.2 | 59 tok/s | 1.1M |
| 10 | GPT-5.3 Codex (xhigh) | OpenAI | 53.561 | $4.8 | 67 tok/s | 400K |
| 11 | Grok 4.3 (high) | xAI | 53.196 | $1.6 | 94 tok/s | 1.0M |
| 12 | Claude Opus 4.6 | Anthropic | 52.949 | $10.9 | 56 tok/s | 1.0M |
| 13 | Qwen3.6 Max Preview | Alibaba | 51.814 | $2.9 | 36 tok/s | 256K |
| 14 | Claude Sonnet 4.6 | Anthropic | 51.719 | $6.6 | 77 tok/s | 1.0M |
| 15 | DeepSeek V4 Pro | DeepSeek | 51.509 | $1.4 | 171 tok/s | 1.0M |
| 16 | GLM-5.1 (Reasoning) | Z AI | 51.408 | $1.7 | 176 tok/s | 205K |
| 17 | GPT-5.2 (xhigh) | OpenAI | 51.277 | $4.8 | 72 tok/s | 400K |
| 18 | GPT-5.5 (low) | OpenAI | 50.779 | $11.3 | 54 tok/s | 1.1M |
| 19 | Qwen3.6 Plus | Alibaba | 49.985 | $1.1 | 53 tok/s | 1.0M |
| 20 | GLM-5 (Reasoning) | Z AI | 49.77 | $1.2 | 180 tok/s | 203K |
| 21 | Claude Opus 4.5 (Reasoning) | Anthropic | 49.727 | $10.0 | 55 tok/s | 200K |
| 22 | MiniMax-M2.7 | MiniMax | 49.615 | $0.52 | 452 tok/s | 205K |
| 23 | Grok 4.20 0309 v2 (Reasoning) | xAI | 49.332 | $3.0 | 230 tok/s | 2.0M |
| 24 | MiMo-V2-Pro | Xiaomi | 49.202 | $1.5 | 65 tok/s | 131K |
| 25 | GPT-5.2 Codex (xhigh) | OpenAI | 49.035 | $4.8 | 105 tok/s | 400K |
| 26 | MiMo-V2.5 | Xiaomi | 49.034 | $0.64 | 91 tok/s | 1.1M |
| 27 | GPT-5.4 mini (xhigh) | OpenAI | 48.904 | $1.7 | 162 tok/s | 400K |
| 28 | Grok 4.20 0309 (Reasoning) | xAI | 48.481 | $3.0 | 103 tok/s | 2.0M |
| 29 | Gemini 3 Pro Preview (high) | Google | 48.394 | $4.5 | 140 tok/s | 1.0M |
| 30 | GPT-5.4 (low) | OpenAI | 47.941 | $5.6 | 62 tok/s | 1.1M |
Showing top 30 of 309 ranked models
View all in Explore →
Each guide goes deeper than the quick filters, with methodology, benchmarks, and picks per scenario.
Model rankings
Browse the latest ranking pages for overall models, coding, open source, Ollama, long context, and agentic workflows.
Current coding leaderboard using LiveCodeBench, Terminal-Bench, and SciCode.
Top open-weight models for self-hosting, Ollama, and low-cost API use.
Best local AI models by hardware tier for self-hosting on Macs, RTX GPUs, and workstations.
Ollama-first picks for coding, chat, reasoning, and low-friction local inference.
Best long-context models for large documents, codebases, and retrieval-heavy workflows.
Rankings for tool use, multi-step execution, and autonomous agent workflows.
Every model is scored using the Artificial Analysis Intelligence Index, a composite of GPQA Diamond, AIME 2025, LiveCodeBench, MMLU-Pro, and other benchmarks, weighted into a single 0-100 quality score. Speed, price, and context window are tracked live across providers.
The overall ranking is a starting point. For production decisions, narrow by use case using the category pages above, then compare finalists head-to-head on Compare.
GPT-5.5 (xhigh) leads on overall quality right now, but the best model depends on your priorities. Coding, cost, speed, and context length all shift the answer. Use the category rankings above to find the right fit.
DeepSeek V4 Flash currently offers one of the best quality-to-cost ratios. Open-source models on providers like Groq or Together can be even cheaper at strong quality levels.
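One way to make "quality-to-cost ratio" concrete is to divide the quality index by the price per million tokens. The page does not state how its "Best value" pick is computed, so the metric below is an assumption, and `quality_per_dollar` and `rank_by_value` are hypothetical helper names. The rows are a small sample copied from the top-30 table; a rough sketch:

```python
# Sample rows copied from the top-30 table:
# (model, quality index, price in $ per 1M tokens).
ROWS = [
    ("GPT-5.5 (xhigh)", 60.238, 11.3),
    ("Gemini 3.1 Pro Preview", 57.175, 4.5),
    ("Kimi K2.6", 53.905, 1.4),
    ("DeepSeek V4 Pro", 51.509, 1.4),
    ("MiniMax-M2.7", 49.615, 0.52),
    ("MiMo-V2.5", 49.034, 0.64),
]


def quality_per_dollar(quality, price_per_million):
    """Assumed value metric: quality index points per dollar per 1M tokens."""
    return quality / price_per_million


def rank_by_value(rows):
    """Sort rows from best to worst value under the assumed metric."""
    return sorted(
        rows,
        key=lambda row: quality_per_dollar(row[1], row[2]),
        reverse=True,
    )


ranked = rank_by_value(ROWS)
for name, quality, price in ranked:
    print(f"{name}: {quality_per_dollar(quality, price):.1f} QI points per $")
```

A plain ratio rewards very cheap models heavily, so it is usually worth pairing it with a minimum quality threshold before comparing.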
Start with overall quality index, then narrow by what matters for your workload: cost per million tokens, output speed, context window, or a specific capability like coding or tool use. Use our Compare tool to put finalists head to head.
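The narrowing process described here can be sketched as a filter over the table rows followed by a sort on quality. The `Model` class, the `shortlist` function, and the threshold values are illustrative assumptions rather than anything the site exposes, and the context sizes are the table's rounded figures converted to token counts:

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    quality: float  # quality index from the table
    price: float    # $ per 1M tokens
    speed: int      # output tokens per second
    context: int    # context window in tokens (rounded, as in the table)


# Sample rows copied from the top-30 table.
MODELS = [
    Model("GPT-5.5 (xhigh)", 60.238, 11.3, 64, 1_100_000),
    Model("Gemini 3.5 Flash (high)", 55.329, 3.4, 232, 1_000_000),
    Model("Kimi K2.6", 53.905, 1.4, 253, 262_000),
    Model("Grok 4.3 (high)", 53.196, 1.6, 94, 1_000_000),
    Model("DeepSeek V4 Pro", 51.509, 1.4, 171, 1_000_000),
    Model("MiniMax-M2.7", 49.615, 0.52, 452, 205_000),
]


def shortlist(models, max_price=None, min_speed=None, min_context=None):
    """Keep models that meet every given constraint, best quality first."""
    kept = [
        m for m in models
        if (max_price is None or m.price <= max_price)
        and (min_speed is None or m.speed >= min_speed)
        and (min_context is None or m.context >= min_context)
    ]
    return sorted(kept, key=lambda m: m.quality, reverse=True)


# Example workload: cheap, fast, and long-context.
finalists = shortlist(MODELS, max_price=2.0, min_speed=150, min_context=1_000_000)
print([m.name for m in finalists])
```

Leaving a constraint as `None` skips it, so the same function covers a cost-only or speed-only search.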
Llama 3.1 Instruct 8B leads on output speed right now at 2503 tokens/second. Speed matters most for real-time applications and agentic workflows with many sequential steps.
Grok 4.20 0309 v2 (Reasoning) has the biggest context window in this ranking at 2.0M. For a dedicated long-context comparison, see our largest context window page.
Data is pulled from Artificial Analysis and refreshed automatically. New models appear as soon as they have benchmark scores and provider endpoints. The ranking reflects the live state of the leaderboard.