This page covers the broad “best AI models”, “top AI models”, and “AI model ranking” search intents. Use it to find the strongest models overall, then branch into the dedicated rankings for coding, open source, long context, or agentic use.
- Top overall: Gemini 3.1 Pro Preview (Quality Index 57.18)
- Best value: MiniMax-M2.7 ($0.53/M blended)
- Fastest: Grok 4.20 0309 v2 (Reasoning), 151 tok/s
- Largest context: Grok 4.20 0309 v2 (Reasoning), 2.0M tokens
Overall ranking is useful when you want the strongest frontier options before filtering for price, latency, or a specific workflow.
| Rank | Model | Provider | Quality Index | Blended Price ($/M tokens) | Output Speed (tok/s) | Context Window |
|---|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro Preview | Google | 57.18 | $4.50 | 125 | 1.0M |
| 2 | GPT-5.4 (xhigh) | OpenAI | 56.8 | $5.63 | 78 | 1.1M |
| 3 | GPT-5.3 Codex (xhigh) | OpenAI | 53.56 | $4.81 | 69 | 400K |
| 4 | GLM-5.1 (Reasoning) | Z AI | 51.41 | $2.15 | 101 | 203K |
| 5 | GPT-5.2 (xhigh) | OpenAI | 51.28 | $4.81 | 65 | 400K |
| 6 | Qwen3.6 Plus | Alibaba | 49.98 | $1.13 | 50 | 1.0M |
| 7 | GLM-5 (Reasoning) | Z AI | 49.77 | $1.55 | 45 | 203K |
| 8 | Claude Opus 4.5 (Reasoning) | Anthropic | 49.73 | $10.00 | 54 | 200K |
| 9 | MiniMax-M2.7 | MiniMax | 49.62 | $0.53 | 42 | 205K |
| 10 | Grok 4.20 0309 v2 (Reasoning) | xAI | 49.33 | $3.00 | 151 | 2.0M |
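A shortlist like this can be narrowed programmatically before a deeper comparison. A minimal sketch in Python, using a few rows copied from the table above; the price cap and speed floor are arbitrary assumptions for illustration, not site recommendations:

```python
# Filter a leaderboard by a blended-price cap and an output-speed floor.
# Figures copied from the ranking table above; thresholds are arbitrary.
models = [
    {"name": "Gemini 3.1 Pro Preview", "quality": 57.18, "price": 4.50, "speed": 125, "context_k": 1000},
    {"name": "GLM-5.1 (Reasoning)",    "quality": 51.41, "price": 2.15, "speed": 101, "context_k": 203},
    {"name": "Qwen3.6 Plus",           "quality": 49.98, "price": 1.13, "speed": 50,  "context_k": 1000},
    {"name": "MiniMax-M2.7",           "quality": 49.62, "price": 0.53, "speed": 42,  "context_k": 205},
]

def shortlist(models, max_price, min_speed):
    """Keep models at or under a blended $/M price cap and at or over a
    tok/s floor, ranked by quality index (highest first)."""
    ok = [m for m in models if m["price"] <= max_price and m["speed"] >= min_speed]
    return sorted(ok, key=lambda m: m["quality"], reverse=True)

for m in shortlist(models, max_price=3.00, min_speed=60):
    print(m["name"])  # only GLM-5.1 (Reasoning) meets both constraints here
```

Tightening either constraint shrinks the list quickly, which is why the category pages matter: the overall winner often drops out once a real latency or budget cap is applied.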
Start with overall quality, but do not stop there. A model that leads the overall leaderboard can still be a bad fit if you need low latency, a huge context window, or open weights for self-hosting.
For production decisions, compare your final shortlist on the Compare page and validate provider-level tradeoffs on the Compare Providers page.
- Best coding models: for software development, debugging, and code generation.
- Best open source models: for self-hosting, Ollama, and budget-sensitive workloads.
- Best local LLMs: for self-hosting on local hardware by VRAM tier.
- Best Ollama models: for Ollama-first local inference and simple setup.
- Largest context window models: for large documents, codebases, and retrieval-heavy pipelines.
- Best agentic models: for tools, task execution, and autonomous workflows.
SEO Hubs
Start with the evergreen pages below. They align to the highest-intent SEO clusters and are built to stay current as model rankings change.
- Best coding models: current coding leaderboard using LiveCodeBench, Terminal-Bench, and SciCode.
- Best open source models: top open-weight models for self-hosting, Ollama, and low-cost API use.
- Best local LLMs: best local AI models by hardware tier for self-hosting on Macs, RTX GPUs, and workstations.
- Best Ollama models: Ollama-first picks for coding, chat, reasoning, and low-friction local inference.
- Largest context window models: best long-context models for large documents, codebases, and retrieval-heavy workflows.
- Best agentic models: rankings for tool use, multi-step execution, and autonomous agent workflows.
The best model depends on whether you care most about raw quality, price efficiency, output speed, or context window. This page is the best starting point for the overall ranking; from there, move into the dedicated category pages for the final decision.
For coding-specific picks, use the coding ranking: coding performance can diverge meaningfully from the overall leaderboard.
The best cheap model changes often, which is why WhatLLM tracks live pricing. A good value model is one that preserves strong quality while materially reducing blended token cost.
Start broad on this page, then compare finalists side by side. You want to evaluate benchmarks, provider pricing, output speed, and context window together rather than in isolation.