Find the best AI model for your use case. Start with the live evergreen rankings for coding, open source, agentic workflows, and long context, then use the monthly archive pages when you want dated snapshots and historical ranking context.
Model rankings
Browse the latest ranking pages for overall models, coding, open source, Ollama, long context, and agentic workflows.
Live ranking of the best overall AI models by quality, price, speed, and context window.
Current coding leaderboard using LiveCodeBench, Terminal-Bench, and SciCode.
Top open-weight models for self-hosting, Ollama, and low-cost API use.
Best local AI models by hardware tier for self-hosting on Macs, RTX GPUs, and workstations.
Ollama-first picks for coding, chat, reasoning, and low-friction local inference.
Best long-context models for large documents, codebases, and retrieval-heavy workflows.
Rankings for tool use, multi-step execution, and autonomous agent workflows.
Top AI models for programming ranked by LiveCodeBench, Terminal-Bench, and SciCode.
Top open-weight models you can self-host, fine-tune, and deploy without restrictions. Featuring Kimi K2.5.
Best self-hosted models for local inference on consumer and workstation hardware.
Ollama-first recommendations for coding, general use, reasoning, and smaller local machines.
Cheapest AI APIs ranked by quality-per-dollar. Best value without sacrificing performance.
Top AI models for image understanding ranked by MMMU Pro and LM Arena Vision.
Top models for mathematical reasoning ranked by AIME 2025 and GPQA Diamond.
Top models for autonomous agents, tool use, and multi-step task completion.
AI models with the largest context windows for processing massive documents.
Our editorial picks combining benchmarks with real-world testing and experience.
GPQA Diamond, AIME 2025, LiveCodeBench, MMLU-Pro: real benchmarks, not synthetic tests.
Rankings refresh weekly as new models launch and benchmarks update.
We track pricing from 30+ API providers so you can find the best value.
Tokens per second and latency measured across different providers.
Data source: All rankings use the Artificial Analysis Intelligence Index, the most comprehensive independent evaluation of AI model quality, pricing, and speed.
It depends on your use case. For overall quality, Claude Opus 4.5 and GPT-5.2 lead. For cost-efficiency, DeepSeek V3.2 and Qwen3-235B offer 90%+ quality at 1/10th the price. For self-hosting, GLM-4.7 Thinking provides frontier-level performance under an MIT license.
We update rankings weekly as new models launch and benchmark data becomes available. Major updates (like new model releases from OpenAI, Anthropic, or Google) are reflected within 24-48 hours.
We use the Artificial Analysis Intelligence Index, which combines: GPQA Diamond (PhD-level reasoning), AIME 2025 (competition math), LiveCodeBench (fresh coding problems), and MMLU-Pro (general knowledge).
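To make the idea of a combined index concrete, here is a minimal sketch of how scores from several benchmarks could be folded into one number. The weights, scores, and equal-weighting default are illustrative assumptions, not the actual Artificial Analysis methodology.

```python
# Illustrative sketch only: combining benchmark scores into a single index.
# The weighting scheme here is an assumption, not the real index formula.

def composite_index(scores, weights=None):
    """Weighted average of benchmark scores (each assumed on a 0-100 scale)."""
    if weights is None:
        weights = {name: 1.0 for name in scores}  # equal weighting by default
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

# Hypothetical scores for one model across the four benchmarks named above.
model_scores = {
    "GPQA Diamond": 70.0,   # PhD-level reasoning
    "AIME 2025": 85.0,      # competition math
    "LiveCodeBench": 75.0,  # fresh coding problems
    "MMLU-Pro": 80.0,       # general knowledge
}

print(round(composite_index(model_scores), 1))  # 77.5
```

With equal weights this is just the mean of the four scores; a real index would likely normalize each benchmark and weight them differently.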
Use our interactive tools to explore all 100+ models with custom filters for price, speed, and benchmarks.