
Best Open Source LLM: 2026 Ranking + Ollama Guide

The definitive ranking of open-weight AI models you can self-host, fine-tune, and deploy without restrictions. Ranked by real benchmarks — MMLU-Pro, AIME 2025, and LiveCodeBench. Includes Ollama setup recommendations. Updated weekly.

Self-hostable · Ollama compatible · Free to use · Fine-tunable


Full Open Source LLM Rankings 2026

| Rank | Model | Maker | Quality | MMLU-Pro | AIME 2025 | LiveCodeBench | Context |
|------|-------|-------|---------|----------|-----------|---------------|---------|
| 1 | GLM-5 (Reasoning) | Z AI | 49.64 | – | – | – | 203K |
| 2 | Kimi K2.5 (Reasoning) | Kimi | 46.73 | – | 96% | 85% | 256K |
| 3 | MiniMax-M2.5 | MiniMax | 41.97 | – | – | – | 205K |
| 4 | GLM-4.7 (Thinking) | Z AI | 41.7 | 86% | 95% | 89% | 200K |
| 5 | DeepSeek V3.2 | DeepSeek | 41.2 | 86% | 92% | 86% | 128K |
| 6 | Kimi K2 Thinking | Kimi | 40.3 | 85% | 95% | 85% | 256K |
| 7 | MiniMax-M2.1 | MiniMax | 39.3 | 88% | 83% | 81% | 205K |
| 8 | MiMo-V2-Flash | Xiaomi | 39 | 84% | 96% | 87% | 256K |
| 9 | Llama Nemotron Ultra | NVIDIA | 38 | 83% | 64% | 64% | 128K |
| 10 | MiniMax-M2 | MiniMax | 35.7 | 82% | 78% | 83% | 205K |
| 11 | DeepSeek V3.2 Speciale | DeepSeek | 34.1 | 86% | 97% | 90% | 128K |
| 12 | DeepSeek V3.1 Terminus | DeepSeek | 33.4 | 85% | 90% | 80% | 128K |

Best Open Source Models for Coding

Ranked by LiveCodeBench score, a contamination-resistant benchmark for code generation.

| Model | Maker | LiveCodeBench | Quality Index | Best For |
|-------|-------|---------------|---------------|----------|
| DeepSeek V3.2 Speciale | DeepSeek | 90% | 34.1 | Top open source pick overall |
| GLM-4.7 (Thinking) | Z AI | 89% | 41.7 | Best for API use (cheap) |
| MiMo-V2-Flash | Xiaomi | 87% | 39 | Best for Ollama / local |
| DeepSeek V3.2 | DeepSeek | 86% | 41.2 | Best for reasoning-heavy tasks |
| Kimi K2.5 (Reasoning) | Kimi | 85% | 46.73 | Strong alternative |

Best Ollama Models 2026

Ollama makes it easy to run open-weight models locally. Here are the top picks by hardware tier:

8GB VRAM

RTX 3070 · M2 MacBook Air

  • Gemma 3 4B (general)
  • Qwen2.5 7B (coding)
  • Llama 3.1 8B (fast)

16–24GB VRAM

RTX 3090/4090 · M2 Pro/Max

  • Qwen2.5-Coder 32B (coding)
  • DeepSeek Coder V2 16B
  • Mistral Small 22B (general)

48GB+ VRAM

2× RTX 4090 · Mac Studio M2 Ultra

  • Llama 3.3 70B (best general)
  • DeepSeek R1 70B (reasoning)
  • Qwen2.5 72B (top performer)
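The tier picks above can be sketched as a small helper that maps a VRAM budget to an `ollama pull` command. The model tags below are assumptions based on typical Ollama library naming; verify the exact tags against the Ollama model library before pulling.

```python
# Map a VRAM budget (GB) to a recommended Ollama model, per the tiers above.
# Tags are assumptions based on typical Ollama library naming -- verify
# against the Ollama model library before pulling.
TIERS = [
    (48, "llama3.3:70b"),        # 2x RTX 4090 / Mac Studio class
    (24, "qwen2.5-coder:32b"),   # RTX 3090/4090 class
    (16, "mistral-small:22b"),   # upper mid-range cards
    (8,  "qwen2.5:7b"),          # RTX 3070 / M2 MacBook Air class
]

def pull_command(vram_gb: int) -> str:
    """Return the `ollama pull` command for the largest tier that fits."""
    for min_vram, tag in TIERS:
        if vram_gb >= min_vram:
            return f"ollama pull {tag}"
    return "ollama pull gemma3:4b"  # smallest fallback for <8 GB

print(pull_command(24))  # -> ollama pull qwen2.5-coder:32b
```

Running the printed command downloads the weights; `ollama run <tag>` then starts an interactive session.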

Tip: Use 4-bit quantization (Q4_K_M) to roughly halve VRAM requirements with minimal quality loss. For example, Llama 3.3 70B at Q4_K_M runs in ~40GB.
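The tip's arithmetic can be checked directly: Q4_K_M stores roughly 4.5–5 effective bits per weight (versus 16 for FP16), so weight memory is parameter count × bits ÷ 8. A quick sketch, treating 4.5 bits as an approximation and ignoring KV-cache and runtime overhead, which add several GB in practice:

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: params * bits / 8 bits-per-byte / 1e9."""
    return params * bits_per_weight / 8 / 1e9

# Llama 3.3 70B: FP16 vs. an assumed ~4.5 effective bits for Q4_K_M.
print(round(weight_gb(70e9, 16), 1))   # -> 140.0 (full precision)
print(round(weight_gb(70e9, 4.5), 1))  # -> 39.4  (~40 GB, matching the tip)
```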

Which Open Source LLM Should You Use?

By Use Case

  • Coding: GLM-4.7 Thinking or DeepSeek V3.2 — best LiveCodeBench scores in open source
  • Math & reasoning: Kimi K2.5 or DeepSeek R1 — competitive with o3 on AIME 2025
  • General tasks: Llama 3.3 70B or Qwen2.5 72B — best all-around open models
  • Long documents: Llama 4 Scout (10M ctx) or Kimi K2.5 (256K ctx)

By Deployment

  • Cheap API: DeepSeek V3.2 ($0.35/M tokens) — best cost-performance in 2026
  • Self-host / Ollama: Qwen2.5 32B or Llama 3.3 70B — best community support
  • Fine-tuning: Llama 3.1 8B or Mistral 7B — most LoRA adapters available
  • Commercial use: Check Apache 2.0 models — GLM-4.7, EXAONE 4.0, Mistral family
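A per-million-token price like the $0.35/M quoted above converts to a bill as tokens ÷ 1,000,000 × rate. A minimal sketch, treating the article's figure as a single flat rate (real pricing usually splits input and output tokens):

```python
def api_cost(tokens: int, usd_per_million: float = 0.35) -> float:
    """Cost in USD for `tokens` tokens at a flat per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million

# e.g. a month of heavy use at 50M tokens:
print(f"${api_cost(50_000_000):.2f}")  # -> $17.50
```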

How This Ranking Works

Only models with openly available weights (Apache 2.0, MIT, Llama community license, or similar open licenses) are included. Rankings use the Artificial Analysis Quality Index as the primary metric, combined with:

MMLU-Pro

Comprehensive knowledge benchmark across 14 domains. Tests breadth of model capability.

AIME 2025

Competition math — tests advanced reasoning. Best signal for math and science tasks.

LiveCodeBench

Contamination-free code generation. Best signal for software development capability.
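To make the idea of combining benchmarks concrete, here is an illustrative blend of the three scores into a single number. The equal weights are hypothetical, not Artificial Analysis's actual Quality Index formula; the sketch also renormalizes when a model is missing a benchmark, as several entries in the table are.

```python
# Illustrative only: equal-weight blend of the three benchmarks.
# The real Quality Index uses Artificial Analysis's own weighting.
WEIGHTS = {"mmlu_pro": 1 / 3, "aime_2025": 1 / 3, "livecodebench": 1 / 3}

def blended_score(scores: dict) -> float:
    """Weighted average over the benchmarks present in `scores` (0-100 scale),
    renormalizing the weights when a benchmark is missing."""
    present = {k: w for k, w in WEIGHTS.items() if k in scores}
    total = sum(present.values())
    return sum(scores[k] * w / total for k, w in present.items())

# GLM-4.7 (Thinking), using the table's numbers:
print(round(blended_score({"mmlu_pro": 86, "aime_2025": 95, "livecodebench": 89}), 1))
# -> 90.0
```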


Frequently Asked Questions

What is the best open source LLM in 2026?

GLM-5 (Reasoning) leads open source rankings in 2026 with a Quality Index of 49.64. For API use, DeepSeek V3.2 is the best value at $0.35/M tokens. For Ollama/local use, Qwen2.5-Coder 32B and Llama 3.3 70B have the best community support.

What are the best Ollama models in 2026?

For Ollama in 2026: Qwen2.5-Coder 32B for coding (needs 24GB VRAM), Llama 3.3 70B for general tasks (needs 40GB at Q4), and DeepSeek R1 distilled variants for reasoning. For 8GB VRAM, Gemma 3 4B and Qwen2.5 7B are the best small models.

Can open source LLMs match GPT-5 or Claude in 2026?

For most tasks, yes. The top open source models in 2026 trail proprietary leaders by only 3–8 Quality Index points. On math benchmarks, DeepSeek R1 actually surpasses many proprietary alternatives. The main gaps remain in instruction-following polish, multimodal capability, and very long contexts.

What hardware do I need to run LLMs locally?

Minimum recommendations: 8GB VRAM for 7B models (Gemma 3 4B, Qwen2.5 7B), 24GB VRAM for 32B models (Qwen2.5-Coder 32B, DeepSeek Coder V2 16B), 40GB+ for 70B models at 4-bit quantization. Apple Silicon (M-series) is excellent — 16GB unified memory handles 7B models comfortably, and 64GB handles 32B models well.

Which open source model is best for coding?

GLM-4.7 Thinking leads open source coding models with 89% on LiveCodeBench. For Ollama users, Qwen2.5-Coder 32B is the best local option. For cheap API access, DeepSeek V3.2 at $0.35/M tokens delivers 86% LiveCodeBench — matching Claude 3.5 Sonnet. See the full coding LLM ranking for more detail.


Data sources: Rankings based on the Artificial Analysis Intelligence Index.