🔓 Updated February 2026

Best Open Source LLMs
February 2026 Rankings

Self-host, fine-tune, and deploy without restrictions. Open source models now hit 90% on LiveCodeBench and 97% on AIME 2025 - rivaling the best proprietary models.

12 Models Ranked
$0 License Cost
Fine-tuning Freedom

What Changed Since January

New: Biggest Entry

GLM-5 (Reasoning)

Z AI - 203K context

Debuts at #1 with a Quality Index of 49.64, dethroning Kimi K2.5 (Reasoning) for the top spot. Open-weight, commercially usable. Also new: MiniMax-M2.5 enters the rankings.

Month-over-Month

  • GLM-5 (Reasoning) debuts at #1 with Quality Index 49.64
  • Kimi K2.5 (Reasoning) drops to #2 (QI 46.73)
  • MiniMax-M2.5 enters the top tier (QI 41.97)
  • MiMo-V2-Flash (Xiaomi) surges to QI 41.42


Complete Open Source Rankings - February 2026

| Rank | Model | Provider | Quality | LiveCodeBench | AIME 2025 | MMLU-Pro |
|------|-------|----------|---------|---------------|-----------|----------|
| 1 | GLM-5 (Reasoning) | Z AI | 49.64 | - | - | - |
| 2 | Kimi K2.5 (Reasoning) | Kimi | 46.73 | 85% | 96% | - |
| 3 | MiniMax-M2.5 | MiniMax | 41.97 | - | - | - |
| 4 | GLM-4.7 (Thinking) | Z AI | 41.7 | 89% | 95% | 86% |
| 5 | DeepSeek V3.2 | DeepSeek | 41.2 | 86% | 92% | 86% |
| 6 | Kimi K2 Thinking | Kimi | 40.3 | 85% | 95% | 85% |
| 7 | MiniMax-M2.1 | MiniMax | 39.3 | 81% | 83% | 88% |
| 8 | MiMo-V2-Flash | Xiaomi | 39 | 87% | 96% | 84% |
| 9 | Llama Nemotron Ultra | NVIDIA | 38 | 64% | 64% | 83% |
| 10 | MiniMax-M2 | MiniMax | 35.7 | 83% | 78% | 82% |
| 11 | DeepSeek V3.2 Speciale | DeepSeek | 34.1 | 90% | 97% | 86% |
| 12 | DeepSeek V3.1 Terminus | DeepSeek | 33.4 | 80% | 90% | 85% |
Rankings use the Artificial Analysis Intelligence Index as the primary score, with category-specific benchmarks as tiebreakers. Data refreshed daily. Explore all 100+ models →
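The ranking rule above (Quality Index first, category benchmarks as tiebreakers) can be sketched in a few lines. This is an illustrative sketch only, not the Artificial Analysis implementation; the scores are from the table above, and the helper name `rank_key` is ours:

```python
# Rank models by Quality Index, breaking ties with category benchmarks.
# Illustrative sketch -- not the actual Artificial Analysis pipeline.

models = [
    # (name, quality_index, livecodebench_pct, aime_2025_pct)
    ("Kimi K2 Thinking",   40.3, 85, 95),
    ("DeepSeek V3.2",      41.2, 86, 92),
    ("GLM-4.7 (Thinking)", 41.7, 89, 95),
]

def rank_key(m):
    name, qi, lcb, aime = m
    # Primary key: Quality Index; tiebreaker: mean of category benchmarks.
    return (qi, (lcb + aime) / 2)

ranked = sorted(models, key=rank_key, reverse=True)
print([name for name, *_ in ranked])
# -> ['GLM-4.7 (Thinking)', 'DeepSeek V3.2', 'Kimi K2 Thinking']
```

Because Python compares tuples element by element, the benchmark average only matters when two models share the same Quality Index.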

💻 Best Open Source LLMs for Coding

Ranked by LiveCodeBench, a widely used benchmark of real-world coding ability. These models are free to self-host and fine-tune for your codebase.

🖥️ Best Models for Ollama & Self-Hosting

Not every model can run on consumer hardware. Here are the best open source models organized by VRAM requirements, perfect for Ollama, vLLM, or Text Generation Inference.

🟢

7B–13B Models

8–16GB VRAM

Run on a single consumer GPU (RTX 4060–4090) or even Apple Silicon Macs.

  • Gemma 3 12B - Google, strong general performance
  • Phi-4 - Microsoft, great for its size
  • Gemma 3n E4B - Ultra-efficient, mobile-capable
🟡

30B–70B Models

24–48GB VRAM

Need an A100/H100 or multi-GPU consumer setup. Sweet spot for quality vs. cost.

  • Qwen3 30B A3B - MoE, fast for its quality
  • EXAONE 4.0 32B - LG, strong reasoning
  • DeepSeek R1 Distill 70B - Reasoning-focused
🔴

200B+ Models (MoE)

80GB+ VRAM

Frontier quality. Needs multi-GPU or cloud. MoE models use only a fraction of total params per token.

  • Qwen3 235B A22B - Only activates 22B/token
  • MiMo-V2-Flash - Xiaomi, 87% LiveCodeBench
  • DeepSeek V3.2 - Battle-tested, 86% LiveCodeBench

💡 Self-Hosting Tips

  • Use GGUF quantization (Q4_K_M) to cut VRAM 50–75% with ~2% quality loss
  • Ollama is the easiest way to start - `ollama run qwen3:30b`
  • For production, use vLLM or TGI for batching and throughput
  • Apple Silicon Macs can run 30B models in 32GB unified memory
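To see why Q4_K_M quantization makes that much difference, a back-of-the-envelope estimate helps: weight memory is roughly parameter count times bytes per weight. This is a sketch under stated assumptions - real usage also depends on KV cache and context length, and the 1.2 overhead factor here is our guess, not a measured constant:

```python
def approx_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB for running a dense model.

    params_b: parameter count in billions
    bits_per_weight: 16 for FP16, ~4.5 for Q4_K_M GGUF
    overhead: fudge factor for KV cache / runtime buffers (assumption)
    """
    bytes_per_param = bits_per_weight / 8
    return params_b * bytes_per_param * overhead

# A 30B model: FP16 vs Q4_K_M quantization
fp16 = approx_vram_gb(30, 16)    # ~72 GB: needs multi-GPU
q4 = approx_vram_gb(30, 4.5)     # ~20 GB: fits a single 24GB consumer GPU
print(round(fp16), round(q4))
```

The ~72% reduction from FP16 to Q4_K_M matches the 50–75% range quoted in the tips above.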

Why Open Source LLMs?

🔐

Data Privacy

Keep all data on your infrastructure. No API calls to third parties. Critical for healthcare, legal, and enterprise applications.

⚙️

Full Control

Fine-tune for your specific use case. Modify behavior, remove guardrails, or train on proprietary data. No terms of service limitations.

💰

Cost at Scale

At high volumes, self-hosting becomes dramatically cheaper. No per-token fees - just infrastructure costs.
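A quick way to sanity-check that claim is a break-even calculation: compare monthly API spend against a flat GPU rental. The prices below are illustrative placeholders, not quotes from any provider, and the model ignores engineering and ops labor:

```python
def monthly_api_cost(tokens_per_month: float, usd_per_million: float) -> float:
    """API spend for a month at a given per-million-token price."""
    return tokens_per_month / 1_000_000 * usd_per_million

def breakeven_tokens(gpu_usd_per_month: float, usd_per_million: float) -> float:
    """Tokens/month above which self-hosting beats the API (ignoring ops labor)."""
    return gpu_usd_per_month / usd_per_million * 1_000_000

# Placeholder assumptions: $0.50 per million API tokens, $1500/month GPU rental.
api = monthly_api_cost(10_000_000_000, 0.50)   # $5000/month at 10B tokens
be = breakeven_tokens(1500, 0.50)              # 3B tokens/month break-even
print(api, be)
```

Under these placeholder numbers, anything past a few billion tokens a month favors self-hosting - which is where the "10-50x cheaper at scale" intuition comes from, provided the GPU is kept busy.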

Open Source vs Proprietary - February 2026 Update

Where Open Source Wins

  • Math & reasoning - Kimi K2.5 hits 96% AIME, surpassing most proprietary models
  • Coding parity - GLM-5 and GLM-4.7 rival GPT-5 and Claude on LiveCodeBench
  • Cost at scale - self-hosting is 10-50x cheaper than API for high-volume use
  • Privacy & sovereignty - no data leaves your infrastructure

Where Proprietary Still Leads

  • Multimodal breadth - GPT-5 and Gemini 3 Pro still lead on vision + audio + tools
  • Ease of use - one API call, no infrastructure to manage
  • Agentic capabilities - Claude and GPT still have more reliable tool use
  • Safety alignment - more robust RLHF and content safety layers

February verdict: The gap is now single-digit Quality Index points for most practical tasks. Open source wins on cost and privacy; proprietary wins on multimodal and convenience. Read our full analysis →

How to Deploy Open Source LLMs

🚀 Easiest Path: API Providers

Use hosted APIs for open models. Get the benefits of open source with the ease of SaaS.

Nebius Token Factory - High-throughput inference on H100/H200 clusters, great pricing

Together.ai - Wide selection, competitive pricing

Fireworks.ai - Fast inference, great for production

Groq - Fastest inference available, ideal for real-time apps

DeepInfra - Cost-effective, good for batch workloads

🖥️ Self-Hosting Options

Run models on your own infrastructure for maximum control and privacy.

Ollama - Easiest local setup, great for development and prototyping

vLLM - Production-grade, excellent throughput and batching

Text Generation Inference - HuggingFace's production server

llama.cpp - CPU-friendly, works on older hardware
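Whichever server you pick, most expose a simple HTTP API once a model is pulled. Here is a minimal sketch of a request body for a local Ollama instance (Ollama listens on port 11434 by default; this only builds and inspects the JSON payload, so it runs without a server - swap in your own model name and prompt):

```python
import json

# Ollama's local API listens on port 11434 by default.
url = "http://localhost:11434/api/generate"
payload = {
    "model": "qwen3:30b",                    # any model you've pulled locally
    "prompt": "Write a haiku about GPUs.",
    "stream": False,                         # one JSON response, not a stream
}
body = json.dumps(payload)
print(body)
# POST this with urllib.request or the `requests` library once Ollama is running.
```

vLLM and TGI instead expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the same self-hosted model can be swapped in behind existing OpenAI-client code.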

Compare Open Source Models

Use our interactive tools to compare benchmarks, pricing, and speed for all 12+ open source models side-by-side.

Frequently Asked Questions

What is the best open source LLM in February 2026?

GLM-5 (Reasoning) leads our February 2026 rankings with a Quality Index of 49.64, excelling at both coding and reasoning. It's completely free to download and use under an open license.

What changed in open source LLM rankings since January 2026?

The biggest change is GLM-5 (Reasoning) debuting at #1 with a Quality Index of 49.64, dethroning Kimi K2.5 (Reasoning). MiniMax-M2.5 is another notable new entry. MiMo-V2-Flash surged to QI 41.42. View January rankings →

What are the best Ollama models in 2026?

By size tier: Small (7-13B) - Gemma 3 12B, Phi-4 for general tasks. Medium (30-70B) - Qwen3 30B A3B, EXAONE 4.0 32B, DeepSeek R1 Distill Llama 70B. Large (200B+ MoE) - Qwen3-235B, DeepSeek V3.2. Use GGUF quantization to cut memory needs 50–75%.

Can open source LLMs match GPT-5 or Claude?

Yes, for most tasks. GLM-5 (Reasoning) rivals top proprietary models on coding, and Kimi K2.5 scores 96% on AIME 2025, outperforming most proprietary alternatives on math. The gap has closed to single-digit Quality Index points for practical applications.

What is the best free LLM for coding?

The best free/open source coding LLMs in February 2026: DeepSeek V3.2 Speciale (90% LiveCodeBench), GLM-4.7 (Thinking) (89%), MiMo-V2-Flash (87%). All free to self-host. Via API, providers like Together.ai and Groq offer these at $0.20–0.80/M tokens.

Which open source LLM has the largest context window?

Llama 4 Scout supports up to 10 million tokens. For practical high-quality use, Kimi K2.5 and MiMo-V2-Flash offer 256K tokens, and GLM-4.7 provides 200K - more than enough for most document processing tasks. See our long context rankings for the full comparison.

Is GLM-5 open source?

Yes. GLM-5 (Reasoning) is released by Z AI under an open license with a 203K context window. It leads our February 2026 open source rankings at #1 with a Quality Index of 49.64. It can be self-hosted, fine-tuned, and deployed commercially.

What is Quality Index and how are models ranked?

Quality Index comes from the Artificial Analysis Intelligence Index v4.0 - a composite score evaluating overall model capability across reasoning, coding, math, and knowledge tasks. Rankings use QI as the primary factor, with category-specific benchmarks (LiveCodeBench, AIME 2025, MMLU-Pro) as tiebreakers.

Related Model Rankings

Data sources: Rankings based on the Artificial Analysis Intelligence Index. Explore all models in our interactive explorer or compare models side-by-side.