🔓 Updated February 2026

Best Open Source LLMs
February 2026 Rankings

Self-host, fine-tune, and deploy without restrictions. Open source models now hit 90% on LiveCodeBench and 97% on AIME 2025 - rivaling the best proprietary models.

12 Models Ranked
$0 License Cost
Fine-tuning Freedom

What Changed Since January

New: Biggest Entry

Kimi K2.5 (Reasoning)

MoonshotAI - 1T parameters, 256K context

Debuts at #1 with a Quality Index of 46.77. Scores 96% on AIME 2025 - among the highest of any open source model - and 85% on LiveCodeBench. Open-weight, commercially usable.

Month-over-Month

  • Kimi K2.5 (Reasoning) debuts at #1 with Quality Index 46.77 - strongest new open source model this cycle
  • GLM-4.7 (Thinking) moves to #2 with Quality Index 41.7
  • DeepSeek V3.2 holds steady in the top tier
  • MiMo-V2-Flash (Xiaomi) continues to impress for its size


Complete Open Source Rankings - February 2026

| Rank | Model | Provider | Quality | LiveCodeBench | AIME 2025 | MMLU-Pro |
|------|-------|----------|---------|---------------|-----------|----------|
| 1 | Kimi K2.5 (Reasoning) | Kimi | 46.77 | 85% | 96% | - |
| 2 | GLM-4.7 (Thinking) | Z AI | 41.7 | 89% | 95% | 86% |
| 3 | DeepSeek V3.2 | DeepSeek | 41.2 | 86% | 92% | 86% |
| 4 | Kimi K2 Thinking | Kimi | 40.3 | 85% | 95% | 85% |
| 5 | MiniMax-M2.1 | MiniMax | 39.3 | 81% | 83% | 88% |
| 6 | MiMo-V2-Flash | Xiaomi | 39 | 87% | 96% | 84% |
| 7 | Llama Nemotron Ultra | NVIDIA | 38 | 64% | 64% | 83% |
| 8 | MiniMax-M2 | MiniMax | 35.7 | 83% | 78% | 82% |
| 9 | DeepSeek V3.2 Speciale | DeepSeek | 34.1 | 90% | 97% | 86% |
| 10 | DeepSeek V3.1 Terminus | DeepSeek | 33.4 | 80% | 90% | 85% |
| 11 | gpt-oss-120B (high) | OpenAI | 32.9 | 88% | 93% | 81% |
| 12 | GLM-4.6 | Z AI | 32.2 | 56% | 44% | 78% |

Rankings use the Artificial Analysis Intelligence Index as the primary score, with category-specific benchmarks as tiebreakers. Data refreshed daily. Explore all 100+ models →

💻 Best Open Source LLMs for Coding

Ranked by LiveCodeBench - the most representative real-world coding benchmark. These models are free to self-host and fine-tune for your codebase.

🖥️ Best Models for Ollama & Self-Hosting

Not every model can run on consumer hardware. Here are the best open source models organized by VRAM requirements, perfect for Ollama, vLLM, or Text Generation Inference.

🟢 7B–13B Models (8–16GB VRAM)

Run on a single consumer GPU (RTX 4060–4090) or even Apple Silicon Macs.

  • Gemma 3 12B - Google, strong general performance
  • Phi-4 - Microsoft, great for its size
  • Gemma 3n E4B - Ultra-efficient, mobile-capable
🟡 30B–70B Models (24–48GB VRAM)

Needs an A100/H100 or a multi-GPU consumer setup. Sweet spot for quality vs. cost.

  • Qwen3 30B A3B - MoE, fast for its quality
  • EXAONE 4.0 32B - LG, strong reasoning
  • DeepSeek R1 Distill 70B - Reasoning-focused
🔴 200B+ Models, MoE (80GB+ VRAM)

Frontier quality. Needs multi-GPU or cloud. MoE models use only a fraction of total params per token.

  • Qwen3 235B A22B - Only activates 22B/token
  • MiMo-V2-Flash - Xiaomi, 87% LiveCodeBench
  • DeepSeek V3.2 - Battle-tested, 86% LiveCodeBench
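The MoE point above is easy to quantify. A minimal sketch using the Qwen3 235B A22B figures from the list (note the caveat in the comments: MoE cuts per-token compute, not memory):

```python
def moe_per_token_compute_ratio(total_params_b: float, active_params_b: float) -> float:
    """Fraction of total parameters actually used for each generated token."""
    return active_params_b / total_params_b

# Qwen3 235B A22B: 235B total parameters, ~22B active per token (figures from
# the list above). All 235B weights must still be resident in VRAM/RAM, so
# MoE reduces compute per token, not the memory footprint.
ratio = moe_per_token_compute_ratio(235, 22)
print(f"Active per token: {ratio:.1%}")  # roughly 9.4% of total params per token
```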

💡 Self-Hosting Tips

  • Use GGUF quantization (Q4_K_M) to cut VRAM 50–75% with ~2% quality loss
  • Ollama is the easiest way to start - `ollama run qwen3:30b`
  • For production, use vLLM or TGI for batching and throughput
  • Apple Silicon Macs can run 30B models in 32GB unified memory
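The quantization rule of thumb above can be sanity-checked with a quick estimate. This is a sketch, not a sizing tool: the ~4.85 bits/weight figure for Q4_K_M and the 20% runtime overhead are approximations, and real usage also depends on context length and KV cache size.

```python
def est_vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights at the given precision, plus ~20% for
    KV cache and runtime overhead (both figures are rules of thumb)."""
    weight_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb * overhead

for name, params in [("Gemma 3 12B", 12), ("Qwen3 30B A3B", 30), ("Llama 70B", 70)]:
    fp16 = est_vram_gb(params, 16)       # unquantized half precision
    q4 = est_vram_gb(params, 4.85)       # Q4_K_M averages ~4.85 bits/weight
    print(f"{name}: FP16 ≈ {fp16:.0f} GB, Q4_K_M ≈ {q4:.0f} GB")
```

For a 30B model this lands around 72 GB at FP16 versus roughly 22 GB at Q4_K_M - a ~70% cut, consistent with the 50–75% range quoted above.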

Why Open Source LLMs?

🔐

Data Privacy

Keep all data on your infrastructure. No API calls to third parties. Critical for healthcare, legal, and enterprise applications.

⚙️

Full Control

Fine-tune for your specific use case. Modify behavior, remove guardrails, or train on proprietary data. No terms of service limitations.

💰

Cost at Scale

At high volumes, self-hosting becomes dramatically cheaper. No per-token fees - just infrastructure costs.
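A back-of-the-envelope break-even model makes the trade-off concrete. The $0.50/M-token API price, the $2/hr GPU rate, and the assumption that a single GPU can serve the whole workload are illustrative, not quotes:

```python
def monthly_cost_api(tokens_millions: float, usd_per_m_tokens: float) -> float:
    """API billing: pay per million tokens processed."""
    return tokens_millions * usd_per_m_tokens

def monthly_cost_selfhost(gpu_hourly_usd: float, hours: float = 730) -> float:
    """Self-hosting: pay for the GPU whether it is busy or idle (~730 h/month)."""
    return gpu_hourly_usd * hours

# Illustrative assumptions: $0.50/M tokens via API, one $2/hr GPU that can
# serve the whole workload. Real numbers depend on model, provider, and load.
for volume_m in (100, 1_000, 10_000):
    api = monthly_cost_api(volume_m, 0.50)
    host = monthly_cost_selfhost(2.0)
    print(f"{volume_m:>6}M tokens/mo: API ${api:,.0f} vs self-host ${host:,.0f}")
```

Under these assumptions the API is cheaper at low volume, while self-hosting wins well before 10B tokens/month - which is why the break-even point, not a blanket rule, should drive the decision.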

Open Source vs Proprietary - February 2026 Update

Where Open Source Wins

  • Math & reasoning - Kimi K2.5 hits 96% AIME, surpassing most proprietary models
  • Coding parity - GLM-4.7 at 89% LiveCodeBench rivals GPT-5 and Claude
  • Cost at scale - self-hosting is 10-50x cheaper than API for high-volume use
  • Privacy & sovereignty - no data leaves your infrastructure

Where Proprietary Still Leads

  • Multimodal breadth - GPT-5, Gemini 3 Pro still lead on vision + audio + tools
  • Ease of use - one API call, no infrastructure to manage
  • Agentic capabilities - Claude and GPT still have more reliable tool use
  • Safety alignment - more robust RLHF and content safety layers

February verdict: The gap is now single-digit Quality Index points for most practical tasks. Open source wins on cost and privacy; proprietary wins on multimodal and convenience. Read our full analysis →

How to Deploy Open Source LLMs

🚀 Easiest Path: API Providers

Use hosted APIs for open models. Get the benefits of open source with the ease of SaaS.

Nebius Token Factory - High-throughput inference on H100/H200 clusters, great pricing

Together.ai - Wide selection, competitive pricing

Fireworks.ai - Fast inference, great for production

Groq - Fastest inference available, ideal for real-time apps

DeepInfra - Cost-effective, good for batch workloads
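Most of these providers expose an OpenAI-compatible chat completions endpoint, so switching between them is usually just a base-URL change. A minimal sketch - the base URL, API key, and model ID below are placeholders, not real values; check each provider's docs:

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Assemble an OpenAI-compatible /chat/completions request (not sent here)."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

# Placeholder endpoint and model; once filled in with a real provider's values,
# send it with urllib.request.urlopen(req).
req = build_chat_request("https://api.example.com/v1", "sk-...",
                         "deepseek-v3.2", "Write a haiku about GPUs.")
print(req.full_url)
```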

🖥️ Self-Hosting Options

Run models on your own infrastructure for maximum control and privacy.

Ollama - Easiest local setup, great for development and prototyping

vLLM - Production-grade, excellent throughput and batching

Text Generation Inference - HuggingFace's production server

llama.cpp - CPU-friendly, works on older hardware

Compare Open Source Models

Use our interactive tools to compare benchmarks, pricing, and speed for all 12+ open source models side-by-side.

Frequently Asked Questions

What is the best open source LLM in February 2026?

Kimi K2.5 (Reasoning) leads our February 2026 rankings with a Quality Index of 46.77, excelling at coding (85% LiveCodeBench) and math reasoning (96% AIME 2025). It's completely free to download and use under an open license.

What changed in open source LLM rankings since January 2026?

The biggest change is Kimi K2.5 (Reasoning) debuting at #1 with a Quality Index of 46.77, scoring 96% on AIME 2025 and 85% on LiveCodeBench. GLM-4.7 (Thinking) moves to #2, while DeepSeek V3.2 and MiniMax-M2.1 remain strong top-tier contenders. View January rankings →

What are the best Ollama models in 2026?

By size tier: Small (7-13B) - Gemma 3 12B, Phi-4 for general tasks. Medium (30-70B) - Qwen3 30B A3B, EXAONE 4.0 32B, DeepSeek R1 Distill Llama 70B. Large (200B+ MoE) - Qwen3-235B, DeepSeek V3.2. Use GGUF quantization to cut memory needs 50–75%.

Can open source LLMs match GPT-5 or Claude?

Yes, for most tasks. Kimi K2.5 (Reasoning) achieves 85% on LiveCodeBench and 96% on AIME 2025, rivaling top proprietary models on coding and outperforming most of them on math. The gap has closed to single-digit Quality Index points for practical applications.

What is the best free LLM for coding?

The best free/open source coding LLMs in February 2026: DeepSeek V3.2 Speciale (90% LiveCodeBench), GLM-4.7 (Thinking) (89%), gpt-oss-120B (high) (88%). All free to self-host. Via API, providers like Together.ai and Groq offer these at $0.20–0.80/M tokens.

Which open source LLM has the largest context window?

Llama 4 Scout supports up to 10 million tokens. For practical high-quality use, Kimi K2.5 and MiMo-V2-Flash offer 256K tokens, and GLM-4.7 provides 200K - more than enough for most document processing tasks. See our long context rankings for the full comparison.

Is Kimi K2.5 open source?

Yes. Kimi K2.5 (Reasoning) is released by MoonshotAI under an open license. It has 1 trillion parameters with a 256K context window and can be commercially deployed, fine-tuned, and self-hosted. It debuted in our February 2026 rankings at #2 with a Quality Index of 46.77.

What is Quality Index and how are models ranked?

Quality Index comes from the Artificial Analysis Intelligence Index v4.0 - a composite score evaluating overall model capability across reasoning, coding, math, and knowledge tasks. Rankings use QI as the primary factor, with category-specific benchmarks (LiveCodeBench, AIME 2025, MMLU-Pro) as tiebreakers.
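The tiebreak rule above amounts to sorting on a composite key. A sketch using three rows from the rankings table:

```python
# (model, quality_index, livecodebench) rows taken from the rankings table
models = [
    ("DeepSeek V3.2", 41.2, 86),
    ("GLM-4.7 (Thinking)", 41.7, 89),
    ("Kimi K2.5 (Reasoning)", 46.77, 85),
]

# Primary key: Quality Index; tiebreaker: LiveCodeBench. reverse=True sorts
# the whole key tuple descending, so higher QI wins first, then higher LCB.
ranked = sorted(models, key=lambda m: (m[1], m[2]), reverse=True)
for rank, (name, qi, lcb) in enumerate(ranked, 1):
    print(f"{rank}. {name} (QI {qi}, LCB {lcb}%)")
```

This reproduces the top of the table: Kimi K2.5 first on Quality Index alone, with the tiebreaker only mattering when two models' QI scores are equal.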

Related Model Rankings

Data sources: Rankings based on the Artificial Analysis Intelligence Index. Explore all models in our interactive explorer or compare models side-by-side.