🔓 Updated February 2026

Best Open Source LLMs
February 2026 Rankings

Self-host, fine-tune, and deploy without restrictions. Open source models now hit 90% on LiveCodeBench and 97% on AIME 2025 - rivaling the best proprietary models.

12 Models Ranked
$0 License Cost
Fine-tuning Freedom

What Changed Since January

New: Biggest Entry

GLM-5 (Reasoning)

Z AI - 203K context

Debuts at #1 with a Quality Index of 49.64, dethroning Kimi K2.5 (Reasoning) for the top spot. Open-weight, commercially usable. Also new: MiniMax-M2.5 enters the rankings.

Month-over-Month

  • GLM-5 (Reasoning) debuts at #1 with Quality Index 49.64
  • Kimi K2.5 (Reasoning) drops to #2 (QI 46.73)
  • MiniMax-M2.5 enters the top tier (QI 41.97)
  • MiMo-V2-Flash (Xiaomi) surges to QI 41.42


Complete Open Source Rankings - February 2026

| Rank | Model | Provider | Quality | LiveCodeBench | AIME 2025 | MMLU-Pro |
|------|-------|----------|---------|---------------|-----------|----------|
| 1 | GLM-5 (Reasoning) | Z AI | 49.64 | - | - | - |
| 2 | Kimi K2.5 (Reasoning) | Kimi | 46.73 | 85% | 96% | - |
| 3 | MiniMax-M2.5 | MiniMax | 41.97 | - | - | - |
| 4 | GLM-4.7 (Thinking) | Z AI | 41.7 | 89% | 95% | 86% |
| 5 | DeepSeek V3.2 | DeepSeek | 41.2 | 86% | 92% | 86% |
| 6 | Kimi K2 Thinking | Kimi | 40.3 | 85% | 95% | 85% |
| 7 | MiniMax-M2.1 | MiniMax | 39.3 | 81% | 83% | 88% |
| 8 | MiMo-V2-Flash | Xiaomi | 39 | 87% | 96% | 84% |
| 9 | Llama Nemotron Ultra | NVIDIA | 38 | 64% | 64% | 83% |
| 10 | MiniMax-M2 | MiniMax | 35.7 | 83% | 78% | 82% |
| 11 | DeepSeek V3.2 Speciale | DeepSeek | 34.1 | 90% | 97% | 86% |
| 12 | DeepSeek V3.1 Terminus | DeepSeek | 33.4 | 80% | 90% | 85% |
Rankings use the Artificial Analysis Intelligence Index as the primary score, with category-specific benchmarks as tiebreakers. Data refreshed daily. Explore all 100+ models →
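The ranking rule above (Quality Index first, category benchmarks as tiebreakers) can be sketched in a few lines. This is an illustrative sketch only, not the Artificial Analysis implementation; the scores are from the table above, and the helper name `rank_key` is ours:

```python
# Rank models by Quality Index, breaking ties with category benchmarks.
# Illustrative sketch -- not the actual Artificial Analysis pipeline.

models = [
    # (name, quality_index, livecodebench_pct, aime_2025_pct)
    ("Kimi K2 Thinking",   40.3, 85, 95),
    ("DeepSeek V3.2",      41.2, 86, 92),
    ("GLM-4.7 (Thinking)", 41.7, 89, 95),
]

def rank_key(m):
    name, qi, lcb, aime = m
    # Primary key: Quality Index; tiebreaker: mean of category benchmarks.
    return (qi, (lcb + aime) / 2)

ranked = sorted(models, key=rank_key, reverse=True)
print([name for name, *_ in ranked])
# -> ['GLM-4.7 (Thinking)', 'DeepSeek V3.2', 'Kimi K2 Thinking']
```

Because Python compares tuples element by element, the benchmark average only matters when two models share the same Quality Index.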

💻 Best Open Source LLMs for Coding

Ranked by LiveCodeBench, a widely used benchmark of real-world coding ability. These models are free to self-host and fine-tune for your codebase.

🖥️ Best Models for Ollama & Self-Hosting

Not every model can run on consumer hardware. Here are the best open source models organized by VRAM requirements, perfect for Ollama, vLLM, or Text Generation Inference.

🟢

7B–13B Models

8–16GB VRAM

Run on a single consumer GPU (RTX 4060–4090) or even Apple Silicon Macs.

  • Gemma 3 12B - Google, strong general performance
  • Phi-4 - Microsoft, great for its size
  • Gemma 3n E4B - Ultra-efficient, mobile-capable
🟡

30B–70B Models

24–48GB VRAM

Need an A100/H100 or multi-GPU consumer setup. Sweet spot for quality vs. cost.

  • Qwen3 30B A3B - MoE, fast for its quality
  • EXAONE 4.0 32B - LG, strong reasoning
  • DeepSeek R1 Distill 70B - Reasoning-focused
🔴

200B+ Models (MoE)

80GB+ VRAM

Frontier quality. Needs multi-GPU or cloud. MoE models use only a fraction of total params per token.

  • Qwen3 235B A22B - Only activates 22B/token
  • MiMo-V2-Flash - Xiaomi, 87% LiveCodeBench
  • DeepSeek V3.2 - Battle-tested, 86% LiveCodeBench

💡 Self-Hosting Tips

  • Use GGUF quantization (Q4_K_M) to cut VRAM 50–75% with ~2% quality loss
  • Ollama is the easiest way to start - `ollama run qwen3:30b`
  • For production, use vLLM or TGI for batching and throughput
  • Apple Silicon Macs can run 30B models in 32GB unified memory
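To see why Q4_K_M quantization makes that much difference, a back-of-the-envelope estimate helps: weight memory is roughly parameter count times bytes per weight. This is a sketch under stated assumptions - real usage also depends on KV cache and context length, and the 1.2 overhead factor here is our guess, not a measured constant:

```python
def approx_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB for running a dense model.

    params_b: parameter count in billions
    bits_per_weight: 16 for FP16, ~4.5 for Q4_K_M GGUF
    overhead: fudge factor for KV cache / runtime buffers (assumption)
    """
    bytes_per_param = bits_per_weight / 8
    return params_b * bytes_per_param * overhead

# A 30B model: FP16 vs Q4_K_M quantization
fp16 = approx_vram_gb(30, 16)    # ~72 GB: needs multi-GPU
q4 = approx_vram_gb(30, 4.5)     # ~20 GB: fits a single 24GB consumer GPU
print(round(fp16), round(q4))
```

The ~72% reduction from FP16 to Q4_K_M matches the 50–75% range quoted in the tips above.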

Why Open Source LLMs?

🔐

Data Privacy

Keep all data on your infrastructure. No API calls to third parties. Critical for healthcare, legal, and enterprise applications.

⚙️

Full Control

Fine-tune for your specific use case. Modify behavior, remove guardrails, or train on proprietary data. No terms of service limitations.

💰

Cost at Scale

At high volumes, self-hosting becomes dramatically cheaper. No per-token fees - just infrastructure costs.
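A quick way to sanity-check that claim is a break-even calculation: compare monthly API spend against a flat GPU rental. The prices below are illustrative placeholders, not quotes from any provider, and the model ignores engineering and ops labor:

```python
def monthly_api_cost(tokens_per_month: float, usd_per_million: float) -> float:
    """API spend for a month at a given per-million-token price."""
    return tokens_per_month / 1_000_000 * usd_per_million

def breakeven_tokens(gpu_usd_per_month: float, usd_per_million: float) -> float:
    """Tokens/month above which self-hosting beats the API (ignoring ops labor)."""
    return gpu_usd_per_month / usd_per_million * 1_000_000

# Placeholder assumptions: $0.50 per million API tokens, $1500/month GPU rental.
api = monthly_api_cost(10_000_000_000, 0.50)   # $5000/month at 10B tokens
be = breakeven_tokens(1500, 0.50)              # 3B tokens/month break-even
print(api, be)
```

Under these placeholder numbers, anything past a few billion tokens a month favors self-hosting - which is where the "10-50x cheaper at scale" intuition comes from, provided the GPU is kept busy.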

Open Source vs Proprietary - February 2026 Update

Where Open Source Wins

  • Math & reasoning - Kimi K2.5 hits 96% AIME, surpassing most proprietary models
  • Coding parity - GLM-5 and GLM-4.7 rival GPT-5 and Claude on LiveCodeBench
  • Cost at scale - self-hosting is 10-50x cheaper than API for high-volume use
  • Privacy & sovereignty - no data leaves your infrastructure

Where Proprietary Still Leads

  • Multimodal breadth - GPT-5 and Gemini 3 Pro still lead on vision + audio + tools
  • Ease of use - one API call, no infrastructure to manage
  • Agentic capabilities - Claude and GPT still have more reliable tool use
  • Safety alignment - more robust RLHF and content safety layers

February verdict: The gap is now single-digit Quality Index points for most practical tasks. Open source wins on cost and privacy; proprietary wins on multimodal and convenience. Read our full analysis →

How to Deploy Open Source LLMs

🚀 Easiest Path: API Providers

Use hosted APIs for open models. Get the benefits of open source with the ease of SaaS.

Nebius Token Factory - High-throughput inference on H100/H200 clusters, great pricing

Together.ai - Wide selection, competitive pricing

Fireworks.ai - Fast inference, great for production

Groq - Fastest inference available, ideal for real-time apps

DeepInfra - Cost-effective, good for batch workloads

🖥️ Self-Hosting Options

Run models on your own infrastructure for maximum control and privacy.

Ollama - Easiest local setup, great for development and prototyping

vLLM - Production-grade, excellent throughput and batching

Text Generation Inference - HuggingFace's production server

llama.cpp - CPU-friendly, works on older hardware
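Whichever server you pick, most expose a simple HTTP API once a model is pulled. Here is a minimal sketch of a request body for a local Ollama instance (Ollama listens on port 11434 by default; this only builds and inspects the JSON payload, so it runs without a server - swap in your own model name and prompt):

```python
import json

# Ollama's local API listens on port 11434 by default.
url = "http://localhost:11434/api/generate"
payload = {
    "model": "qwen3:30b",                    # any model you've pulled locally
    "prompt": "Write a haiku about GPUs.",
    "stream": False,                         # one JSON response, not a stream
}
body = json.dumps(payload)
print(body)
# POST this with urllib.request or the `requests` library once Ollama is running.
```

vLLM and TGI instead expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the same self-hosted model can be swapped in behind existing OpenAI-client code.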

Compare Open Source Models

Use our interactive tools to compare benchmarks, pricing, and speed for all 12+ open source models side-by-side.

Frequently Asked Questions

What is the best open source LLM in February 2026?

GLM-5 (Reasoning) leads our February 2026 rankings with a Quality Index of 49.64, excelling at both coding and reasoning. It's completely free to download and use under an open license.

What changed in open source LLM rankings since January 2026?

The biggest change is GLM-5 (Reasoning) debuting at #1 with a Quality Index of 49.64, dethroning Kimi K2.5 (Reasoning). MiniMax-M2.5 is another notable new entry. MiMo-V2-Flash surged to QI 41.42. View January rankings →

What are the best Ollama models in 2026?

By size tier: Small (7-13B) - Gemma 3 12B, Phi-4 for general tasks. Medium (30-70B) - Qwen3 30B A3B, EXAONE 4.0 32B, DeepSeek R1 Distill Llama 70B. Large (200B+ MoE) - Qwen3-235B, DeepSeek V3.2. Use GGUF quantization to cut memory needs 50–75%.

Can open source LLMs match GPT-5 or Claude?

Yes, for most tasks. GLM-5 (Reasoning) rivals top proprietary models on coding, and Kimi K2.5 scores 96% on AIME 2025, outperforming most proprietary alternatives on math. The gap has closed to single-digit Quality Index points for practical applications.

What is the best free LLM for coding?

The best free/open source coding LLMs in February 2026: DeepSeek V3.2 Speciale (90% LiveCodeBench), GLM-4.7 (Thinking) (89%), MiMo-V2-Flash (87%). All free to self-host. Via API, providers like Together.ai and Groq offer these at $0.20–0.80/M tokens.

Which open source LLM has the largest context window?

Llama 4 Scout supports up to 10 million tokens. For practical high-quality use, Kimi K2.5 and MiMo-V2-Flash offer 256K tokens, and GLM-4.7 provides 200K - more than enough for most document processing tasks. See our long context rankings for the full comparison.

Is GLM-5 open source?

Yes. GLM-5 (Reasoning) is released by Z AI under an open license with a 203K context window. It leads our February 2026 open source rankings at #1 with a Quality Index of 49.64. It can be self-hosted, fine-tuned, and deployed commercially.

What is Quality Index and how are models ranked?

Quality Index comes from the Artificial Analysis Intelligence Index v4.0 - a composite score evaluating overall model capability across reasoning, coding, math, and knowledge tasks. Rankings use QI as the primary factor, with category-specific benchmarks (LiveCodeBench, AIME 2025, MMLU-Pro) as tiebreakers.

Related Model Rankings

Data sources: Rankings based on the Artificial Analysis Intelligence Index. Explore all models in our interactive explorer or compare models side-by-side.