Model hub · Open weights · 12 min read

DeepSeek models: what to use, what to skip, and where to run them

DeepSeek is not one model. It is a family with very different personalities, from reasoning-focused variants to high-throughput general models. The trap is treating "DeepSeek" as a single checkbox. The reality is that your biggest win often comes from choosing the right provider endpoint, not just the right weights.

By Dylan Bristot

DeepSeek models in dataset: 25
Provider endpoints: 84
Unique providers: 29
Most recent release in dataset: Dec 1, 2025
Data source: Artificial Analysis model and provider leaderboards (exported January 9, 2026).

Who builds DeepSeek

DeepSeek is a Chinese AI lab, widely reported to have been founded by Liang Wenfeng and backed by High-Flyer, a quantitative trading firm. The reason this matters is not nationalism or drama. It is incentives. A lab that can fund long training runs without selling an enterprise product can optimize for research outcomes, publish more, and move faster.

Most people talk about DeepSeek like it is only weights. In practice, the DeepSeek story is about a loop: fast iteration on training recipes, aggressive efficiency work, then rapid distribution across providers. That distribution is what shows up in your costs and latency today.

Sources for company context: Wikipedia, TechTarget.

Model evolution: a timeline you can skim

DeepSeek has released a lot of names. Some are genuine architectural jumps, others are post-training revisions, and others are packaging choices that show up as "reasoning" vs "non-reasoning" in downstream distributions. This is a selected timeline intended for orientation, not an exhaustive catalog.

| Release window | Family | What changed | Why it matters |
| --- | --- | --- | --- |
| Late 2023 | DeepSeek-Coder | Early coding-specialized releases and instruct variants. | Established the lab as serious about developer workloads. |
| Mid 2024 | DeepSeek-V2, Coder-V2 | More competitive general models and stronger coding variants. | Triggered price pressure across the China hosting ecosystem. |
| Dec 2024 | DeepSeek-V3 | Mixture-of-experts style scaling that moved the quality ceiling for open weights. | Created a base that later reasoning variants could build on. |
| Jan 2025 | DeepSeek-R1 | Reasoning-focused releases with heavier post-training. | Marked the shift from "good open weights" to "serious reasoning open weights". |
| 2025 | V3 revisions, R1 revisions | Iterative improvements and packaging for tool use. | Most buyers feel this as better endpoints and better defaults, not as a new paper. |

Sources for the release overview: TechTarget, Tom's Hardware, Le Monde.

Reasoning vs non-reasoning variants

In practice, "reasoning" is not a brand. It is an operating mode. Reasoning variants tend to spend more tokens thinking, which can raise cost and push up latency, but they can also be meaningfully stronger on multi-step tasks. Non-reasoning variants are often better defaults for interactive products where speed and throughput matter.

Below is a summary computed from the provider endpoints in the dataset. Classification is based on the model name labels that appear in the AA export, such as "Reasoning", "Thinking", and "Non-reasoning". It is a pragmatic grouping, not a claim about internal architecture.

| Group | Models | Endpoints | Providers | Median blended $/1M | Best blended $/1M | Median speed (tok/s) | Median TTFT (s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Reasoning | 5 | 17 | 11 | $0.6350 | $0.3345 | 43 | 1.40 |
| Non-reasoning | 4 | 26 | 17 | $0.6725 | $0.1350 | 60 | 0.89 |
| Other | 16 | 41 | 18 | $1.450 | $0.0750 | 51 | 0.86 |
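For transparency, the grouping behind the table is roughly the sketch below. It assumes a CSV export with hypothetical columns named model_name and blended_price; those are placeholders, not the actual AA schema. Note that the non-reasoning check has to run before the reasoning check because of the shared substring.

```python
# Minimal sketch of the name-label grouping used in the table above.
# Assumes a CSV export with hypothetical columns "model_name" and
# "blended_price"; these are placeholders, not the real AA schema.
import csv
from statistics import median
from collections import defaultdict

def group_for(name: str) -> str:
    lowered = name.lower()
    if "non-reasoning" in lowered:
        return "Non-reasoning"
    if "reasoning" in lowered or "thinking" in lowered:
        return "Reasoning"
    return "Other"

groups = defaultdict(list)
with open("aa_endpoints.csv", newline="") as f:
    for row in csv.DictReader(f):
        groups[group_for(row["model_name"])].append(float(row["blended_price"]))

for group, prices in groups.items():
    print(f"{group}: {len(prices)} endpoints, "
          f"median ${median(prices):.4f}/1M, best ${min(prices):.4f}/1M")
```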

What most people get wrong about DeepSeek

Most discussions compress DeepSeek into a single point on a chart. That is a category error. You are actually making two decisions: which DeepSeek variant fits your workload, and which provider delivers the economics and latency you need. For many teams, the provider decision moves the needle more than switching between closely related variants.

This guide is built for builders. It is not a hype piece, and it is not a benchmark dump. The goal is to help you choose a default, understand when it fails, and make the provider choice with eyes open.

The headline numbers

In the current dataset, there are 25 DeepSeek models with a non-zero quality index and 84 provider endpoints across 29 hosts. DeepSeek is widely distributed, which is exactly why pricing and latency can vary so much.

The current ceiling model in this dataset

The highest quality DeepSeek model in this dataset is DeepSeek V3.2 (Reasoning) at QI 41.2, with a 128,000 token context window. Use it when quality is the priority and you can afford the endpoint tradeoffs.

The value pick

If you care about quality per dollar, the best value pick in the dataset is DeepSeek V3.2 (Non-reasoning). Its cheapest observed blended price is $0.1350 per 1M tokens on GMI. Do not read this as "always choose the cheapest". Read it as a shortlist candidate for high-volume workloads.
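For a high-volume workload, the quality-per-dollar framing reduces to plain arithmetic. The 5B blended tokens per month below is an assumed workload, not a figure from the dataset; the two rates are the cheapest observed blended prices for the value pick and the ceiling model.

```python
# Monthly cost at the cheapest observed blended rates from this snapshot.
# The 5B blended tokens/month volume is an assumed workload for illustration.
MONTHLY_TOKENS = 5_000_000_000

rates = {
    "DeepSeek V3.2 (Non-reasoning) on GMI": 0.1350,  # $/1M blended
    "DeepSeek V3.2 (Reasoning) on Novita":  0.3345,  # $/1M blended
}
for label, per_million in rates.items():
    print(f"{label}: ${MONTHLY_TOKENS / 1_000_000 * per_million:,.2f}/month")
```

At that volume the gap between the two picks is roughly $1,000 per month, which is why the value pick earns a shortlist slot even when the ceiling model wins on raw quality.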

Why DeepSeek stands out

The most important DeepSeek advantage is not a single benchmark. It is the combination of fast iteration and aggressive distribution. When a model family shows up across many providers, competition forces down blended price and increases the chance that at least one host has a strong throughput or latency profile. That is why the "where to run it" question matters so much.

The second advantage is that DeepSeek releases have consistently targeted efficiency. Efficiency is not only training cost. It is inference economics, routing, quantization friendliness, and real-world provider viability. If your workload is high-volume, this matters more than a small difference in a single benchmark.

Provider reality: same model, very different outcomes

DeepSeek endpoints can differ across price, throughput, and time to first token. If you are building an interactive product, time to first token often matters as much as raw throughput. If you are doing batch workloads, blended price and throughput dominate. Use the explorer below to pick the model you care about and sort providers by the metric that maps to your actual constraints.

Provider explorer
Pick a DeepSeek model and see which providers are cheapest, fastest, or lowest latency.
| Provider | Input $/1M | Output $/1M | Blended $/1M | Speed (tok/s) | TTFT (s) |
| --- | --- | --- | --- | --- | --- |
| Novita | 0.2690 | 0.4000 | 0.3345 | 32 | 1.45 |
| SiliconFlow (FP8) | 0.2700 | 0.4200 | 0.3450 | 43 | 2.20 |
| DeepSeek | 0.2800 | 0.4200 | 0.3500 | 29 | 1.38 |
| Parasail (FP8) | 0.2800 | 0.4500 | 0.3650 | 8 | 1.21 |
| Baseten | 0.3000 | 0.4500 | 0.3750 | 63 | 4.69 |
| Google Vertex | 0.5600 | 1.680 | 1.120 | 51 | 0.52 |
Prices are in dollars per 1M tokens. Speed and latency are medians from the AA provider leaderboard.
Compare DeepSeek in the app
Data source: Artificial Analysis. This page is a research aid, not a guarantee of provider availability or pricing.
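A rough first-order model for perceived latency is TTFT plus visible output tokens divided by decode speed. The sketch below uses the Baseten and Google Vertex rows from the explorer table above; the 300-token reply length is the only assumed number.

```python
# First-order latency estimate: TTFT + output_tokens / decode_speed.
# Provider numbers come from the explorer table above; the 300-token
# reply length is an assumption for illustration.
REPLY_TOKENS = 300

providers = {
    "Baseten":       {"ttft_s": 4.69, "speed_tps": 63},
    "Google Vertex": {"ttft_s": 0.52, "speed_tps": 51},
}
for name, p in providers.items():
    total = p["ttft_s"] + REPLY_TOKENS / p["speed_tps"]
    print(f"{name}: ~{total:.1f}s to finish a {REPLY_TOKENS}-token reply")
```

At this reply length the low-TTFT endpoint finishes first despite the lower decode speed; under this simple model the ranking only flips once replies grow past roughly 1,100 output tokens. That is why the metric worth sorting on depends on your workload, not on the leaderboard's default column.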

A practical shortlist, with provider signals

Below is a compact table for decision-making. It lists the top DeepSeek models by quality index, then shows the cheapest observed endpoint, plus the fastest and lowest-latency endpoints where available. This is the easiest way to spot models that look good on paper but lack mature provider coverage.

| Model | QI | Context | Cheapest provider | Blended $/1M | Fastest provider | Tok/s | Lowest latency provider | TTFT (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeepSeek V3.2 (Reasoning) | 41.2 | 128,000 | Novita | $0.3345 | Baseten | 63 | Google Vertex | 0.52 |
| DeepSeek V3.2 Speciale | 34.1 | 128,000 | Parasail (FP8) | $0.4500 | Parasail (FP8) | 26 | Parasail (FP8) | 0.88 |
| DeepSeek V3.1 Terminus (Reasoning) | 33.4 | 128,000 | Novita (FP8) | $0.6350 | SambaNova | 172 | Novita (FP8) | 1.57 |
| DeepSeek V3.2 Exp (Reasoning) | 32.5 | 128,000 | Novita (FP8) | $0.3400 | Novita (FP8) | 37 | Novita (FP8) | 0.80 |
| DeepSeek V3.2 (Non-reasoning) | 31.8 | 128,000 | GMI | $0.1350 | Fireworks | 218 | Google Vertex | 0.51 |
| DeepSeek V3.2 Exp (Non-reasoning) | 28.1 | 128,000 | DeepInfra | $0.2650 | Novita (FP8) | 33 | Novita (FP8) | 0.78 |
| DeepSeek V3.1 Terminus (Non-reasoning) | 28.0 | 128,000 | DeepInfra (FP4) | $0.5000 | SambaNova | 271 | Fireworks | 0.52 |
| DeepSeek V3.1 (Reasoning) | 27.9 | 128,000 | GMI (FP8) | $0.6350 | Google Vertex | 293 | Google Vertex | 0.67 |

When not to use DeepSeek

DeepSeek is not a universal default. If you need the most conservative behavior under uncertainty, or you are shipping into strict safety and compliance constraints, you may prefer a proprietary model with mature policy tooling and stable enterprise routing. If you are extremely latency-sensitive and cannot tolerate cold starts, pick the provider first, then the model, and validate time to first token on your own prompts.

How to use this page to make a decision

If you are starting from scratch, here is a simple workflow. Pick two models. One should be the ceiling model by quality. One should be the value pick. Then choose two providers for each: one optimized for cost, one optimized for latency. Run your own prompt suite and measure quality regressions and tail latencies. After that, the decision usually becomes obvious.
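Here is a minimal sketch of that bake-off, assuming each shortlisted host exposes an OpenAI-compatible chat completions endpoint (many DeepSeek hosts do, but verify per provider). The base URLs, model IDs, API keys, and prompts are placeholders; scoring response quality against your own rubric is left to you.

```python
# Sketch of a two-model, multi-provider bake-off over your own prompt suite.
# Assumes OpenAI-compatible endpoints; base URLs, model IDs, keys, and
# prompts below are placeholders, not real provider values.
import time
from statistics import median, quantiles
from openai import OpenAI

CANDIDATES = {
    "cost-pick":    ("https://provider-a.example/v1", "deepseek-v3.2-non-reasoning"),
    "latency-pick": ("https://provider-b.example/v1", "deepseek-v3.2-reasoning"),
}
PROMPTS = ["Summarize this support ticket: ...", "Refactor this function: ..."]

def run_once(base_url: str, model: str, prompt: str):
    """Stream one completion and return (ttft_seconds, total_seconds, text)."""
    client = OpenAI(base_url=base_url, api_key="YOUR_KEY")
    start = time.monotonic()
    ttft, parts = None, []
    stream = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}], stream=True
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if ttft is None:
                ttft = time.monotonic() - start
            parts.append(chunk.choices[0].delta.content)
    return ttft, time.monotonic() - start, "".join(parts)

for label, (base_url, model) in CANDIDATES.items():
    ttfts, totals = [], []
    for prompt in PROMPTS:
        ttft, total, text = run_once(base_url, model, prompt)
        ttfts.append(ttft)
        totals.append(total)
        # score `text` against your own quality rubric here
    p95 = quantiles(totals, n=20)[-1]
    print(f"{label}: median TTFT {median(ttfts):.2f}s, p95 total {p95:.2f}s")
```

Run the suite enough times to see tail behavior, not just medians, and keep the prompts representative of production traffic rather than toy questions.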

Keep exploring

If you want the bigger context around why open weights are compounding, read open source vs proprietary LLMs and three forces that broke OpenAI's moat. If your workload is code-heavy, start with best coding models. If you are looking for broad open-weights coverage, use best open source models. If you are explicitly optimizing for hard reasoning tasks, start with best novel reasoning models.

Try it in WhatLLM

Want to sanity check the provider choice in context? Jump into the compare tool and start from a real model, not a vague family label.

Note: prices are taken from the dataset snapshot and can change. Always verify current provider pricing before committing.