Who builds DeepSeek
DeepSeek is a Chinese AI lab. It is widely reported to have been founded by Liang Wenfeng and to be closely associated with High-Flyer, a quantitative trading firm. The reason this matters is not nationalism or drama. It is incentives. A lab that can fund long training runs without selling an enterprise product can optimize for research outcomes, publish more, and move faster.
Most people talk about DeepSeek like it is only weights. In practice, the DeepSeek story is about a loop: fast iteration on training recipes, aggressive efficiency work, then rapid distribution across providers. That distribution is what shows up in your costs and latency today.
Sources for company context: Wikipedia, TechTarget.
Model evolution: a timeline you can skim
DeepSeek has released a lot of names. Some are genuine architectural jumps, others are post-training revisions, and others are packaging choices that show up as "reasoning" vs "non-reasoning" in downstream distributions. This is a selected timeline intended for orientation, not an exhaustive catalog.
| Release window | Family | What changed | Why it matters |
|---|---|---|---|
| Late 2023 | DeepSeek-Coder | Early coding-specialized releases and instruct variants. | Established the lab as serious about developer workloads. |
| Mid 2024 | DeepSeek-V2, Coder-V2 | More competitive general models and stronger coding variants. | Triggered price pressure across the China hosting ecosystem. |
| Dec 2024 | DeepSeek-V3 | Mixture of experts style scaling that moved the quality ceiling for open weights. | Created a base that later reasoning variants could build on. |
| Jan 2025 | DeepSeek-R1 | Reasoning-focused releases with heavier post-training. | Marked the shift from "good open weights" to "serious reasoning open weights". |
| 2025 | V3 revisions, R1 revisions | Iterative improvements and packaging for tool use. | Most buyers feel this as better endpoints and better defaults, not as a new paper. |
Sources for the release overview: TechTarget, Tom's Hardware, Le Monde.
Reasoning vs non-reasoning variants
In practice, "reasoning" is not a brand. It is an operating mode. Reasoning variants tend to spend more tokens thinking, which can raise cost and push up latency, but they can also be meaningfully stronger on multi-step tasks. Non-reasoning variants are often better defaults for interactive products where speed and throughput matter.
Below is a summary computed from the provider endpoints in the dataset. Classification is based on the labels that appear in model names in the AA dump, such as "Reasoning", "Thinking", and "Non-reasoning"; a minimal sketch of that grouping follows the table. It is a pragmatic grouping, not a claim about internal architecture.
| Group | Models | Endpoints | Providers | Median blended $/1M | Best blended $/1M | Median speed tok/s | Median TTFT s |
|---|---|---|---|---|---|---|---|
| Reasoning | 5 | 17 | 11 | $0.6350 | $0.3345 | 43 | 1.40 |
| Non-reasoning | 4 | 26 | 17 | $0.6725 | $0.1350 | 60 | 0.89 |
| Other | 16 | 41 | 18 | $1.450 | $0.0750 | 51 | 0.86 |
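If you want to reproduce that grouping on your own endpoint dump, the logic is just string matching on the model-name label plus per-group medians. Here is a minimal sketch; the record layout and the sample values are illustrative assumptions, not the real dataset schema.

```python
from statistics import median

# Hypothetical endpoint records; field names and values are illustrative only.
endpoints = [
    {"model": "DeepSeek V3.2 (Reasoning)", "blended_per_1m": 0.3345, "tok_per_s": 63},
    {"model": "DeepSeek V3.2 (Non-reasoning)", "blended_per_1m": 0.1350, "tok_per_s": 218},
    {"model": "DeepSeek-Coder-V2", "blended_per_1m": 0.0750, "tok_per_s": 51},
]

def group_for(model_name: str) -> str:
    """Classify by the label embedded in the model name, not by architecture."""
    name = model_name.lower()
    if "reasoning" in name or "thinking" in name:
        # "non-reasoning" also contains "reasoning", so check it first.
        return "Non-reasoning" if "non-reasoning" in name else "Reasoning"
    return "Other"

groups = {}
for ep in endpoints:
    groups.setdefault(group_for(ep["model"]), []).append(ep)

for label, eps in groups.items():
    print(label,
          "median blended $/1M:", median(ep["blended_per_1m"] for ep in eps),
          "median tok/s:", median(ep["tok_per_s"] for ep in eps))
```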
What most people get wrong about DeepSeek
Most discussions compress DeepSeek into a single point on a chart. That is a category error. You are actually making two decisions: which DeepSeek variant fits your workload, and which provider delivers the economics and latency you need. For many teams, the provider decision moves the needle more than switching between closely related variants.
This guide is built for builders. It is not a hype piece, and it is not a benchmark dump. The goal is to help you choose a default, understand when it fails, and make the provider choice with eyes open.
The headline numbers
In the current dataset, there are 25 DeepSeek models with a non-zero quality index and 84 provider endpoints across 29 hosts. DeepSeek is widely distributed, which is exactly why pricing and latency can vary so much.
The current ceiling model in this dataset
The highest quality DeepSeek model in this dataset is DeepSeek V3.2 (Reasoning) at QI 41.2, with a 128,000 token context window. Use it when quality is the priority and you can afford the endpoint tradeoffs.
The value pick
If you care about quality per dollar, the best value pick in the dataset is DeepSeek V3.2 (Non-reasoning). Its cheapest observed blended price is $0.1350 per 1M tokens on GMI. Do not read this as "always choose the cheapest". Read it as a shortlist candidate for high-volume workloads.
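To make "value" concrete, here is the back-of-envelope arithmetic using the cheapest observed blended prices in this dataset. The 500M tokens per month figure is a hypothetical workload; substitute your own volume and token mix.

```python
# Cheapest observed blended prices from the dataset ($ per 1M tokens).
ceiling = {"model": "DeepSeek V3.2 (Reasoning)", "qi": 41.2, "blended_per_1m": 0.3345}
value   = {"model": "DeepSeek V3.2 (Non-reasoning)", "qi": 31.8, "blended_per_1m": 0.1350}

monthly_tokens = 500_000_000  # hypothetical workload: 500M blended tokens per month

for pick in (ceiling, value):
    cost = monthly_tokens / 1_000_000 * pick["blended_per_1m"]
    print(f'{pick["model"]}: QI {pick["qi"]}, '
          f'~${cost:,.2f}/month, '
          f'QI per blended $/1M = {pick["qi"] / pick["blended_per_1m"]:.0f}')
```

At those prices the value pick runs roughly $67.50 per month against $167.25 for the ceiling model, which is why it earns a spot on the shortlist for high-volume workloads despite the lower quality index.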
Why DeepSeek stands out
The most important DeepSeek advantage is not a single benchmark. It is the combination of fast iteration and aggressive distribution. When a model family shows up across many providers, competition forces down blended price and increases the chance that at least one host has a strong throughput or latency profile. That is why the "where to run it" question matters so much.
The second advantage is that DeepSeek releases have consistently targeted efficiency. Efficiency is not only training cost. It is inference economics, routing, quantization friendliness, and real-world provider viability. If your workload is high-volume, this matters more than a small difference in a single benchmark.
Provider reality: same model, very different outcomes
DeepSeek endpoints can differ widely on price, throughput, and time to first token. If you are building an interactive product, time to first token often matters as much as raw throughput. If you are doing batch workloads, blended price and throughput dominate. The table below shows a sample of endpoints for the ceiling model, DeepSeek V3.2 (Reasoning); pick the model you care about and sort providers by the metric that maps to your actual constraints. A sketch of the blended-price arithmetic follows the table.
| Provider | Input $/1M | Output $/1M | Blended $/1M | Speed tok/s | TTFT s |
|---|---|---|---|---|---|
| Novita | 0.2690 | 0.4000 | 0.3345 | 32 | 1.45 |
| SiliconFlow (FP8) | 0.2700 | 0.4200 | 0.3450 | 43 | 2.20 |
| DeepSeek | 0.2800 | 0.4200 | 0.3500 | 29 | 1.38 |
| Parasail (FP8) | 0.2800 | 0.4500 | 0.3650 | 8 | 1.21 |
| Baseten | 0.3000 | 0.4500 | 0.3750 | 63 | 4.69 |
| Google Vertex | 0.5600 | 1.680 | 1.120 | 51 | 0.52 |
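The blended figures in this table are consistent with a simple 50/50 average of input and output price. Here is a minimal sketch under that assumption; swap in your own input:output ratio to see how the provider ranking shifts for your actual traffic.

```python
def blended_price(input_per_1m: float, output_per_1m: float,
                  input_share: float = 0.5) -> float:
    """Weighted blend of input and output price per 1M tokens."""
    return input_share * input_per_1m + (1 - input_share) * output_per_1m

# Rows from the table above: (provider, input $/1M, output $/1M).
rows = [
    ("Novita", 0.2690, 0.4000),
    ("Google Vertex", 0.5600, 1.6800),
]

# At a 50/50 mix these reproduce the table's blended column (0.3345 and 1.12).
for provider, inp, out in rows:
    print(provider, round(blended_price(inp, out), 4))

# A prompt-heavy workload might skew toward input tokens, e.g. 3:1 input:output.
for provider, inp, out in rows:
    print(provider, round(blended_price(inp, out, input_share=0.75), 4))
```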
A practical shortlist, with provider signals
Below is a compact table for decision-making. It lists the top DeepSeek models by quality index, then shows the cheapest observed endpoint, plus the fastest and lowest-latency endpoints where available; a sketch of that selection logic follows the table. This is the easiest way to spot models that look good on paper but lack mature provider coverage.
| Model | QI | Context | Cheapest provider | Blended $/1M | Fastest provider | Tok/s | Lowest latency provider | TTFT s |
|---|---|---|---|---|---|---|---|---|
| DeepSeek V3.2 (Reasoning) | 41.2 | 128,000 | Novita | $0.3345 | Baseten | 63 | Google Vertex | 0.52 |
| DeepSeek V3.2 Speciale | 34.1 | 128,000 | Parasail (FP8) | $0.4500 | Parasail (FP8) | 26 | Parasail (FP8) | 0.88 |
| DeepSeek V3.1 Terminus (Reasoning) | 33.4 | 128,000 | Novita (FP8) | $0.6350 | SambaNova | 172 | Novita (FP8) | 1.57 |
| DeepSeek V3.2 Exp (Reasoning) | 32.5 | 128,000 | Novita (FP8) | $0.3400 | Novita (FP8) | 37 | Novita (FP8) | 0.80 |
| DeepSeek V3.2 (Non-reasoning) | 31.8 | 128,000 | GMI | $0.1350 | Fireworks | 218 | Google Vertex | 0.51 |
| DeepSeek V3.2 Exp (Non-reasoning) | 28.1 | 128,000 | DeepInfra | $0.2650 | Novita (FP8) | 33 | Novita (FP8) | 0.78 |
| DeepSeek V3.1 Terminus (Non-reasoning) | 28.0 | 128,000 | DeepInfra (FP4) | $0.5000 | SambaNova | 271 | Fireworks | 0.52 |
| DeepSeek V3.1 (Reasoning) | 27.9 | 128,000 | GMI (FP8) | $0.6350 | Google Vertex | 293 | Google Vertex | 0.67 |
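For reference, the per-model picks in a table like this reduce to three selections over the endpoint list. A sketch under the same assumed record layout as above, using three of the DeepSeek V3.2 (Reasoning) endpoints shown earlier:

```python
# Assumed endpoint records for one model; field names are illustrative.
endpoints = [
    {"provider": "Novita", "blended_per_1m": 0.3345, "tok_per_s": 32, "ttft_s": 1.45},
    {"provider": "Baseten", "blended_per_1m": 0.3750, "tok_per_s": 63, "ttft_s": 4.69},
    {"provider": "Google Vertex", "blended_per_1m": 1.1200, "tok_per_s": 51, "ttft_s": 0.52},
]

cheapest = min(endpoints, key=lambda ep: ep["blended_per_1m"])
fastest = max(endpoints, key=lambda ep: ep["tok_per_s"])
lowest_latency = min(endpoints, key=lambda ep: ep["ttft_s"])

print("cheapest:", cheapest["provider"])              # Novita
print("fastest:", fastest["provider"])                # Baseten
print("lowest latency:", lowest_latency["provider"])  # Google Vertex
```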
When not to use DeepSeek
DeepSeek is not a universal default. If you need the most conservative behavior under uncertainty, or you are shipping into strict safety and compliance constraints, you may prefer a proprietary model with mature policy tooling and stable enterprise routing. If you are extremely latency-sensitive and cannot tolerate cold starts, pick the provider first, then the model, and validate time to first token on your own prompts.
How to use this page to make a decision
If you are starting from scratch, here is a simple workflow. Pick two models. One should be the ceiling model by quality. One should be the value pick. Then choose two providers for each: one optimized for cost, one optimized for latency. Run your own prompt suite and measure quality regressions and tail latencies. After that, the decision usually becomes obvious.
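Here is a minimal sketch of that measurement loop, assuming a provider with an OpenAI-compatible API (most DeepSeek hosts offer one). The base URL, model identifier, and prompts are placeholders, and a real harness should also score outputs against your own quality checks, not just time them.

```python
import time
from openai import OpenAI  # assumes an OpenAI-compatible endpoint

# Placeholders: point these at the provider and model you are evaluating.
client = OpenAI(base_url="https://example-provider.com/v1", api_key="YOUR_KEY")
MODEL = "deepseek-v3.2"  # hypothetical identifier; check your provider's catalog

prompts = ["Summarize this ticket: ...", "Write a SQL query that ..."]  # your own suite

ttfts, totals = [], []
for prompt in prompts:
    start = time.monotonic()
    first_token_at = None
    stream = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
            first_token_at = time.monotonic()
    totals.append(time.monotonic() - start)
    ttfts.append((first_token_at or time.monotonic()) - start)

def p95(values):
    # Nearest-rank approximation; use a larger prompt suite for meaningful tails.
    return sorted(values)[int(0.95 * (len(values) - 1))]

print(f"TTFT p95: {p95(ttfts):.2f}s, total latency p95: {p95(totals):.2f}s")
```

Run the same loop against each shortlisted provider and compare the tails, not the averages; cold starts and queueing show up in p95 long before they show up in the mean.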
Keep exploring
If you want the bigger context around why open weights are compounding, read open source vs proprietary LLMs and three forces that broke OpenAI's moat. If your workload is code-heavy, start with best coding models. If you are looking for broad open-weights coverage, use best open source models. If you are explicitly optimizing for hard reasoning tasks, start with best novel reasoning models.
Want to sanity check the provider choice in context? Jump into the compare tool and start from a real model, not a vague family label.