Who builds MiniMax
MiniMax publishes open-weight language models under its M-series and also ships multimodal products beyond text. For builders, the key point is that the M-series is engineered for efficiency: long context in M1, and high-throughput agent and coding workflows in M2 and M2.1.
Sources for model releases and technical notes: MiniMax-M1 GitHub, VentureBeat.
Model map: M1 vs M2 vs M2.1
If you only remember one thing, remember this: M1 is the long-context hammer. M2 and M2.1 are the fast agent brains. They overlap, but their default strengths are different, and providers price them differently.
| Family | Models | Endpoints | Providers | Median blended $/1M | Best blended $/1M | Median speed tok/s | Median TTFT s |
|---|---|---|---|---|---|---|---|
| M2.1 | 1 | 4 | 4 | $0.7450 | $0.3000 | 81 | 1.31 |
| M2 | 1 | 5 | 5 | $0.7500 | $0.6370 | 89 | 0.63 |
| M1 | 2 | 1 | 1 | $1.3750 | $1.3750 | 104 | 75.80 |
What to use
Most teams should start with MiniMax-M2.1. It is the strongest default when your work is code-heavy, tool-heavy, or agent-shaped. If you need maximum throughput per dollar, MiniMax-M2 often behaves like the pragmatic workhorse. If your constraint is context window, reach for MiniMax-M1.
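The defaults above reduce to a small routing rule. This is a minimal sketch of that rule; the function name and the boolean constraints are illustrative assumptions, not MiniMax guidance.

```python
def pick_minimax_model(needs_long_context: bool, cost_sensitive: bool) -> str:
    """Map the two constraints discussed above to a default model.

    Illustrative routing only; tune against your own prompt suite.
    """
    if needs_long_context:    # context window is the binding constraint
        return "MiniMax-M1"   # 1M-token context
    if cost_sensitive:        # throughput per dollar matters most
        return "MiniMax-M2"
    return "MiniMax-M2.1"     # strongest default for code/agent work
```

In practice you would extend the signature with whatever constraints your workload actually has; the point is that the family choice is a one-screen decision.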
The ceiling model
The strongest MiniMax model in this snapshot is MiniMax-M2.1 at QI 39.3, with a 204,800 token context window. Use it when quality is the priority and you can tolerate the provider tradeoffs.
The value pick
If you care about quality per dollar, the best value pick is MiniMax-M2.1. Its cheapest observed blended price is $0.3000 per 1M tokens on GMI (FP8). Treat this as a shortlist candidate for high-volume workloads.
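The value claim can be made concrete as quality index per blended dollar, using the snapshot figures quoted on this page. The QI-per-dollar framing is ours, not an official metric.

```python
# QI per best-observed blended dollar, from the figures on this page.
models = {
    "MiniMax-M2.1":   {"qi": 39.3, "best_blended": 0.300},  # GMI (FP8)
    "MiniMax-M2":     {"qi": 35.7, "best_blended": 0.637},  # DeepInfra
    "MiniMax M1 80k": {"qi": 24.4, "best_blended": 1.375},  # Novita
}

value = {name: round(m["qi"] / m["best_blended"], 1) for name, m in models.items()}
# M2.1 delivers roughly 131 QI points per blended dollar,
# more than double M2's ~56.
```

On these numbers the value pick is not close, which is why M2.1 doubles as both the ceiling model and the shortlist default.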
Provider reality: same model, very different outcomes
MiniMax endpoints can differ sharply across price, throughput, and time to first token. If you are building an interactive product, time to first token often matters as much as throughput. If you are doing batch workloads, blended price and throughput dominate. Use the explorer to pick the model you care about and sort providers by the metric that maps to your constraints. The table below shows the MiniMax-M2 endpoints from this snapshot.
| Provider | Input $/1M | Output $/1M | Blended $/1M | Speed tok/s | TTFT s |
|---|---|---|---|---|---|
| DeepInfra | 0.2540 | 1.020 | 0.6370 | 95 | 0.29 |
| Amazon Bedrock | 0.3000 | 1.200 | 0.7500 | 68 | 0.63 |
| Novita | 0.3000 | 1.200 | 0.7500 | 85 | 0.98 |
| Google Vertex | 0.3000 | 1.200 | 0.7500 | 181 | 0.25 |
| MiniMax | 0.3000 | 1.200 | 0.7500 | 89 | 1.31 |
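The blended prices in the table are consistent with a simple 1:1 average of input and output $/1M; that ratio is inferred from the numbers here, not an official definition, and other trackers blend at ratios like 3:1. A sketch of the "sort by the metric that maps to your constraints" step:

```python
endpoints = [
    # (provider, input $/1M, output $/1M, tok/s, TTFT s) -- from the table above
    ("DeepInfra",      0.254, 1.020,  95, 0.29),
    ("Amazon Bedrock", 0.300, 1.200,  68, 0.63),
    ("Novita",         0.300, 1.200,  85, 0.98),
    ("Google Vertex",  0.300, 1.200, 181, 0.25),
    ("MiniMax",        0.300, 1.200,  89, 1.31),
]

def blended(inp: float, out: float) -> float:
    # 1:1 average, matching the blended column in this snapshot
    return (inp + out) / 2

# Interactive product: sort by TTFT. Batch workload: sort by blended price.
by_ttft = sorted(endpoints, key=lambda e: e[4])
by_price = sorted(endpoints, key=lambda e: blended(e[1], e[2]))
print(by_ttft[0][0])   # Google Vertex
print(by_price[0][0])  # DeepInfra
```

Note that the two sorts disagree on the winner, which is the whole point: the same model name hides materially different endpoints.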
A practical shortlist, with provider signals
This table is for decisions. It ranks the top MiniMax models by quality index, then shows the cheapest observed endpoint, plus the fastest and lowest-latency endpoints where available.
| Model | QI | Context | Cheapest provider | Blended $/1M | Fastest provider | Tok/s | Lowest latency provider | TTFT s |
|---|---|---|---|---|---|---|---|---|
| MiniMax-M2.1 | 39.3 | 204,800 | GMI (FP8) | $0.3000 | GMI (FP8) | 92 | DeepInfra (FP8) | 0.29 |
| MiniMax-M2 | 35.7 | 204,800 | DeepInfra | $0.6370 | Google Vertex | 181 | Google Vertex | 0.25 |
| MiniMax M1 80k | 24.4 | 1,000,000 | Novita | $1.3750 | Novita | 104 | Novita | 75.80 |
| MiniMax M1 40k | 20.9 | 1,000,000 | N/A | N/A | N/A | N/A | N/A | N/A |
What to skip
Skip MiniMax as a default when you cannot validate latency and tail behavior on your own prompts. The biggest failure mode for teams is choosing a model family first, then discovering the provider endpoint does not match their constraints. If you need the most conservative behavior under uncertainty, or you ship into strict safety and compliance workflows, you may prefer a proprietary model with mature policy tooling and stable enterprise routing.
How to use this page to make a decision
Pick two models. One should be your ceiling model by quality. One should be your value pick. Then choose two providers for each: one optimized for cost, one optimized for latency. Run your own prompt suite and measure quality regressions and tail latencies. The decision usually becomes obvious.
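The measurement step can be sketched as follows. This assumes you have already collected per-request time-to-first-token samples (in seconds) against a candidate endpoint; the request loop itself is provider-specific and omitted, and the sample values below are illustrative only.

```python
import statistics

def tail_report(ttft_samples: list[float]) -> dict:
    """Summarize TTFT samples: median for typical feel, p95 for tail pain."""
    ordered = sorted(ttft_samples)
    # Nearest-rank p95; for serious work use a larger sample and a
    # percentile routine from your metrics library.
    p95_index = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "median_s": statistics.median(ordered),
        "p95_s": ordered[p95_index],
    }

# Illustrative samples only -- run your own prompt suite.
samples = [0.25, 0.28, 0.31, 0.27, 0.90, 0.26, 0.29, 0.30, 0.33, 0.28]
report = tail_report(samples)
```

A median that looks fine next to a p95 several times larger is exactly the failure mode described in "What to skip": the endpoint is good on average and bad when it matters.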
Keep exploring
If your work is code-heavy, start with best coding models. If you are building tool-using systems, start with best agentic models. If your constraint is long documents, start with best long context models. For broad open-weight coverage, use best open source models.
Want to sanity check the provider choice in context? Jump into the compare tool and start from a real model, not a vague family label.