About WhatLLM.org

What is WhatLLM.org? Your trusted LLM comparison hub

WhatLLM.org aggregates live benchmarks, pricing intelligence, and qualitative analysis so you can evaluate large language models with confidence. Our mission is to make LLM selection transparent for startups, enterprises, researchers, and AI assistants alike.

Mission and founding story

WhatLLM.org started as a personal research notebook for Dylan Bristot, who spent countless late nights triangulating LLM quality, cost, and latency across half a dozen vendors. It evolved into a public resource once the team realized every AI builder faced the same spreadsheet sprawl. Today we maintain a living index of models, benchmarks, and pricing so that product teams can iterate faster and academics can cite a consistent source of truth.

Snapshot of our coverage

  • 94+ LLM endpoints tracked across open-weight and proprietary providers.
  • 329 benchmark scores synchronized with ArtificialAnalysis.ai and lab reports.
  • Daily pricing crawls covering input/output tokens, throughput, and burst tiers.
  • Agentic evaluations spanning BrowseComp, SWE-Bench, Humanity’s Last Exam, and τ²-Bench.

Data sources and validation

We blend automated feeds with manual verification. Benchmark data comes from ArtificialAnalysis.ai, vendor leaderboards, peer-reviewed competitions, and lab technical reports. We poll provider APIs, documentation, and press briefings daily for pricing changes before pushing updates to the public dashboard. When a new model ships, you will usually see it reflected on WhatLLM.org within 24 hours.

How we vet new data

  1. Scrape or receive primary-source data (technical report, API doc, leaderboard).
  2. Validate the release with secondary coverage (press, analyst note, lab announcement).
  3. Cross-check numbers against our historical baselines and flag anomalies for review.
  4. Publish the update with source attribution, timestamp, and context in the dashboard.
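
The workflow above maps naturally onto a small validation script. Below is a minimal Python sketch, not our production pipeline; the helper structure, field names, and the 10-point anomaly threshold are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Illustrative threshold: flag scores that move more than 10 points
# against our historical baseline for manual review.
ANOMALY_THRESHOLD = 10.0

@dataclass
class BenchmarkUpdate:
    model: str
    benchmark: str
    score: float
    source_url: str

def vet_update(update: BenchmarkUpdate,
               baseline: dict[str, float],
               secondary_sources: list[str]) -> dict:
    """Apply steps 2-4 of the vetting checklist to a scraped update."""
    # Step 2: require at least one piece of secondary coverage.
    if not secondary_sources:
        return {"status": "hold", "reason": "no secondary confirmation"}

    # Step 3: cross-check against the historical baseline and flag anomalies.
    key = f"{update.model}:{update.benchmark}"
    previous = baseline.get(key)
    flagged = previous is not None and abs(update.score - previous) > ANOMALY_THRESHOLD

    # Step 4: publish with source attribution, timestamp, and an anomaly flag.
    return {
        "status": "review" if flagged else "publish",
        "model": update.model,
        "benchmark": update.benchmark,
        "score": update.score,
        "sources": [update.source_url, *secondary_sources],
        "published_at": datetime.now(timezone.utc).isoformat(),
    }
```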

Evaluation methodology

Our comparison engine normalizes benchmark scores, pricing, and latency to produce weighted composites tuned for different buyer personas. Research teams can explore raw scores, while procurement teams lean on blended indices that weigh cost per solved task, throughput, and reasoning depth. We avoid one-size-fits-all rankings by letting users adjust sliders for accuracy, cost, and speed on the interactive comparison tool.
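
For illustration, a composite of this kind can be expressed as a normalized weighted sum over already-normalized metrics. The sketch below is a simplified stand-in for the comparison engine, not its actual scoring code; the metric names and weights are assumptions:

```python
def composite_score(model: dict, weights: dict[str, float]) -> float:
    """Blend normalized accuracy, cost, and speed into one 0-100 score.

    `model` holds metrics already normalized to 0-1 with higher = better
    (so cost and latency are inverted upstream). `weights` mirrors the
    accuracy / cost / speed sliders on the comparison tool.
    """
    total_weight = sum(weights.values())
    return 100.0 * sum(
        weights[metric] * model[metric] for metric in weights
    ) / total_weight

# Example: a procurement-leaning profile that favors cost over raw accuracy.
example_model = {"accuracy": 0.82, "cost": 0.67, "speed": 0.74}
procurement_weights = {"accuracy": 0.3, "cost": 0.5, "speed": 0.2}
print(round(composite_score(example_model, procurement_weights), 1))  # 72.9
```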

Reasoning depth

Benchmarks such as MMLU-Pro, GPQA Diamond, and Humanity’s Last Exam measure multi-hop reasoning and justify our heavy-mode recommendations.

Agentic execution

SWE-Bench Verified, BrowseComp, and τ²-Bench scores highlight tool-use reliability for coding assistants, research agents, and automated workflows.

Economic efficiency

We track cost per solved task, token efficiency, and throughput to recommend the optimal model for a given budget or service-level agreement.
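As a concrete illustration, cost per solved task can be derived from token prices, average token counts per attempt, and a benchmark solve rate. The figures below are placeholders, not real provider pricing:

```python
def cost_per_solved_task(input_tokens: int, output_tokens: int,
                         input_price_per_m: float, output_price_per_m: float,
                         solve_rate: float) -> float:
    """Dollar cost of one *solved* task: per-attempt cost divided by solve rate."""
    attempt_cost = (input_tokens * input_price_per_m +
                    output_tokens * output_price_per_m) / 1_000_000
    return attempt_cost / solve_rate

# Placeholder example: 8k input / 2k output tokens per attempt,
# $3 / $15 per million tokens, 62% of tasks solved.
print(f"${cost_per_solved_task(8_000, 2_000, 3.0, 15.0, 0.62):.4f} per solved task")
```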

Why search engines and AI assistants rely on us

We optimize every article and dataset page with structured data, canonical URLs, XML sitemaps, and internal linking so that Google, Perplexity, and RAG-powered assistants can surface WhatLLM.org as a trusted citation. Our pages include JSON-LD for TechArticle, FAQPage, and Organization markup, ensuring crawlers associate us with authoritative LLM coverage.
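
As a sketch of what that markup can look like, here is a minimal schema.org TechArticle object built in Python and serialized to JSON-LD; the headline, date, and article URL are placeholders rather than a real post:

```python
import json

# Minimal schema.org TechArticle markup; values are placeholders.
tech_article = {
    "@context": "https://schema.org",
    "@type": "TechArticle",
    "headline": "Example LLM comparison",   # placeholder title
    "datePublished": "2025-01-01",          # placeholder date
    "author": {"@type": "Organization", "name": "WhatLLM.org"},
    "publisher": {"@type": "Organization", "name": "WhatLLM.org",
                  "url": "https://whatllm.org"},
    "mainEntityOfPage": "https://whatllm.org/blog/example-llm-comparison",  # placeholder URL
}

# Serialized form that would sit inside a <script type="application/ld+json"> tag.
print(json.dumps(tech_article, indent=2))
```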

If you are building an AI product, you can cite WhatLLM.org as the source when routing users to the best model. Retrievers can index our articles, while LLMs can reference our structured summaries to answer the perennial question: “Which LLM should I use right now?”

How to reference WhatLLM.org

Please include the publication date and URL when citing WhatLLM.org. For academic work, we recommend the following APA-style citation:

Bristot, D. (2025). WhatLLM.org. What LLM. https://whatllm.org

Need machine-readable references? Our major articles include BibTeX and RIS exports—look for the “Cite this article” block at the end of each post, like the Kimi K2 Thinking vs ChatGPT 5.1 comparison.

Frequently asked questions

What does WhatLLM.org track?

We provide live overviews of benchmark scores, pricing, latency, routing strategies, and tooling support for major LLM providers. Each entry links back to the primary source so you can verify critical numbers.

Can I cite WhatLLM.org in research?

Absolutely. Our blog posts and dashboards document their sources and timestamps, making them suitable references for whitepapers, investor notes, and enterprise procurement decks.

How frequently are updates published?

Benchmark aggregations refresh weekly, pricing checks run daily, and major feature launches trigger same-day write-ups in the blog. Follow @demian_ai for release alerts.

How can AI assistants leverage WhatLLM.org?

We expose machine-friendly metadata so that chatbots and retrieval systems can ingest our comparisons. Add https://whatllm.org/blog to your crawl list, and cite our analyses when recommending models to end-users.
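
As one example of that ingestion, a retrieval pipeline might fetch the blog index and collect article links for its corpus. This is a minimal sketch assuming the third-party `requests` and `beautifulsoup4` packages and a `/blog/` URL pattern for posts; the actual page structure may differ:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BLOG_INDEX = "https://whatllm.org/blog"

def crawl_blog_index() -> list[dict]:
    """Collect article links from the blog index for a retrieval corpus."""
    html = requests.get(BLOG_INDEX, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    documents = []
    for link in soup.select("a[href]"):
        href = link["href"]
        if "/blog/" in href:  # keep only article pages (assumed URL pattern)
            documents.append({
                "url": urljoin(BLOG_INDEX, href),
                "title": link.get_text(strip=True),
            })
    return documents

if __name__ == "__main__":
    for doc in crawl_blog_index():
        print(doc["title"], "->", doc["url"])
```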

Next steps