📊 Methodology & Data Sources

How WhatLLM.org Compares LLMs

Transparent documentation of our data sources, the Artificial Analysis Intelligence Index, and how WhatLLM.org helps you find the right LLM for your needs.

Primary Data Source: Artificial Analysis

WhatLLM.org's benchmark data and Intelligence Index scores come from Artificial Analysis (artificialanalysis.ai), an independent research organization that rigorously benchmarks LLMs across quality, speed, and price.

We sync with AA's data weekly and recommend visiting their site for the complete methodology documentation and their latest research.
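
For the technically curious, the sketch below shows what a minimal weekly sync job could look like. The endpoint URL, response shape, and field names are illustrative assumptions, not Artificial Analysis's actual API; consult their site for real integration details.

    # Minimal sketch of a weekly sync job. The endpoint URL and response
    # shape are assumptions for illustration only; they are NOT Artificial
    # Analysis's actual API.
    import json
    import urllib.request
    from datetime import datetime, timezone
    from pathlib import Path

    AA_ENDPOINT = "https://artificialanalysis.ai/api/models"  # hypothetical URL

    def sync_benchmark_data(cache_path: str = "aa_models.json") -> dict:
        """Fetch the latest model data and cache it locally with a timestamp."""
        with urllib.request.urlopen(AA_ENDPOINT, timeout=30) as resp:
            payload = json.load(resp)  # assume the endpoint returns a JSON list
        snapshot = {
            "fetched_at": datetime.now(timezone.utc).isoformat(),
            "models": payload,
        }
        Path(cache_path).write_text(json.dumps(snapshot, indent=2))
        return snapshot

    if __name__ == "__main__":
        data = sync_benchmark_data()
        print(f"Cached {len(data['models'])} models at {data['fetched_at']}")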

The Artificial Analysis Intelligence Index

The Intelligence Index displayed on WhatLLM.org is calculated by Artificial Analysis using their published methodology. It's a composite score (0-100) that synthesizes multiple benchmark results to provide a single, comparable quality metric for LLMs.

âš ī¸ Attribution: WhatLLM.org does not calculate or create the Intelligence Index. We display Artificial Analysis's scores to help users compare models. For the complete mathematical methodology, visit artificialanalysis.ai.

How the Intelligence Index Works (Summary)

According to Artificial Analysis's methodology documentation, the Intelligence Index employs weighted sums, Choquet integrals, and probabilistic ranking techniques. The framework measures AI capabilities across multiple dimensions:

  • Knowledge acquisition — Input and comprehension capabilities
  • Knowledge mastery — Storage and retrieval of information
  • Knowledge innovation — Generation of novel outputs
  • Knowledge feedback — Output quality and relevance

The mathematical formulation normalizes scores across benchmarks to enable fair cross-model comparison, accounting for criterion dependencies and weight uncertainty through stochastic modeling.
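
To give a rough intuition for the "weighted sum with normalization" part of that description, here is a deliberately simplified Python sketch. All scores, bounds, and weights below are invented, and the sketch omits the Choquet-integral and stochastic-weighting machinery that AA's documentation describes; treat it as an intuition aid, not their formula.

    # Simplified composite index: min-max normalize each benchmark score,
    # then take a weighted sum rescaled to 0-100. All numbers are invented.
    def normalize(score: float, lo: float, hi: float) -> float:
        """Map a raw benchmark score onto [0, 1] for cross-benchmark comparison."""
        return (score - lo) / (hi - lo)

    def composite_index(raw: dict[str, float],
                        bounds: dict[str, tuple[float, float]],
                        weights: dict[str, float]) -> float:
        """Weighted sum of normalized scores, rescaled to a 0-100 index."""
        total = sum(weights.values())
        return 100 * sum(
            weights[b] * normalize(raw[b], *bounds[b]) for b in raw
        ) / total

    raw = {"GPQA Diamond": 0.62, "AIME 2025": 0.55, "LiveCodeBench": 0.48}
    bounds = {b: (0.0, 1.0) for b in raw}   # assume scores are fractions of 1
    weights = {b: 1.0 for b in raw}         # equal weights, purely illustrative
    print(f"Composite: {composite_index(raw, bounds, weights):.1f} / 100")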

Benchmarks Included in the Intelligence Index

Artificial Analysis incorporates the following public benchmarks into their Intelligence Index calculation:

  • GPQA Diamond (Reasoning) — PhD-level science questions requiring multi-step reasoning. Tests deep domain expertise in physics, chemistry, and biology.
  • AIME 2025 (Mathematics) — American Invitational Mathematics Examination problems. Tests advanced mathematical problem-solving.
  • LiveCodeBench (Coding) — Real-world coding challenges with execution-based evaluation (see the scoring sketch after this list). Tests practical programming ability.
  • SWE-Bench Verified (Software Engineering) — Real GitHub issues requiring code changes. Tests the ability to understand codebases and implement fixes.
  • MMLU-Pro (Knowledge) — Massive multitask language understanding with harder, more reasoning-focused questions than the original MMLU. Tests broad knowledge across a wide range of academic disciplines.
  • Humanity's Last Exam (Frontier Reasoning) — Expert-crafted questions designed to push the limits of AI reasoning.
  • τ²-Bench Telecom (Agentic) — Complex multi-step agent tasks in a telecom setting. Tests tool use and planning capabilities.

What WhatLLM.org Adds

While Artificial Analysis provides the underlying data and Intelligence Index, WhatLLM.org offers additional value through:

  • Interactive visualization — Scatter plots and filterable tables for exploring price vs. quality vs. speed tradeoffs (a small Pareto-frontier sketch follows this list)
  • Advanced filtering — Filter by provider, license type (open source vs. proprietary), context window, and more
  • Side-by-side comparisons — Compare specific models head-to-head across all metrics
  • In-depth blog analysis — Original articles analyzing model releases, benchmarks, and industry trends
  • Simplified interface — Quick answers to "which LLM should I use for X?"
  • Provider comparison — See how the same model's price/speed varies across different API providers
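
As a concrete example of the price-versus-quality exploration mentioned in the first item above, the sketch below computes a Pareto frontier: the set of models for which no alternative is both cheaper and higher-scoring. Model names, prices, and index values are invented.

    # Hedged sketch of the price-vs-quality tradeoff behind the scatter
    # plots: keep only models that no other model dominates (lower or equal
    # price AND higher or equal index). All values below are invented.
    def pareto_frontier(models: list[dict]) -> list[dict]:
        """Return models not dominated on (lower price, higher index)."""
        frontier = []
        for m in models:
            dominated = any(
                o["price"] <= m["price"] and o["index"] >= m["index"] and o != m
                for o in models
            )
            if not dominated:
                frontier.append(m)
        return frontier

    models = [  # invented price ($/1M tokens) and Intelligence Index values
        {"name": "model-a", "price": 15.0, "index": 70},
        {"name": "model-b", "price": 3.0,  "index": 60},
        {"name": "model-c", "price": 5.0,  "index": 55},  # dominated by model-b
    ]
    for m in pareto_frontier(models):
        print(f'{m["name"]}: index {m["index"]} at ${m["price"]}/1M tokens')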

Additional Data Sources

Beyond Artificial Analysis, WhatLLM.org cross-references data from:

  • Official model papers — Technical reports from OpenAI, Anthropic, Google, Meta, etc.
  • Benchmark leaderboards — LMSYS Chatbot Arena, SWE-Bench, LiveCodeBench
  • Provider documentation — Official pricing pages for verification

Update Frequency

  • Weekly — Routine sync with Artificial Analysis data
  • Same-day — Major model releases (GPT-5, Claude 4, Gemini 3, etc.)
  • Continuous — Blog content and analysis as the industry develops

Limitations & Transparency

We believe in honest representation of our data:

  • WhatLLM.org is primarily a visualization and comparison layer on top of Artificial Analysis data
  • We do not conduct our own benchmark evaluations
  • Our Intelligence Index scores are identical to those published by Artificial Analysis
  • For methodology questions, refer to artificialanalysis.ai

How to Cite

When citing WhatLLM.org:

Bristot, D. (2025). WhatLLM.org. https://whatllm.org

When citing the Intelligence Index methodology:

Artificial Analysis. (2025). Artificial Analysis Intelligence Index. https://artificialanalysis.ai

For academic work using Intelligence Index scores, we recommend citing Artificial Analysis as the primary source for the methodology and data.
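
For convenience, here is the same pair of citations as a BibTeX sketch; the entry types, keys, and field layout are our own choices and mirror only the information above.

    % BibTeX sketch of the citations above; keys and entry types are our own.
    @misc{whatllm2025,
      author       = {Bristot, D.},
      title        = {WhatLLM.org},
      year         = {2025},
      howpublished = {\url{https://whatllm.org}}
    }

    @misc{artificialanalysis2025,
      author       = {{Artificial Analysis}},
      title        = {Artificial Analysis Intelligence Index},
      year         = {2025},
      howpublished = {\url{https://artificialanalysis.ai}}
    }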

Frequently Asked Questions

Where does WhatLLM.org get its data?

Primarily from Artificial Analysis, supplemented by official model papers and benchmark leaderboards. The Intelligence Index scores are calculated by AA using their published methodology.

What is the Intelligence Index?

A composite score (0-100) developed by Artificial Analysis that synthesizes multiple benchmark results (GPQA Diamond, AIME 2025, LiveCodeBench, MMLU-Pro, etc.) into a single quality metric.

What value does WhatLLM.org add?

Interactive visualization, advanced filtering, side-by-side comparisons, in-depth blog analysis, and a simplified interface for quickly identifying the best LLM for your use case.

How often is the data updated?

Weekly sync with AA data, same-day updates for major model releases. Follow @demian_ai for announcements.
