📊 Methodology & Data Sources

How WhatLLM.org Compares LLMs

Transparent documentation of our data sources, the Artificial Analysis Intelligence Index, and how WhatLLM.org helps you find the right LLM for your needs.

Primary Data Source: Artificial Analysis

WhatLLM.org's benchmark data and Intelligence Index scores come from Artificial Analysis (artificialanalysis.ai), an independent research organization that rigorously benchmarks LLMs across quality, speed, and price.

We sync with AA's data weekly and recommend visiting their site for the complete methodology documentation and their latest research.
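
For the technically curious, the sketch below shows what a minimal weekly sync job could look like. The endpoint URL, response shape, and field names are illustrative assumptions, not Artificial Analysis's actual API; consult their site for real integration details.

    # Minimal sketch of a weekly sync job. The endpoint URL and response
    # shape are assumptions for illustration only; they are NOT Artificial
    # Analysis's actual API.
    import json
    import urllib.request
    from datetime import datetime, timezone
    from pathlib import Path

    AA_ENDPOINT = "https://artificialanalysis.ai/api/models"  # hypothetical URL

    def sync_benchmark_data(cache_path: str = "aa_models.json") -> dict:
        """Fetch the latest model data and cache it locally with a timestamp."""
        with urllib.request.urlopen(AA_ENDPOINT, timeout=30) as resp:
            payload = json.load(resp)  # assume the endpoint returns a JSON list
        snapshot = {
            "fetched_at": datetime.now(timezone.utc).isoformat(),
            "models": payload,
        }
        Path(cache_path).write_text(json.dumps(snapshot, indent=2))
        return snapshot

    if __name__ == "__main__":
        data = sync_benchmark_data()
        print(f"Cached {len(data['models'])} models at {data['fetched_at']}")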

The Artificial Analysis Intelligence Index

The Intelligence Index displayed on WhatLLM.org is calculated by Artificial Analysis using their published methodology. It's a composite score (0-100) that synthesizes multiple benchmark results to provide a single, comparable quality metric for LLMs.

âš ī¸ Attribution: WhatLLM.org does not calculate or create the Intelligence Index. We display Artificial Analysis's scores to help users compare models. For the complete mathematical methodology, visit artificialanalysis.ai.

How the Intelligence Index Works (Summary)

According to Artificial Analysis's methodology documentation, the Intelligence Index employs weighted sums, Choquet integrals, and probabilistic ranking techniques. The framework measures AI capabilities across multiple dimensions:

  • Knowledge acquisition — Input and comprehension capabilities
  • Knowledge mastery — Storage and retrieval of information
  • Knowledge innovation — Generation of novel outputs
  • Knowledge feedback — Output quality and relevance

The mathematical formulation normalizes scores across benchmarks to enable fair cross-model comparison, accounting for criterion dependencies and weight uncertainty through stochastic modeling.
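
To give a rough intuition for the "weighted sum with normalization" part of that description, here is a deliberately simplified Python sketch. All scores, bounds, and weights below are invented, and the sketch omits the Choquet-integral and stochastic-weighting machinery that AA's documentation describes; treat it as an intuition aid, not their formula.

    # Simplified composite index: min-max normalize each benchmark score,
    # then take a weighted sum rescaled to 0-100. All numbers are invented.
    def normalize(score: float, lo: float, hi: float) -> float:
        """Map a raw benchmark score onto [0, 1] for cross-benchmark comparison."""
        return (score - lo) / (hi - lo)

    def composite_index(raw: dict[str, float],
                        bounds: dict[str, tuple[float, float]],
                        weights: dict[str, float]) -> float:
        """Weighted sum of normalized scores, rescaled to a 0-100 index."""
        total = sum(weights.values())
        return 100 * sum(
            weights[b] * normalize(raw[b], *bounds[b]) for b in raw
        ) / total

    raw = {"GPQA Diamond": 0.62, "AIME 2025": 0.55, "LiveCodeBench": 0.48}
    bounds = {b: (0.0, 1.0) for b in raw}   # assume scores are fractions of 1
    weights = {b: 1.0 for b in raw}         # equal weights, purely illustrative
    print(f"Composite: {composite_index(raw, bounds, weights):.1f} / 100")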

Benchmarks Included in the Intelligence Index

Artificial Analysis incorporates the following public benchmarks into their Intelligence Index calculation:

  • GPQA Diamond (Reasoning) — PhD-level science questions requiring multi-step reasoning. Tests deep domain expertise in physics, chemistry, and biology.
  • AIME 2025 (Mathematics) — American Invitational Mathematics Examination problems. Tests advanced mathematical problem-solving.
  • LiveCodeBench (Coding) — Real-world coding challenges with execution-based evaluation (see the scoring sketch after this list). Tests practical programming ability.
  • SWE-Bench Verified (Software Engineering) — Real GitHub issues requiring code changes. Tests the ability to understand codebases and implement fixes.
  • MMLU-Pro (Knowledge) — Massive multitask language understanding with harder, more reasoning-focused questions than the original MMLU. Tests broad knowledge across a wide range of academic disciplines.
  • Humanity's Last Exam (Frontier Reasoning) — Expert-crafted questions designed to push the limits of AI reasoning.
  • τ²-Bench Telecom (Agentic) — Complex multi-step agent tasks in a telecom setting. Tests tool use and planning capabilities.

What WhatLLM.org Adds

While Artificial Analysis provides the underlying data and Intelligence Index, WhatLLM.org offers additional value through:

  • Interactive visualization — Scatter plots and filterable tables for exploring price vs. quality vs. speed tradeoffs (a small Pareto-frontier sketch follows this list)
  • Advanced filtering — Filter by provider, license type (open source vs. proprietary), context window, and more
  • Side-by-side comparisons — Compare specific models head-to-head across all metrics
  • In-depth blog analysis — Original articles analyzing model releases, benchmarks, and industry trends
  • Simplified interface — Quick answers to "which LLM should I use for X?"
  • Provider comparison — See how the same model's price/speed varies across different API providers
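
As a concrete example of the price-versus-quality exploration mentioned in the first item above, the sketch below computes a Pareto frontier: the set of models for which no alternative is both cheaper and higher-scoring. Model names, prices, and index values are invented.

    # Hedged sketch of the price-vs-quality tradeoff behind the scatter
    # plots: keep only models that no other model dominates (lower or equal
    # price AND higher or equal index). All values below are invented.
    def pareto_frontier(models: list[dict]) -> list[dict]:
        """Return models not dominated on (lower price, higher index)."""
        frontier = []
        for m in models:
            dominated = any(
                o["price"] <= m["price"] and o["index"] >= m["index"] and o != m
                for o in models
            )
            if not dominated:
                frontier.append(m)
        return frontier

    models = [  # invented price ($/1M tokens) and Intelligence Index values
        {"name": "model-a", "price": 15.0, "index": 70},
        {"name": "model-b", "price": 3.0,  "index": 60},
        {"name": "model-c", "price": 5.0,  "index": 55},  # dominated by model-b
    ]
    for m in pareto_frontier(models):
        print(f'{m["name"]}: index {m["index"]} at ${m["price"]}/1M tokens')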

Additional Data Sources

Beyond Artificial Analysis, WhatLLM.org cross-references data from:

  • Official model papers — Technical reports from OpenAI, Anthropic, Google, Meta, etc.
  • Benchmark leaderboards — LMSYS Chatbot Arena, SWE-Bench, LiveCodeBench
  • Provider documentation — Official pricing pages for verification

Update Frequency

  • Weekly — Routine sync with Artificial Analysis data
  • Same-day — Major model releases (GPT-5, Claude 4, Gemini 3, etc.)
  • Continuous — Blog content and analysis as the industry develops

Limitations & Transparency

We believe in honest representation of our data:

  • WhatLLM.org is primarily a visualization and comparison layer on top of Artificial Analysis data
  • We do not conduct our own benchmark evaluations
  • Our Intelligence Index scores are identical to those published by Artificial Analysis
  • For methodology questions, refer to artificialanalysis.ai

How to Cite

When citing WhatLLM.org:

Bristot, D. (2025). WhatLLM.org. https://whatllm.org

When citing the Intelligence Index methodology:

Artificial Analysis. (2025). Artificial Analysis Intelligence Index. https://artificialanalysis.ai

For academic work using Intelligence Index scores, we recommend citing Artificial Analysis as the primary source for the methodology and data.
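
For convenience, here is the same pair of citations as a BibTeX sketch; the entry types, keys, and field layout are our own choices and mirror only the information above.

    % BibTeX sketch of the citations above; keys and entry types are our own.
    @misc{whatllm2025,
      author       = {Bristot, D.},
      title        = {WhatLLM.org},
      year         = {2025},
      howpublished = {\url{https://whatllm.org}}
    }

    @misc{artificialanalysis2025,
      author       = {{Artificial Analysis}},
      title        = {Artificial Analysis Intelligence Index},
      year         = {2025},
      howpublished = {\url{https://artificialanalysis.ai}}
    }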

Frequently Asked Questions

Where does WhatLLM.org get its data?

Primarily from Artificial Analysis, supplemented by official model papers and benchmark leaderboards. The Intelligence Index scores are calculated by AA using their published methodology.

What is the Intelligence Index?

A composite score (0-100) developed by Artificial Analysis that synthesizes multiple benchmark results (GPQA Diamond, AIME 2025, LiveCodeBench, MMLU-Pro, etc.) into a single quality metric.

What value does WhatLLM.org add?

Interactive visualization, advanced filtering, side-by-side comparisons, in-depth blog analysis, and a simplified interface for quickly identifying the best LLM for your use case.

How often is the data updated?

Weekly sync with AA data, same-day updates for major model releases. Follow @demian_ai for announcements.
