Artificial Analysis Intelligence Index
Artificial Analysis Intelligence Index: Overview
The Artificial Analysis Intelligence Index is a meta-benchmark that contains no questions of its own. Instead, it aggregates the results of ten independent benchmarks: GDPval-AA (220 tasks), tau2-Bench Telecom (114 tasks), Terminal-Bench Hard (44 tasks), SciCode (338 subproblems), AA-LCR (100 questions), AA-Omniscience (6,000 questions), IFBench (294 tasks), Humanity's Last Exam (2,158 questions), GPQA Diamond (198 questions), and CritPt (70 tasks). The index groups the benchmarks into four categories, each weighted at 25%: Agents, Coding, General, and Scientific Reasoning. This broad coverage is intended to reflect the generalization abilities of LLMs and to place each model's overall capability on the path toward AGI. The tests are purely text-based and in English. All sub-benchmarks use pass@1 scoring, that is, the probability that the first output is correct.
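The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text gives the four category weights (25% each) but not which benchmark belongs to which category or how benchmarks are combined inside a category, so the sketch assumes a plain mean within each category. The function names, the grouping, and all scores are hypothetical.

```python
from statistics import mean


def pass_at_1(first_attempt_correct: list[bool]) -> float:
    """Estimate pass@1 as the share of tasks whose first output was correct."""
    return sum(first_attempt_correct) / len(first_attempt_correct)


def intelligence_index(category_scores: dict[str, list[float]]) -> float:
    """Equal-weight composite: mean inside each category, then mean across categories.

    With four categories, the outer mean gives each category a weight of 25%.
    Averaging inside a category is an assumption, not something the source states.
    """
    if not category_scores:
        raise ValueError("need at least one category")
    per_category = [mean(scores) for scores in category_scores.values()]
    return mean(per_category)


# Hypothetical scores in percent; the grouping below is invented for illustration.
example = {
    "Agents": [40.0, 60.0],
    "Coding": [30.0, 50.0],
    "General": [70.0, 50.0, 30.0],
    "Scientific Reasoning": [20.0, 60.0, 40.0],
}

print(pass_at_1([True, False, True, True]))  # 0.75
print(intelligence_index(example))           # 45.0
```

Because the categories are weighted equally, a benchmark with few tasks (such as the 44 tasks of Terminal-Bench Hard) can influence the index as much as one with thousands of questions, depending on how its category is composed.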
AA Intelligence Index Leaderboard
Ranking of all tested models in the Artificial Analysis Intelligence Index benchmark, sorted by score.
Sample tasks from the Artificial Analysis Intelligence Index benchmark
The following sample tasks show typical questions that appear in the Artificial Analysis Intelligence Index benchmark.
[GPQA Diamond] Two quantum states with energies E1 and E2 have a lifetime of 10^-9 sec and 10^-8 sec, respectively. We want to clearly distinguish these two energy levels. Which one of the following options could be their energy difference so that they can be clearly resolved?
(A) 10^-4 eV
(B) 10^-11 eV
(C) 10^-8 eV
(D) 10^-9 eV
Answer: (A) 10^-4 eV
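One way to check option (A) is an order-of-magnitude estimate with the energy-time uncertainty relation, ΔE ≈ ħ/τ. The sketch below is not part of the benchmark. It assumes this convention (some texts use ħ/(2τ), which does not change the outcome here) and treats two levels as resolvable when their separation exceeds the larger of the two natural widths.

```python
HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s


def natural_width_ev(lifetime_s: float) -> float:
    """Order-of-magnitude energy width of a state with the given lifetime."""
    return HBAR_EV_S / lifetime_s


lifetimes_s = [1e-9, 1e-8]
threshold_ev = max(natural_width_ev(t) for t in lifetimes_s)  # about 6.6e-7 eV

options_ev = {"A": 1e-4, "B": 1e-11, "C": 1e-8, "D": 1e-9}
resolvable = [label for label, gap in options_ev.items() if gap > threshold_ev]

print(f"largest width: {threshold_ev:.2e} eV")
print("resolvable options:", resolvable)
```

Only the 10^-4 eV gap is larger than the roughly 10^-7 eV width of the shorter-lived state, which matches the listed answer.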
[GPQA Diamond] In a given population, 1 out of every 400 people has a cancer caused by a completely recessive allele, b. Assuming the population is in Hardy-Weinberg equilibrium, which of the following is the expected proportion of individuals who carry the b allele but are not expected to develop the cancer?
(A) 1/400
(B) 19/400
(C) 20/400
(D) 38/400
Answer: (D) 38/400
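The arithmetic behind option (D) follows from the Hardy-Weinberg relations q² = 1/400, p = 1 − q, and a carrier frequency of 2pq. The short check below is an illustrative sketch, not part of the benchmark. The helper name is hypothetical, and it uses exact fractions, so it only handles cases where q² is a perfect-square fraction.

```python
from fractions import Fraction
from math import isqrt


def carrier_frequency(affected: Fraction) -> Fraction:
    """Return 2pq given the frequency q^2 of homozygous recessive individuals."""
    q = Fraction(isqrt(affected.numerator), isqrt(affected.denominator))
    if q * q != affected:
        raise ValueError("q^2 is not a perfect-square fraction; use floats instead")
    p = 1 - q
    return 2 * p * q


carriers = carrier_frequency(Fraction(1, 400))
print(carriers)  # 19/200, which equals 38/400
```

Here q = 1/20 and p = 19/20, so 2pq = 2 · (19/20) · (1/20) = 38/400.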
[Humanity's Last Exam] Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone? Answer with a number.
Answer: 4
[GPQA Diamond] Consider a quantum mechanical system containing a particle of mass m moving in an isotropic three dimensional potential of the form V(r) = 1/2 m omega^2 r^2 corresponding to the acted force obeying Hooke's law. Here, omega is the angular frequency of oscillation and r is the radial distance of the particle from the origin in spherical polar coordinate. What is the value of energy of the third excited state, and how many linearly independent eigenfunctions are possible for the same energy eigenvalue?
(A) 11 pi^2 hbar^2/(2mr^2), 3
(B) (9/2) hbar*omega, 10
(C) 11 pi^2 hbar^2/(2mr^2), 10
(D) (9/2) hbar*omega, 3
Answer: (B) (9/2) hbar*omega, 10
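Option (B) can be checked against the standard result for the isotropic three-dimensional harmonic oscillator: E_N = (N + 3/2) ħω with N = n_x + n_y + n_z, where the third excited state corresponds to N = 3. The sketch below is not part of the benchmark. It counts the degeneracy by brute force in the Cartesian basis, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def oscillator_level(n_total: int) -> tuple[Fraction, int]:
    """Energy (in units of hbar*omega) and degeneracy of level N = nx + ny + nz."""
    energy = Fraction(2 * n_total + 3, 2)
    # Each (nx, ny, nz) with nx + ny + nz = N is one independent eigenfunction.
    states = [
        quantum_numbers
        for quantum_numbers in product(range(n_total + 1), repeat=3)
        if sum(quantum_numbers) == n_total
    ]
    return energy, len(states)


energy, degeneracy = oscillator_level(3)
print(energy, degeneracy)  # 9/2 10
```

The count agrees with the closed form (N + 1)(N + 2)/2, which gives 10 for N = 3.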
[GPQA Diamond] A Fe pellet of 0.056g is first dissolved in 10mL of hydrobromic acid HBr (0.1M). The resulting solution is then titrated by KMnO4 (0.02M). How many equivalence points are there?
(A) Two points, 25ml and 35ml
(B) One point, 25mL
(C) One point, 10ml
(D) Two points, 25ml and 30ml
Answer: (A) Two points, 25ml and 35ml