Aider Polyglot
Aider Polyglot: Overview
Aider Polyglot is a code-editing benchmark that evaluates the programming abilities of LLMs on 225 challenging programming exercises in six languages (C++, Go, Java, JavaScript, Python, Rust). Each LLM gets two attempts per exercise: if the first attempt fails, the unit test results are provided as feedback for a second, corrective attempt. Aider Polyglot was developed to overcome the saturation of Aider's original Python-only benchmark and to differentiate more clearly between the strongest coding models. As of 2026, Aider Polyglot is itself considered saturated and is no longer actively updated.
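The two-attempt protocol described above can be sketched as a small loop. This is only an illustration of the idea, not Aider's actual harness: the names `attempt_task`, `solve`, and `run_tests` and their signatures are hypothetical.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures for illustration; they are not part of Aider.
Solver = Callable[[str, Optional[str]], str]    # (task, test feedback) -> candidate code
TestRunner = Callable[[str], Tuple[bool, str]]  # candidate code -> (passed, test output)


def attempt_task(task: str, solve: Solver, run_tests: TestRunner,
                 max_attempts: int = 2) -> int:
    """Return the 1-based attempt that passed the tests, or 0 if none did."""
    feedback = None  # the first attempt sees no test output
    for attempt in range(1, max_attempts + 1):
        candidate = solve(task, feedback)
        passed, output = run_tests(candidate)
        if passed:
            return attempt
        feedback = output  # failed test output feeds the next attempt
    return 0
```

A benchmark score under this scheme would be the share of tasks for which the function returns a non-zero value.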
Aider Polyglot Leaderboard
Ranking of all tested models on the Aider Polyglot benchmark, sorted by score.
Example tasks from the Aider Polyglot benchmark
The following example tasks illustrate the kinds of problems that appear in the Aider Polyglot benchmark.
Zebra Puzzle (Go/Python/JavaScript): Solve the Zebra Puzzle. There are five houses in a row, each with a different color, inhabited by people of different nationalities, with different pets, drinks and cigarettes. Given 15 constraints, determine: Which resident drinks water? Who owns the zebra?
Implement a constraint satisfaction solver that efficiently narrows down the roughly 24.9 billion possible combinations ((5!)^5 = 24,883,200,000) using the 15 given logical constraints to determine that the Norwegian drinks water and the Japanese owns the zebra.
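One way to approach this is a pruned permutation search rather than a general-purpose constraint solver: assign one attribute at a time and discard partial assignments as soon as a constraint fails. The sketch below assumes the classic wording of the 15 constraints (Englishman in the red house, Kools in the yellow house, and so on), which the task text above does not list; the function name is hypothetical.

```python
from itertools import permutations


def solve_zebra_puzzle():
    """Return (water_drinker, zebra_owner) via pruned permutation search."""
    first, middle = 0, 2                      # house positions, left to right
    orderings = list(permutations(range(5)))  # 5! = 120 assignments per attribute

    def right_of(a, b):
        return a - b == 1

    def next_to(a, b):
        return abs(a - b) == 1

    # Constraints follow the classic puzzle wording (an assumption here).
    for red, green, ivory, yellow, blue in orderings:
        if not right_of(green, ivory):
            continue
        for english, spaniard, ukrainian, norwegian, japanese in orderings:
            if english != red or norwegian != first or not next_to(norwegian, blue):
                continue
            for coffee, tea, milk, orange_juice, water in orderings:
                if coffee != green or ukrainian != tea or milk != middle:
                    continue
                for old_gold, kools, chesterfield, lucky_strike, parliament in orderings:
                    if (kools != yellow or lucky_strike != orange_juice
                            or japanese != parliament):
                        continue
                    for dog, snails, fox, horse, zebra in orderings:
                        if spaniard != dog or old_gold != snails:
                            continue
                        if not next_to(chesterfield, fox) or not next_to(kools, horse):
                            continue
                        nationality = {
                            english: "Englishman", spaniard: "Spaniard",
                            ukrainian: "Ukrainian", norwegian: "Norwegian",
                            japanese: "Japanese",
                        }
                        return nationality[water], nationality[zebra]
    raise ValueError("no assignment satisfies the constraints")
```

Checking each constraint at the outermost loop where all of its variables are bound keeps the search to a few thousand iterations instead of billions.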
Bowling (Python/JavaScript/Java): Score a bowling game. Implement two methods: roll(pins) to record each throw and score() to calculate the final game total. Handle open frames, spares (10 + next throw), strikes (10 + next two throws), and the special 10th frame fill balls. Raise exceptions for invalid inputs.
Implement a BowlingGame class that tracks rolls frame by frame, correctly applies spare and strike bonus scoring rules, handles the 10th frame's fill ball logic, and validates all inputs (pin counts, game state).
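A minimal sketch of such a class follows. It validates each roll against the pins still standing and computes the total only once the game is complete. Raising `ValueError` for every invalid case is an assumption; the benchmark's tests may expect other exception types.

```python
class BowlingGame:
    def __init__(self):
        self.rolls = []        # every roll of the game, in order
        self.frame = 1         # current frame number, 1..10
        self.frame_rolls = []  # rolls made so far in the current frame

    def _pins_standing(self):
        standing = 10
        for pins in self.frame_rolls:
            standing -= pins
            if standing == 0:  # strike or spare: the rack is reset
                standing = 10
        return standing

    def _is_over(self):
        if self.frame < 10:
            return False
        tenth = self.frame_rolls
        if len(tenth) < 2:
            return False
        if len(tenth) == 2:
            return sum(tenth) < 10  # an open 10th frame earns no fill ball
        return True

    def roll(self, pins):
        if self._is_over():
            raise ValueError("cannot roll after the game is over")
        if not 0 <= pins <= self._pins_standing():
            raise ValueError("invalid pin count")
        self.rolls.append(pins)
        self.frame_rolls.append(pins)
        # Frames 1-9 end on a strike or after two rolls; frame 10 keeps its fill balls.
        if self.frame < 10 and (pins == 10 or len(self.frame_rolls) == 2):
            self.frame += 1
            self.frame_rolls = []

    def score(self):
        if not self._is_over():
            raise ValueError("cannot score an unfinished game")
        total, i = 0, 0
        for _ in range(10):
            if self.rolls[i] == 10:                         # strike
                total += 10 + self.rolls[i + 1] + self.rolls[i + 2]
                i += 1
            elif self.rolls[i] + self.rolls[i + 1] == 10:   # spare
                total += 10 + self.rolls[i + 2]
                i += 2
            else:                                           # open frame
                total += self.rolls[i] + self.rolls[i + 1]
                i += 2
        return total
```

Separating validation (in `roll`) from scoring (in `score`) keeps the bonus arithmetic free of state checks, since a complete game guarantees the look-ahead rolls exist.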
React (Rust/JavaScript/Python): Implement a basic reactive system with cells with settable values ('input' cells) and cells with values computed in terms of other cells ('compute' cells). Compute cells should allow for registering change notification callbacks. Call a cell's callbacks when the cell's value in a new stable state has changed from the previous stable state.
Implement a reactive cell system supporting input cells, compute cells with dependency tracking, automatic value propagation on input changes, and callback registration that only fires when values actually change between stable states.
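The propagation scheme can be sketched as follows. The sketch assumes that a compute function receives a list of its input values and that compute cells are always created after their inputs, so creation order can serve as a topological order; both are assumptions, as is every name below.

```python
import itertools

_creation_order = itertools.count()  # compute cells are numbered as they are created


def _propagate(source):
    """Recompute every cell downstream of `source`, then fire callbacks once."""
    affected, stack = {}, list(source.dependents)
    while stack:
        cell = stack.pop()
        if cell.order not in affected:
            affected[cell.order] = cell
            stack.extend(cell.dependents)
    ordered = [affected[key] for key in sorted(affected)]  # inputs before dependents
    for cell in ordered:
        cell.value = cell.compute()
    # Callbacks run only after the whole graph has settled into a stable state.
    for cell in ordered:
        if cell.value != cell.last_stable_value:
            cell.last_stable_value = cell.value
            for callback in list(cell.callbacks):
                callback(cell.value)


class InputCell:
    def __init__(self, initial_value):
        self._value = initial_value
        self.dependents = []

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        if new_value != self._value:
            self._value = new_value
            _propagate(self)


class ComputeCell:
    def __init__(self, inputs, compute_function):
        self.inputs = list(inputs)
        self.compute_function = compute_function
        self.dependents = []
        self.callbacks = []
        self.order = next(_creation_order)
        for cell in self.inputs:
            cell.dependents.append(self)
        self.value = self.compute()
        self.last_stable_value = self.value

    def compute(self):
        return self.compute_function([cell.value for cell in self.inputs])

    def add_callback(self, callback):
        self.callbacks.append(callback)

    def remove_callback(self, callback):
        if callback in self.callbacks:
            self.callbacks.remove(callback)
```

Comparing against the last stable value, rather than the value before each individual recomputation, is what prevents a callback from firing when intermediate changes cancel out.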
Forth (JavaScript/Rust/Python): Implement an evaluator for a very simple subset of Forth. Support arithmetic operations (+, -, *, /), stack manipulation (DUP, DROP, SWAP, OVER), and user-defined words using ': word-name definition ;' syntax. Words are case-insensitive.
Implement a stack-based Forth interpreter that parses and evaluates arithmetic and stack manipulation commands, supports custom word definitions with proper scoping, handles case-insensitive word lookup, and raises appropriate errors for stack underflow and undefined operations.
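A compact sketch of such an evaluator is shown below. It makes several simplifying assumptions that are not part of the task text: input arrives as a list of lines, each word definition occupies its own line, division floors toward negative infinity, and the exception types are chosen for illustration. User-defined words are expanded into built-in operations at definition time, so a later redefinition does not change words that were defined earlier.

```python
class StackUnderflowError(Exception):
    pass


BUILTINS = {"+", "-", "*", "/", "dup", "drop", "swap", "over"}


def _is_number(token):
    try:
        int(token)
        return True
    except ValueError:
        return False


def _run_primitive(token, stack):
    if _is_number(token):
        stack.append(int(token))
        return
    if token not in BUILTINS:
        raise ValueError(f"undefined operation: {token}")
    needed = 1 if token in ("dup", "drop") else 2
    if len(stack) < needed:
        raise StackUnderflowError(f"{token} needs {needed} value(s) on the stack")
    if token == "dup":
        stack.append(stack[-1])
    elif token == "drop":
        stack.pop()
    elif token == "swap":
        stack[-1], stack[-2] = stack[-2], stack[-1]
    elif token == "over":
        stack.append(stack[-2])
    else:
        b, a = stack.pop(), stack.pop()
        if token == "+":
            stack.append(a + b)
        elif token == "-":
            stack.append(a - b)
        elif token == "*":
            stack.append(a * b)
        else:
            if b == 0:
                raise ZeroDivisionError("divide by zero")
            stack.append(a // b)  # floor division (an assumption)


def evaluate(lines):
    stack, words = [], {}  # words: name -> list of built-in tokens and numbers
    for line in lines:
        tokens = line.lower().split()  # lower-casing makes lookup case-insensitive
        if tokens and tokens[0] == ":":
            if len(tokens) < 3 or tokens[-1] != ";":
                raise ValueError("malformed definition")
            name, body = tokens[1], tokens[2:-1]
            if _is_number(name):
                raise ValueError("cannot redefine numbers")
            expanded = []
            for token in body:
                expanded.extend(words.get(token, [token]))
            words[name] = expanded
            continue
        for token in tokens:
            for primitive in words.get(token, [token]):
                _run_primitive(primitive, stack)
    return stack
```

For example, `evaluate(["1 2 + 4 *"])` leaves `[12]` on the stack.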
Connect (Python/JavaScript): Compute the result for a game of Hex / Polygon. Two players place stones on a parallelogram with hexagonal fields. Player O wins by connecting top to bottom, Player X wins by connecting left to right. Determine the winner from a given board representation using '.', 'O', and 'X' characters.
Implement a board parser and graph traversal algorithm (e.g., BFS/DFS) that checks whether Player O has an unbroken chain of stones from the top row to the bottom row, or Player X has a chain from the left column to the right column, accounting for hexagonal adjacency.
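The traversal can be sketched with an iterative depth-first search. The sketch assumes the board arrives as a multi-line string in which each row is indented one space further than the row above it and cells are separated by spaces; the function name `winner` and the empty-string result for "no winner" are also assumptions.

```python
def winner(board_text):
    """Return "O", "X", or "" (no winner) for a Hex board given as text."""
    # Dropping all whitespace turns the slanted board into a plain grid of rows.
    rows = ["".join(line.split()) for line in board_text.splitlines() if line.strip()]
    if not rows:
        return ""
    height, width = len(rows), len(rows[0])
    # On the slanted grid each cell touches six others: left, right, the two
    # cells above (same column and one to the right) and the two cells below
    # (same column and one to the left).
    neighbours = [(0, -1), (0, 1), (-1, 0), (-1, 1), (1, 0), (1, -1)]

    def connects(stone, starts, reached_goal):
        stack = [(r, c) for r, c in starts if rows[r][c] == stone]
        seen = set(stack)
        while stack:
            r, c = stack.pop()
            if reached_goal(r, c):
                return True
            for dr, dc in neighbours:
                nr, nc = r + dr, c + dc
                if (0 <= nr < height and 0 <= nc < width
                        and (nr, nc) not in seen and rows[nr][nc] == stone):
                    seen.add((nr, nc))
                    stack.append((nr, nc))
        return False

    top_row = [(0, c) for c in range(width)]
    left_column = [(r, 0) for r in range(height)]
    if connects("O", top_row, lambda r, c: r == height - 1):
        return "O"
    if connects("X", left_column, lambda r, c: c == width - 1):
        return "X"
    return ""
```

Starting the search from every stone on a player's starting edge, with one shared `seen` set, visits each cell at most once per player.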