SWE-bench Verified
SWE-bench Verified: Overview
SWE-bench Verified is a curated benchmark of 500 tasks that were validated by human software engineers. It tests how well AI models can resolve real GitHub issues in popular open-source Python repositories. Each task consists of an issue description and the corresponding code repository. The model must produce a patch that passes all associated unit tests. SWE-bench Verified was developed in a collaboration between Princeton NLP and OpenAI.
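As we understand the SWE-bench evaluation harness, "passes the associated unit tests" has two parts: the tests that reproduce the issue must now pass, and the tests that already passed must keep passing. The dataset records these as FAIL_TO_PASS and PASS_TO_PASS lists. The following is a minimal sketch of that resolution check, assuming test outcomes are available as a name-to-status dict; `is_resolved` and the test names are hypothetical and not part of the official harness.

```python
def is_resolved(test_results, fail_to_pass, pass_to_pass):
    """Return True if a candidate patch resolves the task.

    test_results: dict mapping test name -> "PASSED" / "FAILED" after applying the patch.
    fail_to_pass: tests that failed before the fix and must pass afterwards.
    pass_to_pass: tests that passed before and must keep passing (no regressions).
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    # A test missing from the results (crashed, never collected) counts as a failure
    return all(test_results.get(name) == "PASSED" for name in required)


results = {"test_issue_reproduction": "PASSED", "test_existing_behaviour": "PASSED"}
print(is_resolved(results, ["test_issue_reproduction"], ["test_existing_behaviour"]))
```

Treating a missing result as a failure is a deliberately conservative choice: a patch that breaks test collection should not count as a fix.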
SWE-bench Verified Leaderboard
Ranking of all tested models on the SWE-bench Verified benchmark, sorted by score.
Example tasks from the SWE-bench Verified benchmark
The following example tasks illustrate typical problems in the SWE-bench Verified benchmark.
django__django-15572: Django 3.2.4+ autoreload breaks on empty string in TEMPLATES DIRS. Django versions after 3.2.3 change the way template dirs are handled: they are now normalized using `pathlib.Path`. Anyone with an invalid value in TEMPLATES DIRS will notice that autoreload stops working. For example, `"DIRS": os.getenv("TEMPLATES_DIRS", "").split(",")` produces `"DIRS": ['']`, which breaks autoreload.
A patch to django/template/autoreload.py that filters out empty strings from TEMPLATES DIRS before normalization, so that autoreload no longer treats an empty string as the project root directory.
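The root cause can be reproduced with the standard library alone: `"".split(",")` yields `['']`, and joining an empty path onto a base directory yields the base directory itself, so an empty entry silently becomes the project root. The sketch below filters empty entries before normalizing. `template_directories` is a hypothetical helper written for illustration, not Django's API.

```python
from pathlib import Path


def template_directories(dirs, base=None):
    """Hypothetical helper: normalize template dirs relative to `base`,
    skipping empty strings so they are not resolved to the base directory."""
    base = Path.cwd() if base is None else Path(base)
    # Path("") behaves like Path("."), so base / Path("") would equal base itself
    return {base / Path(d) for d in dirs if d}


dirs = "".split(",")                          # what an unset TEMPLATES_DIRS produces
print(dirs)                                   # ['']
print(Path.cwd() / Path("") == Path.cwd())    # True: the empty entry would become the cwd
print(template_directories(dirs))             # set(): the empty entry is dropped
```

Dropping falsy entries with `if d` keeps the filter cheap and also covers any other empty values that end up in the list.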
astropy__astropy-12907: Modeling's `separability_matrix` does not compute separability correctly for nested CompoundModels. Consider the following model:
```python
from astropy.modeling import models as m
from astropy.modeling.separable import separability_matrix

cm = m.Linear1D(10) & m.Linear1D(5)
# Nest the compound model cm inside another & expression
print(separability_matrix(m.Pix2Sky_TAN() & cm))
```
This returns incorrect results: the inputs and outputs of the two Linear1D models are reported as no longer separable, although they should be.
A patch to astropy/modeling/separable.py that correctly computes the separability matrix for nested CompoundModels by properly handling the recursive structure of compound model trees.
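To see what the expected result looks like, here is a simplified pure-Python model of how `&` combines separability matrices (one row per output, one column per input; True means the output depends on that input). This is not astropy's implementation: `stack_separability` and the list-of-lists representation are assumptions for illustration. The comment marks one plausible way a nested model's block could be corrupted.

```python
def stack_separability(left, right):
    """Combine the separability matrices of two models joined with '&'.

    '&' places the models side by side, so the result is block diagonal:
    left's matrix in the top-left corner, right's matrix in the bottom-right.
    """
    n_in_left = len(left[0])
    n_in = n_in_left + len(right[0])
    result = [[False] * n_in for _ in range(len(left) + len(right))]
    for i, row in enumerate(left):
        result[i][:n_in_left] = row
    for i, row in enumerate(right):
        # Copy the nested model's own matrix; filling this block with True
        # would wrongly couple the outputs of a nested compound model.
        result[len(left) + i][n_in_left:] = row
    return result


tan = [[True, True], [True, True]]   # Pix2Sky_TAN: both outputs depend on both inputs
cm = [[True, False], [False, True]]  # Linear1D(10) & Linear1D(5): diagonal
for row in stack_separability(tan, cm):
    print(row)
```

The bottom-right 2x2 block stays diagonal, which is what the nested `Pix2Sky_TAN() & cm` case should report.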
astropy__astropy-13033: TimeSeries: misleading exception when required column check fails. For a TimeSeries object that has additional required columns (in addition to `time`), when code mistakenly tries to remove a required column, the resulting exception is misleading: `ValueError: TimeSeries object is invalid - expected 'time' as the first columns but found 'time'`
A patch that fixes the error message logic in astropy's TimeSeries validation to correctly report which required columns are missing, rather than producing a confusing message that compares 'time' to 'time'.
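The message is confusing because only the first column is compared, so removing a later required column (such as a hypothetical `flux`) still reports `time` against `time`. A minimal sketch of a validator that names the missing columns first follows. `check_required_columns` and its message wording are assumptions for illustration, not astropy's code or its actual error text.

```python
def check_required_columns(colnames, required):
    """Hypothetical validator for a table whose required columns must come first.

    Reports missing required columns by name instead of only comparing the first column.
    """
    colnames = list(colnames)
    missing = [name for name in required if name not in colnames]
    if missing:
        raise ValueError(f"TimeSeries object is invalid - missing required columns: {missing}")
    leading = colnames[: len(required)]
    if leading != list(required):
        raise ValueError(
            f"TimeSeries object is invalid - expected {list(required)} "
            f"as the first columns but found {leading}"
        )


try:
    # 'flux' is required but has been removed; only 'time' remains
    check_required_columns(["time"], ["time", "flux"])
except ValueError as exc:
    print(exc)
```

Checking for missing columns before checking their order means the error describes the actual mistake rather than a symptom of it.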
django__django-15732: Cannot drop unique_together constraint on a single field with its own unique=True constraint. When a model has both a unique=True constraint on a field and a unique_together constraint on the same field, trying to drop the unique_together constraint fails with `ValueError: Found wrong number (2) of constraints`.
A patch to Django's migration framework that correctly distinguishes between unique=True constraints and unique_together constraints on the same column, allowing either to be dropped independently.
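The error arises because a lookup for "the unique constraint on these columns" finds two candidates and expects exactly one. The sketch below models that lookup over introspected constraints and disambiguates by preferring the name the unique_together constraint would have been created with. The function, the dict layout, and the constraint names are hypothetical, and this rule is an assumption for illustration; Django's actual patch may differ.

```python
def find_constraint_to_drop(constraints, columns, default_name):
    """Hypothetical lookup of the unique constraint a unique_together removal should drop.

    constraints: dict of constraint name -> {"unique": bool, "columns": [column names]},
    as a database introspection step might return it.
    default_name: the name the unique_together constraint would have been created with.
    """
    matches = [
        name for name, info in constraints.items()
        if info["unique"] and info["columns"] == list(columns)
    ]
    # A unique=True field (or primary key) already has its own unique constraint,
    # so a naive lookup finds two; prefer the one named like a unique_together index.
    if len(matches) > 1 and default_name in matches:
        matches = [default_name]
    if len(matches) != 1:
        raise ValueError(f"Found wrong number ({len(matches)}) of constraints")
    return matches[0]


introspected = {
    "foo_pkey": {"unique": True, "columns": ["id"]},
    "foo_id_uniq": {"unique": True, "columns": ["id"]},
}
print(find_constraint_to_drop(introspected, ["id"], "foo_id_uniq"))
```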
django__django-15629: Errors with db_collation – no propagation to foreignkeys. Using db_collation on a primary key that is referenced by foreign keys in other models causes foreign key constraint errors in MySQL. When a primary key field's db_collation is altered, the new collation should also be applied to the foreign key columns that reference that primary key.
A patch to Django's schema editor that propagates db_collation changes to all foreign key columns referencing the modified primary key, ensuring collation consistency across related tables.
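The propagation step amounts to emitting the same column type and collation for the primary key and for every column that references it. A minimal sketch that only builds the MySQL `ALTER TABLE ... MODIFY` statements is shown below. `collation_alter_statements` and the table and column names are hypothetical. A real migration may also need to drop and re-create the foreign-key constraints and preserve NULL/NOT NULL, which this sketch omits.

```python
def collation_alter_statements(table, column, column_type, collation, referencing_columns):
    """Hypothetical helper: build MySQL statements that apply one collation to a
    primary-key column and to every foreign-key column that references it."""
    def modify(tbl, col):
        return f"ALTER TABLE `{tbl}` MODIFY `{col}` {column_type} COLLATE `{collation}`;"

    statements = [modify(table, column)]
    # Propagate the same type and collation to each referencing FK column
    statements += [modify(ref_table, ref_col) for ref_table, ref_col in referencing_columns]
    return statements


for sql in collation_alter_statements(
    "account", "id", "varchar(22)", "utf8_bin",
    [("address", "account_id"), ("profile", "account_id")],
):
    print(sql)
```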