SWE-bench Verified

Release: August 2024
Score range: 0–100 %
Models tested: 44
Categories: Programming, Agentic tasks
Level: Expert

SWE-bench Verified: Overview

SWE-bench Verified is a curated benchmark of 500 tasks that were validated by human software developers. It tests the ability of AI models to resolve real GitHub issues in popular open-source Python repositories. Each task consists of an issue description and the corresponding code repository. The LLM has to produce a patch that passes all of the task's associated unit tests. The SWE-bench Verified benchmark was developed in a collaboration between Princeton NLP and OpenAI.
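The pass/fail rule and the 0–100 % score can be sketched as a small scoring function. This assumes, as in the public SWE-bench task metadata, that each task lists FAIL_TO_PASS tests (tests the reference fix makes pass) and PASS_TO_PASS tests (tests that must keep passing), and that the score is the percentage of resolved tasks. The `TaskResult` type, the function names and the test names are hypothetical; the real evaluation harness actually runs the tests, whereas here the outcomes are given as booleans.

```python
from dataclasses import dataclass, field


@dataclass
class TaskResult:
    """Hypothetical record of test outcomes after applying a model's patch."""
    instance_id: str
    # Tests that fail without the fix and must pass after the model's patch.
    fail_to_pass: dict[str, bool] = field(default_factory=dict)
    # Tests that already passed and must not be broken by the model's patch.
    pass_to_pass: dict[str, bool] = field(default_factory=dict)


def is_resolved(result: TaskResult) -> bool:
    # All-or-nothing: a single failing test means the task is not resolved.
    return all(result.fail_to_pass.values()) and all(result.pass_to_pass.values())


def score(results: list[TaskResult]) -> float:
    """Percentage of resolved tasks, between 0 and 100."""
    if not results:
        return 0.0
    return 100.0 * sum(is_resolved(r) for r in results) / len(results)


# Hypothetical outcomes for four tasks (test names are made up).
results = [
    TaskResult("django__django-15572", {"test_empty_dir": True}, {"test_reload": True}),
    TaskResult("astropy__astropy-12907", {"test_nested": True}, {"test_simple": False}),
    TaskResult("django__django-15732", {"test_drop_unique": False}, {"test_migrate": True}),
    TaskResult("django__django-15629", {"test_collation_fk": True}, {}),
]
print(score(results))  # 50.0
```

Breaking a previously passing test (the second task) costs as much as not fixing the issue at all (the third task).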

SWE-bench Verified Leaderboard

Ranking of all models tested on the SWE-bench Verified benchmark, sorted by score.



Example tasks from the SWE-bench Verified benchmark

The following example tasks illustrate the kind of problems that appear in the SWE-bench Verified benchmark.

django__django-15572: Django 3.2.4+ autoreload breaks on empty string in TEMPLATES DIRS. Django versions > 3.2.3 change the way template dirs are handled: they are now normalized using `pathlib.Path`. People who have an invalid value in `TEMPLATES` `DIRS` will notice that autoreload stops working. `"DIRS": os.getenv("TEMPLATES_DIRS", "").split(",")` produces `"DIRS": ['']`, which breaks autoreload.

A patch to django/template/autoreload.py that filters out empty strings from TEMPLATES DIRS before normalization, preventing the autoreload from treating empty strings as the project root directory.
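The failure mode can be reproduced without Django. This is a minimal sketch, assuming that template dirs are joined onto the working directory with pathlib and that a changed file lying under any template directory is handled as a template change instead of triggering a reload. The helpers `template_dirs` and `counts_as_template_change` are hypothetical and are not Django's API.

```python
from pathlib import PurePosixPath


def template_dirs(raw_dirs: list[str], cwd: PurePosixPath, skip_empty: bool) -> set[PurePosixPath]:
    """Normalize DIRS entries relative to cwd; optionally drop empty strings first."""
    return {cwd / PurePosixPath(d) for d in raw_dirs if d or not skip_empty}


def counts_as_template_change(changed_file: PurePosixPath, dirs: set[PurePosixPath]) -> bool:
    """Treat a change as a template change if the file lies under any template dir."""
    return any(d in changed_file.parents for d in dirs)


cwd = PurePosixPath("/srv/project")
raw = "".split(",")  # what os.getenv("TEMPLATES_DIRS", "").split(",") yields when unset
changed = cwd / "app" / "views.py"

buggy = template_dirs(raw, cwd, skip_empty=False)
fixed = template_dirs(raw, cwd, skip_empty=True)

print(raw)    # ['']
print(buggy)  # {PurePosixPath('/srv/project')}
print(counts_as_template_change(changed, buggy))  # True: a .py edit is swallowed
print(counts_as_template_change(changed, fixed))  # False: the normal reload can happen
```

`PurePosixPath("")` normalizes to `.`, so joining it onto the working directory yields the project root itself, and every file in the project then looks like a template.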

astropy__astropy-12907: Modeling's `separability_matrix` does not compute separability correctly for nested CompoundModels. Consider the following model, using `from astropy.modeling import models as m` and `from astropy.modeling.separable import separability_matrix`: with `cm = m.Linear1D(10) & m.Linear1D(5)`, the call `separability_matrix(m.Pix2Sky_TAN() & cm)` returns incorrect results, showing inputs and outputs as no longer separable when they should be.

A patch to astropy/modeling/separable.py that correctly computes the separability matrix for nested CompoundModels by properly handling the recursive structure of compound model trees.
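For the `&` operator, the separability matrix of the combined model is block-diagonal: the left model's matrix sits in the top-left corner and the right model's in the bottom-right. The pure-Python sketch below implements that rule and shows one way a nested model can go wrong: filling the right-hand block with `True` instead of copying its already-computed matrix. The function names are hypothetical (not astropy's internals), and treating `Pix2Sky_TAN` as a 2-input, 2-output model whose outputs each depend on both inputs is an assumption made for the example.

```python
Matrix = list[list[bool]]  # rows = outputs, columns = inputs


def stack(left: Matrix, right: Matrix, buggy: bool = False) -> Matrix:
    """Separability matrix of `left & right`: a block-diagonal combination."""
    n_out_l, n_in_l = len(left), len(left[0])
    n_out = n_out_l + len(right)
    n_in = n_in_l + len(right[0])
    result = [[False] * n_in for _ in range(n_out)]
    for i, row in enumerate(left):
        for j, value in enumerate(row):
            result[i][j] = value
    for i, row in enumerate(right):
        for j, value in enumerate(row):
            # The buggy variant ignores the nested matrix and marks the whole block as coupled.
            result[n_out_l + i][n_in_l + j] = True if buggy else value
    return result


linear1d: Matrix = [[True]]                          # 1 input -> 1 output
pix2sky_tan: Matrix = [[True, True], [True, True]]   # both outputs use both inputs (assumed)

cm = stack(linear1d, linear1d)                       # Linear1D(10) & Linear1D(5)
correct = stack(pix2sky_tan, cm)
wrong = stack(pix2sky_tan, cm, buggy=True)

for row in correct:
    print(row)
```

With a single `Linear1D` on the right, the block is 1 by 1 and both variants of this sketch agree, which is why the problem only becomes visible once the right operand is itself a compound model.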

astropy__astropy-13033: TimeSeries: misleading exception when required column check fails. For a TimeSeries object that has additional required columns (in addition to `time`), when code mistakenly tries to remove a required column, the resulting exception is misleading: `ValueError: TimeSeries object is invalid - expected 'time' as the first columns but found 'time'`

A patch that fixes the error message logic in astropy's TimeSeries validation to correctly report which required columns are missing, rather than producing a confusing message that compares 'time' to 'time'.
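A simplified sketch shows where a message like this can come from: the check compares all required leading columns, but the old-style message prints only the first required name and the first actual column. This is an assumption based on the error text, not astropy's source; the function names are hypothetical and the improved wording is illustrative rather than astropy's exact message.

```python
def check_required_columns(colnames: list[str], required: list[str], fixed: bool) -> None:
    """Raise ValueError unless `colnames` starts with all `required` columns."""
    n = len(required)
    if colnames[:n] == required:
        return
    plural = "s" if n > 1 else ""
    if fixed:
        # Report every required column and what was actually found in those positions.
        raise ValueError(
            f"TimeSeries object is invalid - expected {required} "
            f"as the first column{plural} but found {colnames[:n]}"
        )
    # Old-style message: only the first names are shown, so 'time' vs 'time' can appear.
    first = colnames[0] if colnames else ""
    raise ValueError(
        f"TimeSeries object is invalid - expected '{required[0]}' "
        f"as the first column{plural} but found '{first}'"
    )


def error_message(colnames: list[str], required: list[str], fixed: bool) -> str:
    """Return the ValueError text, or an empty string if the check passes."""
    try:
        check_required_columns(colnames, required, fixed)
    except ValueError as exc:
        return str(exc)
    return ""


# The required 'flux' column was removed, leaving only 'time'.
print(error_message(["time"], ["time", "flux"], fixed=False))
print(error_message(["time"], ["time", "flux"], fixed=True))
```

The first call reproduces the confusing "expected 'time' ... but found 'time'" text; the second names both required columns and shows that only `['time']` is present.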

django__django-15732: Cannot drop unique_together constraint on a single field with its own unique=True constraint. When a model has both a unique=True constraint on a field AND a unique_together constraint on the same field, trying to drop the unique_together constraint fails with 'ValueError: Found wrong number (2) of constraints'.

A patch to Django's migration framework that correctly distinguishes between unique=True constraints and unique_together constraints on the same column, allowing either to be dropped independently.
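The error arises when the constraint to drop is looked up by column and exactly one match is expected. This is a minimal sketch, assuming the database reports constraints as (name, columns, unique) records and that the `unique_together` constraint can be recognized by the name it was created with. The `Constraint` type, the naming scheme, the table and column names, and the `constraint_to_drop` helper are all hypothetical and are not Django's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    name: str
    columns: tuple[str, ...]
    unique: bool


def unique_together_name(table: str, columns: tuple[str, ...]) -> str:
    # Hypothetical naming scheme, for illustration only.
    return f"{table}_{'_'.join(columns)}_uniq"


def constraint_to_drop(constraints: list[Constraint], table: str,
                       columns: tuple[str, ...], disambiguate: bool) -> str:
    """Return the name of the unique_together constraint on `columns`."""
    matches = [c.name for c in constraints if c.unique and c.columns == columns]
    if disambiguate and len(matches) > 1:
        # Prefer the constraint that unique_together itself created.
        expected = unique_together_name(table, columns)
        if expected in matches:
            matches = [expected]
    if len(matches) != 1:
        raise ValueError(
            f"Found wrong number ({len(matches)}) of constraints for {table}({', '.join(columns)})"
        )
    return matches[0]


constraints = [
    Constraint("app_item_code_key", ("code",), unique=True),   # from the field's own unique=True
    Constraint("app_item_code_uniq", ("code",), unique=True),  # from unique_together = [("code",)]
]

try:
    constraint_to_drop(constraints, "app_item", ("code",), disambiguate=False)
except ValueError as exc:
    print(exc)  # Found wrong number (2) of constraints for app_item(code)

print(constraint_to_drop(constraints, "app_item", ("code",), disambiguate=True))  # app_item_code_uniq
```

Matching on columns alone cannot tell the two unique constraints apart, so the sketch needs an extra criterion (here the expected name) to drop one and keep the other.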

django__django-15629: Errors with db_collation – no propagation to foreignkeys. Using `db_collation` on a primary key that is referenced by foreign keys in other models causes foreign key constraint errors in MySQL. When a primary key field's `db_collation` is altered, the collation should also be applied to the foreign key columns that reference that primary key.

A patch to Django's schema editor that propagates db_collation changes to all foreign key columns referencing the modified primary key, ensuring collation consistency across related tables.
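Propagating the collation means that altering the primary key column must be accompanied by matching alterations of every column that references it. The minimal sketch below only generates MySQL-style `ALTER TABLE ... MODIFY` statements; the table names, column type, collation and the `collation_statements` helper are hypothetical, and a real schema editor would typically also need to drop and re-create the foreign key constraints around these changes.

```python
def collation_statements(
    table: str,
    column: str,
    column_type: str,
    collation: str,
    referencing: list[tuple[str, str, bool]],
    propagate: bool,
) -> list[str]:
    """Build ALTER statements for a primary key collation change.

    `referencing` lists (table, column, nullable) for each foreign key column
    that points at the primary key.
    """
    statements = [
        f"ALTER TABLE `{table}` MODIFY `{column}` {column_type} COLLATE `{collation}`;"
    ]
    if propagate:
        for fk_table, fk_column, nullable in referencing:
            # MODIFY redefines the column, so its nullability must be restated.
            null_sql = "NULL" if nullable else "NOT NULL"
            statements.append(
                f"ALTER TABLE `{fk_table}` MODIFY `{fk_column}` {column_type} "
                f"{null_sql} COLLATE `{collation}`;"
            )
    return statements


refs = [("shop_address", "account_id", False), ("shop_profile", "account_id", True)]
for stmt in collation_statements("shop_account", "id", "varchar(22)", "utf8_bin", refs, propagate=True):
    print(stmt)
```

Without propagation only the first statement is emitted, leaving the referencing columns with their old collation and therefore incompatible with the primary key.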