IFEval
IFEval: Overview
IFEval (Instruction-Following Evaluation for Large Language Models) is an instruction-following benchmark that evaluates how well AI models follow natural-language instructions. The benchmark comprises 541 prompts covering 25 types of verifiable instructions, such as word-count limits, keyword requirements, formatting rules, and capitalization. Compliance is checked automatically by deterministic heuristics, which makes IFEval objective, reproducible, fast, and inexpensive compared with human or LLM-based evaluation. IFEval is one of the six core benchmarks in Hugging Face's Open LLM Leaderboard v2.
IFEval Leaderboard
Ranking of all tested models on the IFEval benchmark, sorted by score.
Example tasks from the IFEval benchmark
The following example tasks show typical prompts found in the IFEval benchmark.
Write a 300+ word summary of the wikipedia page "https://en.wikipedia.org/wiki/Raymond_III,_Count_of_Tripoli". Do not use any commas and highlight at least 3 sections that has titles in markdown format, for example *highlighted section part 1*, *highlighted section part 2*, *highlighted section part 3*.
Verification criteria: (1) punctuation:no_comma - response must not contain commas, (2) detectable_format:number_highlighted_sections - at least 3 highlighted sections, (3) length_constraints:number_words - at least 300 words.
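To illustrate how such criteria can be checked deterministically, here is a minimal Python sketch of the three checks listed above. The function names, the regular expressions, and the word-counting rule (runs of word characters) are assumptions made for illustration; the official checker may tokenize or match differently.

```python
import re


def has_no_commas(response: str) -> bool:
    """punctuation:no_comma - True when the response contains no comma."""
    return "," not in response


def count_highlighted_sections(response: str) -> int:
    """Count markdown-style *highlighted* spans that sit on a single line.

    Assumption: a highlight is non-empty text between two asterisks.
    """
    spans = re.findall(r"\*([^\n*]+)\*", response)
    return sum(1 for span in spans if span.strip())


def count_words(response: str) -> int:
    """Assumption: a word is any run of word characters."""
    return len(re.findall(r"\w+", response))


def check_summary_response(
    response: str, min_words: int = 300, min_highlights: int = 3
) -> dict[str, bool]:
    """Return one pass/fail flag per instruction attached to the prompt."""
    return {
        "punctuation:no_comma": has_no_commas(response),
        "detectable_format:number_highlighted_sections": (
            count_highlighted_sections(response) >= min_highlights
        ),
        "length_constraints:number_words": count_words(response) >= min_words,
    }
```

A response satisfies the prompt only if every flag in the returned dictionary is true, for example `all(check_summary_response(text).values())`.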
I am planning a trip to Japan, and I would like thee to write an itinerary for my journey in a Shakespearean style. You are not allowed to use any commas in your response.
Verification criteria: punctuation:no_comma - response must not contain any commas.
Write a resume for a fresh high school graduate who is seeking their first job. Make sure to include at least 12 placeholder represented by square brackets, such as [address], [name].
Verification criteria: detectable_content:number_placeholders - response must contain at least 12 placeholders in square brackets.
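A placeholder check of this kind reduces to counting bracketed tokens. The sketch below is one possible way to do it; the function names and the exact pattern (non-empty text in square brackets, no nesting, no line breaks) are illustrative assumptions rather than the benchmark's own code.

```python
import re


def count_placeholders(response: str) -> int:
    """Count tokens such as [address] or [name].

    Assumption: a placeholder is non-empty text inside one pair of
    square brackets on a single line, without nested brackets.
    """
    return len(re.findall(r"\[[^\[\]\n]+\]", response))


def has_enough_placeholders(response: str, minimum: int = 12) -> bool:
    """detectable_content:number_placeholders with an 'at least' threshold."""
    return count_placeholders(response) >= minimum
```

Under these assumptions, `count_placeholders("[name], [address]")` counts two placeholders, while an empty pair `[]` is not counted.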
Write an email to my boss telling him that I am quitting. The email must contain a title wrapped in double angular brackets, i.e. <<title>>. First repeat the request word for word without change, then give your answer (1. do not say any words or characters before repeating the request; 2. the request you need to repeat does not include this sentence)
Verification criteria: (1) combination:repeat_prompt - response must begin with the original prompt repeated verbatim, (2) detectable_format:title - response must contain a title wrapped in <<>>.
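Both criteria for this task can also be checked with simple string operations. The following sketch assumes that leading whitespace in the response is ignored and that the repeated request must otherwise match exactly; a more lenient checker might, for instance, ignore case. All names here are hypothetical.

```python
import re


def starts_with_prompt(response: str, prompt_to_repeat: str) -> bool:
    """combination:repeat_prompt - the response must open with the request.

    Assumption: surrounding whitespace is ignored, everything else is exact.
    """
    return response.lstrip().startswith(prompt_to_repeat.strip())


def has_title(response: str) -> bool:
    """detectable_format:title - look for a non-empty <<title>> on one line."""
    titles = re.findall(r"<<([^\n<>]+)>>", response)
    return any(title.strip() for title in titles)
```

Note that the part of the prompt to repeat excludes the repetition instruction itself, so `prompt_to_repeat` would hold only the email request.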
Who built the first artificial ice rink? Please include the keys (1) Name (2) Location and (3) Year. Use less than 487 words.
Verification criteria: length_constraints:number_words - response must use fewer than 487 words.
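The same instruction type appears with different comparison directions ("at least 300 words" in the first example, "less than 487 words" here). One way to model this is a single check that takes the threshold and the relation as parameters; the sketch below assumes the relation strings "less than" and "at least" and counts runs of word characters as words.

```python
import re


def check_word_count(response: str, threshold: int, relation: str) -> bool:
    """length_constraints:number_words with a configurable comparison.

    Assumption: relation is either "less than" or "at least".
    """
    num_words = len(re.findall(r"\w+", response))
    if relation == "less than":
        return num_words < threshold
    if relation == "at least":
        return num_words >= threshold
    raise ValueError(f"unsupported relation: {relation!r}")
```

For the ice rink prompt, the check would be `check_word_count(response, 487, "less than")`.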