APEX-Agents
APEX-Agents: Overview
APEX-Agents (AI Productivity Index for Agents) is a benchmark that evaluates whether AI agents can carry out long-horizon, cross-application tasks. The tasks focus on use cases in investment banking, management consulting, and corporate law. APEX-Agents comprises 480 tasks across 33 "worlds". A "world" is a work environment designed to be as realistic as possible, in which agents must interact with files and tools such as documents, spreadsheets, email, chat, and calendars. Evaluation uses binary pass/fail criteria developed by domain experts with many years of professional experience. At the benchmark's release, the top-performing model scored only 24%. The benchmark thus exposes a gap in how well agentic AI systems perform in real-world use.
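To make the pass/fail scoring concrete, here is a minimal Python sketch. It assumes, without confirmation from the benchmark authors, that a task counts as passed only when all of its binary criteria pass, and that the overall score is the percentage of passed tasks. The `TaskResult` class and the `benchmark_score` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one agent run on one task (hypothetical structure)."""
    task_id: str
    criteria_passed: list[bool]  # one entry per binary pass/fail criterion

    def passed(self) -> bool:
        # Assumption: a task is solved only if it has criteria and all of them pass.
        return len(self.criteria_passed) > 0 and all(self.criteria_passed)


def benchmark_score(results: list[TaskResult]) -> float:
    """Percentage of tasks solved (assumed aggregation rule)."""
    if not results:
        return 0.0
    solved = sum(r.passed() for r in results)
    return 100.0 * solved / len(results)


# Illustrative runs: two of four tasks pass every criterion.
results = [
    TaskResult("t1", [True, True]),
    TaskResult("t2", [True, False]),
    TaskResult("t3", [False]),
    TaskResult("t4", [True]),
]
print(benchmark_score(results))  # 50.0
```

Under this all-or-nothing rule, a single failed criterion fails the whole task. If the benchmark instead awards partial credit per criterion, only `passed` and `benchmark_score` would need to change.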
APEX-Agents Leaderboard
Ranking of all tested models on the APEX-Agents benchmark, sorted by score.
Example Tasks from the APEX-Agents Benchmark
The following example tasks illustrate typical questions found in the APEX-Agents benchmark.
TAC and its supplier executed the first draft of the Master Supply Agreement document (Template 1) as of April 1, 2025. President Trump announced additional tariffs on July 20, 2025. The supplier's raw materials come from Iraq. The supplier sends TAC notice that it intends to invoke the force majeure clause as a result of the tariffs. May the supplier do so under the terms of the agreement? Tell me your main findings by replying to me here.
Yes, the supplier can invoke the expanded force majeure clause found in Master Supply Agreement Template 1.docx. Section 14 of the agreement provides that the supplier may invoke force majeure if any governmental tariff imposed on the Products increases by more than twenty-five percent (25%) compared to the tariff rate in effect on the Effective Date. On July 20, 2025, President Trump announced tariffs of 35% on goods from Iraq, where the supplier sources its raw materials. Since the newly announced tariffs exceed the 25% threshold required under the agreement, the supplier may invoke force majeure.
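The tariff trigger in Section 14 reduces to a threshold comparison. The Python sketch below is illustrative only. It assumes the "increase" is measured in percentage points (new rate minus the rate on the Effective Date), which is one reading of the clause; a relative-increase reading is not modeled. The function name and the baseline rates are hypothetical, since the Effective Date rate is not stated here.

```python
def force_majeure_triggered(baseline_rate_pct: float, new_rate_pct: float,
                            threshold_pct: float = 25.0) -> bool:
    """Check a tariff-based force majeure trigger (illustrative sketch).

    Assumption: the increase is measured in percentage points, i.e. the new
    tariff rate minus the rate in effect on the Effective Date.
    """
    increase_pp = new_rate_pct - baseline_rate_pct
    # "more than twenty-five percent" -> strict inequality
    return increase_pp > threshold_pct


# Hypothetical 0% baseline on the Effective Date; 35% is the announced rate.
print(force_majeure_triggered(0.0, 35.0))   # True
# Hypothetical 10% baseline: a 25-point increase is not "more than" 25.
print(force_majeure_triggered(10.0, 35.0))  # False
```

The second call shows why the baseline rate matters. The same 35% tariff may or may not trigger the clause, depending on the rate in effect on the Effective Date.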
For the Kenvue deal, please send over the below draft figures for pre-deal target multiples for FY24. Utilize the potential median deal value. Return to me a message with: Deal value/EBITDA, Deal value/EBIT, and Deal value/OpFCF. Round all values to one decimal place.
Deal value / EBIT: 24.9x
Deal value / EBITDA: 18.6x
Deal value / OpFCF: 34.5x
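The multiples in this task are simple ratios of a deal value to each earnings or cash-flow metric. The Python sketch below computes them from the median of a set of candidate deal values and rounds each one to one decimal place. The function name and the input figures are hypothetical placeholders, not Kenvue data.

```python
from statistics import median


def deal_multiples(candidate_deal_values: list[float], ebitda: float,
                   ebit: float, op_fcf: float) -> dict[str, float]:
    """Deal-value multiples based on the median candidate deal value (sketch)."""
    deal_value = median(candidate_deal_values)
    metrics = {"EBITDA": ebitda, "EBIT": ebit, "OpFCF": op_fcf}
    # Note: Python's round() rounds exact ties to even; use decimal.Decimal
    # if strict half-up rounding is required.
    return {f"Deal value / {name}": round(deal_value / value, 1)
            for name, value in metrics.items()}


# Hypothetical inputs (same currency units throughout).
print(deal_multiples([900.0, 1000.0, 1100.0], ebitda=50.0, ebit=40.0, op_fcf=25.0))
# {'Deal value / EBITDA': 20.0, 'Deal value / EBIT': 25.0, 'Deal value / OpFCF': 40.0}
```

Because EBITDA is at least as large as EBIT for a typical company, the EBITDA multiple comes out lower than the EBIT multiple, as in the answer above.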
Tell me whether or not the asset type that has the highest average adjusted failure probability per outage is also responsible for the highest average Value of Lost Load (VOLL) per asset. VOLL is defined as the product of SAIDI, number of customers affected, and assumed € per Customer-Minute. If it doesn't, which asset type does have the highest VOLL per asset? And for that asset type, what is the average adjusted failure probability per outage and the average VOLL per asset? Write your answer to me in here, rounding the output dollar values to the nearest 0.1 million and the output percentages to the nearest 0.01%.
The asset type with the highest average adjusted failure probability is not responsible for the highest average financial impact. The asset type with the highest average financial impact (defined as the highest average VOLL per asset) is Transmission Line. For the Transmission Line asset type, the average adjusted failure probability per outage is 51.59% and the average Value of Lost Load (VOLL) per asset is €82.2M.
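The comparison in this task groups assets by type and computes VOLL as SAIDI × customers affected × € per customer-minute. It then checks whether the same type leads on both averages. The Python sketch below assumes one record per asset. The field names, the €0.50 per customer-minute rate, and the sample rows are all hypothetical and unrelated to the answer's figures.

```python
from collections import defaultdict
from statistics import mean

EUR_PER_CUSTOMER_MINUTE = 0.5  # hypothetical assumed value

# Hypothetical sample data, one row per asset.
records = [
    {"asset_type": "Transformer", "adj_fail_prob": 0.60, "saidi_min": 30, "customers": 10_000},
    {"asset_type": "Transformer", "adj_fail_prob": 0.50, "saidi_min": 20, "customers": 5_000},
    {"asset_type": "Transmission Line", "adj_fail_prob": 0.40, "saidi_min": 120, "customers": 50_000},
]


def voll(row: dict, eur_per_cust_min: float) -> float:
    """Value of Lost Load = SAIDI x customers affected x EUR per customer-minute."""
    return row["saidi_min"] * row["customers"] * eur_per_cust_min


def summarize(rows: list[dict], eur_per_cust_min: float) -> dict[str, dict[str, float]]:
    """Average failure probability and average VOLL per asset, by asset type."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        groups[row["asset_type"]].append(row)
    return {
        asset_type: {
            "avg_fail_prob": mean(r["adj_fail_prob"] for r in group),
            "avg_voll": mean(voll(r, eur_per_cust_min) for r in group),
        }
        for asset_type, group in groups.items()
    }


summary = summarize(records, EUR_PER_CUSTOMER_MINUTE)
top_prob = max(summary, key=lambda t: summary[t]["avg_fail_prob"])
top_voll = max(summary, key=lambda t: summary[t]["avg_voll"])
print(top_prob, top_voll, top_prob == top_voll)  # Transformer Transmission Line False

best = summary[top_voll]
# Round as the task requests: VOLL to 0.1 million, percentages to 0.01%.
print(f"{top_voll}: {best['avg_fail_prob'] * 100:.2f}%, €{best['avg_voll'] / 1e6:.1f}M")
```

With this toy data the type with the highest failure probability differs from the type with the highest VOLL, which mirrors the structure of the answer above.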
Can the Terminal Operator arrest the vessel (M/V Red Room) in rem under Rule C to secure cleanup costs arising from the spill? Please provide a few paragraphs to me in here, explaining your answer based on maritime law.
No, the Terminal Operator cannot arrest the vessel in rem under Rule C because it cannot establish a maritime lien against the vessel. Under maritime law, an action in rem can be brought against a vessel to hold the vessel itself liable for the torts it has committed and for the contracts it has breached.
Update the DCF model with Q4 2025 estimates, recompute IRR, then draft a two-slide investor summary.
The agent must locate the DCF spreadsheet and update it with Q4 2025 financial estimates, recalculate the internal rate of return (IRR), and create a two-slide presentation summarizing the investment case for investors.
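The IRR step in this task can be illustrated with a short Python sketch. It finds the discount rate at which net present value is zero by bisection. The function names, the search bracket of -99% to 1000%, and the sample cash flows are illustrative assumptions, not part of the benchmark's DCF model. A spreadsheet's IRR function may use a different root-finding method.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value; cash_flows[0] occurs at t = 0 and is not discounted."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: list[float], low: float = -0.99, high: float = 10.0,
        tol: float = 1e-10, max_iter: int = 200) -> float:
    """Internal rate of return by bisection.

    Assumes NPV changes sign exactly once on [low, high], as it does for a
    conventional profile (an initial outflow followed by inflows).
    """
    f_low = npv(low, cash_flows)
    f_high = npv(high, cash_flows)
    if f_low * f_high > 0:
        raise ValueError("NPV does not change sign on [low, high]")
    for _ in range(max_iter):
        mid = (low + high) / 2.0
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (high - low) / 2.0 < tol:
            return mid
        # Keep the half-interval on which NPV still changes sign.
        if f_low * f_mid < 0:
            high, f_high = mid, f_mid
        else:
            low, f_low = mid, f_mid
    return (low + high) / 2.0


# Hypothetical cash flows: invest 100, receive 60 in each of the next two periods.
print(f"{irr([-100.0, 60.0, 60.0]):.4f}")  # 0.1307
```

Bisection is slower than Newton's method but cannot diverge once the root is bracketed, which makes it a simple, robust choice for a sketch.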