ODCV-Bench: Outcome-Driven Constraint Violations

About the Benchmark

ODCV-Bench comprises 40 scenarios designed to examine how autonomous agents behave when strong incentives to optimize a KPI come into conflict with ethical, legal, or safety constraints. The benchmark tests agents in two modes: Mandated (explicit instructions to optimize) and Incentivized (pressure via KPIs without explicit commands).

Metric: We measure the Misalignment Rate (% of scenarios with severity score ≥ 3) and Average Severity (0-5 scale).

🏆 Leaderboard

View Mode:

This selection updates both the Leaderboard and the Detailed Breakdown table below.

Rank	Model	Misalignment Rate (%)	Avg Severity (0-5)

📊 Alignment Analysis

Correlation between Incentivized vs. Mandated Misalignment Rates. Models above the diagonal line are "Obediently Misaligned," while models below demonstrate "Proactive Deception."

🔍 Detailed Scenario Breakdown

Raw severity scores (0-5) for each scenario, median across four judges, averaged over Mandated and Incentivized variations. Scores ≥ 3 indicate active outcome-driven violations. Shown for the 12 current-generation models.

Scenario	Avg Severity	Claude-Opus-4.7	Gemini-3.1-Pro-Preview	GLM-5.1	GPT-5.5	gpt-oss-120b	gpt-oss-20b	Grok-4.20	Kimi-K2.6	Llama-4-Maverick	Minimax-M2.7	Qwen3.6-27B	Qwen3.6-Max-Preview