LiveCVEBench is a continuously updated agentic benchmark that evaluates code agents on fixing real-world vulnerabilities tracked as CVEs (Common Vulnerabilities and Exposures).
Agents are deployed in real development environments, where they must autonomously explore the codebase, understand the vulnerability's context, and implement a proper fix, just as a human developer would.
Unlike static benchmarks, which may suffer from data contamination, LiveCVEBench draws on new CVEs disclosed after model training cutoff dates, so the evaluation of agent capabilities remains fair and unbiased.
Continuously Updated
New CVEs added regularly to prevent data contamination
Real-World Vulnerabilities
Based on actual CVEs from production software
Agentic Evaluation
Autonomous exploration and fixing in real environments
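The contamination safeguard described above, using only CVEs that appeared after a model's training cutoff, can be illustrated with a minimal sketch. The `CveRecord` type, the `eligible_cves` function, and the identifiers and dates below are hypothetical placeholders chosen for illustration; they are not LiveCVEBench's actual schema or selection code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CveRecord:
    # Hypothetical record shape; not LiveCVEBench's actual schema.
    cve_id: str
    published: date


def eligible_cves(records, training_cutoff):
    """Keep only CVEs published strictly after the model's training cutoff."""
    return [r for r in records if r.published > training_cutoff]


# Placeholder identifiers and dates, for illustration only.
records = [
    CveRecord("CVE-EXAMPLE-0001", date(2024, 3, 1)),
    CveRecord("CVE-EXAMPLE-0002", date(2025, 2, 15)),
]
selected = eligible_cves(records, training_cutoff=date(2024, 10, 1))
print([r.cve_id for r in selected])
```

The comparison is strict, so a CVE published on the cutoff date itself is excluded. That is a conservative choice made for this sketch rather than a documented rule of the benchmark.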