## Overview
To submit your Code Agent's evaluation results to the LiveCVEBench leaderboard, you need to create a Pull Request to our GitHub repository with your results file.
## Submission Format
Create a JSON file with the following structure:
```json
{
  "model": "Your-Model-Name",
  "agent": "Your-Agent-Name",
  "modelType": "open",
  "agentType": "open",
  "instruction_type": "user_report",
  "cve_results": {
    "CVE-2025-0001": {
      "success": true,
      "turns": 3,
      "tokens": 14500
    },
    "CVE-2025-0002": {
      "success": false,
      "turns": 8,
      "tokens": 42000
    }
  }
}
```
## Field Descriptions
| Field | Type | Description |
|---|---|---|
| `model` | string | Name of the LLM (e.g., "GPT-4o", "Claude-3.5-Sonnet") |
| `agent` | string | Name of the agent framework (e.g., "OpenHands", "Aider") |
| `modelType` | `"open"` \| `"closed"` | Whether the model weights are publicly available |
| `agentType` | `"open"` \| `"closed"` | Whether the agent source code is publicly available |
| `instruction_type` | `"user_report"` \| `"cve_description"` | Type of task input: `user_report` (recommended) or `cve_description` |
| `success` | boolean | Whether the CVE was successfully fixed |
| `turns` | number | Number of interaction turns taken |
| `tokens` | number | Total tokens consumed (input + output) |
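Before opening a Pull Request, it can help to sanity-check your file against the schema above. The following is a minimal Python sketch; the `validate_results` helper and the example file path are illustrative, not part of any official LiveCVEBench tooling.

```python
import json

REQUIRED_TOP_LEVEL = {
    "model": str,
    "agent": str,
    "modelType": str,
    "agentType": str,
    "instruction_type": str,
    "cve_results": dict,
}
REQUIRED_PER_CVE = {"success": bool, "turns": int, "tokens": int}


def validate_results(path: str) -> None:
    """Check that a results file matches the submission schema described above."""
    with open(path) as f:
        data = json.load(f)

    # Top-level fields must be present with the expected types.
    for field, expected in REQUIRED_TOP_LEVEL.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"'{field}' is missing or not a {expected.__name__}")

    # Enumerated string fields.
    if data["modelType"] not in ("open", "closed"):
        raise ValueError("modelType must be 'open' or 'closed'")
    if data["agentType"] not in ("open", "closed"):
        raise ValueError("agentType must be 'open' or 'closed'")
    if data["instruction_type"] not in ("user_report", "cve_description"):
        raise ValueError("instruction_type must be 'user_report' or 'cve_description'")

    # Each CVE entry needs success, turns, and tokens with the right types.
    for cve_id, result in data["cve_results"].items():
        for field, expected in REQUIRED_PER_CVE.items():
            value = result.get(field)
            # bool is a subclass of int in Python, so reject booleans for numeric fields.
            if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
                raise ValueError(f"{cve_id}: '{field}' is missing or not a {expected.__name__}")


if __name__ == "__main__":
    validate_results("submissions/GPT-4o_OpenHands.json")
    print("Results file looks valid.")
```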
## Submission Steps
- Fork the `livecvebench/submissions` repository
- Create your results file as `submissions/{Model}_{Agent}.json` (see the naming sketch after this list)
- Commit and push your changes
- Create a Pull Request with:
  - A brief description of your model/agent
  - A link to the model/agent repository (if open source)
  - Any relevant configuration details
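If you generate the results file programmatically, the naming convention can be applied when writing it out. This is a minimal Python sketch assuming your results already match the schema above; the `write_submission` helper and the space-to-hyphen sanitization are our own choices, not requirements of the leaderboard.

```python
import json
from pathlib import Path


def write_submission(results: dict, out_dir: str = "submissions") -> Path:
    """Write results to submissions/{Model}_{Agent}.json, following the naming convention."""
    # Replace spaces so the file name stays shell-friendly (an assumption, not a rule).
    model = results["model"].replace(" ", "-")
    agent = results["agent"].replace(" ", "-")
    path = Path(out_dir) / f"{model}_{agent}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(results, indent=2) + "\n")
    return path


# Example: produces submissions/GPT-4o_OpenHands.json
# write_submission({"model": "GPT-4o", "agent": "OpenHands", ...})
```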
## Evaluation Environment
LiveCVEBench is fully compatible with the Terminal Bench evaluation framework. You can use Terminal Bench to run your agent on our CVE tasks and generate the results file.
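After a run, you still need to aggregate per-task outcomes into the per-CVE entries shown above. The Python sketch below assumes you have already collected one record per CVE with its outcome, turn count, and token count; the `runs` input shape and its `resolved`/`turns`/`tokens` field names are illustrative, since the exact layout of your harness output will vary.

```python
import json


def build_results(model: str, agent: str, model_type: str, agent_type: str,
                  instruction_type: str, runs: dict) -> dict:
    """Assemble the leaderboard submission JSON from per-CVE run records.

    `runs` is assumed to map CVE IDs to dicts with "resolved", "turns", and
    "tokens" keys collected from your evaluation harness (illustrative names).
    """
    return {
        "model": model,
        "agent": agent,
        "modelType": model_type,
        "agentType": agent_type,
        "instruction_type": instruction_type,
        "cve_results": {
            cve_id: {
                "success": bool(run["resolved"]),
                "turns": int(run["turns"]),
                "tokens": int(run["tokens"]),
            }
            for cve_id, run in runs.items()
        },
    }


if __name__ == "__main__":
    runs = {"CVE-2025-0001": {"resolved": True, "turns": 3, "tokens": 14500}}
    submission = build_results("GPT-4o", "OpenHands", "closed", "open", "user_report", runs)
    print(json.dumps(submission, indent=2))
```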
## Questions?
If you have any questions about the submission process, please open an issue on our GitHub repository.