A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Abstract
The increasing adoption of large language models (LLMs) in software engineering necessitates rigorous security evaluation of their generated code. However, existing benchmarks are inadequate, as they focus on isolated code snippets, employ unstable evaluation methods that lack reproducibility, and fail to connect the quality of input context with the security of the output. To address these gaps, we introduce A.S.E (AI Code Generation Security Evaluation), a benchmark for repository-level secure code generation. A.S.E constructs tasks from real-world repositories with documented CVEs, preserving full repository context like build systems and cross-file dependencies. Its reproducible, containerized evaluation framework uses expert-defined rules to provide stable, auditable assessments of security, build quality, and generation stability. Our evaluation of leading LLMs on A.S.E reveals three key findings: (1) Claude-3.7-Sonnet achieves the best overall performance. (2) The security gap between proprietary and open-source models is narrow; Qwen3-235B-A22B-Instruct attains the top security score. (3) Concise, "fast-thinking" decoding strategies consistently outperform complex, "slow-thinking" reasoning for security patching.