Abstract

While large language models have made significant strides in code generation, the pass rate of generated code is bottlenecked by subtle errors, which often require human intervention before the code passes its tests, especially for complex problems. Existing LLM-based debugging systems treat generated programs as monolithic units and fail to address bugs at multiple levels of granularity, from low-level syntax errors to high-level algorithmic flaws. In this paper, we introduce Multi-Granularity Debugger (MGDebugger), a hierarchical code debugger that isolates, identifies, and resolves bugs at various levels of granularity. MGDebugger decomposes problematic code into a hierarchical tree structure of subfunctions, with each level representing a particular granularity of error. During debugging, it analyzes each subfunction and iteratively resolves bugs in a bottom-up manner. To effectively test each subfunction, we propose an LLM-simulated Python executor, which traces code execution and tracks important variable states to pinpoint errors accurately. Extensive experiments demonstrate that MGDebugger outperforms existing debugging systems, achieving an 18.9% improvement in accuracy over seed generations on HumanEval and a 97.6% repair success rate on HumanEvalFix. Furthermore, MGDebugger effectively fixes bugs across different categories and difficulty levels, demonstrating its robustness and effectiveness.
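The bottom-up debugging order described above can be illustrated with a minimal sketch, assuming the hierarchical tree of subfunctions has already been built. `SubFunction`, `debug_bottom_up`, and `debug_fn` are hypothetical names introduced here for illustration; they are not MGDebugger's actual API. `debug_fn` stands in for the per-subfunction test-and-repair step (which in the paper involves an LLM and the LLM-simulated executor), and that step is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SubFunction:
    """Hypothetical node in a decomposition tree: a subfunction and the helpers it calls."""
    name: str
    source: str
    children: list[SubFunction] = field(default_factory=list)


def debug_bottom_up(
    node: SubFunction,
    debug_fn: Callable[[SubFunction], str],
    order: list[str] | None = None,
) -> list[str]:
    """Post-order traversal: every child is debugged before its parent,
    so low-level subfunctions are repaired before the code that calls them.
    Returns the names in the order they were visited."""
    if order is None:
        order = []
    for child in node.children:
        debug_bottom_up(child, debug_fn, order)
    # Placeholder for the LLM-driven step: any callable that returns
    # (possibly repaired) source code for this subfunction.
    node.source = debug_fn(node)
    order.append(node.name)
    return order


# Usage: a toy tree where "solve" depends on "helper".
tree = SubFunction("main", "def main(): ...", [
    SubFunction("parse", "def parse(): ..."),
    SubFunction("solve", "def solve(): ...", [
        SubFunction("helper", "def helper(): ..."),
    ]),
])

# Illustrative "repair" that only tags each subfunction as checked.
visited = debug_bottom_up(tree, lambda n: n.source + "  # checked")
print(visited)  # ['parse', 'helper', 'solve', 'main']
```

A post-order traversal is one simple way to realize the bottom-up order: by the time a parent is examined, each of its subfunctions has already been through the repair step.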
