
Can AI Fix Buggy Code? Exploring the Use of
Large Language Models in Automated
Program Repair
Lan Zhang, Northern Arizona University, Flagstaff, AZ, 86005, USA
Anoop Singhal, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
Qingtian Zou, University of Texas Southwestern Medical Center, Dallas, TX, USA
Xiaoyan Sun, Worcester Polytechnic Institute, Worcester, MA, 01609, USA
Peng Liu, Pennsylvania State University, State College, PA, 16803, USA
Abstract: LLMs are increasingly being used to help programmers fix buggy code, thanks to their remarkable capabilities. This article reviews the current human-LLM collaboration approach to bug fixing and points out research directions toward autonomous program repair AI agents.
INTRODUCTION
The field of software engineering has witnessed a paradigm shift with the advent of large language models (LLMs). These sophisticated AI systems have demonstrated remarkable versatility across various software development tasks, including code generation, bug detection, and code review [1, 2, 3]. The potential of LLMs to revolutionize software development practices has sparked broad interest in both academic and industry circles, prompting a surge of research into their capabilities and limitations.
A recent breakthrough in this domain came with the introduction of Devin, an LLM-powered AI system capable of autonomously completing 13.8% of real-world coding tasks [4]. These tasks encompass a range of complex operations, from diagnosing and fixing bugs to conducting comprehensive code reviews. However, this relatively modest success rate raises a critical question that forms the core of our investigation: are we truly prepared to leverage LLMs for repairing buggy, complex programs? This question is not merely academic; it has far-reaching implications for the future of software development and maintenance practices.
To address this fundamental question, our study focuses on two modes of LLM-supported program repair:
Human-LLM Collaboration: This approach examines the synergistic relationship between human software
engineers and LLMs in the bug repair process [5]. It encompasses both interactive, dialogue-based methodologies and more integrated solutions such as real-time code completion and suggestion systems (a minimal sketch of such a dialogue follows this list).
Autonomous AI Agent Repair: This mode investigates the potential for LLMs to independently identify and rectify bugs without direct human intervention, representing a more ambitious vision of automated program repair.
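To make the collaboration mode concrete, the listing below sketches how a dialogue-based repair session might be structured. It is a minimal illustration under stated assumptions, not a description of any specific product: chat() is a hypothetical placeholder for whatever chat-completion API is in use, and the message format merely mirrors common chat-style LLM interfaces.

    # Minimal sketch of a dialogue-based repair session (hypothetical
    # helper names; chat() must be wired to an actual LLM provider).

    def chat(messages: list[dict]) -> str:
        """Placeholder for an LLM chat-completion call."""
        raise NotImplementedError("connect to your LLM provider here")

    def repair_session(buggy_code: str, error_report: str) -> list[dict]:
        # Seed the dialogue with the failing code and the observed error.
        messages = [
            {"role": "system",
             "content": "You are a program repair assistant."},
            {"role": "user",
             "content": f"This code fails:\n{buggy_code}\n"
                        f"Error:\n{error_report}\nPropose a fix."},
        ]
        messages.append({"role": "assistant", "content": chat(messages)})
        # The human reviews each proposal and replies with corrections or
        # extra context; the loop ends when the human accepts the patch.
        while (feedback := input("Feedback (empty to accept): ").strip()):
            messages.append({"role": "user", "content": feedback})
            messages.append({"role": "assistant", "content": chat(messages)})
        return messages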
By examining the efficacy of LLMs across diverse programming contexts, e.g., C/C++, Java, and Python, we aim to provide a nuanced understanding of their current capabilities and limitations in addressing complex software bugs. Our findings reveal a mixed landscape of LLM-supported program repair. For the Human-LLM Collaboration mode, we observed that results improve significantly when humans provide additional contextual knowledge, including information about variable contexts, relevant data structures, related functions, and even the underlying logic of the code. This synergy between human expertise and LLM capabilities shows promise for enhancing bug repair in complex software systems.
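As one illustration of this finding, the listing below folds such human-supplied context into a repair prompt. The function name and section labels are invented for illustration; the prompts used in the study are not reproduced here.

    def build_repair_prompt(buggy_snippet: str,
                            variable_context: str = "",
                            data_structures: str = "",
                            related_functions: str = "",
                            intended_logic: str = "") -> str:
        """Assemble a repair prompt; the optional fields carry the kinds
        of human-supplied context that improved repair results."""
        sections = [
            ("Buggy code", buggy_snippet),
            ("Variable context", variable_context),       # types, ranges
            ("Relevant data structures", data_structures),  # structs/classes
            ("Related functions", related_functions),     # callers, callees
            ("Intended logic", intended_logic),           # what it should do
        ]
        parts = [f"### {title}\n{body}" for title, body in sections if body]
        parts.append("### Task\nExplain the bug and return a corrected "
                     "version of the code.")
        return "\n\n".join(parts)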
In contrast, the Autonomous AI Agent Repair mode presents a more challenging frontier. Our research indicates that we are still far from achieving reliable automatic code repair using LLMs alone. The complexity of real-world software systems, coupled with the nuanced understanding required for effective bug repair, continues to pose significant challenges for fully autonomous LLM-based solutions.
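For contrast, a fully autonomous pipeline typically reduces to a generate-and-validate loop like the one sketched below: the model proposes a patch, the test suite is the only oracle, and no human is available to supply the missing context. This is a sketch under stated assumptions, not a specific system's implementation: propose_patch() is a placeholder for the model call, and the test harness is assumed to accept the candidate file as its last argument.

    import pathlib
    import subprocess
    import tempfile

    def propose_patch(code: str, failure_log: str) -> str:
        """Placeholder for an LLM call that returns a patched file."""
        raise NotImplementedError("connect to your LLM provider here")

    def run_tests(code: str, test_cmd: list[str]) -> tuple[bool, str]:
        # Write the candidate and run the project's test command on it
        # (assumes the harness takes the candidate path as its last arg).
        path = pathlib.Path(tempfile.mkdtemp()) / "candidate.py"
        path.write_text(code)
        result = subprocess.run(test_cmd + [str(path)],
                                capture_output=True, text=True)
        return result.returncode == 0, result.stdout + result.stderr

    def autonomous_repair(code: str, test_cmd: list[str],
                          max_attempts: int = 5) -> str | None:
        for _ in range(max_attempts):
            ok, log = run_tests(code, test_cmd)
            if ok:
                return code              # tests green: accept the patch
            code = propose_patch(code, log)  # otherwise ask for a new one
        ok, _ = run_tests(code, test_cmd)
        return code if ok else None      # no human in the loop to fall back on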
Human-LLM Collaboration
GitHub Copilot’s ROBIN system represents a significant advancement in human-LLM collaboration for debugging [6]. It uses multiple AI agents to analyze code context, exception information, and user queries, guiding developers through systematic debugging steps. ROBIN leverages LLMs as reasoning engines to pro-