Automatic bug fixing

Latest revision as of 11:56, 16 August 2025 (compared with the revision as of 14:51, 27 May 2024)
{{short description\|Automatic repair of software bugs}}
'''Automatic bug-fixing''' is the automatic [[Patch (computing)\|repair]] of [[software bug]]s without the intervention of a human programmer.<ref>{{Cite journal \|last=Rinard \|first=Martin C. \|year=2008 \|title=Technical perspective ''Patching'' program errors \|journal=Communications of the ACM \|volume=51 \|issue=12 \|pages=86 \|doi=10.1145/1409360.1409381 \|s2cid=28629846}}</ref><ref>{{Cite journal \|last=Harman \|first=Mark \|year=2010 \|title=Automated patching techniques \|journal=Communications of the ACM \|volume=53 \|issue=5 \|pages=108 \|doi=10.1145/1735223.1735248 \|s2cid=9729944}}</ref><ref name="Gazzola2019">{{Cite journal \|last1=Gazzola \|first1=Luca \|last2=Micucci \|first2=Daniela \|last3=Mariani \|first3=Leonardo \|year=2019 \|title=Automatic Software Repair: A Survey \|url=https://boa.unimib.it/bitstream/10281/184798/2/08089448_final.pdf \|journal=IEEE Transactions on Software Engineering \|volume=45 \|issue=1 \|pages=34–67 \|doi=10.1109/TSE.2017.2755013 \|hdl=10281/184798 \|s2cid=57764123\|doi-access=free }}</ref> It is also commonly referred to as ''automatic patch generation'', ''automatic bug repair'', or ''automatic program repair''.<ref name="Gazzola2019" /> The typical goal of such techniques is to automatically generate correct [[Patch (computing)\|patches]] to eliminate bugs in [[software program]]s without causing [[software regression]].<ref>{{Cite book \|last1=Tan \|first1=Shin Hwei \|title=2015 IEEE/ACM 37th IEEE International Conference on Software Engineering \|last2=Roychoudhury \|first2=Abhik \|date=2015 \|publisher=IEEE \|isbn=978-1-4799-1934-5 \|pages=471–482 \|chapter=relifix: Automated repair of software regressions \|doi=10.1109/ICSE.2015.65 \|s2cid=17125466}}</ref>

== Specification ==
Automatic bug fixing is performed against a specification of the expected behavior, which can be, for instance, a [[formal specification]] or a [[test suite]].<ref name="genprog2009">{{Cite book \|last1=Weimer \|first1=Westley \|title=Proceedings of the 31st International Conference on Software Engineering \|last2=Nguyen \|first2=ThanhVu \|last3=Le Goues \|first3=Claire\|author3-link=Claire Le Goues \|last4=Forrest \|first4=Stephanie \|date=2009 \|publisher=IEEE \|isbn=978-1-4244-3453-4 \|pages=364–374 \|chapter=Automatically finding patches using genetic programming \|citeseerx=10.1.1.147.8995 \|doi=10.1109/ICSE.2009.5070536 \|s2cid=1706697}}</ref>

A test suite, whose input/output pairs specify the functionality of the program and may be captured in [[Assertion (software development)\|assertions]], can be used as a [[test oracle]] to drive the search. This oracle can be divided into the ''bug oracle'', which exposes the faulty behavior, and the ''regression oracle'', which encapsulates the functionality that any program repair method must preserve. Note that a test suite is typically incomplete and does not cover all possible cases.
Therefore, it is often possible for a validated patch to produce expected outputs for all inputs in the test suite but incorrect outputs for other inputs.<ref name="kali">{{Cite book \|last1=Qi \|first1=Zichao \|title=Proceedings of the 2015 International Symposium on Software Testing and Analysis \|last2=Long \|first2=Fan \|last3=Achour \|first3=Sara \|last4=Rinard \|first4=Martin \|date=2015 \|publisher=ACM \|isbn=978-1-4503-3620-8 \|chapter=An Analysis of Patch Plausibility and Correctness for Generate-and-Validate Patch Generation Systems \|citeseerx=10.1.1.696.5616 \|doi=10.1145/2771783.2771791 \|s2cid=6845282}}</ref> The existence of such validated but incorrect patches is a major challenge for generate-and-validate techniques.<ref name="kali" /> Recent successful automatic bug-fixing techniques often rely on additional information other than the test suite, such as information learned from previous human patches, to further identify correct patches among validated patches.<ref name="prophet">{{Cite book \|last1=Long \|first1=Fan \|title=Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages \|last2=Rinard \|first2=Martin \|date=2016 \|publisher=ACM \|isbn=978-1-4503-3549-2 \|pages=298–312 \|chapter=Automatic patch generation by learning correct code \|doi=10.1145/2837614.2837617 \|s2cid=6091588}}</ref>

Another way to specify the expected behavior is to use [[formal specification]]s.<ref name="autofixe">{{Cite journal \|last1=Pei \|first1=Yu \|last2=Furia \|first2=Carlo A. 
\|last3=Nordio \|first3=Martin \|last4=Wei \|first4=Yi \|last5=Meyer \|first5=Bertrand \|last6=Zeller \|first6=Andreas \|date=May 2014 \|title=Automated Fixing of Programs with Contracts \|journal=IEEE Transactions on Software Engineering \|volume=40 \|issue=5 \|pages=427–449 \|arxiv=1403.1117 \|bibcode=2014arXiv1403.1117P \|doi=10.1109/TSE.2014.2312918 \|s2cid=53302638}}</ref><ref>{{Cite journal \|title=Contract-based Data Structure Repair Using Alloy \|citeseerx=10.1.1.182.4390}}</ref> Verification against full specifications that specify the whole program behavior including functionalities is less common because such specifications are typically not available in practice and the computational cost of such [[formal verification\|verification]] is prohibitive. For specific classes of errors, however, implicit partial specifications are often available. For example, there are targeted bug-fixing techniques validating that the patched program can no longer trigger overflow errors in the same execution path.<ref name="codephage" />

=== Generate-and-validate ===
Generate-and-validate approaches compile and test each candidate patch to collect all validated patches that produce expected outputs for all inputs in the test suite.<ref name="genprog2009" /><ref name="kali" /> Such a technique typically starts with a test suite of the program, i.e., a set of [[test case (software)\|test case]]s, at least one of which exposes the bug.<ref name="genprog2009" /><ref name="prophet" /><ref name="rsrepair">{{Cite book \|last1=Qi \|first1=Yuhua \|title=Proceedings of the 36th International Conference on Software Engineering \|last2=Mao \|first2=Xiaoguang \|last3=Lei \|first3=Yan \|last4=Dai \|first4=Ziying \|last5=Wang \|first5=Chengsong \|date=2014 \|publisher=ACM \|isbn=978-1-4503-2756-5 \|series=ICSE 2014 \|location=Austin, Texas \|pages=254–265 \|chapter=The Strength of Random Search on Automated Program Repair \|doi=10.1145/2568225.2568254 
\|s2cid=14976851}}</ref><ref name="spr">{{Cite book \|last1=Long \|first1=Fan \|title=Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering \|last2=Rinard \|first2=Martin \|date=2015 \|publisher=ACM \|isbn=978-1-4503-3675-8 \|series=ESEC/FSE 2015 \|location=Bergamo, Italy \|pages=166–178 \|chapter=Staged Program Repair with Condition Synthesis \|citeseerx=10.1.1.696.9059 \|doi=10.1145/2786805.2786811 \|s2cid=5987616}}</ref> GenProg is an early generate-and-validate bug-fixing system.<ref name="genprog2009" /> The effectiveness of generate-and-validate techniques remains controversial, because they typically do not provide [[#Limitations of automatic bug-fixing\|patch correctness guarantees]].<ref name="kali" /> Nevertheless, the reported results of recent state-of-the-art techniques are generally promising. For example, on 69 systematically collected real-world bugs in eight large [[C (programming language)\|C software programs]], the state-of-the-art bug-fixing system Prophet generates correct patches for 18 of the 69 bugs.<ref name="prophet" />

<!-- mutation based repair -->
One way to generate candidate patches is to apply [[program mutation\|mutation operators]] to the original program. Mutation operators manipulate the original program, potentially via its [[abstract syntax tree]] representation, or via a more coarse-grained representation, such as the [[Statement (programming)\|statement]] level or the [[Block (programming)\|block]] level.
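The generate-and-validate loop with statement-level mutation can be illustrated with a minimal Python sketch. Everything in it is a hypothetical toy chosen for illustration, not the algorithm of GenProg or any other cited tool: the buggy `largest` function, the two one-test oracles, and the helper names `candidates`, `validate` and `repair` are all assumptions. Candidates are produced by deleting a statement or replacing it with another statement from the same program, and a candidate is validated only if it compiles and passes every test.

```python
# Hypothetical buggy program, stored as a list of statements (one per line).
BUGGY = [
    "def largest(a, b):",
    "    result = a",
    "    if b > a:",
    "        result = b",
    "    result = a",  # the fault: overwrites the value chosen above
    "    return result",
]

# Test suite as (arguments, expected output) pairs (illustrative values).
BUG_ORACLE = [((1, 2), 2)]         # fails on the original program
REGRESSION_ORACLE = [((2, 1), 2)]  # passes on the original program


def candidates(program):
    """Yield statement-level mutants: deletions first, then replacements."""
    body = range(1, len(program))  # never touch the function header
    for i in body:
        yield program[:i] + program[i + 1:]
    for i in body:
        indent = program[i][: len(program[i]) - len(program[i].lstrip())]
        for j in body:
            if i != j:
                # Reuse statement j at position i, keeping i's indentation.
                replacement = indent + program[j].strip()
                yield program[:i] + [replacement] + program[i + 1:]


def validate(program, tests):
    """Compile the candidate and run every test; any error is a failure."""
    namespace = {}
    try:
        exec("\n".join(program), namespace)
        return all(namespace["largest"](*args) == want for args, want in tests)
    except Exception:
        return False


def repair(program):
    """Return every candidate that passes the whole test suite."""
    suite = BUG_ORACLE + REGRESSION_ORACLE
    return [c for c in candidates(program) if validate(c, suite)]


validated = repair(BUGGY)
print("\n".join(validated[0]))
```

In this toy setting the first validated candidate is the one that deletes the faulty statement. Candidates that no longer compile, for example because a deletion leaves an `if` without a body, are simply rejected by `validate`. Passing the suite only makes a candidate validated; it does not by itself show that the candidate is correct.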
Earlier [[Genetic improvement (computer science)\|genetic improvement]] approaches operate at the statement level and carry out simple delete/replace operations such as deleting an existing statement or replacing an existing statement with another statement in the same source file.<ref name=genprog2009 /><ref name="genprog2012">{{Cite book \|last1=Le Goues \|first1=Claire\|author1-link=Claire Le Goues \|title=2012 34th International Conference on Software Engineering (ICSE) \|last2=Dewey-Vogt \|first2=Michael \|last3=Forrest \|first3=Stephanie \|last4=Weimer \|first4=Westley \|date=2012 \|publisher=IEEE \|isbn=978-1-4673-1067-3 \|pages=3–13 \|chapter=A Systematic Study of Automated Program Repair: Fixing 55 out of 105 Bugs for $8 Each \|citeseerx=10.1.1.661.9690 \|doi=10.1109/ICSE.2012.6227211 \|s2cid=10987936}}</ref> Recent approaches use more fine-grained operators at the [[abstract syntax tree]] level to generate a more diverse set of candidate patches.<ref name=spr /> Notably, the statement deletion mutation operator, and more generally removing code, is a reasonable repair strategy, or at least a good fault localization strategy.<ref>{{Cite book \|last1=Qi \|first1=Zichao \|last2=Long \|first2=Fan \|last3=Achour \|first3=Sara \|last4=Rinard \|first4=Martin \|title=Proceedings of the 2015 International Symposium on Software Testing and Analysis \|chapter=An analysis of patch plausibility and correctness for generate-and-validate patch generation systems \|date=2015-07-13 \|chapter-url=http://dx.doi.org/10.1145/2771783.2771791 \|pages=24–36 \|location=New York, NY, USA \|publisher=ACM \|doi=10.1145/2771783.2771791\|hdl=1721.1/101586 \|isbn=9781450336208 \|s2cid=6845282 }}</ref>

<!-- fix templates -->
SemFix<ref name="semfix" /> uses component-based synthesis.<ref>{{Cite book \|last1=Jha \|first1=Susmit \|url=http://techreports.lib.berkeley.edu/accessPages/EECS-2010-15.html \|title=Oracle-guided component-based program synthesis \|last2=Gulwani 
\|first2=Sumit \|last3=Seshia \|first3=Sanjit A. \|last4=Tiwari \|first4=Ashish \|date=2010-05-01 \|publisher=ACM \|isbn=9781605587196 \|pages=215–224 \|doi=10.1145/1806799.1806833 \|s2cid=6344783}}</ref> Dynamoth uses dynamic synthesis.<ref>{{Cite book \|last1=Galenson \|first1=Joel \|title=CodeHint: dynamic and interactive synthesis of code snippets \|last2=Reames \|first2=Philip \|last3=Bodik \|first3=Rastislav \|last4=Hartmann \|first4=Björn \|last5=Sen \|first5=Koushik \|date=2014-05-31 \|publisher=ACM \|isbn=9781450327565 \|pages=653–663 \|doi=10.1145/2568225.2568250 \|s2cid=10656182}}</ref> S3<ref>{{Cite book \|last1=Le \|first1=Xuan-Bach D. \|url=https://ink.library.smu.edu.sg/sis_research/3917 \|title=Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2017 \|last2=Chu \|first2=Duc-Hiep \|last3=Lo \|first3=David \|last4=Le Goues \|first4=Claire \|author4-link=Claire Le Goues\|last5=Visser \|first5=Willem \|date=2017-08-21 \|publisher=ACM \|isbn=9781450351058 \|pages=593–604 \|doi=10.1145/3106237.3106309 \|s2cid=1503790}}</ref> is based on [[syntax-guided synthesis]].<ref>{{Cite book \|last1=Alur \|first1=Rajeev \|title=2013 Formal Methods in Computer-Aided Design \|last2=Bodik \|first2=Rastislav \|last3=Juniwal \|first3=Garvit \|last4=Martin \|first4=Milo M. K. \|last5=Raghothaman \|first5=Mukund \|last6=Seshia \|first6=Sanjit A. 
\|last7=Singh \|first7=Rishabh \|last8=Solar-Lezama \|first8=Armando \|last9=Torlak \|first9=Emina \|author9-link=Emina Torlak\|year=2013 \|isbn=9780983567837 \|pages=1–8 \|chapter=Syntax-guided synthesis \|citeseerx=10.1.1.377.2829 \|doi=10.1109/fmcad.2013.6679385 \|last10=Udupa \|first10=Abhishek}}</ref> SearchRepair<ref name="searchrepair">{{Cite book \|last1=Ke \|first1=Yalin \|title=Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering \|last2=Stolee \|first2=Kathryn \|last3=Le Goues \|first3=Claire \|author3-link=Claire Le Goues\|last4=Brun \|first4=Yuriy \|date=2015 \|publisher=ACM \|isbn=978-1-5090-0025-8 \|series=ASE 2015 \|location=Lincoln, Nebraska \|pages=295–306 \|chapter=Repairing Programs with Semantic Code Search \|doi=10.1109/ASE.2015.60 \|s2cid=16361458}}</ref> converts potential patches into an SMT formula and queries candidate patches that allow the patched program to pass all supplied test cases.

=== Data-driven ===

== Overfitting ==
In test-suite-based program repair, tools sometimes generate patches that pass the test suite yet are actually incorrect. This is known as the "overfitting" problem.<ref name="overfitting">{{Cite book \|last1=Smith \|first1=Edward K. \|title=Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering \|last2=Barr \|first2=Earl T. \|last3=Le Goues \|first3=Claire\|author3-link=Claire Le Goues \|last4=Brun \|first4=Yuriy \|date=2015 \|publisher=ACM \|isbn=978-1-4503-3675-8 \|series=ESEC/FSE 2015 \|location=New York, New York \|pages=532–543 \|chapter=Is the Cure Worse Than the Disease? Overfitting in Automated Program Repair \|doi=10.1145/2786805.2786825 \|s2cid=6300790}}</ref> "Overfitting" in this context refers to the fact that the patch overfits to the test inputs.
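A small Python sketch can make the overfitting problem concrete. The functions and test values below are hypothetical and chosen only for illustration: the intended behavior is assumed to be the absolute value, `plausible_patch` stands for a patch that special-cases the one failing test input, and `HELD_OUT` stands for inputs that the test suite does not cover.

```python
# Hypothetical buggy program: intended to return the absolute value of x.
def buggy(x):
    return x


# Test suite as (input, expected output) pairs; only (-2, 2) exposes the bug.
TEST_SUITE = [(3, 3), (0, 0), (-2, 2)]

# Inputs outside the test suite, used here to reveal overfitting.
HELD_OUT = [(-7, 7), (-1, 1)]


def plausible_patch(x):
    """Passes the test suite by special-casing the failing input."""
    return 2 if x == -2 else x


def correct_patch(x):
    """Implements the intended behavior for every input."""
    return -x if x < 0 else x


def passes(program, tests):
    """Return True if the program produces the expected output on every test."""
    return all(program(value) == expected for value, expected in tests)


print(passes(plausible_patch, TEST_SUITE))  # True
print(passes(plausible_patch, HELD_OUT))    # False
print(passes(correct_patch, HELD_OUT))      # True
```

Both patches are indistinguishable as far as the test suite is concerned, which is why additional tests or other sources of information are needed to tell them apart.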
There are different kinds of overfitting: incomplete fixing means that only some buggy inputs are fixed, while regression introduction means that some previously working features are broken after the patch (because they were poorly tested). Early prototypes for automatic repair suffered heavily from overfitting: on the Manybugs C benchmark, Qi et al.<ref name="kali" /> reported that 104/110 of plausible GenProg patches were overfitting. In the context of synthesis-based repair, Le et al.<ref>{{Cite journal \|last1=Le \|first1=Xuan Bach D. \|last2=Thung \|first2=Ferdian \|last3=Lo \|first3=David \|last4=Goues \|first4=Claire Le \|date=2018-03-02 \|title=Overfitting in semantics-based automated program repair \|url=https://ink.library.smu.edu.sg/sis_research/3986 \|journal=Empirical Software Engineering \|language=en \|volume=23 \|issue=5 \|pages=3007–3033 \|doi=10.1007/s10664-017-9577-2 \|issn=1382-3256 \|s2cid=3635768}}</ref> found that more than 80% of the patches were overfitting.

One way to avoid overfitting is to filter out the generated patches. This can be done based on dynamic analysis.<ref>{{Cite book\|last1=Xin\|first1=Qi\|last2=Reiss\|first2=Steven P.\|title=Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis \|chapter=Identifying test-suite-overfitted patches through test case generation \|date=2017-07-10\|chapter-url=http://dx.doi.org/10.1145/3092703.3092718\|pages=226–236\|location=New York, NY, USA\|publisher=ACM\|doi=10.1145/3092703.3092718\|isbn=978-1-4503-5076-1\|s2cid=20562134}}</ref> Alternatively, Tian et al. propose heuristic approaches to assess patch correctness.<ref>{{cite news \|last1=Tian \|first1=Haoye \|last2=Liu \|first2=Kui \|last3=Kaboré \|first3=Abdoul Kader \|last4=Koyuncu \|first4=Anil \|last5=Li \|first5=Li \|last6=Klein \|first6=Jacques \|last7=Bissyandé \|first7=Tegawendé F. 
\|title=Evaluating representation learning of code changes for predicting patch correctness in program repair \|url=https://dl.acm.org/doi/abs/10.1145/3324884.3416532 \|work=Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering \|publisher=Association for Computing Machinery \|date=27 January 2021 \|pages=981–992 \|doi=10.1145/3324884.3416532\|isbn=9781450367684 }}</ref><ref>{{cite book \|last1=Tian \|first1=Haoye \|last2=Tang \|first2=Xunzhu \|last3=Habib \|first3=Andrew \|last4=Wang \|first4=Shangwen \|last5=Liu \|first5=Kui \|last6=Xia \|first6=Xin \|last7=Klein \|first7=Jacques \|last8=Bissyandé \|first8=Tegawendé F. \|title=Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering \|chapter=Is this Change the Answer to that Problem?: Correlating Descriptions of Bug and Code Changes for Evaluating Patch Correctness \|date=5 January 2023 \|pages=1–13 \|doi=10.1145/3551349.3556914 \|chapter-url=https://dl.acm.org/doi/abs/10.1145/3551349.3556914 \|publisher=Association for Computing Machinery\|s2cid=251403079 \|arxiv=2208.04125 \|isbn=9781450394758 }}</ref>

== Limitations of automatic bug-fixing ==

In C, the Manybugs benchmark collected by the GenProg authors contains 69 real-world defects and is widely used to evaluate many other bug-fixing tools for C.<ref name=genprog2012 /><ref name=prophet /><ref name=spr /><ref name=angelix /> In [[Java (programming language)\|Java]], the main benchmark is Defects4J, which is now used extensively in most research papers on program repair for Java.<ref name="capgen">{{Cite book \|last1=Wen \|first1=Ming \|last2=Chen \|first2=Junjie \|last3=Wu \|first3=Rongxin \|last4=Hao \|first4=Dan \|last5=Cheung \|first5=Shing-Chi \|title=Proceedings of the 40th International Conference on Software Engineering \|chapter=Context-aware patch generation for better automated program repair \|date=2018 \|location=New York, New York, USA \|publisher=ACM Press \|pages=1–11 
\|doi=10.1145/3180155.3180233 \|isbn=9781450356381 \|s2cid=3374770\|url=https://repository.hkust.edu.hk/ir/Record/1783.1-92186 \|chapter-url=https://repository.ust.hk/ir/Record/1783.1-92186 }}</ref><ref>{{Cite book \|last1=Hua \|first1=Jinru \|last2=Zhang \|first2=Mengshi \|last3=Wang \|first3=Kaiyuan \|last4=Khurshid \|first4=Sarfraz \|title=Proceedings of the 40th International Conference on Software Engineering \|chapter=Towards practical program repair with on-demand candidate generation \|date=2018 \|location=New York, New York, USA \|publisher=ACM Press \|pages=12–23 \|doi=10.1145/3180155.3180245 \|isbn=9781450356381 \|s2cid=49666327\|doi-access=free }}</ref> Alternative benchmarks exist, such as the QuixBugs benchmark,<ref>{{Cite book \|last1=Lin \|first1=Derrick \|last2=Koppel \|first2=James \|last3=Chen \|first3=Angela \|last4=Solar-Lezama \|first4=Armando \|title=Proceedings Companion of the 2017 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity \|chapter=QuixBugs: A multi-lingual program repair benchmark set based on the quixey challenge \|date=2017 \|location=New York, New York, USA \|publisher=ACM Press \|pages=55–56 \|doi=10.1145/3135932.3135941 \|isbn=9781450355148 \|doi-access=free}}</ref> which contains original bugs for program repair. Other benchmarks of Java bugs include Bugs.jar,<ref>{{Cite book \|last1=Saha \|first1=Ripon K. \|last2=Lyu \|first2=Yingjun \|last3=Lam \|first3=Wing \|last4=Yoshida \|first4=Hiroaki \|last5=Prasad \|first5=Mukul R. \|title=Proceedings of the 15th International Conference on Mining Software Repositories \|chapter=Bugs.jar \|date=2018 \|chapter-url=http://dl.acm.org/citation.cfm?doid=3196398.3196473 \|series=MSR '18 \|language=en \|pages=10–13 \|doi=10.1145/3196398.3196473 \|isbn=9781450357166 \|s2cid=50770093}}</ref> based on past commits.
== Example tools ==
* LeakFix:<ref name=leakfix /> A tool that automatically fixes memory leaks in C programs.
* Prophet:<ref name=prophet /> The first generate-and-validate tool that uses machine learning techniques to learn useful knowledge from past human patches to recognize correct patches. It is evaluated on the same benchmark as GenProg and generates correct patches (i.e., equivalent to human patches) for 18 out of 69 cases.<ref name=prophet />
* SearchRepair:<ref name=searchrepair /> A tool for replacing buggy code using snippets of code from elsewhere. It is evaluated on the IntroClass benchmark<ref name="introclassmanybugs">{{Cite journal \|last1=Le Goues \|first1=Claire\|author1-link=Claire Le Goues \|last2=Holtschulte \|first2=Neal \|last3=Smith \|first3=Edward \|last4=Brun \|first4=Yuriy \|last5=Devanbu \|first5=Premkumar \|last6=Forrest \|first6=Stephanie \|last7=Weimer \|first7=Westley \|date=2015 \|title=The Many ''Bugs'' and Intro ''Class'' Benchmarks for Automated Repair of C Programs \|journal=IEEE Transactions on Software Engineering \|volume=41 \|issue=12 \|pages=1236–1256 \|doi=10.1109/TSE.2015.2454513 \|doi-access=free}}</ref> and generates much higher quality patches on that benchmark than GenProg, RSRepair, and AE.
* Angelix:<ref name=angelix /> An improved solver-based bug-fixing tool. It is evaluated on the GenProg benchmark. For 10 out of the 69 cases, it generates patches that are equivalent to human patches.
* Learn2Fix:<ref name=learn2fix /> The first human-in-the-loop semi-automatic repair tool. It extends GenProg to learn the condition under which a semantic bug is observed through systematic queries to the user who is reporting the bug. It only works for programs that take and produce integers.

== External links ==
* {{URL\|https://program-repair.org/}} datasets, tools, etc., related to automated program repair research.

[[Category:Debugging]]