Automatic bug fixing: Difference between revisions

Overfitting: mentions test generation based on reference patch
Data-driven: add continual learning for repair
Line 38:
=== Data-driven ===
 
[[Machine learning]] techniques can improve the effectiveness of automatic bug-fixing systems.<ref name="prophet" /> One such technique learns from past successful patches written by human developers and collected from [[open-source software|open source]] [[software repository|repositories]] in [[GitHub]] and [[SourceForge]].<ref name="prophet" /> It then uses the learned information to recognize and prioritize potentially correct patches among all generated candidate patches.<ref name="prophet" /> Alternatively, patches can be directly mined from existing sources. Example approaches include mining patches from donor applications<ref name="codephage" /> or from Q&A websites.<ref name="QAFix" /> Learning can also be done online, an approach known as continual learning, with the known precedent of online learning of patches from the stream of open-source build results from continuous integration.<ref>{{Cite journal|last=Baudry|first=Benoit|last2=Chen|first2=Zimin|last3=Etemadi|first3=Khashayar|last4=Fu|first4=Han|last5=Ginelli|first5=Davide|last6=Kommrusch|first6=Steve|last7=Martinez|first7=Matias|last8=Monperrus|first8=Martin|last9=Ron Arteaga|first9=Javier|last10=Ye|first10=He|last11=Yu|first11=Zhongxing|date=2021|title=A Software-Repair Robot Based on Continual Learning|url=https://arxiv.org/abs/2012.06824|journal=IEEE Software|volume=38|issue=4|pages=28–35|doi=10.1109/MS.2021.3070743|issn=0740-7459}}</ref>
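
The idea of prioritizing candidate patches with information learned from past human patches can be illustrated with a minimal sketch. It is not the model of any cited system: the feature names (such as <code>add_null_check</code>), the smoothed frequency weights, and the functions <code>learn_feature_weights</code>, <code>score</code>, and <code>rank_candidates</code> are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def learn_feature_weights(past_patches):
    """Estimate smoothed log-frequencies of features seen in past human patches.

    past_patches is a list of feature lists; the feature names are hypothetical.
    """
    counts = Counter(f for patch in past_patches for f in patch)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen features
    weights = {f: math.log((c + 1) / (total + vocab)) for f, c in counts.items()}
    unseen = math.log(1 / (total + vocab))
    return weights, unseen


def score(features, weights, unseen):
    """Average log-weight of a candidate's features (higher means more human-like)."""
    if not features:
        return unseen
    return sum(weights.get(f, unseen) for f in features) / len(features)


def rank_candidates(candidates, weights, unseen):
    """Sort (patch_id, features) pairs from most to least promising."""
    return sorted(candidates, key=lambda c: score(c[1], weights, unseen), reverse=True)


# Hypothetical training data: features extracted from past human-written patches.
past_patches = [
    ["add_null_check"],
    ["add_null_check", "change_condition"],
    ["change_condition"],
    ["add_null_check"],
]
weights, unseen = learn_feature_weights(past_patches)

# Hypothetical candidate patches produced by a repair tool.
candidates = [
    ("p1", ["delete_statement"]),
    ("p2", ["add_null_check"]),
    ("p3", ["change_condition"]),
]
ranked_ids = [pid for pid, _ in rank_candidates(candidates, weights, unseen)]
```

In this toy setting, candidates whose features resemble frequent human edits are validated first, while a candidate built only from features never seen in the training data is tried last.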
 
SequenceR uses [[Neural machine translation|sequence-to-sequence learning]] on source code to generate one-line patches.<ref>{{Cite journal |last=Chen |first=Zimin |last2=Kommrusch |first2=Steve James |last3=Tufano |first3=Michele |last4=Pouchet |first4=Louis-Noel |last5=Poshyvanyk |first5=Denys |last6=Monperrus |first6=Martin |date=2019 |title=SEQUENCER: Sequence-to-Sequence Learning for End-to-End Program Repair |journal=IEEE Transactions on Software Engineering |pages=1 |arxiv=1901.01808 |doi=10.1109/TSE.2019.2940179 |issn=0098-5589 |s2cid=57573711}}</ref> It defines a neural network architecture suited to source code, with a copy mechanism that allows it to produce patches containing tokens that are not in the learned vocabulary. Those tokens are taken from the code of the Java class under repair.
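
How a copy mechanism lets a decoder emit out-of-vocabulary tokens can be sketched as follows. This is a generic pointer-style mixture written for illustration, not SequenceR's actual implementation: the function <code>mix_copy_distribution</code>, the gate <code>p_gen</code>, and all probabilities and token names are assumed values.

```python
def mix_copy_distribution(vocab_probs, attention, source_tokens, p_gen):
    """Blend a fixed-vocabulary distribution with a copy distribution.

    vocab_probs: token -> probability from the decoder's fixed vocabulary
    attention: one attention weight per source token (sums to 1)
    source_tokens: tokens of the code under repair
    p_gen: probability of generating from the vocabulary instead of copying
    """
    # Scale the vocabulary distribution by the generation gate.
    final = {tok: p_gen * p for tok, p in vocab_probs.items()}
    # Add copy probability mass for every source token, including
    # tokens that do not exist in the fixed vocabulary.
    for tok, weight in zip(source_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * weight
    return final


# Assumed toy values: the vocabulary knows "return" but not "maxRetryCount".
vocab_probs = {"return": 0.6, "<unk>": 0.4}
source_tokens = ["return", "maxRetryCount", ";"]
attention = [0.1, 0.8, 0.1]

final = mix_copy_distribution(vocab_probs, attention, source_tokens, p_gen=0.3)
best_token = max(final, key=final.get)
```

With these values the most probable output is the identifier <code>maxRetryCount</code>, which is absent from the fixed vocabulary and can only be reached by copying it from the source tokens.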