=== Data-driven ===
 
[[Machine learning]] techniques can improve the effectiveness of automatic bug-fixing systems.<ref name="prophet" /> One such approach learns from past successful patches written by human developers, collected from [[open-source software|open source]] [[software repository|repositories]] on [[GitHub]] and [[SourceForge]].<ref name="prophet" /> It then uses the learned information to recognize and prioritize potentially correct patches among all generated candidate patches.<ref name="prophet" /> Alternatively, patches can be mined directly from existing sources; example approaches include mining patches from donor applications<ref name="codephage" /> or from Q&A web sites.<ref name="QAFix" /> Learning can also be done online, also known as continual learning, as in online learning of patches from the stream of open-source build results produced by continuous integration.<ref>{{Cite journal|last1=Baudry|first1=Benoit|last2=Chen|first2=Zimin|last3=Etemadi|first3=Khashayar|last4=Fu|first4=Han|last5=Ginelli|first5=Davide|last6=Kommrusch|first6=Steve|last7=Martinez|first7=Matias|last8=Monperrus|first8=Martin|last9=Ron Arteaga|first9=Javier|last10=Ye|first10=He|last11=Yu|first11=Zhongxing|date=2021|title=A Software-Repair Robot Based on Continual Learning|url=https://arxiv.org/abs/2012.06824|journal=IEEE Software|volume=38|issue=4|pages=28–35|doi=10.1109/MS.2021.3070743|issn=0740-7459|arxiv=2012.06824|s2cid=229156186}}</ref>
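The idea of learning to prioritize candidate patches can be sketched as follows. This is a minimal illustration, not the method of any particular system: the feature names and weights are invented for the example, standing in for parameters a real system would learn from past human-written patches.

```python
# Sketch of learned patch prioritization: score each candidate patch
# with a model "trained" on features of past successful patches, then
# validate candidates against the test suite in rank order.
import math

# Hypothetical learned weights (illustrative assumption): patches that
# change fewer lines and reuse existing identifiers are more often correct.
WEIGHTS = {"lines_changed": -0.8, "reuses_existing_identifier": 1.5, "inserts_new_branch": -0.3}
BIAS = 0.1

def score(patch_features):
    """Logistic score: estimated probability that a candidate patch is correct."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in patch_features.items())
    return 1.0 / (1.0 + math.exp(-z))

candidates = [
    {"lines_changed": 1, "reuses_existing_identifier": 1, "inserts_new_branch": 0},
    {"lines_changed": 5, "reuses_existing_identifier": 0, "inserts_new_branch": 1},
]

# Rank candidates most-plausible first; a repair system would then run
# the test suite on each in this order and stop at the first that passes.
ranked = sorted(candidates, key=score, reverse=True)
```

The ranking matters because validating a patch (compiling and running the tests) is expensive, so trying the most plausible candidates first reduces repair time and the risk of accepting an implausible patch that merely happens to pass the tests.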
 
SequenceR uses [[Neural machine translation|sequence-to-sequence learning]] on source code in order to generate one-line patches.<ref>{{Cite journal |last1=Chen |first1=Zimin |last2=Kommrusch |first2=Steve James |last3=Tufano |first3=Michele |last4=Pouchet |first4=Louis-Noel |last5=Poshyvanyk |first5=Denys |last6=Monperrus |first6=Martin |date=2019 |title=SEQUENCER: Sequence-to-Sequence Learning for End-to-End Program Repair |journal=IEEE Transactions on Software Engineering |pages=1 |arxiv=1901.01808 |doi=10.1109/TSE.2019.2940179 |issn=0098-5589 |s2cid=57573711}}</ref> It defines a neural network architecture suited to source code, with a copy mechanism that allows it to produce patches containing tokens that are not in the learned vocabulary; those tokens are copied from the code of the Java class under repair. Different kinds of loss functions can be used to optimize the sequence-to-sequence neural network: the most common ones are purely static, typically a token-based cross-entropy loss, while more advanced ones also incorporate dynamic information, e.g. through loss amplification.<ref>{{Cite journal |last1=Ye |first1=He |last2=Martinez |first2=Matias |last3=Monperrus |first3=Martin |date=2022-05-21 |title=Neural program repair with execution-based backpropagation |url=https://arxiv.org/abs/2105.04123 |journal=Proceedings of the 44th International Conference on Software Engineering |language=en |___location=Pittsburgh Pennsylvania |publisher=ACM |pages=1506–1518 |doi=10.1145/3510003.3510222 |arxiv=2105.04123 |isbn=978-1-4503-9221-1|s2cid=234340063 }}</ref>
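The "token-based cross-entropy loss" mentioned above can be illustrated concretely. The sketch below is generic, not SequenceR's actual implementation: the tiny vocabulary and the per-position probability distributions are made up, standing in for a model's softmax outputs over target patch tokens.

```python
# Illustrative token-level cross-entropy loss for a sequence-to-sequence
# repair model: the mean negative log-likelihood the model assigns to the
# ground-truth tokens of the target one-line patch.
import math

def cross_entropy(predicted_dists, target_tokens, vocab):
    """Average -log P(correct token) over all positions of the target patch."""
    total = 0.0
    for dist, token in zip(predicted_dists, target_tokens):
        total += -math.log(dist[vocab.index(token)])
    return total / len(target_tokens)

# Toy vocabulary and a two-token target patch (both invented for the example).
vocab = ["if", "(", "x", "!=", "null", ")"]
target = ["if", "("]

# Model's probability distribution over the vocabulary at each output
# position; each row sums to 1, as a softmax output would.
predicted = [
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],  # high mass on "if": low loss
    [0.10, 0.60, 0.10, 0.10, 0.05, 0.05],  # high mass on "(": low loss
]

loss = cross_entropy(predicted, target, vocab)
```

This objective is "purely static" in the sense that it depends only on the reference patch text; the dynamic variants cited above additionally weight the loss using information from executing the patched program, e.g. test outcomes.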