[[Machine learning]] techniques can improve the effectiveness of automatic bug-fixing systems.<ref name="prophet" /> One such technique learns from past successful patches written by human developers and collected from [[open-source software|open source]] [[software repository|repositories]] in [[GitHub]] and [[SourceForge]].<ref name="prophet" /> It then uses the learned information to recognize and prioritize potentially correct patches among all generated candidate patches.<ref name="prophet" /> Alternatively, patches can be mined directly from existing sources. Example approaches include mining patches from donor applications<ref name="codephage" /> or from QA web sites.<ref name="QAFix" /> Learning can also be done online, as continual learning; a known precedent learns patches online from the stream of open-source build results produced by continuous integration.<ref>{{Cite journal|last1=Baudry|first1=Benoit|last2=Chen|first2=Zimin|last3=Etemadi|first3=Khashayar|last4=Fu|first4=Han|last5=Ginelli|first5=Davide|last6=Kommrusch|first6=Steve|last7=Martinez|first7=Matias|last8=Monperrus|first8=Martin|last9=Ron Arteaga|first9=Javier|last10=Ye|first10=He|last11=Yu|first11=Zhongxing|date=2021|title=A Software-Repair Robot Based on Continual Learning|url=https://arxiv.org/abs/2012.06824|journal=IEEE Software|volume=38|issue=4|pages=28–35|doi=10.1109/MS.2021.3070743|issn=0740-7459|arxiv=2012.06824|s2cid=229156186}}</ref>
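The learn-then-prioritize idea can be illustrated with a minimal Python sketch. It assumes that candidate patches are described by a few hand-picked binary features and that "learning" simply counts how often each feature appears in successful human patches. Prophet's actual features and probabilistic training are not reproduced, and all names here (FEATURES, learn_weights, prioritize, the patch kinds) are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical binary features of a candidate patch; a patch is modelled here
# as a dict whose "kind" entry names the edit it performs.
FEATURES: Dict[str, Callable[[dict], float]] = {
    "inserts_null_check": lambda p: 1.0 if p["kind"] == "insert_guard" else 0.0,
    "modifies_condition": lambda p: 1.0 if p["kind"] == "change_condition" else 0.0,
    "deletes_statement": lambda p: 1.0 if p["kind"] == "delete_stmt" else 0.0,
}

def learn_weights(human_patches: List[dict]) -> Dict[str, float]:
    """Toy learning step: weight each feature by the fraction of successful
    human patches in which it fires (a frequency estimate, not a trained model)."""
    n = len(human_patches)
    return {name: sum(f(p) for p in human_patches) / n for name, f in FEATURES.items()}

def score(patch: dict, weights: Dict[str, float]) -> float:
    """Linear score: sum over features of weight * feature value."""
    return sum(weights[name] * f(patch) for name, f in FEATURES.items())

def prioritize(candidates: List[dict], weights: Dict[str, float]) -> List[dict]:
    """Order candidate patches so the most human-like ones are validated first."""
    return sorted(candidates, key=lambda p: score(p, weights), reverse=True)

# Illustrative data: past human fixes and three generated candidates.
HUMAN_PATCHES = [{"kind": "insert_guard"}, {"kind": "insert_guard"},
                 {"kind": "change_condition"}, {"kind": "insert_guard"}]
CANDIDATES = [{"id": 1, "kind": "delete_stmt"},
              {"id": 2, "kind": "change_condition"},
              {"id": 3, "kind": "insert_guard"}]
```

With this data, `[p["id"] for p in prioritize(CANDIDATES, learn_weights(HUMAN_PATCHES))]` gives `[3, 2, 1]`: the guard-inserting candidate resembles most past human fixes, so it is tried first.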
SequenceR uses [[Neural machine translation|sequence-to-sequence learning]] on source code in order to generate one-line patches.<ref>{{Cite journal |last1=Chen |first1=Zimin |last2=Kommrusch |first2=Steve James |last3=Tufano |first3=Michele |last4=Pouchet |first4=Louis-Noel |last5=Poshyvanyk |first5=Denys |last6=Monperrus |first6=Martin |date=2019 |title=SEQUENCER: Sequence-to-Sequence Learning for End-to-End Program Repair |journal=IEEE Transactions on Software Engineering |pages=1 |arxiv=1901.01808 |doi=10.1109/TSE.2019.2940179 |issn=0098-5589 |s2cid=57573711}}</ref> It defines a neural network architecture designed to work well with source code, including a copy mechanism that allows it to produce patches containing tokens that are not in the learned vocabulary; those tokens are taken from the code of the Java class under repair. Different kinds of loss functions can be used to optimize the sequence-to-sequence neural network: the most common ones are purely static, typically a token-based cross-entropy loss, while more advanced ones also include dynamic information, for example through loss amplification.<ref>{{Cite journal |last=Ye |first=He |last2=Martinez |first2=Matias |last3=Monperrus |first3=Martin |date=2022-05-21 |title=Neural program repair with execution-based backpropagation |url=https://arxiv.org/abs/2105.04123 |journal=Proceedings of the 44th International Conference on Software Engineering |language=en |location=Pittsburgh Pennsylvania |publisher=ACM |pages=1506–1518 |doi=10.1145/3510003.3510222 |isbn=978-1-4503-9221-1}}</ref>
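The copy mechanism and the two kinds of loss can be sketched in Python for a single decoding step. The sketch assumes a pointer-generator-style mixture, in which a generation probability `p_gen` blends the decoder's vocabulary distribution with the attention weights over source tokens, and it models loss amplification as multiplying the static loss by a fixed factor when the generated patch fails the tests. Neither SequenceR's exact architecture nor the formula used in the cited execution-based work is reproduced; the function names, `factor=2.0` and the example numbers are illustrative assumptions.

```python
import math
from typing import Dict, List

def copy_distribution(p_vocab: Dict[str, float], attention: List[float],
                      source_tokens: List[str], p_gen: float) -> Dict[str, float]:
    """Pointer-generator-style mixture (assumed for illustration):
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * attention mass on source positions holding w.
    A token that appears only in the source (out of vocabulary) gets its
    probability purely from copying."""
    final = {w: p_gen * p for w, p in p_vocab.items()}
    for tok, a in zip(source_tokens, attention):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * a
    return final

def cross_entropy(step_dists: List[Dict[str, float]], target: List[str]) -> float:
    """Static token-based cross-entropy: mean negative log-probability of the target tokens."""
    eps = 1e-12  # avoid log(0) for target tokens that receive no probability mass
    return -sum(math.log(d.get(t, 0.0) + eps) for d, t in zip(step_dists, target)) / len(target)

def amplified_loss(static_loss: float, passes_tests: bool, factor: float = 2.0) -> float:
    """Hypothetical execution-aware scaling, not the cited paper's formula:
    keep the static loss if the patch passes the tests, amplify it otherwise."""
    return static_loss if passes_tests else factor * static_loss
```

For example, `copy_distribution({"if": 0.5, "return": 0.5}, [0.1, 0.9], ["if", "bufferSize"], 0.6)` assigns about 0.36 to `bufferSize`, a token absent from the vocabulary, which can therefore still be emitted in a patch.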
Getafix<ref name=":0">{{Cite journal |last1=Bader |first1=Johannes |last2=Scott |first2=Andrew |last3=Pradel |first3=Michael |last4=Chandra |first4=Satish |date=2019-10-10 |title=Getafix: learning to fix bugs automatically |journal=Proceedings of the ACM on Programming Languages |volume=3 |issue=OOPSLA |pages=159:1–159:27 |doi=10.1145/3360585|doi-access=free }}</ref> is a language-agnostic approach developed and used in production at [[Facebook, Inc.|Facebook]]. Given a sample of [[Commit (version control)|code commits]] where engineers fixed a certain kind of bug, it learns human-like fix patterns that apply to future bugs of the same kind. Besides using Facebook's own [[Repository (version control)|code repositories]] as training data, Getafix learnt some fixes from [[open source]] Java repositories. When new bugs are detected, Getafix applies its previously learnt patterns to produce candidate fixes and ranks them within seconds. It presents only the top-ranked fix for final validation by tools or an engineer, in order to save resources and, ideally, to be fast enough that no human time has yet been spent on fixing the same bug.
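The apply-rank-present step can be illustrated with a toy Python sketch. It assumes hand-written patterns over single lines of text, expressed as regular expressions with named holes, and a hypothetical per-pattern score; Getafix itself learns its patterns from past fix commits rather than using hand-written text rewrites, so `FixPattern`, `apply_pattern`, `top_fix` and `EXAMPLE_PATTERNS` are illustrative names only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FixPattern:
    """Hypothetical fix pattern: a 'before' regex with named holes, an 'after'
    template that reuses those holes, and an assumed ranking score."""
    before: str
    after: str
    score: float

def apply_pattern(pattern: FixPattern, line: str) -> Optional[str]:
    """Return the rewritten line if the whole line matches the pattern, else None."""
    match = re.fullmatch(pattern.before, line)
    return match.expand(pattern.after) if match else None

def top_fix(patterns: List[FixPattern], buggy_line: str) -> Optional[str]:
    """Produce one candidate fix per matching pattern, rank the candidates by
    pattern score, and return only the top-ranked one (None if nothing matches)."""
    candidates = []
    for pattern in patterns:
        fix = apply_pattern(pattern, buggy_line)
        if fix is not None:
            candidates.append((pattern.score, fix))
    if not candidates:
        return None
    # max() keeps the first candidate among equal scores.
    return max(candidates, key=lambda c: c[0])[1]

# Two hand-written (not learnt) patterns for a possible null dereference in Java code.
EXAMPLE_PATTERNS = [
    FixPattern(r"(?P<obj>\w+)\.(?P<method>\w+)\(\);",
               r"if (\g<obj> != null) { \g<obj>.\g<method>(); }", 0.8),
    FixPattern(r"(?P<obj>\w+)\.(?P<method>\w+)\(\);",
               r"Objects.requireNonNull(\g<obj>).\g<method>();", 0.3),
]
```

Here `top_fix(EXAMPLE_PATTERNS, "conn.close();")` returns `"if (conn != null) { conn.close(); }"`; the lower-scored candidate is generated but never shown, mirroring the idea of presenting only the top-ranked fix for validation.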