Revision as of 02:24, 30 August 2021 edit OAbot (talk \| contribs) Bots 643,717 edits m Open access bot: doi added to citation with #oabot. ← Previous edit		Revision as of 13:04, 30 August 2021 edit undo WikiCleanerBot (talk \| contribs) Bots 1,007,735 edits m v2.04b - Bot T20 CW#61 - Fix errors for CW project (Reference before punctuation) Tag: WPCleaner Next edit →
Line 64: Sometimes, in test-suite based program repair, tools generate patches that pass the test suite, yet are actually incorrect; this is known as the "overfitting" problem.<ref name="overfitting">{{Cite book \|last=Smith \|first=Edward K. \|title=Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering \|last2=Barr \|first2=Earl T. \|last3=Le Goues \|first3=Claire \|last4=Brun \|first4=Yuriy \|date=2015 \|publisher=ACM \|isbn=978-1-4503-3675-8 \|series=ESEC/FSE 2015 \|location=New York, New York \|pages=532–543 \|chapter=Is the Cure Worse Than the Disease? Overfitting in Automated Program Repair \|doi=10.1145/2786805.2786825 \|s2cid=6300790}}</ref> "Overfitting" in this context refers to the fact that the patch overfits to the test inputs. There are different kinds of overfitting:<ref name="Yu2018">{{Cite journal \|last=Yu \|first=Zhongxing \|last2=Martinez \|first2=Matias \|last3=Danglot \|first3=Benjamin \|last4=Durieux \|first4=Thomas \|last5=Monperrus \|first5=Martin \|year=2018 \|title=Alleviating patch overfitting with automatic test generation: a study of feasibility and effectiveness for the Nopol repair system \|journal=Empirical Software Engineering \|volume=24 \|pages=33–67 \|arxiv=1810.10614 \|bibcode=2018arXiv181010614Y \|doi=10.1007/s10664-018-9619-4 \|issn=1382-3256 \|s2cid=21659819}}</ref> incomplete fixing means that only some buggy inputs are fixed, whereas regression introduction means that some previously working features are broken after the patch (because they were poorly tested).
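The two kinds of overfitting described above can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the buggy function, the two candidate patches, the two-case test suite and the held-out inputs do not come from any cited repair tool or benchmark.

```python
# Hypothetical example: a buggy absolute-value function and two "plausible"
# patches that pass a small test suite but are still incorrect.

def buggy_abs(x):
    # Bug: negative inputs are returned unchanged.
    return x

def incomplete_fix(x):
    # Overfits to the single failing test input (-3) and nothing else.
    if x == -3:
        return 3
    return x

def regressing_fix(x):
    # Repairs negative inputs but breaks positive ones, which worked before.
    return -x

# The repair tool's only oracle: (input, expected output) pairs.
TEST_SUITE = [(-3, 3), (0, 0)]

# Inputs that the test suite does not cover (assumed known here for illustration).
HELD_OUT = [(-7, 7), (5, 5)]

def passes(candidate, cases):
    """Return True if the candidate gives the expected output on every case."""
    return all(candidate(x) == expected for x, expected in cases)

def failing_inputs(candidate, cases):
    """Return the inputs on which the candidate gives a wrong output."""
    return [x for x, expected in cases if candidate(x) != expected]

if __name__ == "__main__":
    for patch in (buggy_abs, incomplete_fix, regressing_fix):
        print(patch.__name__,
              "passes test suite:", passes(patch, TEST_SUITE),
              "| wrong on held-out inputs:", failing_inputs(patch, HELD_OUT))
```

In this sketch both patches pass the test suite, yet `incomplete_fix` still fails on the uncovered buggy input `-7`, and `regressing_fix` breaks the input `5`, which the original program handled correctly.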
Early prototypes for automatic repair suffered heavily from overfitting: on the ManyBugs C benchmark, Qi et al.<ref name="kali" /> reported that 104/110 of plausible GenProg patches were overfitting; on the Defects4J Java benchmark, Martinez et al.<ref name="martinezdefects4j">{{Cite journal \|last=Martinez \|first=Matias \|last2=Durieux \|first2=Thomas \|last3=Sommerard \|first3=Romain \|last4=Xuan \|first4=Jifeng \|last5=Monperrus \|first5=Martin \|date=2016-10-25 \|title=Automatic repair of real bugs in java: a large-scale experiment on the defects4j dataset \|url=https://hal.archives-ouvertes.fr/hal-01387556/document \|journal=Empirical Software Engineering \|language=en \|volume=22 \|issue=4 \|pages=1936–1964 \|arxiv=1811.02429 \|doi=10.1007/s10664-016-9470-4 \|issn=1382-3256 \|s2cid=24538587}}</ref> reported 73/84 plausible patches as overfitting. In the context of synthesis-based repair, Le et al.<ref>{{Cite journal \|last=Le \|first=Xuan Bach D. \|last2=Thung \|first2=Ferdian \|last3=Lo \|first3=David \|last4=Goues \|first4=Claire Le \|date=2018-03-02 \|title=Overfitting in semantics-based automated program repair \|url=https://ink.library.smu.edu.sg/sis_research/3986 \|journal=Empirical Software Engineering \|language=en \|volume=23 \|issue=5 \|pages=3007–3033 \|doi=10.1007/s10664-017-9577-2 \|issn=1382-3256 \|s2cid=3635768}}</ref> reported that more than 80% of the patches were overfitting. One way to avoid overfitting is to filter out the generated patches.
This can be done based on dynamic analysis,<ref>{{Cite journal\|last=Xin\|first=Qi\|last2=Reiss\|first2=Steven P.\|date=2017-07-10\|title=Identifying test-suite-overfitted patches through test case generation\|url=http://dx.doi.org/10.1145/3092703.3092718\|journal=Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis\|location=New York, NY, USA\|publisher=ACM\|doi=10.1145/3092703.3092718\|isbn=978-1-4503-5076-1}}</ref> or static code analysis of the generated patches.<ref>{{Cite journal\|last=Ye\|first=He\|last2=Gu\|first2=Jian\|last3=Martinez\|first3=Matias\|last4=Durieux\|first4=Thomas\|last5=Monperrus\|first5=Martin\|date=2021\|title=Automated Classification of Overfitting Patches with Statically Extracted Code Features\|url=https://arxiv.org/abs/1910.12057\|journal=IEEE Transactions on Software Engineering\|pages=1–1\|doi=10.1109/tse.2021.3071750\|issn=0098-5589\|arxiv=1910.12057}}</ref> When a reference patch is available, a state-of-the-art technique is to generate tests based on the patched version, such that the generated tests capture the expected behavior. While the sampling of the input domain by test generation is incomplete by construction, it has been shown to be effective at detecting overfitting patches, and even at finding human errors made during manual classification of patches.<ref>{{Cite journal\|last=Ye\|first=He\|last2=Martinez\|first2=Matias\|last3=Monperrus\|first3=Martin\|date=2021\|title=Automated patch assessment for program repair at scale\|url=https://arxiv.org/abs/1909.13694\|journal=Empirical Software Engineering\|language=en\|volume=26\|issue=2\|pages=20\|doi=10.1007/s10664-020-09920-w\|issn=1382-3256\|arxiv=1909.13694}}</ref>

== Limitations of automatic bug-fixing ==

Automatic bug fixing: Difference between revisions