Symbolic execution: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 15:42, 3 November 2022 edit 105.161.213.89 (talk) →Example Tags: Reverted Mobile edit Mobile web edit ← Previous edit		Latest revision as of 06:39, 24 May 2025 edit undo Citation bot (talk \| contribs) Bots 5,870,714 edits Added url. \| Use this bot. Report bugs. \| Suggested by Headbomb \| Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox \| #UCB_webform_linked 773/892
(20 intermediate revisions by 16 users not shown)
Line 1: {{Short description\|Technique for Program Analysis}} In [[computer science]], '''symbolic execution '''(also '''symbolic evaluation''' or '''symbex''') is a means of [[Program analysis\|analyzing a program]] to determine what [[Input (computer science)\|inputs]] cause each part of a program to [[Execution (computing)\|execute]]. An [[Interpreter (computing)\|interpreter]] follows the program, assuming symbolic values for inputs rather than obtaining actual inputs as normal execution of the program would. It thus arrives at expressions in terms of those symbols for expressions and variables in the program, and constraints in terms of those symbols for the possible outcomes of each conditional branch. Finally, the possible inputs that trigger a branch can be determined by solving the constraints. Line 4 ⟶ 5: ==Example== Consider the program below, which reads in a value and fails if the input is 126. <syntaxhighlight lang="c" line="1"> Line 26 ⟶ 27: ===Path explosion=== Symbolically executing all feasible program paths does not scale to large programs. The number of feasible paths in a program grows exponentially with an increase in program size and can even be infinite in the case of programs with unbounded loop iterations.<ref>{{cite book\|last=Anand\|first=Saswat\|author2=Patrice Godefroid \|author3=Nikolai Tillmann \|chapter=Demand-Driven Compositional Symbolic Execution\|title=Tools and Algorithms for the Construction and Analysis of Systems\|year=2008\|volume=4963\|pages=367–381\|doi=10.1007/978-3-540-78800-3_28\|series=Lecture Notes in Computer Science\|isbn=978-3-540-78799-0}}</ref> Solutions to the ''path explosion'' problem generally use either heuristics for path-finding to increase code coverage,<ref>{{cite book\|last=Ma\|first=Kin-Keng\|author2=Khoo Yit Phang \|author3=Jeffrey S. Foster \|author4=Michael Hicks \|chapter=Directed Symbolic Execution\|title=Proceedings of the 18th International Conference on Statis Analysis\|year=2011\|pages=95–111\|chapter-url=http://dl.acm.org/citation.cfm?id=2041563\|accessdate=2013-04-03\|isbn=9783642237010}}</ref> reduce execution time by parallelizing independent paths,<ref>{{cite book\|last=Staats\|first=Matt\|author2=Corina Pasareanu \|s2cid=9898522\|chapter=Parallel symbolic execution for structural test generation\|title=Proceedings of the 19th International Symposium on Software Testing and Analysis\|year=2010\|pages=183–194\|doi=10.1145/1831708.1831732\|isbn=9781605588230}}</ref> or by merging similar paths.<ref>{{Cite book\|chapter= Efficient State Merging in Symbolic Execution\|publisher = ACM\|title= Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation\|date = 2012-01-01\|___location = New York, NY, USA\|isbn = 978-1-4503-1205-9\|pages = 193–204\|series =<!---->\|doi = 10.1145/2254064.2254088\|first1 = Volodymyr\|last1 = Kuznetsov\|first2 = Johannes\|last2 = Kinder\|first3 = Stefan\|last3 = Bucur\|first4 = George\|last4 = Candea\|s2cid = 135107\|citeseerx = 10.1.1.348.823}}</ref> One example of merging is ''veritesting'', which "employs static symbolic execution to amplify the effect of dynamic symbolic execution".<ref>{{Cite web\|url=https://cacm.acm.org/magazines/2016/6/202649-enhancing-symbolic-execution-with-veritesting/fulltext\|title=Enhancing Symbolic Execution with Veritesting}}</ref>▼ {{Main\|Path explosion}} ▲Symbolically executing all feasible program paths does not scale to large programs. The number of feasible paths in a program grows exponentially with an increase in program size and can even be infinite in the case of programs with unbounded loop iterations.<ref>{{cite book\|last=Anand\|first=Saswat\|author2=Patrice Godefroid \|author3=Nikolai Tillmann \|chapter=Demand-Driven Compositional Symbolic Execution\|title=Tools and Algorithms for the Construction and Analysis of Systems\|year=2008\|volume=4963\|pages=367–381\|doi=10.1007/978-3-540-78800-3_28\|series=Lecture Notes in Computer Science\|isbn=978-3-540-78799-0}}</ref> Solutions to the ''path explosion'' problem generally use either heuristics for path-finding to increase code coverage,<ref>{{cite book\|last=Ma\|first=Kin-Keng\|author2=Khoo Yit Phang \|author3=Jeffrey S. Foster \|author4=Michael Hicks \|chapter=Directed Symbolic Execution\|title=Proceedings of the 18th International Conference on Statis Analysis\|year=2011\|pages=95–111\|publisher=Springer \|chapter-url=http://dl.acm.org/citation.cfm?id=2041563\|accessdate=2013-04-03\|isbn=9783642237010}}</ref> reduce execution time by parallelizing independent paths,<ref>{{cite book\|last=Staats\|first=Matt\|author2=Corina Pasareanu \|author2-link= Corina Păsăreanu \|s2cid=9898522\|chapter=Parallel symbolic execution for structural test generation\|title=Proceedings of the 19th International Symposium on Software Testing and Analysis\|year=2010\|pages=183–194\|doi=10.1145/1831708.1831732\|hdl=11299/217417 \|isbn=9781605588230}}</ref> or by merging similar paths.<ref>{{Cite book\|chapter= Efficient State Merging in Symbolic Execution\|publisher = ACM\|title= Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation\|date = 2012-01-01\|___location = New York, NY, USA\|isbn = 978-1-4503-1205-9\|pages = 193–204\|series =<!---->\|doi = 10.1145/2254064.2254088\|first1 = Volodymyr\|last1 = Kuznetsov\|first2 = Johannes\|last2 = Kinder\|first3 = Stefan\|last3 = Bucur\|first4 = George\|last4 = Candea\|s2cid = 135107\|citeseerx = 10.1.1.348.823}}</ref> One example of merging is ''veritesting'', which "employs static symbolic execution to amplify the effect of dynamic symbolic execution".<ref>{{Cite web\|url=https://cacm.acm.org/magazines/2016/6/202649-enhancing-symbolic-execution-with-veritesting/fulltext\|title=Enhancing Symbolic Execution with Veritesting\|date=June 2016 }}</ref> ===Program-dependent efficiency=== Line 33 ⟶ 37: === Memory aliasing === Symbolic execution is harder when the same memory ___location can be accessed through different names ([[aliasing (computing) \|aliasing]]). Aliasing cannot always be recognized statically, so the symbolic execution engine can't recognize that a change to the value of one variable also changes the other.<ref name=DeMillo1991>{{Cite journal\| title = Constraint-Based Automatic Test Data Generation\| journal = IEEE Transactions on Software Engineering\| date = 1991-09-01\| pages = 900–910\| volume = 17\| issue = 9\| first1 = Rich\| last1 = DeMillo\| first2 = Jeff\| last2 = Offutt\| doi = 10.1109/32.92910}}</ref> === Arrays === Line 40 ⟶ 44: === Environment interactions === Programs interact with their environment by performing [[~~System~~system call~~\|system calls~~]]s, receiving signals, etc. Consistency problems may arise when execution reaches components that are not under control of the symbolic execution tool (e.g., kernel or libraries). Consider the following example:<syntaxhighlight lang="c" line="1"> int main() { Line 55 ⟶ 59: </syntaxhighlight>This program opens a file and, based on some condition, writes different kind of data to the file. It then later reads back the written data. In theory, symbolic execution would fork two paths at line 5 and each path from there on would have its own copy of the file. The statement at line 11 would therefore return data that is consistent with the value of "condition" at line 5. In practice, file operations are implemented as system calls in the kernel, and are outside the control of the symbolic execution tool. The main approaches to address this challenge are: '''Executing calls to the environment directly.''' The advantage of this approach is that it is simple to implement. The disadvantage is that the side effects of such calls will clobber all states managed by the symbolic execution engine. In the example above, the instruction at line 11 would return "some datasome other data" or "some other ~~datasomedata~~datasome data" depending on the sequential ordering of the states. '''Modeling the environment.''' In this case, the engine instruments the system calls with a model that simulates their effects and that keeps all the side effects in per-state storage. The advantage is that one would get correct results when symbolically executing programs that interact with the environment. The disadvantage is that one needs to implement and maintain many potentially complex models of system calls. Tools such as KLEE,<ref>{{Cite journal\|title = KLEE: Unassisted and Automatic Generation of High-coverage Tests for Complex Systems Programs\|url = http://dl.acm.org/citation.cfm?id=1855741.1855756\|journal = Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation\|date = 2008-01-01\|pages = 209–224\|series = OSDI'08\|first1 = Cristian\|last1 = Cadar\|first2 = Daniel\|last2 = Dunbar\|first3 = Dawson\|last3 = Engler}}</ref> Cloud9, and Otter<ref>{{cite web\|title=MultiOtter: Multiprocess Symbolic Execution\|url=https://www.cs.umd.edu/~mwh/papers/multiotter.pdf\|first1 = Jonathan\|last1 = Turpie\|first2 = Elnatan\|last2 = Reisner\|first3 = Jeffrey\|last3 = Foster\|first4 = Michael \|last4 = Hicks}}</ref> take this approach by implementing models for file system operations, sockets, [[Inter-process communication\|IPC]], etc. '''Forking the entire system state.''' Symbolic execution tools based on virtual machines solve the environment problem by forking the entire VM state. For example, in S2E<ref>{{Cite journal\|title = The S2E Platform: Design, Implementation, and Applications\|journal = ACM Trans. Comput. Syst.\|date = 2012-02-01\|issn = 0734-2071\|pages = 2:1–2:49\|volume = 30\|issue = 1\|doi = 10.1145/2110356.2110358\|first1 = Vitaly\|last1 = Chipounov\|first2 = Volodymyr\|last2 = Kuznetsov\|first3 = George\|last3 = Candea\|s2cid = 16905399\| url=http://infoscience.epfl.ch/record/175469 }}</ref> each state is an independent VM snapshot that can be executed separately. This approach alleviates the need for writing and maintaining complex models and allows virtually any program binary to be executed symbolically. However, it has higher memory usage overheads (VM snapshots may be large). ==Tools== Line 97 ⟶ 101: \| VineIL / Native \| http://bitblaze.cs.berkeley.edu/fuzzball.html \| {{free \| yes}} \|- \| GenSym \| LLVM \| https://github.com/Generative-Program-Analysis/GenSym \| {{free \| yes}} \|- Line 172 ⟶ 181: \| C \| https://bitbucket.org/khooyp/otter/overview \| {{free \| yes}} \|- \| Owi <ref>{{Cite journal \| doi = 10.22152/programming-journal.org/2025/9/3 \| year = 2024 \| last1 = Andrès \| first1 = Léo \| title = Owi: Performant Parallel Symbolic Execution Made Easy, an Application to WebAssembly \| journal = The Art, Science, and Engineering of Programming \| volume= 9 \| arxiv= 2412.06391 \| url = https://programming-journal.org/2025/9/3/ }}</ref> \| [[C (programming language)\|C]], [[C++]], [[Rust (programming language)\|Rust]], [[WebAssembly]], [[Zig (programming language)\|Zig]] \| https://github.com/ocamlpro/owi \| {{free \| yes}} \|- Line 233 ⟶ 257: \| https://core.ac.uk/download/pdf/24067593.pdf \| {{proprietary\|no}} \|- \| SymCC \| LLVM \| https://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html \| {{free \| yes}} \|- \| Triton \| x86 ~~and~~, x86-64, ARM and AArch64 \| https://triton.quarkslab.com \| {{free \| yes}} Line 250 ⟶ 279: ==History== The concept of symbolic execution was introduced academically in the 1970s with descriptions of: the Select system,<ref>Robert S. Boyer and Bernard Elspas and Karl N. Levitt SELECT--a formal system for testing and debugging programs by symbolic execution, Proceedings of the International Conference on Reliable Software, 1975, page 234--245, Los Angeles, California</ref> the EFFIGY system,<ref>James C. King, Symbolic execution and program testing, Communications of the ACM, volume 19, number 7, 1976, 385--394</ref> the DISSECT system,<ref>William E. Howden, Experiments with a symbolic evaluation system, Proceedings, National Computer Conference, 1976.</ref> and Clarke's system.<ref>Lori A. Clarke, A program testing system, ACM 76: Proceedings of the Annual Conference, 1976, pages 488-491, Houston, Texas, United States</ref> Line 271 ⟶ 300: * [https://ti.arc.nasa.gov/publications/3007/download/ Symbolic Execution for Software Testing in Practice – Preliminary Assessment] * [https://github.com/saswatanand/symexbib A bibliography of papers related to symbolic execution] {{Software testing}} Line 277 ⟶ 305: {{DEFAULTSORT:Symbolic Execution}} [[Category:Abstract interpretation]] [[Category:~~Computer~~Program ~~science~~analysis]]