Self-modifying code

{{Use list-defined references|date=December 2021}}
{{Use American English|date=January 2019}}
 
In [[computer science]], '''self-modifying code''' ('''SMC''' or '''SMoC''') is [[source code|code]] that alters its own [[instruction (computer science)|instruction]]s while it is [[execution (computing)|executing]] – usually to reduce the [[instruction path length]] and improve [[computer performance|performance]] or simply to reduce otherwise [[duplicate code|repetitively similar code]], thus simplifying [[software maintenance|maintenance]]. The term is usually applied only to code where the self-modification is intentional, not to situations where code accidentally modifies itself due to an error such as a [[buffer overflow]].
 
* '''overlay of existing instructions''' (or parts of instructions such as opcode, register, flags or addresses) or
* '''direct creation of whole instructions''' or sequences of instructions in memory
* '''creation or modification of [[source code]] statements''' followed by a 'mini compile' or a dynamic interpretation (see [[eval]] statement)
* '''creating an entire program dynamically''' and then executing it
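The third technique in the list above can be sketched in Python, assuming the built-in <code>compile()</code> and <code>exec()</code> functions as the 'mini compile' step; the function name <code>scale</code> is hypothetical. The program holds a function as source text, rewrites one statement, and recompiles it while running:

```python
# Hypothetical example: a function kept as source text is modified and
# recompiled at run time (compile() plays the role of the 'mini compile').
source = "def scale(x):\n    return x * 2\n"

namespace: dict = {}
exec(compile(source, "<generated>", "exec"), namespace)
first = namespace["scale"](5)

# Modify one source statement, then compile and bind the new version.
source = source.replace("x * 2", "x * 3")
exec(compile(source, "<generated>", "exec"), namespace)
second = namespace["scale"](5)

print(first, second)  # 10 15
```

Rebinding the name in the same namespace replaces the old definition; any code that already held a reference to the first <code>scale</code> would keep calling it.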
 
===Assembly language===
Self-modifying code is quite straightforward to implement when using [[assembly language]]. Instructions can be dynamically created in [[computer memory|memory]] (or else overlaid over existing code in non-protected program storage),<ref name="HP9100A_1998"/> in a sequence equivalent to the one a standard compiler might generate as [[object code]]. With modern processors, there can be unintended [[side effect (computer science)|side effect]]s on the [[CPU cache]] that must be considered; for example, a stale copy of the modified instructions may still be held in the instruction cache or pipeline. The method was frequently used for testing 'first time' conditions, as in this suitably commented [[IBM/360]] [[assembler (computer programming)|assembler]] example. It uses instruction overlay to reduce the [[instruction path length]] by (N×1)−1, where N is the number of records on the file (−1 being the [[computational overhead|overhead]] to perform the overlay).
 
SUBRTN NOP OPENED FIRST TIME HERE?
 
==History==
The [[IBM SSEC]], demonstrated in January 1948, had the ability to modify its instructions or otherwise treat them exactly like data. However, the capability was rarely used in practice.<ref name="Bashe-Buchholz-Hawkins-Ingram-Rochester_1981"/> In the early days of computers, self-modifying code was often used to reduce use of limited memory, or improve performance, or both. It was also sometimes used to implement subroutine calls and returns when the instruction set only provided simple branching or skipping instructions to vary the [[control flow]].<ref name="Miller_2006"/><ref name="Wenzl-Merzdovnik-Ullrich-Weippl_2019"/> This use is still relevant in certain ultra-[[Reduced instruction set computer|RISC]] architectures, at least theoretically; see for example [[one-instruction set computer]]. [[Donald Knuth]]'s [[MIX (abstract machine)|MIX]] architecture also used self-modifying code to implement subroutine calls.<ref name="Knuth_MMIX"/>
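The subroutine-return technique can be illustrated with a toy machine in Python. Its opcodes (<code>JSR</code>, <code>STJ</code>, <code>JMP</code>, <code>OUT</code>, <code>HALT</code>) are hypothetical and only loosely modeled on the MIX convention in which a subroutine's first instruction stores the return address into the address field of the jump instruction that ends it. Memory holds mutable instructions, so the subroutine patches its own exit:

```python
# Toy machine; opcodes are hypothetical, loosely modeled on MIX's STJ convention.
def run(program: list[list]) -> list[str]:
    """Execute a program whose memory holds mutable [opcode, operand] pairs."""
    pc, j, out = 0, 0, []
    while True:
        op, arg = program[pc]
        if op == "JSR":      # jump to subroutine; register J remembers the return address
            j, pc = pc + 1, arg
        elif op == "STJ":    # self-modification: store J into the operand of instruction arg
            program[arg][1] = j
            pc += 1
        elif op == "JMP":    # unconditional jump
            pc = arg
        elif op == "OUT":    # emit a value
            out.append(arg)
            pc += 1
        elif op == "HALT":
            return out
        else:
            raise ValueError(f"unknown opcode {op!r}")

program = [
    ["JSR", 5],        # 0: call the subroutine
    ["OUT", "back1"],  # 1
    ["JSR", 5],        # 2: call it again from a different place
    ["OUT", "back2"],  # 3
    ["HALT", None],    # 4
    ["STJ", 7],        # 5: subroutine entry patches its own exit jump
    ["OUT", "sub"],    # 6: subroutine body
    ["JMP", None],     # 7: return; the operand is filled in at run time
]

result = run(program)
print(result)  # ['sub', 'back1', 'sub', 'back2']
```

Because the return address lives in the code itself, such a subroutine is not reentrant: a recursive call would overwrite the pending return address.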
 
==Usage==
 
===Specialization===
Suppose a set of statistics such as average, extrema, location of extrema, standard deviation, etc. are to be calculated for some large data set. In a general situation, there may be an option of associating weights with the data, so that each x<sub>i</sub> is associated with a w<sub>i</sub>. Rather than test for the presence of weights at every index value, there could be two versions of the calculation, one for use with weights and one without, with a single test at the start. Now consider a further option: each value may have an associated Boolean that signifies whether that value is to be skipped. This could be handled by producing four versions of the code, one for each combination of options, and code bloat results. Alternatively, the weight and skip arrays could be merged into a temporary array (with zero weights for values to be skipped), at the cost of extra processing, and there is still bloat. With code modification, however, the code for skipping unwanted values and for applying weights could be added to the template for calculating the statistics only as needed. There would be no repeated testing of the options, and the data array would be accessed only once, as would the weight and skip arrays, if involved.
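A sketch of the specialization idea in Python, where generating source text and compiling it with <code>exec()</code> stands in for modifying machine code. It computes only a mean and a maximum rather than the full set of statistics, and the names <code>make_stats</code> and <code>stats</code> are hypothetical; the code for weights and skipping is emitted only when the corresponding option is requested:

```python
def make_stats(weighted: bool, skippable: bool):
    """Generate, compile and return a mean/maximum routine specialized for the options.

    Only the code for the selected options is emitted, so the generated loop
    never re-tests which options are enabled. Assumes at least one value contributes.
    """
    src = [
        "def stats(xs, ws=None, skip=None):",
        "    total = 0.0",
        "    norm = 0.0",
        "    hi = float('-inf')",
        "    for i, x in enumerate(xs):",
    ]
    if skippable:
        src += ["        if skip[i]:", "            continue"]
    if weighted:
        src += ["        total += ws[i] * x", "        norm += ws[i]"]
    else:
        src += ["        total += x", "        norm += 1"]
    src += [
        "        hi = max(hi, x)",
        "    return total / norm, hi",
    ]
    namespace: dict = {}
    exec("\n".join(src), namespace)  # compile the specialized routine
    return namespace["stats"]

plain = make_stats(weighted=False, skippable=False)
print(plain([1.0, 2.0, 3.0]))  # (2.0, 3.0)

both = make_stats(weighted=True, skippable=True)
print(both([1.0, 2.0, 10.0], ws=[1.0, 3.0, 1.0], skip=[False, False, True]))  # (1.75, 2.0)
```

Each call to <code>make_stats</code> yields one of the four variants without the four having been written out by hand; the data, weight and skip arrays are each traversed once.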
 
===Use as camouflage===
Self-modifying code is more complex to analyze than standard code and can therefore be used as a protection against [[reverse engineering]] and [[software cracking]]. Self-modifying code was used to hide copy protection instructions in 1980s disk-based programs for systems such as [[IBM PC compatible]]s and [[Apple II series|Apple II]]. For example, on an IBM PC (or [[IBM PC compatible|compatible]]), the [[floppy disk]] drive access instruction <code>[[int 0x13]]</code> would not appear in the executable program's image but would be written into the executable's memory image after the program started executing.
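The hiding technique can be illustrated with plain byte manipulation in Python. The bytes <code>CD 13</code> are the x86 encoding of <code>int 0x13</code>; the XOR key, the offsets, and the surrounding <code>NOP</code>/<code>RET</code> bytes are assumptions for illustration, since the text does not say how individual programs computed the hidden bytes. Nothing here is executed as machine code:

```python
KEY = 0x5A  # hypothetical encoding key

# On-disk image: NOP, the two bytes of "int 0x13" (CD 13) XOR-encoded, then RET.
# The literal CD 13 pattern is therefore absent from the stored file.
STORED_IMAGE = bytes([0x90, 0xCD ^ KEY, 0x13 ^ KEY, 0xC3])
ENCODED_SPAN = range(1, 3)  # hypothetical offsets of the hidden instruction

def load_and_patch(image: bytes) -> bytearray:
    """Copy the image into writable 'memory' and decode the hidden instruction in place."""
    memory = bytearray(image)
    for offset in ENCODED_SPAN:
        memory[offset] ^= KEY
    return memory

memory = load_and_patch(STORED_IMAGE)
print(b"\xcd\x13" in STORED_IMAGE, b"\xcd\x13" in memory)  # False True
```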
 
Self-modifying code is also sometimes used by programs that do not want to reveal their presence, such as [[computer virus]]es and some [[shellcode]]s. Viruses and shellcodes that use self-modifying code mostly do this in combination with [[polymorphic code]]. Modifying a piece of running code is also used in certain attacks, such as [[buffer overflow]]s.
 
===Operating systems===
The [[Linux kernel]] makes notably wide use of self-modifying code, so that a single binary image can be distributed for each major architecture (e.g. [[IA-32]], [[x86-64]], 32-bit [[ARM architecture family|ARM]], [[ARM64]]...) while the kernel code is adapted in memory during boot to the specific CPU model detected, for example to take advantage of new CPU instructions or to work around hardware bugs.<ref name="linux_self_modifying_Paltsev">{{cite web |author-last=Paltsev |author-first=Evgeniy |title=Self Modifying Code in Linux Kernel - What, Where and How |date=2020-01-30 |url=https://talk.telematika.org/2019/all/self_modifying_code_in_linux_kernel_-_what_where_and_how/ |access-date=2022-11-27}}</ref><ref name="linux_self_modifying_altinstructions">{{cite web |author-last=Wieczorkiewicz |author-first=Pawel |title=Linux Kernel Alternatives |url=https://grsecurity.net/linux_kernel_alternatives |access-date=2022-11-27}}</ref> To a lesser extent, the [[DR-DOS]] kernel also optimizes speed-critical sections of itself at load time depending on the underlying processor generation.<ref name="Caldera_1997_DOSSRC"/><ref name="Paul_1997_OD-A3"/><ref group="nb" name="NB_DR-DOS_386"/>
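A much-simplified model of boot-time patching in Python. The kernel's real mechanism operates on machine-code bytes; here instructions are strings, and the feature name <code>fast_copy</code>, the instruction texts, and the names <code>Alternative</code> and <code>patch_for_cpu</code> are hypothetical. One generic image is shipped, and each table entry names an instruction to overwrite if the detected CPU has a given feature:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alternative:
    offset: int        # index of the instruction to overwrite in the image
    feature: str       # CPU feature (or bug flag) that enables the replacement
    replacement: str   # instruction written over the original

def patch_for_cpu(image: list[str], alternatives: list[Alternative],
                  cpu_features: set[str]) -> list[str]:
    """Patch the in-memory image once, at 'boot', for the detected CPU."""
    for alt in alternatives:
        if alt.feature in cpu_features:
            image[alt.offset] = alt.replacement
    return image

# One generic image shipped to every machine of the architecture (illustrative).
GENERIC_IMAGE = ["push rbp", "call copy_generic", "pop rbp", "ret"]
ALTERNATIVES = [Alternative(1, "fast_copy", "rep movsb")]

old_cpu = patch_for_cpu(list(GENERIC_IMAGE), ALTERNATIVES, set())
new_cpu = patch_for_cpu(list(GENERIC_IMAGE), ALTERNATIVES, {"fast_copy"})
print(old_cpu[1], "|", new_cpu[1])  # call copy_generic | rep movsb
```

Patching once at boot means the running kernel pays no per-call cost for checking CPU features.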
 
Regardless, at a [[meta-level]], programs can still modify their own behavior by changing data stored elsewhere (see [[metaprogramming]]) or via use of [[type polymorphism|polymorphism]].
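For contrast, the following Python sketch obtains a 'first time' effect like the one in the assembler example without modifying any code: the current behavior is held in an attribute, and the object changes its own behavior by updating that data. The class and method names are hypothetical:

```python
from typing import Callable

class RecordProcessor:
    """Changes its own behavior by rebinding data, not by rewriting code."""

    def __init__(self) -> None:
        self.log: list[str] = []
        # The current behavior is just a value stored in an attribute.
        self.handle: Callable[[str], None] = self._first_time

    def _first_time(self, record: str) -> None:
        self.log.append("open")
        self.handle = self._normal  # switch behavior by updating data
        self._normal(record)

    def _normal(self, record: str) -> None:
        self.log.append(f"get {record}")

p = RecordProcessor()
for rec in ["r1", "r2"]:
    p.handle(rec)
print(p.log)  # ['open', 'get r1', 'get r2']
```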
* [[AARD code]]
* [[Algorithmic efficiency]]
* [[Data as code]]
* [[eval]] statement
* [[IBM 1130#Code modification|IBM 1130]] (Example)
* [[Self-modifying computer virus]]
* [[Self-hosting (compilers)|Self-hosting]]
* [[Synthetic programming]]
* [[Compiler bootstrapping]]
* [[Patchable microcode]]
==References==
{{Reflist|refs=
<ref name="Massalin_1992_Synthesis">{{Cite thesis |author-first1=Calton |author-last1=Pu |author-link1=Calton Pu |author-first2=Henry |author-last2=Massalin |author-link2=Henry Massalin |author-first3=John |author-last3=Ioannidis |degree=PhD |title=Synthesis: An Efficient Implementation of Fundamental Operating System Services |publisher=Department of Computer Sciences, [[Columbia University]] |location=New York, NY, USA |id=UMI Order No. GAX92-32050 |date=1992 |url=https://www.scs.stanford.edu/nyu/04fa/sched/readings/synthesis.pdf |access-date=2023-04-25}} [https://www.cs.columbia.edu/~library/TR-repository/reports/reports-1992/cucs-039-92.ps.gz]</ref>
<ref name="Henson_2008">{{cite news |title=KHB: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services |author-first=Valerie |author-last=Henson |author-link=Valerie Henson |date=2008-02-20 |work=LWN.net |url=https://lwn.net/Articles/270081/ |access-date=2022-05-19 |url-status=live |archive-url=https://web.archive.org/web/20210817175159/https://lwn.net/Articles/270081/ |archive-date=2021-08-17}}</ref>
<ref name="Haeberli_1994_GraficaObscura">{{cite web |author-first1=Paul |author-last1=Haeberli |author-link1=Paul Haeberli |author-first2=Bruce |author-last2=Karsh |title=Io Noi Boccioni - Background on Futurist Programming |work=Grafica Obscura |date=1994-02-03 |url=https://www.graficaobscura.com/future/index.html |access-date=2023-04-25}}</ref>
<ref name="Caldera_1997_DOSSRC">{{cite web |title=Caldera OpenDOS Machine Readable Source Kit (M.R.S) 7.01 |publisher=[[Caldera (company)|Caldera, Inc.]] |date=1997-05-01 |url=https://archive.sundby.com/retro/DR-DOS/dossrc.zip |access-date=2022-01-02 |url-status=dead |archive-url=https://web.archive.org/web/20210807095409/https://archive.sundby.com/retro/DR-DOS/dossrc.zip |archive-date=2021-08-07}} [https://web.archive.org/web/20220102102656/https://archive.sundby.com/retro/OpenDOS/OPENDOS_7.01_CODE.ZIP]</ref>
<ref name="Paul_1997_OD-A3">{{cite web |author-first=Matthias R. |author-last=Paul |title=Caldera OpenDOS 7.01/7.02 Update Alpha 3 IBMBIO.COM README.TXT |url=http://www.uni-bonn.de/~uzs180/download/ibmbioa3.zip |date=1997-10-02 |access-date=2009-03-29 |url-status=dead |archive-url=https://web.archive.org/web/20031004074600/http://www-student.informatik.uni-bonn.de/~frinke/ibmbioa3.zip |archive-date=2003-10-04}} [https://web.archive.org/web/20181225154705/http://mirror.macintosharchive.org/max1zzz.co.uk/+Windows%20&%20DOS/DOS/System/Novell/Support/Bins/Op702src.zip<!-- Op702src.zip is an unofficial renamed distribution of the ibmbioa3.zip file -->]</ref>
<ref name="HP9100A_1998">{{cite web |title=HP 9100A/B |date=1998 |work=MoHPC - The Museum of HP Calculators |at=Overlapped Data and Program Memory / Self-Modifying Code|url=https://www.hpmuseum.org/hp9100.htm |access-date=2023-09-23 |url-status=live |archive-url=https://web.archive.org/web/20230923125424/https://www.hpmuseum.org/hp9100.htm |archive-date=2023-09-23}}</ref>
}}