Matthiaspaul (talk | contribs) →Further reading: +ref |
→Application in low and high level languages: switched creating to creation in {'''creation or modification of source code statements'''} for grammar reasons |
(25 intermediate revisions by 8 users not shown) | |||
Line 4:
{{Use list-defined references|date=December 2021}}
{{Use American English|date=January 2019}}
In [[computer science]], '''self-modifying code''' ('''SMC''' or '''SMoC''') is [[source code|code]] that alters its own [[instruction (computer science)|
Self-modifying code can involve overwriting existing instructions or generating new code at run time and transferring control to that code.
Line 13 ⟶ 14:
The modifications may be performed:
* '''only during initialization''' – based on input [[Parameter#Computing|parameter]]s (when the process is more commonly described as software '[[computer configuration|configuration]]' and is somewhat analogous, in hardware terms, to setting [[jumper (computing)|
* '''throughout execution''' ("on the fly") – based on particular program states that have been reached during the execution
In either case, the modifications may be performed directly to the [[machine code]] instructions themselves, by [[overlapping instructions|overlaying]] new instructions over the existing ones (for example: altering a compare and branch to an [[unconditional branch]] or alternatively a '[[NOP (code)|NOP]]').
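The overlay of a NOP by an unconditional branch can be sketched with a toy interpreter. This is only an illustration: the machine, its opcode names (<code>NOP</code>, <code>BRANCH</code>, <code>OPEN</code>, <code>PATCH</code>, <code>PROCESS</code>, <code>RETURN</code>) and the <code>run_subroutine</code> helper are all invented here and do not correspond to any real instruction set.

```python
# Toy machine: the program is a mutable list of (opcode, operand) pairs.
# All opcode names are hypothetical and exist only for this sketch.
program = [
    ("NOP", 3),         # 0: first-time switch; operand is the later branch target
    ("OPEN", None),     # 1: one-time setup work
    ("PATCH", 0),       # 2: overlay instruction 0 with an unconditional branch
    ("PROCESS", None),  # 3: the work done on every call
    ("RETURN", None),   # 4: leave the subroutine
]


def run_subroutine(program, log):
    """Execute the toy program once, recording side effects in log."""
    pc = 0
    while True:
        op, arg = program[pc]
        if op == "NOP":
            pc += 1
        elif op == "BRANCH":
            pc = arg
        elif op == "OPEN":
            log.append("open")
            pc += 1
        elif op == "PATCH":
            # Self-modification: replace the NOP with a branch to its operand.
            program[arg] = ("BRANCH", program[arg][1])
            pc += 1
        elif op == "PROCESS":
            log.append("process")
            pc += 1
        elif op == "RETURN":
            return
        else:
            raise ValueError(f"unknown opcode {op!r}")


log = []
for _ in range(3):
    run_subroutine(program, log)
```

After the first call, instruction 0 has been overwritten, so later calls skip the setup instructions entirely instead of testing a condition each time.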
In the [[IBM System/360 architecture]], and its successors up to [[z/Architecture]], an EXECUTE (EX) instruction ''logically'' overlays the second byte of its target instruction with the low-order 8 bits of [[general
==Application in low and high level languages==
Line 24 ⟶ 25:
* '''overlay of existing instructions''' (or parts of instructions such as opcode, register, flags or addresses) or
* '''direct creation of whole instructions''' or sequences of instructions in memory
* '''
* '''creating an entire program dynamically''' and then executing it
===Assembly language===
Self-modifying code is quite straightforward to implement when using [[assembly language]]. Instructions can be dynamically created in [[computer memory|memory]] (or else overlaid over existing code in non-protected program storage),<ref name="HP9100A_1998"/> in a sequence equivalent to the ones that a standard compiler may generate as the [[object code]]. With modern processors, there can be unintended [[side effect (computer science)|side effect]]s on the [[CPU cache]] that must be considered. The method was frequently used for testing 'first time' conditions, as in this suitably commented [[IBM/360]] [[assembler (computer programming)|assembler]] example. It uses instruction overlay to reduce the [[instruction path length]] by (N×1)−1 where N is the number of records on the file (−1 being the [[computational overhead|overhead]] to perform the overlay).
SUBRTN NOP OPENED FIRST TIME HERE?
Line 62 ⟶ 63:
===High-level languages===
Some compiled languages explicitly permit self-modifying code. For example, the ALTER verb in [[COBOL]] may be implemented as a branch instruction that is modified during execution.<ref name="MicroFocus_ALTER"/> Some [[batch file|batch]] programming techniques involve the use of self-modifying code. [[Clipper (programming language)|Clipper]] and [[
With interpreted languages, the "machine code" is the source text and may be susceptible to editing on-the-fly: in [[SNOBOL]] the source statements being executed are elements of a text array. Other languages, such as [[Perl]] and [[Python (programming language)|Python]], allow programs to create new code at run-time and execute it using an [[eval]] function, but do not allow existing code to be mutated. The illusion of modification (even though no machine code is really being overwritten) is achieved by modifying function pointers, as in this JavaScript example:
Line 71 ⟶ 72:
f = new Function('x', 'return x + 2');
</syntaxhighlight>
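A comparable effect can be sketched in Python, assuming the built-in <code>exec</code> is used to create functions from source text. The names <code>namespace</code>, <code>f</code> and <code>old_f</code> are chosen for this illustration only.

```python
namespace = {}

# Create a function from source text at run time.
exec("def f(x):\n    return x + 1", namespace)
f = namespace["f"]
old_f = f          # keep a reference to the first function object
first = f(1)       # uses the first definition

# "Modify" f by generating a replacement and rebinding the name.
exec("def f(x):\n    return x + 2", namespace)
f = namespace["f"]
second = f(1)      # uses the replacement

# The original function object is unchanged; only the name was rebound.
still_old = old_f(1)
```

Nothing in the first function is overwritten: <code>old_f</code> still behaves as before, which is why this is better described as rebinding than as mutation of existing code.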
[[Lisp
The Push programming language is a [[genetic programming]] system that is explicitly designed for creating self-modifying programs. While not a high-level language, it is not as low-level as assembly language.<ref name="Push"/>
Line 89 ⟶ 90:
===Control tables===
[[Control table]] [[interpreter (computing)|interpreter]]s can be considered to be, in one sense, 'self-modified' by data values extracted from the table entries (rather than specifically [[hand coding|hand coded]] in [[
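As a minimal sketch of the idea, assuming a table that maps symbols to operations, the interpreter below never changes, yet its behavior is determined entirely by the table entries. <code>CONTROL_TABLE</code>, <code>interpret</code> and the step format are hypothetical names used only for this illustration.

```python
import operator

# Hypothetical control table: each symbol selects the operation to apply.
CONTROL_TABLE = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def interpret(table, steps, start):
    """Apply each (symbol, operand) step to a running value via the table."""
    value = start
    for symbol, operand in steps:
        value = table[symbol](value, operand)
    return value


steps = [("+", 3), ("*", 4)]
result = interpret(CONTROL_TABLE, steps, 1)      # (1 + 3) * 4

# Changing a table entry changes behavior without touching interpret().
patched_table = dict(CONTROL_TABLE)
patched_table["+"] = operator.sub
patched_result = interpret(patched_table, steps, 1)  # (1 - 3) * 4
```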
===Channel programs===
Some IBM [[access method]]s traditionally used self-modifying [[Channel I/O#Channel program|channel
==History==
The [[IBM SSEC]], demonstrated in January 1948, had the ability to modify its instructions or otherwise treat them exactly like data. However, the capability was rarely used in practice.<ref name="Bashe-Buchholz-Hawkins-Ingram-Rochester_1981"/> In the early days of computers, self-modifying code was often used to reduce use of limited memory, or improve performance, or both. It was also sometimes used to implement subroutine calls and returns when the instruction set only provided simple branching or skipping instructions to vary the [[control flow]].<ref name="Miller_2006"/><ref name="Wenzl-Merzdovnik-Ullrich-Weippl_2019"/> This use is still relevant in certain ultra-[[Reduced instruction set computer|RISC]] architectures, at least theoretically; see for example [[one
==Usage==
Self-modifying code can be used for various purposes:
* Semi-automatic [[Program optimization
* Dynamic in-place code optimization for speed depending on load environment.<ref name="Caldera_1997_DOSSRC"/><ref name="Paul_1997_OD-A3"/><ref group="nb" name="NB_DR-DOS_386"/>
* [[Run time (program lifecycle phase)|Run-time]] code generation, or specialization of an algorithm at run time or load time (which is popular, for example, in the domain of real-time graphics), such as a general sort utility – preparing code to perform the key comparison described in a specific invocation.
* Altering of [[inline function|inlined]] state of an [[object (computer science)|object]], or simulating the high-level construction of [[closure (computer
* Patching of [[subroutine]] ([[pointer (computer programming)|pointer]]) address calling, usually as performed at load/initialization time of [[
* Evolutionary computing systems such as [[neuroevolution]], [[genetic programming]] and other [[evolutionary algorithm]]s.
* Hiding of code to prevent [[reverse engineering]] (by use of a [[disassembler]] or [[debugger]]) or to evade detection by virus/spyware scanning software and the like.
* Filling 100% of memory (in some architectures) with a rolling pattern of repeating [[
* [[Executable compression|Compressing]] code to be decompressed and executed at runtime, e.g., when memory or disk space is limited.<ref name="Caldera_1997_DOSSRC"/><ref name="Paul_1997_OD-A3"/>
* Some very limited [[instruction set architecture|instruction set]]s leave no option but to use self-modifying code to perform certain functions. For example, a [[one
* [[Booting]]. Early [[microcomputer]]s often used self-modifying code in their bootloaders. Since the bootloader was keyed in via the front panel at every power-on, it did not matter if the [[bootloader]] modified itself. However, even today many bootstrap loaders are [[self-relocating]], and a few are even self-modifying.<ref group="nb" name="NB_DR-DOS_707"/>
* Altering instructions for fault-tolerance.<ref name="Ortiz_2015"/>
Line 139 ⟶ 140:
===Specialization===
Suppose a set of statistics such as average, extrema, location of extrema, standard deviation, etc. are to be calculated for some large data set. In a general situation, there may be an option of associating weights with the data, so each x<sub>i</sub> is associated with a w<sub>i</sub>. Rather than test for the presence of weights at every index value, there could be two versions of the calculation, one for use with weights and one without, with a single test at the start. Now consider a further option, that each value may have associated with it a
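The weighted and unweighted variants can be produced by generating code once, rather than testing for weights inside the loop. The following is a hedged sketch that covers only the mean and only the weights option. <code>make_mean</code> and the generated <code>mean</code> function are names invented for this illustration.

```python
def make_mean(weighted):
    """Generate a mean function specialized for the weighted or unweighted case."""
    if weighted:
        src = (
            "def mean(xs, ws):\n"
            "    total = sum(x * w for x, w in zip(xs, ws))\n"
            "    return total / sum(ws)\n"
        )
    else:
        src = (
            "def mean(xs):\n"
            "    return sum(xs) / len(xs)\n"
        )
    namespace = {}
    exec(src, namespace)  # compile the specialized source text once
    return namespace["mean"]


# The single test for weights happens here, not inside the calculation.
plain_mean = make_mean(weighted=False)
weighted_mean = make_mean(weighted=True)

plain_result = plain_mean([1, 2, 3])
weighted_result = weighted_mean([1, 2, 3], [1, 1, 2])
```

Each additional option doubles the number of variants that would otherwise have to be written by hand, which is the motivation for generating them on demand.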
===Use as camouflage===
Self-modifying code is more complex to analyze than standard code and can therefore be used as a protection against [[reverse engineering]] and [[software cracking]]. Self-modifying code was used to hide copy protection instructions in 1980s disk-based programs for
Self-modifying code is also sometimes used by programs that do not want to reveal their presence, such as [[computer virus]]es and some [[shellcode]]s. Viruses and shellcodes that use self-modifying code mostly do this in combination with [[polymorphic code]]. Modifying a piece of running code is also used in certain attacks, such as [[buffer overflow]]s.
===Self-referential machine learning systems===
Traditional [[machine learning]] systems have a fixed, pre-programmed learning [[algorithm]] to adjust their [[parameter (computer programming)|parameter]]s. However, since the 1980s [[Jürgen Schmidhuber]] has published several self-modifying systems with the ability to change their own learning algorithm. They avoid the danger of catastrophic self-rewrites by making sure that self-modifications will survive only if they are useful according to a user-given [[fitness function|fitness]], [[error function|error]] or [[reward function|reward]] function.<ref name="Schmidhuber"/>
===Operating systems===
The [[Linux kernel]] notably makes wide use of self-modifying code; it does so to be able to distribute a single binary image for each major architecture (e.g. [[IA-32]], [[x86-64]], 32-bit [[ARM architecture family|ARM]], [[ARM64]]...) while adapting the kernel code in memory during boot depending on the specific CPU model detected, e.g. to be able to take advantage of new CPU instructions or to work around hardware bugs.<ref name="linux_self_modifying_Paltsev">{{cite web |
Regardless, at a [[meta-level]], programs can still modify their own behavior by changing data stored elsewhere (see [[metaprogramming]]) or via use of [[type polymorphism|polymorphism]].
Line 182 ⟶ 183:
Self-modifying code is harder to read and maintain because the instructions in the source program listing are not necessarily the instructions that will be executed. Self-modification that consists of substitution of [[function pointer]]s might not be as cryptic, if it is clear that the names of functions to be called are placeholders for functions to be identified later.
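A minimal sketch of such a substitution, assuming a record reader with a one-time setup step, is shown here. The <code>Reader</code> class and its method names are hypothetical and serve only to illustrate the placeholder idea.

```python
class Reader:
    """Hypothetical record reader whose first call performs one-time setup."""

    def __init__(self):
        self.opened = 0
        # 'read' is a placeholder that is replaced after the first call.
        self.read = self._first_read

    def _first_read(self, record):
        self.opened += 1                # one-time setup work
        self.read = self._steady_read   # substitute the function reference
        return self._steady_read(record)

    def _steady_read(self, record):
        return record.upper()


reader = Reader()
results = [reader.read(r) for r in ["a", "b", "c"]]
```

Because the name <code>read</code> is visibly a slot that gets reassigned, a reader of the source can see both possible targets, which is less cryptic than an overwritten instruction.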
Self-modifying code can be rewritten as code that tests a [[flag (
Self-modifying code conflicts with authentication of the code and may require exceptions to policies requiring that all code running on a system be signed.
Line 188 ⟶ 189:
Modified code must be stored separately from its original form, conflicting with memory management solutions that normally discard the code in RAM and reload it from the executable file as needed.
On modern processors with an [[instruction pipelining|instruction pipeline]], code that modifies itself frequently may run more slowly if it modifies instructions that the processor has already read from memory into the pipeline.
Self-modifying code cannot be used at all in some environments, such as the following:
Line 202 ⟶ 203:
* [[AARD code]]
* [[Algorithmic efficiency]]
* [[Data as code]]
* [[eval]] statement
* [[IBM 1130#Code modification|IBM 1130]] (Example)
Line 215 ⟶ 217:
* [[Self-modifying computer virus]]
* [[Self-hosting (compilers)|Self-hosting]]
* [[Synthetic programming]]
* [[Compiler bootstrapping]]
* [[Patchable microcode]]
Line 227 ⟶ 230:
==References==
{{Reflist|refs=
<ref name="Massalin_1992_Synthesis">{{Cite thesis |author-first1=Calton |author-last1=Pu |author-link1=Calton Pu |author-first2=Henry |author-last2=Massalin |author-link2=Henry Massalin |author-first3=John |author-last3=Ioannidis |degree=PhD |title=Synthesis: An Efficient Implementation of Fundamental Operating System Services |publisher=Department of Computer Sciences, [[Columbia University]] |location=New York
<ref name="Henson_2008">{{cite news |title=KHB: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services |author-first=Valerie |author-last=Henson |author-link=Valerie Henson |date=2008-02-20 |work=LWN.net |url=https://lwn.net/Articles/270081/ |access-date=2022-05-19 |url-status=live |archive-url=https://web.archive.org/web/20210817175159/https://lwn.net/Articles/270081/ |archive-date=2021-08-17}}</ref>
<ref name="Haeberli_1994_GraficaObscura">{{cite web |author-first1=Paul |author-last1=Haeberli |author-link1=Paul Haeberli |author-first2=Bruce |author-last2=Karsh |title=Io Noi Boccioni - Background on Futurist Programming |work=Grafica Obscura |date=1994-02-03 |url=https://www.graficaobscura.com/future/index.html |access-date=2023-04-25}}</ref>
Line 243 ⟶ 246:
<ref name="Caldera_1997_DOSSRC">{{cite web |title=Caldera OpenDOS Machine Readable Source Kit (M.R.S) 7.01 |publisher=[[Caldera (company)|Caldera, Inc.]] |date=1997-05-01 |url=https://archive.sundby.com/retro/DR-DOS/dossrc.zip |access-date=2022-01-02 |url-status=dead |archive-url=https://web.archive.org/web/20210807095409/https://archive.sundby.com/retro/DR-DOS/dossrc.zip |archive-date=2021-08-07}} [https://web.archive.org/web/20220102102656/https://archive.sundby.com/retro/OpenDOS/OPENDOS_7.01_CODE.ZIP]</ref>
<ref name="Paul_1997_OD-A3">{{cite web |author-first=Matthias R. |author-last=Paul |title=Caldera OpenDOS 7.01/7.02 Update Alpha 3 IBMBIO.COM README.TXT |url=http://www.uni-bonn.de/~uzs180/download/ibmbioa3.zip |date=1997-10-02 |access-date=2009-03-29 |url-status=dead |archive-url=https://web.archive.org/web/20031004074600/http://www-student.informatik.uni-bonn.de/~frinke/ibmbioa3.zip |archive-date=2003-10-04}} [https://web.archive.org/web/20181225154705/http://mirror.macintosharchive.org/max1zzz.co.uk/+Windows%20&%20DOS/DOS/System/Novell/Support/Bins/Op702src.zip<!-- Op702src.zip is an unofficial renamed distribution of the ibmbioa3.zip file -->]</ref>
<ref name="HP9100A_1998">{{cite web |title=HP 9100A/B |date=1998 |work=MoHPC - The Museum of HP Calculators |at=Overlapped Data and Program Memory / Self-Modifying Code|url=https://www.hpmuseum.org/hp9100.htm |access-date=2023-09-23 |url-status=live |archive-url=https://web.archive.org/web/20230923125424/https://www.hpmuseum.org/hp9100.htm |archive-date=2023-09-23}}</ref>
}}
==Further reading==
* {{cite web |title=GCR decoding on the fly |author-first=Linus |author-last=Åkesson |date=2013-03-31 |url=https://www.linusakesson.net/programming/gcr-decoding/index.php |access-date=2017-03-21 |url-status=live |archive-url=https://web.archive.org/web/20170321014657/https://www.linusakesson.net/programming/gcr-decoding/index.php |archive-date=2017-03-21}}
* {{cite book |title=Eine Bibliothek für Selbstmodifikationen zur Laufzeit in Java |language=de |trans-title=A library for self-modifications at runtime in Java |author-first=Christian Felix |author-last=Bürckert |date=2012-03-20 |type=Thesis |publisher=[[Universität des Saarlandes]], Naturwissenschaftlich-Technische Fakultät I, Fachrichtung Informatik |url=https://
==External links==