Self-modifying code

This is an old revision of this page, as edited by Saltine at 11:03, 2 October 2003. It may differ significantly from the current revision.

In computer programming, self-modifying code is code that alters its own instructions while it is executing. It is straightforward to write in assembly language and is also supported by some high-level language interpreters, such as those for SNOBOL4 and Lisp. It is harder to support in compiled languages, although compilers such as Clipper and Spitbol make a fair attempt at it. Batch scripts often involve self-modifying code as well.
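As a rough illustration of what an interpreter makes possible, the following Python sketch keeps the text of a function in a string, edits that text at run time, and recompiles it. Python is used here only as a stand-in for the interpreted languages named above, and the names `source`, `step`, and `namespace` are invented for the example.

```python
# The program's "code" is held as data, so the program can rewrite it.
source = "def step(x):\n    return x + 1\n"

namespace = {}
exec(source, namespace)              # compile and install the original step()
before = namespace["step"](10)

# Modify the program text, then recompile it over the old definition.
source = source.replace("x + 1", "x * 2")
exec(source, namespace)
after = namespace["step"](10)

print(before, after)                 # 11 20
```

A compiled language has no interpreter at hand to re-read edited source in this way, which is one reason the technique is harder to support there.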

Self-modifying code often runs more slowly on modern processors, which usually try to keep blocks of code in cache memory. Each time the program rewrites part of itself, the rewritten part must be loaded into the cache again, which causes a slight delay. Because such code can also be difficult to understand and maintain, its use is not recommended when a viable alternative exists.

Self-modifying code was used in the early days of computing to save memory, which was scarce. It was also used to implement subroutine calls and returns when the instruction set provided only simple branching or skipping instructions to vary the flow of control. This remains relevant, at least in theory, to certain ultra-RISC architectures; for example, one such system has a sole branching instruction with three operands: subtract-and-branch-if-negative.
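The call-and-return trick can be sketched with a toy machine simulated in Python. The machine, its opcodes (`SETRET`, `JMP`, `INC`, `HALT`), and the memory layout are all invented for this illustration and do not model any particular historical computer. The subroutine ends in an ordinary jump, and each caller overwrites that jump's operand with its own return address before entering the subroutine.

```python
# Toy machine: memory is a list of mutable instructions.
# Hypothetical opcodes, invented for this sketch:
#   ["SETRET", addr, value]  overwrite the operand of the instruction at addr
#   ["JMP", target]          jump to target
#   ["INC", None]            add 1 to the accumulator
#   ["HALT", None]           stop and return the accumulator

def run(memory):
    acc = 0
    pc = 0
    while True:
        op = memory[pc]
        name = op[0]
        if name == "HALT":
            return acc
        if name == "INC":
            acc += 1
            pc += 1
        elif name == "JMP":
            pc = op[1]
        elif name == "SETRET":
            # Self-modification: patch the operand of another instruction.
            memory[op[1]][1] = op[2]
            pc += 1
        else:
            raise ValueError(f"unknown opcode {name!r}")

RETURN_SLOT = 8  # address of the subroutine's final jump

program = [
    ["SETRET", RETURN_SLOT, 2],  # 0: first call, return to address 2
    ["JMP", 6],                  # 1: enter the subroutine
    ["SETRET", RETURN_SLOT, 4],  # 2: second call, return to address 4
    ["JMP", 6],                  # 3: enter the subroutine
    ["HALT", None],              # 4: end of main program
    ["HALT", None],              # 5: padding
    ["INC", None],               # 6: subroutine body
    ["INC", None],               # 7
    ["JMP", 0],                  # 8: return jump; operand is patched by callers
]

result = run(program)
print(result)                    # 4: the two-step subroutine ran twice
```

No return instruction or stack is needed here; the price is that the subroutine's own code changes on every call, so it cannot call itself recursively.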

Self-modifying code was used to hide copy-protection instructions in 1980s MS-DOS games. The floppy disk access instruction 'int 0x13' would not appear in the executable file's image; instead, it was written into the program's memory image after execution began. Self-modifying code is also sometimes used by programs that try to conceal their presence, such as computer viruses and some shellcode.
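The hiding technique can be mimicked with plain byte buffers in Python. This is only a simulation under stated assumptions: nothing is executed, the eight-byte "program" and the offset `PATCH_OFFSET` are hypothetical, and the only real details are the x86 encodings of 'int 0x13' (bytes CD 13) and of the one-byte no-op (90).

```python
NOP = 0x90                      # x86 one-byte no-op, used as a placeholder
INT_13H = bytes([0xCD, 0x13])   # x86 encoding of "int 0x13"
PATCH_OFFSET = 4                # hypothetical position of the hidden instruction

def build_disk_image():
    """Return the program bytes as they would sit in the file on disk."""
    return bytes([NOP] * 8)     # placeholder bytes only; no int 0x13 present

def load_and_patch(disk_image):
    """Copy the image into 'memory' and write the hidden instruction into it."""
    memory = bytearray(disk_image)
    memory[PATCH_OFFSET:PATCH_OFFSET + len(INT_13H)] = INT_13H
    return memory

disk_image = build_disk_image()
memory = load_and_patch(disk_image)

print(INT_13H in disk_image)    # False: a scan of the file finds nothing
print(INT_13H in memory)        # True: the instruction exists only at run time
```

Someone searching the file on disk for the disk-access instruction would therefore not find it, which is the effect described above.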

Example algorithm (theoretical!)

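Start:
    GOTO Decryption_Code
Encrypted:
    ...
    lots of encrypted code!!!
    ...
Decryption_Code:
    A = Encrypted                ; A points to the first encrypted word
Loop:
    B = *A                       ; fetch one encrypted word
    B = B XOR *CryptoKey         ; decrypt it with the stored key
    *A = B                       ; write it back over the original
    A = A + 1
    GOTO Loop IF NOT A = Decryption_Code   ; stop at the end of the encrypted block
    GOTO Encrypted               ; run the freshly decrypted code
CryptoKey:
    some_random_number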

This "program" will decrypt a part of itself and then jump to it.

(*A means "the location to which A points")
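The decrypt-then-jump idea above can be tried out in Python, with a `bytearray` standing in for memory and `exec` standing in for the final jump. This is a sketch under stated assumptions, not a translation of any real program: the one-byte key `KEY`, the payload text, and the helper `decrypt_in_place` are all invented for the example.

```python
KEY = 0x5A  # hypothetical one-byte key, playing the role of CryptoKey

# Prepare the "image": the payload is stored only in XOR-encrypted form.
payload_source = b"result = 6 * 7"
image = bytearray(b ^ KEY for b in payload_source)
encrypted_snapshot = bytes(image)   # kept only to show the plaintext is absent

def decrypt_in_place(buf, start, end, key):
    """XOR every byte in buf[start:end] with key, overwriting the original."""
    a = start                       # plays the role of pointer A
    while a != end:                 # stop at the end of the encrypted block
        buf[a] ^= key               # B = *A; B = B XOR key; *A = B
        a += 1

decrypt_in_place(image, 0, len(image), KEY)

# "GOTO Encrypted": run the code that has just been decrypted.
namespace = {}
exec(bytes(image).decode("ascii"), namespace)

print(b"result" in encrypted_snapshot)  # False
print(namespace["result"])              # 42
```

Because XOR is its own inverse, running the same loop a second time with the same key would re-encrypt the block.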