Polymorphic code

This is an old revision of this page, as edited by Connelly (talk | contribs) at 04:38, 12 February 2004 (spelling edits). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science (or more often, in computer underground terms) polymorphic code is code that mutates while keeping the original algorithm intact.

This is sometimes used by computer viruses, shellcodes and computer worms to hide their presence. Most anti virus-software and intrusion detection systems tries to find malicious code by searching through computer files and data packets sent over a computer network. If the security software finds any pattern that corresponds to an already known virus or worm, it reacts and erases the program. It cannot, however, find the program if it constantly mutates so that it never looks the same. This is the very idea of polymorphic code.

Most often, a virus/worm that makes any attempt to hide its presence will do that by encrypting itself. However, before being executed at a remote computer, it obviously first needs to decrypt itself. In order to decrypt the virus or worm, some part of the code has to be deliverd unencrypted. Thus, while not being able to detect the actual virus or worm, the anti virus-software/intrusion detection system will still be able to detect the virus decryption engine!

However, if the decryption engine is rewritten each time before it is transfered into a new computer (in the case of a worm/shellcode) or computer file (in the case of a virus), it becomes nearly impossible for any security software to detect the presence of the malicious program.

How it works

An algorithm that uses, for example, the variables A and B but not the variable C could stay intact even if you added lots of codes that changed the content in the variable C.

The original algorithm:

Start:
GOTO Decryption_Code
Encrypted:
    ...
    lots of encrypted code!!!
    ...
Decryption_Code:
    *A = Encrypted
Loop:
    B = *A
    B = B XOR CryptoKey
    *A = B
    A = A + 1
    GOTO Loop IF NOT A = (Decryption_Code - Encrypted)
    GOTO Encrypted
 CryptoKey:
    some_random_number

The same algorithm, but with lots of unnecessary C-altering codes..

Start:
GOTO Decryption_Code
Encrypted:
    ...
    lots of encrypted code!!!
    ...
Decryption_Code:
    C = C + 1
    *A = Encrypted
Loop:
    B = *A
    C = 3214 * A
    B = B XOR CryptoKey
    *A = B
    C = 1
    C = A + B
    A = A + 1
    GOTO Loop IF NOT A = (Decryption_Code - Encrypted)
    C = C^2
    GOTO Encrypted
 CryptoKey:
    some_random_number

The code inside "Encrypted" ("lots of encrypted code!!!") could then search the code between Decryption_Code and CryptoKey and remove all the code that alters the variable C. Before the next time the encryption engine is used, it could input new unnecessary codes that alters C, or ever exchange the code in the algorithm into new code that does the same thing.

see also: self-modifying code, alphanumeric code, shellcode, software cracking, security cracking