Last Updated: 2017-06-05 22:01:23 UTC
by Didier Stevens (Version: 1)
Malware authors often encode their malicious payloads to avoid detection and make analysis more difficult.
I regularly see payloads encoded with the XOR function. Often, the authors use a sequence of bytes as the encoding key. For example, let's take Password as the encoding key. Then the first byte of the payload is XORed with the first byte of the key (P), the second byte of the payload is XORed with the second byte of the key (a), and so on until all bytes of the key have been used. Then we start again with the first byte of the key: the ninth byte of the payload is XORed with the first byte of the key (P), and so on.
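To make the scheme concrete, here is a minimal sketch in Python of repeating-key XOR encoding. The function name xor_with_key and the sample payload bytes are my own illustration, not taken from any particular malware sample.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, restarting the key when it runs out."""
    if not key:
        raise ValueError("key must not be empty")
    # cycle() repeats the key bytes forever; zip() stops at the end of data.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"Password"
# Illustrative payload: the first ten bytes of a typical PE file (assumption).
payload = b"MZ\x90\x00\x03\x00\x00\x00\x04\x00"

encoded = xor_with_key(payload, key)
decoded = xor_with_key(encoded, key)  # XOR is its own inverse

print(encoded.hex())
print(decoded == payload)
```

Because XORing twice with the same key gives back the original byte, the same function both encodes and decodes.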
Let's see what this gives with a Windows executable (a PE file), like this one:
The XOR function has some interesting properties for us analysts. XOR a byte with 0x00 (zero), and you get the same byte: XOR with 0x00 is the identity function (f(x) = x).
Since a normal PE file has many sequences of 0x00 bytes, an XOR encoded PE file will contain the encoding key, like here:
So just by opening an XOR-encoded PE file with a binary editor, we can see the repeating key, provided that the key is shorter than the sequences of 0x00 bytes.
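A small Python sketch shows why this works. The buffer below is hypothetical (four header bytes followed by 32 zero bytes), and xor_with_key is an illustrative helper of my own, not a function from an existing tool.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR (illustrative helper)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


key = b"Password"
# Hypothetical cleartext: 4 header bytes, then a run of 32 zero bytes.
cleartext = b"MZ\x90\x00" + bytes(32)

encoded = xor_with_key(cleartext, key)

# The zero run starts at offset 4, so the key shows up starting at its
# fifth byte ("w") and then repeats in full.
print(encoded[4:])  # b'wordPasswordPasswordPasswordPass'
```

Note that the key can appear rotated: where it starts in the zero run depends on the offset of that run modulo the key length.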
Second interesting property of the XOR function: if you XOR the original file (cleartext) with the encoded file (ciphertext), you get the key (or to be more precise, the keystream).
Let's take another example. We know that in many PE files, you can find the string "This program cannot be run in DOS mode." (or something similar) in the DOS stub that follows the MZ header. Here is this encoded string in the encoded PE file:
If we XOR this encoded string with the unencoded string, we obtain the key:
So if we have the encoded file and a part of the unencoded file, we can also recover the key, provided again that the key is shorter than the unencoded text, and that we know where to line up the encoded and unencoded text.
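This known-plaintext recovery can be sketched in a few lines of Python. Everything here is an assumption for illustration: the sample file is synthetic filler, the DOS stub message is taken in its common wording "This program cannot be run in DOS mode.", the offset 0x4E is only a typical position for it, and the helper names are mine.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR (illustrative helper)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings position by position (stops at the shorter one)."""
    return bytes(x ^ y for x, y in zip(a, b))


known = b"This program cannot be run in DOS mode."
offset = 0x4E  # assumed position of the known text in the file

# Synthetic "file": non-zero filler around the known text, so the key is
# not already exposed by runs of 0x00 bytes.
cleartext = b"\xaa" * offset + known + b"\xaa" * 16
encoded = xor_with_key(cleartext, b"Password")

# Line up the known text with the encoded bytes and XOR them together.
keystream = xor_bytes(encoded[offset:offset + len(known)], known)
print(keystream)
```

The recovered keystream contains the key several times over, starting at whichever key byte corresponds to the chosen offset. If the offset is wrong, the result is garbage rather than a repeating pattern, which is a useful sanity check.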
In an upcoming diary entry, I will show a tool to automate this analysis process.