Last Updated: 2021-11-13 09:32:28 UTC
by Didier Stevens (Version: 1)
Reader Colin submitted a malicious document.
It's a Word document with VBA code, as we can see in oledump's report:
As streams A3 and A10 with VBA code don't look that large, I use options -s a -v to extract all VBA code with one command:
The VBA code is accessing keywords and the content of the document. Let's start with the keywords, and use zipdump.py to search which files inside the document contain the string keywords:
It's in the docProps/core.xml file:
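For readers without zipdump.py at hand, the same kind of search can be approximated with Python's standard zipfile module. This is only a sketch: the function name members_containing and the tiny in-memory ZIP are hypothetical stand-ins, not the real sample or the way zipdump.py works internally.

```python
import io
import zipfile


def members_containing(zip_source, needle: bytes):
    """Return the names of ZIP members whose raw content contains needle."""
    hits = []
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if needle in zf.read(name):
                hits.append(name)
    return hits


# Hypothetical stand-in for the submitted document (an OOXML file is a ZIP).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/core.xml", "<cp:keywords>placeholder</cp:keywords>")
    zf.writestr("word/document.xml", "<w:t>body text</w:t>")
buf.seek(0)

print(members_containing(buf, b"keywords"))
```

With a real file, a path such as a local copy of the sample would be passed instead of the in-memory buffer.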
Let's do a pretty print with my tool xmldump.py:
Let's extract the text:
This looks like a reversed path & filename. Let's reverse it with translate.py:
So the VBA code will create an HTA file in the public user folder.
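The keyword extraction and reversal steps can also be sketched in Python. The namespace below is the standard one for OOXML core properties; the function name reversed_keywords and the sample XML with its path are made up for illustration and are not the values found in the real sample.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the cp: prefix in docProps/core.xml.
CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"


def reversed_keywords(core_xml: str) -> str:
    """Read the cp:keywords element and return its text reversed."""
    root = ET.fromstring(core_xml)
    node = root.find(f"{{{CP_NS}}}keywords")
    text = node.text if node is not None and node.text else ""
    return text[::-1]


# Hypothetical core.xml with a reversed placeholder path (not the real one).
sample_core_xml = (
    f'<cp:coreProperties xmlns:cp="{CP_NS}">'
    "<cp:keywords>ath.elpmaxe\\redlof\\:C</cp:keywords>"
    "</cp:coreProperties>"
)

print(reversed_keywords(sample_core_xml))
```

The slice [::-1] is all that is needed to reverse a string in Python.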
Let's now take a look at the content of the Word document. This is stored inside file document.xml:
And my xmldump.py tool has a command to extract the content of an OOXML Word document: wordtext:
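To give an idea of what such a text extraction involves, here is a minimal Python sketch that collects the w:t text runs of each w:p paragraph. It is an assumption about the general approach, not a description of how xmldump.py implements wordtext; the function name word_text and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def word_text(document_xml: str) -> str:
    """Join the text runs of each paragraph, one paragraph per line."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        runs = [t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


# Hypothetical two-paragraph document, far simpler than the real sample.
sample_document_xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>second paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

print(word_text(sample_document_xml))
```

This sketch ignores details such as tabs, line breaks and paragraphs nested inside text boxes, which a real tool would need to handle.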
Notice that string $1 appears a lot. This could well be an obfuscation method: the original script has been interspersed with string $1, and to deobfuscate it, one has just to replace that string with an empty string. Let's try this using sed:
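The same replacement is a one-liner in Python. This is just an illustrative sketch: the function name strip_marker and the sample string are invented, and only the marker $1 comes from the document itself.

```python
def strip_marker(text: str, marker: str = "$1") -> str:
    """Remove every occurrence of the filler string from the text."""
    return text.replace(marker, "")


# Hypothetical obfuscated fragment, not taken from the real sample.
obfuscated = "<ht$1ml><sc$1ript>var a$1 = 1;</scr$1ipt></ht$1ml>"

print(strip_marker(obfuscated))
```

Since str.replace works on literal strings rather than patterns, the $ character needs no escaping here.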
This looks indeed like an HTA file: html code with scripts. And it seems to contain BASE64 code (in the beginning). Let's decode this with my tool base64dump.py:
This does not look like a script (PowerShell for example). What I've encountered before in malicious Office documents is BASE64 that decodes to a compressed script or to shellcode. But when I take a closer look at the scripts in the HTA file (I do a crude pretty print by adding a newline after each semicolon), I don't see code that can do decompression or inject shellcode into memory:
But what I do see is a string split statement using separator |||, and string reversal statements. Searching for separator |||, I see this:
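The crude pretty print mentioned here, a newline after each semicolon, could look like this in Python. The function name and the sample script are hypothetical.

```python
def break_after_semicolons(script: str) -> str:
    """Insert a newline after every semicolon to make a one-line script readable."""
    return script.replace(";", ";\n")


# Hypothetical one-line script, not taken from the real sample.
one_liner = "var a=1;var b=2;f(a,b);"

print(break_after_semicolons(one_liner))
```

This is deliberately naive: it also breaks on semicolons inside string literals, which is usually acceptable for a quick look at the code.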
So the BASE64 string that I saw, actually consists of 2 BASE64 strings. And they are reversed: look at the string after the separator |||: ==gdh...
A BASE64 string can end with = or ==, but it will never start with ==. So I need to reverse these BASE64 strings before decoding them (that's why the decoded BASE64 string we saw before doesn't make sense). I use translate.py to reverse the complete script and then base64dump.py to extract the BASE64 strings:
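The split, reverse and decode logic can be sketched in Python as follows. Only the ||| separator and the reversal come from the document; the helper names and the sample payloads are invented, and the sample blob is built on the spot so that the example is self-contained.

```python
import base64


def decode_reversed_parts(blob: str, separator: str = "|||") -> list:
    """Split the blob, reverse each part, then BASE64-decode it."""
    return [base64.b64decode(part[::-1]) for part in blob.split(separator)]


def _encode_and_reverse(text: str) -> str:
    """Helper for building test data: BASE64-encode, then reverse."""
    return base64.b64encode(text.encode()).decode()[::-1]


# Hypothetical payloads standing in for the two strings in the HTA file.
sample_blob = "|||".join(
    _encode_and_reverse(s) for s in ("first placeholder payload", "second one")
)

# A reversed BASE64 string shows its padding at the start.
print(sample_blob.split("|||")[1][:2])
print(decode_reversed_parts(sample_blob))
```

Padding at the start of a string is a quick hint that it has been reversed, although a reversed string without padding would not give this hint.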
And now we have 2 base64 strings that decode to something that looks more familiar. Let's take a closer look at item 2:
That looks like a reversed script.
Let's take a look at item 3:
That too looks like a reversed script.
So let's use translate.py once more to reverse the decoded scripts:
So these are 2 scripts: the first one downloads a file and writes it to disk as a jpg file in the public user folder. And the second script runs regsvr32 with that jpg file as argument: that jpg file must be a PE file (dll).
Unfortunately, I was not able to download the file or find it on VirusTotal.