Maldoc Cleaned by Anti-Virus

Published: 2022-03-21
Last Updated: 2022-03-21 08:04:48 UTC
by Didier Stevens (Version: 1)

About a month ago, I received a Twitter message for an interesting maldoc sample: 0f609e43fa76afd4e2e916acb2ab54cc8fce64750ec372f716b42f34db3da0ce.

It is a PowerPoint add-in. Taking a look with oledump.py reveals VBA code that looks malicious, but I can't find autoexecution code:

So maybe the compressed VBA source code was removed (VBA stomping). I check the compiled code inside stream 5 (5c):

And here I see no trace of compiled code, only some ASCII text: "Deleted By Kaspersky Lab AV ".

It looks like the compiled VBA code and the compressed VBA source code have been removed from stream 5 by an anti-virus program.

Let's check if the compiled code in _VBA_PROJECT (stream 3) has also been removed:

I see Auto_OpeNV right after string izhar. That's an indication that stream 5 (izhar) did indeed contain autoexecute code, but that it has been removed.

I've observed in the past that when anti-virus programs clean VBA code, they reduce the size of the stream(s) containing the VBA code.

Let's check if that's the case here. Since this is an OOXML file, I will first extract the OLE file from the ZIP container (I'm going to use the olemap tool, which handles OLE files, not OOXML files).
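For readers who would rather script this step, here is a minimal Python sketch that uses the standard zipfile module instead of zipdump.py. It returns the members of a ZIP container whose content starts with the OLE magic bytes D0 CF 11 E0 A1 B1 1A E1. The function name find_ole_members is made up for this illustration; it is not part of any of my tools.

```python
import io
import zipfile

# Magic bytes found at the start of every OLE (Compound File Binary) file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

def find_ole_members(zip_bytes: bytes) -> dict[str, bytes]:
    """Return {member name: content} for ZIP members that look like OLE files."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as container:
        for name in container.namelist():
            data = container.read(name)
            if data.startswith(OLE_MAGIC):
                found[name] = data
    return found
```

The returned content can be written to disk and then handed to a tool that expects a plain OLE file.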

Taking a look with zipdump.py:

File dsjhfsfhsjfh.c.vir is the OLE file. I extract it:

And double-check with oledump that I did indeed extract the correct file from the OOXML file (ZIP container):

It is indeed the right OLE file. Now I check the FAT of the olefile with olemap:

This File Allocation Table looks normal.

Next I check the mini FAT:

And here I see that there are 5 free mini sectors, right after a mini stream (End of Chain).

I will now try to find out if there is still data left in these free mini sectors. This is the content of stream 5:

Next I open the OLE file with a binary editor, and search for the end of stream 5 (by searching for bytes 3D 20 22 69 7A 68 00 61 72 22 0D 0A):

And there is indeed data following the end of the stream. I even see the following string: Au.toOpeN. I try to figure out how much data there is after the end of the stream, by selecting all bytes right before the first sequence of NULL bytes:

I selected 313 bytes. And this looks like compressed VBA code to me. It sits inside the free mini sectors, which is why oledump does not find this data.
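The manual selection in the binary editor can be approximated with a few lines of Python. This is only a sketch: the helper name length_until_null_run and the min_run threshold are assumptions for illustration. Compressed VBA data can contain isolated NULL bytes, so the helper looks for a run of several NULL bytes rather than the first single one.

```python
def length_until_null_run(data: bytes, start: int, min_run: int = 8) -> int:
    """Count the bytes from `start` up to the first run of `min_run` NULL bytes.

    min_run is an assumed threshold; isolated NULL bytes are skipped over.
    If no such run exists, everything up to the end of `data` is counted.
    """
    end = data.find(b"\x00" * min_run, start)
    if end == -1:
        end = len(data)
    return end - start

# Toy buffer: 4 bytes of "stream", 9 leftover bytes, then NULL padding.
sample = b"AAAA" + b"Au\x00toOpeN" + b"\x00" * 16
print(length_until_null_run(sample, start=4))  # 9
```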

Next I will modify the OLE file so that this data is again part of stream 5. For that, I need to mark the free mini sectors as being used, and I need to increase the size of stream 5 by 313 bytes.

Sectors & mini sectors are referenced inside the FAT and mini FAT tables as little-endian, 32-bit integers.

A free sector is marked as 0xFFFFFFFF.

A used sector is marked by putting the sector number of the next sector inside the FAT / mini FAT table, and if it is the last sector, by marking it with 0xFFFFFFFE.
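To make these rules concrete, here is a small Python sketch that decodes a raw FAT or mini FAT table and follows a sector chain. The table is a toy example, not the data from the sample, and the function names decode_fat and follow_chain are invented for this illustration.

```python
import struct

FREESECT = 0xFFFFFFFF    # entry value for a free (mini) sector
ENDOFCHAIN = 0xFFFFFFFE  # entry value for the last (mini) sector of a stream

def decode_fat(raw: bytes) -> list[int]:
    """Decode raw FAT or mini FAT bytes into little-endian 32-bit entries."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}I", raw[:count * 4]))

def follow_chain(entries: list[int], start: int) -> list[int]:
    """Return the sector numbers of the chain that begins at `start`."""
    chain = []
    sector = start
    while sector != ENDOFCHAIN:
        # Free markers, out-of-range values and loops all mean a broken chain.
        if sector >= len(entries) or sector in chain:
            raise ValueError(f"broken chain at entry {sector:#x}")
        chain.append(sector)
        sector = entries[sector]
    return chain

# Toy table: a 3-sector stream (0 -> 1 -> 2) followed by two free sectors.
raw = struct.pack("<5I", 1, 2, ENDOFCHAIN, FREESECT, FREESECT)
entries = decode_fat(raw)
print(follow_chain(entries, 0))                              # [0, 1, 2]
print([i for i, e in enumerate(entries) if e == FREESECT])   # [3, 4]
```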

I'm assuming that mini sector 44 is not the real end of the chain, and that it is followed by mini sectors 45, 46, 47, 48 and 49. I will change the mini FAT table of the OLE file accordingly:

First I search for the end of chain sector inside the OLE file, by searching for byte sequence 44 00 00 00 FE FF FF FF with the binary editor (remember, the integers are little-endian & 32-bit).

I mark the free mini sectors as being in use, by making the following changes to the mini FAT:

And I double-check by running olemap on the patched file:

The mini sectors are indeed no longer free.

That's one step: patch the mini FAT table.
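The mini FAT patch can be expressed as a short Python sketch. It works on a toy table and relies on the same assumption I made for the sample: the sectors that were freed directly follow the current end of the chain. The function name extend_chain is made up for this example.

```python
import struct

FREESECT = 0xFFFFFFFF    # free (mini) sector
ENDOFCHAIN = 0xFFFFFFFE  # last (mini) sector of a chain

def extend_chain(entries: list[int], last: int, extra: int) -> list[int]:
    """Return a copy of `entries` in which the chain ending at `last`
    continues through the next `extra` entries, which must all be free."""
    if entries[last] != ENDOFCHAIN:
        raise ValueError("entry is not the end of a chain")
    new_sectors = range(last + 1, last + 1 + extra)
    if any(entries[s] != FREESECT for s in new_sectors):
        raise ValueError("a following sector is not free")
    patched = list(entries)
    # Each sector now points to its successor ...
    for sector in range(last, last + extra):
        patched[sector] = sector + 1
    # ... and the last appended sector becomes the new end of chain.
    patched[last + extra] = ENDOFCHAIN
    return patched

# Toy table: chain 0 -> 1, followed by two free sectors.
entries = [1, ENDOFCHAIN, FREESECT, FREESECT]
patched = extend_chain(entries, last=1, extra=2)
print(struct.pack("<4I", *patched).hex(" "))
# 01 00 00 00 02 00 00 00 03 00 00 00 fe ff ff ff
```

Returning a patched copy rather than modifying the list in place makes it easy to compare the table before and after, much like running olemap on both files.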

What I also need to change is the size of the stream. Each stream has a header inside the OLE file, with metadata such as the size of the stream. The size of a stream is encoded as a 64-bit little-endian integer. This is documented in [MS-CFB], as a Compound File Directory Entry data structure:

As the size of stream 5 is 1196 bytes, the little-endian 64-bit representation of that number in hexadecimal is: AC 04 00 00 00 00 00 00.

I search for that byte sequence with my binary editor (010 Editor):

And again I'm lucky: there is only one hit for this sequence. Now I'm going to patch this value directly inside the header. I need to add 313 bytes to 1196: that's 1509, or E5 05 00 00 00 00 00 00 (little-endian, 64-bit integer).
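The size arithmetic can be verified with Python's struct module. The sketch below also checks that the 313 extra bytes account for exactly 5 additional mini sectors, given the 64-byte mini sector size defined in [MS-CFB]. The helper names (encode_stream_size, patch_unique, mini_sectors_needed) are hypothetical; patch_unique mimics the search-and-replace in the binary editor and refuses to patch unless there is exactly one hit.

```python
import struct

MINI_SECTOR_SIZE = 64  # bytes per mini sector, as specified in [MS-CFB]

def encode_stream_size(size: int) -> bytes:
    """Encode a stream size as a little-endian 64-bit integer."""
    return struct.pack("<Q", size)

def patch_unique(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace `old` by `new` only if `old` occurs exactly once in `data`."""
    if len(old) != len(new):
        raise ValueError("patch must not change the file length")
    if data.count(old) != 1:
        raise ValueError("expected exactly one hit")
    return data.replace(old, new)

def mini_sectors_needed(size: int) -> int:
    """Number of mini sectors a stream of `size` bytes occupies (rounded up)."""
    return -(-size // MINI_SECTOR_SIZE)

old_size = 1196
new_size = old_size + 313
print(encode_stream_size(old_size).hex(" "))  # ac 04 00 00 00 00 00 00
print(encode_stream_size(new_size).hex(" "))  # e5 05 00 00 00 00 00 00
print(mini_sectors_needed(new_size) - mini_sectors_needed(old_size))  # 5
```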

I patch the OLE file:

Let's check with oledump if the stream size has indeed increased by 313 bytes:

Stream 5 is indeed 1509 bytes long now. Let's select the compressed VBA source code (5s):

I have indeed more data now. Let's see if it can be decompressed:

It fails to decompress properly ... Although there is more output now than previously.

What is happening here is the following: the anti-virus has also made some changes to the compressed VBA code. Compressed VBA code is composed of compressed chunks, and each chunk has a header with the size of the compressed data. This size has to be fixed too.

The data structures used for compressed data are explained in document [MS-OVBA].

The compressed data is called a CompressedContainer, and consists of a signature byte followed by compressed chunks (CompressedChunk):

The signature byte is 0x01.

Each CompressedChunk consists of a CompressedHeader (2 bytes) followed by the compressed data.

The CompressedHeader has 12 bits (least significant) to encode the size (i.e., length of the CompressedData expressed in bytes minus 3) and 4 bits for flags.

Let's take another look at the actual compressed code:

SignatureByte: 01

CompressedChunkHeader: 20 B0. B are the flags, and 020 is the encoded length.

After the CompressedChunkHeader, we have 346 bytes of compressed data:

346 bytes minus 3 is 343, or 0157 hexadecimal. Adding the flags (B) gives B157 (big-endian), or 57 B1 little-endian. Thus we have to change the CompressedChunkHeader from 20 B0 to 57 B1.
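The header arithmetic can be double-checked with a short Python sketch. It splits the 2-byte, little-endian header into its 12-bit size field and 4-bit flags, and rebuilds a header from the 346-byte count used above. The function names decode_chunk_header and encode_chunk_header are invented for this illustration.

```python
import struct

def decode_chunk_header(header: bytes) -> tuple[int, int]:
    """Split a 2-byte CompressedChunkHeader into (size field, flags)."""
    (value,) = struct.unpack("<H", header)
    return value & 0x0FFF, value >> 12

def encode_chunk_header(size_field: int, flags: int) -> bytes:
    """Build a 2-byte header from a 12-bit size field and 4 flag bits."""
    if not 0 <= size_field <= 0xFFF or not 0 <= flags <= 0xF:
        raise ValueError("field out of range")
    return struct.pack("<H", (flags << 12) | size_field)

# Header left behind by the anti-virus: size field 0x020, flags 0xB.
print(decode_chunk_header(bytes.fromhex("20b0")))     # (32, 11)

# Header matching the recovered data: 346 - 3 = 343 = 0x157, flags 0xB.
print(encode_chunk_header(346 - 3, 0xB).hex(" "))     # 57 b1
```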

And then finally, we can use oledump.py to decompress the VBA code:

So by fixing the mini FAT table and fixing the size fields in 2 different headers, we were able to recover the malicious VBA code that had been cleaned by the anti-virus. The anti-virus did not actually overwrite the compressed VBA code in stream 5 (although it did overwrite the compiled code in stream 5); it just truncated the stream. By undoing this truncation, we were able to recover the original VBA source code.

3 remarks:

1) this sample was given to me around a month ago, and I'm only publishing a diary entry now, because this sample inspired me to make a CTF challenge for the Cyber Security Challenge Belgium. As the qualifiers are over now, I can publish this howto :-). The students had to recover the VBA code from a Word document I prepared. To find the flag, they had to fix the 2 headers, but not the mini FAT table. I made my CTF challenge Word file so small that the stream size reduction did not necessitate freeing mini sectors.

2) once I had reestablished the original compressed VBA data, I was able to find the original (uncleaned) maldoc on VirusTotal: ab8f0d66610dee220f744804623aaefe524dc9e18eb92100cec8beb365255c0a.

3) this is not the first time I'm looking into anti-virus cleaned maldocs: AV Cleaned Maldoc.

Didier Stevens
Senior handler
Microsoft MVP
blog.DidierStevens.com
