Last Updated: 2022-03-05 09:14:57 UTC
by Didier Stevens (Version: 1)
A colleague asked whether it was possible to use oledump.py to search through a set of malicious documents and filter out all streams that have identical VBA source code.
Although oledump.py only operates on one document at a time, it is possible to achieve the desired result with some scripting.
oledump.py has an option to calculate extra data for each stream inside an OLE file: option -E (extra).
When you run oledump.py on a Word document with VBA code, without any options, you get output like this:
To add a column with the hash of the data inside each stream, you can use option -E, like this:
(If you would rather not use MD5, other hashes are available, such as SHA256.)
For the macro streams, this gives us the hash of the complete stream: the compiled code and the compressed VBA source code. What we actually want is the hash of the decompressed VBA source code. This can be achieved by adding option -v to decompress the VBA code:
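To illustrate why the hash of the complete stream is not a good fit for this comparison, here is a minimal Python sketch. It is not how oledump.py works internally: the stream layout is simplified, the `fake_stream` and `extract_source` helpers are hypothetical, and zlib is only a stand-in for the real VBA compression. It assumes two streams that carry the same source code but different compiled code.

```python
import hashlib
import zlib

# Made-up VBA source, shared by both fake streams.
SOURCE = b'Sub AutoOpen()\n    MsgBox "hello"\nEnd Sub\n'
COMPILED_LENGTH = 15  # length of the fake compiled part used below


def fake_stream(compiled_part, source_code):
    """Build a simplified stand-in for a macro stream.

    Assumption: compiled code first, compressed source after it.
    zlib replaces the real VBA compression for illustration only.
    """
    return compiled_part + zlib.compress(source_code)


def extract_source(stream, compiled_length):
    """Recover the source code from a fake stream built by fake_stream."""
    return zlib.decompress(stream[compiled_length:])


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


stream_a = fake_stream(b"compiled-code-A", SOURCE)
stream_b = fake_stream(b"compiled-code-B", SOURCE)

# Hashing the complete streams: the differing compiled part changes the hash.
whole_stream_match = md5_hex(stream_a) == md5_hex(stream_b)

# Hashing only the decompressed source: identical code gives identical hashes.
source_match = (
    md5_hex(extract_source(stream_a, COMPILED_LENGTH))
    == md5_hex(extract_source(stream_b, COMPILED_LENGTH))
)

print("whole streams match:", whole_stream_match)
print("decompressed sources match:", source_match)
```

In this toy setup, only the hash of the decompressed source identifies the two streams as containing the same code.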
To get only the hash value, and nothing more, use prefix ! for option -E, like this:
But we do lose some interesting information here, namely the indicator, which tells us which streams are macro streams and which are not.
We can just add this indicator, like this (I'm separating the fields with a comma, to produce a CSV file):
To keep only macro streams, grep for lines starting with letter m, like this:
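If grep is not available, the same filter can be sketched in a few lines of Python. This is only an illustration: the function name `keep_macro_lines` and the sample lines are made up, and it assumes that the indicator is the first comma-separated field of each line. Accepting both m and M is also an assumption; pass `indicators=("m",)` for a strict match on lowercase m.

```python
def keep_macro_lines(lines, indicators=("m", "M")):
    """Return only the CSV lines whose first field is a macro indicator."""
    kept = []
    for line in lines:
        # The indicator is assumed to be the first comma-separated field.
        indicator = line.split(",", 1)[0].strip()
        if indicator in indicators:
            kept.append(line)
    return kept


# Made-up sample lines in "indicator,hash" form (placeholder hash values).
sample = [
    " ,hash-of-a-non-macro-stream",
    "M,hash-of-a-macro-stream",
    "m,hash-of-another-macro-stream",
]

macro_lines = keep_macro_lines(sample)
for line in macro_lines:
    print(line)
```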
It is also possible to add the stream name:
You can consult oledump's embedded man page to find out which fields are available:
Do this for every document, and then compute statistics on the collected hashes to find out which ones are unique.
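The statistics step could look like the following Python sketch. It assumes that the per-document output has already been collected into rows of the form (document, indicator, hash, stream name); the document column is not produced by oledump.py itself and would have to be added by your own script. The function names and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_by_hash(rows):
    """Map each macro-stream hash to the (document, stream) pairs that have it."""
    groups = defaultdict(list)
    for document, indicator, digest, stream in rows:
        # Keep macro streams only; the indicator is compared case-insensitively.
        if indicator.strip().lower() == "m":
            groups[digest].append((document, stream))
    return dict(groups)


def split_unique_and_shared(groups):
    """Separate hashes seen exactly once from hashes seen more than once."""
    unique = {h: where for h, where in groups.items() if len(where) == 1}
    shared = {h: where for h, where in groups.items() if len(where) > 1}
    return unique, shared


# Made-up rows with placeholder hash values, for illustration only.
rows = [
    ("doc1.vir", "M", "hash-1", "Macros/VBA/ThisDocument"),
    ("doc1.vir", " ", "hash-9", "WordDocument"),
    ("doc2.vir", "M", "hash-1", "Macros/VBA/ThisDocument"),
    ("doc3.vir", "M", "hash-2", "Macros/VBA/Module1"),
]

unique, shared = split_unique_and_shared(group_by_hash(rows))
print("unique hashes:", sorted(unique))
print("shared hashes:", sorted(shared))
```

Hashes in `shared` point to streams with identical VBA source code across documents, while hashes in `unique` occur only once in the set.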