Last Updated: 2017-12-18 07:03:28 UTC
by Didier Stevens (Version: 1)
We continue yesterday's MSG analysis.
There are several ways to look at the text contained in a Word .docx file without using MS Office.
Here we will look at the raw XML. A .docx file is a ZIP archive, and the text of the document is stored in the XML file word/document.xml inside it.
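Because a .docx file is a ZIP archive, its main XML part can also be read with Python's standard zipfile module. This is a minimal sketch, not how the tools in this diary do it; the demo builds a tiny in-memory archive with made-up content instead of using the real sample.

```python
import io
import zipfile

WORD_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def read_document_xml(source) -> str:
    """Return word/document.xml from a .docx (path or file-like object) as text."""
    with zipfile.ZipFile(source) as archive:
        return archive.read("word/document.xml").decode("utf-8")

# Demo: build a minimal in-memory ZIP that mimics the layout of a .docx.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr(
        "word/document.xml",
        f'<w:document xmlns:w="{WORD_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
        "</w:body></w:document>",
    )
demo.seek(0)

xml_text = read_document_xml(demo)
print(xml_text)
```

For a file on disk, `read_document_xml("sample.docx")` works the same way (the file name is hypothetical).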
The text of the document is contained between XML tags. Filtering out these tags, for example with sed and a regular expression, reveals the text without any formatting.
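A Python equivalent of this tag-stripping approach, assuming a typical pattern such as the one in `sed 's/<[^>]*>//g'`. The sample XML is invented. Note how the two paragraphs run together, and that entities such as `&amp;` would be left undecoded.

```python
import re

# Matches any XML tag: "<", then any characters other than ">", then ">".
TAG_PATTERN = re.compile(r"<[^>]*>")

def strip_tags(xml_text: str) -> str:
    """Remove all XML tags, keeping only the character data between them."""
    return TAG_PATTERN.sub("", xml_text)

sample = (
    "<w:body><w:p><w:r><w:t>First paragraph.</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p></w:body>"
)
print(strip_tags(sample))
# → First paragraph.Second paragraph.
```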
However, without newlines the result can be harder to read, and this method sometimes strips away information you want to see.
That is why I wrote xmldump.py, a simple Python tool that reads XML and can extract various kinds of information.
The text command (xmldump.py text) achieves the same result as sed.
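For comparison, here is a rough Python approximation of the idea behind the text command (this is not xmldump.py's source): parse the XML and concatenate all text nodes. Unlike the regular expression, a real XML parser decodes entities such as `&amp;`. The sample XML is invented.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def all_text(xml_text: str) -> str:
    """Concatenate every text node in the document, in document order."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())

sample = (
    f'<w:document xmlns:w="{W}"><w:body>'
    "<w:p><w:r><w:t>Tom &amp; Jerry</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
print(all_text(sample))
# → Tom & JerrySecond paragraph.
```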
The wordtext command is like the text command, but it looks for paragraphs (<w:p>) and inserts a newline after the text of each paragraph.
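A minimal sketch of the wordtext idea, again not xmldump.py's actual code: collect the `<w:t>` text runs of each `<w:p>` paragraph and end each paragraph with a newline. The namespace URI is the standard WordprocessingML one; the sample is invented.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
PARAGRAPH = f"{{{W}}}p"  # Clark notation for <w:p>
TEXT = f"{{{W}}}t"       # Clark notation for <w:t>

def word_text(xml_text: str) -> str:
    """Return the text of each <w:p> paragraph, followed by a newline."""
    root = ET.fromstring(xml_text)
    output = []
    for paragraph in root.iter(PARAGRAPH):
        # A paragraph's text is split over one or more runs (<w:r><w:t>).
        output.append("".join(t.text or "" for t in paragraph.iter(TEXT)))
        output.append("\n")
    return "".join(output)

sample = (
    f'<w:document xmlns:w="{W}"><w:body>'
    "<w:p><w:r><w:t>First </w:t></w:r><w:r><w:t>paragraph.</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
print(word_text(sample), end="")
```

This simplified version does not handle paragraphs nested inside other paragraphs (for example in text boxes); their text would be output twice.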
From the content of the Word document, it's clear that this is a scam.
To be thorough, I also poked around looking for exploits or feature abuse (such as DDE), but found nothing.
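One way such a DDE check could be sketched, assuming the usual storage of field codes in word/document.xml: complex fields keep their instruction in `<w:instrText>` elements between `<w:fldChar>` begin/end markers, and simple fields keep it in the `w:instr` attribute of `<w:fldSimple>`. Joining the instruction text per field also catches a keyword split across runs. This is a crude heuristic with an invented sample, not the check used in this analysis.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Field names are case-insensitive in Word; DDE and DDEAUTO set up a DDE link.
DDE_PATTERN = re.compile(r"\bDDE(?:AUTO)?\b", re.IGNORECASE)

def field_instructions(root: ET.Element) -> Iterator[str]:
    """Yield the instruction text of complex fields, then of simple fields."""
    stack = []  # one buffer per open complex field (fields can be nested)
    for el in root.iter():
        if el.tag == f"{{{W}}}fldChar":
            kind = el.get(f"{{{W}}}fldCharType")
            if kind == "begin":
                stack.append([])
            elif kind == "end" and stack:
                yield "".join(stack.pop())
        elif el.tag == f"{{{W}}}instrText" and stack:
            stack[-1].append(el.text or "")
    # Simple fields store their instruction in an attribute.
    for el in root.iter(f"{{{W}}}fldSimple"):
        yield el.get(f"{{{W}}}instr", "")

def find_dde_instructions(xml_text: str) -> list[str]:
    """Return every field instruction that mentions DDE or DDEAUTO."""
    root = ET.fromstring(xml_text)
    return [i for i in field_instructions(root) if DDE_PATTERN.search(i)]

# Invented sample: a DDEAUTO keyword split over two runs, plus a benign PAGE field.
doc = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
    "<w:r><w:instrText> DD</w:instrText></w:r>"
    "<w:r><w:instrText>EAUTO hypothetical-app topic</w:instrText></w:r>"
    '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
    '<w:fldSimple w:instr=" PAGE "/>'
    "</w:p></w:body></w:document>"
)
print(find_dde_instructions(doc))
# → [' DDEAUTO hypothetical-app topic']
```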