Peeking into .msg files

Published: 2017-10-15
Last Updated: 2017-10-15 11:18:46 UTC
by Didier Stevens (Version: 1)

Readers often submit malware samples, and sometimes the complete email with its attachment, for example exported from Outlook as a .msg file.

Did you know that .msg files use the Compound File Binary Format (what I like to call OLE files), and can be analysed with oledump?
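If you want to confirm that a file really is a Compound File before handing it to oledump, checking the first 8 bytes is usually enough. This is a minimal sketch using only the Python standard library; the function names are mine and are not part of oledump.

```python
# Magic bytes found at offset 0 of every Compound File Binary (OLE) file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_cfb(data: bytes) -> bool:
    """Return True if the data starts with the CFB header signature."""
    return data[:8] == CFB_SIGNATURE


def file_looks_like_cfb(path: str) -> bool:
    """Read the first 8 bytes of a file (path supplied by the caller) and check them."""
    with open(path, "rb") as handle:
        return looks_like_cfb(handle.read(8))
```

A .msg file exported from Outlook should pass this check, just like a .doc or .xls file would.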

Reader Carlos Almeida submitted a .msg file with malicious .rar attachment.

I'm not that familiar with .msg file intricacies, but by looking at the stream names and sizes, I can often find what I'm looking for:

Stream 53 seems to contain the message:

From this hex-ascii dump, you can probably guess that the message is stored in UNICODE format. We can use option -t (translate) of oledump to decode it as UTF-16:
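The telltale sign in such a dump is a zero byte after every ASCII character, which is how little-endian UTF-16 stores plain English text. Here is a minimal Python sketch of the same decoding step; the sample bytes are a hypothetical stand-in, not the content of the submitted message.

```python
def decode_utf16_stream(raw: bytes) -> str:
    """Decode stream bytes as little-endian UTF-16, replacing undecodable data."""
    return raw.decode("utf-16-le", errors="replace")


# Hypothetical stand-in for the stream content: each character takes 2 bytes.
sample = "Hello".encode("utf-16-le")
print(sample)                       # b'H\x00e\x00l\x00l\x00o\x00'
print(decode_utf16_stream(sample))  # Hello
```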

Stream 43 contains the headers. I don't want to disclose private information like our reader's email address, so I grepped for some headers that I can disclose:

The Subject header is encoded according to RFC1342 because the subject contains non-ASCII characters. It decodes to this:

These are Chinese characters that seem to mean the same as FW: (forwarding).
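Encoded headers like this can also be decoded with Python's standard email package. The sketch below assumes Python 3. The sample value is built on the spot from two Chinese characters commonly used for "forward", purely as an illustration; it is not the subject of the submitted email.

```python
import base64
from email.header import decode_header, make_header


def decode_subject(raw_value: str) -> str:
    """Decode a header value that may contain MIME encoded-words."""
    return str(make_header(decode_header(raw_value)))


# Hypothetical example: build an encoded-word (UTF-8, base64) and decode it again.
ORIGINAL = "\u8f6c\u53d1"
SAMPLE = "=?UTF-8?B?" + base64.b64encode(ORIGINAL.encode("utf-8")).decode("ascii") + "?="

print(SAMPLE)
print(ascii(decode_subject(SAMPLE)))  # escaped so it prints on any console
```

The encoded-word names its own character set and transfer encoding (B for base64, Q for quoted-printable), so the decoder needs no extra hints.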

Stream 3 contains the attachment:

You can see it's a RAR file.
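RAR archives are easy to recognize because they start with the ASCII text Rar! followed by a few fixed bytes. Here is a minimal sketch of that check in Python; the function name is mine.

```python
# Signatures at offset 0 of a RAR archive.
RAR4_SIGNATURE = b"Rar!\x1a\x07\x00"      # RAR 1.5 up to 4.x
RAR5_SIGNATURE = b"Rar!\x1a\x07\x01\x00"  # RAR 5.0 and later


def rar_version(data: bytes):
    """Return 5 or 4 for the matching RAR signature, or None if neither matches."""
    if data.startswith(RAR5_SIGNATURE):
        return 5
    if data.startswith(RAR4_SIGNATURE):
        return 4
    return None
```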

I use 7zip to look into it. It should be possible to do this without writing the file to disk, by just piping the data into 7zip (options -si and -so can help with piping). Unfortunately, I got errors when I tried this, and resigned myself to saving it to disk:

It contains an unusually large .bat file:

It's actually a PE file:
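A real .bat file is plain text, so header bytes that say otherwise are a strong hint. A PE file starts with MZ, and the 4-byte little-endian value at offset 0x3C points to the PE\0\0 signature. The Python sketch below checks exactly that; the fake header it builds is only a hypothetical test input, not the sample from this diary entry.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Return True if data has an MZ header whose e_lfanew field points to 'PE\\0\\0'."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: offset of the PE signature, stored at 0x3C as a 32-bit LE integer.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


# Hypothetical minimal input: MZ header with the PE signature placed at 0x80.
fake = bytearray(0x84)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\x00\x00"

print(looks_like_pe(bytes(fake)))      # True
print(looks_like_pe(b"@echo off\r\n"))  # False
```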

This looks to be a VB6 executable (judging by the PEiD signature). I should dig up my VB6 decompiler and try to take a closer look.

Of course, it's malware.


Didier Stevens
Microsoft MVP Consumer Security

Keywords: email malware msg rar
5 comment(s)


RFC1342 is obsolete; RFC2047 is the current RFC. Unfortunately, Python 2.7 decoding with make_header(decode_header(...)).__unicode__() isn't always correct. Python 3.6 is better, but I've still had some issues.
If I had multiple MSG files, could I use oledump to automatically extract attachments from them, without necessarily knowing in advance which stream is the attachment stream?
:) Thank you, Didier Stevens, for all your great work,
and thank you also to your great SANS team.
I learn a lot.
Carlos Almeida
You should be able to figure out which parts of a .msg are the attachment and what types they are. The substrings in the stream names have specific meanings.

The prefix indicates the part type:

'3701': 'Attachment data',
'3703': 'Attachment extension',
'3704': 'Attachment short filename',
'3707': 'Attachment long filename',
'370E': 'Attachment mime tag',
'3712': 'Attachment ID (uncertain)'
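This table could be turned into a small lookup to find the attachment data stream in any .msg file. The sketch below is Python and rests on one assumption of mine: that the stream names end in `__substg1.0_` followed by eight hex digits, of which the first four are the prefix listed above and the last four are the property type. The sample names are hypothetical, and the function names are not part of oledump.

```python
import re

# Prefix table taken from the list above.
ATTACHMENT_PROPERTIES = {
    "3701": "Attachment data",
    "3703": "Attachment extension",
    "3704": "Attachment short filename",
    "3707": "Attachment long filename",
    "370E": "Attachment mime tag",
    "3712": "Attachment ID (uncertain)",
}

# Assumed layout: "__substg1.0_" + 4 hex digits (prefix) + 4 hex digits (type).
SUBSTG_PATTERN = re.compile(r"__substg1\.0_([0-9A-Fa-f]{4})([0-9A-Fa-f]{4})$")


def describe_stream(stream_name: str):
    """Return the attachment part type for a stream name, or None if unknown."""
    match = SUBSTG_PATTERN.search(stream_name)
    if match is None:
        return None
    return ATTACHMENT_PROPERTIES.get(match.group(1).upper())


def find_attachment_data(stream_names):
    """Return the names of all streams that hold attachment data."""
    return [name for name in stream_names
            if describe_stream(name) == "Attachment data"]


# Hypothetical stream names in the style of a .msg file.
names = [
    "__attach_version1.0_#00000000/__substg1.0_37010102",
    "__attach_version1.0_#00000000/__substg1.0_3704001F",
    "__substg1.0_0037001F",
]
print(find_attachment_data(names))
```

Fed with the stream names of each file, a loop like this would pick out the attachment streams without knowing their numbers in advance.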
I missed this back when it came out. I'm surprised that Microsoft didn't come out with an .MSGX format using ZIP like the other new Office formats. There must be many unexplored attack scenarios using these files, which I assure you are used heavily in the real world.
