Getting some intelligence from malspam

Published: 2017-09-18
Last Updated: 2017-09-18 06:05:50 UTC
by Xavier Mertens (Version: 1)

Many of us receive a lot of malspam every day. By "malspam", I mean spam messages that contain a malicious document. This is one of the classic infection vectors today, and aggressive campaigns are launched every week. Most of these messages are blocked by modern antivirus or anti-spam solutions, but the files can still give us some intelligence about the topics attackers use to fool their victims. By checking the names of malicious files (often .rar, .zip or .7z archives), we find classic words like ‘invoice’, ‘reminder’, ‘urgent’, etc. From an attacker's perspective, choosing the right name increases the chances that the target will open the file, out of business need or just... curiosity!

I collected files attached to malicious emails and tried to categorize them to determine the most common names. To achieve this, I created a list of simple regular expressions based on classic strings and assigned a category to each of them. Both are stored in a CSV file:


(Note that the first two lines have been obfuscated because they relate to highly targeted attacks against an organization)
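To illustrate the format, here is a minimal sketch of what such a lookup could look like and how it might be loaded outside Splunk. The column names `Regex` and `Category` mirror the fields used in the query further down; every pattern and category in the sample is a made-up assumption and not the actual list used for this diary. Case-insensitive matching is also an assumption of this sketch.

```python
import csv
import io
import re

# Hypothetical sample of the lookup file: these patterns and categories are
# illustrative assumptions, not the list used for the statistics in this diary.
SAMPLE_CSV = """Regex,Category
.*inv[oi]+ce.*,Finance
.*payment.*,Finance
.*(scan|img|photo).*,Media
.*voice.*,Communications Services
"""

def load_patterns(text):
    """Parse CSV text into a list of (compiled_regex, category) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    # re.IGNORECASE is an assumption: attachment names vary a lot in case.
    return [(re.compile(row["Regex"], re.IGNORECASE), row["Category"])
            for row in reader]

patterns = load_patterns(SAMPLE_CSV)
for regex, category in patterns:
    print(f"{category}: {regex.pattern}")
```

Keeping the patterns in a plain CSV makes the list easy to extend: a new lure only needs one more line.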

Then I built a list of 94387 filenames based on the data that I have collected since the beginning of 2017. The best place to collect this data is your incoming mail server or the log files of any anti-spam or anti-malware solution. This is a good opportunity to remind you that logs are critical: log as much as possible! The next question is how to check the filenames against all the regular expressions above and tag them with the second field ('Category'). To do this efficiently, I used Splunk.

The regular expressions are stored in the ‘maldocs_re.csv’ file and the filenames in ‘maldocs.csv’. The following query returns interesting statistics:

|inputlookup maldocs.csv
|eval count=0
|join max=0 count [| inputlookup maldocs_re.csv | eval count=0]
|eval test=if(match(filePath, Regex),1,0)
|where test=1
|stats count by Category
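The query works by giving every row a constant `count=0` field and joining on it with `max=0` (no limit on matches), so each filename is paired with every regular expression before `match()` filters the pairs. For readers without Splunk, the same logic can be sketched in a few lines of Python. The filenames and rules below are made-up assumptions for illustration, and `count_by_category` is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical inputs: in the diary these come from the filename lookup and
# the regex lookup (Regex, Category); the values below are made up.
filenames = [
    "invoice_2017.zip",
    "payment-reminder.rar",
    "IMG_0042.7z",
    "voicemail_0913.zip",
    "readme.txt",
]
rules = [
    (r"inv[oi]+ce", "Finance"),
    (r"payment", "Finance"),
    (r"img|scan", "Media"),
    (r"voice", "Communications Services"),
]

def count_by_category(names, rules):
    """Test every filename against every regex and count hits per category."""
    counts = Counter()
    for name in names:                      # every filename ...
        for pattern, category in rules:     # ... is paired with every regex
            # re.search matches anywhere in the string; ignoring case is an
            # assumption of this sketch.
            if re.search(pattern, name, re.IGNORECASE):
                counts[category] += 1
    return counts

counts = count_by_category(filenames, rules)
for category, hits in counts.most_common():
    print(category, hits)
```

As in the query, a filename that matches several expressions is counted once per match, so 'invoice_2017.zip' contributes to both the 'Finance' and the 'Communications Services' totals here.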

After a few seconds or minutes, depending on the amount of data you have to process, you will get a nice graph like this:

You can see that most of the malicious files pose as media files, but that we also have some hits against the 'Targeted' category. It would be worth having a look at them! Finally, if you define a single category called ‘Targeted’ with good regular expressions matching your business activity, domain names, login formats, brands or whatever else is specific to you, you can generate alerts when such files are sent to your users and be aware of potential targeted attacks!
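A minimal sketch of such a 'Targeted' check, assuming you can feed it the attachment names seen by your mail gateway. The patterns ("example-corp", "example.com") are placeholders to be replaced with strings specific to your own organization, and `targeted_hits` is a hypothetical helper name.

```python
import re

# Hypothetical 'Targeted' patterns: replace them with strings specific to your
# own organization (brands, domain names, login formats). The values below are
# placeholders, not real indicators.
TARGETED_PATTERNS = [
    re.compile(r"example-?corp", re.IGNORECASE),
    re.compile(r"example\.com", re.IGNORECASE),
]

def targeted_hits(filenames):
    """Return the filenames that match at least one 'Targeted' pattern."""
    return [name for name in filenames
            if any(p.search(name) for p in TARGETED_PATTERNS)]

incoming = ["invoice_2017.zip", "ExampleCorp_payroll.rar", "scan001.7z"]
for name in targeted_hits(incoming):
    print(f"ALERT: possible targeted attachment: {name}")
```

In practice the print statement would be replaced by whatever alerting channel you already use, for example a scheduled search or a ticket.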

Happy hunting!

Xavier Mertens (@xme)
ISC Handler - Freelance Security Consultant

3 comment(s)


Is the third regular expression (.*inv[oi]ce.*) correct? Surely it will match "invoce" or "invice" but not "invoice"? Assuming that the regular expressions are executed in order, "invoice" will match the expression ".*voice.*" under Communications Services lower down.
You're right, there is a missing '+'...


I added this one because I already saw some document with the typo 'invioce-xxx.doc'
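The difference discussed in this thread is easy to check. The sketch below assumes that the missing '+' belongs right after the character class, which is consistent with the 'invioce' typo mentioned above but is not confirmed elsewhere in the diary.

```python
import re

original = re.compile(r".*inv[oi]ce.*")    # pattern as quoted in the comment
corrected = re.compile(r".*inv[oi]+ce.*")  # assumed fix: '+' after the class

names = ["invoice.doc", "invioce-xxx.doc", "invoce.doc", "invice.doc"]
results = {name: (bool(original.match(name)), bool(corrected.match(name)))
           for name in names}

for name, (before, after) in results.items():
    print(f"{name}: original={before} corrected={after}")
# invoice.doc: original=False corrected=True
# invioce-xxx.doc: original=False corrected=True
# invoce.doc: original=True corrected=True
# invice.doc: original=True corrected=True
```

With the '+', the class accepts one or more of 'o' and 'i', so both the correct spelling and the 'invioce' typo are caught.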
You might want to try PACK (Password Analysis and Cracking Toolkit). One of its features tries to figure out the base words and the mangling rules that produced a given set of passwords. The "base word" part might be interesting for your project too.
