Malware analysis output sanitization

Published: 2017-09-09. Last Updated: 2017-09-09 19:50:09 UTC
by Didier Stevens (Version: 1)

An interesting conversation unfolded on my diary entry "Malware analysis: searching for dots".

Back in the old days, on DOS, sending untrusted output to the console (with the type command, for example) could result in escape sequences changing your environment. Catting binary data to your Linux terminal can also have unwanted effects.

Since Python can be used in many environments, there are probably environments out there where escape sequences (or something similar) can still wreak havoc.

I decided to tackle this (potential) problem by providing sanitization functions in my translate.py tool. The Sani1 and Sani2 functions both take a byte as input and return a byte as output. If the input is a control character other than a tab (HT), linefeed (LF) or carriage return (CR), both functions replace it with a space character (0x20). Sani2 goes further than Sani1: it also replaces all bytes equal to 0x80 or higher with a space character.
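To make the rules above concrete, here is a minimal Python sketch of the described behavior. It is not the code shipped in translate.py: the names sani1_sketch, sani2_sketch and sanitize are hypothetical, and I am assuming that "control character" means bytes 0x00 through 0x1F plus DEL (0x7F).

```python
# Illustrative sketch only: NOT the implementation found in translate.py.
# All names here are hypothetical.
SPACE = 0x20
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}  # HT, LF and CR pass through unchanged


def sani1_sketch(byte: int) -> int:
    """Replace control characters, except HT, LF and CR, with a space."""
    # Assumption: "control character" means 0x00-0x1F plus DEL (0x7F).
    is_control = byte < 0x20 or byte == 0x7F
    if is_control and byte not in ALLOWED_CONTROLS:
        return SPACE
    return byte


def sani2_sketch(byte: int) -> int:
    """Like sani1_sketch, but also replace bytes 0x80 and higher with a space."""
    if byte >= 0x80:
        return SPACE
    return sani1_sketch(byte)


def sanitize(data: bytes, func=sani2_sketch) -> bytes:
    """Apply a per-byte sanitizer to a whole buffer."""
    return bytes(func(b) for b in data)


if __name__ == "__main__":
    # ESC [ 2 J is the ANSI "clear screen" sequence; ESC becomes a space,
    # so the terminal no longer interprets the rest as a command.
    untrusted = b"\x1b[2Jhello\tworld\r\n\xff\x00"
    print(sanitize(untrusted, sani1_sketch))
    print(sanitize(untrusted, sani2_sketch))
```

In this sketch, only the escape byte itself is replaced. The printable remainder of the sequence ("[2J") stays visible, which is harmless and still shows the analyst that an escape sequence was present.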

If you are doing malware analysis and need to output untrusted data in raw format to your screen, you can pipe it through translate.py to sanitize it, like this:

oledump.py -s 8 -v 0075733924IEMJ.doc.vir.zip | translate.py "Sani2(byte)"