Static analysis of malicious PDFs

Published: 2010-01-07
Last Updated: 2010-01-07 01:01:21 UTC
by Daniel Wesemann (Version: 1)
4 comment(s)

While we are still waiting for the patch and the malicious PDFs which exploit CVE-2009-4324 become more and more nasty, here's another quick excursion in dissecting and analyzing hostile PDF files. We'll take a closer look at the sample that fellow ISC Handler Bojan already analyzed, but will this time do a static analysis without actually running the hostile code.

$md5sum Requset.pdf
192829aa8018987d95d127086d483cfc Requset.pdf
$ls -ald Requset.pdf
-rw-r----- 1 daniel handlers 952206 2010-01-03 23:57 Requset.pdf

One of the tools that work very well to analyze PDFs is Didier Stevens' excellent script "pdf-parser.py" . Running pdf-parser.py -f Requset.pdf | more nicely dissects the PDF into its portions, and also de-compresses packed sections. Almost at the end of the output, we encounter Object #44:


The code is included here as an image, to keep your anti-virus from panicking. The blue box marks the surprisingly short and efficient shell code block of only 38 bytes length that Bojan mentioned in his earlier diary. The red box marks the call to "media.newPlayer" with a null argument, which is a tell-tale sign of an exploit for CVE 2009-4324, the currently still unpatched Adobe vulnerability.

If all we wanted to know is whether this PDF is hostile, we can stop here: Yes, it is.

 

Taking a completely different tack on the same sample, a brute force method in analysis that often works, and also works in this case, is to check the sample for XOR encoded strings. XORSearch, another one of Didier Stevens' cool tools (URL) helps with this task. Let's check if the sample contains a XOR encoded representation of the string "http"

$ XORSearch Requset.pdf http

Found XOR 00 position E6340: http://www.w3.org/1999/02/22-rdf-syntax-ns#">.
Found XOR 00 position E63A9: http://ns.adobe.com/xap/1.0/">. <xmp:Modif
[...etc...]
Found XOR 85 position D870: http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns
Found XOR 85 position D8A7: http://ns.adobe.com/iX/1.0/'>..<rdf:Description rd
Found XOR 85 position DAD4: http://ns.adobe.com/xap/1.0/mm/' xapMM:DocumentID=
Found XOR 85 position DB86: http://purl.org/dc/elements/1.1/' dc:format='appli
Found XOR 85 position 1054D: httpshellopencomMand.SoftwareMicrosoftActive

Well, a XOR with zero is not overly exciting, all this means is that the file contains these URLs in plain text. But a XOR with 85, and one that seems to be doing some sort of shell.open ... now that's intriguing. Let's simply XOR the entire PDF file with 0x85, and see what we get:

$cat Requset.pdf | perl -pe 's/(.)/chr(ord($1)^0x85)/ge' |strings | more

gfJV
w)pf
S>S2X4
[...etc...]
z<o*
7Fpo
!This program cannot be run in DOS mode.
L8Rich
M_*K
.text
`.rdata

Now lookie, it seems as if this PDF contains an embedded executable! And a bit further down in the de-xored file, we find

hepfixs.exe
baby
{38FC368D-A5D0-21DA-0404-080501030704}
cecon.flower-show.org
ws2_32
SOFTWAREClasseshttpshellopencomMand
SoftwareMicrosoftActive SetupInstalled Components

This gives us a potential domain name (cecon.flower-show. org), and also a ClassID. Searching for {38FC368D-A5D0-21DA-0404-080501030704} in Google, we find a recent ThreatExpert analysis http://www.threatexpert.com/report.aspx?md5=b0eeca383a7477ee689ec807b775ebbb that matches perfectly to what we found within this PDF.


Given the time later during my 24hr shift here at the ISC, I'll post another diary to take a look at other hostile PDF samples that we received.  If you got any interesting potentially hostile PDFs, please send them in!
 

4 comment(s)

Comments

Just wanted to say that pdf-parser.py is a great tool, along with pdftk. However, the one issue I keep coming up with is how to assist in deobfuscation of other Filter types. Since pdf-parser.py can only work with FlateDecode at this time, do you have any suggestions for alternate tools that handle the other Filter types?
pdf-parser.py has been updated to support ASCII85Decode:

http://blog.didierstevens.com/2009/08/06/update-pdf-parser-version-0-3-5/
The last version of pdf-parser supports:
/FlateDecode
/ASCIIHexDecode
/ASCII85Decode

The new version, to be published next week, adds support for:
/LZWDecode
/RunLengthDecode
Didier,

Excellent news. I just wanted to say thanks for a wonderful tool. Been using it for a while and it saves a lot of time.

Diary Archives