Analyzing PDF Streams

    Published: 2024-05-09
    Last Updated: 2024-05-09 15:02:37 UTC
    by Didier Stevens (Version: 1)

    Occasionally, Xavier and Jim pass along specific questions from their students about my tools when they teach FOR610: Reverse-Engineering Malware.

    Recently, a student wanted to know if my pdf-parser.py tool can extract all the PDF streams with a single command.

    Since version 0.7.9, it can.

    A stream is (binary) data that is an optional part of an object, and it can be compressed or otherwise transformed. To view a single stream with pdf-parser, select the object of interest and use option -f to apply the filters (like zlib decompression) to the stream.
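
    To make the filter step concrete, here is a minimal Python sketch of what a zlib (FlateDecode) filter does to a raw stream. It is not pdf-parser's code: the function name apply_flate_filter and the sample content are made up for illustration, and real PDF streams can chain several filters.

```python
import zlib


def apply_flate_filter(raw_stream: bytes) -> bytes:
    """Undo zlib (deflate) compression, as a FlateDecode filter would."""
    return zlib.decompress(raw_stream)


# Hypothetical stream content, compressed here only so there is something to decode.
original = b"BT /F1 12 Tf (Hello) Tj ET"
raw_stream = zlib.compress(original)

filtered = apply_flate_filter(raw_stream)
print(len(raw_stream), "raw bytes ->", len(filtered), "filtered bytes")
```

    The raw bytes are what sits in the PDF file; the filtered bytes are what you usually want to analyze.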

     

    I added a feature that is present in several of my tools, like oledump.py and zipdump.py: extract all of the "stored items" into a single JSON document.

    When you use pdf-parser's option -j (--jsonoutput), every object with a stream has its raw (i.e., unfiltered) data extracted and put into a JSON document that is sent to stdout:

    To get the filtered (i.e., decompressed) data, use option -f together with option -j:

    What can you do with this JSON data? It depends on what your goals are. I have several tools that can take this JSON data as input, like file-magic.py and strings.py.
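
    If you prefer to process the JSON data with your own script, something along these lines could work. This is a sketch under an assumption: that the document has a top-level "items" list whose entries carry "id", "name" and base64-encoded "content" fields. Check that against the actual output of your pdf-parser version before relying on it. The function name iter_items and the sample item are hypothetical.

```python
import base64
import json


def iter_items(json_text):
    """Yield (id, name, data) for each item in the JSON document.

    Assumed layout: {"items": [{"id": ..., "name": ..., "content": <base64>}]}
    """
    document = json.loads(json_text)
    for item in document.get("items", []):
        yield item["id"], item["name"], base64.b64decode(item["content"])


# Hypothetical one-item document, built here so the sketch runs on its own.
sample = json.dumps(
    {
        "items": [
            {
                "id": 1,
                "name": "example-stream",
                "content": base64.b64encode(b"\xff\xd8\xff\xe0").decode("ascii"),
            }
        ]
    }
)

for item_id, name, data in iter_items(sample):
    print(item_id, name, len(data), "bytes")
```

    In practice you would read the JSON text from stdin (sys.stdin.read()) instead of building a sample.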

    Here I use file-magic.py to identify the type of each raw data stream:

    From this we can learn, for example, that object 143's stream contains a JPEG image.
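
    To illustrate the idea behind this kind of identification, here is a tiny sketch that recognizes a few file types by their leading bytes. It is far more limited than file-magic.py and is not how that tool is implemented; the signature table and the function name guess_type are only for illustration.

```python
# A few well-known file signatures (leading "magic" bytes).
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\x00\x01\x00\x00\x00", "TrueType font"),
]


def guess_type(data: bytes) -> str:
    """Return a label for the first signature that matches, else 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


print(guess_type(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
print(guess_type(b"no known signature"))
```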

    And here I use file-magic.py to identify the type of each filtered data stream:

    From this we can learn, for example, that object 881's stream contains a compressed TrueType Font file.

    What if you want to write all stream data to disk, in individual files, for further analysis (that's what the student wanted to do, I guess)?

    Then you can use my tool myjson-filter.py. It's a tool designed to filter JSON data produced by my tools, but it can also write items to disk.

    When you use option -l, this tool will just produce a listing of the items contained in the JSON data:

    And you can use option -W to write the streams to disk. -W takes a value that specifies which naming convention must be used to write the files to disk. The value vir writes items to disk with their sanitized name and extension .vir:

    The value hashvir writes items to disk with their SHA256 hash as name and extension .vir.
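
    The two naming conventions can be sketched in Python as follows. This is not the code of myjson-filter.py: the sanitization rule (replace anything outside letters, digits, dot, underscore and hyphen with an underscore) is my assumption for this example, and the helper names are hypothetical.

```python
import hashlib
import re
import tempfile
from pathlib import Path


def vir_name(item_name: str) -> str:
    """Sanitized item name plus extension .vir (sanitization rule is assumed)."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", item_name) + ".vir"


def hashvir_name(data: bytes) -> str:
    """SHA256 hex digest of the content plus extension .vir."""
    return hashlib.sha256(data).hexdigest() + ".vir"


def write_item(directory, filename: str, data: bytes) -> Path:
    """Write one item to disk and return the path that was written."""
    path = Path(directory) / filename
    path.write_bytes(data)
    return path


# Hypothetical item, written to a temporary directory so nothing is left behind.
data = b"\xff\xd8\xff\xe0"
with tempfile.TemporaryDirectory() as directory:
    for filename in (vir_name("example stream/1"), hashvir_name(data)):
        written = write_item(directory, filename, data)
        print(written.name, written.stat().st_size, "bytes")
```

    Naming files after their hash means identical streams end up in a single file, and the .vir extension keeps the files from being opened by accident.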

    Didier Stevens
    Senior handler
    Microsoft MVP
    blog.DidierStevens.com
