Last Updated: 2023-08-14 14:10:57 UTC
by Didier Stevens (Version: 1)
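pdfid.py is a triage tool: it's essentially a "string search tool" that looks for certain keywords without parsing the document's PDF structure. That's why it can report false positives: a short keyword like /JS can occur by chance inside binary data, such as the contents of a stream.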
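To make the "string search" idea concrete, here is a deliberately simplified Python sketch of pdfid-style keyword counting. It is a hypothetical illustration, not pdfid's actual code (the real tool handles more cases, such as hex-encoded names). It splits the raw bytes into PDF name tokens without any notion of objects or streams, so bytes that happen to spell /JS inside stream data are counted just like a real /JS key. The keyword list and the sample bytes are assumptions chosen for the example.

```python
import re
from collections import Counter

# A name token starts with "/" and ends at PDF whitespace or a delimiter character.
NAME_TOKEN = re.compile(rb"/[^\x00\t\n\x0c\r ()<>\[\]{}/%]*")

# Hypothetical keyword subset for this example (pdfid's real list is longer).
KEYWORDS = [b"/JS", b"/JavaScript", b"/OpenAction"]


def count_keywords(data: bytes) -> dict:
    """Count keyword names anywhere in the raw file bytes, like a string search."""
    names = Counter(NAME_TOKEN.findall(data))
    return {kw.decode(): names[kw] for kw in KEYWORDS}


# A stream whose binary payload happens to contain the bytes "/JS" followed by \x00.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Length 7 >>\nstream\n"
    b"\x8f\x02/JS\x00\xaa\n"
    b"endstream\nendobj\n"
)

print(count_keywords(sample))  # → {'/JS': 1, '/JavaScript': 0, '/OpenAction': 0}
```

Because the tokenizer stops at PDF delimiters and whitespace, /JSON is not counted as /JS, but a /JS inside binary data that is followed by a whitespace or delimiter byte is.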
Ten years ago, I advised using pdf-parser to search for these sequences.
From time to time, people still ask me about these false positives, so it's a good idea to revisit the topic in a new diary entry.
If PDFiD reports a detection for a short string like /JS, and you can't find it with pdf-parser.py, then use pdf-parser's option -a to calculate statistics.
If the /JS detection is a false positive, it will not appear in pdf-parser's statistics. That's because pdf-parser actually parses the PDF, so it can distinguish between a keyword found in the right place (/JS inside a dictionary) and one found in the wrong place (/JS inside a binary stream).
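A parser knows where stream data begins and ends, so it can ignore names that only occur inside that data. The hypothetical Python sketch below approximates this by blanking out everything between the stream and endstream keywords before counting names. This regex shortcut is an assumption made for illustration; a real parser such as pdf-parser tokenizes the whole object structure, and this is not its implementation. The two sample byte strings are made up for the example.

```python
import re
from collections import Counter

# A name token starts with "/" and ends at PDF whitespace or a delimiter character.
NAME_TOKEN = re.compile(rb"/[^\x00\t\n\x0c\r ()<>\[\]{}/%]*")
# \b keeps the "stream" inside "endstream" from being taken as a stream start.
STREAM_BODY = re.compile(rb"\bstream\r?\n.*?\bendstream\b", re.DOTALL)


def count_names(data: bytes, skip_streams: bool) -> Counter:
    """Count name tokens, optionally ignoring everything inside stream data."""
    if skip_streams:
        data = STREAM_BODY.sub(b"stream\nendstream", data)
    return Counter(NAME_TOKEN.findall(data))


# /JS occurs only by chance inside binary stream data.
false_positive = (
    b"1 0 obj\n<< /Length 7 >>\nstream\n"
    b"\x8f\x02/JS\x00\xaa\n"
    b"endstream\nendobj\n"
)
# /JS is a real key in a JavaScript action dictionary.
true_positive = b"2 0 obj\n<< /S /JavaScript /JS (app.alert\\(1\\)) >>\nendobj\n"

for label, data in (("false positive", false_positive), ("true positive", true_positive)):
    raw = count_names(data, skip_streams=False)[b"/JS"]
    parsed = count_names(data, skip_streams=True)[b"/JS"]
    print(f"{label}: string search {raw}, outside streams {parsed}")
```

The /JS in the first sample is counted by the string search but disappears once stream data is skipped, while the dictionary /JS in the second sample survives both counts.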
It's best to use option -a together with option -O, because then objects stored inside object streams (/ObjStm) are parsed too.
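Object streams explain why -O matters: since PDF 1.5, dictionaries can be stored inside a compressed /ObjStm stream, so a parser that skips stream data also skips a /JS dictionary stored there. The hypothetical Python sketch below builds such an object stream with zlib (assuming the /FlateDecode filter and a direct /Length) and shows that the keyword only becomes visible once the /ObjStm data is decompressed and scanned. The helper names and sample document are made up for illustration; this is not how pdf-parser implements -O.

```python
import re
import zlib
from collections import Counter

# A name token starts with "/" and ends at PDF whitespace or a delimiter character.
NAME_TOKEN = re.compile(rb"/[^\x00\t\n\x0c\r ()<>\[\]{}/%]*")
# Simplified: a non-nested stream dictionary, then the "stream" keyword and its EOL.
STREAM_START = re.compile(rb"<<([^<>]*)>>\s*stream\r?\n")
LENGTH = re.compile(rb"/Length\s+(\d+)")


def count_names(data: bytes, parse_objstm: bool) -> Counter:
    """Count names outside stream data, optionally also inside /ObjStm streams."""
    names = Counter()
    pos = 0
    while (m := STREAM_START.search(data, pos)) is not None:
        # Everything up to and including the stream dictionary is outside stream data.
        names.update(NAME_TOKEN.findall(data, pos, m.end()))
        # Assumes a direct integer /Length (indirect lengths are not handled).
        length = int(LENGTH.search(m.group(1)).group(1))
        body = data[m.end():m.end() + length]
        if parse_objstm and b"/ObjStm" in NAME_TOKEN.findall(m.group(1)):
            # Assumes /FlateDecode, the usual filter for object streams.
            names.update(NAME_TOKEN.findall(zlib.decompress(body)))
        pos = m.end() + length  # skip the stream data itself
    names.update(NAME_TOKEN.findall(data, pos))
    return names


# Object stream content: an "objnum offset" header (4 bytes, hence /First 4), then object 5.
objects = b"5 0 << /S /JavaScript /JS (app.alert\\(1\\)) >>"
compressed = zlib.compress(objects)
sample = (
    b"%PDF-1.5\n"
    b"4 0 obj\n<< /Type /ObjStm /N 1 /First 4 /Filter /FlateDecode /Length "
    + str(len(compressed)).encode()
    + b" >>\nstream\n"
    + compressed
    + b"\nendstream\nendobj\n"
)

print(count_names(sample, parse_objstm=False)[b"/JS"])  # → 0
print(count_names(sample, parse_objstm=True)[b"/JS"])  # → 1
```

Without parse_objstm, the JavaScript dictionary is invisible because it only exists inside the compressed stream; decompressing the /ObjStm data exposes it, which is the same reason combining -a with -O gives a more complete picture.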
And just for reference: with true positives, the keyword shows up in the output of both pdfid and pdf-parser.