Last Updated: 2020-05-22 13:46:43 UTC
by Didier Stevens (Version: 1)
When you handle unknown files, be it for malware analysis or other reasons, it helps to know some strings / hexadecimal sequences to quickly recognize file types and file content.
If you want to memorize some strings to improve your analysis skills, I recommend that the first string you memorize is MZ, or 4D 5A in hexadecimal (see an ASCII table).
All Windows executables (PE file format) start with these 2 bytes: 4D 5A.
And that is not the only "skill" that you acquire by memorizing 4D 5A: as Z is the last letter of the alphabet, you also learned that all uppercase letters are smaller than or equal to 5A. You might already know that letter A is 41 (for example from PoC buffer overflows: AAAAAA -> 414141414141). Then you've learned that all uppercase letters are between hexadecimal values 41 and 5A.
Lowercase letters have their 6th least-significant bit set (bit 5 when you count from 0), while uppercase letters have that bit cleared. A byte with only that bit set, and all other bits cleared, has hexadecimal value 20. Add 20 to 41, and you have 61: letter a. Hence all lowercase letters fall between hexadecimal values 61 and 7A.
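The relation between uppercase and lowercase letters can be checked with a few lines of Python. This is only a small sketch: the names CASE_BIT and to_lower_ascii are made up for this illustration, and the function deliberately handles only the ASCII range 41 to 5A.

```python
CASE_BIT = 0x20  # the single bit that differs between 'A' (41) and 'a' (61)


def to_lower_ascii(byte_value: int) -> int:
    """Set the case bit of an ASCII uppercase letter (41-5A).

    Any other byte value is returned unchanged.
    """
    if 0x41 <= byte_value <= 0x5A:
        return byte_value | CASE_BIT
    return byte_value


# Print the boundary letters with their hexadecimal and binary values.
for letter in "AZaz":
    print(letter, format(ord(letter), "02X"), format(ord(letter), "08b"))

# Setting the case bit turns MZ into mz.
print(bytes(to_lower_ascii(b) for b in b"MZ"))
```

In the binary column, the only difference between A and a (and between Z and z) is the bit with value 20.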
The next string I recommend memorizing is PK: 50 4B. All records of a ZIP file start with PK (50 4B), and typical ZIP files start with a ZIP record (although this is not mandatory): hence typical ZIP files start with PK. ZIP files are not only used for ZIP archives, but also for many other file formats, like Office documents (.docx, .docm, .xlsx, .xlsm, ...).
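To see the PK marker in practice, here is a minimal sketch that builds a tiny ZIP file in memory with Python's standard zipfile module and looks at its first bytes. The helper names and the ZIP_RECORD_TYPES table are assumptions for this example; the table lists only the three most common record signatures, so it is not a complete ZIP parser.

```python
import io
import zipfile

# The two bytes after PK identify the record type (common ones only).
ZIP_RECORD_TYPES = {
    b"PK\x03\x04": "local file header",
    b"PK\x01\x02": "central directory file header",
    b"PK\x05\x06": "end of central directory",
}


def starts_with_zip_record(data: bytes) -> bool:
    """Return True when the data starts with PK (50 4B)."""
    return data[:2] == b"PK"


def describe_first_record(data: bytes) -> str:
    """Name the record type found in the first 4 bytes, if it is a known one."""
    return ZIP_RECORD_TYPES.get(data[:4], "unknown")


# Build a one-file ZIP archive in memory as test data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")
sample = buffer.getvalue()

print(sample[:4].hex(" ").upper(), starts_with_zip_record(sample), describe_first_record(sample))
```

The same check applies to a .docx or .xlsx file read from disk, since those are ZIP containers too.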
And when you memorize that PK is 50 4B, then it's not that difficult to memorize that PE is 50 45 (E is the fifth letter -> 45).
The letters PE are the first 2 bytes of the PE header in PE files (Windows executables). This header can be found after the MZ header (which is actually the DOS header).
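How far after the MZ header the PE header sits is recorded in the DOS header itself: the 4-byte little-endian value at offset 3C (the e_lfanew field) holds the file offset of the PE signature, which is PE followed by two null bytes. The sketch below checks both markers. The function name has_pe_header is invented for this example, and the sample bytes are synthetic test data, not a working executable.

```python
import struct


def has_pe_header(data: bytes) -> bool:
    """Check for MZ at offset 0 and PE\\0\\0 at the offset stored at 0x3C."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: little-endian 32-bit file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


# Synthetic sample: a 0x40-byte DOS header pointing to a PE signature at 0x80.
dos_header = bytearray(0x40)
dos_header[0:2] = b"MZ"
struct.pack_into("<I", dos_header, 0x3C, 0x80)
sample = bytes(dos_header) + bytes(0x40) + b"PE\x00\x00"

print(has_pe_header(sample))
```

A file that starts with MZ but fails the second check may be an old DOS executable, or a truncated or corrupted file.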
- MZ -> 4D 5A
- PK -> 50 4B
- PE -> 50 45
- A-Z -> 41 - 5A
- a-z -> 61 - 7A
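The list above can also be kept as a small lookup table. This is just a sketch of how such a cheat sheet might be used on the first bytes of a file; MAGIC_PREFIXES and describe_prefix are hypothetical names, and a 2-byte match is a hint, not proof of the file type.

```python
# Two-byte prefixes from the list above, with a short description.
MAGIC_PREFIXES = {
    b"MZ": "Windows executable (MZ / DOS header)",
    b"PK": "ZIP record (ZIP archive, .docx, .xlsx, ...)",
}


def describe_prefix(data: bytes) -> str:
    """Show the first 2 bytes in hexadecimal and what they suggest."""
    prefix = data[:2]
    label = MAGIC_PREFIXES.get(prefix, "not in the cheat sheet")
    return f"{prefix.hex(' ').upper()} -> {label}"


print(describe_prefix(b"MZ\x90\x00"))
print(describe_prefix(b"PK\x03\x04"))
print(describe_prefix(b"%PDF"))
```

New entries can be added to the dictionary as the cheat sheet grows.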
Please post a comment if you have more "memorable" strings. We might end up with a small cheat sheet.