Last Updated: 2020-09-29 17:44:15 UTC
by Didier Stevens (Version: 1)
I was asked to help with the decoding of a BASE64 string that my base64dump.py tool could not handle.
The issue was the following: this particular BASE64 string was corrupt, because its length was not a multiple of 4. In BASE64, 64 characters are used to do the encoding: each BASE64 character represents 6 bits. When one byte (8 bits) is encoded, 2 BASE64 characters are needed (6 + 2 bits). To indicate that the last 4 bits of the second BASE64 character should be discarded, 2 padding characters are added (==).
For example, the ASCII character I (8 bits) is represented by 2 BASE64 characters (SQ) followed by 2 padding characters (==). This gives SQ==: 4 bytes long.
When 2 bytes are encoded (16 bits), 3 BASE64 characters are needed (3 * 6 = 18 bits) and 2 bits should be discarded (one padding character =), thus 4 characters are used.
And when 3 bytes are encoded (24 bits), 4 BASE64 characters are needed (4 * 6 = 24 bits) and no padding is required.
Conclusion: valid BASE64 strings have a length that is a multiple of 4.
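The three cases above can be checked with Python's standard base64 module. This is only a small sketch to illustrate the padding rule; the sample inputs (I, IS, ISC) are my own choice for illustration, only the first one appears in the text above.

```python
import base64

# Encode 1, 2 and 3 bytes and look at the resulting padding.
# The 2-byte and 3-byte samples are arbitrary illustrative inputs.
samples = [b"I", b"IS", b"ISC"]
encoded = [base64.b64encode(sample).decode("ascii") for sample in samples]

for raw, text in zip(samples, encoded):
    # Every encoded string is padded to a length that is a multiple of 4.
    print(len(raw), "byte(s) ->", text, "(length %d)" % len(text))
```

Whatever the input length, the encoder pads the output so that its length is a multiple of 4.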
By default, my tool base64dump.py does not handle BASE64 strings that have a length that is not a multiple of 4.
Here is an example. BASE64 string 12345678 is 8 characters long:
base64dump.py is able to recognize this BASE64 string, and decode it.
Let's add one character, resulting in a BASE64 string with a length that is not a multiple of 4 (length of 9 characters):
base64dump.py does not recognize this BASE64 string.
We can help base64dump.py recognize this string by using option -p. This option takes a Python function that will be used to process the detected strings before they are decoded. In this case, we will use Python function L4, a function I defined: it truncates strings to a length that is a multiple of 4.
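To show the idea behind such a truncation function outside of the tool, here is a minimal sketch using only Python's standard library. The function name truncate_to_multiple_of_4 is hypothetical and is not the actual L4 code from base64dump.py; it only mimics the behavior described above.

```python
import base64
import binascii


def truncate_to_multiple_of_4(data: str) -> str:
    """Hypothetical stand-in for L4: drop trailing characters so that
    the length becomes a multiple of 4."""
    return data[:len(data) - len(data) % 4]


corrupt = "123456789"  # 9 characters: not a multiple of 4

# The standard library decoder rejects the corrupt string as-is.
try:
    base64.b64decode(corrupt)
    strict_ok = True
except binascii.Error:
    strict_ok = False

# After truncation, the first 8 characters decode normally.
repaired = truncate_to_multiple_of_4(corrupt)
decoded = base64.b64decode(repaired)
print(strict_ok, repaired, decoded.hex())
```

Truncating loses the trailing character, but a single leftover BASE64 character only carries 6 bits, which is not enough to make up a full byte anyway.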
Using this function L4 with option -p, we can decode the corrupt string: