Вы находитесь на странице: 1из 12

Rene Nyffeneggers base64

Decoding Algorithm Explained

Michael Horii

Intro
Were using Rene Nyffeneggers code to
decode base64 strings.
Code is available at:

The following slides explain how the algorithm

works.

Encoded Symbols
Suppose we have four encoded symbols to
decode, where each symbol is represented by
a 6-bit vector. For ease of illustration, well
choose the encoded symbols as follows:
Encoded Symbol 1: ES1 = 111111
Encoded Symbol 2: ES2 = 000000
Encoded Symbol 3: ES3 = 111111
Encoded Symbol 4: ES4 = 000000

Decoding the String

To decode the base64 string, we concatenate the 6bit vectors, and group them in 8-bit vectors.
ES=Encoded Symbol

ES1

ES2

ES3

ES4

111111000000111111000000
DS=Decoded Symbol

DS1

DS2

DS3

111111000000111111000000
C

4-symbol encoded
string, and end up
with a 3-symbol
decoded string

We match each 8-bit vector to a symbol using a

look-up table.
11111100 C
00001111
11000000

A
T

The Catch
This decoding process is straightforward
conceptually, but the catch is that bytes are 8
bits, while the encoded symbols are
represented by 6-bit vectors.
That means well need to use three operators
to accomplish our goal of assembling the bits:
Bit shifting
ORing

Prepended Encoded Symbols

When we assign a 6-bit vector to a (1-byte) char
variable, two zeroes are prepended to the front
of the 6-bit vector:
111111 becomes 00111111
000000 becomes 00000000
Well refer to a prepended encoded symbol as PES:

PES1 = 00111111
PES2 = 00000000
PES3 = 00111111
PES4 = 00000000

(ES1 = 111111)
(ES2 = 000000)
(ES3 = 111111)
(ES4 = 000000)

Aligning Bit Vectors

Because were fitting 6-bit vectors into 8-bit
containers, we need to align the 6-bit vectors
using bit shifting.
ES1

ES2

ES3

ES4

111111000000111111000000
unsigned char PES1 = 0b111111;
unsigned char PES2 = 0b000000;
DS1
PES1 shifted left by 2
to align the bits

DS2

00111111
00000000

DS3

111111000000111111000000
00111111
00000000 Apply mask to PES2 to get the two digits

(underlined) needed for DS1, and then shift

right by 4 to align the bits

Plan of Action
The following slides show in detail how the
bytes are manipulated, from start to finish, to
achieve our goal.
Steps:
Populate Byte 1 (DS1) using ES1 and ES2
Populate Byte 2 (DS2) using ES2 and ES3
Populate Byte 3 (DS3) using ES3 and ES4
ES=Encoded Symbol

ES1

ES2

ES3

ES4

111111000000111111000000
DS=Decoded Symbol

DS1

DS2

DS3

111111000000111111000000

Populating Byte 1
Byte 1
Prepended ES1 = PES1

111111000000111111000000
00111111

Shift left by 2

11111100
11111100
Prepended ES2 = PES2
00000000

00000000

Blue: Digits
from ES1
Orange: Digits
from ES2

PES1 << 2
To populate Byte 1, we only want the first two nonprepended digits of ES2, so we use a mask to get these
digits

Shift right by 4

00000000
00000000

11111100

(PES1 << 2) | ((PES2 & 0x30) >> 4)

Populating Byte 2
Byte 2
PES2

111111000000111111000000
00000000 To populate Byte 2, we only want the
last 4 digits of ES2, so we use a mask to

Shift left by 4

00000000
00000000
00111111

PES3

00111100
Shift right by 2

(PES2 & 0xf) << 4

To populate Byte 2, we only want the first 4 nonprepended digits of ES3, so we use a mask to get
these digits

PES3 & 0x3c

00001111
00001111 (PES3 & 0x3c) >> 2
00001111 ((PES2 << 0xf) << 4) | ((PES3 & 0x3c) >> 2)

Populating Byte 3
Byte 3
PES3

111111000000111111000000
00111111

00000011

11000000
00000000
11000000

(PES3 & 0x3) << 6

Shift left by 6

PES4

PES4
((PES3 & 0x3) << 6) |PES4

Putting It All Together

Now we have three bytes, and concatenated
we get the desired byte array.
Byte 1

Byte 2

Byte 3

111111000000111111000000
C

11111100
00001111
11000000

C
A
T

Decoding just requires using the lookup table

to find out what symbol goes with Byte 1, Byte
2, and Byte 3. Using the look-up table shown
earlier, well get CAT as the decoded string.