In this problem we study an encoding scheme for compressing text files, and for efficient decoding. The
method is called Huffman coding.
Computers store alphanumeric characters as 8 bit binary strings. A text message of characters is thus
ultimately a string of bits. By exploiting the fact that not all characters appear with equal frequency in text,
we can encode the "rare" characters with long codes (bit sequences), and encode the frequently used
characters with short codes (sequences).
An optimal scheme - for a given set of character frequencies - is produced by Huffman coding. In this
method, a binary tree is constructed from a list of the character-frequency pairs. The construction of the
tree is accomplished by iteratively joining forests of trees into one final tree.
To begin with, each character is considered to be a single-node tree in a forest of such trees. The "value"
of each node is the frequency value associated with the character. At each stage of the building process,
the two trees with lowest values are taken as the children of a new node. The new node is assigned a
value equal to the sum of the values of its children and put back into the forest of trees. The process is
then iterated on the forest of trees which resulted from the previous stage. Eventually, there is only one
tree in the forest. A coding is assigned to the characters at its leaves by placing 0 on left links and 1 on
right links, and generating a binary sequence by following the links from the root to the character at its
leaf.