Huffman coding is a very effective technique for compressing data. It is a greedy algorithm. Normal storage uses 8 bits per character (ASCII form); the Huffman algorithm compresses the file and stores it compactly. It is a lossless data compression technique, and it generates prefix codes.
Prefix codes
The code of one character is never a prefix of the code of another character.
Example
a=00
b=11
c=01
d=10
No code is a prefix of any other code, so this is a prefix code.
Example
a=0
b=00
c=10
d=1
This is not a prefix code: the code of a (0) is a prefix of the code of b (00).
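Both examples above can be checked mechanically. A minimal sketch (the helper name `is_prefix_code` is ours, not from the text):

```python
def is_prefix_code(codes):
    """Return True if no code word is a prefix of another code word."""
    words = sorted(codes.values())
    # After sorting, any prefix sorts immediately before its extensions,
    # so it is enough to compare adjacent pairs.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))

print(is_prefix_code({"a": "00", "b": "11", "c": "01", "d": "10"}))  # True
print(is_prefix_code({"a": "0", "b": "00", "c": "10", "d": "1"}))    # False
```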
Benefits of compressing data
- Less disk space.
- Faster file transfer.
- Reduced file size.
- Compressed data can be read and written faster than the original data.
Types of character code
- Fixed-length code.
- Variable-length code.
Fixed length code
Each letter is represented by an equal number of bits.
Variable length code
A variable-length code gives frequent characters short code words and infrequent characters long code words. Huffman codes are variable-length codes.
Steps to find the average length
- Arrange the characters in increasing order of frequency.
- Build the Huffman tree (left < right).
- Assign codes to the branches (left = 0, right = 1).
- Count the number of bits in each character's code.
- Find the average length = ∑(number of bits × frequency).
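The steps above can be sketched in Python with a min-heap. This is a minimal sketch; the names `huffman_codes` and `avg_length` are our own, not from the text.

```python
import heapq
import itertools

def huffman_codes(freqs):
    """Build Huffman codes for a {char: frequency} map."""
    counter = itertools.count()  # tie-breaker so dicts are never compared
    # Each heap entry: (total frequency, tie-breaker, {char: code_so_far})
    heap = [(f, next(counter), {ch: ""}) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # smallest frequency -> left branch
        f2, _, right = heapq.heappop(heap)  # next smallest -> right branch
        merged = {ch: "0" + code for ch, code in left.items()}   # left = 0
        merged.update({ch: "1" + code for ch, code in right.items()})  # right = 1
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]

def avg_length(codes, freqs):
    """Average length = sum(number of bits * frequency)."""
    return sum(len(codes[ch]) * freqs[ch] for ch in freqs)
```

For the example worked below (A=0.08, B=0.40, C=0.25, D=0.15, E=0.12) this produces the same codes and an average length of 2.15.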
Main points
- Characters are always leaf nodes.
- Characters with a high probability/frequency get the shortest codes.
Example
A text is made up of the characters A, B, C, D, E, occurring with probabilities 0.08, 0.40, 0.25, 0.15, 0.12 respectively. Find the average length of the optimal coding.
Solution
A B C D E
.08 .40 .25 .15 .12
Arrange them in increasing order
A E D C B
.08 .12 .15 .25 .40
Build Huffman tree
Conditions
- Left < Right.
- Characters are always leaf nodes.
Merge the two smallest, A = 0.08 and E = 0.12, into a node of weight 0.20:
D     (A,E)  C     B
0.15  0.20   0.25  0.40
Merge D = 0.15 and (A,E) = 0.20 into a node of weight 0.35:
C     (D,A,E)  B
0.25  0.35     0.40
Merge C = 0.25 and (D,A,E) = 0.35 into a node of weight 0.60:
B     (C,D,A,E)
0.40  0.60
Finally merge 0.40 and 0.60 into the root (1.00).
Assign codes to the branches
Left branch = 0
Right branch = 1
So
A=1110
B=0
C=10
D=110
E=1111
Average length of coding = ∑(number of bits × frequency of character)
Average length of coding = 4×0.08 + 1×0.40 + 2×0.25 + 3×0.15 + 4×0.12 = 2.15 bits per character.
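The arithmetic can be verified directly from the codes and probabilities found above:

```python
codes = {"A": "1110", "B": "0", "C": "10", "D": "110", "E": "1111"}
probs = {"A": 0.08, "B": 0.40, "C": 0.25, "D": 0.15, "E": 0.12}

# Average length = sum(number of bits * probability) over all characters
avg = sum(len(codes[ch]) * probs[ch] for ch in probs)
print(round(avg, 2))  # 2.15
```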