
IT Project Part 1

The document describes the implementation of Huffman coding in MATLAB across 5 sections. Section 1 extracts symbols and calculates probabilities from a text file. Section 2 generates the Huffman code dictionary. Section 3 calculates coding parameters. Section 4 encodes a file, and Section 5 decodes the encoded file and verifies successful decoding.

Uploaded by

Mina Emad

IT Major task part 1

Students:
Amr Khaled Mahmoud 19P8679
Mina Emad Maurice 19P
1. Code Sections:

1.1. The first section:

In this section we extract the symbols from the text file. The variable
‘Content’ stores the content of the text file as characters.
The unique function returns the unique symbols (in our case, the
characters) in the text, so every distinct character counts as one
symbol.

To calculate the probabilities, we first define a vector to store them. Then, in
a for loop, we find the number of occurrences of each symbol in the text
file and divide it by the total number of characters to get that
symbol's probability. The ismember function marks every occurrence of the
symbol, and the combination of length and find counts the marked
positions.
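
The steps above can be sketched roughly as follows. This is a minimal illustration, not the report's actual code: only the variable name Content comes from the text, while Symbols, Prob, Occurrences and the sample string 'abracadabra' (standing in for the contents of the text file) are assumptions made for the example.

```matlab
% Sketch: extract the symbols and estimate their probabilities.
% The sample string is a placeholder for the text read from the file.
Content = 'abracadabra';

Symbols = unique(Content);            % distinct characters, sorted
Prob = zeros(1, length(Symbols));     % vector that will hold the probabilities

for i = 1:length(Symbols)
    % ismember marks every position holding the symbol;
    % length(find(...)) counts the marked positions
    Occurrences = length(find(ismember(Content, Symbols(i))));
    Prob(i) = Occurrences / length(Content);
end

disp(Symbols);
disp(Prob);
```

For this sample, Symbols is 'abcdr' and the probabilities sum to 1, which is a quick sanity check worth running on any input file.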
1.2. The second section

To begin generating the codes, we combine the symbols and their
probabilities in a single cell array so that each symbol stays linked to its
probability. We then sort the cell array, as Huffman coding requires, and
take a copy of it because the algorithm modifies it as it runs.
Next we define a new cell array to hold the generated code words. We use a
cell array because its entries start out empty, so we can concatenate bits
onto them. We then handle the special case where the number of symbols is
equal to one by assigning the code word ‘1’ to that symbol.
To implement the Huffman algorithm we follow the same steps used in our
course. First we sort the symbols in descending order of probability. Next
we take the last two entries, which are the two least probable, and iterate
through them. An entry is either a single symbol (its length is equal to 1)
or a combination of symbols; we assign ‘0’ or ‘1’ directly to a single
symbol, or to every component of a combined entry.
At the end of each iteration we add the two probabilities, combine the two
entries into one, and delete the last row.
Finally, to generate the dictionary we use a cell array that contains the
symbols in the first column and their codes in the second column.
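
A minimal sketch of this procedure is given below, under several assumptions. The names Groups, P, Codes and Dictionary are illustrative, the input probabilities are those of the sample string 'abracadabra', and the choice of giving ‘0’ to the second-to-last entry and ‘1’ to the last entry is arbitrary. The text does not say which bit goes to which entry, and either choice yields a valid code.

```matlab
% Sketch: build Huffman code words by repeatedly merging the two
% least probable entries. Inputs are sample values, not the report's data.
Symbols = 'abcdr';
Prob = [5 2 1 1 2] / 11;

N = length(Symbols);
Codes = repmat({''}, 1, N);           % empty code words to prepend bits to

if N == 1
    Codes{1} = '1';                   % special case: a single symbol
else
    Groups = num2cell(1:N);           % each entry lists the symbol indices it contains
    P = Prob;                         % working copy of the probabilities
    while length(Groups) > 1
        [P, Order] = sort(P, 'descend');
        Groups = Groups(Order);
        % Prepend one bit to every symbol in the two least probable entries
        % (the 0/1 assignment here is an assumption)
        for k = Groups{end-1}
            Codes{k} = ['0' Codes{k}];
        end
        for k = Groups{end}
            Codes{k} = ['1' Codes{k}];
        end
        % Combine the two entries, add their probabilities, delete the last one
        Groups{end-1} = [Groups{end-1} Groups{end}];
        P(end-1) = P(end-1) + P(end);
        Groups(end) = [];
        P(end) = [];
    end
end

% Column 1: symbol, column 2: code word
Dictionary = [num2cell(Symbols(:)), Codes(:)];
disp(Dictionary);
```

The exact code words depend on how ties between equal probabilities are ordered, but the average code length is the same for every valid Huffman code of a given source.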
1.3. Section three

In this section we calculate the required parameters.
The average length (L_Avg) is the sum of the code lengths, each multiplied
by the probability of its symbol.
The entropy (H_of_S) is the sum over all symbols of the symbol probability
multiplied by the log base 2 of 1/probability.
The efficiency is the entropy divided by the average length.
The compression ratio represents how much compression we achieve when
coding a single symbol. An uncoded character is taken to occupy 8 bits
(ASCII), so we divide the average length of our code by 8 to find how much
we have compressed.
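
These four definitions can be written out as in the sketch below. L_Avg and H_of_S are the names used in the text; Lengths, Efficiency and Compression_Ratio are assumed names. The probabilities and code words are sample values (one valid Huffman code for the string 'abracadabra'), not results from the report.

```matlab
% Sketch: coding parameters for a sample source and prefix code.
Prob  = [5 2 1 1 2] / 11;                       % sample probabilities (assumed)
Codes = {'0', '110', '1110', '1111', '10'};     % sample code words (assumed)

Lengths = cellfun(@length, Codes);              % length of each code word in bits

L_Avg  = sum(Prob .* Lengths);                  % average code length, bits/symbol
H_of_S = sum(Prob .* log2(1 ./ Prob));          % source entropy, bits/symbol
Efficiency = H_of_S / L_Avg;                    % at most 1 for a uniquely decodable code
Compression_Ratio = L_Avg / 8;                  % relative to 8-bit ASCII, as defined in the text

fprintf('L_Avg = %.4f, H(S) = %.4f\n', L_Avg, H_of_S);
fprintf('Efficiency = %.4f, Compression ratio = %.4f\n', Efficiency, Compression_Ratio);
```

With this definition a smaller compression ratio means stronger compression, since it is the coded length expressed as a fraction of the 8-bit length.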
1.4. Section four

In this section we encode the trial.txt file with our generated code.
The for loop runs over every character of the trial file and compares it to
the symbols in the dictionary; once the character matches a symbol, the
loop inserts that symbol's code in its place. We then use the combination
of string and strjoin to join the codes into a single string with no
spaces, and we print it to a text file called ‘Tx.txt’.
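
The encoding loop might look like the sketch below. The dictionary contents and the sample text are assumptions, as are the names Dictionary, Encoded and Tx. The sketch also passes an empty delimiter to strjoin directly instead of combining it with string as the text describes, which is a small simplification.

```matlab
% Sketch: replace each character with its code word and write the bit string.
% Sample dictionary: column 1 holds symbols, column 2 holds code words (assumed).
Dictionary = {'a', '0'; 'b', '110'; 'c', '1110'; 'd', '1111'; 'r', '10'};
Content = 'abracadabra';              % placeholder for the text of trial.txt

Encoded = cell(1, length(Content));
for i = 1:length(Content)
    for j = 1:size(Dictionary, 1)
        if Content(i) == Dictionary{j, 1}
            Encoded{i} = Dictionary{j, 2};   % insert the code in place of the symbol
            break;
        end
    end
end

Tx = strjoin(Encoded, '');            % join the codes with no separator

fid = fopen('Tx.txt', 'w');
fprintf(fid, '%s', Tx);
fclose(fid);

disp(Tx);
```

For this sample dictionary the 11 characters become a 23-bit string, compared with 88 bits in 8-bit ASCII.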
1.5. Section five

In this section we decode the file we previously encoded. We read the
received bit string into a variable called Rx, then loop over its length.
To decode, we add a single bit on every iteration of the loop and compare
the accumulated bits to the codes in the dictionary. Because we are using a
prefix code, this method is unambiguous: no code is the beginning of
another code.
In the end we compare the retrieved file (Retr.txt) to the original file
(trial.txt) using strcmp, which returns 1, meaning that our decoding was
successful.
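
The bit-by-bit decoding described above can be sketched as follows. Rx is the name used in the text; Buffer, Match, Retrieved and Success are assumed names, and the dictionary and received string are sample values that correspond to the text 'abracadabra'.

```matlab
% Sketch: decode a prefix-coded bit string one bit at a time.
Dictionary = {'a', '0'; 'b', '110'; 'c', '1110'; 'd', '1111'; 'r', '10'};
Rx = '01101001110011110110100';       % sample received bit string (assumed)

Retrieved = '';                       % decoded text so far
Buffer = '';                          % bits read since the last decoded symbol

for i = 1:length(Rx)
    Buffer = [Buffer Rx(i)];                          % add a single bit
    Match = find(strcmp(Dictionary(:, 2), Buffer));   % compare with every code word
    if ~isempty(Match)
        % A prefix code guarantees the first match is the right one
        Retrieved = [Retrieved Dictionary{Match, 1}];
        Buffer = '';
    end
end

Success = strcmp(Retrieved, 'abracadabra');           % 1 when decoding is correct
disp(Retrieved);
disp(Success);
```

If Buffer is not empty when the loop ends, the received string was truncated or corrupted, so checking it is a cheap extra validation step.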
2.0. Results

2.1. Section one


Probabilities and Symbols:
2.2. Section two
Dictionary:
2.3. Section Three
Average Length:

Info:

Efficiency:

Compression Ratio:

Entropy:
2.4. Section four
Transmitted Bit stream as a string:
2.5. Section five
Retrieved (Decoded) File:

Comparison Value:
