MAS160: Signals, Systems & Information for Media Technology
Problem Set 4
DUE: October 20, 2003
Instructors: V. Michael Bove, Jr. and Rosalind Picard
T.A.: Jim McBride
Let's see how much of a savings this method gives us. If we want to send a hundred of these symbols, an ordinary binary code will require us to send 100 times 3 bits, or 300 bits.
In the S-F case, 75 percent of the symbols will be transmitted as 2-bit codes, 12.5 percent as 3-bit codes, and 12.5 percent as 4-bit codes, so the total is only 237.5 bits, on average. Thus the binary code requires 3 bits per symbol, while the S-F code takes 2.375. The entropy, or information content, gives us a lower limit on the number of bits per symbol we might achieve:
H = -\sum_{i=1}^{m} p_i \log_2(p_i)
  = -[0.25\log_2(0.25) + 0.25\log_2(0.25) + 0.25\log_2(0.25) + 0.125\log_2(0.125) + 0.0625\log_2(0.0625) + 0.0625\log_2(0.0625)]

If your calculator doesn't do base-two logs (most don't), you'll need the following high-school relation that many people forget: log_a(x) = log_10(x)/log_10(a), so log_2(x) = log_10(x)/0.30103. The entropy works out to 2.375 bits/symbol, so we've achieved the theoretical rate this time. The S-F coder doesn't always do this well, and more complex methods like the Huffman coder will work better in those cases (but are too time-consuming to assign on a problem set!).

Now it's your turn to do some coding. Below is a letter-frequency table for the English language (also available at http://ssi.www.media.mit.edu/courses/ssi/y03/ps4.freq.txt):

Letter  Freq (%)   Letter  Freq (%)   Letter  Freq (%)   Letter  Freq (%)
E       13.105     T       10.468     A        8.151     O        7.995
N        7.098     R        6.832     I        6.345     S        6.101
H        5.259     D        3.788     L        3.389     F        2.924
C        2.758     M        2.536     U        2.459     G        1.994
Y        1.982     P        1.982     W        1.539     B        1.440
V        0.919     K        0.420     X        0.166     J        0.132
Q        0.121     Z        0.077
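As a quick check on the arithmetic above, here is a short Python sketch (not part of the original assignment) that computes the entropy and the average Shannon-Fano code length for the six-symbol example; the same entropy loop can be pointed at the letter-frequency table above.

```python
import math

# Six-symbol source from the worked example above.
probs = [0.25, 0.25, 0.25, 0.125, 0.0625, 0.0625]

# Code lengths assigned by the S-F code in the example:
# three 2-bit codes, one 3-bit code, two 4-bit codes.
sf_lengths = [2, 2, 2, 3, 4, 4]

# Entropy H = -sum(p * log2(p)): the lower bound in bits/symbol.
H = -sum(p * math.log2(p) for p in probs)

# Average S-F code length in bits/symbol.
L = sum(p * n for p, n in zip(probs, sf_lengths))

print(f"entropy         = {H:.4f} bits/symbol")  # 2.3750
print(f"S-F code length = {L:.4f} bits/symbol")  # 2.3750
print(f"100 symbols: binary = {100 * 3} bits, S-F = {100 * L:.1f} bits on average")
```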
(a) Twenty-six letters require five bits of binary. What's the entropy in bits/letter of English text coded as individual letters, ignoring (for simplicity) capitalization, spaces, and punctuation?

(b) Write a Shannon-Fano code for English letters (a sketch of the splitting procedure appears after this problem). How many bits/letter does your code require?

(c) Ignoring (as above) case, spaces, and punctuation, how many total bits does it take to send the following English message as binary? As your code? [You don't need to write out the coded message, just add up the bits.]

There is too much signals and systems homework

(d) Repeat (c) for the following Clackamas-Chinook sentence (forgive our lack of the necessary Native American diacritical marks!).

nugwagimx lga dayaxbt, aga danmax wilxba diqelpxix.
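For part (b), here is a minimal sketch of the Shannon-Fano construction; the function name and the tie-breaking rule (split the sorted list where the two halves' probabilities are most nearly equal) are our choices, not prescribed by the assignment.

```python
def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs, e.g. [('E', 13.105), ...].
    Returns a dict mapping each symbol to its binary codeword."""
    codes = {}

    def split(group, prefix):
        if len(group) == 1:
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(p for _, p in group)
        # Find the split point where the halves' probabilities balance best.
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_i, best_diff = i, diff
        split(group[:best_i], prefix + "0")
        split(group[best_i:], prefix + "1")

    split(sorted(symbols, key=lambda sp: -sp[1]), "")
    return codes

# Sanity check on the six-symbol example: reproduces the
# three 2-bit, one 3-bit, and two 4-bit codewords quoted above.
example = [("a", 0.25), ("b", 0.25), ("c", 0.25),
           ("d", 0.125), ("e", 0.0625), ("f", 0.0625)]
for sym, code in sorted(shannon_fano(example).items()):
    print(sym, code)
```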
Problem 4: Error Correction
A binary communication system contains a pair of error-prone wireless channels connected in series, as shown below.

[Figure: Sender 1 → channel with 1/8 error rate → Receiver 1/Sender 2 → channel with 1/16 error rate → Receiver 2]
Assume that in each channel it is equally likely that a 0 will be turned into a 1 or that a 1 into a 0. Assume also that in the first channel the probability of an error in any particular bit is 1/8, and in the second channel it is 1/16.

(a) For the combined pair of channels, compute the following four probabilities:

    a 0 is received when a 0 is transmitted,
    a 0 is received when a 1 is transmitted,
    a 1 is received when a 1 is transmitted,
    a 1 is received when a 0 is transmitted.
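Because each channel is symmetric (a 0-to-1 flip is as likely as a 1-to-0 flip), the four probabilities in part (a) reduce to two cases: the bit arrives flipped, or it doesn't. A small sketch for checking your answer with exact fractions:

```python
from fractions import Fraction

def cascade_flip(p1, p2):
    """End-to-end flip probability of two symmetric channels in series:
    the bit arrives inverted if exactly one of the two channels flips it
    (two flips cancel out)."""
    return p1 * (1 - p2) + (1 - p1) * p2

p_flip = cascade_flip(Fraction(1, 8), Fraction(1, 16))
print("P(received != transmitted) =", p_flip)
print("P(received == transmitted) =", 1 - p_flip)
```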
(b) Assume that a very simple encoding scheme is used: a 0 is transmitted as three successive 0s and a 1 as three successive 1s. At the decoder, a majority decision rule is used: if a group of three bits has more 0s than 1s (e.g., 000, 001, 010, 100), it's assumed that a 0 was meant, and if more 1s than 0s, that a 1 was meant. If the original source message has an equal likelihood of 1s and 0s, what is the probability that a decoded bit will be incorrect?
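With the repetition code, the decoder errs only when at least two of the three transmitted bits are flipped. A sketch of that binomial calculation, shown here with a made-up per-bit error rate of 1/10 rather than the combined-channel figure from part (a):

```python
from fractions import Fraction

def majority_error(p):
    """Probability that a 3-bit repetition group decodes incorrectly
    under majority vote: exactly 2 flips, or all 3 flip."""
    return 3 * p**2 * (1 - p) + p**3

# Illustrative only: 1/10 is a made-up rate; substitute the
# combined-channel error probability from part (a).
print(majority_error(Fraction(1, 10)))  # 7/250
```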
Problem 5: Data Compression
You are given a data file that has been compressed to a length of 100,000 bits, and told that it is the result of running an ideal entropy coder on a sequence of data. You are also told that the original data are samples of a continuous waveform, quantized to two bits per sample. The probabilities of the uncompressed values are:

s     p(s)
00    1/2
01    3/8
10    1/16
11    1/16
(a) What (approximately) was the length of the uncompressed file, in bits? (You may not need to design a coder to answer this question!)

(b) The number of (two-bit) samples in the uncompressed file is half the value you computed in part (a). You are told that the continuous waveform was sampled at the minimum possible rate such that the waveform could be reconstructed exactly from the samples (at least before they were quantized), and you are told that the file represents 10 seconds of data. What is the highest frequency present in the continuous signal? (A sketch for checking both parts follows below.)
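If you want to sanity-check your answers, the sketch below assumes only two facts: an ideal entropy coder emits about H bits per sample on average, and minimum-rate (Nyquist) sampling puts the highest signal frequency at half the sampling rate.

```python
from fractions import Fraction
import math

# Probabilities of the four quantized sample values from the table above.
probs = [Fraction(1, 2), Fraction(3, 8), Fraction(1, 16), Fraction(1, 16)]

# Bits per (two-bit) sample after ideal entropy coding.
H = -sum(float(p) * math.log2(p) for p in probs)
print(f"H = {H:.4f} bits/sample")

# Part (a): 100,000 compressed bits at H bits/sample gives the sample
# count; each uncompressed sample occupies 2 bits.
n_samples = 100_000 / H
print(f"samples ~ {n_samples:.0f}, uncompressed ~ {2 * n_samples:.0f} bits")

# Part (b): the file spans 10 seconds sampled at the minimum (Nyquist)
# rate, so the highest frequency present is half the sampling rate.
fs = n_samples / 10
print(f"highest frequency ~ {fs / 2:.0f} Hz")
```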