8 Speech Coding
Speech Coding
Analog to Digital Conversion
To fully understand speech and channel coding, it is easiest to start from the very beginning of the process. The first step in speech coding is to transform the sound waves of our voices (and other ambient noise) into an electrical signal. This is done by a microphone. A microphone consists of a diaphragm, a magnet, and a coil of wire. When you speak into it, the sound waves created by your voice vibrate the diaphragm, which is connected to the magnet, which sits inside the coil of wire. These vibrations cause the magnet to move inside the coil at the same frequency as your voice. A magnet moving in a coil of wire creates an electric current. This current, which has the same frequency as the sound waves, is carried by wires to wherever you wish it to go, such as an amplifier or a transmitter. Once it gets to its destination, the process is reversed and it comes out as sound; speakers are basically the opposite of microphones.

The signal created by a microphone is an analog signal. Since GSM is an all-digital system, this analog signal is not suitable for use on a GSM network. The analog signal must be converted into digital form. This is done by an Analog to Digital Converter (ADC). To reduce the amount of data needed to represent the sound wave, the analog signal is first passed through a band pass filter. Band pass means that the filter only allows signals that fall within a certain frequency range to pass through it; all other signals are cut off, or attenuated. The BP filter only allows frequencies between 300 Hz and 3.4 kHz to pass through it. This limits the amount of data that the Analog/Digital Converter is required to process.

[Figure: Band Pass Filter]

The filtered signal is fed into the analog/digital converter. The analog/digital converter performs two tasks: it converts an analog signal into a digital signal, and it does the opposite, converting a digital signal into an analog signal.
In the case of a cell phone, the analog signal created by the microphone is passed to the analog/digital converter. The A/D converter measures the analog signal, or samples it, 8000 times per second. This means that the ADC takes a sample of the analog signal every 0.125 ms (125 µs). Each sample is quantized into a 13-bit data block. At 13 bits per sample and 8000 samples per second, the data rate is 104,000 bits per second, or 104 kb/s.

[Figure: Analog/Digital Converter]

A data rate of 104 kbps is far too large to be handled economically by a radio transmitter. To reduce the bitrate, the signal is fed into a speech encoder. A speech encoder is a device that compresses the data of a speech signal. There are many types of speech encoding schemes available. The speech encoder used in GSM combines Linear Predictive Coding (LPC) and Regular Pulse Excitation (RPE). LPC is a very complicated and math-heavy process, so it will only be summarized here.

Linear Predictive Coding (LPC)
Remember that the ADC quantizes each audio sample with a 13-bit "word". In LPC, 160 of the 13-bit samples from the converter are saved up and stored in short-term memory. Remember that a sample is taken every 125 µs, so 160 samples cover an audio block of 20 ms. This 20 ms audio block consists of 2080 bits. LPC-RPE analyzes each 20 ms set of data and determines 8 coefficients used for filtering as well as an excitation signal. LPC basically identifies specific bits that correspond to specific aspects of the human voice, such as vocal modifiers (teeth, tongue, etc.), and assigns coefficients to them. The excitation signal represents things like pitch and loudness. LPC identifies a number of correlations and redundancies in human speech and removes them.

The LPC/RPE sequence is then fed into the Long-Term Prediction (LTP) analysis function.
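The sampling and framing figures quoted so far can be cross-checked with a few lines of arithmetic. The following is only a sketch; the constant and variable names are made up for illustration, and the numbers are the ones given in the text.

```python
# Figures taken from the text; the names are illustrative only.
SAMPLE_RATE_HZ = 8000      # samples per second taken by the ADC
BITS_PER_SAMPLE = 13       # size of each quantized sample
SAMPLES_PER_BLOCK = 160    # samples collected before LPC analysis

# Time between two samples, in microseconds.
sample_period_us = 1_000_000 / SAMPLE_RATE_HZ

# Raw bit rate leaving the ADC, in bits per second.
raw_bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

# Duration and size of one block handed to the speech encoder.
block_duration_ms = SAMPLES_PER_BLOCK * sample_period_us / 1000
block_size_bits = SAMPLES_PER_BLOCK * BITS_PER_SAMPLE

print(sample_period_us, raw_bitrate_bps, block_duration_ms, block_size_bits)
```

Running it reproduces the 125 µs sample period, the 104 kb/s raw rate, and the 20 ms, 2080-bit block.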
The LTP function compares the sequence it receives with earlier sequences stored in its memory and selects the sequence that most resembles the current sequence. The LTP function then calculates the difference between the two sequences. Now the LTP function only has to encode the difference value as well as a pointer indicating which earlier sequence it used for comparison. By doing this, it avoids encoding redundant data.

You can envision this by thinking about the sounds we make when we talk. When we pronounce a syllable, each little sound has a specific duration that seems short when we are talking but often lasts longer than 20 ms. So, one sound might be represented by several 20 ms blocks of exactly the same data. Rather than transmit redundant data, LPC only includes data that tells the receiver which data is redundant, so that it can be recreated on the receiving end.

Using LPC/RPE and LTP, the speech encoder reduces the 20 ms block from 2,080 bits to 260 bits. Note that this is a reduction by a factor of eight. 260 bits every 20 ms gives us a net data rate of 13 kilobits per second (kbps).

[Figure: Speech Encoding]

This bitrate of 13 kbps is known as Full Rate Speech (FS). There is another method for encoding speech called Half Rate Speech (HS), which results in a bit rate of approximately 5.6 kbps. The explanations in the remainder of this tutorial are based on the full-rate speech bitrate (13 kbps).

Calculate the net data rate:

Description                 Formula                    Result
Convert ms to sec           20 ms ÷ 1000               0.02 seconds
Calculate bits per second   260 bits ÷ 0.02 seconds    13,000 bits per second (bps)
Convert bits to kilobits    13,000 bps ÷ 1000          13 kilobits per sec (kbps)

As we all know, the audio signal must be transmitted across a radio link from the handset to the Base Station Transceiver (BTS). The signal on this radio link is subject to atmospherics and fading, which result in a large amount of data loss and degrade the audio.
To prevent degradation of the audio, the data stream is put through a series of error detection and error correction procedures called channel coding. The first phase of channel coding is called block coding.

Block Coding
A single 260-bit (20 ms) audio block is delivered to the block coder. The 260 bits are divided into classes according to their importance in reconstructing the audio. Class I bits are the most important in reconstructing the audio. Class II bits are the less important bits. Class I bits are further divided into two categories, Ia and Ib.

[Figure: Classes of Bits]

The class Ia bits are protected by a cyclic code. The cyclic code is run on the 50 Ia bits and calculates 3 parity bits, which are then appended to the end of the Ia bits. Only the class Ia bits are protected by this cyclic code. The Ia and Ib bits are then combined, and an additional 4 bits are added to the tail of the class I bits (Ia and Ib together). All four bits are zeros (0000) and are needed for the next step, which is "convolutional coding". There is no protection for class II bits. As you can see, block coding adds seven bits to the audio block (3 parity bits and 4 tail bits), so a 260-bit block becomes a 267-bit block.

[Figure: Block Coding]

Convolutional Coding
This 267-bit block is then fed into a convolutional coder. Convolutional coding allows errors to be detected and, to a limited degree, corrected. The class I "protected" bits are fed into a complex convolutional code that outputs 2 bits for every bit that enters it. The second bit that is produced is known as a redundancy bit. The number of class I bits is doubled from 189 to 378. This coding uses 5 consecutive bits to calculate the redundancy bit, which is why 4 tail bits were added to the class I bits during block coding. When the last data bit enters the register, it uses the four tail bits to calculate the redundancy bit for the last data bit.
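To make the "2 bits out for every bit in, computed over 5 consecutive bits" idea concrete, here is a minimal sketch of a rate-1/2 convolutional encoder. It is not taken from this tutorial. The function name `conv_encode` is hypothetical, and the tap positions are an assumption: they are the generator polynomials commonly cited for GSM full-rate speech, which the text does not state. In this sketch both output bits are XOR combinations of the 5-bit window, which differs slightly from the simplified "data bit plus redundancy bit" description above.

```python
def conv_encode(bits, taps=((0, 3, 4), (0, 1, 3, 4))):
    """Rate-1/2 convolutional encoder over a 5-bit window.

    Each tap set lists the delays (0 = current bit) that are XORed
    together to form one output bit. The default taps are an assumption.
    """
    history = [0, 0, 0, 0]          # the four previous input bits, newest first
    out = []
    for b in bits:
        window = [b] + history      # window[d] is the input bit from d steps ago
        for tap_set in taps:
            parity = 0
            for d in tap_set:
                parity ^= window[d]
            out.append(parity)
        history = window[:4]        # shift the register by one bit
    return out

# 185 class I bits (50 Ia + 3 parity + 132 Ib) followed by the 4 zero tail bits.
# Ib = 132 is derived from the text: 189 - 50 - 3 - 4.
class_1 = [i % 2 for i in range(185)] + [0, 0, 0, 0]
coded = conv_encode(class_1)

CLASS_2_BITS = 78                   # derived from the text: 456 - 378
print(len(class_1), len(coded), len(coded) + CLASS_2_BITS)
```

The four zero tail bits push the last data bit all the way through the 5-bit window, which is why they are appended before encoding. The printed lengths match the 189, 378, and 456 bits quoted in the text.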
The class II bits are not run through the convolutional code. After convolutional coding, the audio block is 456 bits.

[Figure: Convolutional Coding]

Reordering, Partitioning, and Interleaving
Now, one problem remains. All of this error detection and error correction coding will not do any good if the entire 456-bit block is lost or garbled. To alleviate this, the bits are reordered and partitioned into eight separate sub-blocks. If one sub-block is lost, then only one-eighth of the data for each audio block is lost, and those bits can be recovered using the convolutional code on the receiving end. This is known as interleaving.

Each 456-bit block is reordered and partitioned into 8 sub-blocks of 57 bits each. These eight 57-bit sub-blocks are then interleaved onto 8 separate bursts. As you remember from the TDMA Tutorial, each burst is composed of two 57-bit data blocks, for a total data payload of 114 bits.

The first four sub-blocks (0 through 3) are mapped onto the even bits of four consecutive bursts. The last four sub-blocks (4 through 7) are mapped onto the odd bits of the next 4 consecutive bursts. So, the entire block is spread out across 8 separate bursts.

Taking a look at the diagram below, we see three 456-bit blocks, labeled A, B, and C. Each block is sub-divided into eight sub-blocks numbered 0-7. Let's take a look at Block B. We can see that each sub-block is mapped to a burst on a single time-slot. Block B is mapped onto 8 separate bursts or time-slots. For illustrative purposes, the time-slots are labeled S through Z.

Let's expand time-slot V for a close-up view. We can see how the bits are mapped onto a burst. The bits from Block B, sub-block 3 (B3) are mapped onto the even-numbered bits of the burst (bits 0, 2, 4, ..., 108, 110, 112). You will also notice that the odd bits are mapped from block A, sub-block 7 (bits 1, 3, 5, ..., 109, 111, 113). Each burst contains 57 bits of data from each of two separate 456-bit blocks. This process is known as interleaving.
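The even/odd mapping just described can be sketched in a few lines. This is a simplified illustration, not the exact GSM reordering rule: the helper names `partition` and `interleave` are hypothetical, and sending bit k to sub-block k mod 8 is an assumption made only so that each sub-block ends up with 57 bits.

```python
def partition(block):
    """Split a 456-bit block into 8 sub-blocks of 57 bits (assumed rule: k mod 8)."""
    assert len(block) == 456
    return [block[i::8] for i in range(8)]

def interleave(blocks):
    """Map consecutive 456-bit blocks onto 114-bit burst payloads.

    Sub-blocks 0-3 of block n fill the even positions of bursts 4n..4n+3;
    sub-blocks 4-7 fill the odd positions of bursts 4n+4..4n+7.
    Positions that would carry a neighbouring block not given here stay None.
    """
    bursts = [[None] * 114 for _ in range(4 * len(blocks) + 4)]
    for n, block in enumerate(blocks):
        for i, sub in enumerate(partition(block)):
            offset = 0 if i < 4 else 1          # even bits first, odd bits later
            bursts[4 * n + i][offset::2] = sub  # 57 positions receive 57 bits
    return bursts

# Tag every bit with its block label and sub-block number, e.g. "B3".
blocks = [[f"{label}{k % 8}" for k in range(456)] for label in "ABC"]
bursts = interleave(blocks)

print(bursts[7][:4])   # time-slot V in the text: B3 on even bits, A7 on odd bits
print(bursts[8][:4])   # the next burst: block C on even bits, B4 on odd bits
```

Tagging each bit with its origin makes it easy to see that every burst carries 57 bits from each of two neighbouring blocks, and that losing one burst costs each of those blocks only one of its eight sub-blocks.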
[Figure: Reordering, Partitioning, and Interleaving]

In the following diagram, we examine time-slot W. We see that bits from B4 are mapped onto the odd-numbered bits (bits 1, 3, 5, ..., 109, 111, 113), and we would see bits from C0 mapped onto the even-numbered bits (bits 0, 2, 4, ..., 108, 110, 112). This process continues indefinitely as data is transmitted. Time-slots W, X, Y, and Z would all be mapped identically. The next time-slot would have data from Block C and Block D mapped onto it. This process continues for as long as there is data being generated.

[Figure: Interleaving]

The process of interleaving effectively distributes a single 456-bit audio block over 8 separate bursts. If one burst is lost, only 1/8 of the data is lost, and the missing bits can be recovered using the convolutional code.

Now, you might notice that the data it takes to represent a 20 ms (456-bit) audio block is spread out across 8 time-slots. If you remember that each TDMA frame is approximately 4.615 ms, we can determine that it takes about 37 ms to transmit one single 456-bit block. It seems like transmitting 20 ms worth of audio over a period of 37 ms would not work. However, this is not what is truly happening. If you look at a series of blocks as they are mapped onto time-slots, you will notice that one block completes every four bursts, which is approximately 18 ms. The only effect this has is that the audio stream is effectively delayed by 20 ms, which is truly negligible.

In the diagram below, we can see how this works. The diagram shows 16 bursts. Remember that a burst occurs on a single time-slot and that the duration of a time-slot is 577 µs. Eight time-slots make up a TDMA frame, which is 4.615 ms. Since a single resource is only given one time-slot in which to transmit, we only get to transmit once every TDMA frame. Therefore, we only get to transmit one burst every 4.615 ms.

* If this is not clear, please review the TDMA Tutorial.
During each time-slot, a burst is transmitted that carries data from two different 456-bit blocks. In the diagram below, burst 1 carries data from A and B, burst 5 has B and C, burst 9 has C and D, etc.

Looking at the diagram, we can see that it does take approximately 37 ms for Block B to transmit all of its data (bursts 1-8). However, in bursts 5-8, data from block C is also being transmitted. Once block B has finished transmitting all of its data (burst 8), block C has already transmitted half of its data and only requires 4 more bursts to complete its data.

Block A completes transmitting its data at the end of the fourth burst. Block B finishes in the eighth, block C in the 12th, and block D in the 16th. Viewing it this way shows us that every fourth burst completes the data for one block, which takes approximately 18 ms.

The following diagram illustrates the entire process, from audio sampling to partitioning and interleaving. Data and signalling messages will be covered in a future tutorial.
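The burst timeline described above can also be worked out numerically. The sketch below uses the burst numbering from the text (bursts 1 to 16, blocks A to D) and the approximate 4.615 ms frame duration. The function names are hypothetical, and block "E" is only a placeholder for whatever block would follow D.

```python
FRAME_MS = 4.615  # approximate TDMA frame length; one burst per frame per channel

def blocks_in_burst(burst_number, labels="ABCDE"):
    """Return (finishing block, starting block) for a 1-based burst number."""
    finishing = (burst_number - 1) // 4
    return labels[finishing], labels[finishing + 1]

def completion_burst(block_index):
    """Burst in which block A=0, B=1, ... sends its last sub-block."""
    return 4 * (block_index + 1)

for burst in (1, 5, 9):
    print(burst, blocks_in_burst(burst))

# Time for one block to be fully sent (8 bursts) versus the spacing
# between consecutive block completions (4 bursts).
span_ms = round(8 * FRAME_MS, 2)
spacing_ms = round(4 * FRAME_MS, 2)
print(span_ms, spacing_ms)
```

The 8-burst span works out to roughly 37 ms and the 4-burst spacing to roughly 18 ms, matching the approximate figures in the text.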