Digital Communications - Proakis

CONTENTS

Preface

1 Introduction
  1-1 Elements of a Digital Communication System
  1-2 Communication Channels and Their Characteristics
  1-3 Mathematical Models for Communication Channels
  1-4 A Historical Perspective in the Development of Digital Communications
  1-5 Overview of the Book
  1-6 Bibliographical Notes and References

2 Probability and Stochastic Processes
  2-1 Probability
    2-1-1 Random Variables, Probability Distributions, and Probability Densities
    2-1-2 Functions of Random Variables
    2-1-3 Statistical Averages of Random Variables
    2-1-4 Some Useful Probability Distributions
    2-1-5 Upper Bounds on the Tail Probability
    2-1-6 Sums of Random Variables and the Central Limit Theorem
  2-2 Stochastic Processes
    2-2-1 Statistical Averages
    2-2-2 Power Density Spectrum
    2-2-3 Response of a Linear Time-Invariant System to a Random Input Signal
    2-2-4 Sampling Theorem for Band-Limited Stochastic Processes
    2-2-5 Discrete-Time Stochastic Signals and Systems
    2-2-6 Cyclostationary Processes
  2-3 Bibliographical Notes and References
  Problems

3 Source Coding
  3-1 Mathematical Models for Information
  3-2 A Logarithmic Measure of Information
    3-2-1 Average Mutual Information and Entropy
    3-2-2 Information Measures for Continuous Random Variables
  3-3 Coding for Discrete Sources
    3-3-1 Coding for Discrete Memoryless Sources
    3-3-2 Discrete Stationary Sources
    3-3-3 The Lempel-Ziv Algorithm
  3-4 Coding for Analog Sources—Optimum Quantization
    3-4-1 Rate-Distortion Function
    3-4-2 Scalar Quantization
    3-4-3 Vector Quantization
  3-5 Coding Techniques for Analog Sources
    3-5-1 Temporal Waveform Coding
    3-5-2 Spectral Waveform Coding
    3-5-3 Model-Based Source Coding
  3-6 Bibliographical Notes and References
  Problems

4 Characterization of Communication Signals and Systems
  4-1 Representation of Bandpass Signals and Systems
    4-1-1 Representation of Bandpass Signals
    4-1-2 Representation of Linear Bandpass Systems
    4-1-3 Response of a Bandpass System to a Bandpass Signal
    4-1-4 Representation of Bandpass Stationary Stochastic Processes
  4-2 Signal Space Representation
    4-2-1 Vector Space Concepts
    4-2-2 Signal Space Concepts
    4-2-3 Orthogonal Expansions of Signals
  4-3 Representation of Digitally Modulated Signals
    4-3-1 Memoryless Modulation Methods
    4-3-2 Linear Modulation with Memory
    4-3-3 Nonlinear Modulation Methods with Memory
  4-4 Spectral Characteristics of Digitally Modulated Signals
    4-4-1 Power Spectra of Linearly Modulated Signals
    4-4-2 Power Spectra of CPFSK and CPM Signals
    4-4-3 Power Spectra of Modulated Signals with Memory
  4-5 Bibliographical Notes and References
  Problems

5 Optimum Receivers for the Additive White Gaussian Noise Channel
  5-1 Optimum Receiver for Signals Corrupted by AWGN
    5-1-1 Correlation Demodulator
    5-1-2 Matched-Filter Demodulator
    5-1-3 The Optimum Detector
    5-1-4 The Maximum-Likelihood Sequence Detector
    5-1-5 A Symbol-by-Symbol MAP Detector for Signals with Memory
  5-2 Performance of the Optimum Receiver for Memoryless Modulation
    5-2-1 Probability of Error for Binary Modulation
    5-2-2 Probability of Error for M-ary Orthogonal Signals
    5-2-3 Probability of Error for M-ary Biorthogonal Signals
    5-2-4 Probability of Error for Simplex Signals
    5-2-5 Probability of Error for M-ary Binary-Coded Signals
    5-2-6 Probability of Error for M-ary PAM
    5-2-7 Probability of Error for M-ary PSK
    5-2-8 Differential PSK (DPSK) and Its Performance
    5-2-9 Probability of Error for QAM
    5-2-10 Comparison of Digital Modulation Methods
  5-3 Optimum Receiver for CPM Signals
    5-3-1 Optimum Demodulation and Detection of CPM
    5-3-2 Performance of CPM Signals
    5-3-3 Symbol-by-Symbol Detection of CPM Signals
  5-4 Optimum Receiver for Signals with Random Phase in AWGN Channel
    5-4-1 Optimum Receiver for Binary Signals
    5-4-2 Optimum Receiver for M-ary Orthogonal Signals
    5-4-3 Probability of Error for Envelope Detection of M-ary Orthogonal Signals
    5-4-4 Probability of Error for Envelope Detection of Correlated Binary Signals
  5-5 Regenerative Repeaters and Link Budget Analysis
    5-5-1 Regenerative Repeaters
    5-5-2 Communication Link Budget Analysis
  5-6 Bibliographical Notes and References
  Problems

6 Carrier and Symbol Synchronization
  6-1 Signal Parameter Estimation
    6-1-1 The Likelihood Function
    6-1-2 Carrier Recovery and Symbol Synchronization in Signal Demodulation
  6-2 Carrier Phase Estimation
    6-2-1 Maximum-Likelihood Carrier Phase Estimation
    6-2-2 The Phase-Locked Loop
    6-2-3 Effect of Additive Noise on the Phase Estimate
    6-2-4 Decision-Directed Loops
    6-2-5 Non-Decision-Directed Loops
  6-3 Symbol Timing Estimation
    6-3-1 Maximum-Likelihood Timing Estimation
    6-3-2 Non-Decision-Directed Timing Estimation
  6-4 Joint Estimation of Carrier Phase and Symbol Timing
  6-5 Performance Characteristics of ML Estimators
  6-6 Bibliographical Notes and References
  Problems

7 Channel Capacity and Coding
  7-1 Channel Models and Channel Capacity
    7-1-1 Channel Models
    7-1-2 Channel Capacity
    7-1-3 Achieving Channel Capacity with Orthogonal Signals
    7-1-4 Channel Reliability Functions
  7-2 Random Selection of Codes
    7-2-1 Random Coding Based on M-ary Binary-Coded Signals
    7-2-2 Random Coding Based on M-ary Multiamplitude Signals
    7-2-3 Comparison of R0 with the Capacity of the AWGN Channel
  7-3 Communication System Design Based on the Cutoff Rate
  7-4 Bibliographical Notes and References
  Problems

8 Block and Convolutional Channel Codes
  8-1 Linear Block Codes
    8-1-1 The Generator Matrix and the Parity Check Matrix
    8-1-2 Some Specific Linear Block Codes
    8-1-3 Cyclic Codes
    8-1-4 Optimum Soft-Decision Decoding of Linear Block Codes
    8-1-5 Hard-Decision Decoding
    8-1-6 Comparison of Performance between Hard-Decision and Soft-Decision Decoding
    8-1-7 Bounds on Minimum Distance of Linear Block Codes
    8-1-8 Nonbinary Block Codes and Concatenated Block Codes
    8-1-9 Interleaving of Coded Data for Channels with Burst Errors
  8-2 Convolutional Codes
    8-2-1 The Transfer Function of a Convolutional Code
    8-2-2 Optimum Decoding of Convolutional Codes—The Viterbi Algorithm
    8-2-3 Probability of Error for Soft-Decision Decoding
    8-2-4 Probability of Error for Hard-Decision Decoding
    8-2-5 Distance Properties of Binary Convolutional Codes
    8-2-6 Nonbinary Dual-k Codes and Concatenated Codes
    8-2-7 Other Decoding Algorithms for Convolutional Codes
    8-2-8 Practical Considerations in the Application of Convolutional Codes
  8-3 Coded Modulation for Bandwidth-Constrained Channels
  8-4 Bibliographical Notes and References
  Problems

9 Signal Design for Band-Limited Channels
  9-1 Characterization of Band-Limited Channels
  9-2 Signal Design for Band-Limited Channels
    9-2-1 Design of Band-Limited Signals for No Intersymbol Interference—The Nyquist Criterion
    9-2-2 Design of Band-Limited Signals with Controlled ISI—Partial-Response Signals
    9-2-3 Data Detection for Controlled ISI
    9-2-4 Signal Design for Channels with Distortion
  9-3 Probability of Error in Detection of PAM
    9-3-1 Probability of Error for Detection of PAM with Zero ISI
    9-3-2 Probability of Error for Detection of Partial-Response Signals
    9-3-3 Probability of Error for Optimum Signals in Channel with Distortion
  9-4 Modulation Codes for Spectrum Shaping
  9-5 Bibliographical Notes and References
  Problems

10 Communication through Band-Limited Linear Filter Channels
  10-1 Optimum Receiver for Channels with ISI and AWGN
    10-1-1 Optimum Maximum-Likelihood Receiver
    10-1-2 A Discrete-Time Model for a Channel with ISI
    10-1-3 The Viterbi Algorithm for the Discrete-Time White Noise Filter Model
    10-1-4 Performance of MLSE for Channels with ISI
  10-2 Linear Equalization
    10-2-1 Peak Distortion Criterion
    10-2-2 Mean Square Error (MSE) Criterion
    10-2-3 Performance Characteristics of the MSE Equalizer
    10-2-4 Fractionally Spaced Equalizer
  10-3 Decision-Feedback Equalization
    10-3-1 Coefficient Optimization
    10-3-2 Performance Characteristics of DFE
    10-3-3 Predictive Decision-Feedback Equalizer
  10-4 Bibliographical Notes and References
  Problems

11 Adaptive Equalization
  11-1 Adaptive Linear Equalizer
    11-1-1 The Zero-Forcing Algorithm
    11-1-2 The LMS Algorithm
    11-1-3 Convergence Properties of the LMS Algorithm
    11-1-4 Excess MSE Due to Noisy Gradient Estimates
    11-1-5 Baseband and Passband Linear Equalizers
  11-2 Adaptive Decision-Feedback Equalizer
    11-2-1 Adaptive Equalization of Trellis-Coded Signals
  11-3 An Adaptive Channel Estimator for ML Sequence Detection
  11-4 Recursive Least-Squares Algorithms for Adaptive Equalization
    11-4-1 Recursive Least-Squares (Kalman) Algorithm
    11-4-2 Linear Prediction and the Lattice Filter
  11-5 Self-Recovering (Blind) Equalization
    11-5-1 Blind Equalization Based on Maximum-Likelihood Criterion
    11-5-2 Stochastic Gradient Algorithms
    11-5-3 Blind Equalization Algorithms Based on Second- and Higher-Order Signal Statistics
  11-6 Bibliographical Notes and References
  Problems

12 Multichannel and Multicarrier Systems
  12-1 Multichannel Digital Communication in AWGN Channels
    12-1-1 Binary Signals
    12-1-2 M-ary Orthogonal Signals
  12-2 Multicarrier Communications
    12-2-1 Capacity of a Non-Ideal Linear Filter Channel
    12-2-2 An FFT-Based Multicarrier System
  12-3 Bibliographical Notes and References
  Problems

13 Spread Spectrum Signals for Digital Communications
  13-1 Model of Spread Spectrum Digital Communication System
  13-2 Direct Sequence Spread Spectrum Signals
    13-2-1 Error Rate Performance of the Decoder
    13-2-2 Some Applications of DS Spread Spectrum Signals
    13-2-3 Effect of Pulsed Interference on DS Spread Spectrum Systems
    13-2-4 Generation of PN Sequences
  13-3 Frequency-Hopped Spread Spectrum Signals
    13-3-1 Performance of FH Spread Spectrum Signals in AWGN Channel
    13-3-2 Performance of FH Spread Spectrum Signals in Partial-Band Interference
    13-3-3 A CDMA System Based on FH Spread Spectrum Signals
  13-4 Other Types of Spread Spectrum Signals
  13-5 Synchronization of Spread Spectrum Signals
  13-6 Bibliographical Notes and References
  Problems

14 Digital Communication through Fading Multipath Channels
  14-1 Characterization of Fading Multipath Channels
    14-1-1 Channel Correlation Functions and Power Spectra
    14-1-2 Statistical Models for Fading Channels
  14-2 The Effect of Signal Characteristics on the Choice of a Channel Model
  14-3 Frequency-Nonselective, Slowly Fading Channel
  14-4 Diversity Techniques for Fading Multipath Channels
    14-4-1 Binary Signals
    14-4-2 Multiphase Signals
    14-4-3 M-ary Orthogonal Signals
  14-5 Digital Signaling over a Frequency-Selective, Slowly Fading Channel
    14-5-1 A Tapped-Delay-Line Channel Model
    14-5-2 The RAKE Demodulator
    14-5-3 Performance of RAKE Receiver
  14-6 Coded Waveforms for Fading Channels
    14-6-1 Probability of Error for Soft-Decision Decoding of Linear Binary Block Codes
    14-6-2 Probability of Error for Hard-Decision Decoding of Linear Binary Block Codes
    14-6-3 Upper Bounds on the Performance of Convolutional Codes for a Rayleigh Fading Channel
    14-6-4 Use of Constant-Weight Codes and Concatenated Codes for a Fading Channel
    14-6-5 System Design Based on the Cutoff Rate
    14-6-6 Trellis-Coded Modulation
  14-7 Bibliographical Notes and References
  Problems

15 Multiuser Communications
  15-1 Introduction to Multiple Access Techniques
  15-2 Capacity of Multiple Access Methods
  15-3 Code-Division Multiple Access
    15-3-1 CDMA Signal and Channel Models
    15-3-2 The Optimum Receiver
    15-3-3 Suboptimum Detectors
    15-3-4 Performance Characteristics of Detectors
  15-4 Random Access Methods
    15-4-1 ALOHA System and Protocols
    15-4-2 Carrier Sense Systems and Protocols
  15-5 Bibliographical Notes and References
  Problems

Appendix A The Levinson-Durbin Algorithm
Appendix B Error Probability for Multichannel Binary Signals
Appendix C Error Probabilities for Adaptive Reception of M-phase Signals
  C-1 Mathematical Model for an M-phase Signaling Communications System
  C-2 Characteristic Function and Probability Density Function of the Phase θ
  C-3 Error Probabilities for Slowly Rayleigh Fading Channels
  C-4 Error Probabilities for Time-Invariant and Ricean Fading Channels
Appendix D Square-Root Factorization

References and Bibliography

Index


1 INTRODUCTION

In this book, we present the basic principles that underlie the analysis and design of digital communication systems. The subject of digital communications involves the transmission of information in digital form from a source that generates the information to one or more destinations. Of particular importance in the analysis and design of communication systems are the characteristics of the physical channels through which the information is transmitted. The characteristics of the channel generally affect the design of the basic building blocks of the communication system. Below, we describe the elements of a communication system and their functions.

1-1 ELEMENTS OF A DIGITAL COMMUNICATION SYSTEM

Figure 1-1-1 illustrates the functional diagram and the basic elements of a digital communication system. The source output may be either an analog signal, such as an audio or video signal, or a digital signal, such as the output of a teletype machine, that is discrete in time and has a finite number of output characters.
In a digital communication system, the messages produced by the source are converted into a sequence of binary digits. Ideally, we should like to represent the source output (message) by as few binary digits as possible. In other words, we seek an efficient representation of the source output that results in little or no redundancy. The process of efficiently converting the output of either an analog or digital source into a sequence of binary digits is called source encoding or data compression.

FIGURE 1-1-1 Basic elements of a digital communication system.

The sequence of binary digits from the source encoder, which we call the information sequence, is passed to the channel encoder. The purpose of the channel encoder is to introduce, in a controlled manner, some redundancy in the binary information sequence that can be used at the receiver to overcome the effects of noise and interference encountered in the transmission of the signal through the channel. Thus, the added redundancy serves to increase the reliability of the received data and improves the fidelity of the received signal. In effect, redundancy in the information sequence aids the receiver in decoding the desired information sequence. For example, a (trivial) form of encoding of the binary information sequence is simply to repeat each binary digit m times, where m is some positive integer. More sophisticated (nontrivial) encoding involves taking k information bits at a time and mapping each k-bit sequence into a unique n-bit sequence, called a code word. The amount of redundancy introduced by encoding the data in this manner is measured by the ratio n/k. The reciprocal of this ratio, namely k/n, is called the rate of the code or, simply, the code rate.

The binary sequence at the output of the channel encoder is passed to the digital modulator, which serves as the interface to the communications channel. Since nearly all of the communication channels encountered in practice are capable of transmitting electrical signals (waveforms), the primary purpose of the digital modulator is to map the binary information sequence into signal waveforms. To elaborate on this point, let us suppose that the coded information sequence is to be transmitted one bit at a time at some uniform rate R bits/s. The digital modulator may simply map the binary digit 0 into a waveform s_0(t) and the binary digit 1 into a waveform s_1(t). In this manner, each bit from the channel encoder is transmitted separately. We call this binary modulation. Alternatively, the modulator may transmit b coded information bits at a time by using M = 2^b distinct waveforms s_i(t), i = 0, 1, ..., M − 1, one waveform for each of the 2^b possible b-bit sequences. We call this M-ary modulation (M > 2). Note that a new b-bit sequence enters the modulator every b/R seconds. Hence, when the channel bit rate R is fixed, the amount of time available to transmit one of the M waveforms corresponding to a b-bit sequence is b times the time period in a system that uses binary modulation.

The communication channel is the physical medium that is used to send the signal from the transmitter to the receiver. In wireless transmission, the channel may be the atmosphere (free space). On the other hand, telephone channels usually employ a variety of physical media, including wire lines, optical fiber cables, and wireless (microwave radio).
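Returning to the encoder and modulator just described, the short sketch below works through the bookkeeping of a repetition code and of grouping coded bits into M-ary symbols. The function names and the numerical choices (k = 4 information bits, repetition factor m = 3, b = 2 bits per symbol, R = 9600 bits/s) are assumptions made for this illustration, not values taken from the text.

```python
# Toy illustration of code rate k/n and M-ary symbol grouping.
# All names and numbers here are assumptions for the sketch.

def repetition_encode(bits, m=3):
    """Trivial channel code: repeat each information bit m times (rate 1/m)."""
    return [b for b in bits for _ in range(m)]

def group_into_symbols(coded_bits, b=2):
    """Map each b-bit group to one of M = 2**b symbol indices (M-ary modulation)."""
    assert len(coded_bits) % b == 0
    return [int("".join(map(str, coded_bits[i:i + b])), 2)
            for i in range(0, len(coded_bits), b)]

info_bits = [1, 0, 1, 1]
m = 3
coded = repetition_encode(info_bits, m)        # 4 information bits -> 12 coded bits
rate = len(info_bits) / len(coded)             # code rate k/n = 1/3

b = 2                                          # coded bits per transmitted waveform
symbols = group_into_symbols(coded, b)         # 12 bits -> 6 quaternary symbol indices

R = 9600                                       # assumed channel bit rate, bits/s
print("code rate k/n =", rate)
print(f"M = {2 ** b}-ary symbols:", symbols)
print(f"one symbol every b/R = {b / R:.6f} seconds")
```

The point is only bookkeeping: adding redundancy lowers the code rate k/n below 1, while grouping b coded bits per waveform stretches the signaling interval to b/R seconds, exactly as stated above.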
Whatever the physical medium used for transmission of the information, the essential feature is that the transmitted signal is corrupted in a random manner by a variety of possible mechanisms, such as additive thermal noise generated by electronic devices, man-made noise, ¢.g., automobile ignition noise, and atmospheric noise, e.g., electrical lightning discharges during thunderstorms. At the receiving end of a digital communications system, the digital demodulator processes the channel-corrupted transmitted waveform and re- duces the waveforms to a sequence of numbers that represent estimates of the transmitted data symbols (binary or M-ary). This sequence of numbers is passed to the channel decader, which attempts to reconstruct the original information sequence from knowledge of the code used by the channel encoder and the redundancy contained in the received data. A measure of how well the demodulator and decoder perform is the frequency with which errors occur in the decoded sequence. More precisely. the average probability of a bit-error at the output of the decoder is a measure of the performance of the demodulator-decoder combination. In general, the probability of error is a function of the code characteristics, the types of waveforms used to transmit the information over the channel, the transmitter power, the characteristics of the channel, i.e., the amount of noise, the nature of the interference, etc., and the method of demodulation and decoding. These items and their effect on performance will be discussed in detail in subsequent chapters. As a final step, when an analog output is desired, the source decoder accepts the output sequence from the channel decoder and, from knowledge of the source encoding method used, attempts to reconstruct the original signal from the source. Due to channel decoding errors and possible distortion introduced by ihe source encoder and, perhaps, the source decoder, the signal at the output of the source decoder is an approximation to the original source output. The difference or some function of the difference between the original signal and the reconstructed signal is a measure of the distortion introduced by the digital communication system. 1-2 COMMUNICATION CHANNELS AND THEIR CHARACTERISTICS As indicated in the preceding discussion, the communication channel provides the connection between the transmitter and the receiver. The physical channel 4 = picrtal. COMMUNICATIONS may be a pair of wires that carry the electrical signal, or an optical fiber that carries the information on a modulated light beam, or an underwater ocean channel in which the information is transmitted acoustically, or free space over which the information-bearing signal is radiated by use of an antenna. Other media that can be characterized as communication channels are data storage media, such as magnetic tape, magnetic disks, and optical disks. One commen problem in signal transmission through any channel is additive noise. In general, additive noise is generated internally by components such as resistors and solid-state devices used to implement the communication system. This is sometimes called thermal noise. Other sources of noise and interference may arise externally to the system, such as interference from other users of the channel. When such noise and interference occupy the same frequency band as the desired signal, its effect can be minimized by proper design of the transmitted signal and its demodulator at the receiver. 
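How strongly the noise level affects the decoded error frequency (the performance measure described above) can be seen with a short Monte Carlo sketch. The binary antipodal scheme, the threshold detector, the repetition code, and the noise settings below are illustrative assumptions for the sketch, not the receivers developed in this book.

```python
# Monte Carlo estimate of the decoded bit-error probability for a toy
# binary scheme in additive Gaussian noise (all choices are assumptions).

import random

def estimate_ber(n_bits, noise_std, m=3):
    tx = [random.randint(0, 1) for _ in range(n_bits)]
    # repetition channel code and antipodal modulation: 0 -> -1, 1 -> +1
    coded = [2 * b - 1 for b in tx for _ in range(m)]
    # additive Gaussian noise channel
    received = [s + random.gauss(0.0, noise_std) for s in coded]
    # hard-decision demodulator followed by a majority-vote channel decoder
    hard = [1 if r > 0 else 0 for r in received]
    decoded = [int(sum(hard[i:i + m]) > m / 2) for i in range(0, len(hard), m)]
    errors = sum(a != b for a, b in zip(tx, decoded))
    return errors / n_bits

for sigma in (0.5, 1.0):      # two assumed noise levels
    print(f"noise std {sigma}: estimated bit-error probability "
          f"{estimate_ber(200_000, sigma):.4f}")
```

The estimate sharpens as the number of simulated bits grows; exact error probabilities for modulation in additive white Gaussian noise are derived analytically in Chapter 5.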
Other types of signal degradations that may be encountered in transmission over the channel are signal attenuation, amplitude and phase distortion, and multipath distortion. The effects of noise may be minimized by increasing the power in the transmitted signal. However, equipment and other practical constraints limit the power level in the transmitted signal. Another basic limitation is the available channel bandwidth. A bandwidth constraint is usually due to the physical limitations of the medium and the electronic components used to implement the transmitter and the receiver. These two limitations result in constraining the amount of data that can be transmitted reliably over any communications channel as we shail observe in later chapters. Below, we describe some of the important characteristics of several communication channels. Wireline Channels The telephone network makes extensive use of wire lines for voice signal transmission, as weli as data and video transmission. Twisted-pair wire lines and coaxial cable are basically guided electromagnetic channels that provide relatively modest bandwidths. Telephone wire generally used to connect a customer to a central office has a bandwidth of several hundred kilohertz (kHz). On the other hand, coaxial cable has a usable bandwidth of several megahertz (MHz). Figure 1-2-1 illustrates the frequency range of guided electromagnetic channels, which include waveguides and optical fibers. Signals transmitted through such channels are distored in both amplitude and phase and further corrupted by additive noise. Twisted-pair wireline channels are also prone to crosstalk interference from physically adjacent channels. Because wireline channels carry a large percentage of our daily communications around the country and the world, much research has been performed on the characterization of their transmission properties and on methods for mitigating the amplitude and phase distortion encountered in signal transmission. In Chapter 9, we describe methods for designing optimum transmitted signals and their demodulation; in Chapters 10 and 11, we FIGURE 1-21 CHAPTER 1: INTRODUCTION § [ete | 108 Ha [Vinible tight 10%m 10! He 100 ma 100 GHz 10 GHz Oem {Giz 100 MHz Frequency Wavelength om 10 MHz 100m tkm 100 kHz 10km JOkKHz 100 km kHz Frequency range for guided wire channel. consider the design of channel equalizers that compensate for amplitude and phase distortion on these channels. Fiber Optic Channels Optical fibers offer the communications system designer a channel bandwidth that is several orders of magnitude larger than coaxial cable channels. During the past decade, optical fiber cables have been developed that have a relatively tow signal attenuation, and highly reliable photonic devices have been developed for signal generation and signal detection. These technofogical advances have resulted in a rapid deployment of optical fiber channels, both in domestic telecommunication systems as well as for trans-Atlantic and trans-Pacific communications. With the large bandwidth 6 DIGITAL COMMUNICATIONS available on fiber optic channels, it is possible for telephone companies to offer subscribers a wide array of telecommunication services, including voice, data, facsimile, and video. The transmitter or modulator in a fiber optic communication system is a light source, either a light-emitting diode (LED) or a laser. Information is transmitted by varying (modulating) the intensity of the light source with the message signal. 
The light propagates through the fiber as a light wave and is amplified periodically (in the case of digital transmission, it is detected and regenerated by repeaters) along the transmission path to compensate for signal attenuation. At the receiver, the light intensity is detected by a photodiode, whose output is an electrical signal that varies in direct proportion to the power of the light impinging on the photodiode. Soutces of noise in fiber optic channels are photodiodes and electronic amplifiers. It is envisioned that optical fiber channels will replace nearly all wireline channels in the telephone network by the turn of the century. Wireless Electromagnetic Channels In wireless communication systems, electromagnetic energy is coupled to the propagation medium by an antenna which serves as the radiator. The physical size and the configuration of the antenna depend primarily on the frequency of operation. To obtain efficient radiation of electromagnetic energy, the antenna must be longer than 7 of the wavelength. Consequently, a radio station transmitting in the AM frequency band, say at f.= 1 MHz (corresponding to a wavelength of A =c/f. = 300m), requires an antenna of at least 30m. Other important characteristics and attributes of antennas for wireless transmission are described in Chapter 5. Figure 1-2-2 illustrates the various frequency bands of the electromagnetic spectrum. The mode of propagation of electromagnetic waves in the atmo- sphere and in free space may be subdivided into three categories, namely, ground-wave propagation, sky-wave propagation, and line-of-sight (LOS) Propagation. In the VLF and audio frequency bands, where the wavelengths exceed 10km, the earth and the ionosphere act as a waveguide for electromag- netic wave propagation. In these frequency ranges, communication signals Practically propagate around the globe. For this reason, these frequency bands are primarily used to provide navigational aids from shore to ships around the world.. The channel bandwidths available in these frequency bands are relatively small (usually 1-10% of the center frequency), and hence the information that is transmitted through these channels is of relatively slow speed and generally confined to digital transmission. A dominant type of noise at these frequencies is generated from thunderstorm activity around the globe, especially in tropical regions. Interference results from the many users of these frequency bands. Ground-waye propagation, as illustrated in Fig. 1-2-3, is the dominant mode of propagation for frequencies in the MF band (0.3-3MHz). This is the frequency band used for AM broadcasting and maritime radio broadcasting. In AM broadcasting, the range with groundwave propagation of even the more 1ocm ‘Ultra high frequency Wavelength 10km 100 km FIGURE 1-2-2 FIGURE 1-23 Tem Satellite to satellite “tase mene (SHF) Earth satellite Radar Very high frequency VHF TV and FM Broadcast 100 Miz ‘adio (WHE) Mobile radio. 10m CHAPTER 1: INTRODUCTION Frequency band Use Unraviolet 10'S Hz 10 Hz Millimeter waves 100 GHz (EHF) Experimental Navigation Mobile radio 1GHz ul (HP) UHF TV and mobile radio Mobile, aeronautical Shortwave ; Business High frequency Amateur radio 1OMHz Frequency ar) International radio 100m - Citizen's band Medium frequency me tkm Longwave radio 100 kHz 10 kHz 1 kHz Frequency range for wireless electromagnetic channels. (Adapted from Carlson (1975), 2nd edition, © McGraw-Hill Book Company Co. 
Reprinted with permission of the publisher.} Hlustration of ground-wave propagation. 7 FIGURE 1-2-4 8 DIGITAL COMMUNICATIONS Allustration of sky-wave propagation. powerful radio stations is limited to about 150km. Atmospheric noise, man-made noise, and thermal noise from electronic components at the receiver are dominant disturbances for signal transmission in the MF band. Sky-wave propagation, as illustrated in Fig. 1-2-4 results from transmitted signals being reflected (bent or refracted) from the ionosphere, which consists of several layers of charged particles ranging in altitude from 50 to 400km above the surface of the earth. During the daytime hours, the heating of the lower atmosphere by the sun causes the formation of the lower layers at altitudes below 120 km. These lower layers, especially the D-layer, serve to absorb frequencies below 2 MHz, thus severely limiting sky-wave propagation of AM radio broadcast. However, during the night-time hours, the electron density in the lower layers of the ionosphere drops sharply and the frequency absorption that occurs during the daytime is significantly reduced. As a consequence, powerful AM radio broadcast stations can propagate over large distances via sky wave over the F-layer of the ionosphere, which ranges from 140 to 400km above the surface of the earth. A frequently occurring problem with electromagnetic wave propagation via sky wave in the HF frequency range is signal multipath. Signal multipath occurs when the transmitted signal arrives at the receiver via multiple propagation paths at different delays. It generally results in intersymbol interference in a digital communication system. Moreover, the signal components arriving via different propagation paths may add destructively, resulting in a phenomenon called signal fading, which most people have experienced when listening to a distant radio station at night when sky wave is the dominant propagation mode. Additive noise at HF is a combination of atmospheric noise and thermal norse. Sky-wave ionospheric propagation ceases to exist at frequencies above approximately 30 MHz, which is the end of the HF band. However, it is possible to have ionospheric scatter propagation at frequencies in the range 30-60 MHz, resulting from signal scattering from the lower ionosphere. It is also possible to communicate over distances of several hundred miles by use of tropospheric scattering at frequencies in the range 40-300 MHz. Troposcatter results from signal scattering due to particles in the atmosphere at altitudes of 10 miles or less. Generally, ionospheric scatter and tropospheric scatter CHAPTER |: INTRODUCTION 9 involve large signal propagation losses and require a large amount of transmitter power and relatively large antennas. Frequencies above 30 MHz propagate through the ionosphere with rela- tively little loss and make satcllite and extraterrestrial communications possible. Hence, at frequencies in the WHF band and higher, the dominant mode of electromagnetic propagation is line-of-sight (LOS) propagation. For terrestrial communication systems, this means that the transmitter and receiver antennas must be in direct LOS with relatively little or no obstruction. For this reason, television stations transmitting in the WHF and UHF frequency bands mount their antennas on high towers to achieve a broad coverage area. In general, the coverage area for LOS propagation is limited by the curvature of the earth. 
If the transmitting antenna is mounted at a height hm above the surface of the earth, the distance to the radio horizon, assuming no physical obstructions such as mountains, is approximately d = V15h km. For example, a TV antenna mounted on a tower of 300m in height provides a coverage of approximately 67 km. As another example, microwave radio relay systems used extensively for telephone and video transmission at frequencies above 1 GHz have antennas mounted on tall towers or on the top of tall buildings. The dominant noise limiting the performance of a communication system in VHF and UHF frequency ranges is thermal noise generated in the receiver front end and cosmic noise picked up by the antenna. At frequencies in the SHF band above 10 GHz, atmospheric conditions piay a major role in signal propagation. For example, at 10GHz, the attenuation ranges from about 0.003 dB/km in light rain to about 0.3 dB/km in heavy rain. At 100 GHz, the attenuation ranges from about 0.1 dB/km in light rain to about 6dB/km in heavy rain. Hence, in this frequency range, heavy rain introduces extremely high propagation losses that can result in service outages (total breakdown in the communication system). At frequencies above the EHF (extremely high frequency) band, we have the infrared and visible light regions of the electromagnetic spectrum, which can be used to provide LOS optical communication in free space. To date, these frequency bands have been used in experimental communication systems, such as satellite-to-satellite links. Underwater Acoustic Channels Over the past few decades. ocean ex- ploration activity has been steadily increasing. Coupled with this increase is the need to transmit data, collected by sensors placed under water, to the surface of the ocean. From there, it is possible to relay the data via a satellite to a data collection center. Electromagnetic waves do not propagate over long distances under water except at extremely low frequencies. However, the transmission of signals at such low frequencies is prohibitively expensive because of the large and powerful transmitters required. The attenuation of electromagnetic waves in water can be expressed in terms of the skin depth, which is the distance a signal is attenuated by 1/e. For sea water, the skin depth 6 =250/Vf, where f is 10 picitrat. COMMUNICATIONS, expressed in Hz and 4 is in m. For example, at 10 kHz. the skin depth is 2.5m. In contrast. acoustic signals propagate over distances of tens and even hundreds of kilometers. An underwater acoustic channel is characterized as a multipath channel due to signal reflections from the surface and the bottom of the sea. Because of wave motion, the signal multipath components undergo time-varying propaga- tion delays that result in signal fading. In addition, there is frequency- dependent attenuation, which is approximately proportional to the square of the signal frequciacy. The sound velocity is nominally about 1500-m/s, but the actual value will vary either above or below the nominal value depending on the depth at which the signal propagates. Ambient ocean acoustic noise is caused by shrimp, fish, and various mammals. Near harbors, there is also man-made acoustic noise in addition to the ambient noise. In spite of this hostile environment, it is possible to design and implement efficient and highly reliable underwater acoustic communica- tion systems for transmitting digital signals over large distances. 
Storage Channels Information storage and retrieval systems constitute a very significant part of data-handling activities on a daily basis. Magnetic tape, including digital audio tape and video tape, magnetic disks used for storing large amounts of computer data, optical disks used for computer data storage, and compact disks are examples of data storage systems that can be characterized as communication channels. The process of storing data on a magnetic tape or a magnetic or optical disk is equivalent to transmitting a signal over a telephone or a radio channel. The readback process and the signal processing involved in storage systems to recover the stored information are equivalent to the functions performed by a receiver in a telephone or radio communication system to recover the transmitted information. Additive noise generated by the electronic components and interference from adjacent tracks is generally present in the readback signal of a storage system, just as is the case in a telephone or a radio communication system. The amount of data that can be stored is generally limited by the size of the disk or tape and the density (number of bits stored per square inch) that can be achieved by fhe write/read electronic systems and heads. For example, a packing density of 10” bits per square inch has been recently demonstrated in an experimental magnetic disk storage system. (Current commercial magnetic storage products achieve a much lower density.) The speed at which data can be written on a disk or tape and the speed at which it can be read back are also limited by the associated mechanical and electrical subsystems that constitute an information storage system. Channel coding and modulation are essential components of a well-designed digital magnetic or optical storage system. In the readback process, the signal is demodulated and the added redundancy introduced by the channel encoder is used to correct errors in the readback signal. CHAPTER I: INTRODUCTION IE 1-3, MATHEMATICAL MODELS FOR COMMUNICATION CHANNELS FIGURE 1-3-1 In the design of communication systems for transmitting information through physical channels, we find it convenient to construct mathematical models that reflect the most important characteristics of the transmission medium. Then, the mathematical model for the channel is used in the design of the channel encoder and modulator at the transmitter and the demodulator and channel decoder at the receiver. Below, we provide a brief description of the channel models that are frequently used to characterize many of the physical channels that we encounter in practice. The Additive Noise Channel The simplest mathematical model for a communication channel is the additive noise channel, illustrated in Fig. 1-3-1. In this model, the transmitted signal s(s) is corrupted by an additive random noise process n(t). Physically, the additive noise process may arise from electronic components and amplifiers at the receiver of the communication system, or from interference encountered in transmission (as in the case of radio signal transmission). If the noise is introduced primarily by electronic components and amplifiers at the receiver, it may be characterized as thermal noise. This type of noise is characterized statistically as a gaussian noise process. 
Hence, the resulting mathematical model for the channel is usually called the additive gaussian noise channel. Because this channel model applies to a broad class of physical communication channels and because of its mathematical tractability, this is the predominant channel model used in our communication system analysis and design. Channel attenuation is easily incorporated into the model. When the signal undergoes attenuation in transmission through the channel, the received signal is

    r(t) = a s(t) + n(t)    (1-3-1)

where a is the attenuation factor.

The Linear Filter Channel  In some physical channels, such as wireline telephone channels, filters are used to ensure that the transmitted signals do not exceed specified bandwidth limitations and thus do not interfere with one another. Such channels are generally characterized mathematically as linear filter channels with additive noise, as illustrated in Fig. 1-3-2. Hence, if the channel input is the signal s(t), the channel output is the signal

    r(t) = s(t) * c(t) + n(t) = ∫ c(τ) s(t − τ) dτ + n(t)    (1-3-2)

where c(t) is the impulse response of the linear filter and * denotes convolution.

FIGURE 1-3-2 The linear filter channel with additive noise.

The Linear Time-Variant Filter Channel  Physical channels such as underwater acoustic channels and ionospheric radio channels that result in time-variant multipath propagation of the transmitted signal may be characterized mathematically as time-variant linear filters. Such linear filters are characterized by a time-variant channel impulse response c(τ; t), where c(τ; t) is the response of the channel at time t due to an impulse applied at time t − τ. Thus, τ represents the "age" (elapsed-time) variable. The linear time-variant filter channel with additive noise is illustrated in Fig. 1-3-3. For an input signal s(t), the channel output signal is

    r(t) = s(t) * c(τ; t) + n(t) = ∫ c(τ; t) s(t − τ) dτ + n(t)    (1-3-3)

FIGURE 1-3-3 Linear time-variant filter channel with additive noise.

A good model for multipath signal propagation through physical channels, such as the ionosphere (at frequencies below 30 MHz) and mobile cellular radio channels, is a special case of (1-3-3) in which the time-variant impulse response has the form

    c(τ; t) = Σ_{k=1}^{L} a_k(t) δ(τ − τ_k)    (1-3-4)

where the {a_k(t)} represent the possibly time-variant attenuation factors for the L multipath propagation paths and {τ_k} are the corresponding time delays. If (1-3-4) is substituted into (1-3-3), the received signal has the form

    r(t) = Σ_{k=1}^{L} a_k(t) s(t − τ_k) + n(t)    (1-3-5)

Hence, the received signal consists of L multipath components, where each component is attenuated by {a_k(t)} and delayed by {τ_k}.

The three mathematical models described above adequately characterize the great majority of the physical channels encountered in practice. These three channel models are used in this text for the analysis and design of communication systems.
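The three models can also be exercised numerically. The sketch below passes one transmitted waveform through discrete-time stand-ins for (1-3-1), (1-3-2), and (1-3-5); the sampling rate, filter taps, path gains, and delays are arbitrary assumptions chosen only to show the structure of each model.

```python
# Discrete-time sketch of the three channel models of Section 1-3.
# All numerical values (sampling rate, taps, gains, delays) are assumptions.

import numpy as np

rng = np.random.default_rng(0)
fs = 8000                                   # assumed sampling rate, Hz
t = np.arange(0, 0.02, 1 / fs)              # 20 ms of signal
s = np.sign(np.sin(2 * np.pi * 250 * t))    # toy transmitted waveform s(t)

def awgn(x, sigma=0.1):
    """Add white Gaussian noise n(t) with standard deviation sigma."""
    return x + sigma * rng.standard_normal(x.shape)

# (1-3-1) additive noise channel with attenuation: r(t) = a s(t) + n(t)
r1 = awgn(0.5 * s)

# (1-3-2) linear filter channel: r(t) = c(t) * s(t) + n(t)
c = np.array([0.8, 0.3, 0.1])               # assumed impulse response c(t)
r2 = awgn(np.convolve(s, c)[: len(s)])

# (1-3-5) multipath special case: r(t) = sum_k a_k s(t - tau_k) + n(t)
paths = [(0.9, 0), (0.4, 12), (0.2, 31)]    # (gain a_k, delay tau_k in samples)
r3 = np.zeros_like(s)
for a_k, d_k in paths:
    r3[d_k:] += a_k * s[: len(s) - d_k]
r3 = awgn(r3)

print(r1[:4], r2[:4], r3[:4], sep="\n")
```

In this sketch the path gains are held constant; making them functions of time, as in (1-3-4), would model the fading behavior discussed in Chapter 14.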
1-4 A HISTORICAL PERSPECTIVE IN THE DEVELOPMENT OF DIGITAL COMMUNICATIONS

It is remarkable that the earliest form of electrical communication, namely telegraphy, was a digital communication system. The electric telegraph was developed by Samuel Morse and was demonstrated in 1837. Morse devised the variable-length binary code in which letters of the English alphabet are represented by a sequence of dots and dashes (code words). In this code, more frequently occurring letters are represented by short code words, while letters occurring less frequently are represented by longer code words. Thus, the Morse code was the precursor of the variable-length source coding methods described in Chapter 3. Nearly 40 years later, in 1875, Emile Baudot devised a code for telegraphy in which every letter was encoded into fixed-length binary code words of length 5. In the Baudot code, binary code elements are of equal length and designated as mark and space.

Although Morse is responsible for the development of the first electrical digital communication system (telegraphy), the beginnings of what we now regard as modern digital communications stem from the work of Nyquist (1924), who investigated the problem of determining the maximum signaling rate that can be used over a telegraph channel of a given bandwidth without intersymbol interference. He formulated a model of a telegraph system in which a transmitted signal has the general form

    s(t) = Σ_n a_n g(t − nT)    (1-4-1)

where g(t) represents a basic pulse shape and {a_n} is the binary data sequence of {±1} transmitted at a rate of 1/T bits/s. Nyquist set out to determine the optimum pulse shape that was bandlimited to W Hz and maximized the bit rate under the constraint that the pulse caused no intersymbol interference at the sampling times t = kT, k = 0, ±1, ±2, .... His studies led him to conclude that the maximum pulse rate is 2W pulses/s. This rate is now called the Nyquist rate. Moreover, this pulse rate can be achieved by using the pulses g(t) = sin(2πWt)/(2πWt). This pulse shape allows recovery of the data without intersymbol interference at the sampling instants. Nyquist's result is equivalent to a version of the sampling theorem for bandlimited signals, which was later stated precisely by Shannon (1948). The sampling theorem states that a signal of bandwidth W can be reconstructed from samples taken at the Nyquist rate of 2W samples/s using the interpolation formula

    s(t) = Σ_n s(n/(2W)) · sin[2πW(t − n/(2W))] / [2πW(t − n/(2W))]    (1-4-2)

In light of Nyquist's work, Hartley (1928) considered the issue of the amount of data that can be transmitted reliably over a bandlimited channel when multiple amplitude levels are used. Due to the presence of noise and other interference, Hartley postulated that the receiver can reliably estimate the received signal amplitude to some accuracy, say A_δ. This investigation led Hartley to conclude that there is a maximum data rate that can be communicated reliably over a bandlimited channel when the maximum signal amplitude is limited to A_max (fixed power constraint) and the amplitude resolution is A_δ.

Another significant advance in the development of communications was the work of Wiener (1942), who considered the problem of estimating a desired signal waveform s(t) in the presence of additive noise n(t), based on observation of the received signal r(t) = s(t) + n(t). This problem arises in signal demodulation. Wiener determined the linear filter whose output is the best mean-square approximation to the desired signal s(t). The resulting filter is called the optimum linear (Wiener) filter.

Hartley's and Nyquist's results on the maximum transmission rate of digital information were precursors to the work of Shannon (1948a,b), who established the mathematical foundations for information transmission and derived the fundamental limits for digital communication systems.
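Before turning to Shannon's formulation, the interpolation formula (1-4-2) can be checked numerically: sample a signal of bandwidth W at the Nyquist rate of 2W samples/s and rebuild intermediate values from the sinc kernel. The test signal, bandwidth, window, and evaluation points below are assumptions made only for the sketch.

```python
# Numerical check of the sampling-theorem interpolation formula (1-4-2).
# The test signal and all parameter values are assumptions for illustration.

import numpy as np

W = 100.0                                   # assumed bandwidth, Hz
T = 1 / (2 * W)                             # Nyquist sampling interval 1/(2W)
n = np.arange(-200, 201)                    # finite window of sample indices

def x(t):
    # band-limited test signal (all frequency components below W)
    return np.sin(2 * np.pi * 40 * t) + 0.5 * np.cos(2 * np.pi * 75 * t)

samples = x(n * T)                          # the samples s(n/2W)

def reconstruct(t):
    # s(t) = sum_n s(n/2W) * sin(2*pi*W*(t - n/2W)) / (2*pi*W*(t - n/2W))
    # np.sinc(u) = sin(pi*u)/(pi*u), so the kernel is sinc(2W(t - n/2W)).
    return np.sum(samples * np.sinc(2 * W * (t - n * T)))

for t_test in (0.0012, 0.0037, 0.0049):     # points between sampling instants
    print(f"t = {t_test:.4f} s  true = {x(t_test):+.4f}  "
          f"reconstructed = {reconstruct(t_test):+.4f}")
```

The reconstructed values agree closely with the true signal; the small residual comes from truncating the infinite sum in (1-4-2) to a finite window of samples.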
In his pioneering work, Shannon formulated the basic problem of reliable transmission of information in statistical terms, using probabilistic models for information sources and communication channels. Based on such a statistical formulation, he adopted a logarithmic measure for the information content of a source. He also demonstrated that the effect of a transmitter power constraint, a bandwidth constraint, and additive noise can be associated with the channel and incorporated into a single parameter, called the channel capacity. For example, in the case of an additive white (spectrally flat) gaussian noise interference, an ideal bandlimited channel of bandwidth W has a capacity C given by

    C = W log_2(1 + P/(W N_0))  bits/s    (1-4-3)

where P is the average transmitted power and N_0 is the power spectral density of the additive noise. The significance of the channel capacity is as follows: if the information rate R from the source is less than C (R < C), then it is theoretically possible to achieve reliable (error-free) transmission through the channel by appropriate coding. On the other hand, if R > C, reliable transmission is not possible regardless of the amount of signal processing performed at the transmitter and receiver. Thus, Shannon established basic limits on communication of information, and gave birth to a new field that is now called information theory.

Another important contribution to the field of digital communication is the work of Kotelnikov (1947), who provided a coherent analysis of the various digital communication systems based on a geometrical approach. Kotelnikov's approach was later expanded by Wozencraft and Jacobs (1965).

Following Shannon's publications came the classic work of Hamming (1950) on error-detecting and error-correcting codes to combat the detrimental effects of channel noise. Hamming's work stimulated many researchers in the years that followed, and a variety of new and powerful codes were discovered, many of which are used today in the implementation of modern communication systems.

The increase in demand for data transmission during the last three to four decades, coupled with the development of more sophisticated integrated circuits, has led to the development of very efficient and more reliable digital communication systems. In the course of these developments, Shannon's original results and the generalization of his results on maximum transmission limits over a channel and on bounds on the performance achieved have served as benchmarks for any given communication system design. The theoretical limits derived by Shannon and other researchers that contributed to the development of information theory serve as an ultimate goal in the continuing efforts to design and develop more efficient digital communication systems.

There have been many new advances in the area of digital communications following the early work of Shannon, Kotelnikov, and Hamming. Some of the most notable developments are the following:

* The development of new block codes by Muller (1954), Reed (1954), Reed and Solomon (1960), Bose and Ray-Chaudhuri (1960a,b), and Goppa (1970, 1971).
* The development of concatenated codes by Forney (1966).
* The development of computationally efficient decoding of BCH codes, e.g., the Berlekamp-Massey algorithm (see Chien, 1964; Berlekamp, 1968).
* The development of convolutional codes and decoding algorithms by Wozencraft and Reiffen (1961), Fano (1963), Zigangirov (1966), Jelinek (1969), Forney (1970, 1972), and Viterbi (1967, 1971).
* The development of trellis-coded modulation by Ungerboeck (1982), Forney et al. (1984), Wei (1987), and others.
* The development of efficient source encodings algorithms for data 16 pictTaL CoMMUnICATIONS compression, such as those devised by Ziv and Lempel (1977, 1978) and Linde et al. (1980). 1-5 OVERVIEW OF THE BOOK Chapter 2 presents a brief review of the basic notions in the theory of probability and random processes. Our primary objectives in this chapter are to present results that are used throughout the book and to establish some necessary notation. In Chapter 3, we provide an introduction to source coding for discrete and analog sources, Included in this chapter are the Huffman coding algorithm and the Lempel-Ziv algorithm for discrete sources, and scalar and vector quantiza- tion techniques for analog sources. Chapter 4 treats the characterization of communicatign signals and systems from a mathematical viewpoint. Included in this chapter is a geometric representation of signal waveforms used for digital communications. Chapters 5-8 are focused on modulation/demodulation and channel coding/decoding for the additive, white gaussian noise channel. The emphasis is on optimum demodulation and decoding techniques and their performance. The design of efficient modulators and demodulators for linear filter channels with distortion is treated in Chapters 9-11. The focus is on signal design and on channel equalization methods to compensate for the channel distortion. The final four chapters treat several more specialized topics. Chapter 12 treats multichannel and multicarrier communication systems. Chapter 13 is focused on spread spectrum signals for digital communications and their performance characteristics. Chapter 14 provides a in-depth treatment of communication through fading multipath channels. Included in this treatment is a description of channel characterization, signal design and demodulation techniques and their performance, and coding/decoding techniques and their performance. The last chapter of the book is focused on multiuser communica- tion systems and multiple access methods. 1-6 BIBLIOGRAPHICAL NOTES AND REFERENCES There are several historical treatments regarding the development of radio and telecommunications during the past century. These may be found in the books by McMahon (1984), Millman (1984), and Ryder and Fink (1984}. We have already cited the classical works of Nyquist (1924), Hartley (1928), KoteInikov (1947), Shannon (1948), and Hamming (1950), as well as some of the more important advances that have occurred in the field since 1950. The collected papers by Shannon have been published by IEEE Press in a book edited by Sloane and Wyner (1993). Other collected works published by the IEEE Press that might be of interest to the reader are Key Papers in the Development of Coding Theory, edited by Berlekamp (1974), and Key Papers in the Development of Information Theory, edited by Slepian (1974). 2 PROBABILITY AND STOCHASTIC PROCESSES The theory of probability and stochastic processes is an essential mathematical tool in the design of digital communication systems. This subject is important in the statistical modeling of sources that generate the information, in the digitization of the source output, in the characterization of the channel through which the digital information is transmitted, in the design of the receiver that processes the information-bearing signal from the channel, and in the evaluation of the performance of the communication system. Our coverage of this rich and interesting subject is brief and limited in scope. 
We present a number of definitions and basic concepts in the theory of probability and stochastic processes and we derive several results that are important in the design of efficient digital! communication systems and in the evaluation of their performance. We anticipate that most readers have had some prior exposure to the theory of probability and stochastic processes, so that our treatment serves primarily as a review. Some readers, however, who have had no previous exposure may find the presentation in this chapter extremely brief. These readers will benefit from additional reading of engineering-level treatments of the subject found in the texts by Davenport and Root (1958), Davenport (1970), Papoulis (1984), Helstrom (1991), and Leon-Garcia (1994). 2-1 PROBABILITY Let us consider an experiment, such as the rolling of a die, with a number of possible outcomes. The sample space S of the experiment consists of the set of all possible outcomes. In the case of the die, S={l, 2, 3, 4, 5, 6} (2-1-1) 17 18 picirat communications where the integers 1,...,6 represent the number of dots on the six faces of the die. These six possible outcomes are the sample points of the experiment. An event is a subset of §, and may consist of any number of sample points. For example, the event A defined as A= {2,4} (2-1-2) consists of the outcomes 2 and 4. The compiement of the event A, denoted by A, consists of all the sample points in § that are not in A and, hence, A={1,3, 5, 6} (2-1-3) Two events are said to be mutally exclusive if they have no sample points in common—that is, if the occurrence of one event excludes the occurrence of the other. . or example, if A is defined as in (2-1-2) and the event B is defined as B={1,3, 6} (2-1-4) then A and B are mutually exclusive events. Similarly, A and A are mutually exclusive events. The union (sum) of two events is an event that consists of all the sample points in the two events. For example, if B is the event defined in (2-1-4) and C is the event defined as C={i,2, 3} (2-1-5) then, the union of B and C, denoted by B UC, is the event D=BUC ={1, 2,3, 6} (2-1-6) Similarly, AU.A =S, where S$ is the entire sample space or the certain event. On the other hand, the intersection of two events is an event that consists of the points that are. common to the two events. Thus, if E = B MC represents the intersection of the events B and C, defined by (2-1-4) and (2-1-5), respectively, then E={1, 3} When the events are mutually exclusive, the intersection is the null event, denoted as @. For example. ANB =, and ANA =. The definitions of union and intersection are extended to more than two events in a straightfor- ward manner. Associated with each event A contained in S is its probability P(A). In the assignment of probabilities to events, we adopt an axiomatic viewpoint. That CHAPTER 2: PROBABILITY AND STOCHASTiC PRocEsses 19 is, we postulate that the probability of the event A satisfies the condition P(A) 2 0. We also postulate that the probability of the sample space (certain event) is P(S)=1. The third postulate deals with the probability of mutually exclusive events. Suppose that A;,, i= 1,2,..., are a (possibly infinite) number of events in the sample space S such that ANA, =O i¥j7=1,2,... Then the probability of the union of these mutually exclusive events satisfies the condition PU A) = 2 P(A) (2-1-7) For example, in a roll of a fair die, each possible outcome is assigned the probability 3. 
The event A defined by (2-1-2) consists of two mutually exclusive subevents or outcomes, and, hence, P(A) = %=4. Also, the probability of the event AUB, where A and B are the mutually exclusive events defined by (2-1-2) and (2-1-4), respectively, is P(A) + P(B)=4+4=% Joint Events and Joint Probabilities Instead of dealing with a single experiment, let us perform two experiments and consider their outcomes. For example, the two experiments may be two separate tosses of a single die or a single toss of two dice. In either case, the sample space S consists of the 36 two-tuples (i,j) where i, j= 1,2,...,6. If the dice are fair, each point in the sample space is assigned the probability %. We may now consider joint events, such as {i is even, j= 3}, and determine the associated probabilities of such events from knowledge of the probabilities of the sampie points. In general, if one experiment has the possible outcomes A,, i= 1,2,....7, and the second experiment has the possible outcomes B,, j=1,2,...,m, then the combined experiment has the possible joint outcomes (A;, B,), i= 1,2,...,,7=1,2,...,m. Associated with each joint outcome (A;, B,) is the joint probability P(A,, B;) which satisfies the condition 0 P(A;, Bi) = P(A;) (2-1-8) j=l Similarly, if the outcomes A,, i=1,2,... >”, are mutually exclusive then > P(A,, B,) = P(B)) (2-1-9) i=l 20) diGITAL COMMUNICATIONS Furthermore, if all the outcomes of the two experiments are mutually exclusive then 3 2 The generalization of the above treatment to more than two experiments is straightforward. P(A, B)=1 (2-1-10) * Conditional Probabilities Consider a combined experiment in which a joint event occurs with probability P(A, B). Suppose that the event B has occurred and we wish to determine the probability of occurrence of the event A. This is calied the conditional probability of the event A given the occurrence of the event B and is defined as P(A, B) Pla! B)= Fe) (2-1-11) provided P(B)>0. In a similar manner, the probability of the event B conditioned on the occurrence of the event A is defined as P(A, BY P(B|A)= P(A) (2-1-12) provided P(A)>0. The relations in (2-I-11) and (2-1-12) may also be expressed as P(A, B)= P(A | B)P(B)= P(B | A)P(A) (2-1-13) The relations in (2-1-11), (2-1-12), and (2-1-13) also apply to a single experiment in which A and B are any two events defined on the sample space S and P(A, B) is interpreted as the probability of the 4 8. That is, P(A, B) denotes the simultaneous occurrence of A and B. For example, consider the events B and C given by (2-1-4) and (2-1-5), respectively, for the single toss of a die. The joint event consists of the sample points {1,3}. The conditional probability of the event C given that B occurred is =i 3 P(C|B)= onion In a single experiment, we observe that when two events A and B are mutually exclusive, AM B = @ and, hence, P(A | B) =0. 
Also, if A is a subset of B then AM B =A and, hence, P Pa|ay=FO CHAPTER 2: PROBABILITY AND STOCHASTIC PROCESSES 21 On the other hand, if B is a subset of A, we have AM B = B and, hence, PB Pia|B)= Fel An extremely useful relationship for conditional probabilities is Bayes’ theorem, which states that if A,, i=1,2,...,#, are mutually exclusive events such that and B is an arbitrary event with nonzero probability then P(A, B) P(B) _—P(B | A)P(A,) aa) 3 Pe lappa) P(A, | B)= We use this formula in Chapter 5 to derive the structure of the optimum receiver for a digital communication system in which the events A,, i = 1,2,...,”, represent the possible transmitted messages in a given time interval, P(A,) represent their a priori probabilities, 8 represents the received signal, which consists of the transmitted message (one of the A,) corrupted by noise, and P(A, | B) is the a posteriori probability of A, conditioned on having observed the received signal B. Statistical Independence The statistical independence of two or more events is another important concept in probability theory. It usually arises when we consider two or more experiments or repeated trials of a single experiment. To explain this concept, we consider two events A and B and their conditional probability P(A | B), which is the probability of occurrence of A given that B has occurred. Suppose that the occurrence of A does not depend on the occurrence of B. That is, P(A | B) = P(A) (2-1-15) Substitution of (2-1-15) into (2-1-13) yields the result P(A, B) = P(A)P(B) (2-1-16) That is, the joint probability of the events A and B factors into the product of 220 iGtAL COMMUNICATIONS the elementary or marginal probabilities P(A) and P(B). When the events A and B satisfy the relation in (2-1-16), they are said to be statistically independent. For example, consider two successive experiments in tossing a die. Let A represent the even-numbered sample points {2,4,6} in the first toss and B represent the even-numbered possible outcomes {2, 4,6} in the second toss. In a fair die, we assign the probabilities P(A) = $ and P(B)= 5. Now, the joint probability of the joint event ‘even-numbered outcome on the first toss and even-numbered outcome on the second toss” is just the probability of the nine pairs of outcomes (i,j), i= 2, 4,6. 7 = 2, 4,6, which is 4. Also, P(A, B) = P(A)P(B)= 4 Thus, the events A and B are statistically independent. Similarly, we may say that the outcomes of the two experiments are statistically independent. The definition of statistical independence can be extended to three or more events. Three statistically independent events A,, Ar, and A, must satisfy the following conditions: P(A,, Ax) = P(A,)P(A2) P(Ay, Ax) = P(A\)P(A3) (2-1-17) P(A, Ax) = P(Az)P(Aa) P(A;, Az, As) = P(A,)P(A2)P(Aa) In the general case, the events A,, i= 1,2,...,n, are statistically independent provided that the probabilities of the joint events taken 2,3,4,..., and nata time factor into the product of the probabilities of the individual events. 2-1-1 Random Variables, Probability Distributions, and Probability Densities Given an experiment having a sample space S and elements s « S$, we define a function X(s) whose domain is S and whose range is a set of numbers on the real line. The function X(s) is called a random variable. For example, if we flip a coin the possible outcomes are head (H) and tail (T), so § eontains two points labeled H and T. 
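As a numerical aside, Bayes' theorem reduces to a one-line computation once the priors P(A_i) and the conditional probabilities P(B | A_i) are specified. The numbers in the sketch below are made up for illustration; only the formula itself comes from the text.

```python
# Hypothetical priors P(A_i) and likelihoods P(B | A_i) for three possible
# transmitted messages and one observed received signal B; the numbers are
# illustrative, not taken from the text.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.10, 0.60, 0.30]          # P(B | A_i)

# Total probability of the observation, the denominator in Bayes' theorem
p_B = sum(p * l for p, l in zip(priors, likelihoods))

# A posteriori probabilities P(A_i | B) = P(B | A_i) P(A_i) / P(B)
posteriors = [p * l / p_B for p, l in zip(priors, likelihoods)]

assert abs(sum(posteriors) - 1.0) < 1e-12
print([round(p, 4) for p in posteriors])  # the MAP decision picks the largest
```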
Suppose we define a function X(s) such that 1 (s=H) “wn (21-18) x)= Thus we have mapped the two possible outcomes of the coin-flipping FIGURE 2-1-1 CHAPTER 2 PROBABILITY AND STOCHASTIC PROCESSES 23 experiment into the two points (+1) on the real line. Another experiment is the toss of a die with possible outcomes 5 = {1, 2, 3, 4, 5, 6}. A random variable defined on this sample space may be X(s)=s, in which case the outcomes of the experiment are mapped into the integers 1,...,6,‘or, perhaps, X(s)=5s°, in which case the possible outcomes are mapped into the integers {1,4. 9, 16, 25, 36}. These are examples of discrete random variables. Although we have used as examples experiments that have a finite set of possible outcomes, there are many physical systems (experiments) that generate continuous outputs (outcomes). For example, the noise voltage generated by an electronic amplifier has a continuous amplitude. Conse- quently, the sample space S of voltage amplitudes v € S is continuous and so is the mapping X(v) =v. In such a case, the random variablet X is said to be a continuous random variable. Given a random variable X, let us consider the event {X =0. When the random variable is discrete or of a mixed type, the pdf contains impulses at the points of discontinuity of F(x). In such cases, the discrete part of p(x) may be expressed as peny= 3 P(X =x) 5 —x) (2-1-2) is where x, /=1,2,...,”, are the possible discrete values of the random ‘An example of the cumulative distribution function of a random variable of a mixed type. CHAPTER 2: PROBABILITY AND STOCHASTIC PROCESSES 25, variable; P(X =x,), i=1,2,...,a, are the probabilities, and 6(x) denotes an impulse at x = 0. Often we are faced with the problem of determining the probability that a random variable X falls in an interval (x,, x), where x2 >x,. To determine the probability of this event, let us begin with the event {X OK) 2-1-31 F(x;) — F(x, — Ax2) ( ) Assuming that the pdfs p(x,, x2) and p(x2) are continuous functions over the interval (x2 — Ax2, x2), we may divide both numerator and denominator in (2-1-31) by 4x. and take the limit as Ax. 0. Thus we obtain OF (X1, X2)/dx. OF (x2)/dx2 — OP Fe Pur, tte) die du,]/dx2 O[f2. p(u2) dua]/axz _ St p(ur, X2) du, P(x2) P(X, Sx, | X_= x2) = Fx, |) = (2-1-32) which is the conditional cdf of the random variable X, given the random variable X2. We observe that F(~«|x,)=0 and F(~|x.)=1. By differentiating (2-1-32) with respect to x,, we obtain the corresponding pdf p(x; |x2) in the form P(t1, X2) P(%2) Altematively, we may express the joint pdf p(x,,x.) in terms of the conditional pdfs, p(x, | x2) or p(x2 { x1), as P(x, | x2)= (2-1-33) P(r, x2) = P(X | x2)p (er) = p(x? | x1)p (x) (2-1-34) The extension of the relations given above to multidimensional random variables is also easily accomplished. Beginning with the joint pdf of the random variables X;, i =1,2,...,, we may write P(X 15 X00 Sa) = (Lt, Kaye Me | Keay Me Pleats Xn) (2-1-35) where k is any integer in the range 1 + duty (2-1-36) PR kev Xn 28 DIGITAL COMMUNICATIONS This conditional cdf satisfies the properties previously established for these functions, such as F (2, X25 He Kee ts Mn) = Pty Xap te | Meee Mn) F(~®, 82,00 Xe | Meet ke) =O Statistically Independent Random Variables. We have already defined statistical independence of two or more events of a sample space S. 
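Interval probabilities of the form P(x1 < X <= x2) = F(x2) - F(x1) are easy to check numerically for any concrete cdf. The sketch below uses an exponential cdf purely as an illustration (the rate parameter lam is a hypothetical choice, not from the text) and confirms that the cdf difference equals the area under the pdf.

```python
import math

# An illustrative cdf: F(x) = 1 - exp(-lam * x) for x >= 0 (not from the text;
# any valid cdf would do).  lam is a hypothetical rate parameter.
lam = 2.0

def F(x):
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def pdf(x):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

x1, x2 = 0.5, 1.5
# P(x1 < X <= x2) = F(x2) - F(x1)
p_interval = F(x2) - F(x1)

# The same probability as the area under the pdf between x1 and x2
# (midpoint Riemann sum, fine for a sanity check)
n = 100_000
dx = (x2 - x1) / n
area = sum(pdf(x1 + (k + 0.5) * dx) for k in range(n)) * dx

assert abs(p_interval - area) < 1e-6
print(p_interval)
```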
The concept of statistical independence can be extended to random variables defined on a sample space generated by 2 combined experiment or by repeated trials of a single experiment. If the experiments result in mutually exclusive outcomes, the probability of an outcome in one experiment is independent of an outcome in any other experiment. That is, the joint probability of the outcomes factors into a product of the probabilities corresponding to each outcome. Consequently, the random variables corresponding to the outcomes in these experiments are independent in the sense that their joint pdf factors into a product of marginal pdfs. Hence the multidimensional random variables are statistically independent if and only if F(X), X25.) Xn) = FOC) F(x) ++ Fen) (2-1-37) or, alternatively, PR Bay He) = pl )p(%2) * * PEn) (2-4-38) 2-1-2 Functions of Random Variables A problem that arises frequently in practical applications of probability is the following. Given a random variable X, which is characterized by its pdf p(x), determine the pdf of the random variable Y = g(X), where g(X) is some given function of X. When the mapping g from X to Y is one-to-one, the determination of p(y) is relatively straightforward. However, when the mapping is not one-to-one, as is the case, for example, when Y = X?, we must be very careful in our derivation of p(y). Example 2-1-1 Consider the random variable Y defined as Y=aX+b (2-1-39) where a and b are constants, We assume that a > 0. If a <0, the approach is similar (see Problem 2-3). We note that this mapping, illustrated in Fig. 2-1-4(a) is linear and monotonic. Let Fy(x) and F;( y) denote the cdfs for X and Y, respectively.t Then Fy) = PLY Sy) = Plax +60 \ (a) Priv) FIGURE 2-1-4 A linear transformation of a random variable X and an example of the corresponding pdfs of and Y. By differentiating (2-1-40) with respect to y, we obtain the relationship between the respective pdfs. It is 1 -b Py(y) =i ex” a ) (2-1-41) Thus (2-1-40) and (2-1-41) specify the cdf and pdf of the random variable Y in terms of the cdf and pdf of the random variable X for the linear transformation in (2-1-39). To illustrate this mapping for a specific pdf Px(x), consider the one shown in Fig. 2-1-4(b). The pdf py(y) that results from the mapping in (2-1-39) is shown in Fig. 2-1-4(c). Example 2-1-2 Consider the random variable Y defined as Y=aX?+b, a>0 (2-1-42) As in Example 2-1-1, the mapping between X and Y is one-to-one. Hence Fy) = P(Y 0 (2-1-45) In contrast to Examples 2-{-1 and 2-1-2, the mapping between X and Y, illustrated in Fig. 2-1-5, is not one-to-one. To determine the cdf of Y, we observe that Fry) = P(Y Sy) = P(aX? +b by yp x2 = Dd by yj... =D Pn Jana pax 2 Yi X2 2 25 Yi x, 2 uy} idet Al (2-1-60) CHAPTER 2 PROBABILITY AND STOCHASTIC PROCESSES 33 2-1-3 Statistical Averages of Random Variables Averages play an important role in the characterization of the outcomes of experiments and the random variables defined on the sample space of the experiments. Of particular interest are the first and second moments of a single random variable and the joint moments, such as the correlation and covari- ance, between any pair of random variables in a multidimensional set of random variables. Also of great importance are the characteristic function for a single random variable and the joint characteristic function for a multidimen- sional set of random variables. This section is devoted to the definition of these important statistical averages. 
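Before turning to those averages, note that relation (2-1-41) for the linear transformation of Example 2-1-1 can be verified by simulation. In the sketch below, X is taken to be a zero-mean, unit-variance gaussian purely for convenience (the text keeps p_X(x) general), and a histogram of Y = aX + b is compared with (1/a) p_X((y - b)/a).

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear transformation Y = aX + b of Example 2-1-1, with X standard gaussian
# here purely for convenience (the text leaves p_X(x) general)
a, b = 2.0, 1.0
x = rng.standard_normal(200_000)
y = a * x + b

def p_x(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Relation (2-1-41): p_Y(y) = (1/a) p_X((y - b)/a), checked against a histogram
edges = np.linspace(-6, 8, 71)
hist, _ = np.histogram(y, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
predicted = p_x((centers - b) / a) / a

print(np.max(np.abs(hist - predicted)))   # small, limited by Monte Carlo error
```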
First we consider a single random variable X characterized by its pdf p(x). The mean or expected value of X is defined as E(x) =m, =| xp(e)de (2.1-61) where E( ) denotes expectation (statistical averaging). This is the first moment of the random variable X. In general, the nth moment is defined as E(X")= [ X"p(x) dx (2-1-62) Now, suppose that we define a random variable Y = g(X), where g(X) is some arbitrary function of the random variable X. The expected value of Y is E(Y)= Ele(X)]= | gta)pte) dr (21.83) In particular, if Y = (X — m,)" where m, is the mean value of X, then E(Y) = BUX ~m.)"I= [te mY p(x) dx (241-64) This expected value is called the nth central moment of the random variable X, because it is a moment taken relative to the mean. When n =2, the central moment is called the variance of the random variable and denoted as on. That is, oi= | = m,iptadr 21-45) This parameter provides a measure of the dispersion of the random variable X. By expanding the term (x — m,)’ in the integral of (2-1-65) and noting that the expected value of a constant is equal to the constant, we obtain the expression that relates the variance to the first and second moments, namely, of = E(X’) - [E(X)P = E(X?) -— m? (2-1-66) 340 Diarra, COMMUNICATIONS In the case of two random variables, X, and X2, with joint pdf p(x,, x2), we define the joint moment as © sodxg= [of tspty nan de (21-67) and the joint central moment as EO, mK ~ ms)" = [fers mtea—maypte, 2) desde (21-68) where m, = E(X;). Of particular importance to us are the joint moment and joint central moment corresponding to k=n=1. These joint moments are called the cofrelation and the covariance of the random variables X, and X2, respectively. In considering multidimensional random variables, we can define joint moments of any order. However, the moments that are most useful in practical applications are the correlations and covariances between pairs of random variables. To elaborate, suppose that X,, i= 1,2,...,m, are random variables with joint pdf p(x,,x2,..., Xn). Let p(x, x,;) be the joint pdf of the random variables X, and X;. Then the correlation between X; and X, is given by the joint moment 6G) = [fF xapta x) dd, 2-169) and the covariance of X; and X, is Hy = E[(X; — m,)(X; ~ m,)} = [fee mice mote, 3) aay = i I XX; (%;, Xj) dx, dx; — mum; = E(X,X)) - mim, (2-1-70) The n Xn matrix with elements u, is called the covariance matrix of the random variables X,, i=1,2,..., 7. We shall encounter the covariance matrix in our discussion of jointly gaussian random variables in Section 2-1-4. Two random variables are said to be uncorrelated if E(X,X)) = E(X,)E(X;) = mm. In that case, the covariance 4, = 0. We note that when X; and X; are statistically independent, they are aiso uncorrelated. However, if X; and X; are uncorrelated, they are not necessarily statistically independent. CHAPTER 2: PROBABILITY AND STOCHASTIC PROCESSES 35 Two random variables are said to be orthogonal if E(X;X;}=0. We note that this condition holds when X; and X; are uncorrelated and either one or both of the random variables have zero mean. Characteristic Functions The characteristic function of a random variable X is defined as the statistical average E(e)=u(n)= [pear (1-71) where the variable v is real and j = V—1. We note that ¢(jv) may be described as the Fourier transformt of the pdf p(x). Hence the inverse Fourier trans- form is poy=f wUvye"** du (2-1-72) One useful property of the characteristic function is its relation to the moments of the random variable. 
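The moment relations just defined are easily checked with sample averages. The sketch below builds two correlated variables from independent gaussians (the mixing coefficients are arbitrary illustrative choices) and verifies (2-1-66) and (2-1-70) numerically.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two correlated random variables built from independent gaussians; the mixing
# coefficients are arbitrary choices made only for illustration.
n = 500_000
u = rng.standard_normal(n)
v = rng.standard_normal(n)
x1 = 1.0 + u
x2 = 2.0 + 0.6 * u + 0.8 * v

# Relation (2-1-66): variance = E(X^2) - [E(X)]^2
var_direct = np.mean((x1 - x1.mean())**2)
var_moments = np.mean(x1**2) - x1.mean()**2
print(var_direct, var_moments)            # agree up to Monte Carlo error

# Covariance mu_12 = E(X1 X2) - m1 m2, relation (2-1-70)
mu12 = np.mean(x1 * x2) - x1.mean() * x2.mean()
print(mu12, np.cov(x1, x2)[0, 1])         # both close to 0.6
```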
We note that the first derivative of (2-1-71) with respect to v yields dw) * He) xe p(x) de dy oe By evaluating the derivative at v = 0, we obtain the first moment (mean) Avr) dv E(X) =m, = (2-1-73) v=0 The differentiation process can be repeated, so that the nth derivative of p(jv) evaluated at v =0 yields the nth moment Bon = (yp HD de" (2-1-74) ve Thus the moments of a random variable can be determined from the characteristic function. On the other hand, suppose that the characteristic function can be expanded in a Taylor series about the point v = 0. That is, = faut) . oGo)= & (| (244-75) Using the relation in (2-1-74) to eliminate the derivative in (2-1-75), we obtain t Usually the Fourier transform of a function g(u) is defined as G(v) = [7. g(u)e“”™ du, which Giffers from (2-1-71) by the negative sign in the exponential. This is a trivial difference, however, so we call the integral in (2-1-71) a Fourier transform. 36 DIGHTAL COMMUNICATIONS an expression for the characteristic function in terms of its moments in the form = ivy" wy = ¥ Bue) EE (244.76) no ! The characteristic function provides a simple method for determining the pdf of a sum of statistically independent random variables. To illustrate this point, let ¥,, /=1,2,...,n, be a set of n statistically independent random variables and let y=Sx, (2-1-77) A The problem is to determine the pdf of Y. We shall determine the pdf of Y by first finding its characteristic function and then computing the inverse Fourier transform. Thus ’ bv) = Ee”) = t[on w S x)] =e[ Mee] mt -{ . a (I 2 pry Xa tn) day dey dx, (21-78) x ae Since the random variables are statistically independent, p(x), x2,..-, 9) = P(x;)p(%2)- ++ p(x,). and, hence, the nth-order integral in (2-1-78) reduces to a product of » single integrals, each corresponding to the characteristic function of one of the X,. Hence, by Gu) = T dy, (v) (2-1-79) If, in addition to their statistical independence, the , are identically distributed then all the y,(jv) are identical. Consequently, dy (iv) = [bx (ie)]" (2-1-80) Finally, the pdf of Y is determined from the inverse Fourier transform of yCjv), given by (2-1-72), : Since the characteristic function of the sum of n statistically independent random variables is equal to the product of the characteristic functions of the individual random variables X,, i =1,2,...,n, it follows that, in the transform domain, the pdf of Y is the n-fold convolution of the pdfs of the X,. Usually the n-fold convolution is more difficult to perform than the characteristic function method described above in determining the pdf of Y. When working with n-dimensional random variables, it is appropriate to define and n-dimensional Fourier transform of the joint pdf. In particular, if CHAPTER 2; PROBABILITY AND STOCHASTIC PROCESSES 37 X, i=1,2,...,m, are random variables with pdf p(x, x2,...,%,), the n-dimensional characteristic function is defined as WM, jPa- ++ sn) = Elexp (73 vx.) aml =| . f exp (j 3) vas)pteiste-rtn)dey dg +-de, 2-1-81) a i i= Of special interest is the two-dimensional characteristic function vgn) [fetta rd, dts 21-82) We observe that the partial derivatives of (jv, jv2) with respect to v, and v; can be used to generate the joint moments. For example, it is easy to show that FP piv, iva) EX) = ay 1 v2 (2-1-83) y= v2=0 Higher-order moments are generated in a straightforward manner. 2-1-4 Some Useful Probability Distributions In subsequent chapters, we shall encounter several different types of random variables. 
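The product rule (2-1-80) for the characteristic function of a sum of i.i.d. random variables can be checked empirically. In the sketch below, X is taken uniform on (0, 1) only because its characteristic function has a simple closed form; the choice is illustrative, not from the text.

```python
import numpy as np

rng = np.random.default_rng(2)

# Empirical check of (2-1-80): for a sum of n i.i.d. variables, the
# characteristic function of the sum is the n-th power of psi_X(jv).
n_terms, n_samples = 4, 400_000
x = rng.random((n_samples, n_terms))              # uniform on (0, 1)
y = x.sum(axis=1)

v = 1.7                                           # an arbitrary test frequency
psi_y_empirical = np.mean(np.exp(1j * v * y))

# Closed-form characteristic function of U(0,1): (e^{jv} - 1) / (jv)
psi_x = (np.exp(1j * v) - 1) / (1j * v)
psi_y_predicted = psi_x ** n_terms

print(abs(psi_y_empirical - psi_y_predicted))     # small Monte Carlo error
```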
In this section we list these frequently encountered random variables, their pdfs, their cdfs, and their moments. We begin with the binomial distribution, which is the distribution of a discrete random variable, and then we present the distributions of several continuous random variables. Binomial Distribution Let X be a discrete random variable that has two possible values, say X =1 or X=0, with probabilities p and 1 —p, respectively. The pdf of X is shown in Fig. 2-1-6. Now, suppose that y=>%, where the X,, i=1,2,...,, are statistically independent and identically FIGURE 2-1-6 The probability distribution function of X. 0 1 * 38) DictTAL ComMuNICATIONS distributed random variables with the pdf shown in Fig. 2-1-6. What is the probability distribution function of Y? To answer this question, we observe that the range of Y is the set of integers from 0 to n. The probability that Y = 0 is simply the probability that all the X; = 0. Since the X, are statistically independent, PY =0)=(1 py" The probability that Y = 1 is simply the probability that one X, = 1 and the rest of the X, = 0. Since this event can occur in different ways, P(Y =1)=np(t~ py"! To generalize, the probability that ¥ = is the probability that k of the X, are equal to one and n — k are equal to zero, Since there are (") = (2-1-84 k) kia —kt 1-84) different combinations that result in the event {Y = k}, it follows that por =k)=(T)ps(- py (241-85) where 7) is the binomial coefficient. Consequently, the pdf of Y may be expressed as Py)= & PCY =k) Bly — k) a(n) ne => (Jena - py'* 809 ~ kp (24-86) The cdf of Y is F(y)= P(¥ (jpta-pr* (21-87) where [y] denotes the largest integer m such that m m,, the complementary error function is Proportional to the area under the tail of the gaussian pdf. For large values of x, the complementary error function erfc(x) may be approximated by the asymptotic series 1 (1:3. 1-3-5 (-set ae t) (21-96) where the approximation error is less than the last term used. The function that is frequently used for the area under the tail of the gaussian pdf is denoted by Q(x) and defined as 1 ee =— e 2 -1-97 Ox) zl ee? dy x20 (2-1-97) erfe (x) = XN By comparing (2-1-95) with (2-1-97), we find O(x) = Serfe (3) (2-1-98) CHAPIER 2 PROBABILITY AND STOCHASTIC Processes 4 The characteristic function of a gaussian random variable with mean m, and variance «is * 1 der? anim fem [eee tome = glen U2 (2-1-99) The central moments of a gavssian random variable are 1-3---(k-1)o* (evenk) = ‘J=u,= -1-100) E(X = m= a = toad) i100) and the ordinary moments may be expressed in terms of the central moments as k E(X)=> (Ponta. (2-1-101) 10 The sum of x statistically independent gaussian random variables is also a gaussian random variable. To demonstrate this point, let fn Y=> x, (2-1-102) where the X;, i=1,2,...,m, are statistically independent gaussian random variables with means m, and variances 7. Using the result in (2-1-79), we find that the characteristic function of Y is vr(i0) = FT wat) = I eivm vain aT = elm vei (2-1-103) where m= > m, “ (2-1-104) a= Do? it Therefore, Y is gaussian-distributed with mean m, and variance o. Chi-Square Distribution A chi-square-distributed random variable is re- lated to a gaussian-distributed random variable in the sense that the former can be viewed as a transformation of the latter. To be specific, let Y = X?, where XY is a gaussian random variable. Then Y has a chi-square distribution. We distinguish between two types of chi-square distributions. 
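Both the binomial pmf (2-1-85) and the gaussian tail function Q(x) of (2-1-97) and (2-1-98) are one-liners in Python, as sketched below; the comparison against a direct numerical integration of the gaussian tail is included only as a sanity check.

```python
import math

def binomial_pmf(k, n, p):
    """P(Y = k) for Y a sum of n independent Bernoulli(p) variables, (2-1-85)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def Q(x):
    """Gaussian tail probability (2-1-97), computed via (2-1-98)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# The pmf sums to one, so the binomial cdf evaluated at y = n is 1
assert abs(sum(binomial_pmf(k, 10, 0.3) for k in range(11)) - 1.0) < 1e-12

# Compare Q(x) with a direct numerical integration of the gaussian tail
x0, upper, n = 2.0, 10.0, 200_000
dx = (upper - x0) / n
tail = sum(math.exp(-(x0 + (i + 0.5) * dx)**2 / 2) for i in range(n))
tail *= dx / math.sqrt(2 * math.pi)
print(Q(x0), tail)                        # both close to 0.02275
```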
The first is called a 42 DIGIAL COMMUNICATIONS central chi-square distribution and is obtained when X has zero mean. The second is called a non-central chi-square distribution, and is obtained when X has a nonzero mean. First we consider the central chi-square distribution. Let X be gaussian- distributed with zero mean and variance o”. Since Y = X?, the result given in (2-1-47) applies directly with a = i and b = 0. Thus we obtain the pdf of Y in the form 1 Pry) Vino e'P" yO (2-1-105) The cdf of Y is Fy) I Py(u) du 0 Lod : “veh ae (2-1-106) f which cannot be expressed in closed form. The characteristic function, however, can be determined in closed form. It is 1 wy) = ~ 1-107 WY) = Gaia (2-1-107) Now, suppose that the random variable Y is defined as Y=> x? (2-1-108) ma where the X,, i=1,2,...,a, are statistically independent and identically distributed gaussian random variables with zero mean and variance o”. As a consequence of the statistical independence of the X,, the characteristic function of Y is oe 1 Wi) = Fah (2-1-109) The inverse transform of this characteristic function yields the pdf = 1 wi2~1,-¥i2o" PM) Soap ys en, yO (241-110) where [(p) is the gamma function. defined as T(p) -| Pole 'd, p>o 0 (2-1-111) I(p)=(p- Ut. pan ititeger, p > 0 TQ)=Vz, TG)=}va This pdf, which is a generalization of (2-1-105), is called a chi-square (or FIGURE 2-1-9 CHAPTER 2: PROBABILITY AND STOCHASTIC PROCESSES 43 The pdf of a chi-square-distributed random. variable for several degrees of freedom. gamma) pdf with n degrees of freedom. It is illustrated in Fig. 2-1-9. The case n= 2 yields the exponential distribution. The first two moments of Y are E(Y) = no? E(¥’) = Ino* + n?a* (2-1-112) a3 = 2no* The cdf of Y is = . 1 AID= iy ~ulda? Fy) [ rrr in)" e du, y20 (2-1-113) This integral can be easily manipulated into the form of the incomplete gamma function, which is tabulated by Pearson (1965). When a is even, the integral in (2-1-113) can be expressed in closed form. Specifically, let m= 4n, where m is an integer. Then, by repeated integration by parts, we obtain mt ‘ Foy) = 1-27 S 4(%5) , yO (2-1-114) Kao kK! Let us now consider a noncentral chi-square distribution, which results from squaring a gaussian random variable having a nonzero mean. If X is gaussian with mean m, and variance a’, the random variable Y = X? has the pdf Pry) = % en mti20 cosh (St, y20 (21-115) o 1 Via o which is obtained by applying the result in (2-1-47) to the gaussian pdf given by (2-1-92). The characteristic function corresponding to this pdf is vy Gu) = ac aes ermbutit ~ 72v0%) (2-1-116) aia 44 DIGtTAL COMMUNICATIONS To generalize these results, let Y be the sum of squares of gaussian random variables as defined by (2-1-108). The X,, i=1,2,...,n, are assumed to be Statistically independent with means m,, i= 1,2,...,n, and identical variances equal to o*. Then the characteristic function of Y, obtained from (2-1-116) by applying the relation in (2-t-79), is jv 3 mi dy Gv) = (2-1-117) (= pave? *P \T ave? This characteristic function can be inverse-Fourier-transformed to yield the pdf " ~2y4 1 ta yytet Pry) =>3(5 erry, (vys) y=0 (2-111) 207 \s' o where, by definition, s7= Dm? (2-1-119) fl and [,(x) is the ath-order modified Bessel function of the first kind, which may be represented by the infinite series = (4/2)214 1,(«) = > > “1 a(x) Zite+ho0" x20 (2-1-120) The pdf given by (2-1-118) is called the noncentral chi-square pdf with n degrees of freedom. The parameter s? is called the noncentrality parameter of the distribution. 
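The closed-form cdf (2-1-114) for an even number of degrees of freedom is convenient in practice and is easy to validate against a direct simulation of (2-1-108), as in the sketch below (the parameter values are arbitrary illustrative choices).

```python
import math
import numpy as np

rng = np.random.default_rng(3)

def chi_square_cdf_even(y, n, sigma2):
    """Closed-form cdf (2-1-114) of a central chi-square variable with an
    even number n = 2m of degrees of freedom and component variance sigma2."""
    m = n // 2
    u = y / (2 * sigma2)
    return 1.0 - math.exp(-u) * sum(u**k / math.factorial(k) for k in range(m))

# Monte Carlo check: Y is the sum of squares of n zero-mean gaussians (2-1-108)
n, sigma2, y0 = 4, 1.5, 8.0
x = rng.normal(0.0, math.sqrt(sigma2), size=(300_000, n))
y = (x**2).sum(axis=1)

print(chi_square_cdf_even(y0, n, sigma2), np.mean(y <= y0))
```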
The cdf of the noncentral chi square with n degrees of freedom is ~L pyyerre . #)= ['55(4) ee 02° Iyaa( Va) de (2-1-121) There is no closed-form expression. for this integral. However, when m = in is an integer, the cdf can be expressed in terms of the generalized Marcum’s Q function, which is defined as Q,n(a, b) a dx tape mai b\k =O1(a, b) + err? > (?) I,{ab) (2-1-122) kar \@ where renew (a) O(a, b) =e 72 Y (3) i,(ab), b>a>0 (2-1-123) Ko \b If we change the variable of integration in (2-1-121) from u to x, where x?=ufe? CHAPTER 2: PROBABILITY AND STOCHASTIC PROCESSES 45 and let a? = s*/o?, then it is easily shown that Vi Fy)=1- on(£,~2) (2-1-124) o Finally, we state that the first two moments of a noncentral chi-square- distributed random variable are E(¥)= no? +s? E(¥?) = not + 4a?s? + (no? + 527 (2-4-125) o3 = 2no* + 4a? Rayleigh Distribution The Rayleigh distribution is frequently used to model the statistics of signals transmitted through radio channels such as cellular radio. This distribution is closely related to the central chi-square distribution. To illustrate this point, let Y =.X3+X3 where X, and X; are zero-mean statistically independent gaussian random variables, each having a variance a”. From the discussion above, it follows that Y is chi-square- distributed with two degrees of freedom. Hence, the pdf of Y is 1 a pry)= sae, y=0 (2-1-126) Now, suppose we define a new random variable R=VXi+ Xi= VY (2-1-127) Making a simple change of variable in the pdf of (2-1-126), we obtain the pdf of R in the form r>=0 (2-1-128) This is the pdf of a Rayleigh-distributed random variable. The corresponding cdf is Fair)= [ ee au =1-e7"?" pg (2-1-129) The moments of R are E(R*) = (207)°P(1 + 4k) (2-1-130) 46 ait TAL COMMUNICATIONS and the variance is o2=(2-4a)o? (2-1-131) The characteristic function of the Rayleigh-distributed, random variable is Wel jv) = I ere ar (2-4-132) 9 oT This integral may be expressed as Wr(jv) = f °°" cos ur dr + i [3 e072 sin ur dr * bo lo =A, $: — ve?) + jVinvore ""? (2-1-133) where ,F,{1, 3; —a) is the confluent hypergeometric function, which is defined as voy. we Dla + KE (B)x* Fla Bix)= Tah (B+ OK ko. B#0,-1,-2,... (2-1-134) Beaulieu (1990) has shown that , F,(1, $; ~@) may be expressed as A(1, 3; ~a) = -e7* (2-1-135) k0 (2k — 1)kI As a generalization of the above expression, consider the random variable R=,/3 x? (21-136) where the X;, i=1,2,...,, are statistically independent, identically distrib- uted zero mean gaussian random variables. The random variable R has a generalized Rayleigh distribution. Clearly, Y=? is chi-square-distributed with m degrees of freedom. Its pdf is given by (2-1-110). A simple change in variable in (2-1-110) yields the pdf of R in the form rt et PR(r)= 2 Dagny” rns 120 (2-1-137) As a consequence of the functional relationship between the central chi-square and the Rayleigh distributions, the corresponding cdfs are’ similar. Thus, for any n, the cdf of R can be put in the form of the incomplete gamma function. In the special case when n is even, i.e; m = 2m, the cdf of R can be expressed in the closed form m-i 1 r ik Fy(r) = 1-72" 5) ZS). r=0 (2-1-138) CHAPTER 2; PROBABILITY AND STOCHASTIC PRocEsses 47 Finally, we state that the kth moment of R'is TG@ +k) EUR) = 007? > k20 (2-1-1339) which holds for any integer 7. Rice Distribution Just as the Rayteigh distribution is related to the central chi-square distribution, the Rice distribution is related to the noncentral chi-square distribution. 
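The Rayleigh cdf (2-1-129) and moments (2-1-130) can be confirmed by simulating R = sqrt(X1^2 + X2^2) directly, as in the sketch below; the value of sigma^2 is an arbitrary illustrative choice.

```python
import math
import numpy as np

rng = np.random.default_rng(4)

# R = sqrt(X1^2 + X2^2) with X1, X2 zero-mean gaussian, variance sigma2 each
sigma2 = 2.0
n = 400_000
x = rng.normal(0.0, math.sqrt(sigma2), size=(n, 2))
r = np.sqrt((x**2).sum(axis=1))

# cdf (2-1-129): F_R(r) = 1 - exp(-r^2 / (2 sigma^2))
r0 = 2.5
print(1.0 - math.exp(-r0**2 / (2 * sigma2)), np.mean(r <= r0))

# k-th moment (2-1-130): E(R^k) = (2 sigma^2)^{k/2} Gamma(1 + k/2)
for k in (1, 2):
    theoretical = (2 * sigma2)**(k / 2) * math.gamma(1 + k / 2)
    print(k, theoretical, np.mean(r**k))
```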
To illustrate this relation, let ¥Y = X}+ X3, where X, and X2 are statistically independent gaussian random variables with means m,, i=1,2, and common variance o”. From the previous discussion, we know that Y has a noncentral chi-square distribution with noncentrality parameter s? =m} + m3. The pdf of Y, obtained from (2-1-118) for n = 2, is 1 teyyze? Ss privi=zze OP"h(VIS), y20 (21-140) Now, we define a new random variable R = VY. The pdf of R, obtained from (2-1-140) by a simple change of variable, is Putas? rs Pair) =e a mani( 5), r>0 (2-1-141) This is the pdf of a Ricean-distributed random variable. As will be shown in Chapter 5, this pdf characterizes the statistics of the envelope of a signal corrupted by additive narrowband gaussian noise. It is also used to mode! the signal statistics of signals transmitted through some radio channels. The cdf of R is easily obtained by specializing the results in (2-1-124) to the case m = 1. This yields Fe(r) = 1 - a(:, ), r>0 (2-1-142) where Q,(a, b) is defined by (2-1-123). As a generalization of the expressions given above, let R be defined as in (2-1-136) where the X,, i=1,2,...,” are statistically independent gaussian random variables with means m,, i= 1,2,...,m, and identical variances equal to 0. The random variable R? = Y has a noncentral chi-square distribution with n degrees of freedom and noncentrality parameter s? given by (2-1-119). Its pdf is given by (2-1-118). Hence the pdf of R is Patt) = arma Ian sf 720 (2-1-143) 48 ictal COMMUNICATIONS and the corresponding cdf is Fy(r) = P(R Sr) = P(VY Sr) = PY 1, the tail of the pdf decays faster than that of the Rayleigh. Figure 2-1-10 illustrates the pdfs for different values of m. Multivariate Gaussian Distribution Of the many multivariate or multi- dimensional distributions that can be defined, the multivariate gaussian distribution is the most important and the one most likely to be encountered in practice. We shalt briefly introduce this distribution and state its basic properties. Let us assume that X;, i= 1,2,...,n, are gaussian random variables with means m,, i=1,2,...,n, variances o?, i=1,2,...,n, and covariances Bip i,j=1,2,...,n. Clearly, w,= 07, i=1,2,...,”. Let M denote the n Xn SO diGttat COMMUNICATIONS covariance matrix with elements {y,,}, let X denote the n x 1 column vector of random variables, and let m, denote the 1 X 1 column vector of mean values m, i=1,2,...,a. The joint pdf of the gaussian random variables X,, i=1,2,...,x, is defined as 1 P(t. X20 6 tn) = (my (der myer? [2 ~m,)'M ‘(x —m,)} (2-1-150) where M' denotes the inverse of M and x’ denotes the transpose of x. The characteristic function corresponding to this n-dimensional joint pdf is diy) = Ee”) where v is an n-dimensional vector with elements v,, i=1,2,...,”. Evaluation of this n-dimensional Fourier transform yields the result wiv) = exp (jm,v — tv’My) (2-1-151) An important special case of (2-1-150) is the bivariate or two-dimensional gaussian pdf. The mean m, and the covariance matrix M for this case are 2 m, = ("| M= | “ ae] (2-1-152) m2 Biz O% where the joint central moment y,2 is defined as Hi = E[(X, — m)(X2 — m2)] It is convenient to define a normalized covariance a Hy oo," py ixj (21-153) where p, satisfies the condition 0<{p,;|<1. When dealing with the two- dimensional case, it is customary to drop the subscripts on y,, and p,7. Hence the covariance matrix is expressed as 2 m=| a pores] 2-1-154 po, o} ¢ ) Its inverse is 1 oa —poyo. 
Mole [ 2 1 "| 21- aol pAl-poer of 1155) CHAPTER 2, PROBABILITY AND STOCHASTIC PROCESSES S¥ and det M = o703(1 — p?). Substitution for M~' into (2-1-150) yields the desired bivariate gaussian pdf in the form P(X, x2) = 2 2 2 2 _ O2(t, — MY — 2payaAx1 — Mi)2 — M2) + 712 = M2) | xer| 2otoX(l — 6°) (2-1-156) We note that when p =0, the joint pdf p(x,, x2) in (21-156) factors into the product p(x,)p(x2), where p(x;), i=1, 2, are the marginal pdfs. Since p is a measure of the correlation between X, and X>, we have shown that when the gaussian random variables X, and X, are uncorrelated, they are also statistically independent. This is an important property of gaussian random variables, which does not hold in general for other distributions. It extends to n-dimensional gaussian random variables in a straightforward manner. That is, if pj = 0 for i *j then the random variables X;, i= 1,2,...,n are uncorrelated and, hence, statistically independent. Now, let us consider a linear transformation of n gaussian random variables X;, i =1,2,...,n, with mean vector m, and covariance matrix M. Let Y=AX (2-1-157) where A is a nonsingular matrix. As shown previously, the jacobian of this transformation is J =1/det A. Since X=A7~'Y, we may substitute for X in (2-1-150) and, thus, we obtain the joint pdf of Y in the form 1 (@ay*(det My? aera RPI HA Ty m,)'M>'(A~'y ~ m,)] p(y) = 1 ~ Gay (der Qt xP [BY = MOY ~m,)] (2-1-158) where the vector m, and the matrix Q are defined as m= Am, (2-1-159) Q=AMA Thus we have shown that a linear transformation of a set of jointly gaussian random variabies results in another set of jointly gaussian random variables. Suppose that we wish to perform a linear transformation that results in 1 statistically independent gaussian random variables. How should the matrix A be selected? From our previous discussion, we know that the gaussian random 520 DIGITAL COMMUNICATIONS variables are statistically independent if they are pairwise-uncorrelated, i.e., if the covariance matrix Q is diagonal. Therefore, we must have AMA’ =D (2-1-160) where D is a diagonal matrix. The matrix M is a covariance matrix; hence, it is positive definite. One solution is to select A to be an orthogonal matrix (A’= A“) consisting of columns that are the eigenvectors of the covariance matrix M. Then D is a diagonal matrix with diagonal elements equal to the eigenvalues of M. Example 2-1-5 Consider the bivariate gaussian pdf with covariance matrix 13 wy] = ha Let us determine the transformation A that will result in uncorrelated random variables. First, we solve for the eigenvalues of M. The characteris- tic equation is det (M — AI) = 0 (i-ay-t=0 Anat Next we determine the two eigenvectors. If a denotes an eigenvector, we have (M~ADa=0 With A, = 3 and A, =}, we obtain the eigenvectors oh ei a-vil! ‘] 1-1 Therefore, It is easily verified that A~’ = A’ and that AMA'=D where the diagonal elements of D are 3 and }. CHAPTER 2: PROBABILITY AND STOCHASTIC PROCESSES 53, 2-1-5 Upper Bounds on the Tail Probability In evaluating the performance of a digital communication system, it is often necessary to determine the area under the tail of the pdf. We refer to this area as the tail probability. In this section, we present two upper bounds on the tail probability. The first, obtained from the Chebyshev inequality, is rather loose. The second, called the Chernoff bound, is much tighter. Chebyshev Inequality Suppose that X is an arbitrary random variable with finite mean m, and finite variance 2. 
For any positive number 5, 2 PUX ~m,|>8)< 5 (2-1-161) This relation is called the Chebyshev inequality. The proof of this bound is relatively simple. We have ’ beg ; (x ~m,)’p(x) de al p(x) dx = 8° P(X — m,| > 8) aie Thus the validity of the inequality is established. It is apparent that the Chebyshev inequality is simply an upper bound on the area under the tails of the pdf p(y), where Y =X —m,, i.e., the area of p(y) in the intervals (—%, —8) and (5, %). Hence, the Chebyshev inequality may be expressed as 2 1- [F8)— F-B)< (2-1-162) or, equivalently, as 2 1- [Fy(on, + 8) — Fy(m, ~ 8)] <2 (2-1-163) There is another way to view the Chebyshev bound. Working with the zero mean random variable Y= X —m,, for convenience, suppose we define a function g(Y) as 1 (y/>6 soya {h no 0 (W¥I<68) (21164) Since g(Y) is either 0 or 1 with probabilities P(Y|<5) and P(¥|>8), Tespectively, its mean value is Elg(¥)] = PUY| > 8) (2-1-165) FIGURE 2-41-11 54> DIGITAL COMMUNICATIONS A quadratic upper bound on g(Y) used in obtaining the tail probability (Chebyshev bound). Now suppose that we upper-bound g(Y) by the quadratic (Y/8)”, i.e., y 2 ar<(5) (2-1-166) The graph of g(Y) and the upper bound are shown in Fig, 2-1-11. It follows that y? E y? 2 2 aeonee(g) EP ad Since E(g(¥)] is the tail probability, as seen from (2-1-165), we have obtained the Chebyshev bound. For many practical applications, the Chebyshev bound is extremely loose. The reason for this may be attributed to the looseness of the quadratic (Y/8) in overbounding g(¥). There are certainly many other functions that can be used to overbound g(¥). Below, we use an exponential bound to derive an upper bound on the tail probability that is extremely tight, Chernoff Bound The Chebyshev bound given above involves the area under the two tails of the pdf. In some applications we are interested only in the area under one tail, either in the interval (5, ©) or in the interval (~~, 5). In such a case we can obtain an extremely tight upper bound by overbounding the function g(Y) by an exponential having a parameter that can be optimized to yield as tight an upper bound as possible. Specifically, we consider the tail probability in the interval (4, »). The function g(¥) is overbounded as g(Y) serra) (2-1-167) where g(Y) is now defined as _ fl (v=8) an={i (Y<6) (2-1-168) CHAPTER 2: PROBABILITY AND STOCHASTIC Processes 55 FIGURE 2-1-2 An exponential upper bound on g(Y) used in obtaining the tail probability (Chernoff bound). and v>O is the parameter to be optimized. The graph of g(Y) and the exponential upper bound are shown in Fig. 2-1-12. The expected value of g{Y) is Elg(Y)] = P(Y > 8) < E(e""~) (2-1-169) This bound is valid for any v0. The tightest upper bound is obtained by selecting the value of v that minimizes E(e“”—®)). A necessary condition for a minimum is 4 E(e""-®)=0 (2-1-170) But the order of differentiation and expectation can be interchanged, so that d d = vuy-8)) — pl © ,wr-3) dv Ee ) eS, é ) = E[(¥ — 8)e" 9] =e" “LE(Ye"") — 5E(e"")}=0 Therefore the value of v that gives the tightest upper bound is the solution to the equation E(Ye'") ~ 5E(e*Y) =0 (2-1-171) Let 9 be the solution of (2-1-171). Then, from (2-1-169), the upper bound on the one-sided tail probability is P(Y 28) > 1, (2-1-179) reduces to p(y >ay 8) where, by definition, 5,,=m, +6. Since 6>0, it follows that 5, > m,. 
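A quick numerical comparison shows how loose the Chebyshev bound (2-1-161) typically is. The sketch below uses a gaussian random variable only as a convenient test case and compares the empirical two-sided tail probability, the exact value 2Q(delta/sigma), and the bound sigma^2/delta^2.

```python
import math
import numpy as np

rng = np.random.default_rng(7)

# Chebyshev bound (2-1-161) versus the true two-sided tail probability of a
# gaussian random variable (chosen only as a convenient test case)
m_x, sigma = 1.0, 2.0
x = rng.normal(m_x, sigma, 500_000)

for delta in (2.0, 4.0, 6.0):
    tail = np.mean(np.abs(x - m_x) > delta)            # P(|X - m_x| > delta)
    bound = sigma**2 / delta**2
    exact = math.erfc(delta / (sigma * math.sqrt(2)))   # 2 Q(delta / sigma)
    print(delta, round(tail, 5), round(exact, 5), round(bound, 5))
```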
Let 8(X) be defined as _fl (X26,) a(X)= {5 (X<8,) (2-1-181) and upper-bounded as B(X) se" -) (2-1-182) From this point, the derivation parallels the steps contained in (2-1-169)- (2-1-172). The final result is P(X > 6,,) m, and 9 is the solution to the equation E(Xe*) - 5,,E(e"*) =0 (2-1-184) In a similar manner, we can obtain the Chernoff bound for the lower tail probability. For 6 <0, we have P(X — m, <8) = P(X 5)=0 (2-1-189) ane amt Therefore, the probability that the estimate of the mean differs from the true mean m, by more than & (6 > 0) approaches zero as n approaches infinity. This statement is a form of the law of large numbers. Since the upper bound converges to zero relatively slowly, i.e., inversely with n, the expression in (2-1-188) is called the weak law of large numbers. The Chernoff bound applied to the random variable Y yields an exponential dependence of n, and thus provides a tighter upper bound on the one-sided tail probability. Following the procedure developed in Section 2-1-5, we can determine that the tail probability for y is P(Y -m,>5)=P 1S ym, 265 ly nm = ap X, =n8,,) = Elexp [>( where 6,,=m,+6 and 6>0. But the X,, i=1,2,...,n, are Statistically independent and identically distributed. Hence, Elexp [( Xi- rn) } = ee exp (v 3 x) fst 3x, -n6,,) |} (2-1-190) i=t net] Ee) ant = [eE(e*}]" (2-1-191) where X denotes any one of the X,. The parameter v that yields the tightest upper bound is obtained by differentiating (2-1-191) and setting the derivative equal to zero. This yields the equation E(Xe*) ~ 5,E(e"*) =0 (2-1-192) © DIGITAL COMMUNICATIONS Let the solution of (2-1-192) be denoted by %. Then, the bound on the upper tail probability is (= x2 bn Ke E(e*))", 5, > my (2-1-1493) it Ina similar manner, we find that the lower tail probability is upper-bounded as P(Y <6,,)<[e7 "E(e™*)]", 8, < my (2-1-194) where ? is the solution to (2-1-192). Example 2-1-7 Let X;, i=1,2,...,n, bea set of statistically independent random variables defined as x ={ 1 with probability p <4 ‘(-1 — with probability 1 - p We wish to determine a tight upper bound on the probability that the sum of the X; is greater than zero. Since p <4, we note that the sum will have a negative value for the mean; hence we seek the upper tail probability. With 6,, = 0 in (2-1-193), we have P( px 0) <(E(e)" (2-1-195) it where ¢ is the solution to the equation E(Xe"*) =0 (2-1-196) Now E(Xe’*) = —(1- p)e-’ + pe’ =0 Hence i_ o=In ( <) (2-1-197) Furthermore, E(e*) = pe? + (1- pe? Therefore the bound in (2-1-195) becomes P(3 x0) <[pe'+(-pe Y = ([4p(1- py"? (2-1-198) We observe that the upper bound decays exponentially with n, as expected. CHAPTER 2: PROBABILITY AND STOCHASTIC PROCESSES 6 In contrast, if the Chebyshev bound were evaluated, the tai! probability would decrease inversely with n. Central Limit Theorem We conclude this section with an extremely useful theorem concerning the cdf of a sum of random variables in the limit as the number of terms in the sum approaches infinity. There are several versions of this theorem. We shall prove the theorem for the case in which the random variables X,, (= 1,2,...,”, being summed are statistically independent and identically distributed, each having a finite mean m, and a finite variance o%. 
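The bound (2-1-198) of Example 2-1-7 is easily compared with a Monte Carlo estimate of the tail probability, as in the sketch below; the values of p and n are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(8)

# Example 2-1-7: X_i = +1 with probability p, -1 with probability 1 - p, p < 1/2.
# Chernoff bound (2-1-198): P(sum X_i >= 0) <= [4 p (1 - p)]^{n/2}
p, n, trials = 0.3, 40, 200_000

x = rng.choice([1, -1], size=(trials, n), p=[p, 1 - p])
tail = np.mean(x.sum(axis=1) >= 0)

chernoff = (4 * p * (1 - p))**(n / 2)
print(tail, chernoff)          # the empirical tail sits below the bound
```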
For convenience, we define the normalized random variable X;—m, U,=———, i =1,2,...,0 oO, Thus U; has a zero mean and unit variance, Now, let 12 Y=—=>dvy, 2-1- Be (2-1-199) Since each term in the sum has a zero mean and unit variance, it follows that the normalized (by 1/Vn) random variable Y has zero mean and unit variance. We wish to determine the cdf of Y in the limit as n > ~. The characteristic function of Y is we U, i=1 Uy(ju) = Ele*") = E} exp Va =Tle,{ ~Tu(Z) [of eam where U denotes any of the U,, which are identically distributed. Now, let us expand the characteristic function of U in a Taylor series. The expansion yields u VV Be 2 cry, ey voli) T+) BW) > re) + AI BU) (2-1-201) Since E(U) = 0 and E(U?) = 1, (2-1-201) simplifies to - 2 wo) zt 2 Rn) (2-1-202) where R(v,n)/n denotes the remainder. We note that R(v,n) approaches 62 DIGITAL COMMUNICATIONS zero as n -» %, Substitution of (2-1-202) into (2-1-200) yields the characteristic function of Y in the form dr) = [1-5 SOY (2-1-203) Taking the natural logarithm of (2-1-203), we obtain In dev) =n in| -£ Ren (2-1-204) For smali values of x, In (1 +.) can be expanded in the power series In(Qlit¢x)ex—dethr—.., This expansion applied to (2-1-204) yields 2 2 2 in vv(ju) = n[ -2 Att) _1(-= 4 Rin) +} (2-1-205) Finally, when we take the limit as n—><, (2-1-205) reduces to lim,_.. In #y(jv) = ~4v?, or, equivalently, lim py(jv) =e"? (2-1-206) But, this is just the characteristic function of a gaussian random variable with zero mean and unit variance. Thus we have the important result that the sum of statistically independent and identically distributed random variables with finite mean and variance approaches a gaussian cdf as n— =. This result is known as the central limit theorem. Although we assumed that the random variables in the sum are identically distributed, the assumption can be relaxed provided that additional restrictions are imposed on the properties of the random variables. There is one variation of the theorem, for example, in which the assumption of identically distributed random variables is abandoned in favor of a condition on the third absolute moment of the random variables in the sum. For a discussion of this and other variations of the central limit theorem, the reader is referred to the book by Cramer (1946). 2.2. STOCHASTIC PROCESSES Many of the random phenomena that occur in nature are functions of time. For example, the meteorological phenomena such as the random fluctuations in air temperature and air pressure are functions of time. The thermal noise voltages generated in the resistors of an electronic device such as a radio receiver are also’a function of time. Similarly, the signal at the output of a source that generates information is characterized as a random signal that CHAPTER 2; PROBABILITY AND STOCHASTIC PROCESSES 63, varies with time. An audio signal that is transmitted over a telephone channel is an example of such a signal. All these are examples of stochastic (random) processes. In our study of digital communications, we encounter stochastic processes in the characterization and modeling of signals generated by information sources, in the characterization of communication channels used to transmit the information, in the characterization of noise generated in a receiver, and in the design of the optimum receiver for processing the received random signal. 
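The convergence asserted by the central limit theorem can be observed directly. The sketch below forms the normalized sum (2-1-199) from i.i.d. uniform variables (an illustrative choice of distribution) and compares its empirical cdf with the zero-mean, unit-variance gaussian cdf.

```python
import math
import numpy as np

rng = np.random.default_rng(9)

# Normalized sum (2-1-199) of n i.i.d. uniform variables; by the central limit
# theorem its cdf approaches the zero-mean, unit-variance gaussian cdf.
n, trials = 30, 300_000
x = rng.random((trials, n))                      # uniform on (0, 1)
u = (x - 0.5) / math.sqrt(1 / 12)                # zero mean, unit variance
y = u.sum(axis=1) / math.sqrt(n)

def gaussian_cdf(v):
    return 0.5 * math.erfc(-v / math.sqrt(2))

for v in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(v, round(np.mean(y <= v), 4), round(gaussian_cdf(v), 4))
```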
At any given time instant, the value of a stochastic process, whether it is the value of the noise voltage generated by a resistor or the amplitude of the signal generated by an audio source, is a random variable. Thus, we may view a stochastic process as a random variable indexed by the parameter ¢. We shail denote such a process by X(t). In general, the parameter ¢ is continuous, whereas X may be either continuous or discrete, depending on the characteris- tics of the source that generates the stochastic process. The noise voltage generated by a single resistor or a single information source represents a single realization of the stochastic process. Hence, it is called a sample function of the stochastic process. The set of all possible sample functions, e.g., the set of all noise voltage waveforms generated by resistors, constitute an ensemble of sample functions or, equivalently, the stochastic process X(i). In general, the number of sample functions in the ensemble is assumed to be extremely large; often it is infinite. Having defined a stochastic process X(t) as an ensemble of sample functions, we may consider the values of the process at any set of time.instants >t, >t;>...>1, where a is any positive integer. In general, the random variables X, = X(t), i =1,2,...,n, are characterized statistically by their joint Pdf p(x, x,,,---.%,,)- Furthermore, all the probabilistic relations defined in Section 2-1 for multidimensional random variables carry over to the random variables X,, i=1,2,...,m. Stationary Stochastic Processes As indicated above, the random variables X,, '=1,2,...,2, obtained from the stochastic process X(r1) for any set of time instants 1, >) >t,>...>f, and any n are characterized Statistically by the joint pdf p(x,,, x,, ,,). Let us consider another set of n random variables X,,,= X(t,+ 9), i=1,2,...,n, where ¢ is an arbitrary time shift. These random variables are characterized by the joint pdt P(Xi40Xq+n+++>%:,+1). The joint pdfs of the random variables X, and X,4,, f=1,2,...,n, may or may not be identical. When they are identical, ie.. when P(Xiy Nigro Ly) = Pliny en kya os Xess) (2-2-1) for all ¢ and all , the stochastic process is said to be stationary in the strict sense. That is, the statistics ef a stationary stochastic process are invariant 10 any translation of the time axis. On the other hand, when the joint pdfs are different, the stochastic process is nonsiationary. 64 DIGITAL COMMUNICATIONS 2-2-1 Statistical Averages Just as we have defined statistical averages for random variables, we may similarly define statistical averages for a stochastic process. Such averages are also called ensemble averages. Let X(t) denote a random process and let X,, = X(t,). The nth moment of the random variable X;, is defined as eap= [xtptidar, 2-22) In general, the value of the nth moment will depend on the time instant 1; if the pdf of X,, depends on 4. When the process is stationary, however, p(x,.,) = p(x,) for all, Hence, the pdf is independent of time, and, as a consequence, the nth moment is independent of time. Next we consider the two random variables X,=X(s,), i= 1,2. The correlation between X,, and X,, is measured by the joint moment EX,X,)= | i Xi XP Krys Xr.) Ay, AX, (2-2-3) Since this joint moment depends on the time instants 1, and 1;, it is denoted by (t), 2). The function $(t,,%) is called the autocorrelation function of the stochastic process. When the process X(t) is stationary, the joint pdf of the pair (X,,. X,.) 
is identical to the joint pdf of the pair (X_{t1+t}, X_{t2+t}) for any arbitrary t. This implies that the autocorrelation function of X(t) does not depend on the specific time instants t1 and t2 but, instead, depends on the time difference t1 - t2. Thus, for a stationary stochastic process, the joint moment in (2-2-3) is

E(X_t1 X_t2) = φ(t1, t2) = φ(t1 - t2) = φ(τ)   (2-2-4)

where τ = t1 - t2 or, equivalently, t2 = t1 - τ. If we let t1' = t1 + τ, we have

φ(-τ) = E(X_t1 X_{t1+τ}) = E(X_{t1'} X_{t1'-τ}) = φ(τ)

Therefore, φ(τ) is an even function. We also note that φ(0) = E(X_t^2) denotes the average power in the process X(t).

There exist nonstationary processes with the property that the mean value of the process is independent of time (a constant) and the autocorrelation function satisfies the condition φ(t1, t2) = φ(t1 - t2). Such a process is called wide-sense stationary. Consequently, wide-sense stationarity is a less stringent condition than strict-sense stationarity. When reference is made to a stationary stochastic process in any subsequent discussion in which correlation functions are involved, the less stringent condition (wide-sense stationarity) is implied.

Related to the autocorrelation function is the autocovariance function of a stochastic process, which is defined as

μ(t1, t2) = E{[X_t1 - m(t1)][X_t2 - m(t2)]} = φ(t1, t2) - m(t1) m(t2)   (2-2-5)
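The ensemble averages above can be estimated by simulation. The sketch below uses a sinusoid with a random phase uniform on (0, 2π) as an illustrative process (this particular process is an assumption, not one analyzed in the text): each realization of the phase is one sample function, and the ensemble estimate of the autocorrelation depends only on τ = t1 - t2, is even in τ, and gives the average power at τ = 0.

```python
import numpy as np

rng = np.random.default_rng(11)

# Illustrative ensemble (an assumption, not from the text): sample functions
# x(t) = cos(2*pi*f0*t + theta) with theta uniform on (0, 2*pi); one phase
# value per sample function.
f0, n_samples = 5.0, 400_000
theta = rng.uniform(0.0, 2 * np.pi, n_samples)

def x(t):
    """Values of all sample functions in the ensemble at time instant t."""
    return np.cos(2 * np.pi * f0 * t + theta)

def phi(t1, t2):
    """Ensemble estimate of the autocorrelation E(X_t1 X_t2) from (2-2-3)."""
    return np.mean(x(t1) * x(t2))

tau = 0.04
# Two pairs of instants with the same separation tau agree up to Monte Carlo
# error, phi(tau) equals phi(-tau), and phi(0) is the average power (0.5 here).
print(round(phi(0.10, 0.10 - tau), 4), round(phi(0.73, 0.73 - tau), 4))
print(round(phi(0.0, tau), 4), round(phi(tau, 0.0), 4), round(phi(0.0, 0.0), 4))
```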
