Low-Complexity CFR

This paper also addresses crest factor reduction (CFR) in 5G and LTE transmit paths. The CFR block sits after the IFFT and before the DPD (digital predistortion) in the 5G RU (radio unit). In this paper, the signal amplitude and phase information are separated: the phase information is applied to the samples in the other path, while the signal amplitude is reduced in the amplitude path. The two paths, direct and indirect, are delay-equalized and recombined at the end, which improves the PAPR.

Song and Ochiai EURASIP Journal on Wireless Communications and Networking (2015) 2015:85
DOI 10.1186/s13638-015-0319-0

RESEARCH Open Access

A low-complexity peak cancellation scheme and its FPGA implementation for peak-to-average power ratio reduction
Jiajia Song* and Hideki Ochiai

Abstract
The power amplifier (PA) is the most power-hungry component in a wireless base station transmitter, and reducing
the peak-to-average power ratio (PAPR) of wireless signals is an important issue for its effective use. In this paper, we
focus on a field-programmable gate array (FPGA) implementation of the peak cancellation (PC) technique, which is
known as the simplest method for PAPR reduction. The design issue of effective peak-cancelling pulses under the
constraint on the out-of-band emission is addressed. In order to reduce its hardware complexity, a novel approach for
generating peak-cancelling pulses is also presented. The experimental results based on long-term evolution
(LTE)/LTE-Advanced and multi-band Wideband Code Division Multiple Access (WCDMA) signals demonstrate the
validity of the proposed scheme. It has been shown that the proposed PC scheme can achieve lower in-band
distortion than the conventional PC with an acceptable loss in out-of-band performance. Our study also includes
mapping the signal processing methods onto a Xilinx Virtex-7 FPGA device running at 245.76 MHz and addresses the
resource utilization and the hardware design in detail.
Keywords: PAPR reduction; Peak cancellation; FPGA; OFDM; LTE; LTE-Advanced; Multi-band; WCDMA

1 Introduction

Signals such as orthogonal frequency division multiplexing (OFDM) [1] and direct-sequence spread spectrum (DS-SS) for code division multiple access (CDMA) [2] are widely adopted in modern wireless communication systems due to their remarkable advantages such as flexible allocation of resources and high spectrum utilization. As these signals are essentially a sum of multiple subcarriers/codes of multiple users in either frequency or time domain, the probability density functions (PDFs) of their signals tend to approach Gaussian, and thus, they exhibit high peak-to-average power ratio (PAPR) [3]. This poses strict demands on the dynamic range of data converters and especially limits efficient operation of the power amplifiers (PAs). Reducing the PAPR is hence important for boosting the PA efficiency by allowing higher average input power before saturation occurs. To mitigate this issue, extensive studies have been performed [4-6].

Some of the PAPR-reduction techniques for Wideband Code Division Multiple Access (WCDMA) systems can be found in, for example, [7,8]. On the other hand, those for multi-carrier and OFDM signals are much more abundant and appear in many forms including selected mapping (SLM) [9] and partial transmit sequence (PTS) [10,11], just to mention a few. The aforementioned techniques do not incur distortion, but they either have a large computational complexity or have to modify the signal, which makes their implementation in high-speed real-time systems challenging or hinders standard-compliant operations. It should be noted that, as the digital techniques in a transmitter continue to scale, the power consumed by digital circuits also takes up a large portion of the total power consumption. Implementing the PAPR reduction with a high-speed and power-hungry digital signal processor (DSP) is obviously detrimental to the cost and power efficiency of the entire system.

*Correspondence: [email protected]
Department of Electrical and Computer Engineering, Yokohama National University, 79-5 Tokiwadai, Hodogaya, Yokohama 240-8501, Japan

© 2015 Song and Ochiai; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
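
The introduction attributes the high PAPR of multicarrier signals to the summation of many independently modulated subcarriers. The following is a minimal sketch of that effect, not code from the paper: it builds one oversampled multicarrier symbol and measures its PAPR. The function names, the QPSK mapping, the subcarrier count and the oversampling factor are all illustrative assumptions, and only the Python standard library is used.

```python
import cmath
import math
import random


def multicarrier_symbol(num_subcarriers, oversampling, rng):
    """Sum of QPSK-modulated subcarriers, sampled at `oversampling` times the Nyquist rate.

    Hypothetical helper for illustration; a direct (slow) inverse DFT is used
    so that no third-party library is needed.
    """
    n_samples = num_subcarriers * oversampling
    symbols = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
               for _ in range(num_subcarriers)]
    return [
        sum(a * cmath.exp(2j * math.pi * k * n / n_samples)
            for k, a in enumerate(symbols))
        for n in range(n_samples)
    ]


def papr_db(samples):
    """Peak-to-average power ratio of a complex sequence, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) * len(powers) / sum(powers))


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
signal = multicarrier_symbol(num_subcarriers=64, oversampling=4, rng=rng)
print(f"PAPR of one multicarrier symbol: {papr_db(signal):.2f} dB")
```

For comparison, a constant-envelope sequence gives 0 dB with the same `papr_db` function, while the worst case for N equal-power subcarriers is 10 log10(N) dB, reached only when all subcarriers add in phase.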
Motivated by the above observations, simple techniques such as clipping and filtering (CAF) [12-14], peak windowing (PW) [7,15,16] and peak cancellation (PC) [17-19], which have much lower complexity, can be considered as more realistic approaches from the viewpoint of practical implementation. These techniques essentially introduce nonlinear operations so that distortions are inevitable. Given that some degree of distortion is generally allowed for the transmitted signals, such techniques are very attractive. The major drawback of CAF is the peak regrowth caused by the filtering effect, and the amount of regrowth is generally intractable. This is undesirable for a transmitter with digital predistorter (DPD), as the DPD needs to strictly keep the peak below a predefined value to ensure that no signal sweeps into the saturation region of the PA. Although PAPR regrowth can be somewhat alleviated by iterative use of CAF [20,21], the resulting complexity will be increased several fold because of the duplicated functional blocks. Furthermore, the latency issue also becomes prohibitive.

In contrast, PC is a much overlooked technique that has advantages in several aspects. PC simply generates independent cancelling pulses to cancel the peak values to a given threshold. It allows more cost-effective hardware implementation than the CAF as no filtering operation, which involves either a large number of multipliers or a bank of fast Fourier transform (FFT) blocks, is required. Moreover, PC can be easily configured to operate in compliance with signals of different communication standards. This is because the cancelling pulses can be updated to support a variety of carrier configurations and bandwidths. The concept of PC can also be used to facilitate generating cancelling pulses in active constellation extension (ACE) [22] and tone reservation (TR) [23,24]. In [24], the cancelling pulse is generated by performing inverse fast Fourier transform (IFFT) of the distorted signal after clipping and filtering located in the peak reduction tones (PRTs). This method thus generates neither in-band distortion nor out-of-band (OOB) emission but at a cost of data rate loss. In [19], the cancelling pulse generated using the PRTs is repetitively employed without FFT and IFFT for low-complexity implementation. Nevertheless, given the high computational complexity and high latency, the above-mentioned approaches may not be suitable for practical applications.

Until now, only a few papers address hardware implementation of the PC, and even fewer of them have mentioned its application to actual signals observed in commercial transmitters. Therefore, it is meaningful to investigate the applicability of PC in practical settings through field-programmable gate array (FPGA)-based experiments, and this is our main contribution in this work. Specifically, we investigate the feasibility and realizability of PC through elaboration of hardware design issues upon FPGA implementation. Furthermore, a novel PC scheme with notably low hardware complexity and improved error vector magnitude (EVM) performance is proposed with its effectiveness experimentally demonstrated.

This paper is organized as follows. Section 2 begins with the introduction of the basic model of PC considered throughout the paper, where the design of cancelling pulse is described. In Section 3, two conventional approaches with their respective advantages and drawbacks for implementing the PC are presented. Furthermore, a novel PC approach with much reduced hardware complexity is proposed and its implementation issues are discussed. Experiments using various signals are performed in Section 4 to demonstrate the benefits that can be achieved with the proposed PC scheme. Finally, our conclusion is given in Section 5.

2 PAPR reduction by peak cancellation

A basic diagram of the PC process considered in this work is sketched in Figure 1. Its principle is to generate cancelling pulses at the time instants where the peaks higher than the predetermined threshold are found. The generated pulses are linearly scaled and rotated with appropriate phase shift such that after their addition the original signals have the peaks reduced to the threshold [17].

2.1 Peak-cancelling process

To perform peak cancellation on the complex baseband signal, the target signal should be oversampled as the Nyquist-rate-sampled signals cannot correctly represent the actual amplitude of peaks of the continuous-time signals [3]. The discrete complex baseband signal $s_n$ has the general form of:

$$s_n = r_n e^{j\theta_n}, \tag{1}$$

where $r_n$ and $\theta_n$ represent the amplitude and phase, respectively, at the $n$th time instant. Suppose that there are $N_p$ peaks that are larger than the predefined threshold $A_{\mathrm{th}}$ within a given time period $T$, and let $\rho_1, \rho_2, \cdots, \rho_{N_p}$ denote the corresponding successive peaks observed at the time instants $n_1, n_2, \cdots, n_{N_p}$, respectively. Let $g_n$ denote the impulse response of the cancelling pulse centred at $n = 0$, i.e. $g_0$ representing its maximum value. Then, the $i$th peak cancelling pulse at the time instant $n_i$, where $i \in \{1, \cdots, N_p\}$, is expressed as:

$$p_n^{(i)} = \left(r_{n_i} - A_{\mathrm{th}}\right) g_{n-n_i}\, e^{j\theta_{n_i}}, \tag{2}$$

where the phase is rotated by $e^{j\theta_{n_i}}$ to match the phase of the corresponding complex-valued peak sample, and the amplitude is scaled by $|r_{n_i} - A_{\mathrm{th}}|$ such that the peak value at $n = n_i$ is equal to $A_{\mathrm{th}}$ after peak cancellation. Then, the overall signal after cancellation of the entire peaks is expressed as:
$$\bar{s}_n = s_n - \sum_{i=1}^{N_p} p_n^{(i)} = s_n - \underbrace{\left[\sum_{i=1}^{N_p} \left(r_{n_i} - A_{\mathrm{th}}\right) e^{j\theta_{n_i}} g_{n-n_i}\right]}_{=\,p_n}, \tag{3}$$

where $p_n$ is all the combined cancelling pulses located at the time instants $n_i$. If we ignore the change of the amplitude and average power due to the addition of all the cancelling pulses, $A_{\mathrm{th}}$ is the maximum amplitude after peak cancellation. In what follows, we refer to the corresponding PAPR determined by $A_{\mathrm{th}}$ as a target PAPR.

Figure 1 A simplified block diagram of the PC process.

2.2 Effect of the cancelling pulse

The impulse response of the cancelling pulse $g_n$ determines the resulting OOB radiation. In general, $g_n$ should be compliant to the spectral mask of a given target standard. Suppose that $s_n$ is the oversampled version of the band-limited signal and let $J$ denote the oversampling factor such that $\tilde{s}_{kJ}$ represents the samples at the Nyquist rate for an integer $k$. The resulting signal $s_n$ is then expressed as:

$$s_n = \sum_{k} \tilde{s}_{kJ}\, h_{n-kJ}, \tag{4}$$

where $h_n$ is the corresponding impulse response of the pulse-shaping filter, and the summation is over the range of $k$ where the impulse response has a non-negligible effect. It is worth mentioning that even though the OFDM signal is not explicitly shaped by a filter, we can still find an equivalent form [25] to represent the virtual pulse-shaping filter. Therefore, Equation 4 also applies to the conventional OFDM signals.

Now we consider the scenario where one peak $\rho_i$ is detected and subtracted by a cancelling pulse $p_n^{(i)}$. It then follows that:

$$\bar{s}_n = s_n - p_n^{(i)} = \sum_{k} \tilde{s}_{kJ}\, h_{n-kJ} - \left(r_{n_i} - A_{\mathrm{th}}\right) e^{j\theta_{n_i}} g_{n-n_i}. \tag{5}$$

Suppose that the peak position is precisely given at $n_i = k_i J + b_i$, where $k_i$ and $b_i$ are some integers. Then, Equation 5 is rewritten as:

$$\begin{aligned}
\bar{s}_n &= \sum_{k} \tilde{s}_{kJ}\, h_{n-kJ} - \left(r_{n_i} - A_{\mathrm{th}}\right) e^{j\theta_{n_i}} g_{n-(k_i J + b_i)} \\
&= \sum_{k,\, k \neq k_i} \tilde{s}_{kJ}\, h_{n-kJ} + \left[\tilde{s}_{k_i J} - \left(r_{n_i} - A_{\mathrm{th}}\right) e^{j\theta_{n_i}}\right] h_{n-k_i J} \\
&\quad + \underbrace{\left(r_{n_i} - A_{\mathrm{th}}\right) e^{j\theta_{n_i}} \left[h_{n-k_i J} - g_{n-k_i J-b_i}\right]}_{d_n^{(i)}},
\end{aligned} \tag{6}$$

which indicates that a proper design of $g_n$ will avoid the out-of-band emission, but we still observe that distortion component $d_n^{(i)}$ will affect all the other sampling instants. This distortion component can be nullified only if $g_n$ is set equal to $h_n$ and the peak position occurs at the Nyquist point, i.e. $h_{n-k_i J} = g_{n-k_i J-b_i}$.

In other words, if the cancelling pulse $g_n$ is identical to $h_n$, then no out-of-band regrowth will occur as the signal power is confined inside the pass-band of $h_n$. In fact, the clipping and filtering approach presented in [13] corresponds to this special case where $g_n$ is the periodic sinc function [25]. Since the periodic sinc function has non-negligible impulse response over entire OFDM symbol, it causes considerable peak regrowth. Therefore, in practice, we wish to choose $g_n$ such that its side lobe (in time domain) vanishes rapidly, and yet, its frequency response has acceptable out-of-band emission in terms of adjacent channel leakage ratio (ACLR).

2.3 Design of the cancelling pulse

As we have seen, the impulse response of the peak-cancelling pulse $g_n$, which is essentially a finite impulse response (FIR) filter, serves as a trade-off between the out-of-band radiation and in-band distortion. Specifically,
a shorter impulse response results in lower in-band distortion, but it will cause an increasing amount of out-of-band radiation that may violate the specified spectral mask. Therefore, careful design of the cancelling pulse is essential.

Figure 2 Performance comparison of the three different cancelling pulses designed based on different FIRs. The left hand side figure shows their impulse responses, whereas the right hand side figure shows the corresponding frequency responses. RC, raised cosine; WS, windowed sinc; ER, equal ripple.

However, there exists no solid algorithm or closed-form derivation for finding the best cancelling pulse, and thus, exhaustive attempts are necessary to find the suitable one for a specified signal and to satisfy the design requirements. For instance, three different filters (cancelling pulses) of the same length are illustrated in Figure 2 with their respective impulse response and frequency response. Here, the windowed sinc (WS) is obtained by multiplying the sinc function by a Kaiser window. The raised cosine (RC) and sinc are both Nyquist filters as can be seen from the left hand side of the figure. The equal ripple (ER) filter is obtained by the well-established Parks-McClellan algorithm [26] which minimizes the error in pass and stop bands by employing Chebyshev approximation. The performance of the peak cancellation based on the three cancelling pulses is demonstrated in Figure 3 using numerical simulation, where a WCDMA signal is used as its test signal. In this figure, as a practical measure for PAPR, we adopt the complementary cumulative distribution function (CCDF) of the instantaneous power normalized by its average power.

From the left hand side of Figure 3, we observe that similar PAPR performance is achieved. However, comparison of the power spectra in Figure 3 with their corresponding frequency responses in Figure 2 reveals that the frequency response of the cancelling pulses has the dominant effect on the resulting spectrum after peak cancellation.

Figure 3 Peak-cancelled performance due to the different peak-cancelling pulses introduced in Figure 2. The left hand side figure shows the PAPR performance in terms of CCDF, whereas the right hand side figure shows the corresponding power spectra. CCDF, complementary cumulative distribution function; RC, raised cosine; WS, windowed sinc; ER, equal ripple.

The effects of pulse length on the in-band distortion (i.e. EVM) and out-of-band distortion (i.e. ACLR) are reported in Figure 4a,b, respectively. The measurement of EVM and ACLR follows the 3rd Generation Partnership Project (3GPP) frequency division duplexing (FDD) WCDMA downlink specification [27]. We have concluded
in the last subsection that a longer pulse introduces more symbol errors to the target signal, and this is validated by Figure 4a. On the other hand, it can be easily grasped by inspection of Figure 4b that ACLR reduces with longer cancelling pulse length. However, the curves shown in the figure have some fluctuations. This can be understood by inspecting Figure 5, where the impulse response for a typical low-pass filter is shown. Truncation length of the impulse response also affects the shape of the resulting frequency response. For instance, $L_1$ has worse out-of-band attenuation than $L_3$ but may be better than $L_2$, because $L_2$ has non-zero values at the head and end. This heuristic observation reveals a basic clue for choosing the cancelling pulse length: the cancelling pulse should be as short as possible to minimize the distortion but should be long enough to give admissible ACLR.

Figure 4 Effect of different pulse length. (a) Pulse length effect on EVM. (b) Pulse length effect on ACLR. RC, raised cosine; WS, windowed sinc; ER, equal ripple; EVM, error vector magnitude; ACLR, adjacent channel leakage ratio.

It can also be observed from Figure 4a,b that the cancelling pulse generated by an equal ripple filter gives the best performance both in terms of EVM and ACLR. Better EVM performance is due to the lower side lobe in time domain, and lower ACLR is due to higher out-of-band attenuation of the cancelling pulse, as can be seen from Figure 2. It can also be seen from Figure 4a that even though the raised-cosine filter completely conforms to the pulse-shaping filter for WCDMA signal, it shows the worst performance when the cancelling pulse length is short. This is mainly due to the distortion effect and the high sidelobe of the raised-cosine filter. In general, designing cancelling pulse with Parks-McClellan algorithm leads to better performance than other filter types that are frequently found in the literature.

3 Hardware implementation of peak cancellation

From this section onward, we focus on efficient implementation of PC by hardware. We first introduce the conventional approach for the implementation of PC as well as its alternative method, which is followed by the description of the proposed PC as well as their detailed implementation issues.

3.1 Conventional implementation: scheme 1

A conventional scheme for implementing the PC [28], which we refer to as scheme 1 in what follows, is shown in Figure 6. The instantaneous magnitude and phase of the complex signal are computed with coordinate rotation digital computer (CORDIC) algorithm. The ‘Peak Detect’ block, which contains some registers and comparators, identifies the peak magnitude higher than the threshold. When a peak is detected, the corresponding phase and the magnitude amount higher than the threshold are latched. These values are used to scale and rotate the normalized real cancelling pulses, and these operations are accomplished by multiple CORDIC cores.

The cancelling-pulse generator block contains one counter, which is directly connected to the address port of the read only memory (ROM). The pre-determined cancelling pulse, which is compliant with the target signal spectrum, is stored in the ROM. The counter will be triggered as soon as one peak is found, and it is reset when it counts to the length of cancelling pulse $L$.

To deal with the occurrence of intensive peaks, multiple cancelling-pulse generators are necessary to generate independent cancelling pulses, and Figure 7 illustrates such an operation where the signal is propagating from left to right. When the first peak in the left is detected, the first generator will be turned on for $L$ clocks to generate the first cancelling pulse. In the event that the second and third peaks are detected when the first generator is
still on, the second and third generators will be triggered to generate the second and third cancelling pulses. If the fourth peak is detected when ROM 1 is free, the first generator will be reused to generate the fourth cancelling pulse. In summary, when the previous cancelling-pulse generator is triggered and the next peak is found, the next cancelling-pulse generator will be triggered in sequence. This successive process continues until no peak higher than the threshold is found. All the outputs of the cancelling-pulse generators are summed and finally subtracted from the delayed original signal.

Figure 5 An example of typical impulse response of cancelling pulse with various truncation lengths.

It becomes clear that the resource complexity of this scheme is bounded by the number of available cancelling-pulse generators. Note that whether this scheme can cancel all the peaks in one pass or not depends on the specific parameters such as the required pulse length, targeted PAPR and the number of cancelling-pulse generators. It is easy to see that the generators in the upper side of Figure 6 have a higher probability to be used while the cancelling-pulse generators in the bottom may be idle most of the time. Therefore, some compromise is necessary to determine how many cancelling-pulse generators are used. As a rule of thumb, five or six cancelling-pulse generators are enough when a reasonable threshold value is assumed, as the average number of the peaks above the threshold monotonically decreases as the threshold increases [3]. However, since the number of peaks itself is a random variable, with the fixed number of pulse generators, a failure of peak cancellation may occasionally happen. In this case, some iterative processing structure should be introduced, which may lead to increasing latency.

Figure 6 Architecture of conventional PC hardware implementation - scheme 1. ROM, read only memory.

3.2 Conventional implementation: scheme 2

To overcome the issue of a peak cancellation failure, an alternative implementation scheme [29], which we refer to as scheme 2, can be applied. Let us rewrite Equation 3 as:
$$\bar{s}_n = s_n - \left[\sum_{i=1}^{N_p} \left(r_{n_i} - A_{\mathrm{th}}\right) \cdot \delta\left(n - n_i\right) e^{j\theta_{n_i}}\right] * g_n, \tag{7}$$

where $g_n$ serves as the coefficients (impulse response) of an FIR filter and $*$ denotes the convolution operation. The cancelling pulses can thus be generated by propagating the delta function train to this filter. This principle is illustrated in Figure 8, and its hardware implementation is given in Figure 9.

Figure 7 Principle of multiple cancelling pulse generation.

Similar to the first scheme, a CORDIC core is used to compute the magnitude and phase of the signal. The ‘Pulse Gen’ block outputs a sample which is properly scaled and rotated when a peak is detected. The resulting delta function train is complex-valued. It is easy to see from Figure 9 that the resource complexity relies primarily on the filter, while the complexity of its counterpart in Figure 6 depends on the number of CORDIC cores. Implementing an FIR filter requires a large number of multipliers, which makes scheme 1 preferable as it consumes fewer resources.

In summary, the first scheme has lower complexity in terms of fewer multipliers but has the problem of peak generation failure. The second scheme is easier to implement but has higher resource complexity due to the use of FIR filter. In order to cope with the peak generation failure, however, the first scheme may require more iterations which in turn increase the complexity several fold. In this sense, the second scheme that has a fixed hardware overhead is preferred.

Figure 8 Principle of cancelling pulse generated by delta function trains.

3.3 The proposed peak cancellation

As can be observed from Figure 7, a sum of multiple overlapped pulses forms the final pulse when intensive peaks occur. The tails of the previous pulses may happen to be added in-phase to the subsequent peaks, resulting in less effective peak reduction. Inspired by these observations, we propose a novel approach of peak cancellation. The principle of the proposed approach is illustrated in Figure 10. Instead of generating complete cancelling pulses stored in the ROMs as the one shown in Figure 6, the proposed scheme generates truncated


cancelling pulses when they are overlapping with each other. More specifically, when the interval of two contiguous peaks is detected to be less than the predefined cancelling-pulse length, the generation of the first cancelling pulse terminates in the middle of the interval and immediately triggers the second cancelling pulse. Since this scheme reduces the length for overlapping cancelling pulses, lower in-band error can be expected in view of the observation given in Section 2.3.

Figure 9 Architecture of conventional hardware implementation of peak cancellation - scheme 2. FIR, finite impulse response.

However, as can be seen from Figure 10, the cancelling pulses show obvious discontinuity which would produce substantial out-of-band emission. To mitigate the out-of-band emission, a filter can be applied to smooth the discontinuities, and the resulting waveform of the smoothed cancelling pulse is also denoted by the dashed curve in Figure 10. In our method, we use a simple moving average filter as a smoothing filter, with which the out-of-band spurious level caused by non-continuous cancelling pulses can be reduced. Note that the smoothing filter is actually optional as we can still obtain a moderate out-of-band emission without it, considering that the occurrence of high peaks is a rare event [3]. The smoothing filter is only necessary when ACLR is prioritized. Normally, a short-length smoothing filter is adequate to further improve the ACLR performance. As the filter length is very short, it gives negligible effect on EVM.

3.4 Implementation of the proposed peak cancellation

We now describe the hardware implementation aspects of the proposed scheme. A detailed block diagram of the proposed scheme is given in Figure 11. Some example waveforms of the internal signals labelled in Figure 11 are plotted in Figure 12.

Similar to the previous schemes, a CORDIC core is used to compute the instantaneous magnitude and phase of the signal. The ‘Peak Detect’ block, which contains some registers and comparators, outputs a ‘1’ when a magnitude peak is found, as shown in the second plot in Figure 12. The output of the ‘Peak Detect’ block is connected to the enable ports of the two latches which store the magnitude and phase of the corresponding peak. The ‘Interval Locator’ block generates a ‘1’ in the middle of two peaks when the interval of these peaks is less than the cancelling-pulse length. The outputs of ‘Interval Locator’ and ‘Peak Detect’ (I) and (II) are combined with an OR gate. A ‘Delay’ block is used to align these two signals as the ‘Interval Locator’ has a fixed delay. The ‘Cancelling Pulse Duration’ block produces enable signal (III) for the counter which outputs the address of the ROM. The counting direction of the counter is controlled by a latch output which is reversed when triggered by the output of the OR gate. Operation of these signals can be easily seen from the second and third plots in Figure 12.

The ROM output (V) is then scaled and rotated by the latched magnitude and phase to form the cancelling pulses and fed to the FIR filter. The smoothed cancelling pulses (VI) are subtracted from the delayed original signal to form the PAPR-reduced signal. It should be noted that the four ‘Delay’ blocks shown in Figure 12 do not necessarily indicate that the delay values for these blocks are identical.
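
The processing chain described above (peak detection, midpoint truncation of overlapping pulses, scaling and rotation, optional moving-average smoothing, subtraction from the original signal) can be sketched in software as follows. This is a behavioural illustration under stated assumptions, not the authors' FPGA design: all function names are hypothetical, the pulse is assumed to have odd length with unit value at its centre, the midpoint is taken by integer division, and a centred moving average stands in for the delay-compensated hardware filter.

```python
import cmath


def find_peaks(signal, threshold):
    """Indices of local magnitude maxima that exceed `threshold`."""
    mags = [abs(s) for s in signal]
    return [n for n in range(1, len(mags) - 1)
            if mags[n] > threshold and mags[n - 1] < mags[n] >= mags[n + 1]]


def cancelling_signal(signal, threshold, pulse):
    """Sum of scaled, rotated pulses; overlapping neighbours are cut at the midpoint."""
    length = len(pulse)  # assumed odd, with pulse[length // 2] == 1
    half = length // 2
    peaks = find_peaks(signal, threshold)
    cancel = [0j] * len(signal)
    for i, n_i in enumerate(peaks):
        start, stop = n_i - half, n_i + half + 1  # full pulse support
        # Truncate towards a neighbouring peak closer than the pulse length.
        if i > 0 and n_i - peaks[i - 1] < length:
            start = max(start, (peaks[i - 1] + n_i) // 2 + 1)
        if i + 1 < len(peaks) and peaks[i + 1] - n_i < length:
            stop = min(stop, (n_i + peaks[i + 1]) // 2 + 1)
        # (r - A_th) * exp(j * theta), the scale factor of Equation 2.
        scale = (abs(signal[n_i]) - threshold) * cmath.exp(1j * cmath.phase(signal[n_i]))
        for n in range(max(start, 0), min(stop, len(signal))):
            cancel[n] += scale * pulse[n - n_i + half]
    return cancel


def moving_average(x, taps):
    """Centred moving average over `taps` samples (zero-padded at the edges)."""
    half = taps // 2
    return [sum(x[max(n - half, 0):n + half + 1]) / taps for n in range(len(x))]


def peak_cancel(signal, threshold, pulse, smoothing_taps=1):
    """Subtract the (optionally smoothed) truncated cancelling pulses from the signal."""
    cancel = cancelling_signal(signal, threshold, pulse)
    if smoothing_taps > 1:
        cancel = moving_average(cancel, smoothing_taps)
    return [s - c for s, c in zip(signal, cancel)]


# Toy input: two peaks closer together than the pulse length (values are arbitrary).
pulse = [1 - abs(k) / 6 for k in range(-5, 6)]  # triangular pulse, length 11
signal = [0j] * 40
signal[20], signal[25] = 2 + 0j, 1.8j
reduced = peak_cancel(signal, threshold=1.0, pulse=pulse)
print(max(abs(s) for s in reduced))
```

Because truncated pulses never overlap, each peak sample receives only its own pulse in this sketch, so without smoothing the peak magnitudes land on the threshold. Setting `smoothing_taps` above 1 softens the discontinuities at the truncation points, at the price of a slightly less exact cancellation at the peaks.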
Figure 10 Principle of the proposed peak cancellation approach.

The moving average filter used here for smoothing the truncated pulses is much shorter than the one used in Figure 9. Thus, it has less hardware complexity, even though both of them contain two CORDIC cores.

Figure 11 Hardware circuit of the proposed PC scheme using FPGA. ROM, read only memory.

Figure 12 Internal signal illustration of the proposed PC circuit.

4 Performance evaluation of the proposed peak cancellation

In this section, the experimental results for implementation with FPGA are reported. We will compare the hardware complexity (in terms of resource utilization) of the proposed PC and conventional PC first. This is followed by the experimental results using a standard-conforming long-term evolution (LTE) signal as well as multi-standard signals.

4.1 Experiment description

The implementation is carried out using an FPGA evaluation board VC707, which contains a Xilinx Virtex-7 XC7VX485T-2FFG1761C device. The test WCDMA/LTE signal (baseband IQ) is generated by Matlab on the computer and is stored in a bank of RAMs in the FPGA as the signal source. The output of peak cancellation is captured by a series of integrated logic analyzers (ILAs) in parallel, which are arranged in a time-interleaved manner so as to receive a long signal. This signal is then transferred to the computer through a USB port, circularly shifted to align with the original signal and analysed by Matlab.

A 245.76-MHz clock is synthesized by the on-chip mixed-mode clock manager (MMCM), which uses the on-board 200-MHz oscillator as the reference. The clock is set to an integer multiple (64 times in our example implementation) of 3.84 MHz, targeting the specification of the 3GPP WCDMA and LTE signals.
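
The experiment description mentions that the captured output is circularly shifted to align it with the original signal before analysis. The paper does not say how the shift is found; one common way, shown here purely as an assumed illustration, is to pick the lag that maximizes the magnitude of the circular cross-correlation. The function names are hypothetical and the direct O(N²) search is only suitable for short records.

```python
import random


def circular_lag(captured, reference):
    """Lag (in samples) that maximizes the circular cross-correlation magnitude."""
    n = len(reference)

    def correlation(lag):
        return abs(sum(captured[(k + lag) % n] * reference[k].conjugate()
                       for k in range(n)))

    return max(range(n), key=correlation)


def align(captured, reference):
    """Circularly shift `captured` so that it lines up with `reference`."""
    lag = circular_lag(captured, reference)
    return captured[lag:] + captured[:lag]


# Toy check with a random complex record delayed by an arbitrary 37 samples.
rng = random.Random(1)
reference = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(128)]
captured = reference[-37:] + reference[:-37]
print(circular_lag(captured, reference))
```

For records as long as those captured in the experiment, an FFT-based correlation would be the practical choice; the brute-force search is used here only to keep the sketch dependency-free.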


Song and Ochiai EURASIP Journal on Wireless Communications and Networking (2015) 2015:85 Page 10 of 14

The cancelling-pulse length is set to 115, and the order of the moving averaging filter is set to 6 for the proposed PC. The complex cancelling pulse is assumed to support the asymmetric signal spectrum. As the smoothing filter is symmetric and one complex-valued multiplication uses three multipliers, the filter consumes 6 ÷ 2 × 3 = 9 multipliers. Furthermore, to optimize the speed, three multipliers are used to implement each CORDIC algorithm, giving a total of six multipliers for the two CORDIC cores. Another two multipliers are used to scale the cancelling pulse (real magnitude multiplying the complex cancelling pulse in I/Q), so that the total consumption of multipliers is 9 + 6 + 2 = 17.

4.2 Resource utilization
The FPGA resource utilization, which is evaluated with Resource Estimator using post-mapping, for the three PC approaches introduced is summarized in Table 1, where scheme 1 is operated either without iteration or with one iteration. Only primary resources such as the slices and look-up tables (LUTs) are listed here, since they are generic for any FPGA, while the other resources used for our test (primarily block RAMs and IOs for ChipScope) are specific to the FPGA board and thus are not taken into consideration.

Table 1 FPGA resource utilization table

                                        FFs      Slice LUTs  DSP48 (Multipliers)  RAM
Proposed                                3,920    3,439       17                   1
Conventional scheme 1 (no iteration)    3,354    3,117       15                   4
Conventional scheme 1 (one iteration)   6,785    6,341       30                   8
Conventional scheme 2                   5,749    5,383       171                  0
Available in FPGA                       607,200  303,600     2,800                1,030

FF, flip flop; LUT, look-up table.

In scheme 1, the PC comprises four cancelling-pulse generators, and therefore, four RAMs are used to store the cancelling pulse and accordingly four CORDIC cores are needed. One can see that with a single iteration, the hardware resource of scheme 1 is roughly doubled. All three schemes consume only a small portion of the available resources such as flip flops (FFs) and slice LUTs. The conventional scheme 2 costs more multipliers, because it requires implementing the FIR filter to generate the cancelling pulses of equivalent length. For circuit integration, the multipliers are of most concern, as they generally cost more power and take up larger area than other simple arithmetic elements. In this sense, the proposed PC is rather cost effective as it consumes fewer multipliers. Although the conventional scheme 1 without iteration requires even fewer multipliers, the peak-missing problem will make it unworkable, as will be demonstrated in the subsequent subsections. In view of this, the proposed PC yields the lowest complexity.

The signal has 16-bit precision, which gives a noise floor of approximately −80 dB. Note that since systems such as LTE and WCDMA require an ACLR of −55 dB at most, an actual implementation needs fewer bits.

4.3 Test results using LTE signals
The rest of the paper is devoted to the comparison of the three approaches in terms of in-band distortion, out-of-band emission and realizable PAPR.

The applied LTE signal is a 16-QAM OFDM signal with 1,200 subcarriers within 18.015-MHz occupied bandwidth and 20-MHz channel bandwidth. The basic sampling rate for such a signal is 30.72 MHz since the FFT size is 2,048. The PC for the LTE signal is operated at 245.76 MHz, which represents an oversampling rate of 8. The signal spectrum and cancelling-pulse spectrum are shown in Figure 13. The cancelling pulse used here was designed by Chebyshev approximation, as described in Section 2.3.

Figure 13 Demonstration of the spectra of LTE signal and its corresponding cancelling pulse.

4.3.1 Peak power reduction capability comparison for different PC schemes
In what follows, we evaluate the CCDF of the normalized instantaneous power after employing the PC schemes considered in this work.

The CCDF plots of the signal with the proposed PC and the two conventional PC schemes are shown in Figure 14, where the target PAPR values are set as 6 and 8 dB. As can be seen from the figure, the conventional scheme 1 without iteration exhibits high peaks due to
the cancelling peak generation failure under the condition that only four ROMs are made available for this purpose. This problem can be solved by adding a single iteration, but this doubles the required resources, as can be seen from Table 1. The conventional scheme 2 outperforms the scheme 1 without iteration, at a cost of increasing hardware complexity. The proposed PC, on the other hand, achieves performance comparable to the scheme 1 with iteration, even with lower hardware complexity.

Figure 14 CCDF comparison in terms of normalized instantaneous power for different PC schemes with the target PAPR of 6 and 8 dB. CCDF, complementary cumulative distribution function.

Considering the fact that the scheme 1 may not necessarily achieve a target peak power reduction performance without iteration, the experiments hereinafter will be focused exclusively on the proposed scheme and conventional scheme 2.

An important factor for evaluating the performance of a PAPR-reduction technique is the actually realizable PAPR. It is well known that the CAF causes an unavoidable peak power regrowth, and this makes the precise control of PAPR challenging without resorting to increasing complexity such as iterative use of CAF [20], which may also introduce a prohibitive amount of latency. In contrast, PC schemes can reduce the effect of the peak power regrowth and thus achieve a PAPR close to the target PAPR. Figure 15 compares the peak power reduction capabilities of the scheme 2 and the proposed scheme. In this figure, in response to the target PAPR, the actual threshold values of the normalized instantaneous power at the specific CCDF values of 10^−4 and 10^−5 are plotted as a realizable PAPR. It is observed that the proposed scheme outperforms the scheme 2 from the viewpoint of PAPR reduction as well.

Figure 15 Realizable PAPR versus target PAPR for LTE signal. The realizable PAPR is defined as the corresponding threshold-normalized instantaneous power values measured at the CCDF of 10^−4 and 10^−5. PAPR, peak-to-average power ratio.

4.3.2 In-band and out-of-band distortion comparison for different PC schemes
The constellation plots of the user data, which are 16-QAM signals, with target PAPRs of 5 and 7 dB, are shown in Figure 16. The pilot points (reference signal) in red, which are QPSK, are used to rotate, rescale and equalize the user data. It can be seen from the figure that, for a target PAPR of 5 dB, the constellation of the user data is rather dispersed.

The measured EVM and ACLR with different target PAPRs are shown in Figures 17 and 18, respectively. It can be seen from the figures that the proposed scheme yields lower in-band distortion (in terms of lower EVM) and higher out-of-band distortion (in terms of higher ACLR). This stems from the fact that, since the proposed method generates shorter cancelling pulses, it can essentially reduce the in-band distortion in time domain but with broader side lobes in frequency domain, as discussed in Section 2. Also, even though the proposed scheme has higher out-of-band emission, the observed ACLR with this parameter setting is still satisfactory for practical applications.

4.4 Test results using various signals
Finally, multiple tests have been performed to validate the proposed scheme for a more general framework with multi-carrier and multi-standard signals.

4.4.1 Carrier-aggregated LTE signals
The first test here assumes an LTE-Advanced signal with carrier aggregation of three 20-MHz component carriers (CCs). The spacing between adjacent carriers is set to 20.1 MHz, which is an integer multiple of 15 kHz in order to maintain the OFDM subcarrier spacing. The target
PAPR is set to 7 dB, which can yield a reasonable EVM. The resulting CCDF is plotted in Figure 19a to show the effectiveness of the proposed scheme. The spectral density plots provided in Figure 19b show that, with the given cancelling pulse, the proposed scheme yields very limited spectral regrowth. The EVMs for the respective CCs under the condition of different realizable PAPRs are plotted in Figure 20. We observe that the EVM curves for different CCs almost overlap with each other, indicating the effectiveness of the proposed PC under the system operated with multiple carrier frequencies.

Figure 16 Constellation plots of user data. (a) When target PAPR = 5 dB. (b) When target PAPR = 7 dB. PAPR, peak-to-average power ratio.

Figure 17 EVM versus target PAPR for LTE signal. EVM, error vector magnitude; PAPR, peak-to-average power ratio.

Figure 18 ACLRs (both lower and upper side bands) versus target PAPR for LTE signal. ACLR, adjacent channel leakage ratio; PAPR, peak-to-average power ratio.

4.4.2 Multi-standard signals
The second test uses a multi-standard signal, which contains three WCDMA carriers and a 20-MHz LTE carrier, where the two standard signals are spaced by 40 MHz. The WCDMA signal used in this paper is generated following the specification defined by 3GPP test model 3, for which 64 users (16-QAM) are multiplexed in the dedicated physical data channel (DPDCH) to form the 3.84 Mc/s chip rate. The WCDMA chip is oversampled 64 times, and the PC is operated at a corresponding sampling frequency of 245.76 MHz.
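Both multi-carrier tests run the PC at 245.76 MHz, and Section 4.4.1 chooses the 20.1-MHz carrier spacing as an integer multiple of the 15-kHz subcarrier spacing. The sketch below illustrates why that choice is convenient: a carrier shifted by such an offset lands exactly on the 15-kHz frequency grid. It is an illustration under assumed parameters (a block of 16,384 samples, a DC tone standing in for a carrier, and the hypothetical helpers `grid_index`, `frequency_shift` and `dft_bin`), not the signal generation used in the experiments, which is not described in the text.

```python
import cmath
import math

FS_HZ = 245.76e6   # PC sampling rate quoted in the text
GRID_HZ = 15e3     # OFDM subcarrier spacing
BLOCK = 16384      # assumed block length: FS_HZ / BLOCK is 15 kHz per DFT bin


def grid_index(offset_hz, grid_hz=GRID_HZ):
    """Return offset_hz / grid_hz as an int; raise if the offset is off the grid."""
    ratio = offset_hz / grid_hz
    if abs(ratio - round(ratio)) > 1e-9:
        raise ValueError("offset is not an integer multiple of the grid")
    return round(ratio)


def frequency_shift(samples, offset_hz, fs_hz=FS_HZ):
    """Shift complex baseband samples by offset_hz using complex mixing."""
    step = 2.0 * math.pi * offset_hz / fs_hz
    return [s * cmath.exp(1j * step * n) for n, s in enumerate(samples)]


def dft_bin(samples, k):
    """Evaluate DFT bin k directly (no FFT library required)."""
    length = len(samples)
    return sum(
        s * cmath.exp(-2j * math.pi * ((k * n) % length) / length)
        for n, s in enumerate(samples)
    )


k = grid_index(20.1e6)                                   # 20.1 MHz / 15 kHz
shifted = frequency_shift([1.0 + 0.0j] * BLOCK, 20.1e6)  # move a DC tone
on_bin = abs(dft_bin(shifted, k)) / BLOCK                # share of the tone in bin k
off_bin = abs(dft_bin(shifted, k + 1)) / BLOCK           # leakage into the next bin
print(k, on_bin, off_bin)
```

With an offset on the grid, the shifted tone occupies a single bin and leaks essentially nothing into its neighbour; an offset off the grid would be rejected by `grid_index`.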

The resulting CCDF and power spectral plots are given in Figure 21a,b, which demonstrate the adaptability of the proposed PC to a signal with non-contiguous spectrum. The proposed scheme, with a realizable PAPR of around 7 dB, can meet the imposed requirement on the power spectrum well, as can be seen from Figure 21b.
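The realizable PAPR quoted here follows the definition given with Figure 15: the level of the normalized instantaneous power that is exceeded with a given CCDF probability. A minimal sketch of that measurement is given below. The function names are hypothetical, and the test signal is a complex Gaussian stand-in chosen only for illustration, so the printed value does not reproduce any result of the paper.

```python
import math
import random


def normalized_power_db(samples):
    """Instantaneous power of each sample relative to the average power, in dB."""
    power = [abs(s) ** 2 for s in samples]
    mean = sum(power) / len(power)
    return [10.0 * math.log10(p / mean) if p > 0 else -math.inf for p in power]


def ccdf(power_db, threshold_db):
    """Empirical probability that the normalized power exceeds threshold_db."""
    return sum(1 for p in power_db if p > threshold_db) / len(power_db)


def realizable_papr_db(power_db, prob):
    """Lowest observed level whose exceedance probability is at most prob."""
    ordered = sorted(power_db)
    # Number of samples that may lie strictly above the returned level.
    allowed = min(int(prob * len(ordered)), len(ordered) - 1)
    return ordered[len(ordered) - 1 - allowed]


# Stand-in test signal (assumption): complex Gaussian samples with a fixed seed.
rng = random.Random(0)
samples = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200_000)]

power_db = normalized_power_db(samples)
level = realizable_papr_db(power_db, 1e-4)
print(f"level exceeded with probability <= 1e-4: {level:.2f} dB")
```

Reading the level at a fixed probability, rather than taking the single largest sample, makes the figure far less sensitive to the record length. Resolving a probability of 10^−5 needs correspondingly more samples than the 200,000 used here.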

5 Conclusions
In this paper, the peak cancellation technique as a general-purpose PAPR reduction approach has been addressed. The design issues of cancelling pulses that determine the performance of PC have also been discussed. Our main focus was on their FPGA implementation, with special consideration of the hardware complexity. A novel PC scheme with notably low hardware complexity has also been presented. The experimental results using various signals have demonstrated the validity of the proposed approach.

Figure 19 The performance comparison of PC for three-carrier LTE-Advanced signal. (a) CCDF when target PAPR is set to 7 dB. (b) The power spectral density plots when target PAPR is set to 7 dB. CCDF, complementary cumulative distribution function; PC, peak cancellation.

Figure 20 EVM of each LTE-Advanced component carrier. EVM, error vector magnitude; PAPR, peak-to-average power ratio.

Figure 21 The performance comparison of PC for the aggregated signal of three-carrier WCDMA and LTE signal. (a) CCDF performance when target PAPR is set to 7 dB. (b) The power spectral density plots when target PAPR is set to 7 dB. CCDF, complementary cumulative distribution function; PC, peak cancellation.

Competing interests
The authors declare that they have no competing interests.

Received: 31 October 2014 Accepted: 4 March 2015

References
1. R van Nee, R Prasad, OFDM for Wireless Multimedia Communications. (Artech House, Inc., Norwood, MA, USA, 2000)
2. H Honkasalo, K Pehkonen, MT Niemi, AT Leino, WCDMA and WLAN for 3G and beyond. IEEE Wireless Commun. 9(2), 14–18 (2002)
3. H Ochiai, H Imai, On the distribution of the peak-to-average power ratio in OFDM signals. Commun. IEEE Trans. 49(2), 282–289 (2001)
4. SH Han, JH Lee, An overview of peak-to-average power ratio reduction techniques for multicarrier transmission. Wireless Commun. IEEE. 12(2), 56–65 (2005)
5. T Jiang, Y Wu, An overview: peak-to-average power ratio reduction techniques for OFDM signals. Broadcasting IEEE Trans. 54(2), 257–268 (2008)
6. Y Rahmatallah, S Mohan, Peak-to-average power ratio reduction in OFDM systems: a survey and taxonomy. Commun. Surveys Tutorials IEEE. 15(4), 1567–1592 (2013)
7. O Väänänen, J Vankka, K Halonen, Simple algorithm for peak windowing and its application in GSM, EDGE and WCDMA systems. IEE Proc. Commun. 152(3), 357–362 (2005)
8. W-J Kim, K-J Cho, SP Stapleton, J-H Kim, Doherty feed-forward amplifier performance using a novel crest factor reduction technique. Microwave Wireless Components Lett. IEEE. 17(1), 82–84 (2007)
9. RW Bauml, RFH Fischer, JB Huber, Reducing the peak-to-average power ratio of multicarrier modulation by selected mapping. Electron. Lett. 32(22), 2056–2057 (1996)
10. SH Müller, RW Bäuml, RFH Fischer, JB Huber, OFDM with reduced peak-to-average power ratio by multiple signal representation. Ann. Des Télécommunications. 52(1-2), 58–67 (1997)
11. RJ Baxley, GT Zhou, Comparing selected mapping and partial transmit sequence for PAR reduction. Broadcasting IEEE Trans. 53(4), 797–803 (2007)
12. X Li, LJ Cimini, Effects of clipping and filtering on the performance of OFDM. Commun. Lett. IEEE. 2(5), 131–133 (1998)
13. H Ochiai, H Imai, Performance analysis of deliberately clipped OFDM signals. Commun. IEEE Trans. 50(1), 89–101 (2002)
14. R O'Neill, LB Lopes, in IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC'95). Wireless: Merging onto the Information Superhighway. Envelope variations and spectral splatter in clipped multicarrier signals, vol. 1 (IEEE, Toronto, 1995), pp. 71–75
15. HN Mistry, Implementation of a peak windowing algorithm for crest factor reduction in WCDMA. Master's thesis, School of Engineering Science, Simon Fraser University, Burnaby, BC, Canada (2006)
16. M Pauli, H-P Kuchenbecker, Minimization of the intermodulation distortion of a nonlinearly amplified OFDM signal. Wireless Personal Commun. 4(1), 93–101 (1997)
17. T May, H Rohling, in IEEE Vehicular Technology Conference (VTC98). Reducing the peak-to-average power ratio in OFDM radio transmission systems, vol. 3 (IEEE, Ottawa, 1998), pp. 2474–2478
18. L Dan, Y Xiao, W Ni, S Li, Improved peak cancellation for PAPR reduction in OFDM systems. Commun. IEICE Trans. E93-B(1), 198–202 (2011)
19. H-B Jeon, J-S No, D-J Shin, A new PAPR reduction scheme using efficient peak cancellation for OFDM systems. Broadcasting IEEE Trans. 58(4), 619–628 (2012)
20. J Armstrong, Peak-to-average power reduction for OFDM by repeated clipping and frequency domain filtering. Electron. Lett. 38(5), 246–247 (2002)
21. Y Wang, Z Luo, Optimized iterative clipping and filtering for PAPR reduction of OFDM signals. Commun. IEEE Trans. 59(1), 33–37 (2011)
22. BS Krongold, DL Jones, in Proceedings of 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03). PAR reduction in OFDM via active constellation extension, vol. 4 (April 2003), pp. IV–525–8
23. J Tellado, JM Cioffi, PAR reduction in multicarrier transmission systems. ANSI Document, T1E1.4 Tech. Subcommittee. 4, 1–14 (1998)
24. L Wang, C Tellambura, Analysis of clipping noise and tone-reservation algorithms for peak reduction in OFDM systems. Vehicular Technol. IEEE Trans. 57(3), 1675–1694 (2008)
25. H Ochiai, On instantaneous power distributions of single-carrier FDMA signals. Wireless Commun. Lett. IEEE. 1(2), 73–76 (2012)
26. IW Selesnick, CS Burrus, Exchange algorithms that complement the Parks-McClellan algorithm for linear-phase FIR filter design. Circuits Syst. II: Analog Digital Signal Process. IEEE Trans. 44(2), 137–143 (1997)
27. ETSI 3rd Generation Partnership Project (3GPP). Base station conformance testing (FDD). TS 25.141, the European Telecommunications Standards Institute, September 2010
28. Inc Xilinx, LogiCORE IP Peak Cancellation Crest Factor Reduction v2.0. (Xilinx, Inc., 1–28, 2009)
29. CA Azurdia-Meza, K Lee, K Lee, PAPR reduction by pulse shaping using Nyquist linear combination pulses. IEICE Electron. Express. 9(19), 1534–1541 (2012)
