Basic Thesis
A THESIS SUBMITTED TO
THE DEPARTMENT OF ELECTRONICS AND ELECTRICAL
ENGINEERING
FACULTY OF ENGINEERING
UNIVERSITY OF GLASGOW
IN FULFILMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
By
Maher Assaad
January 2009
© Maher Assaad 2009
All Rights Reserved
In Memory of my father Mohammad
This thesis describes the design and implementation of a fully monolithic 10 Gb/s phase
and frequency-locked loop based clock and data recovery (PFLL-CDR) integrated circuit,
as well as the Verilog-A modelling of an asynchronous serial link based chip-to-chip
communication system incorporating the proposed concept. The proposed design was
implemented and fabricated using the 130 nm CMOS technology offered by UMC (United
Microelectronics Corporation). Different PLL-based CDR circuit topologies were
investigated in terms of architecture and speed. Based on this investigation, we propose a
new quarter-rate (i.e. the clocking speed in the circuit is 2.5 GHz for a 10 Gb/s data rate),
dual-loop topology consisting of a phase-locked loop and a frequency-locked loop.
The frequency-locked loop (FLL) operates independently of the phase-locked loop (PLL)
and has the highly desirable feature that, once the proper frequency has been acquired,
the FLL is automatically disabled and the PLL takes over to adjust the clock edges
to approximately the middle of the incoming data bits for proper sampling. Another
important feature of the proposed quarter-rate concept is the inherent 1-to-4 demultiplexing
of the input serial data stream. A new quarter-rate phase detector based on the non-linear
early-late phase detector concept is used to achieve multi-gigabit/s speed and to
eliminate the need for the front-end data pre-processing (edge detecting) units usually
associated with conventional CDR circuits. An eight-stage differential ring oscillator
with a centre frequency of 2.5 GHz is used as the voltage-controlled oscillator (VCO)
to generate low-jitter multi-phase clock signals. The transistor-level simulation results
demonstrate excellent performance in terms of locking speed and power consumption. In
order to verify the proposed quarter-rate concept, a clockless asynchronous serial link
incorporating the proposed concept and connecting two chips at 10 Gb/s has been
modelled at gate level in the Verilog-A language and simulated in the time domain.
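The timing figures quoted above follow from simple arithmetic, which the short Python sketch below works through. It is an illustration only, not part of the thesis design flow: the function name and dictionary keys are hypothetical, and the per-stage delay uses the textbook ring-oscillator relation f = 1/(2·N·t_d), which is an assumption here rather than a value reported in the thesis.

```python
from fractions import Fraction


def quarter_rate_timing(data_rate_gbps=10, clock_division=4, ring_stages=8):
    """Return idealised timing figures (in ps and GHz) for a quarter-rate CDR."""
    bit_period_ps = Fraction(1000, data_rate_gbps)      # 1 / (10 Gb/s) = 100 ps
    clock_period_ps = bit_period_ps * clock_division    # one clock cycle spans 4 bits
    clock_ghz = Fraction(1000) / clock_period_ps        # 1000 ps per ns, so this is GHz
    # Textbook ring-oscillator relation f = 1 / (2 * N * t_d); assumed here,
    # not taken from the thesis.
    stage_delay_ps = clock_period_ps / (2 * ring_stages)
    return {
        "bit_period_ps": bit_period_ps,
        "clock_period_ps": clock_period_ps,
        "clock_ghz": clock_ghz,
        "stage_delay_ps": stage_delay_ps,
    }


if __name__ == "__main__":
    for name, value in quarter_rate_timing().items():
        print(f"{name}: {float(value):g}")
```

With the default arguments the sketch gives a 100 ps bit period and a 400 ps clock period, i.e. the 2.5 GHz clock stated in the abstract for a 10 Gb/s data rate.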
Publications
Conference Contributions
1. M. Assaad and D. R. S. Cumming, "CMOS IC Design and Verilog-A Modeling
of 10-Gb/s PLL-Based Deserializer for Inter-Chip Communication in SOC,"
International Symposium on System-on-Chip 2007, Nov. 2007.
Acknowledgments
I am grateful to the many people who made this work possible. First of all, I would like to
express my deep gratitude to Professor David R. S. Cumming, my PhD supervisor,
for his support throughout this work. I am especially grateful to him for the ideal
three-year fully funded studentship and for the freedom to choose my own research subject.
I am also grateful to him for his constant encouragement to complete my PhD work.
I would like to thank Dr. Mark Milgrew for his help with the CAD tools, Billy Allan for his
computer support, and Douglas Iron, Karen Phillips, Alexander Ross and Stuart Fairbairn.
I would like to deeply thank my ex-wife Lucie St-Laurent for her endless listening and
encouragement, even while ill and still suffering from cancer. I would like to
thank my son Shady for the wonderful time I spent with him in Glasgow, and for his patience
and understanding when I left him at home for long hours while I was working in the office
and his mother Lucie was in Montreal, continuing her fight against cancer through painful
radiotherapy and chemotherapy. I would like to deeply thank my mother Fatima Harfoush
for her continuous moral support and encouragement, both in my private life and in
completing my PhD work.
Finally, I would like to thank my little princess and future wife Dima Elkhadem for her
I am frankly considering myself so lucky having all above great people around me during
Contents
1 Introduction
1.1 Background and Motivation
1.2 Research Objectives and Summary of Contributions
1.3 Organisation of the Thesis
1.3.1 Chapter 2
1.3.2 Chapter 3
1.3.3 Chapter 4
1.3.4 Chapter 5
1.3.5 Chapter 6
1.3.6 Chapter 7
2 Introduction
2.1 Conventional Bus Limitations
2.2 Point-to-Point Links
2.3 The Key Elements of a Link
2.4 Point-to-Point Parallel versus Serial Link
2.5 Point-to-Point Serial Link Block Diagram
2.5.1 Serializer or Transmitter
2.5.2 Transport Channel
2.5.3 Deserializer or Receiver
2.6 CDR Based Serial Link Applications
2.7 CDR Principle and Architectures
2.8 Properties of NRZ Data Signal
2.9 Open Loops CDR Architectures
2.10 Phase-Locking CDR Architectures
2.11 Full-Rate and Half-Rate CDR Architectures
2.12 Periodic Data Signal Phase Detector
2.13 Random Data Signal Phase Detectors
2.13.1 Full-Rate Linear Phase Detector for Random Data
2.13.2 Full-Rate Binary Phase Detector for Random Data
2.13.3 Half-Rate Binary Phase Detector for Random Data
2.14 Frequency Detectors
2.15 CDR Architectures
2.15.1 Full-Rate Referenceless CDR Architecture
2.15.2 Dual-Loop CDR Architecture with External Reference
2.16 Summary of Prior Art
3 Introduction
3.1 Simplified PLL Block Diagram
3.2 PLL time-domain operation in the locked state
3.3 Frequency-domain PLL stability analysis
3.3.1 PLL with a simple RC filter and without a charge pump
3.3.2 Bode stability analysis of the PLL
3.3.3 Charge pump PLL (CP-PLL) with a simple RC filter
3.3.4 Bode stability analysis of the charge pump PLL
3.4 Phase Noise and Jitter in PLL-Based CDR Circuits
3.4.1 Oscillator Phase Noise
3.4.2 Oscillator Jitter
3.4.3 Relationship Between Oscillator Phase Noise and Jitter
3.5 Jitter in CP-PLL Based CDR Circuits
3.5.1 Jitter Transfer
3.5.2 Jitter Generation
3.5.3 Jitter Tolerance
3.5.4 R, C, and Ip Value Optimization Algorithm and Performance Comparison of the PLL and the CP-PLL
3.6 Summary
4 Inter Chip Communication and Verilog-A System Modelling
4.1 Dedicated Point-to-Point Serial Link
4.2 Serializer/Deserializer (SerDes) System
4.2.1 Serializer Principle and time domain simulations
4.2.2 Deserializer Principle and Time Domain Simulations
4.2.3 Complete Serial Link (SerDes) Time Domain Simulations
5 Building Blocks Circuit Design
5.1 Static and Dynamic Logic Gates Design
5.1.1 CML Circuit Design Advantages and Comparison
5.2 Oscillator Fundamentals
5.2.1 Negative Feedback Based Oscillator
5.2.2 Negative Resistance Based Oscillator
5.2.3 Ring Type Oscillator
5.3 Voltage-Controlled Oscillators
5.3.1 Tuning in Ring Oscillators
5.3.2 Delay Variation by Positive Feedback
5.4 A Novel Quarter-Rate Early-Late Phase-Detector
5.5 A Novel Quarter-Rate Frequency Detector
5.6 Charge Pump Principle
5.7 Charge-Pump and Loop Filter Circuit Design
6 PLL-Based CDR Circuit Implementation
6.1 Voltage Controlled Oscillator
6.2 Novel Quarter-Rate Three-State Early-Late Phase-Detector
6.3 Novel Quarter-Rate Digital Quadricorrelator Frequency Detector
6.4 Transistor Level Simulation of the Proposed PLL-Based Quarter-Rate Clock and Data Recovery Circuit
7 Conclusion and Future Work
7.1 Conclusions
7.2 Future Work
References
List of Figures
Figure 1-1: Example of communication in system on chip, (a) traditional bus-based communication and, (b) dedicated point-to-point links.
Figure 1-2: Area and power for serial and parallel links versus technology node [81].
Figure 2-1: SOC based upon a shared bus.
Figure 2-2: Problems associated with multi-bit shared bus in SOC.
Figure 2-3: A basic link with its three components: transmitter, channel, and receiver.
Figure 2-4: Source-synchronous parallel link, the clock is sent along for timing recovery.
Figure 2-5: Simplified top level block diagram of a serial link.
Figure 2-6: Detector with peak value sampling.
Figure 2-7: Spectrum of an NRZ data signal.
Figure 2-8: Open loop CDR architecture using edge detection technique.
Figure 2-9: Generic phase-locking CDR circuit.
Figure 2-10: (a) Full-rate and (b) half-rate data recovery.
Figure 2-11: XOR gate operating with periodic data signal.
Figure 2-12: (a) Sequential PFD detector. Its response for (b) fA > fB, (c) A leading B, and (d) for random data signal.
Figure 2-13: (a) Hogge PD implementation, (b) operation and (c) its CDR circuit.
Figure 2-14: (b) Alexander PD, (c) waveforms operation and, (d) its CDR circuit.
Figure 2-15: (a) Half-rate binary PD implementation, (b) use of quadrature clocks for half-rate phase detection, and (c) its CDR circuit.
Figure 2-16: Analog quadricorrelator FD for (a) periodic signal and, (b) random data signal.
Figure 2-17: Digital quadricorrelator FD, (a) waveform for fast, (b) for slow, (c) Implementation.
Figure 2-18: Referenceless CDR architecture incorporating PD and FD.
Figure 2-19: Dual loop CDR architecture with an external reference clock.
Table 2-2: Summary of the prior art, including the work done in this thesis.
Figure 3-1: Simplified PLL block diagram
Figure 3-2: RC filter
Figure 3-3: Frequency-domain PLL block diagram
Figure 3-4: Bode diagram of a PLL with a simple RC filter
Figure 3-5: A simple RC filter with a charge pump
Figure 3-6: Frequency domain block diagram of the charge pump PLL
Figure 3-7: Bode diagram of the CP-PLL with a simple RC filter
Figure 3-8: (a) Spectrum of a noiseless sinusoid, and (b) noisy sinusoid
Figure 3-9: Illustration of phase noise
Figure 3-10: (a) Cycle-to-cycle jitter, and (b) variable cycles
Figure 3-11: (a) Poles and zeros position of the CP-PLL, (b) corresponding jitter transfer function
Figure 3-12: Accumulation of cycle-to-cycle jitter in a phase-locked oscillator: (a) actual behaviour and (b) resultant waveform.
Figure 3-13: Effect of (a) slow and (b) fast jitter on data retiming
Figure 3-14: Example of jitter tolerance mask
Figure 3-15: Jitter tolerance for CP-PLL
Figure 3-16: Jitter tolerance for different values of (a) and (b) n.
Table 3-1: PLL and CP-PLL loop parameters for the optimized value of R, C and Ip.
Figure 3-17: Optimization algorithm for selecting the value of R, C, and Ip.
Figure 4-1: SerDes system as used in chip-to-chip serial data communication.
Figure 4-2: Simplified SerDes block diagram.
Figure 4-3: A multiplexer (a) and, its timing diagram (b).
Figure 4-4: A tree architecture of the 8-to-1 serializer.
Figure 4-5: Serializer test bench circuit.
Figure 4-6: Serializer time domain results, data bit input width is 800 ps (a) and, (b) output bit width is 100 ps.
Figure 4-7: Block diagram of the 4-to-8 demultiplexer (a), five-latch architecture of the 1-to-2 demultiplexer (b), and timing diagram of the demultiplexer (c).
Figure 4-8: Deserializer test bench circuit.
Figure 4-9: Low pass filter output showing the deserializer PLL locking process (a) and, (b) DFT of the quarter-rate recovered clock output signal.
Figure 4-10: SerDes circuit test bench.
Figure 4-11: Low-pass filter output voltage showing the serial link locking process (a and b), and the DFT of the recovered clock in the deserializer (c).
Figure 4-12: Serial link data input and output (a) and, serializer data and clock output (b).
Figure 5-1: Basic CML gate.
Table 5-1: MCML and CMOS logic parameters comparison.
Figure 5-2: Negative feedback system.
Figure 5-3: Oscillator and generation of periodic signal
Figure 5-4: (a) Decaying impulse response of a tank, (b) addition of negative resistance to cancel loss in Rp.
Figure 5-5: (a) Source follower with positive feedback to create negative impedance, (b) equivalent circuit of (a).
Figure 5-6: (a) Single and, (b) differential ended negative resistance based oscillator.
Figure 5-7: (a) Oscillator and, (b) its equivalent circuit.
Figure 5-8: Differential eight gain stages ring oscillator (a) and (b) its half circuit equivalent.
Figure 5-9: Waveforms of an eight-stage ring oscillator.
Figure 5-10: Differential current steering ring oscillator and its waveforms.
Figure 5-11: Definition of a VCO (b) ideal and, (c) real.
Figure 5-12: (a) Tuning with voltage variable resistors, (b) differential stage with variable negative resistance load, (c) half circuit equivalent of (b).
Figure 5-13: Differential pair used to steer current between M1-M2 and M3-M4.
Table 5-2: Truth table representing all states of the Alexander ELPD.
Figure 5-14: (a) Three points sampling of data by clock, and (b) an Alexander ELPD.
Figure 5-15: (a) Block diagram of the proposed quarter-rate ELPD, and (b) its operation.
Figure 5-16: Timing diagram for (a) slow and fast data, (b) state representation and, (c) finite state diagram.
Table 5-3: Truth table of the proposed quarter-rate DQFD.
Figure 5-17: Schematic of the proposed quarter-rate DQFD.
Figure 5-18: Charge pump and its output signal in conjunction with a periodic signal based phase and frequency detector.
Figure 5-19: Schematic of the charge-pump and loop filter.
Figure 6-1: The eight-stage voltage-controlled ring oscillator.
Figure 6-2: Post-layout simulation, (a) the clock signals generated by the VCO and, (b) the VCO's conversion gain.
Figure 6-3: Effects of process variations on the centre frequency and amplitude of the VCO.
Figure 6-4: Layout of the proposed VCO.
Figure 6-5: The proposed quarter-rate early-late type phase detector (D0, D90, D180 and D270 are the demultiplexed recovered data).
Figure 6-6: Phase detector output for two input signals that are 10 ps out of phase.
Figure 6-7: Layout of the proposed phase detector.
Figure 6-8: Architecture of the proposed frequency detector.
Figure 6-9: Frequency down pulses generated when the frequency of the VCO is higher than the frequency of the incoming data.
Figure 6-10: Operating range of the proposed frequency detector.
Figure 6-11: Layout of the proposed frequency detector.
Figure 6-12: Frequency tuning range of the schematic view of the VCO for (a) Vbias = 0.75 V and (b) Vbias = 0.6 V.
Figure 6-13: Block diagram of the proposed quarter-rate PLL-Based CDR circuit.
Table 6-3: CDR characteristics table.
Figure 6-14: Frequency detector outputs (a) and output of the low pass filter showing the PLL locking process (b).
Figure 6-15: Layout of the complete PLL-Based CDR circuit and its constituting circuits.
Chapter 1 Introduction
1 Introduction
Figure 1-1: Example of communication in system on chip, (a) traditional bus-based
communication and, (b) dedicated point-to-point links.
Serial links have been widely used in long-haul fibre-optic and cable-based
communication media (e.g. WAN, MAN and LAN) and in some computer networks,
where the cable cost and synchronization difficulties make parallel communication
impractical. Serial links have recently found a greater number of applications in consumer
electronics, such as USB (Universal Serial Bus), which connects peripheral electronic systems
to a computer; SATA (Serial Advanced Technology Attachment), which connects
the computer motherboard to mass storage devices (e.g. hard disks); and PCI Express
(Peripheral Component Interconnect Express), which normally connects cards (sound, video
or other) to the motherboard. Serial communication has therefore become the solution for
faster and more efficient data transmission, meeting the growing demand for
higher-capacity communication technology. A relatively recent analytical study by
R. Dobkin [81] compared serial and parallel links, in terms of power and area,
implemented in CMOS technologies of various feature sizes. The result of that
study is illustrated in Figure 1-2 and leads to the following important observations:
1. For any particular feature size of the CMOS technology, there is a limiting value of
the link length above which it is better to implement the link as serial rather than
parallel, because the serial link is more advantageous in terms of power and area.
2. The limiting value discussed in 1, which defines the boundary between the two types
of link implementation, scales down as the CMOS technology feature size
scales down.
Figure 1-2: Area and power for serial and parallel links versus technology node [81].
Therefore, for a particular CMOS technology feature size and link length, a serial link may
have the following advantages over a parallel one:
1. A serial link generally occupies less area; hence the communication cost
is reduced owing to the decreased number of pins and occupied area. The saved area can
be used to isolate the link better from its surrounding components and to integrate
more units.
2. The presence of multiple conductors in parallel and in close proximity, as in bus and
point-to-point parallel links, causes crosstalk, especially at higher frequencies.
In a serial link the undesired crosstalk is minimized.
3. The skew between the clock and data signals that normally occurs in bus and
point-to-point parallel links is irrelevant in a serial link, because the data is
transferred without a clock signal.
4. A serial link can provide reliable intra/inter-chip data communication at
multi-Gb/s rates.
1.2 Research Objectives and Summary of Contributions
The processing speed of chips on a PCB (printed circuit board), or of modules within an SOC,
is normally higher than the speed at which those units communicate. In this thesis
we attempt to make the communication speed (e.g. 10 Gb/s) several times higher
than the processing speed of the units themselves (e.g. 1.25 Gb/s) by using a SerDes-based
serial link. The contributions of this thesis can be summarized as follows.
1. A referenceless quarter-rate PLL-based clock and data recovery circuit is proposed,
in which the deserializer does not need a reference clock, is clocked at a
quarter (2.5 GHz) of the incoming data rate (10 Gb/s), and automatically
demultiplexes the input data stream 1-to-4 for further processing.
2. In order to verify the proposed concept, a 10 Gb/s serial link based
chip-to-chip communication medium incorporating the proposed concept has been
implemented in the Verilog-A language and simulated in Cadence.
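The automatic 1-to-4 demultiplexing named in the contributions can be illustrated behaviourally. The Python sketch below is an idealised illustration, not the thesis's Verilog-A model or circuit: it assumes perfect lock and no jitter, the function names are hypothetical, and the stream labels D0, D90, D180 and D270 are borrowed from the caption of Figure 6-5.

```python
STREAM_NAMES = ("D0", "D90", "D180", "D270")


def demux_1_to_4(serial_bits):
    """Split a serial bit sequence into four quarter-rate streams.

    Stream k holds every fourth bit starting at offset k, mimicking four
    samplers clocked by the 0, 90, 180 and 270 degree phases of a clock
    running at one quarter of the bit rate.
    """
    return {name: list(serial_bits[k::4]) for k, name in enumerate(STREAM_NAMES)}


def mux_4_to_1(streams):
    """Interleave the four streams again; used only to check the demultiplexer.

    zip() stops at the shortest stream, so the input length is assumed to be
    a multiple of four.
    """
    serial = []
    for group in zip(*(streams[name] for name in STREAM_NAMES)):
        serial.extend(group)
    return serial


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    streams = demux_1_to_4(bits)
    print(streams)
    assert mux_4_to_1(streams) == bits
```

Each output stream changes once per 400 ps clock period while the input changes every 100 ps, which is why the recovered data can be processed at a quarter of the line rate.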
1.3 Organisation of the Thesis
1.3.1 Chapter 2
In this chapter we first present the limitations and problems associated with the use of the
traditional multi-bit parallel bus and the point-to-point parallel link as communication
media; we then review the literature relevant to the design of
different clock and data recovery circuit architectures.
1.3.2 Chapter 3
PLL theory will be presented in this chapter and analytical expressions will be
developed. The resulting equations will relate PLL parameters such as stability and
bandwidth to the low-pass filter component values.
1.3.3 Chapter 4
This chapter will focus on the current-mode logic transistor-level design and optimization
at 10 Gb/s of the different parts of the proposed concept. Those parts are the
voltage-controlled oscillator, the proposed quarter-rate phase detector and the proposed
quarter-rate frequency detector.
1.3.4 Chapter 5
Once all the circuits are designed and optimized at transistor level, their parameters (i.e.
delay, rise and fall times) will be extracted and implemented in their corresponding
Verilog-A descriptions. This chapter will be dedicated to implementing a complete 10 Gb/s
serial link in the Verilog-A language using the proposed concept.
1.3.5 Chapter 6
This chapter will concentrate on the layout implementation, post-layout transistor-level
simulations and characterization of the proposed quarter-rate clock and data
recovery circuit as well as its constituent blocks.
1.3.6 Chapter 7
This chapter draws conclusions and offers some suggestions for future work.
Chapter 2 Literature Review
2 Introduction
This chapter contains a review of the literature describing the problems associated with the
use of traditional multi-line parallel busses as a communication medium in today's
system-on-chip (SOC) designs. One solution that has been proposed is the point-to-point
source-synchronous parallel link, which is briefly described here. An alternative approach,
proposed in this thesis, is the clockless serial link. It has the potential to be a high-speed,
low-cost, and skew-insensitive solution to the problems of communication in SOCs based
upon a shared bus.
Figure 2-1: SOC based upon a shared bus.
¹ IP is a creation of the mind with a commercial value; the holder of the IP has the exclusive right to it.
dissipation per binary transition grows and the overall system speed is reduced due to the
increased number of attached units leading to higher capacitive load. As shown in
Figure 2-2, the multi-bit bus also has other problems such as skew², crosstalk³ and large
area [2]. Since the data signal carried by the bus must be synchronized with the global clock
signal, skew has become a primary limit on increasing the operational frequency. Moreover,
the crosstalk between adjacent bus lines causes data signal delay and noise and hence makes
the on-chip communication unreliable. The cost of using a bus is also a serious issue since
busses occupy a large area of silicon. Therefore the use of multi-bit buses for on-chip
communication, with a global clock, will limit further improvement of future SOC.
Figure 2-2: Problems associated with multi-bit shared bus in SOC.
2 Skew is defined as the difference in arrival time of bits transmitted at the same time.
3 Crosstalk refers to the undesired effect created by the transmission of a signal on one channel in another channel.
The physical and electrical constraints of busses make them viable only for small-scale
systems that incorporate few IPs, such as memory or peripheral busses. For larger-scale
systems such as multi-processors or communication switches, an alternative and attractive
solution is to replace the bus by a point-to-point link as a medium of communication. This
approach has advantages from both circuit and architectural points of view. From a circuit
design perspective, a point-to-point link has a higher communication bandwidth than a bus,
due to its reduced signal integrity problems. Moreover, a point-to-point transmission line
offers greater flexibility in the physical construction of the system. From an architectural
perspective, the bandwidth demands of high-speed systems make the shared bus medium
the main performance bottleneck. For this reason, the hierarchical bus has been gradually
replacing single busses as a medium of communication in high performance multi-IP SOC
[3], while the architecture of most high performance communication switches is based on
point-to-point interconnection [4, 5].
There are three key components in a link: the transmitter, the channel and the receiver. The
transmitter converts the digital data stream into an analog signal; the channel is the
transmission medium in which the signal is travelling; and the receiver converts the analog
received signal back to a digital data sequence. Figure 2-3 illustrates the block diagram of a
typical link and its primary components.
The transmitter comprises an encoder and a modulator, while the receiver contains a
demodulator and a decoder. Generally, the bit sequence is first encoded by inserting some
redundant bits to guarantee signal transitions and ease the timing recovery operation. In
this work, however, the data is not encoded; it is sent directly on the channel using a
simple non-return-to-zero (NRZ) format, and the signal levels (high and low) are
represented by two different electrical voltages.
Figure 2-3: A basic link with its three components: transmitter, channel, and receiver.
4 A symbol in digital communication is the smallest number of data bits transmitted at one time; it
could be one bit (i.e. 0 or 1), or a few bits transmitted simultaneously, resulting in a symbol rate.
2.4 Point-to-Point Parallel versus Serial Link
Point-to-point link architecture can be divided into two classes, namely serial links and
parallel links. In a serial link, the clock is embedded in the data stream and has to be
extracted in the receiver from the stream itself using a clock recovery circuit, while in a
parallel link an explicit clock signal is transmitted separately from the data signal over a
single interconnect. Figure 2-4 shows a conventional source-synchronous point-to-point
parallel link. Transmission of all data signals and the reference clock signal is triggered
synchronously by the transmitted clock. Point-to-point parallel links have been widely used
in short-distance applications such as multi-microprocessor interconnection [6-10] and
consumer products with extensive multimedia applications [11, 12]. Improving the
bandwidth of point-to-point parallel links is achieved by increasing the bit rate per pin and
integrating a large number of pins into the system. The link architecture shown in Figure
2-3 is a serial link. Parallel on-chip data streams are serialized into one data sequence. As
described earlier, the receiver uses the signal transitions to recover the embedded clock and
eventually align its local clock edges accordingly for optimal data detection.
Figure 2-4: Source-synchronous parallel link; the clock is sent along for timing recovery.
Serial links are the design of choice in any application where the cost of communication
channels is high and duplicating the links in large numbers is uneconomical. Their
applications span every sector, including short- and long-distance communication and the
networking markets [13-16]. The principal design goal of serial links is to maximize the
data rate across the link and to extend the transmission range. Although serial links require
serializer and deserializer circuits, they are more advantageous than parallel links
because they occupy less area and are inherently insensitive to delay and skew.
Exchanging high speed serial data involves three primary components as previously
described: transmitter, channel and receiver. A transmitter gathers low rate parallel data
and serializes it into high speed serial data. The signal is then transported through the
channel to the receiver. The receiver must then demodulate the signal, extract the clock and
demultiplex the data. The received information is fed out of the receiver as low speed
parallel data for further processing as illustrated in Figure 2-5.
The transmitter's role is to accept several parallel data streams with a specified rate and
then serialize and drive the data into the channel. As an example, a 10 Gb/s serializer
would require eight parallel streams of 1.25 Gb/s each. Serializing involves multiplexing
the data into an ordered bit stream using an NRZ format.
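The serialization step described above can be illustrated with a minimal behavioural sketch. It assumes that the parallel lanes are bit-interleaved in lane order (lane 0 first); the function names `serialize` and `lane_rate_gbps` and the lane ordering are illustrative assumptions, not the multiplexer circuit used in this work.

```python
def serialize(lanes):
    """Bit-interleave equal-length parallel lanes into one serial stream.

    Assumption: lane 0 is sent first within each parallel word.
    """
    if len({len(lane) for lane in lanes}) != 1:
        raise ValueError("all lanes must carry the same number of bits")
    # zip(*lanes) yields one parallel word per low-speed clock period.
    return [bit for word in zip(*lanes) for bit in word]


def lane_rate_gbps(serial_rate_gbps, n_lanes):
    """Per-lane bit rate needed to sustain the given serial bit rate."""
    return serial_rate_gbps / n_lanes


# Eight lanes of three bits each; lane k carries the constant bit (k % 2).
lanes = [[k % 2] * 3 for k in range(8)]
serial = serialize(lanes)
per_lane = lane_rate_gbps(10.0, 8)
```

With eight lanes, each parallel word contributes eight consecutive serial bits, so the per-lane rate is one eighth of the serial rate, as in the 10 Gb/s example above.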
Driving the channel requires adding a 50 Ω output load amplifier, or in certain cases may
require adding a sophisticated circuit that is capable of driving an optical driver. In most
communication systems, the data is first encoded. The encoding process may include
compression, encryption, error checking and framing [17]. Another important role of the
encoder is to introduce additional transitions to the data stream to help a phase-locked loop
(PLL) in the receiver acquire the correct clock frequency of the transmitter. The 8B/10B
encoding scheme is the most popular and it guarantees at least one transition every 5 bits
[18]. A PLL in the transmitter clocks the multiplexer and the multiplexer then performs the
serialization function. Multiple clock frequencies are needed in order to properly perform
the multiplexing operation. The PLL in the transmitter, often known as the frequency
synthesizer or the clock multiplier unit, is responsible for generating these clock
frequencies. The frequency synthesizer is required to have low phase noise and jitter in
order to generate a data stream with similarly low phase noise. The PLL locks the phase of
an internal
high speed clock to an externally supplied low speed reference. For example, a 10 Gb/s
system may have a 156.25 MHz reference clock, and a 10 GHz internal clock. The PLL
must then compare and match the two frequencies after dividing the internal clock by 64.
The multiplexer is generally unable to drive the transmission medium directly, so a line
driver is needed [19, 20]. The line driver matches the internal circuit impedance to the
transmission line impedance and amplifies the signal to a suitable voltage swing. An
important figure of merit of the transmitter is the output data jitter. The internal
voltage-controlled oscillator (VCO), the multiplexer and all other circuits create and add
jitter to the signal. The VCO jitter is normally partially filtered out by the PLL.
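The reference-clock example above (a 10 GHz internal clock locked to a 156.25 MHz reference) can be checked with a short calculation. The helper name `feedback_divider` is hypothetical and only verifies the arithmetic; it is not a model of the PLL itself.

```python
from fractions import Fraction


def feedback_divider(internal_clock_hz, reference_hz):
    """Integer divide ratio N such that internal_clock_hz / N == reference_hz."""
    ratio = Fraction(internal_clock_hz, reference_hz)
    if ratio.denominator != 1:
        raise ValueError("reference does not divide the internal clock exactly")
    return ratio.numerator


# Frequencies in Hz, taken from the example in the text.
N = feedback_divider(10_000_000_000, 156_250_000)
```

The exact rational arithmetic confirms that the divide-by-64 ratio quoted in the text is consistent with the two frequencies.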
The channel carries the data signal from the transmitter to the receiver and could be
electrical, optical or a combination of both. For long-haul communications the channel is a
dominant source of phase noise and jitter. However, for short-distance communications, the
channel is considered a negligible source of noise and jitter.
The receiver must extract a clock from a noisy and jittered high frequency signal, and the
extracted clock is then used to sample the received data stream. This process is called clock
and data recovery (CDR) and it is difficult because the extraction process is based on the
data signal transitions, the presence of which is not guaranteed. A line amplifier with a
50 Ω input impedance amplifies the signal to a suitable level for internal circuits while
minimizing the distortion. Noise injection from this amplifier must be minimized because
the received data signal is already corrupted by jitter coming from the transport channel.
If the data is of the NRZ type, then the phase detector (PD) of the receiver's PLL must also
be able to handle random data that has random transition locations. Moreover, the key
parameters of the PLL must be tuned to a signal with high noise content, in contrast to the
PLL in the transmitter, which has a low-noise reference at its input. Additional circuits
are needed to sample the data using the recovered clock unless the PD does so
automatically. In some cases, a low-frequency reference clock may be used to bring the
frequency of the receiver's VCO close to the data rate before clock extraction occurs.
The architecture with a reference clock enhances the operation range of the receiver's PLL.
Its drawback is that two separate PDs are needed and a circuit that can switch between
them is necessary. This introduces two loops sharing common components which must be
able to operate independently. A common component in a dual loop PLL is a lock detector
circuit that determines if phase lock is lost in the data loop. If lock is lost the loop switches
back to the external reference loop.
The dual loop architecture is useful in a high noise environment where the data jitter can
cause the PLL to become unstable. Once the clock is extracted from the serial signal, the
data can then be demultiplexed through a series of demultiplexers at decreasing clock rates.
For example, in a 10 Gb/s system the first re-sampled data would pass through a 1-to-2
demultiplexer driven by a 5 GHz clock. The second stage would consist of two 1-to-2
demultiplexers driven by a 2.5 GHz clock, and so on. If a multiphase clock is used, then
multiple samples can be taken with separate samplers. This allows the use of a clock at a
fraction of the data bit rate, hence reducing the power consumption associated with clock
switching.
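The cascaded demultiplexing described above can be sketched behaviourally as follows. The sketch assumes ideal 1-to-2 stages that split a stream into its even- and odd-indexed bits; the function names are illustrative and the resulting output order is a property of this simple model rather than of any particular circuit.

```python
def demux_1_to_2(stream):
    """Split a serial stream into its even- and odd-indexed bits."""
    return stream[0::2], stream[1::2]


def demux_tree(stream, stages):
    """Apply `stages` levels of 1-to-2 demultiplexing to a stream."""
    outputs = [stream]
    for _ in range(stages):
        outputs = [half for s in outputs for half in demux_1_to_2(s)]
    return outputs


def stage_clocks_ghz(bit_rate_gbps, stages):
    """Clock frequency driving each stage; it halves at every level."""
    return [bit_rate_gbps / 2 ** (k + 1) for k in range(stages)]


# Bit indices 0..7 stand in for consecutive received bits.
outputs = demux_tree(list(range(8)), 2)
clocks = stage_clocks_ghz(10.0, 2)
```

Two stages produce four parallel streams, each carrying every fourth bit, and the stage clocks follow the 5 GHz and 2.5 GHz sequence given in the text.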
Much of this work focuses on the design of circuits and the development of architectures
that will eventually lead to the implementation of 10 Gb/s intra-chip and inter-chip
high-speed interconnections in system-on-chip (SOC). The architectures and circuits
presented here have wider applicability to any high-speed communication system; such
applications include the following [21]:
LANs (local area networks), for broadband data communication links between
computers over optical fibers such as Fiber-Distributed Data Interface (FDDI).
2.7 CDR Principle and Architectures
A figure of merit in the data signal detection process in the presence of noise is the
signal-to-noise ratio (SNR); the SNR depends on the location of the sampling instant.
If the sampling instant is synchronized such that the peak value of the bit pulse is
sensed, then the value of the SNR is maximal, as illustrated in Figure 2-6.
Figure 2-6: Detector with peak value sampling.
When the incoming data has spectral energy at the clock frequency, a synchronous clock
can be obtained by passing the data stream through a band-pass filter, often realized as an
LC tank or surface acoustic wave (SAW) device, tuned to the nominal clock frequency. In
most signaling formats such as NRZ, the data signal has no spectral energy at the clock
frequency, making it necessary to use the clock recovery process. The power spectral
density of an NRZ data signal is given by the following relationship.
$$ P(\omega) = T_b \left[ \frac{\sin(\omega T_b / 2)}{\omega T_b / 2} \right]^2 \qquad (2.1) $$
Due to the lack of a spectral component at the bit rate in the NRZ format, a clock recovery
circuit may lock to spurious signals or simply not lock at all. Thus, NRZ data usually
undergoes a non-linear operation at the front end of the circuit so as to create a frequency
component at the bit rate. A common approach is to detect each transition and generate a
corresponding pulse, a technique known as edge detection.
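The effect of edge detection on the spectrum can be illustrated numerically. The sketch below assumes an ideal rectangular NRZ waveform with 16 samples per bit and an edge detector built as an XOR of the data with a copy delayed by half a bit period; these choices, the random seed and the function names are illustrative assumptions, not the circuit of Figure 2-8.

```python
import cmath
import random


def nrz_waveform(bits, samples_per_bit):
    """Oversampled rectangular NRZ waveform with levels 0 and 1."""
    return [b for b in bits for _ in range(samples_per_bit)]


def edge_detect(wave, delay):
    """XOR the waveform with a copy of itself delayed by `delay` samples."""
    delayed = [wave[0]] * delay + wave[:-delay]
    return [a ^ b for a, b in zip(wave, delayed)]


def tone_magnitude(wave, samples_per_period):
    """Normalized DFT magnitude at the frequency with the given period."""
    w = -2j * cmath.pi / samples_per_period
    total = sum(x * cmath.exp(w * n) for n, x in enumerate(wave))
    return abs(total) / len(wave)


random.seed(1)
SPB = 16                                    # samples per bit (assumed)
bits = [random.randint(0, 1) for _ in range(256)]
nrz = nrz_waveform(bits, SPB)
pulses = edge_detect(nrz, SPB // 2)         # one half-bit pulse per transition

nrz_line = tone_magnitude(nrz, SPB)         # component at the bit rate
edge_line = tone_magnitude(pulses, SPB)     # same frequency, after edge detection
```

In this model the NRZ waveform has essentially no component at the bit rate, whereas the pulse train produced by the edge detector has a clear one, because every pulse starts at a bit boundary and therefore adds with the same phase.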
Figure 2-8: Open loop CDR architecture using edge detection technique.
2.10 Phase-Locking CDR Architectures
In this approach, the clock and data recovery is done by synchronizing the random data to
a clock signal generated by a voltage-controlled oscillator (VCO) in a phase-locked loop.
During each data transition, the location of that transition with respect to the clock edge is
detected. If the data leads the clock, the clock speed is increased. If the data lags the clock,
the clock is slowed down. If the zero crossings of the data and the clock coincide, the clock
frequency is kept constant to ensure phase lock. Figure 2-9 shows a generic CDR circuit.
The VCO generates a clock signal. The phase and frequency of this signal are compared to
those of the incoming data in the phase detector, generating an error signal that is passed
through the charge pump and the low-pass filter to set the voltage required by the VCO to
oscillate at the frequency of interest. Phase-locking of the clock to the data means that their
phases differ by a small but constant offset. The generated clock signal is also used
to retime the data in the decision circuit. As the incoming data is regenerated in this block,
its additive noise is suppressed while the amplitude is significantly magnified.
Figure 2-9: Generic phase-locking CDR circuit.
2.11 Full-Rate and Half-Rate CDR Architectures
Phase-locking CDR architectures can be divided into two major groups: full-rate and
half-rate. In a full-rate circuit the location of the data transition is compared to the falling
or rising edge of the clock, which has a frequency equal to the data rate, as illustrated in
Figure 2-10(a). Therefore, data retiming can be performed using flip-flops that operate on
either the rising or the falling edge of the clock signal. In a half-rate circuit, the location
of the data transition is compared to that of both the rising and falling edges of the clock,
as shown in Figure 2-10(b). For this architecture the clock frequency is equal to one half of
the data rate, and the retiming of the data signal is performed using flip-flops triggered on
both the falling and rising edges of the clock signals. The main advantage of using a
half-rate CDR circuit is the reduction of the clocking frequency by a factor of two, which
reduces the dynamic power consumption associated with the switching activity of the clock.
The DC power dissipation is also reduced because the biasing current is less for circuits
working at lower operating frequencies.
Figure 2-10: (a) Full-rate and (b) half-rate data recovery.
Figure 2-11: XOR gate operating with periodic data signal.
Although this simple approach proves to be useful for applications where the two inputs
have identical frequencies and different phases, it falls short in providing frequency error
information as the two input frequencies start to grow apart from each other. The reason is
that if the two frequencies are not equal, the detector generates a beat frequency with an
average value of zero (Figure 2-11(c)). The beat signal can still provide useful
information about the phase and frequency difference if the two frequencies are slightly
different. To improve the capture range of the phase detector, phase locked loop circuits
use additional means of frequency acquisition.
A circuit that can detect both phase and frequency difference is extremely useful because it
significantly increases the acquisition range and lock speed of PLLs. The sequential phase
and frequency detector (PFD) provides a large range for periodic waveforms [22].
Figure 2-12 shows the implementation of this circuit and the corresponding waveforms
when the two inputs have different frequencies and phases. As shown in Figure 2-12(b), if
the frequency of input A is greater than that of input B, then the PFD produces positive
pulses at QA, while QB remains zero. Conversely, if fA < fB, positive pulses appear at QB
while QA = 0. If fA = fB, then the circuit generates pulses at either QA or QB with a width
equal to the phase difference between the two inputs, as illustrated in Figure 2-12(c). Thus
the average value of the difference (QA - QB) is an indication of the frequency or the phase
difference
between A and B. The sequential PFD is a major block used for phase detection in
frequency synthesizers and clock generators. Its compact and power-efficient structure
makes it attractive for low power applications. However, this circuit cannot be used to
provide phase error information for random data because in contrast to periodic data a zero
crossing at the end of each bit is not guaranteed. Consecutive ones and zeros are very
likely to appear in a random sequence hence producing erroneous pulses at QA and QB.
If, for instance, the PLL is in the locked state, the clock frequency and the data rate will
be the same and the clock edges will be in the middle of the data bits; hence no error pulses
will be required to adjust the phase and frequency of the VCO clock signal. However, the
sequential PFD produces pulses at QA and QB driving the VCO clock signal away from its
locked state. Therefore this type of PFD is not suitable for random data sequences.
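The behaviour of the sequential PFD discussed above can be reproduced with a small event-driven model. It assumes the usual three-state behaviour (a rising edge on A sets QA, a rising edge on B sets QB, and both outputs are reset as soon as both are high), integer edge times in picoseconds, and an arbitrary tie-break for coincident edges; all names and numbers are illustrative.

```python
def pfd_average(edges_a, edges_b, t_end):
    """Time average of (QA - QB) for a three-state sequential PFD.

    edges_a, edges_b: rising-edge times of inputs A and B (integers, e.g. ps).
    Coincident edges are processed A first; this is a modelling choice.
    """
    events = sorted([(t, 0) for t in edges_a] + [(t, 1) for t in edges_b])
    qa = qb = 0
    last_t = 0
    area = 0
    for t, src in events:
        if t >= t_end:
            break
        area += (qa - qb) * (t - last_t)
        last_t = t
        if src == 0:
            qa = 1          # rising edge on A sets QA
        else:
            qb = 1          # rising edge on B sets QB
        if qa and qb:       # both high: the reset path clears both flip-flops
            qa = qb = 0
    area += (qa - qb) * (t_end - last_t)
    return area / t_end


T_END = 20_000                               # 20 ns expressed in ps
fast = range(0, T_END, 400)                  # 2.5 GHz edges
slow = range(0, T_END, 500)                  # 2.0 GHz edges
late = range(100, T_END, 400)                # 2.5 GHz, delayed by 100 ps

avg_a_faster = pfd_average(fast, slow, T_END)
avg_a_slower = pfd_average(slow, fast, T_END)
avg_a_leads = pfd_average(fast, late, T_END)

# Random data as input A: same bit period as the clock, but some edges missing.
data_edges = [0, 400, 1600, 2000, 3600]
clock_edges = range(0, 4000, 400)
avg_random = pfd_average(data_edges, clock_edges, 4000)
```

In this model the average output is positive when A is faster, negative when A is slower, and equal to the fractional phase lead when the frequencies match. With missing data transitions the output is pushed strongly to one side even though the remaining edges are aligned, which illustrates why this detector is unsuitable for random data.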
Figure 2-12: (a) Sequential PFD detector. Its response for (b) fA > fB,
(c) A leading B, and (d) random data signal.
Binary data is commonly transmitted in the NRZ format. In this format each bit has
duration Tb (bit period), is equally likely to be zero or one, and is statistically independent
of other bits. An NRZ data signal has two properties that make the clock recovery task
difficult. First, data may exhibit long sequences of consecutive ones or zeros, requiring
the clock recovery circuit to remember the bit rate during such an interval. This means
that, in the absence of data transitions, the clock recovery circuit should not only continue
to produce a clock, but also cause only a negligible drift in the clock frequency. Second, the
spectrum of NRZ data has nulls at frequencies that are integer multiples of the bit rate. Due
to the absence of a spectral component at the bit rate in the NRZ format, a CDR circuit
may lock to spurious signals or simply may not lock at all. Phase detectors operating with
random data sequences are generally categorized in two groups, linear and binary. In a
linear phase detector, the phase error signal is linearly proportional to the phase difference,
falling to zero in the locked condition. In a binary phase detector, an early or late (binary)
signal is generated in response to a phase difference between the clock and data.
In a linear PD, such as the one proposed by Hogge [23], the phase error information is
generated at each data transition and produced by taking the difference of two pulses. One
of them is width-modulated, with a width linearly proportional to the phase difference
between the clock and data, whereas the other pulse has a fixed width. A gate-level
implementation of Hogge's phase detector is shown in Figure 2-13. The NRZ input data
signal is sent through two D-type flip-flops. The first flip-flop samples the data signal on
the rising edge of the clock, whereas the second flip-flop samples the output of the first
one on the falling edge of the clock. If the three signals, Din, A, and Dout are applied to two
XOR gates, the resulting output signals will have the properties of a linear phase detector.
The Error output signals will appear at each data transition with a width proportional to the
phase difference between the clock and the data. The reference output will always have
pulses as wide as half the clock period. An important feature of the Hogge PD is the
automatic retiming of the data sequence.
In the lock condition, the clock signal zero crossings will appear in the middle of the bits,
meaning that the bits are sampled at their optimum points.
Figure 2-13: (a) Hogge PD implementation, (b) operation and (c) its CDR circuit.
If A = B ≠ C, clock is early.
If A ≠ B = C, clock is late.
Using the above observations, the three samples can be used to produce a phase error in a
CDR circuit. The early signal can be formed as B ⊕ C and the late signal is generated as
A ⊕ B. The desired phase error can be obtained by subtracting the early signal from the
late signal. Figure 2-14(d) shows a CDR circuit employing an Alexander phase detector.
The XOR gate outputs drive voltage-to-current converters so that the two signals can be
summed in the current domain, and the result is applied to the loop filter. The high gain of
the Alexander PD yields a small phase offset in the locked condition. CDR circuits using
a similar PD are described in [25-27].
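The early/late decision of the Alexander PD can be written as a small truth-table sketch. It assumes that the operators lost from the two conditions above are "not equal" and XOR, i.e. the clock is early when A = B ≠ C and late when A ≠ B = C, with early = B XOR C and late = A XOR B; the function name and the +1/-1 output convention are illustrative.

```python
def alexander_pd(a, b, c):
    """Return late - early for three consecutive binary samples A, B, C.

    Assumed convention: early = B xor C, late = A xor B, so the result is
    -1 when the clock is early, +1 when it is late, and 0 otherwise.
    """
    early = b ^ c
    late = a ^ b
    return late - early


# All eight sample combinations, keyed by (A, B, C).
decisions = {
    (a, b, c): alexander_pd(a, b, c)
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
}
```

When all three samples are equal there is no data transition and the output is zero, so the detector holds its state during long runs of identical bits.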
Figure 2-14: (b) Alexander PD, (c) waveform operation and (d) its CDR circuit.
2.13.3 Half-Rate Binary Phase Detector for Random Data
Figure 2-15: (a) Half-rate binary PD implementation, (b) use of
quadrature clocks for half-rate phase detection, and (c) its CDR circuit.
Data communication standards require operation at a precise data rate. Therefore the
frequency of the VCO should be equal to the data rate. However, the VCOs in the CDR
circuits are generally designed with a large tuning range to accommodate process
and temperature variations. On the other hand, phase-locking CDR circuits have a
narrow capture range. This range is primarily determined by two factors: the PLL's
bandwidth and the phase detector topology. The loop bandwidth depends on the
communication standard and does not normally exceed a few MHz. The capture range of a
linear PD is a fraction of one percent of the incoming data rate, and it is typically a few
percent for a binary PD. Therefore the CDR's capture range is much smaller than the
VCO's tuning range. For this reason, it is unlikely that a CDR circuit will acquire lock to
the data when the circuit turns on and the VCO starts oscillating at a frequency that is very
different from the data rate. This limitation calls for an aided acquisition mechanism.
Various frequency detection techniques have been used that operate with or without a
reference signal. The idea is that as the circuit is turned on, the frequency detector (FD)
pushes the VCO frequency close to the data rate. When the frequency difference between
the VCO and the data rate is small enough to fall into the capture range of PD, the FD is
then disabled and the PD takes over. A frequency detector must generate an output whose
average represents the polarity and magnitude of the frequency difference at its
inputs. Consider the block diagram of the circuit shown in Figure 2-16 and assume,
for instance, that all input signals are periodic, for example:
$$ \begin{aligned}
x_1(t) &= A_1 \cos \omega_1 t, \\
x_{2I}(t) &= A_2 \cos \omega_2 t, \\
x_{2Q}(t) &= A_3 \sin \omega_2 t,
\end{aligned} \qquad (2.2) $$
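$$ \begin{aligned}
x_1(t)\, x_{2I}(t) &= \left(\frac{A_1 A_2}{2}\right) \left[\cos(\omega_1 + \omega_2)t + \cos(\omega_1 - \omega_2)t\right] \\
x_1(t)\, x_{2Q}(t) &= \left(\frac{A_1 A_2}{2}\right) \left[\sin(\omega_1 + \omega_2)t + \sin(\omega_1 - \omega_2)t\right]
\end{aligned} \qquad (2.3) $$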
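$$ \begin{aligned}
x_A(t) &= \left(\frac{A_1 A_2}{2}\right) \cos(\omega_1 - \omega_2)t \\
x_B(t) &= \left(\frac{A_1 A_2}{2}\right) \sin(\omega_1 - \omega_2)t
\end{aligned} \qquad (2.4) $$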
$$ x_C(t) \propto \left(\frac{A_1 A_2}{2}\right)^2 (\omega_1 - \omega_2) \qquad (2.5) $$
Eq. 2.5 shows that the signal xC(t) issued from the FD is directly proportional to the
frequency difference at its inputs (Δω = ω1 - ω2) and changes sign with that difference.
The topology of Figure 2-16(a) is called a quadricorrelator [28]. This technique requires
that the signal x1(t) contains a spectral line or component; thus the circuit must be
preceded by an edge detector for operation with an NRZ random data signal (Figure 2-16(b)).
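The proportionality between the detector output and the frequency difference can be checked numerically. The sketch below is only an idealized illustration: it assumes that perfect low-pass filtering has already removed the sum-frequency terms, so that the baseband signals are a cosine and a sine at the difference frequency, and it assumes a differentiate-and-multiply output stage (one signal multiplied by the time derivative of the other). The function name, sample count and normalized time axis are hypothetical and may differ from the topology of Figure 2-16.

```python
import math


def quadricorrelator_output(delta_f, a1=1.0, a2=1.0, duration=1.0, n=20000):
    """Average of x_A(t) * d/dt x_B(t) for ideal baseband quadrature signals.

    Assumes x_A = k*cos(dw*t) and x_B = k*sin(dw*t) with k = a1*a2/2 and
    dw = 2*pi*delta_f; `duration` should span whole beat periods.
    """
    k = a1 * a2 / 2.0
    dt = duration / n
    dw = 2.0 * math.pi * delta_f
    total = 0.0
    for i in range(n):
        t = i * dt
        x_a = k * math.cos(dw * t)
        # Central-difference approximation of the derivative of x_B.
        dx_b = k * (math.sin(dw * (t + dt)) - math.sin(dw * (t - dt))) / (2.0 * dt)
        total += x_a * dx_b
    return total / n


out_pos = quadricorrelator_output(3.0)      # positive frequency difference
out_neg = quadricorrelator_output(-3.0)     # negative frequency difference
out_double = quadricorrelator_output(6.0)   # twice the frequency difference
```

Under these assumptions the averaged output equals (A1·A2/2)²·Δω/2, so it doubles when the frequency difference doubles and changes sign with it, which is the property the frequency loop relies on.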
Figure 2-16: Analog quadricorrelator FD for (a) periodic signal and (b) random data signal.
Figure 2-17: Digital quadricorrelator FD, (a) waveform for fast, (b) for slow,
(c) implementation.
After studying the design and analysis of PDs and FDs for periodic and random signals,
complete CDR architectures can now be developed. A robust architecture must perform the
following operations: phase and frequency acquisition, to ensure lock despite process and
temperature variations of the VCO frequency; and data retiming inside the phase detector,
to avoid systematic skew [28].
2.15.1 Full-Rate Referenceless CDR Architecture
Using a random-data-based FD eliminates the need for external reference frequencies. Figure
2-18 depicts a referenceless architecture containing two loops: a frequency loop employing
a digital quadricorrelator FD from Figure 2-16, and a phase loop incorporating one of the
phase detectors studied in Sections 2.13.1-2.13.3. Upon start-up or the loss of phase lock,
the FD produces a DC voltage that drives the VCO frequency toward the input data rate.
When the frequency error is small and falls within the capture range of the phase loop, the
PD then takes over, phase-locking the clock to the data.
Figure 2-18: Referenceless CDR architecture incorporating PD and FD.
Figure 2-19: Dual loop CDR architecture with an external reference clock.
State-of-the-art work on CDR circuits is summarized in Table 2-2. The indicated data rate
corresponds to the data speed at the CDR input. The clock frequency is the frequency of
the clock signal that is used to transmit the data and has to be extracted by the CDR circuit.
Table 2-2: Summary of the prior art, including the work done in this thesis.
In this chapter we presented the problems and limitations associated with the use of busses
as a medium of synchronous communication in today's complex SOC. To alleviate
these problems, an asynchronous link based on PLL CDR circuits has been proposed as
a high-performance alternative solution. Furthermore, we reviewed the current state of the
art of PLL-based CDR; from the literature it is apparent that there is considerable scope for
improvement in their designs for asynchronous link based communication in SOC. This
thesis therefore presents a detailed study of a quarter-rate PLL-based CDR circuit.
Chapter 3 Theoretical Background
3 Introduction
In this chapter, a mathematical development of the PLL will be carried out covering the
following subjects:
1. Simplified time-domain analysis of the PLL in the locked state. In other words,
studying the tracking property of the PLL, in which any change in the input
frequency will be tracked by the output through the phase error signal [Eq. 3.10].
3. Same as in (2), but for a charge pump PLL (CP-PLL) [Eq. 3.33-3.35].
4. Stability parameters comparison of the PLL and the CP-PLL [Table 5.1].
5. CDR jitter specifications and its relation to the PLL parameters [Eq. 3.49, & 3.58].
A phase-locked loop (PLL) is a circuit that synchronizes the phase and frequency of a
signal generated by a local oscillator with that of a reference signal, by means of the phase
difference between the two signals. PLLs are primarily used in communication systems.
For example, they recover clock signals from digital data signals, recover the carrier from
satellite transmission signals, perform frequency and phase modulation/demodulation, and
synthesize exact frequencies for receiver tuning [47].
As shown in Figure 3-1, the PLL circuit consists basically of three blocks:
Figure 3-1: Simplified PLL block diagram.
1. Phase detector (PD): a simple one can be realized using an analog multiplier. Since the
PD performs a multiplication, the output signal vd will have the following
form:
Where f is a function of the phase difference between the reference and the oscillator
signals and kpd is the conversion gain of the PD measured in units of volt per radian
(V/rad).
Where ω0 is the free-running angular frequency, corresponding to vf = 0, and kvco is the
VCO conversion gain, expressed in units of radians per volt per second (rad/V.sec).
In this section, the time-domain operation of the PLL will be studied. When the PLL is
operating in the synchronized state, the angular frequencies of the input reference
signal (ωref) and the VCO's output signal (ωvco) will be equal. Let the following
expressions represent, respectively, the input reference and the VCO output signal:
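$$ v_f = k_{pd} \left\{ \sin\left[ (\omega_{vco} - \omega_{ref})\,t + (\theta_0 - \varphi_0) \right] \right\} = k_{pd} \sin\theta_e(t) \qquad (3.5) $$
where $k_{pd} = \frac{x_0 y_0}{2}$ and $\theta_e(t) = (\omega_{vco} - \omega_{ref})\,t + (\theta_0 - \varphi_0)$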
At the start, there is no voltage (vf) applied to the input of the VCO, thus (ωvco = ωref). As
shown in equation 3.5, the signal issued from the filter vf carries information about the
frequency error (ωvco - ωref) and the phase error (θ0 - φ0), between the input (reference) and
the output (VCO) signals.
Since -1 ≤ sin θe ≤ 1 for all θe, then -kpd ≤ vf ≤ kpd and, based on Eq. 3.2, the angular
frequency (ωvco) of the VCO will be limited to the range [ωmin, ωmax] such that:
Since the VCO's angular frequency is limited to the range [ωmin, ωmax], then in the locked
state there exists a value in that range which is equal to the reference angular frequency
(ωref). Therefore, based on equation 3.1, the following double equality is fulfilled in
the locked state of the PLL:
$$ v_f = \frac{\omega_{ref} - \omega_0}{k_{vco}} = \frac{\omega_{vco} - \omega_0}{k_{vco}} \qquad (3.7) $$
The above equation shows that in the locked state any change that may occur in (ωref) or
(ωvco) will be tracked by the PLL, and the filter voltage (vf) will change accordingly. For
example, if a random signal decreases the VCO frequency by an amount (Δωvco), then the
filter voltage (vf) will be increased by an amount (Δvf), the VCO will be controlled by the
total voltage (vf + Δvf), and hence the VCO angular frequency will be increased to
compensate for the action of the disturbance. The result will then be:
Therefore, as soon as the VCO angular frequency is driven away from the reference one by
a random signal or a temperature variation, a phase error signal is generated and hence a
voltage will also be generated, forcing the VCO to be synchronized with the reference
angular frequency. As shown in Eq. 3.5, the signal issued from the filter is given by:
$$ v_f = k_{pd} \left\{ \sin\left[ (\omega_{vco} - \omega_{ref})\,t + (\theta_0 - \varphi_0) \right] \right\} = k_{pd} \sin\theta_e(t) \qquad (3.9) $$
For a small phase error, the last equation can be simplified to:
37
Chapter 3 Theoretical Background
v f = k pd . e (t ) then
v f = k pd . e (t )
vco
v f = and (3.10)
k vco
vco
e (t ) =
k pd .k vco
The last expression of the phase error signal shows that the VCO is forced to shift its
angular frequency to become identical to the reference one through the phase error signal
e(t).
In the previous section, an elementary time-domain analysis of the PLL in the locked state
was performed, and an approximate expression was developed relating the phase error
signal to the required change of the VCO's angular frequency in order to maintain
synchronization. Since the PLL is a feedback loop system, a stability analysis of that
system is necessary in order to guarantee its stability; otherwise the PLL may oscillate and
never reach the required steady state. In this section, a frequency domain analysis will be
carried out to determine the stability limits and conditions of the PLL circuit, as well as a
calculation of the low pass filter components (i.e. R and C) based on the resulting
conditions. In order to transform the time domain PLL block diagram of Figure 3-1
to the frequency domain, a simple case will be considered, and its results will be
generalized.
Figure 3-2: RC filter.
$$v_d(t) = RC\,\frac{dv_f}{dt} + v_f(t) \tag{3.11}$$
$$TF_{filter}(s) = \frac{V_f(s)}{V_d(s)} = \frac{1}{RCs+1} = \frac{1}{\tau s+1} \tag{3.12}$$
where $TF_{filter}(s)$ is the transfer function of the RC filter of Figure 3-2 and ($\tau = RC$) is the time
constant of the filter. Integrating Eq. 3.2 with respect to time, one can obtain
$$\int_0^t \omega_{vco}\,dt = \int_0^t \omega_0\,dt + k_{vco}\int_0^t v_f(t)\,dt$$
Then, $\omega_{vco}(t)\,t = \omega_0 t + k_{vco}\int_0^t v_f(t)\,dt$ and
$[\omega_{vco}(t)-\omega_0]\,t = k_{vco}\int_0^t v_f(t)\,dt$, hence
$$\theta_{vco}(t) = k_{vco}\int_0^t v_f(t)\,dt \tag{3.13}$$
In the Laplace domain:
$$\theta_{vco}(s) = \frac{k_{vco}}{s}\,V_f(s)$$
Rearranging the last equation, we obtain:
$$TF_{vco}(s) = \frac{\theta_{vco}(s)}{V_f(s)} = \frac{k_{vco}}{s} \tag{3.14}$$
$TF_{vco}(s)$ is the transfer function of the VCO. Taking the Laplace transform of the first
equation of 3.10, one can obtain:
$$V_f(s) = k_{pd}\,\theta_e(s), \quad\text{then}\quad TF_{pd}(s) = \frac{V_f(s)}{\theta_e(s)} = k_{pd} \tag{3.15}$$
$TF_{pd}(s)$ is the transfer function of the phase detector. We now have the s-domain (frequency
domain) transfer functions of all the PLL blocks; therefore we can redraw the PLL block
diagram in the frequency domain.
Figure 3-3: Frequency-domain PLL block diagram.
Based on the definition of a feedback system in control theory, the open loop transfer
function G(s) of the PLL is given as follows:
$$G(s) = TF_{pd}(s)\,TF_{filter}(s)\,TF_{vco}(s) = k_{pd}\,\frac{1}{\tau s+1}\,\frac{k_{vco}}{s} = \frac{k_{pd}\,k_{vco}}{s(\tau s+1)}$$
Setting $k_{pd}k_{vco} = k$, which is measured in sec$^{-1}$, the open loop transfer function G(s)
becomes:
$$G(s) = \frac{k_{pd}\,k_{vco}}{s(\tau s+1)} = \frac{k}{s(\tau s+1)} \tag{3.16}$$
The closed loop transfer function of the PLL, H(s), is then defined as follows:
$$H(s) = \frac{G(s)}{1+G(s)} = \frac{\dfrac{k}{s(\tau s+1)}}{1+\dfrac{k}{s(\tau s+1)}} = \frac{k/\tau}{s^2+s/\tau+k/\tau} \tag{3.17}$$
Since the denominator of the function H(s) is a polynomial of second order, the loop is
second order and it has the following general form:
$$H(s) = \frac{\omega_n^2}{s^2+2\zeta\omega_n s+\omega_n^2} \tag{3.18}$$
$$2\zeta\omega_n = \frac{1}{\tau}, \quad \omega_n^2 = \frac{k}{\tau} \quad\text{and}\quad \zeta = \frac{1}{2\sqrt{k\tau}} \tag{3.19}$$
where ($\zeta$) is the damping factor of the loop and is unitless, and ($\omega_n$) is the natural angular
frequency of the loop, measured in radians per second (rad/s). From Eq. 3.19, we
notice that increasing the factor $\zeta$, and hence the loop stability, requires decreasing the
design parameters (k) and ($\tau$).
For convenience, we rewrite the open loop transfer function of the PLL without a charge
pump by substituting s with ($j\omega$).
$$G(j\omega) = \frac{k}{j\omega(1+j\omega\tau)}$$
The function $G(j\omega)$ is a complex function. Its magnitude and phase are given as follows:
$$|G(j\omega)| = \frac{k}{\omega\sqrt{1+\omega^2\tau^2}} \quad\text{and}\quad \tan\varphi = \frac{1}{\omega\tau} \tag{3.20}$$
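The angular frequency for which $|G(j\omega)| = 1$ is called the cut-off frequency of the PLL and
is denoted by ($\omega_{-3dB}$).
$$|G(j\omega_{-3dB})| = \frac{k}{\omega_{-3dB}\sqrt{1+\omega_{-3dB}^2\tau^2}} = 1, \quad\text{then}\quad k^2 = \omega_{-3dB}^2\,(1+\omega_{-3dB}^2\tau^2)$$
Rearranging the last equation and using Eq. 3.19, one can obtain:
$$\omega_{-3dB}^4 + 4\zeta^2\omega_n^2\,\omega_{-3dB}^2 - \omega_n^4 = 0$$
Solving the last equation with respect to ($\omega_{-3dB}$), the cut-off frequency of the PLL can be
determined in terms of the damping factor ($\zeta$):
$$\omega_{-3dB} = \omega_n\sqrt{\sqrt{1+4\zeta^4}-2\zeta^2} \tag{3.21}$$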
Substituting Eq. 3.21 in Eq. 3.20, one can obtain the phase of the open loop transfer
function $G(j\omega)$ at the cut-off frequency $\omega_{-3dB}$, which corresponds to the phase margin
($\varphi_{margin}$).
$$\varphi = \arctan\left(\frac{1}{\tau\,\omega_{-3dB}}\right) = \arctan\left(\frac{2\zeta\omega_n}{\omega_{-3dB}}\right) = \arctan\left[\frac{2\zeta}{\sqrt{\sqrt{1+4\zeta^4}-2\zeta^2}}\right]$$
The phase margin of the function $G(j\omega)$ is defined as $\varphi_{margin} = \varphi\,|_{|G|=1} + 180^\circ$. Then,
$$\varphi_{margin} = \arctan\left[\frac{2\zeta}{\sqrt{\sqrt{1+4\zeta^4}-2\zeta^2}}\right] \tag{3.22}$$
The PLL is normally stable when the phase margin is 45° or higher. Thus, to find
the corresponding value of ($\zeta$), the following equations should be solved with respect to
($\zeta$):
$$\frac{\pi}{4} = \arctan\left[\frac{2\zeta}{\sqrt{\sqrt{1+4\zeta^4}-2\zeta^2}}\right], \qquad 1 = \frac{2\zeta}{\sqrt{\sqrt{1+4\zeta^4}-2\zeta^2}} \tag{3.23}$$
The solution of Eq. 3.23 is $\zeta = 0.42$. Figure 3-4 illustrates the Bode diagram,
which corresponds to the amplitude and phase of the open loop transfer function $G(j\omega)$.
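The value $\zeta = 0.42$ can be checked numerically. The following is a minimal sketch that evaluates Eqs. 3.21 and 3.22 and searches for the damping factor giving a 45° phase margin; the function names and the bisection bounds are illustrative choices, not part of the original design flow.

```python
import math

def crossover_ratio(zeta):
    """omega_-3dB / omega_n for the PLL with a simple RC filter (Eq. 3.21)."""
    return math.sqrt(math.sqrt(1.0 + 4.0 * zeta**4) - 2.0 * zeta**2)

def phase_margin_deg(zeta):
    """Phase margin in degrees (Eq. 3.22)."""
    return math.degrees(math.atan(2.0 * zeta / crossover_ratio(zeta)))

def zeta_for_margin(target_deg, lo=1e-3, hi=10.0, iters=60):
    """Bisection search; the phase margin grows monotonically with zeta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phase_margin_deg(mid) < target_deg:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

zeta_45 = zeta_for_margin(45.0)
print(f"zeta for a 45 degree phase margin: {zeta_45:.4f}")
```

Squaring the second line of Eq. 3.23 gives $36\zeta^4 = 1 + 4\zeta^4$, i.e. $\zeta = 32^{-1/4} \approx 0.42$, which is the value the search converges to.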
Figure 3-4: Bode diagram of a PLL with a simple RC filter.
Though this filter is simple, it does not allow independent optimization of the bandwidth
and damping factor of the PLL. Reducing, for instance, the bandwidth ($\omega_{-3dB}$) in order to
reduce the noise of the output signal requires a reduction of the damping factor ($\zeta$),
hence compromising the stability of the PLL.
In a previous section, a PLL without a charge pump was studied to determine its
principal parameters, such as the damping factor ($\zeta$) and the natural angular frequency ($\omega_n$),
in terms of its design parameters (k) and ($\tau$), as illustrated in Eq. 3.19. A Bode stability
analysis was also performed on the same PLL, and analytical expressions for its
bandwidth ($\omega_{-3dB}$) and its phase margin ($\varphi_{margin}$) in terms of the damping factor ($\zeta$) were
found. In this section, a charge pump PLL with a simple RC filter will be studied and
compared to its counterpart without the charge pump. Let us consider the simple RC filter
with a charge pump of current ($I_p$) as shown in Figure 3-5.
Figure 3-5: A simple RC filter with a charge pump.
With this type of circuit, the linear expression of the phase detector is modified by
incorporating the current flowing into or out of the filter. Based on the first expression of Eq.
3.10 and Figure 3-5, one can write:
$$i_d(t) = \pm I_p\,\frac{\theta_e(t)}{2\pi} \tag{3.24}$$
where ($i_d(t)$) is the current delivered to (pumping) or taken from (sinking) the
filter in response to a phase error ($\theta_e(t)$). The sign ± in the last expression represents the
polarity of the frequency difference, being positive or negative depending on the difference
between the reference and the VCO signals. Considering the filter of Figure 3-5, the
voltage at its output can be written as follows:
$$v_f(t) = i_d(t)\,Z_{filter} \tag{3.25}$$
where ($Z_{filter}$) is the total impedance of the filter. Taking the Laplace transform of
Eq. 3.25, we obtain the following equation:
$$V_f(s) = I_d(s)\,Z_{filter} = I_p\,\frac{\theta_e(s)}{2\pi}\left(R+\frac{1}{sC}\right) = \frac{I_p}{2\pi}\left(\frac{RCs+1}{sC}\right)\theta_e(s) \tag{3.26}$$
Thus, the transfer function of the combined phase detector and charge pump filter
blocks will be:
$$TF_{pd\&filter} = \frac{V_f(s)}{\theta_e(s)} = \frac{I_p}{2\pi}\left(\frac{RCs+1}{sC}\right) = \frac{I_p}{2\pi}\left[\frac{\tau s+1}{s(\tau/R)}\right] = \frac{I_p R}{2\pi}\,\frac{\tau s+1}{\tau s} \tag{3.27}$$
The transfer function of the VCO is unchanged from that of the previous section; we rewrite
it for convenience:
$$TF_{vco}(s) = \frac{\theta_{vco}(s)}{V_f(s)} = \frac{k_{vco}}{s}$$
Based on the previous development, one can redraw the frequency domain block diagram
of the CP-PLL.
Figure 3-6: Frequency domain block diagram of the charge pump PLL.
The open loop G(s) and closed loop H(s) transfer functions of the PLL are given as follows:
$$G(s) = TF_{pd+cp}\,TF_{vco} = \frac{I_p R}{2\pi}\,\frac{\tau s+1}{\tau s}\,\frac{k_{vco}}{s} = k\,\frac{\tau s+1}{\tau s^2} \tag{3.28}$$
$$H(s) = \frac{(k/\tau)(1+\tau s)}{s^2+ks+k/\tau} \tag{3.29}$$
where $k = \dfrac{I_p}{2\pi}\,k_{vco}R$. Setting $k = 2\zeta\omega_n$ and $\dfrac{k}{\tau} = \omega_n^2$:
$$G(s) = k\,\frac{(k/\omega_n^2)\,s+1}{(k/\omega_n^2)\,s^2} = \frac{ks+\omega_n^2}{s^2} \tag{3.30}$$
$$H(s) = \frac{\omega_n^2+2\zeta\omega_n s}{s^2+2\zeta\omega_n s+\omega_n^2} \tag{3.31}$$
Substituting s by ($j\omega$) in the open loop transfer function G(s) of the charge pump PLL (CP-PLL),
the magnitude of that function is given as follows:
$$|G(j\omega)| = \sqrt{\frac{k^2\omega^2+\omega_n^4}{\omega^4}} \tag{3.32}$$
Solving the equation $|G(j\omega)| = 1$ with respect to ($\omega$) gives the cut-off frequency ($\omega_{-3dB}$) of
the CP-PLL.
$$\omega_{-3dB} = \omega_n\sqrt{\sqrt{1+4\zeta^4}+2\zeta^2} \tag{3.33}$$
$$\varphi_{margin} = \arctan\left\{2\zeta\sqrt{\sqrt{1+4\zeta^4}+2\zeta^2}\right\} \tag{3.34}$$
For convenience, we rewrite the equations describing the main characteristics of the CP-PLL
in terms of the design parameters R, C and $I_p$, and in terms of the stability parameters,
namely the damping factor ($\zeta$) and the natural frequency ($\omega_n$). The open loop G(s) and closed
loop H(s) transfer functions, the cut-off frequency ($\omega_{-3dB}$) and the phase margin ($\varphi_{margin}$) are
given respectively by the following relationships:
$$G(s) = \frac{ks+\omega_n^2}{s^2}, \qquad H(s) = \frac{\omega_n^2+2\zeta\omega_n s}{s^2+2\zeta\omega_n s+\omega_n^2} \tag{3.35}$$
$$\omega_{-3dB} = \omega_n\sqrt{\sqrt{1+4\zeta^4}+2\zeta^2}, \qquad \varphi_{margin} = \arctan\left\{2\zeta\sqrt{\sqrt{1+4\zeta^4}+2\zeta^2}\right\}$$
where $k = \dfrac{I_p}{2\pi}\,k_{vco}R$, $k = 2\zeta\omega_n$, $\dfrac{k}{\tau} = \omega_n^2$ and $\tau = RC$.
Figure 3-7: Bode diagram of the CP-PLL with a simple RC filter.
The design of reliable communication circuits and systems is normally concerned with the
reduction of phase noise and jitter. These two undesirable effects are closely related, and
ought to be considered in the context of oscillators and PLLs.
In order to study and estimate the impact of phase noise on an oscillator's output, let us
consider, for instance, an ideal oscillator producing a sinusoidal signal at frequency
$\omega_0 = 2\pi f_0 = 2\pi/T_0$. Its output waveform can be expressed as $V_{out}(t) = V_0\cos\omega_0 t$ and its
frequency spectrum, as illustrated in Fig. 3-8(a), consists of two impulses at $\omega = \pm\omega_0$. Since
this sinusoid is an ideal one, its zero-crossing points occur at integer multiples of
($T_0$). Also, the spectrum indicates that the signal carries no energy at any frequency other
than ($\omega_0$).
Figure 3-8: (a) Spectrum of a noiseless sinusoid, and (b) noisy sinusoid.
In a real oscillator, the internal devices and the surrounding circuits randomly vary the
oscillation period, as if the oscillator occasionally operates at frequencies other than $\omega_0$, as
shown in Fig. 3-8(b). In this case, the zero-crossing points do not necessarily occur at
integer multiples of ($T_0$) and the output spectrum spreads out around the peaks, revealing
that the signal carries finite energy at ($\omega_0 + \Delta\omega$).
In order to find a mathematical expression for this phase noise, we suppose that the
amplitude of the output signal is constant and unaffected by the noise. Since the
instantaneous frequency varies randomly, the oscillator signal can be written as:
$$V_{out}(t) = V_0\cos[\omega_0 t + \phi_n(t)] \tag{3.36}$$
where $\phi_n(t)$ is a small random phase component with zero average. Thus, the zero-crossing
points of the signal $V_{out}(t)$ occur randomly because they appear at instants given as:
$$t = \frac{k(\pi/2) - \phi_n(t)}{\omega_0} \tag{3.37}$$
where (k) is an odd number. Equivalently, the oscillation period varies from one cycle to
the next. The frequency spectrum of the signal $\phi_n(t)$ is called the phase noise and is denoted
by $S_{\phi_n}$. Since $\phi_n(t)$ is typically very small, we therefore can assume:
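A sketch of the usual small-angle step is given below, assuming the noisy output has the form $V_0\cos[\omega_0 t + \phi_n(t)]$ with $|\phi_n(t)| \ll 1$ rad. This is the standard narrowband approximation and is offered as an illustration; it is not claimed to match the author's exact expressions term by term.

```latex
% Small-angle approximation for |\phi_n(t)| << 1 rad
\cos\phi_n(t) \approx 1, \qquad \sin\phi_n(t) \approx \phi_n(t)

% Expanding the cosine of a sum
V_{out}(t) = V_0\cos[\omega_0 t + \phi_n(t)]
           \approx V_0\cos\omega_0 t - V_0\,\phi_n(t)\sin\omega_0 t
```

The first term is the ideal carrier, while the second term is the phase noise multiplied by a sinusoid at $\omega_0$, which is what shifts the spectrum of $\phi_n(t)$ to $\pm\omega_0$.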
Eq. 3.39 dictates that the spectrum of $V_{out}$ consists of impulses at $\omega = \pm\omega_0$ and of the
spectrum of $\phi_n(t)$ translated to $\pm\omega_0$, as illustrated in Fig. 3-8(b). To quantify the phase
noise $S_{\phi_n}$, we measure the average power carried in $\Delta f = 1$ Hz in the phase noise area of
Fig. 3-8(b). Since the intensity of $\phi_n$ is frequency dependent, the power must be measured
at a consistent, specified frequency offset ($\Delta\omega$) from ($\omega_0$), as shown in Fig. 3-9. Also, the
measured power in 1 Hz at $\Delta\omega$ must be normalized to the carrier power, $P_c$ (i.e. the power
carried by the impulses at $\pm\omega_0$); this normalization allows comparison between different
oscillators. Based on Eq. 3.39, the carrier power is equal to $V_{rms}^2 = V_0^2/2$.
Figure 3-9: Illustration of phase noise.
$$\text{Relative Phase Noise}\,\big|_{\Delta\omega} = 10\log\frac{P_{1Hz}\,|_{\Delta\omega}}{P_c}\ \ \text{dBc/Hz} \tag{3.40}$$
where the unit dBc/Hz denotes decibels with respect to the carrier, emphasizing the
normalization [29]. As an example, suppose the phase noise spectrum of an oscillator is
given by
$$P_{1Hz}\,\big|_{\Delta\omega} = \frac{(50\ \text{mV}_{rms})^2}{\Delta\omega^2}$$
If the oscillation amplitude is equal to 0.5 V$_{rms}$, the relative phase noise at 100 kHz offset
will be
$$\frac{S_{\phi_n}(2\pi\times 100\ \text{kHz})}{(0.5\ \text{V}_{rms})^2} = 2.533\times 10^{-14}$$
Thus, from Eq. 3.40, the relative phase noise is $10\log(2.533\times 10^{-14}) \approx -136$ dBc/Hz.
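The arithmetic of this example can be reproduced with a few lines of code. The sketch below applies Eq. 3.40 to the numbers quoted in the text; the function and variable names are illustrative only.

```python
import math

def relative_phase_noise_dbc(p_1hz, p_carrier):
    """Eq. 3.40: power in a 1 Hz band at the offset, normalized to the carrier."""
    return 10.0 * math.log10(p_1hz / p_carrier)

delta_omega = 2.0 * math.pi * 100e3        # 100 kHz offset, in rad/s
p_1hz = (50e-3) ** 2 / delta_omega ** 2    # (50 mVrms)^2 / delta_omega^2
p_carrier = 0.5 ** 2                       # (0.5 Vrms)^2

ratio = p_1hz / p_carrier
noise_dbc = relative_phase_noise_dbc(p_1hz, p_carrier)
print(f"ratio = {ratio:.3e}, relative phase noise = {noise_dbc:.1f} dBc/Hz")
```

The ratio evaluates to about $2.533\times10^{-14}$, in agreement with the value quoted above.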
The signal jitter is defined as the deviation of the zero crossings from their ideal position in
time, or alternatively as the deviation of each period from the ideal value.
Consider a noisy oscillator operating at a nominal frequency $\omega_0 = 2\pi f_0 = 2\pi/T_0$ with its
output compared against an ideal square wave with period $T_0$ [Fig. 3-10(a)].
To estimate the jitter, we measure the deviation of each positive (or negative) transition
point of $x_2(t)$ from its corresponding point in the ideal signal $x_1(t)$, i.e., $\Delta T_1$,
$\Delta T_2$, ..., $\Delta T_N$. This type of jitter is called absolute jitter because it results from
comparison with an ideal reference. Since the measured deviations are random, we
measure a very large number of deviations (i.e. $N \to \infty$) and evaluate the root mean
square value of the absolute jitter as:
$$\Delta T_{abs}^{rms} = \lim_{N\to\infty}\sqrt{\frac{1}{N}\left[\Delta T_1^2+\Delta T_2^2+\dots+\Delta T_N^2\right]} \tag{3.41}$$
Another type of jitter, which does not require a reference signal, is called cycle-to-cycle
jitter. It is obtained by measuring the difference between each two consecutive
cycles of the waveform and taking the root mean square of the values [Fig. 3-10(b)]:
Figure 3-10: (a) Cycle-to-cycle jitter, and (b) variable cycles.
$$\Delta T_{cc}^{rms} = \lim_{N\to\infty}\sqrt{\frac{1}{N}\left[(T_2-T_1)^2+(T_3-T_2)^2+\dots+(T_N-T_{N-1})^2\right]} \tag{3.42}$$
$$\Delta T_{p}^{rms} = \lim_{N\to\infty}\sqrt{\frac{1}{N}\left[(\bar{T}-T_1)^2+(\bar{T}-T_2)^2+\dots+(\bar{T}-T_N)^2\right]} \tag{3.43}$$
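The three definitions can be compared on synthetic data. The following sketch assumes that the first ideal edge is at t = 0, that $\bar{T}$ in Eq. 3.43 is the average measured period, and that each edge carries independent Gaussian timing noise; the 400 ps period and 1 ps noise level are hypothetical values chosen only for illustration.

```python
import math
import random

def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def jitter_metrics(edges, t0):
    """edges: measured rising-edge times; t0: ideal period.
    Returns (absolute, cycle-to-cycle, period) rms jitter."""
    abs_dev = [t - j * t0 for j, t in enumerate(edges)]       # Eq. 3.41
    periods = [b - a for a, b in zip(edges, edges[1:])]
    cc_dev = [b - a for a, b in zip(periods, periods[1:])]    # Eq. 3.42
    mean_period = sum(periods) / len(periods)
    p_dev = [mean_period - p for p in periods]                # Eq. 3.43
    return rms(abs_dev), rms(cc_dev), rms(p_dev)

random.seed(0)
T0 = 400e-12   # hypothetical 2.5 GHz clock period
edges = [j * T0 + random.gauss(0.0, 1e-12) for j in range(20000)]
abs_j, cc_j, p_j = jitter_metrics(edges, T0)
print(f"absolute = {abs_j:.3e} s, cycle-to-cycle = {cc_j:.3e} s, period = {p_j:.3e} s")
```

For independent edge noise of standard deviation $\sigma$, the period jitter approaches $\sqrt{2}\,\sigma$ and the cycle-to-cycle jitter approaches $\sqrt{6}\,\sigma$, which shows that the three metrics are not interchangeable even for the same waveform.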
The oscillator phase noise can be more easily simulated and measured in the laboratory
than the jitter. It is therefore desirable to establish a relationship between the two
quantities. For absolute jitter, a comparison between the actual signal and an ideal
reference is required, noting that the deviation of each zero crossing is $\Delta T_j = (T_0/2\pi)\,\phi_{n,j}$,
where $\phi_{n,j}$ denotes the value of $\phi_n$ in radians at zero crossing number (j). Thus,
$$\Delta T_{abs,rms}^2 = \lim_{N\to\infty}\frac{1}{N}\sum_{j=1}^{N}\Delta T_j^2 = \left(\frac{T_0}{2\pi}\right)^2\lim_{N\to\infty}\frac{1}{N}\sum_{j=1}^{N}\phi_{n,j}^2$$
$$\Delta T_{abs,rms}^2 = \left(\frac{T_0}{2\pi}\right)^2\lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{+T/2}\phi_n^2(t)\,dt$$
The limit represents the average power of $\phi_n$ and, from Parseval's theorem [29], is
equivalent to the area under the spectrum of $\phi_n$:
$$\Delta T_{abs,rms}^2 = \left(\frac{T_0}{2\pi}\right)^2\int_{-\infty}^{+\infty}S_{\phi_n}(f)\,df \tag{3.44}$$
CDR circuits used for wireline, optical and other communication systems must satisfy
certain jitter criteria specified by the standard associated with a particular type of
communication system. In this section, a description and estimation of the CDR jitter
characteristics will be carried out. The main CDR jitter characteristics are jitter transfer,
jitter generation and jitter tolerance. Each type of jitter will be studied and related through
analytical expressions to the PLL parameters $\zeta$, $\omega_n$, $\omega_{-3dB}$, R, C, and $I_p$.
The jitter transfer function of a CDR circuit represents the output jitter as a function of the
input one, when the input jitter is varied at different rates. If, for example, the input jitter
varies slowly and therefore the waveform zero-crossing points move slowly around their
ideal positions then the output can follow the input to ensure phase locking. On the other
hand, if the input jitter varies rapidly, the CDR circuit must filter the jitter, i.e., the output
tracks the input to a lesser extent. Thus, the jitter transfer exhibits a low-pass characteristic,
as in the case of the PLL. The jitter transfers required by communication standards must
generally meet difficult specifications. First, the CDR bandwidth should be small enough
to attenuate jitter components above the CDR bandwidth. Second, the amount of peaking
in the jitter transfer (jitter peaking) must be also small to avoid any eventual instability.
Reducing the CDR bandwidth requires a reduction of its CP-PLL bandwidth ($\omega_{-3dB}$), given as
(Eq. 3.35)
$$\omega_{-3dB} = \omega_n\sqrt{\sqrt{1+4\zeta^4}+2\zeta^2} \tag{3.45}$$
where $k = \dfrac{I_p}{2\pi}\,k_{vco}R$, $k = 2\zeta\omega_n$, $\dfrac{k}{\tau} = \omega_n^2$ and $\tau = RC$.
To reduce ($\omega_{-3dB}$), either ($\zeta$) or ($\omega_n$) must be reduced. However, loop stability requires that
$\zeta$ stays higher than 0.707, leaving ($\omega_n$) as the only parameter which may be reduced.
Lowering ($k_{vco}$) and ($I_p$) will reduce ($\omega_n$), but will also reduce ($\zeta$). Thus, C is the principal
parameter that can be increased to decrease the loop bandwidth ($\omega_{-3dB}$) while increasing the
damping factor ($\zeta$).
For $2\zeta^2 \gg 1$, the CP-PLL bandwidth expression (Eq. 3.45) can be reduced to
$$\omega_{-3dB} \approx 2\zeta\omega_n = \frac{R\,I_p\,k_{vco}}{2\pi}$$
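Now, in order to reduce the jitter peaking, the damping factor should have a large value,
and careful attention must be paid to the poles and zeros of the closed loop transfer
function. The closed loop transfer function of the CP-PLL is given by (Eq. 3.35)
$$H(s) = \frac{\omega_n^2+2\zeta\omega_n s}{s^2+2\zeta\omega_n s+\omega_n^2}$$
$$\omega_z = -\frac{\omega_n}{2\zeta} = -\frac{1}{RC}, \qquad \omega_{p1,p2} = \left(-\zeta\pm\sqrt{\zeta^2-1}\right)\omega_n = -\zeta\left(1\mp\sqrt{1-\frac{1}{\zeta^2}}\right)\omega_n \tag{3.46}$$
For a large damping factor (i.e. $\zeta \gg 1$), the square root function can be approximated as
$$\sqrt{1-\frac{1}{\zeta^2}} \approx 1-\frac{1}{2\zeta^2}-\frac{1}{8\zeta^4}$$
$$\omega_{p1,p2} \approx -\zeta\left[1\mp\left(1-\frac{1}{2\zeta^2}-\frac{1}{8\zeta^4}\right)\right]\omega_n$$
It follows that
$$\omega_{p1} \approx -\frac{\omega_n}{2\zeta}-\frac{\omega_n}{8\zeta^3}, \qquad \omega_{p2} \approx -2\zeta\omega_n+\frac{\omega_n}{2\zeta}+\frac{\omega_n}{8\zeta^3} \tag{3.47}$$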
Figure 3-11: (a) Pole and zero positions of the CP-PLL, (b) corresponding jitter transfer
function.
The poles and zero are illustrated graphically in Fig. 3-11(a). The expressions 3.46 and
3.47 yield several interesting results. First, the zero always appears before the poles,
leading to an inevitable peaking in the jitter transfer function. Second, since the damping
factor is large (i.e. $\zeta \gg 1$), the zero and the first pole differ in magnitude only by a small
value (i.e. $\omega_n/8\zeta^3$). Third, for $\zeta \gg 1$, the pair ($\omega_z$, $\omega_{p1}$) falls well below the second pole ($\omega_{p2}$)
because $(\omega_n/2\zeta) \ll 2\zeta\omega_n$. Fourth, $\omega_{p2}$ is slightly lower than ($\omega_{-3dB}$) by an amount equal to
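assumes a constant value for $\omega > \omega_{p1}$, and begins to fall at 20 dB/decade at $\omega > \omega_{p2}$,
dropping to -3 dB at $\omega = 2\zeta\omega_n$. With logarithmic scales, the value of jitter peaking ($J_p$) can
be written as [29]
That is, $J_p \approx \dfrac{\omega_{p1}}{\omega_z} = 1+\dfrac{1}{4\zeta^2}$
$$20\log J_p = 20\ln J_p\,\log_{10} e = 8.686\,\ln\left(1+\frac{1}{4\zeta^2}\right)$$
$$20\log J_p \approx \frac{8.686}{4\zeta^2} = \frac{2.172}{\zeta^2} \tag{3.48}$$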
Using expression 3.45, Eq. 3.48 can be expressed in terms of the CP-PLL design
parameters $k_{vco}$, R, C, and $I_p$. Since $4\zeta^2 = k\tau = R^2 I_p C k_{vco}/(2\pi)$,
$$20\log J_p \approx \frac{8.686\cdot 2\pi}{R^2\,I_p\,C\,k_{vco}} \tag{3.49}$$
If the resistor value R is lowered to reduce the jitter bandwidth ($\omega_{-3dB}$), then the capacitor
value C must be raised substantially to maintain $J_p$ constant.
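The accuracy of the large-damping approximations can be checked numerically. The sketch below computes the exact zero and poles of H(s) from Eq. 3.35 and compares the logarithmic and first-order forms of the jitter peaking in Eq. 3.48; the normalized values $\zeta = 2$ and $\omega_n = 1$ are hypothetical and chosen only for illustration.

```python
import math

def cp_pll_poles_zero(zeta, omega_n):
    """Exact zero and real poles of H(s) in Eq. 3.35 (valid for zeta > 1)."""
    root = math.sqrt(zeta**2 - 1.0)
    zero = -omega_n / (2.0 * zeta)
    p1 = (-zeta + root) * omega_n
    p2 = (-zeta - root) * omega_n
    return zero, p1, p2

def jitter_peaking_db(zeta):
    """Logarithmic form and first-order form of Eq. 3.48, in dB."""
    log_form = 8.686 * math.log(1.0 + 1.0 / (4.0 * zeta**2))
    first_order = 2.172 / zeta**2
    return log_form, first_order

zeta, omega_n = 2.0, 1.0
zero, p1, p2 = cp_pll_poles_zero(zeta, omega_n)
p1_approx = -omega_n / (2.0 * zeta) - omega_n / (8.0 * zeta**3)   # Eq. 3.47
log_db, first_order_db = jitter_peaking_db(zeta)
print(f"zero = {zero:.4f}, p1 = {p1:.4f} (approx {p1_approx:.4f}), p2 = {p2:.4f}")
print(f"jitter peaking: {log_db:.3f} dB (log form), {first_order_db:.3f} dB (first order)")
```

Even at $\zeta = 2$ the approximate first pole is within about one percent of the exact value, and the two peaking estimates differ by less than 0.02 dB.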
Jitter generation refers to the jitter produced by the CDR circuit itself when the input
random data is jitter free. The sources of jitter in CDR circuits can be summarized as
follows:
VCO phase noise due to the electronic noise of its constituent devices
Coupling of data switching to the VCO through the phase and frequency detectors
To estimate the VCO noise contribution to CDR jitter, an expression relating PLL jitter to
the jitter of the free-running VCO must be derived. The phase noise and cycle-to-cycle
jitter of the free-running VCO are related by the following equation [45]
$$\Delta T_{cc}^2 \approx \frac{4\pi}{\omega_0^3}\,S_\phi(\Delta\omega)\,\Delta\omega^2 \tag{3.50}$$
where $\omega_0$ denotes the oscillation frequency and $S_\phi(\Delta\omega)$ represents the relative phase noise
power at an offset frequency ($\Delta\omega$) [45]. The jitter given by Eq. 3.50 will be shaped by
the PLL. As illustrated in Fig. 3-12, it can be assumed that for a loop bandwidth of
$2\pi f_u$, the jitter rises with the square root of time until the instant $t_1 = 1/(2\pi f_u)$ and saturates
thereafter [46]. The total jitter accumulated over time $t_1$ by a free-running oscillator is
equal to [45]
$$\Delta T_1 = \sqrt{\frac{f_0}{2}}\,\Delta T_{cc}\,\sqrt{t_1} \tag{3.51}$$
Substituting Eq. 3.50 and $t_1 = 1/(2\pi f_u)$ into Eq. 3.51 gives the jitter of the phase-locked oscillator:
$$\Delta T_{PLL} \approx \frac{\Delta\omega}{\omega_0}\sqrt{\frac{S_\phi(\Delta\omega)}{2\pi f_u}} \tag{3.52}$$
Now, if the value of ($\Delta T_{PLL}$) must be less than 0.25 ps at 40 GHz and $f_u$ = 20 MHz, then
$S_\phi(\Delta\omega)$ must not exceed -79 dBc/Hz at 1-MHz offset [29].
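This numerical example can be verified by chaining Eq. 3.50 and Eq. 3.51. The sketch below assumes that the saturation instant is $t_1 = 1/(2\pi f_u)$ and that the phase noise is specified in dBc/Hz; the function name and argument names are illustrative.

```python
import math

def pll_jitter(s_phi_dbc, f_offset, f0, fu):
    """Jitter of a phase-locked oscillator from Eqs. 3.50 and 3.51,
    assuming t1 = 1/(2*pi*fu). Frequencies in Hz, result in seconds."""
    s_phi = 10.0 ** (s_phi_dbc / 10.0)            # dBc/Hz to linear ratio
    omega0 = 2.0 * math.pi * f0
    d_omega = 2.0 * math.pi * f_offset
    t_cc_sq = 4.0 * math.pi / omega0**3 * s_phi * d_omega**2    # Eq. 3.50
    t1 = 1.0 / (2.0 * math.pi * fu)
    return math.sqrt(f0 / 2.0) * math.sqrt(t_cc_sq) * math.sqrt(t1)  # Eq. 3.51

jitter = pll_jitter(s_phi_dbc=-79.0, f_offset=1e6, f0=40e9, fu=20e6)
print(f"PLL jitter = {jitter * 1e12:.3f} ps")
```

With -79 dBc/Hz at 1 MHz offset the result is very close to 0.25 ps, consistent with the bound quoted from [29].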
Figure 3-12: Accumulation of cycle-to-cycle jitter in a phase-locked oscillator: (a) actual
behavior and (b) resultant waveform.
Another jitter source in CDR circuits is the ripple on the control voltage. Any mismatch in
the charge pump circuit can lead to a net charge injection into the loop filter at
every phase comparison instant even if the loop is locked, hence modulating the VCO
control voltage and generating jitter. Random data transitions can also generate ripple on
the VCO control voltage through the phase and frequency detector. Thus, the modulation
resulting from the ripple may be significant, yielding large jitter.
For the simple case of periodic modulation of a VCO, it is possible to estimate the output
jitter. Assuming a sinusoidal modulation $V_m\cos\omega_m t$, the cycle-to-cycle jitter is given by [45]
$$\Delta T_{cc} = \frac{V_m\,k_{vco}}{f_0^2}\sqrt{1-\cos\frac{\omega_m}{f_0}} \tag{3.53}$$
If the modulation frequency is much smaller than the oscillation frequency (i.e. $f_m \ll f_0$),
the expression (3.53) can be reduced to
$$\Delta T_{cc} = \frac{V_m\,k_{vco}\,\omega_m}{\sqrt{2}\,f_0^3} \tag{3.54}$$
The analysis of jitter properties in the previous sections has so far focused on their
effects on the recovered clock. However, as the data stream at the CDR circuit output will
be used for further processing, the retimed data quality is also important. The CDR
circuit should normally behave with respect to the jittered input data stream as follows:
For slowly varying jitter at the input, the recovered clock usually tracks the phase
variations, always sampling the data in the middle of the bit [Fig. 3-13(a)] and
guaranteeing a low bit error rate (BER).
Figure 3-13: Effect of (a) slow and (b) fast jitter on data retiming.
Figure 3-14: Example of jitter tolerance mask.
In the next section, the jitter tolerance of a typical CDR circuit will be quantified and
compared with the mask shown in Fig. 3-14. At a given jitter frequency, the magnitude of
the input phase $\phi_{in}$ must be increased until the BER begins to rise. This occurs when the
phase error, $\phi_{in}-\phi_{out}$, approaches one-half unit interval, bringing the sampling edge of the
clock close to the zero-crossing points of data. Thus, an approximate condition to avoid
increasing the BER is
$$\phi_{in}-\phi_{out} < \frac{1}{2}\,\text{UI}$$
Or, equivalently, $\phi_{in}\,[1-H(s)] < \dfrac{1}{2}\,\text{UI}$
$$\phi_{in} < \frac{0.5\ \text{UI}}{[1-H(s)]} \tag{3.55}$$
$$G_{JT}(s) = \frac{0.5}{[1-H(s)]} \tag{3.56}$$
where $G_{JT}(s)$ denotes the largest phase modulation at the input that increases the BER
negligibly. For a CDR loop based on the CP-PLL studied in the previous section,
$$G_{JT}(s) = \frac{1}{2}\,\frac{s^2+2\zeta\omega_n s+\omega_n^2}{s^2} \tag{3.57}$$
where the closed loop transfer function H(s) is given by (Eq. 3.35)
$$H(s) = \frac{\omega_n^2+2\zeta\omega_n s}{s^2+2\zeta\omega_n s+\omega_n^2}$$
The function $G_{JT}(s)$ contains two poles at the origin and two zeros coincident with the
poles of H(s). Consequently, as depicted in Fig. 3-15, the Bode plot of the function $|G_{JT}(s)|$
(i.e. $20\log|G_{JT}(s)|$) falls at a rate of 40 dB/decade for $\omega < \omega_{p1}$ and at 20 dB/decade for
$\omega_{p1} < \omega < \omega_{p2}$, approaching 0.5 UI for $\omega > \omega_{p2}$.
Figure 3-15: Jitter tolerance for CP-PLL.
A few interesting remarks can be deduced from the previous results. First, the
magnitude of the function $G_{JT}(s)$ at $s = j|\omega_{p1}|$ can be calculated as
$$|G_{JT}(s=j|\omega_{p1}|)|^2 = \frac{1}{4}\,\frac{(\omega_n^2-\omega_{p1}^2)^2+4\zeta^2\omega_n^2\omega_{p1}^2}{\omega_{p1}^4}$$
$$|G_{JT}(s=j|\omega_{p1}|)|^2 \approx 8\zeta^4$$
Figure 3-16(a) plots $|G_{JT}(s)|$ for two different values of $\zeta$, revealing that if $\zeta$ increases and
$\omega_n$ remains constant, the required jitter tolerance is easily met. Moreover, Fig. 3-16(b)
suggests that as $\omega_n$ increases while $\zeta$ remains constant, jitter tolerance improves.
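The behaviour of the jitter tolerance can be explored numerically from Eq. 3.57. The following sketch evaluates $|G_{JT}(j\omega)|$ at the first pole and at a high jitter frequency; the normalized values $\zeta = 3$ and $\omega_n = 1$ are hypothetical and serve only as an illustration.

```python
import math

def jitter_tolerance(omega, zeta, omega_n):
    """|G_JT(j*omega)| in UI, from Eq. 3.57."""
    s = complex(0.0, omega)
    return abs(0.5 * (s * s + 2.0 * zeta * omega_n * s + omega_n**2) / (s * s))

zeta, omega_n = 3.0, 1.0
omega_p1 = (zeta - math.sqrt(zeta**2 - 1.0)) * omega_n    # |p1| from Eq. 3.46
tol_at_p1 = jitter_tolerance(omega_p1, zeta, omega_n)
tol_high = jitter_tolerance(1e4 * omega_n, zeta, omega_n)
print(f"|G_JT| at p1 = {tol_at_p1:.2f} UI, sqrt(8)*zeta^2 = {math.sqrt(8.0) * zeta**2:.2f} UI")
print(f"|G_JT| at high frequency = {tol_high:.4f} UI")
```

At $\zeta = 3$ the exact magnitude at the first pole is within about 6% of $\sqrt{8}\,\zeta^2$, and the tolerance settles to 0.5 UI at high jitter frequencies, as described in the text.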
After studying the time domain tracking property of the PLL, and the stability analysis of
the PLL and the CP-PLL incorporating a simple RC filter, we now look for the
optimized values of R, C and $I_p$ that give reasonable values of the loop parameters ($\varphi_{margin}$, $\zeta$ and
$\omega_{-3dB}$). Once the optimized values are obtained, a performance comparison of the PLL and
the CP-PLL will be carried out. To do this, we start from initial values of the design
parameters R, C, and $I_p$. The VCO's conversion gain $k_{vco}$ is taken from the transistor level
design of the PLL.
For the PLL with a simple RC filter:
$$\omega_{-3dB} = \omega_n\sqrt{\sqrt{1+4\zeta^4}-2\zeta^2}, \qquad \varphi_{margin} = \arctan\left[\frac{2\zeta}{\sqrt{\sqrt{1+4\zeta^4}-2\zeta^2}}\right]$$
where $2\zeta\omega_n = \dfrac{1}{\tau}$, $\omega_n^2 = \dfrac{k}{\tau}$ and $\zeta = \dfrac{1}{2\sqrt{k\tau}}$.
For the CP-PLL:
$$\omega_{-3dB} = \omega_n\sqrt{\sqrt{1+4\zeta^4}+2\zeta^2}, \qquad \varphi_{margin} = \arctan\left\{2\zeta\sqrt{\sqrt{1+4\zeta^4}+2\zeta^2}\right\}$$
where $k = \dfrac{I_p}{2\pi}\,k_{vco}R$, $k = 2\zeta\omega_n$, $\dfrac{k}{\tau} = \omega_n^2$ and $\tau = RC$.
We have $k_{vco} = 2\pi(1.7\times 10^9)$ rad/V.sec. The values of the parameters resulting from the
optimization are the following: R = 370 Ω, C = 2.3 nF, $I_p$ = 30 µA.
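As an illustration of how the CP-PLL relationships of Eq. 3.35 turn component values into loop parameters, the sketch below evaluates them for the values quoted above. It assumes the units are ohms and microamperes (R = 370 Ω, $I_p$ = 30 µA); the function name and dictionary keys are illustrative, and the printed numbers are not claimed to reproduce Table 3-1.

```python
import math

def cp_pll_parameters(r, c, i_p, k_vco):
    """Loop parameters of the CP-PLL with a series RC filter (Eq. 3.35).
    r in ohm, c in farad, i_p in ampere, k_vco in rad/(V*s)."""
    k = i_p * k_vco * r / (2.0 * math.pi)      # loop gain, 1/s
    tau = r * c                                # filter time constant, s
    omega_n = math.sqrt(k / tau)               # natural frequency, rad/s
    zeta = k / (2.0 * omega_n)                 # damping factor
    shape = math.sqrt(math.sqrt(1.0 + 4.0 * zeta**4) + 2.0 * zeta**2)
    return {
        "k": k,
        "tau": tau,
        "omega_n": omega_n,
        "zeta": zeta,
        "omega_3db": omega_n * shape,
        "phase_margin_deg": math.degrees(math.atan(2.0 * zeta * shape)),
        "jitter_peaking_db": 2.172 / zeta**2,  # Eq. 3.48
    }

params = cp_pll_parameters(r=370.0, c=2.3e-9, i_p=30e-6,
                           k_vco=2.0 * math.pi * 1.7e9)
for name, value in params.items():
    print(f"{name:>18s} = {value:.4g}")
```

Under these unit assumptions the damping factor comes out close to 2, which is comfortably above the 0.707 limit mentioned earlier.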
Table 3-1: PLL and CP-PLL loop parameters for the optimized values of R, C and $I_p$.
Table 3-1 clearly shows that the CP-PLL is much better than the PLL in terms of damping
factor, phase margin, jitter peaking and jitter tolerance.
3.6 Summary
In this chapter, a simplified time-domain analysis of the PLL in the locked state has been
carried out, illustrating the tracking property of the PLL. In order to properly select the low
pass filter components (i.e. R and C), a frequency-domain stability analysis of the PLL and
the CP-PLL has been carried out; this analysis results in analytical expressions relating the
stability parameters to R and C. Finally, as jitter is a predominant parameter in CDR
circuits, a study of the jitter in the CP-PLL and its relation to R and C has been carried out.
Figure 3-17: Optimization algorithm for selecting the values of R, C, and $I_p$.
Chapter 4 Chip-to-Chip Communication and Verilog-A System Modelling
The Verilog-A language is relatively new. It is an extension of SPICE; hence they have a
compatible simulation environment. In this work, we have adopted an efficient bottom-up
extraction approach to build and simulate a gate-level model of the clockless SerDes-based
serial link for an asynchronous inter-chip communication system [34, 65, 66, 67].
First, the dynamic (e.g. Latch, DFF, DETFF) and static gates (e.g. AND, OR, XOR) were
designed at transistor level using resistively loaded MOS current mode logic; then the
characteristic parameters (e.g. delay, rise and fall time) of those gates were extracted.
Finally, all the extracted parameters were incorporated into the behavioral models of the
corresponding gates. In order to verify the accuracy of the quarter-rate concept, a 10 Gb/s
point-to-point serial link interfacing two 8-bit chips is implemented using the
Verilog-A language [34]. The serial link incorporates the proposed
quarter-rate PLL-based CDR circuit. Based on the diagram illustrated in Figure 4-2, the
optimization, implementation and simulation of this link are carried out as follows:
In a serial bus or link, a circuit called SerDes (Serializer/Deserializer), interfacing for
example two VLSI chips, is used to transmit and receive data over a clockless serial link as
shown in Figure 4-1.
Figure 4-1: SerDes system as used in chip-to-chip serial data communication.
In essence, a SerDes is a serial transceiver which converts parallel data into a serial data
stream on the transmitter side and converts the serial data back to parallel on the receiver
side. The timing skew problem usually encountered in a parallel bus is eliminated by
embedding the clock signal into the data stream. Since there is no separate clock signal in a
serial link, timing skew between clock and data no longer exists. As a result, a serial bus or
link can usually operate at a much higher data rate than a parallel bus in a comparable
system environment. Since the data is sent without the clock signal, one needs a
circuit in the receiver to extract that clock signal from the serial data stream itself and
sample the stream using the extracted clock; the clock extraction and data retiming are
referred to as the clock and data recovery (CDR) operation.
As discussed earlier, a SerDes circuit performs two functions, serialization and
deserialization, in a lossy and noisy environment. As shown in Figure 4-2, the serializer
converts the 8-bit parallel data streams into a single serial data stream. The conversion is
done with the clocks generated by the transmitter's clock generator. Usually a high
speed clock running at the serial data rate is required. A practical and cost effective
solution is to generate this high speed clock from an off-chip low frequency quartz crystal
oscillator. As a result, a PLL based frequency multiplier is required on the transmitter side;
another important design challenge for the PLL is to maintain a minimum amount of clock
jitter despite all the switching noise generated by the surrounding circuits.
Figure 4-2: Simplified SerDes block diagram.
4.2.1 Serializer Principle and time domain simulations
Figure 4-3: A multiplexer (a) and its timing diagram (b).
Figure 4-4: A tree architecture of the 8-to-1 serializer.
The test bench of the serializer circuit is shown in Figure 4-5. The input signal of the
serializer circuit is 8 parallel channels of PRBS (pseudo-random bit sequence) data
at 1.25 Gb/s each, representing for example the output of an 8-bit microprocessor
communicating with a hard disk drive or a memory. The serializer output should
be a single data stream at 8 × 1.25 Gb/s (= 10 Gb/s). The serializer time domain simulation
results are shown in Figure 4-6. Once the PLL in the serializer has reached the steady state, and
for a data input bit width of 800 ps (red signal in Figure 4-6(a)), the serializer data output
bits have a width of 100 ps (blue signal in Figure 4-6(b)), and the clock issued from the
PLL in the serializer has a period of 100 ps (red signal in Figure 4-6(b)), which
confirms the correct operation of the serializer.
Figure 4-6: Serializer time domain results, data bit input width is
800 ps (a) and, (b) output bit width is 100 ps.
Figure 4-7: Block diagram of the 4-to-8 demultiplexer (a), five-latch architecture
of the 1-to-2 demultiplexer (b), and timing diagram of the demultiplexer (c).
As shown in Figure 4-8, the deserializer circuit comprises our proposed quarter-rate PLL-
based clock and data recovery circuit. Its input is a single PRBS serial data stream at a 10 Gb/s
data rate without any associated clock signal. The VCO frequency in the CDR was
2.45 GHz, which is 50 MHz below the required frequency of 2.5 GHz (i.e. one quarter of
the data rate). The task of the deserializer is to extract the clock signal embedded in the
data stream, demultiplex (1-to-4) the data and simultaneously retime (sample)
them for further processing. In our case an additional demultiplexing stage (4-to-8) is required in
order to compare the 8 inputs of the serializer (section 4.2.1) with the 8 outputs of the
deserializer. As shown in Figure 4-9(a), the PLL in the deserializer reached the steady state
within 2.3 µs, and the extracted clock has a frequency of 2.5 GHz (Figure 4-9(b)).
Figure 4-9: Low pass filter output showing the deserializer PLL locking process (a)
and, (b) DFT of the quarter-rate recovered clock output signal.
The test bench circuit of the serial link is shown in Figure 4-10. This circuit includes 8
parallel PRBS data channels at 1.35 Gb/s each, the PLL-based 8-to-1 serializer and the
PLL-based 1-to-8 deserializer (our proposed 1-to-4 CDR plus an additional 4-to-8
DEMUX). The minimum VCO frequency in the CDR was 2.6 GHz, which is 100
MHz below the required one (2.7 GHz). Figure 4-11 (a and b) illustrates the transient
simulation results. The serializer reaches the steady state within 1.2 µs, followed by the
deserializer less than 2 µs later. As shown in Figure 4-12(a), the serial link is working
properly, because the deserializer outputs d1 and d2 are the same as the serializer inputs
in1 and in2.
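The required recovered-clock frequency and the initial VCO offset in both test benches follow from the quarter-rate relation. The sketch below only illustrates that arithmetic; the helper names are hypothetical.

```python
def quarter_rate_clock_ghz(serial_rate_gbps: float) -> float:
    """Clock frequency (GHz) a quarter-rate CDR must recover for a given serial rate."""
    return serial_rate_gbps / 4.0


def vco_offset_mhz(serial_rate_gbps: float, vco_start_ghz: float) -> float:
    """How far (MHz) the starting VCO frequency sits below the required clock."""
    return (quarter_rate_clock_ghz(serial_rate_gbps) - vco_start_ghz) * 1000.0


if __name__ == "__main__":
    # Deserializer-only test: 10 Gb/s input, VCO starting at 2.45 GHz.
    print(quarter_rate_clock_ghz(10.0), vco_offset_mhz(10.0, 2.45))
    # Full link test: 8 channels at 1.35 Gb/s, VCO starting at 2.6 GHz.
    print(quarter_rate_clock_ghz(8 * 1.35), vco_offset_mhz(8 * 1.35, 2.6))
```

The two cases correspond to the 50 MHz and 100 MHz offsets stated for the deserializer and the complete link.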
Figure 4-11: Low-pass filter output voltage showing the serial link locking process
(a and b), and the DFT of the recovered clock in the deserializer (c).
Figure 4-12: Serial link data input and output (a) and,
serializer data and clock output (b).
In this chapter, we showed, using the serial link schematic view and the Verilog-A language,
that our proposed quarter-rate PLL-based CDR is a working concept for a point-to-point
clockless serial link interfacing two chips communicating serially at a 10 Gb/s
data rate.
This chapter describes the transistor-level design, using complementary metal-oxide-
semiconductor (CMOS) current-mode logic (CML), of the blocks comprising the PLL-
based CDR circuit, such as the ELPD, the DQFD, the VCO and the charge pump.
Figure 5-1: Basic CML gate.
Chapter 5 Building Blocks Circuit Design
The MCML gates are fully differential and steer current between the two pull-up resistors.
The total voltage swing, V = R·Itail, is set by adjusting the resistance of the pull-up
devices for a given current. The voltage swing V is not rail-to-rail but is in fact much smaller,
on the order of several hundred millivolts.
CML circuits are widely known to have advantages over their CMOS counterparts that
are especially useful for this application. First, CML circuits operate differentially, and hence
inherently reject common-mode noise introduced by the power supply and the
surrounding environment. Also, owing to the reduced logic voltage swing, propagation
delays are shorter [54], which translates into a faster switching circuit. Although the
CML logic style is known to suffer from more static power dissipation than CMOS
logic, properly designed CML gates can consume less power than the CMOS style at
higher frequencies of operation [55]. In particular, CML gates reduce current spikes and peaks
during logic transitions, which in turn reduces the effects of power supply bounce.
CML circuits are mainly designed for low-power, high-frequency applications such as
communication transceivers and serial links; they usually incorporate resistive loads
rather than PMOS active load devices because PMOS transistors severely limit the
maximum operating frequency of the circuit [56, 57]. There are several
techniques for implementing logic circuits and functions in CMOS technology, such as
complementary MOS logic (CMOS), MOS current-mode logic (MCML), folded source-
coupled logic (FSCL), domino logic, and complementary pass logic (CPL). The most popular
design styles for digital circuits are based on CMOS logic. The CMOS logic design
style is known for being robust to variations of the fabrication process, hence producing
reliable integrated circuits. On the other hand, resistively loaded MCML circuits are
sensitive to process variations and mismatch. For example, certain types of resistors can
vary by up to 30% in CMOS technology, which may affect the proper functionality of the
logic circuits. Because the CMOS logic style is so widely used in VLSI systems, the
MCML characteristics will be compared with it. The main parameters to be compared
between the two logic styles are the delay, the power consumption and the power-
delay product.
Let us assume that our circuit is composed of an integer number N of identical gates
connected in series, all with load capacitance C. The total propagation delay through the N
gates is given as follows [82]:
While CMOS logic gates dissipate static and dynamic power, MCML gates
draw a constant current over time, independent of the switching activity. Based on the
above assumption, expressions for the power and the power-delay product can be written as follows:
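A first-order sketch of these quantities for a single gate is given below. It assumes that each gate charges its load C through its full swing at constant current and that the CMOS gate switches continually; this is an illustrative textbook-style model, and the symbols t_d, PDP and I_sat (the CMOS drive current) are assumed notation rather than the expressions of [82].

```latex
\begin{aligned}
t_{d,\mathrm{MCML}} &\approx \frac{C\,V}{I_{tail}}, &
P_{\mathrm{MCML}} &\approx I_{tail}\,V_{DD}, &
PDP_{\mathrm{MCML}} &\approx C\,V\,V_{DD},\\[4pt]
t_{d,\mathrm{CMOS}} &\approx \frac{C\,V_{DD}}{I_{sat}}, &
P_{\mathrm{CMOS}} &\approx \frac{C\,V_{DD}^{2}}{t_{d,\mathrm{CMOS}}} = I_{sat}\,V_{DD}, &
PDP_{\mathrm{CMOS}} &\approx C\,V_{DD}^{2}.
\end{aligned}
```

Under these assumptions the ratio of the two power-delay products is V_DD/V, and for equal drive currents the ratio of the two delays is V_DD/V as well.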
Hence, the CMOS power-delay product, under conditions of continual dynamic load, is higher
than that of the MCML gate by a factor VDD/V, where V is the lower voltage swing of
the MCML system. In effect, the MCML circuit trades noise margin for a significantly
improved power-delay product. If, for instance, the MCML tail current Itail is equal to the
PMOSFET (NMOSFET) saturation current in the CMOS gate, one can compare
the delay in both logic styles.
$$\frac{t_{d,\mathrm{CMOS}}}{t_{d,\mathrm{MCML}}} = \frac{V_{DD}}{V} = \frac{1.2}{0.2} = 6$$
The value of V selected above is the minimum voltage swing required to make the
NMOSFET differential pairs in the MCML gate switch properly between on and off. The
example above clearly shows that the delay is larger in CMOS than in
MCML logic, and hence that the achievable operating frequency is higher in MCML. Thus, considering only
the delay (or frequency) parameter and ignoring the others, the MCML logic style is more
suitable than CMOS logic for our particular high-speed application [82]. Another
interesting point of comparison between the two logic styles is the fluctuation of the common
supply lines during bit transitions in digital integrated circuits. Since the
voltage swing in MCML is much smaller than in its CMOS counterpart, the supply
line current fluctuations of, for example, a CMOS inverter are larger than those of an
MCML buffer. These reduced fluctuations in the common supply lines in turn decrease the
amount of jitter propagated throughout the integrated circuit.
Oscillators are an essential part of many electronic systems, and they are generally
embedded in PLL circuits. They have a wide range of applications, such as clock
generation in microprocessors and frequency synthesis in cellular telephones, which
require different oscillator topologies and performance parameters. In the following
sections, the analysis and CMOS design of oscillators, and more specifically of the VCO, will
be carried out.
The oscillator is a negative feedback system that has no input and produces a periodic output,
usually in the form of a voltage. Let us consider the unity-gain feedback system shown in
Figure 5-2, comprising an amplifier represented by its transfer function H(s). The closed-
loop gain of that system is then given by
$$\frac{V_{out}}{V_{in}}(s) = \frac{H(s)}{1 + H(s)} \tag{5.1}$$
Figure 5-2: Negative feedback system.
Under this condition, the circuit amplifies its own noise components at ω0 indefinitely. As
conceptually illustrated in Figure 5-3, a noise component at ω0 experiences a total gain of unity
and a phase shift of 180°, returning to the subtractor as a negative replica of the input.
Upon subtraction, the input and feedback signals produce a larger difference. Thus, the
feedback system continuously amplifies the noise component, generating a
periodic signal at ω0.
Figure 5-3: Oscillator and generation of periodic signal.
In summary, if a negative feedback circuit has an open-loop gain H(jω) that satisfies the
following two conditions:
$$|H(j\omega_0)| \ge 1, \qquad \angle H(j\omega_0) = 180^\circ \tag{5.2}$$
then oscillation may occur at ω0. The conditions described by Equation 5.2 are normally
called the Barkhausen criteria; these conditions are necessary but not sufficient [29]. To ensure
that oscillation starts in the presence of temperature and process variations, the open-
loop gain should be at least two to three times the required value. Oscillation
normally occurs when the total phase shift around the loop is equal to 360°; this total phase
shift is composed of two components, a low-frequency phase shift of 180° contributed by
the subtractor, and a frequency-dependent component of 180° introduced by the amplifier
transfer function H(jω). CMOS oscillators in today's technology are typically implemented
as ring or LC oscillators; we focus on the ring type, as it is used in our CDR system.
$$I_X = g_{m2}V_2 = -g_{m1}V_1$$
where gm1 and gm2 are the transconductances of transistors M1 and M2, respectively, and
$$V_X = V_1 - V_2 = -\frac{I_X}{g_{m1}} - \frac{I_X}{g_{m2}} = -I_X\left(\frac{1}{g_{m1}} + \frac{1}{g_{m2}}\right)$$
Thus, for gm1 = gm2 = gm,
$$Z_X = \frac{V_X}{I_X} = -\frac{2}{g_m} \tag{5.3}$$
Since gm > 0, ZX < 0. In other words, if the input voltage in Figure 5-5(a) is increased,
so is the source voltage of transistor M1, reducing the drain-source voltage of M2 and thus
the drain current of M2, which allows part of Ib to flow back to the input source
and hence reduces the current drawn from it. One negative-resistance-based oscillator design is shown in Figure
5-6(a). Here, Lp provides the bias current to M2 and Rp denotes the equivalent parallel
resistance of the tank; for oscillation to occur, Rp − 2/gm ≥ 0.
Figure 5-4: (a) Decaying impulse response of a tank, (b) addition of negative
resistance to cancel loss in Rp.
Figure 5-5: (a) Source follower with positive feedback to create negative
impedance, (b) equivalent circuit of (a).
Figure 5-6: (a) Single-ended and (b) differential negative-resistance-based oscillator.
Figure 5-7: (a) Oscillator and (b) its equivalent circuit.
A ring oscillator generally consists of a number of gain stages in a closed loop [75-77]. For
the proper operation of our CDR circuit, eight clock phases separated by 22.5° and their
complements are required; obtaining such clock phases requires a differential ring
oscillator comprising eight gain stages, as shown in Figure 5-8(a). To simplify the analysis,
let us consider the half-circuit equivalent depicted in Figure 5-8(b) and calculate the
minimum voltage gain that is necessary for oscillation to occur. In this oscillator
design, the eight gain stages are identical, and RD and CL represent the total
resistance and total capacitance seen at the output node of each gain stage.
Figure 5-8: Differential eight-gain-stage ring oscillator (a) and
(b) its half-circuit equivalent.
If the transfer function of each gain stage is H0(s), then the open-loop transfer function of
the eight gain stages is given by
$$H(s) = H_0(s)\,H_0(s)\cdots H_0(s) = \frac{A_0}{1+\frac{s}{\omega_0}}\cdot\frac{A_0}{1+\frac{s}{\omega_0}}\cdots\frac{A_0}{1+\frac{s}{\omega_0}} = \frac{A_0^8}{\left(1+\frac{s}{\omega_0}\right)^8}$$
where
$$\omega_0 = \frac{1}{2R_D\cdot\frac{C_L}{2}} = \frac{1}{R_D C_L}$$
Hence,
$$H(s) = \frac{A_0^8}{\left(1+\frac{s}{\omega_0}\right)^8} \tag{5.4}$$
Oscillation will start only if the total frequency-dependent phase shift equals 180°, that is, if
each stage contributes 22.5° (= 180°/8). The frequency at which this occurs is given by
$$\tan^{-1}\frac{\omega_{osc}}{\omega_0} = 22.5^\circ \tag{5.5}$$
The minimum voltage gain per stage must be such that the magnitude of the open-loop gain
at ωosc is equal to unity:
$$\frac{A_0^8}{\left[\sqrt{1+\left(\frac{\omega_{osc}}{\omega_0}\right)^2}\,\right]^8} = 1 \tag{5.7}$$
In summary, an eight-stage ring oscillator requires a low-frequency gain of about 1.1 per stage,
and it oscillates at a frequency of approximately 0.41ω0 (tan 22.5° ≈ 0.414), where ω0 is the -3 dB bandwidth of each stage.
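The two start-up conditions can be evaluated numerically for any number of identical single-pole stages. The sketch below assumes the single-pole stage model of Eq. 5.4 and splits the 180° frequency-dependent phase shift equally over the stages; the function name is hypothetical.

```python
import math


def ring_oscillator_conditions(n_stages: int):
    """Return (w_osc / w0, minimum low-frequency gain per stage).

    Each single-pole stage must contribute 180/N degrees of phase shift,
    and the loop gain magnitude at w_osc must be at least unity.
    """
    phase_per_stage = math.pi / n_stages            # 180 degrees shared by N stages
    w_ratio = math.tan(phase_per_stage)             # from atan(w_osc / w0) = 180/N
    a0_min = math.sqrt(1.0 + w_ratio ** 2)          # from |H(j w_osc)| = 1
    return w_ratio, a0_min


if __name__ == "__main__":
    w_ratio, a0_min = ring_oscillator_conditions(8)
    print(f"w_osc/w0 = {w_ratio:.3f}, A0_min = {a0_min:.3f}")
```

For N = 8 the minimum gain evaluates to about 1.08, which rounds to the 1.1 per stage quoted in the text.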
Figure 5-9: Waveforms of an eight-stage ring oscillator.
One practical implementation of the eight-stage ring oscillator is depicted in Figure 5-10
and is called the current-steering differential ring oscillator. If the gain per stage is
well above the minimum value (A0,min = 1.1), the amplitude grows until each differential pair experiences
complete switching, that is, until the current Is is completely steered to one side every half
cycle. As a result, the swing at the output node is equal to IsRD.
Figure 5-10: Differential current-steering ring oscillator and its waveforms.
If the number of stages is N and the delay per stage is TD, the circuit completes one
period of oscillation in a time equal to 2NTD and hence oscillates at a
frequency equal to 1/(2NTD). In summary, an eight-stage (N = 8) differential ring oscillator
has a small-signal oscillation frequency of approximately 0.41ω0 (Eq. 5.5) and a large-signal value
equal to 1/(16TD). Since ω0 is determined by the small-signal output resistance and
capacitance of each stage, whereas TD results from the large-signal, nonlinear current
drive and capacitance of each stage, the large-signal frequency is lower than the
small-signal one. In other words, the eight-stage ring oscillator starts oscillating at a
frequency of about 0.41ω0 but, as the amplitude grows and the circuit becomes nonlinear, the
frequency shifts to the lower value of 1/(16TD).
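The large-signal relation f = 1/(2NTD) can be turned around to estimate the per-stage delay needed for a target frequency. The following minimal sketch uses hypothetical helper names and applies the relation to the 2.5 GHz quarter-rate clock of this design.

```python
def ring_frequency_hz(n_stages: int, stage_delay_s: float) -> float:
    """Large-signal oscillation frequency of an N-stage ring: 1 / (2 * N * T_D)."""
    return 1.0 / (2.0 * n_stages * stage_delay_s)


def stage_delay_s(f_osc_hz: float, n_stages: int) -> float:
    """Per-stage delay T_D required for a target large-signal frequency."""
    return 1.0 / (2.0 * n_stages * f_osc_hz)


if __name__ == "__main__":
    t_d = stage_delay_s(2.5e9, 8)
    print(f"T_D = {t_d * 1e12:.1f} ps")  # eight stages at 2.5 GHz
```

An eight-stage ring running at 2.5 GHz therefore needs a delay of 25 ps per stage.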
5.3 Voltage-Controlled Oscillators
Figure 5-11: Definition of a VCO (b) ideal and, (c) real.
$$R_{on3,4} = \frac{1}{\mu_p C_{ox}\left(\frac{W}{L}\right)_{3,4}\left(V_{DD} - V_{cont} - |V_{thp}|\right)} \tag{5.10}$$
where Ron3,4 is the on-resistance of the PMOS transistors M3 and M4. Thus
$$\tau = R_{on3,4}\,C_L = \frac{C_L}{\mu_p C_{ox}\left(\frac{W}{L}\right)_{3,4}\left(V_{DD} - V_{cont} - |V_{thp}|\right)} \tag{5.11}$$
where CL represents the total capacitance seen from each output to ground, including the
input capacitance of the following stage. The total delay in the circuit is proportional to the
delay in each stage, hence
$$f_{osc} = \frac{1}{2NT_D} = \frac{\mu_p C_{ox}\left(\frac{W}{L}\right)_{3,4}\left(V_{DD} - V_{cont} - |V_{thp}|\right)}{2NC_L} \tag{5.12}$$
Eq. 5.12 shows that the frequency of oscillation fosc of an N-stage ring oscillator varies linearly
with the control voltage Vcont and is inversely proportional to the number of stages
N of the oscillator.
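The linear tuning characteristic of Eq. 5.12 can be illustrated with a short numerical sketch. All device values below (supply, threshold, µpCox, W/L and load capacitance) are hypothetical placeholders chosen only for illustration and are not taken from the design.

```python
def vco_frequency_hz(v_cont: float, *, v_dd: float = 1.2, v_thp: float = 0.3,
                     mu_cox: float = 100e-6, w_over_l: float = 20.0,
                     c_load: float = 20e-15, n_stages: int = 8) -> float:
    """Oscillation frequency from Eq. 5.12 for a PMOS triode load.

    Default parameter values are illustrative assumptions, not design data.
    """
    overdrive = v_dd - v_cont - abs(v_thp)
    if overdrive <= 0:
        raise ValueError("PMOS load is off: V_DD - V_cont - |V_thp| must be positive")
    return mu_cox * w_over_l * overdrive / (2.0 * n_stages * c_load)


if __name__ == "__main__":
    for v in (0.2, 0.4, 0.6):
        print(f"Vcont = {v:.1f} V -> fosc = {vco_frequency_hz(v) / 1e9:.3f} GHz")
```

Equal steps in Vcont produce equal steps in frequency, and the frequency falls as Vcont rises because the overdrive of the PMOS loads shrinks.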
$$R_{eq} = R_N \,\|\, R_P = \frac{R_N R_P}{R_N + R_P}$$
If, for example, |RN| > |RP|, then Req is less negative and therefore has a higher value.
This concept can be used in each stage of a ring oscillator, as illustrated in Figure 5-12(b).
As Icc increases, the small-signal differential resistance -2/gm3,4 becomes less negative and,
from the half circuit of Figure 5-12(c), the equivalent resistance
$$R_{eq} = R_P \,\|\, \left(-\frac{1}{g_{m3,4}}\right) = \frac{R_P}{1 - g_{m3,4}R_P}$$
increases, thereby lowering the frequency of oscillation.
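The effect of the cross-coupled pair on the load can be checked numerically. The sketch below evaluates Req = RP/(1 - gm3,4·RP) for a few transconductance values; the resistance and transconductance numbers are hypothetical, and the function name is an assumption for illustration.

```python
def equivalent_load_resistance(r_p: float, g_m34: float) -> float:
    """Half-circuit load: R_P in parallel with -1/g_m3,4, i.e. R_P / (1 - g_m3,4 * R_P)."""
    loop = g_m34 * r_p
    if loop >= 1.0:
        raise ValueError("g_m3,4 * R_P must stay below 1 for a positive, finite load")
    return r_p / (1.0 - loop)


if __name__ == "__main__":
    r_p = 1000.0  # ohms, illustrative value
    for g_m in (0.0, 0.25e-3, 0.5e-3):  # siemens, illustrative values
        print(g_m, equivalent_load_resistance(r_p, g_m))
```

A larger gm3,4 (larger Icc) gives a larger equivalent resistance, hence a longer time constant and a lower oscillation frequency.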
Figure 5-12: (a) Tuning with voltage-variable resistors, (b) differential stage with variable
negative resistance load, (c) half-circuit equivalent of (b).
A drawback of the circuit of Figure 5-12 is that as Icc varies, so does the current steered by
the pair M3-M4 through R1 and R2. Thus, the output voltage swing is not constant across the
tuning range. To reduce this effect, Is can be varied in the opposite direction to Icc such that
the total current steered between R1 and R2 remains constant. In other words, it is
preferable to vary Icc and Is differentially while their sum is fixed; this property is normally
provided by a differential pair. As illustrated in Figure 5-13, the idea is to use the
differential pair M5-M6 to steer IT between the two pairs M1-M2 and M3-M4 such that the
expression IT = Is + Icc always holds. Since IT must flow through R1 and R2, if M1-M4
experience complete switching in each cycle of oscillation, then IT is steered to R1 (through
M1 and M3) in one half of the period and to R2 (through M2 and M4) in the other half, giving a
differential swing of 2RpIT. The control voltages Vcont1 and Vcont2 in the circuit of Figure 5-
13 can be viewed as differential control lines if they vary by equal and opposite amounts.
A differential topology normally provides higher noise immunity for the control input than a
single-ended Vcont.
As Vcont2 increases and Vcont1 decreases, the transconductance of the cross-coupled pair
increases, increasing the time constant and hence reducing the frequency of oscillation. A
drawback of the circuit in Figure 5-13 arises when the current IT is completely steered by M6
through the pair M3-M4: since the pair M1-M2 then carries no current at all, the gain of
each stage falls to zero, preventing oscillation. To avoid
this situation, a small constant current Ibias is added to the pair M1-M2, thereby ensuring that M1
and M2 always remain on. We calculate the minimum value of Ibias required in Figure 5-13
to guarantee a low-frequency gain of 1.1 (for N = 8) when all of IT is steered to the cross-
coupled pair M3-M4. Using the small-signal gain of the circuit of Figure 5-13 given in [29], this requires
$$I_{bias} \ge \frac{1.21\left[1 - R_p\sqrt{\mu_n C_{ox}\left(\frac{W}{L}\right)_{3,4} I_T}\,\right]^2}{\mu_n C_{ox}\left(\frac{W}{L}\right)_{1,2} R_p^2} \tag{5.13}$$
Figure 5-13: Differential pair used to steer current between M1-M2 and M3-M4.
Before presenting our novel quarter-rate early-late PD (ELPD), we briefly explain the
concept of the full-rate ELPD (i.e. the clock frequency is equal to the data rate) that was originally
proposed by Alexander [24]. Figure 5-14 illustrates the concept of the early-late detection
method. Using three data samples taken by three consecutive clock edges, the PD can
determine whether a data transition is present, and whether the clock leads or lags the data.
In the absence of data transitions, all three samples are equal and no action is taken. If the
clock leads (it is early), the first sample, S1, is unequal to the last two (i.e. S2 and S3).
Conversely, if the clock lags (it is late), the first two samples, S1 and S2, are equal but
unequal to the last one, S3. Thus, S1 ⊕ S2 and S2 ⊕ S3 provide the early-late information.
Based on the above observations, Table 5-1 and Figure 5-14 can be constructed.
S1   S2   S3   Y = S1 ⊕ S2   X = S2 ⊕ S3   Detection (Action)
Table 5-1: Truth table representing all states of the Alexander ELPD.
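The decision rule described above can be sketched in a few lines. This is a behavioural illustration only, not the transistor-level circuit; the function name and the returned labels are hypothetical, and the case in which both XOR outputs are high is not covered by the description above, so it is reported here as "ambiguous" by assumption.

```python
def alexander_elpd(s1: int, s2: int, s3: int) -> str:
    """Classify three consecutive samples using Y = S1 xor S2 and X = S2 xor S3."""
    y = s1 ^ s2
    x = s2 ^ s3
    if y and not x:
        return "early"          # S1 differs from S2 = S3: clock leads the data
    if x and not y:
        return "late"           # S1 = S2 differs from S3: clock lags the data
    if not x and not y:
        return "no transition"  # all samples equal: no action
    return "ambiguous"          # assumption: both XORs high is treated as undecided


if __name__ == "__main__":
    for samples in [(0, 1, 1), (1, 1, 0), (0, 0, 0), (0, 1, 0)]:
        print(samples, alexander_elpd(*samples))
```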
Figure 5-14: (a) Three-point sampling of data by clock, and (b) an Alexander ELPD.
UP1 = D0 ⊕ D45, UP2 = D90 ⊕ D135, UP3 = D180 ⊕ D225, and UP4 = D270 ⊕ D315.
In order to simplify the charge pump circuit design, the signals UP1-UP4 and DN1-DN4 are
serialized using the clock phases 22.5°, 67.5°, 112.5°, 157.5°, 247.5°, and 292.5°, as
illustrated in Figure 5-15. When the CDR is in the locked state, the half-quadrature clock
signal edges are aligned with the data transitions and, hence, D0, D90, D180 and D270 will be
the recovered demultiplexed data.
Figure 5-15: (a) Block diagram of the proposed quarter-rate
ELPD, and (b) its operation.
Figure 5-16: Timing diagram for (a) slow and fast data, (b) state representation and,
(c) finite state diagram.
In this work, we propose a quarter-rate DQFD [34, 64]. The proposed architecture
comprises eight DFFs, two XOR gates, and combinational logic, as shown in Figure 5-17.
The combinational logic truth table of the proposed quarter-rate DQFD is shown in Table
5-1. Clocks 0°, 22.5°, 45° and 67.5° are first sampled by the input data; each half of a clock
period (i.e. 200 ps) is divided into four states, I, II, III, and IV, as shown in Figure 5-16(b).
In the proposed DQFD, four DFFs triggered by the rising and falling edges of the clock 0°
store the sampled values and record the states. The arrow in Figure 5-16(b) represents the
rising or falling edge of the clock 0°, which appears at the boundary between the states IV and I.
The operating principle of the proposed quarter-rate DQFD is as
follows. For a slow periodic data stream, as shown in Figure 5-16(a), suppose that the
first rising edge of the data appears at the boundary between the states III and IV. Then the
second rising edge crosses the boundary between the states IV and I and appears in state I.
The state transition from state IV to state I is detected. This state transition
indicates that the clock is faster than a quarter of the data rate, and frequency-down pulses are
generated. For a fast periodic data stream, as shown in Figure 5-16(a), the first rising edge
appears at the boundary between the states I and II. Then the second rising edge crosses the
boundary between the states IV and I and appears in state IV. This state transition
indicates that the clock is slower than a quarter of the data rate, and frequency-up pulses are
generated. Truth table 5-1 represents the state transitions of the proposed quarter-rate
DQFD.
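The two boundary crossings described above can be captured in a small behavioural sketch. It encodes only the IV-to-I and I-to-IV transitions stated in the text; treating every other transition as "no pulse" is an assumption made for illustration, and the function name and labels are hypothetical.

```python
def dqfd_decision(prev_state: str, curr_state: str) -> str:
    """Map a state transition between consecutive data edges to a frequency pulse."""
    valid = {"I", "II", "III", "IV"}
    if prev_state not in valid or curr_state not in valid:
        raise ValueError("states must be one of I, II, III, IV")
    if (prev_state, curr_state) == ("IV", "I"):
        return "DOWN"   # clock faster than a quarter of the data rate
    if (prev_state, curr_state) == ("I", "IV"):
        return "UP"     # clock slower than a quarter of the data rate
    return "NONE"       # assumption: other transitions generate no pulse


if __name__ == "__main__":
    print(dqfd_decision("IV", "I"))
    print(dqfd_decision("I", "IV"))
    print(dqfd_decision("II", "III"))
```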
Q7 Q8 10 11 01 00
State IV (00) UP UP X X
To find the required combinational logic circuit, we write the equations describing the
frequency up and down pulses. From Table 5-3:
From the above equations, the implementation of the required combinational logic circuit
is shown in Figure 5-17.
Figure 5-17: Schematic of the proposed quarter-rate DQFD.
5.6 Charge-Pump Principle
Figure 5-18: Charge pump and its output signal in conjunction with a periodic
signal based phase and frequency detector.
5.7 Charge-Pump and Loop Filter Circuit Design
The common mode voltage of Vup and Vdown is compared to a reference voltage Vref by the
comparator. If the common mode voltage level is increased, the drain currents of
transistors M7 and M8 are decreased and the common mode voltage is pulled up by the
current source Ic.
Figure 5-19: Schematic of the charge pump and loop filter.
Chapter 6 PLL-Based CDR Circuit Implementation and Simulations
The design and transistor-level simulation results of a novel architecture for a PLL-based
clock and data recovery (CDR) circuit are presented in this chapter. The proposed PLL-
based CDR is a referenceless quarter-rate design (i.e., the clock frequency is a quarter of the
input data rate) comprising a novel quarter-rate phase detector and a novel quarter-rate
frequency detector, and it can be used in a deserializer as part of the Serializer/Deserializer
(SerDes) device usually utilized in inter-chip communication networks [34]. The proposed
CDR circuit is designed in a standard 0.13 µm CMOS technology and simulated at
transistor level to verify its accuracy as well as to evaluate its characteristics and
performance.
For proper operation of the phase and frequency detectors, eight clock signals and their
complements (separated by 22.5°) are required. Owing to its wide tuning range, an eight-stage
ring oscillator structure was chosen. As shown in Figure 6-1, the VCO consists of eight
stages, each comprising a delay cell and a control circuit that generates the
differential control voltages Vinc and Vdec for the delay cell. The control signals Vinc and
Vdec can be viewed as differential control lines and hence provide higher noise immunity
for the VCO control input. The dimensions of transistor M7 and the voltage at its gate,
Vbias, should be carefully adjusted so that proper VCO gain, linearity and tuning range
are obtained. The tuning technique in this architecture has already been described in 5.3.2 and
is based on the concept of bias-current-controlled negative resistance [64]. As the bias current
of the cross-coupled pair of transistors (M3 and M4) increases, their negative small-signal
resistance becomes less negative; hence the total resistance seen at the output nodes out
and outb increases, thereby lowering the oscillation frequency. The eight clock signals
generated by the VCO are shown in Figure 6-2(a).
Figure 6-1: The eight-stage voltage-controlled ring oscillator.
In summary, and based on the post-layout simulation results, the proposed VCO has the
following features:
The generated clock signals are differential, which gives good supply and
substrate noise rejection and yields a 50% duty cycle in the oscillating signals.
Figure 6-2: Post-layout simulation, (a) the clock signals generated by the VCO
and, (b) the VCO's conversion gain.
Figure 6-5: The proposed quarter-rate early-late type phase detector
(D0, D90, D180 and D270 are the demultiplexed recovered data).
Figure 6-6: Phase detector output for two input signals that are 10 ps out of phase.
Figure 6-8: Architecture of the proposed frequency detector.
To determine the operating range of the proposed frequency detector, we apply two
periodic signals to its inputs. One of them is considered as a reference and has a constant
quarter-rate frequency (2.5 GHz), and the other signal is swept in frequency at a constant
rate of 5 MHz/ns, starting from 9 GHz and stopping at 11 GHz. The transfer curve of the
proposed frequency detector is illustrated in Figure 6-10. It exhibits a 1 GHz operating
range around the nominal frequency of 10 GHz.
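The sweep parameters fix the length of the transient simulation needed to cover the whole frequency range. The following short sketch of that arithmetic uses hypothetical helper names.

```python
def sweep_duration_ns(f_start_ghz: float, f_stop_ghz: float, rate_mhz_per_ns: float) -> float:
    """Time (ns) needed to sweep linearly from f_start to f_stop at the given rate."""
    return (f_stop_ghz - f_start_ghz) * 1000.0 / rate_mhz_per_ns


def swept_frequency_ghz(t_ns: float, f_start_ghz: float, rate_mhz_per_ns: float) -> float:
    """Instantaneous frequency (GHz) of the swept signal at time t."""
    return f_start_ghz + rate_mhz_per_ns * t_ns / 1000.0


if __name__ == "__main__":
    print(sweep_duration_ns(9.0, 11.0, 5.0))        # total sweep time in ns
    print(swept_frequency_ghz(200.0, 9.0, 5.0))     # frequency at mid-sweep in GHz
```

At 5 MHz/ns the 9 GHz to 11 GHz sweep takes 400 ns, and the swept signal crosses the nominal 10 GHz after 200 ns.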
The proposed quarter-rate PLL-CDR has been designed in UMC 0.13 µm CMOS
technology and simulated at transistor level using the schematic view of the CDR circuit
[64]. Since we are using a quarter-rate CDR topology, the input data rate should be
four times the VCO centre frequency. Based on the VCO schematic simulation
characteristic curve of Figure 6-12, the VCO centre frequency is about 5.5 GHz; therefore
the data rate should be about 22 Gb/s. As shown in Figure 6-13, the input data signal is a
PRBS (N=32) with a data rate of 21.85 Gb/s. The data rate is 160 MHz below the required
centre frequency of the VCO (i.e. 5.35 GHz). Figure 6-14(b) illustrates the transient
simulation results of the circuit locking process; the PLL reaches the steady state within
500 ns. As shown in Figure 6-14(a), once the desired frequency has been acquired, the
frequency detector is disabled and hence generates no outputs. Table 6-1 summarizes the
performance of the PLL-CDR circuit based on the schematic view simulation results.
Figure 6-13: Block diagram of the proposed quarter-rate PLL-Based CDR circuit.
Parameter      Simulation
PRBS           2^32 - 1
CDR power      97 mW
Based on the schematic view simulation results illustrated in Figure 6-14 (a) and (b), the
quarter-rate PLL-based CDR is a working concept. Although the schematic view of the
CDR circuit works at a data rate of around 22 Gb/s, the fabricated chip is expected to work
at about 10 Gb/s, because the VCO centre frequency is expected to be lower than the
schematic one owing to the parasitic capacitances and resistances associated with the
fabricated chip.
Figure 6-15: Layout of the complete PLL-Based CDR circuit and its constituting circuits.
As shown in Figure 6-15, the design occupies an area of 920 µm x 315 µm and is expected
to dissipate approximately 97 mW, excluding the output buffers, at a supply voltage of
1.2 V according to the transistor-level simulation results [64].
Chapter 7 Conclusion
7.1 Conclusions
Serial data communications are widely used in today's data communication systems, such
as fibre-optic and wireline communication links. They are also aggressively
replacing communication based on source-synchronous parallel links and
multi-bit parallel buses because they are more power and space efficient. Higher volumes of
transmitted data require ever higher bandwidth. CMOS technology is widely used
and highly desirable for monolithic implementation because of its low cost
and wide availability. The primary goal of this dissertation is to implement a new concept
of a clock and data recovery circuit in 130 nm CMOS technology for 10 Gb/s operation,
to model it with the Verilog-A language and ultimately to use it as part of the receiver in a
chip-to-chip serial link transceiver. Another advantage of the proposed concept is that the
serial data stream is inherently 1-to-4 demultiplexed.
Existing Gb/s clock and data recovery circuits are full-rate or half-rate,
reference-based or referenceless architectures. The proposed architecture is a
referenceless quarter-rate PLL-based clock and data recovery circuit. This means, first, that the
circuit does not require a reference clock signal, because the clock is generated internally by the
VCO and, second, that for a 10 Gb/s incoming data rate the internal parts of the circuit (i.e.
VCO, DFFs and primitive gates) actually work at a clock speed of 2.5 GHz.
Working at quarter rate relaxes the timing constraints of the dynamic elements and the
static gates, and reduces the dynamic power consumption resulting from the
switching activity in the circuits.
The proposed topology contains two loops operating independently, the phase-locked and
frequency-locked loops; the frequency detector is used for frequency acquisition only. Once
frequency lock is acquired (i.e. the clock frequency is equal to a quarter of the data rate), the
frequency detector is disabled and the phase detector takes over to properly adjust the
clock phase with respect to the data stream (i.e. the clock edges occur in the middle of the
data bit). When lock is lost, the frequency detector is automatically reactivated.
The proposed quarter-rate frequency detector has two advantages. First, because the
frequency detector is completely disabled once lock is acquired, it does not contribute
any jitter to the system. Second, because the gain, or operating range, of the frequency
detector is reasonably large, frequency acquisition is fast while the loop dynamics of the
phase-locked loop and the jitter performance of the system are not disturbed. In the
transistor-level simulations, the frequency detector demonstrated a detection range of 25%
of the data rate. The proposed phase detector is a symmetric, quarter-rate, nonlinear
design; because it is nonlinear, it has a large gain and is therefore suitable for Gb/s
data rates. An 8-stage differential ring oscillator was used for the voltage-controlled
oscillator (VCO). The differential architecture is widely used because it rejects noise
from both the power lines and the substrate. The 8-stage ring oscillator produces eight
phases and their complements, separated by 22.5°, which are required for proper operation
of the phase and frequency detectors. The chip was designed, simulated at transistor
level, modelled in the Verilog-A language and fabricated in the UMC 130 nm CMOS process.
The simulation results showed that the circuit has excellent performance in terms of
locking time (500 ns), silicon area and power consumption (97 mW). A short acquisition
time reduces the number of preamble or training bits required and results in higher
efficiency. Unfortunately, the fabricated chip did not work because the VCO did not
generate the clock signals required for proper operation of the phase and frequency
detectors. The VCO did not oscillate because the measured DC voltage level at its output
(0.2 V) was much lower than the simulated and expected value (0.8 V). Because the VCO is a
current-mode design, there is, between the power supply (VDD) and ground (GND), one load
resistor stacked on top of two stacked transistors. A low DC voltage level at the output
node below the load resistor turns the two transistors below it completely off and hence
prevents the VCO from oscillating.
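The figures quoted in this summary can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, integer units (ns and Gb/s) are used to avoid floating-point rounding, and treating the whole 500 ns locking time as preamble is an assumption that gives an upper bound rather than a measured value.

```python
def quarter_rate_clock_ghz(data_rate_gbps):
    """Clock frequency of a quarter-rate CDR for a given data rate."""
    return data_rate_gbps / 4


def phase_spacing_deg(stages):
    """Spacing between adjacent clock phases of a differential ring.

    Each differential stage provides one phase and its complement,
    so an N-stage ring yields 2*N phases across 360 degrees.
    """
    return 360 / (2 * stages)


def bits_during_lock(lock_time_ns, data_rate_gbps):
    """Bits received while the loop is still locking.

    1 Gb/s corresponds to 1 bit per nanosecond, so ns * Gb/s gives bits.
    """
    return lock_time_ns * data_rate_gbps


if __name__ == "__main__":
    print(quarter_rate_clock_ghz(10))   # 2.5 (GHz)
    print(phase_spacing_deg(8))         # 22.5 (degrees)
    print(bits_during_lock(500, 10))    # 5000 (bits)
```

Under these assumptions, a 500 ns locking time at 10 Gb/s corresponds to at most 5000 preamble bits, which is why a shorter acquisition time translates directly into higher link efficiency.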
In the semiconductor industry and in research, the most important figure of merit for any
new circuit design or system architecture is a working silicon implementation of the
proposed circuit. Although the proposed PLL-based clock and data recovery concept works in
transistor-level simulation and Verilog-A modelling, working silicon is still needed to
validate it. For a better chance of a successful implementation in the future, we propose
the following steps:
3. Implementing the new idea at a lower data rate (e.g. 1 Gb/s) using rail-to-rail
CMOS logic and, as far as possible, the primitive logic cells and
dynamic gates already available in the libraries provided by AMS or TSMC.
Using rail-to-rail logic alleviates the problem of proper biasing normally
encountered in current mode logic.
4. Once the concept is proved to work in silicon at a lower data rate using
rail-to-rail logic, we can move forward and implement the idea using
current mode logic for a higher data rate (e.g. 10 Gb/s).
124
References
2. K. Lee, S-J Lee, and H-J Yoo, SILENT: Serialized Low Energy Transmission
Coding for On-Chip Interconnection Networks, Proceedings of the 2004
IEE/ACM International Conference On Computer-Aided Design, 448-415, 2004.
4. N. McKewon et al., Tiny Tera: A Packet Switch Core, IEEE Micro, vol. 17, no.1,
26-33, Jan.-Feb. 1997.
10. A. Charlesworth, Starfire: Extending the SMP Envelope, IEEE Micro, vol. 18,
no. 1, 39-49, Jan.-Feb. 1998.
11. T. Takahashi et al. A CMOS Gate Array With 600 Mb/s Simultaneous
Bidirectional I/O Circuits, IEEE Journal of Solid-State Circuits, vol. 30, no. 12,
1544-1546, Dec. 1995.
12. K. Lee et al., A Jitter-Tolerant 4.5 Gb/s CMOS Interconnect for Digital Display,
IEEE International Solid-State circuits Conference Digest of Technical papers, 310-
311, Feb. 1998
13. L.I. Anderson et al., Silicon Bipolar Chipset for SONET/SDH 10-Gb/s Fiber-
Optic Communication Links, IEEE Journal of Solid-State Circuits, vol. 30, no. 3,
210-218, Mar. 1995.
125
14. Y.M. Greshishchev et al., A Fully Integrated SiGe Receiver IC for 10-Gb/s Data
Rate, IEEE Journal of Solid-State Circuits, vol. 35, no. 12, 1949-57, Dec. 2000.
15. M. Meghelli et al., SiGe BiCMOS 3.3-V Clock and Data Recovery Circuits,
IEEE Journal of Solid-State Circuits, vol. 35, no. 12, 1992-5, Dec. 2000.
17. S. Shioiri et al., A 10 Gb/s SiGe Framer/Demultiplexer for SDH Systems, IEEE
Journal of Solid-State Circuits Conference, 202-203, 1998.
19. Y.M. Greshishchev et al., A 60-dB Gain, 55-dB Dynamic Range, 10-Gb/s
Broadband SiGe HBT Limiter Amplifier, IEEE Journal of Solid-State Circuits,
vol. 34, no. 12, 1914-1920, Dec. 1999.
22. C. A. Sharp, A 3-State Phase Detector Can Improve Your Next PLL Design,
EDN, pp. 55-59, Sept. 1976.
25. R-J Yang et al., A 3.125-Gb/s Clock and Data Recovery Circuit for 10-Gbase-
LX4 Ethernet, IEEE Journal of Solid-State Circuits, vol. 39, 1356-1360, Aug. 2004.
26. J. Savoj and B. Razavi, A 10-Gb/s CMOS Clock and Data Recovery Circuit with
A Half-Rate Binary Phase/Frequency Detector, IEEE Journal of Solid-State
Circuits, vol. 38, 13-21, January 2003.
27. J.E. Rogers and R. J. Long, A 10-Gb/s CDR/DEMUX with LC Delay Line VCO
in 0.18-m CMOS, IEEE Journal of Solid-State Circuits, vol. 37, 1781-1789, Dec.
2002.
126
28. D. Richman, Color-Carrier Reference Phase Synchronization Accuracy in NTSC
Color-Television, Proc. IRE, vol. 22, 106-133, jan. 1954.
32. J. Savoj and B. Razavi, A 10-Gb/s CMOS Clock and Data Recovery Circuit with
Frequency Detection, IEEE Journal of Solid-State Circuits Conference Digest of
Technical Papers, 78-79, Feb. 2001.
33. G. Guetierrez et al., 2.485 Gb/s Silicon Bipolar Clock and Data Recovery IC for
SONET (OC-48), Proceedings of the Customs Integrated Circuits Conference,
575-578, May 1998.
35. J. Savoj and B. Razavi, A 10-Gb/s CMOS Clock and Data Recovery Circuit with a
Half-Rate Linear Phase Detector. IEEE JSSC, vol. 36, 761-767, May 2001.
37. C. J. Scheytt et al., A 0.155-, 0.622-, and 2.488-Gb/s Automatic Bit-Rate Selecting
Clock and Data Recovery IC for Bit-Rate Transparent SDH Systems. IEEE JSSC,
vol. 34, 1935-1943, Dec. 1999.
38. K. Irvani et al., Clock and Data Recovery for 1.25-Gb/s Ethernet Transceiver in
0.35-m CMOS. IEEE CICC, 261-264, 1999.
39. R. C. Walker et al., A Two-Chip 1.5GBd Serial Link Interface. IEEE JSSC, vol.
27, 1805-1811, Dec. 1992.
40. B. S. Anand and B. Razavi, A CMOS Clock Recovery Circuit for 2.5-Gb/s NRZ
Data, IEEE JSSC, vol. 36, 432-439, Mar. 2001.
127
41. Y. Qiu et al., 5-Gb/s 0.18- m CMOS Clock Recovery Circuit. IEEE Int.
Workshop VLSI Design & Video Tech., 21-23, May 28-30, 2005.
42. A. Rezayee and K. Martin, A 9-16 GB/s Clock and Data Recovery Circuit with
Three-State Phase Detector and Dual-Path Loop Architecture. ESSCIRC03.
43. T-S Chen et al., A 10Gb/s Clock and Data Recovery Circuit with Binary
Phase/Frequency Detector Using TSMC 0.35m SiGe BiCMOS Process, IEEE
Asia-Pacific conference on Circuit and Systems, Dec. 6-9, 981-984, 2004.
44. Razavi B., A 2.5-Gb/s 15-mW Clock Recovery Circuit. IEEE JSSC, vol. 31, pp.
472-480, April 1996.
45. F. Herzel, and B. Razavi, A Study of Oscillator Jitter Due to Supply and Substrate
Noise, IEEE Trans. Circuits and Systems, Part II, vol. 46, pp.56-62, 1999.
46. J. A. McNeill, Jitter in Ring Oscillator, IEEE JSSC, vol. 32, pp. 870-879, 1997.
47. D. H. Wolaver, Phase-Locked Loop Circuit Design, PTR Prentice Hall, 1991.
48. M. Mizuno et al., A GHz MOS Adaptive Pipeline Technique Using MOS Current-
Mode Logic, IEEE JSSC, vol. 31, pp. 784-791, June1996.
49. K. Irvani et. al., Clock and data Recovery for 1.25 Gb/s Ethernet Transceiver in
0.35 m CMOS, in Proc. IEEE Custom Integrated Circuits Conf., May 2001, pp.
261-264.
50. H.-T. Ng and D. J. Allstot, CMOS Current Steering Logic for Low-Voltage
Mixed-Signal Integrated Circuits, IEEE Trans. VLSI Syst., vol. 5, pp. 301-308,
Sep. 1997.
51. A. Tanable et. al., 0.18-m CMOS 10-Gb/s Multiplexer/Demultiplexer ICs Using
Current Mode Logic with Tolerance to Threshold Voltage Fluctuation, IEEE
JSSC, vol. 36, pp. 988-996, June 2001.
52. H.-D. Wohlmuth et. al., A High Sensitivity Static 2:1 Frequency Divider up to 19
GHz in 120 nm CMOS, in Proc. IEEE Radio Frequency Integrated Circuits
(RFIC) Symp., June 2002, pp. 231-234.
53. M. H. Anis and M. I. Elmasry, Self-Timed MOS Current Mode Logic for Digital
Applications, in Proc. IEEE Int. Conf. ASIC/SOC, 2002, pp. 193-197.
54. J. Musicer and J. Rabaey, MOS Current Mode Logic for Low Power, Low Noise
CORDIC Computation in Mixed-Signal Environments, Proc. ISPLPED, pp. 102-
107, July 2000.
128
55. M. W. Allam and M. I. Elmasry, Dynamic Current Mode Logic (DyCML), A New
Low-Power High performance Logic style, IEEE JSSC, vol. 36, pp. 550-558,
March 2001.
56. J. Rabaey, Digital Integrated Circuits: A Design perspective. Englewood Cliffs, NJ:
Prentice-Hall, 1996.
58. S. J. Song et. al., A 4-Gb/s CMOS Clock and Data Recovery Circuits Using 1/8-
Rate Clock Technique, IEEE JSSC, vol. 38, pp. 1213-1219, July 2003.
60. J. E. Rogers and J. R. Long, A 10 Gb/s CDR/DEMUX with LC Delay Line VCO
in 0.18-m CMOS, IEEE JSSC, vol. 37, pp. 1781-1789, May 2002.
61. J. Savoj and B. Razavi, A 10-Gb/s CMOS Clock and Data recovery Circuit with
Half-Rate Linear Phase Detector, IEEE JSSC, vol. 36, pp. 761-767, May 2001.
62. A. Pottbacker et. al., A Si Bipolar Phase and Frequency Detector IC for Clock
Extraction up to 8 Gb/s, IEEE JSSC, vol. 27, pp. 1747-1751, December 1992.
63. B. Stilling, Bit Rate and Protocol Independent Clock and Data Recovery,
Electron. Lett., vol. 36, pp. 824-825, April 2000.
65. C-C Kuo, Y-C Wang and C-N J. Liu An Efficient Approach to Build Accurate
Behavioural Models of PLL Designs, IEICE Transactions on Fundamentals of
Electronics, Communications and Computer Sciences, vol. E89-A, pp. 391-398,
February 2006.
66. L-X Liu, Y-T Yang, Z-M Zhu and Y. Li, Design of PLL System Based Verilog-
AMS Behavioural Models, IEEE Int. Workshop VLSI Design & Video Tech., pp.
67-70, May 2005.
67. T. Oura, Y. Hiraku, T. Suzuki, and H. Asai Modelling and Simulation of Phase-
Locked Loop with Verilog-A Description for Top-Down Design, IEEE Asia-
Pacific conference on Circuit and Systems, vol. 1, pp. 549-552, December 2004.
129
68. G. Balamurugan and N. Shanbhag, Modelling and Mitigation of Jitter in High-
Speed Source-Synchronous Inter-Chip Communication Systems. IEEE Computer
Society, Proceedings of the 21st International conference on computer Design
(ICCD03).
69. E. Yeung and Horowitz Mark A., A 2.4 Gb/s/pin Simultaneous Bidirectional
Parallel link with Per-Pin Skew Compensation, IEEE JSSC, vol. 35, pp. 1619-
1628, November 2000.
70. Y. Chen et. al., A Novel technique to Enhance the Negative resistance for Colpitts
Oscillators by Parasitic Cancellation, IEEE Conference on Electron Devices and
Solid-State Circuits, pp. 425-428, December 2007.
71. K. Mayaram, Output Voltage Analysis for the MOS Colpitts Oscillator, IEEE
Transactions on Circuits and Systems I, vol. 47, pp. 260-263, February 2000.
72. C.-Y. Cha and S.-G. Lee, A Complementary Colpitts Oscillator in CMOS
Technology, IEEE Transactions on Microwave Theory and Techniques, vol. 3, pp.
881-887, March 2005.
73. U. Yodprasit and C. C. Enz, Realization of Low-Voltage and low-Power Colpitts
Quadrature Oscillator, IEEE International Symposium on Circuits and Systems,
pp. 4289-4292, 2006.
74. J. Steinkamp et. al., A Colpitts Oscillator design for a GSM Base Station
Synthesizer, IEEE Radio Frequency Integrated Circuits (RFIC) Symposium, pp.
405-408, June 2007.
75. Y. A. Eken and J. P. Uyemura, A 5.9-GHz Voltage-Controlled Ring Oscillator in
0.18-m CMOS, IEEE JSSC, vol. 39, pp. 230-233, January 2004.
76. S.-J. Lee et. al., A novel high-speed ring oscillator for multiphase clock generation
using negative skewed delay scheme, IEEE JSSC, vol. 32, pp. 289-291, February
1997.
77. J.D. Van Der Tang et. al., A 9.8-11.5-GHz quadrature ring oscillator for optical
receivers, IEEE JSSC, vol. 37, pp. 438-442, March 2002.
78. A. Hajimiri and T. Lee, Design Issues in CMOS Differential LC Oscillators,
IEEE JSSC, vol. 34, pp. 717-724, May 1999.
79. P. Zhang, Design of CMOS LC Oscillators, International Conference on Solid-
State and Integrated Circuit Technology, pp. 1534-1537, October 2006.
80. J. Van Der Tang and A. Van Roermund, A 5.3 GHz phase shift tuned I/Q LC
oscillator with 1.1 GHz tuning range, IEEE MTT-S International Microwave
Symposium Diegest, vol. 1, pp. A133-A136, June 2003.
81. R. Dobkin et al., Parallel vs. Serial On-Chip Communication, CCIT TR674, EE
Pub No. 1631, EE Dept., Technion, December 2007.
82. J. Musicer and J. Rabaey, An Analysis of MOS Current Mode Logic for Low
Power and High Performance Digital Logic, Proceedings ISLPED, July 2000.
130