Project Report Manu Sir
(ELL749)
A Project Report
on
Design of 128K x 8 High Speed Full CMOS SRAM
Submitted by: -
SRAM is a semiconductor memory that uses a bi-stable latch circuit to store logic data 1 or 0.
It differs from Dynamic RAM (DRAM), which needs a periodic refresh operation to retain the
stored data. SRAM power consumption varies with the frequency of operation: at higher
frequencies it consumes very high power, like DRAM. The cache memory in a microprocessor
needs high-speed memory, so SRAM is used for that purpose. DRAM is normally used in the
main memory of processors, where density is given more importance than speed. The SRAM
is also used in industrial
In this project a 128Kx8 high-speed SRAM is designed using the memory banking
method in UMC 65nm technology. The pre-layout simulation of the critical path is
performed to obtain the delay of the circuit. All peripherals (pre-charge, row decoder,
word line driver, sense amplifier, column decoder/MUX and write driver) are designed, and
their layouts are drawn in an optimized manner so that they occupy the minimum area.
The 6T SRAM cell is designed and stability analysis is performed for a single SRAM cell.
The layout of the single SRAM cell is drawn symmetrically, so that two adjacent cells can
share the same contact, which reduces the area of the cell layout. The Static Noise Margin,
Read Noise Margin and Write Noise Margin of a single cell are found to be 540.91mV,
261.04mV and 571.69mV respectively, for a supply voltage of 1.2V. The effect of the
pull-up ratio and cell ratio on the stability of the SRAM cell is observed.
Table of Contents
1. INTRODUCTION
1.1 FEATURES
1.2 PROCESS TECHNOLOGY
1.3 DESIGN FLOW AND TOOLS USED
2. SRAM BANK DESIGN
5. SUMMARY
6. REFERENCES
1. INTRODUCTION
1.3 Design Flow and Tools Used
Layout Integration
Figure 4: Divided Wordline Architecture
Since this memory is intended for high-speed applications, a single
array of 128Kx8 would be large and would require gates with large
driving capacity. We therefore explored partitioning the whole memory
into banks. To decide how many banks to use, we performed a few
experiments and benchmarked their results against each other. After
comparison, we found that 8 memory banks give the optimal results
for our requirements.
In our memory design we have divided the 128K x 8 bit memory into
8 banks. Each bank has 16KB of memory, and 3 address lines are used
to switch between banks. Each 16KB bank is divided into 4 blocks, so
2 address lines are needed to switch between these blocks. Within each
block, 12 address lines are used. For a high-speed SRAM we take the
column size larger than the row size: in our design, 5 of these address
lines select the row and 7 select the column.
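The address partitioning described above can be sketched as a simple bit-field split. This is a minimal illustration, not the report's implementation: the field widths (3 bank, 2 block, 5 row, 7 column bits) come from the text, but the order of the fields inside the 17-bit address and the name `split_address` are assumptions.

```python
# Field widths taken from the text: 3 bank bits, 2 block bits, 5 row bits, 7 column bits.
# The ordering of the fields inside the 17-bit address is an assumption for illustration.
BANK_BITS, BLOCK_BITS, ROW_BITS, COL_BITS = 3, 2, 5, 7
ADDR_BITS = BANK_BITS + BLOCK_BITS + ROW_BITS + COL_BITS  # 17 bits -> 128K words


def split_address(addr):
    """Split a 17-bit word address into (bank, block, row, column)."""
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address out of range for a 128K x 8 memory")
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    block = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BLOCK_BITS) - 1)
    bank = addr >> (COL_BITS + ROW_BITS + BLOCK_BITS)
    return bank, block, row, col


words_per_block = 1 << (ROW_BITS + COL_BITS)    # 4096 words of 8 bits (4 KB)
words_per_bank = words_per_block << BLOCK_BITS  # 16384 words (16 KB)
total_words = words_per_bank << BANK_BITS       # 131072 words (128K)
```

The three derived counts confirm that 3 + 2 + 12 address lines cover exactly 128K words, with 16KB per bank as stated.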
2.3 6T SRAM BIT CELL
Further, we explored which bit cell out of 6T, 4T and 11T would be
suitable, and found that the 6T cell would be most appropriate for
low-power applications. The schematic diagram of the 6T bit cell is
shown below:
Figure 8: 6T Cell
To meet our high-speed design target, we tried different W/L ratios
and obtained the most optimized results with the following values.
AREA ESTIMATION: -
• w/l(PUN) = 100/60
• w/l(PDN) = 195/60
• w/l(Access Transistor) = 100/80
• Area of 1-Bit Cell = 1.355 µm (W) * 1.125 µm (H) = 1.524 µm²
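The figures in the list above can be cross-checked with a few lines of arithmetic. This is only a sketch: the W/L pairs are assumed to be in nm, the cell outline in micrometres, and the cell ratio is assumed to follow the conventional definition (pull-down W/L divided by access-transistor W/L), which the report does not spell out.

```python
# Dimensions quoted in the text; the W/L pairs are assumed to be in nm and the
# cell outline in micrometres (the text does not state the units of W/L).
W_PUN, L_PUN = 100, 60        # pull-up transistor
W_PDN, L_PDN = 195, 60        # pull-down transistor
W_ACC, L_ACC = 100, 80        # access transistor
CELL_W_UM, CELL_H_UM = 1.355, 1.125


def aspect(w, l):
    """Return the W/L aspect ratio of a transistor."""
    return w / l


cell_area_um2 = CELL_W_UM * CELL_H_UM
# Conventional definition (an assumption here): CR = (W/L)_pull-down / (W/L)_access.
cell_ratio = aspect(W_PDN, L_PDN) / aspect(W_ACC, L_ACC)

print(f"cell area  = {cell_area_um2:.3f} um^2")
print(f"cell ratio = {cell_ratio:.2f}")
```

With these sizes the cell ratio evaluates to 2.6, which agrees with the CR value quoted in the caption of the write margin plot.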
When designing a memory, we must also take care of the cell size, so
the layout of the 1-bit 6T cell must be drawn in the most optimized way.
The layout of the cell can be seen in the figure below:
The size of the 1-bit cell comes out to be 1.524 µm² with a core
efficiency of 65%.
Static noise margin is the maximum noise voltage VN that can come
simultaneously on both the voltage [...] time.
Temperature    Static Noise Margin
-40 ºC         589.16 mV
-6.5 ºC        574.61 mV
Figure 12: SNM Analysis
540.91 mV
Write Noise Margin
Write noise margin is the maximum noise voltage VN that can appear
simultaneously on both voltage sources (as shown in the section above)
without preventing the write operation from flipping the content of the
cell. The cell is put in write mode with the word line enabled and one
of the bit lines pulled to ground.
Temperature    Write Noise Margin
-40 ºC         604.07 mV
-6.5 ºC        592.88 mV
27 ºC          571.69 mV
56 ºC          563.36 mV
85 ºC          555.6 mV
Figure 13: Write Margin Plot for CR = 2.6 & PR = 1.11
Figure 14: SNM Analysis
Cell Ratio     Read Noise Margin
1.4            160.21 mV
1.8            197.40 mV
2.2            229.41 mV
2.6            261.06 mV
3.0            288.24 mV
Based on the VTCs, we define the read margin to characterize the
SRAM cell's read stability. The read margin increases with the cell
ratio, and it also changes with an increase in the pull-up ratio. The
SRAM cell inverters must therefore be sized carefully before
calculating the read noise margin of the cell in read operation.
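The dependence of read margin on cell ratio described above can be quantified with a least-squares line through the five tabulated points. This is a sketch under one assumption: the first column of the table is read as the cell ratio and the second as the read noise margin, which the 2.6 / 261.06 mV row suggests. The helper name `linear_fit` is illustrative.

```python
# (cell ratio, read noise margin in mV) pairs as tabulated in the report.
# Reading the first column as cell ratio is an assumption based on the
# 2.6 / 261.06 mV row matching the CR and RNM values quoted elsewhere.
data = [(1.4, 160.21), (1.8, 197.40), (2.2, 229.41), (2.6, 261.06), (3.0, 288.24)]


def linear_fit(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = linear_fit(data)
print(f"RNM ~ {slope:.1f} mV per unit CR + {intercept:.1f} mV")
```

On these points the fit gives a slope of roughly 80 mV per unit of cell ratio with a positive intercept, so the read margin rises steadily with cell ratio but not in strict proportion to it.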
3. DETAILED BLOCK
3.1 DECODERS
3.1.1 ROW DECODER
The row decoder is used to generate the wordline address for the
memory cluster. A hierarchical scheme has been used for the design
of the decoders.
The decoder block is divided into two stages:
Pre-decoder
Post-decoder
The row decoder is designed in such a way that it saves both area
and power and provides faster access. To ensure low power and speed,
hierarchical decoding is used and the driving strength of the gates is
increased in steps (tapered buffering). Negative body biasing is also
used to ensure very low leakage current. For faster access, NAND
gates are used rather than NOR gates. We divide row and column in
the ratio 5:7, which means that the row uses 5 address lines and the
column uses 7 address lines. Since we divide our 1 Mb memory into 8
banks, and these banks are further subdivided into 4 blocks, we
require three types of row decoder in total: 3x8 for bank select, 2x4 for
block select and 5x32 for row (wordline) selection within a block.
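The two-stage (pre-decoder plus post-decoder) scheme can be modelled behaviourally as below. This is a minimal sketch, not the report's circuit: the 2-bit/3-bit grouping of the five row-address bits and the names `one_hot` and `row_decoder_5x32` are assumptions made for illustration.

```python
def one_hot(value, width):
    """Return a one-hot list of length 2**width with a 1 at index `value`."""
    return [1 if i == value else 0 for i in range(1 << width)]


def row_decoder_5x32(row_addr):
    """Behavioural 5-to-32 decoder built from a 2x4 and a 3x8 pre-decoder.

    The 2/3 split of the five row-address bits is an assumption made for
    illustration; the report does not state how the bits are grouped.
    """
    if not 0 <= row_addr < 32:
        raise ValueError("row address must fit in 5 bits")
    pre_hi = one_hot(row_addr >> 3, 2)      # 2x4 pre-decoder on the upper 2 bits
    pre_lo = one_hot(row_addr & 0b111, 3)   # 3x8 pre-decoder on the lower 3 bits
    # Post-decoder: each wordline is the AND of one line from each pre-decoder
    # (a NAND followed by an inverting wordline driver in hardware).
    return [hi & lo for hi in pre_hi for lo in pre_lo]


wordlines = row_decoder_5x32(19)
print(wordlines.index(1), sum(wordlines))  # selected wordline index, number of active lines
```

Splitting the decode this way needs only 4 + 8 pre-decoded lines and 32 two-input gates, instead of 32 five-input gates, which is the usual motivation for hierarchical decoding.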
2x4 decoder: -
The 2x4 decoder is designed using NAND gates, and the layout of this
decoder can be seen in the figure above. We ran a MONTE CARLO
simulation to find the correct value of the delay. For this decoder we
get a mean delay of 89.2 ps.
Yield = 100 %
3x8 Decoder: -
The layout of the 3x8 decoder is shown in Figure 19, and from MONTE
CARLO simulation the mean delay is found to be 181.93 ps.
Yield = 100 %
5x32 Decoder: -
Yield = 100 %
3.1.2 COLUMN DECODER
3.2 PRECHARGE CIRCUIT
Pre-charging is done at the end of every clock cycle. Both static and
dynamic pre-charge schemes are used. Static precharge transistors are
used to keep the bit-lines from floating when the core is not in use;
they do not affect the normal operation of the core in any way. The
dynamic transistors are sized so that they can charge the bitline
capacitance within the allocated precharge budget. An equalization
transistor is also used to ensure that the BL and BL_BAR lines are
equal at the start of the read operation.
transistors in the middle are controlled by EN. When the EN signal is
at a low logic level, the circuit is in pre-charge mode and the bit line
and bit bar line are pulled to a high logic level. The middle transistor
between bitline and bitline bar acts as an equalizer. The layout of the
pre-charge circuit has been made taking care of the pitch of the SRAM
cell. To match the pitch of the dynamic transistor, the layout width is
restricted to the bit cell width. The bit and bit bar lines are perfectly
aligned with the bit cell lines.
3.3 WRITE DRIVER
The large bit line swing causes high power dissipation in a write
operation, whereas during a read operation the bit line voltage swing
is normally limited to 180mV; consequently the write cycle can
consume around 1/8th more power than a read operation. Before a
write operation, both bit lines are charged to the supply voltage, and
the write operation is performed by enabling the WR_EN signal. To
write logic 0 into the memory cell, the BB line is charged to the supply
voltage VDD and the BT line is discharged to the lower potential, i.e.
ground. The data placed on bit line BT and bit line bar BB is written
into the cell by enabling the word line. The transistors in the write
driver are sized quite large to provide a large driving current.
Figure 28: CONTROL PART
Two CMOS inverters (M1, M3 and M2, M4) form the main body of
this amplifier. They are cross-coupled to provide positive feedback,
so the amplifier switches very fast. Bitlines BL and BLN are not only
inputs but also outputs, so the full-swing output is conducted on the
bitlines. The SE signal is used to turn transistor M5 on and off to
control the state of the sense amplifier. Transistors M6-M8 constitute a
typical precharge circuit, in which M6 and M7 are used to precharge
BL and BLN to VDD, and M8, as a balancing transistor, ensures that
the two bitlines have the same initial voltage. This is necessary to
prevent the sense amplifier from switching unexpectedly when it starts
to amplify the signal. In the precharge stage, the bitlines are charged to
VDD by pulling down the PC signal; at the same time, transistor M8 is
turned on to eliminate the voltage difference between BL and BLN
caused by variation of the device threshold. At the end of the precharge
stage, M6-M8 are turned off, and then a word line is turned on to start
the read operation. After that, one of the bitlines is pulled down by the
selected storage cell. When the voltage difference between the two
bitlines is big enough, the SE signal goes from GND to VDD and
transistor M5 is turned on to enable the sense amplifier. Under the
effect of positive feedback (M1-M4), the bitline voltage is amplified to
CMOS levels quickly.
The sense amplifier shown in Fig. 30 has the characteristic that its
input is the same as its output, so the full-swing voltage conversion is
conducted on the bitlines. Since a large number of storage cells are
connected to each bitline in an SRAM, the parasitic capacitance of the
bitline is also large. The delay and power consumption of the
amplification process are therefore severe, and this kind of amplifier is
rarely used in SRAM.
Separating the input from the output is one of the methods to improve
the speed and reduce the power consumption of a sense amplifier. An
improved latch-type sense amplifier is shown in Fig. 31. Compared to
the amplifier shown in Fig. 30, transistors M9 and M10 are added as
switching PMOS devices to transfer the voltage difference from the
bitlines to the sense amplifier. The enable signal SE turns off M9 and
M10, so the large bitline capacitance is separated from the sense
amplifier. Inside the amplifier, the bitline capacitance then has little
impact on the speed of the circuit, and power consumption is reduced
significantly.
Figure 33: Layout of Sense Amplifier
information and operates on that wrong address. The ATD circuit is
used to avoid this problem. When any address signal changes, it
generates a pulse. During that period, an internal circuit of the
memory device is held in the non-operating condition in response to
the change in the address signal. The ATD circuit thus prevents the
memory from responding to wrong address information caused by
noise or timing deviation in the address signal.
The EXOR Circuit used in the above ATD is designed using DPL
Logic.
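The pulse generation described above can be illustrated with a discrete-time behavioural model. This sketch assumes a common ATD arrangement in which each address bit is XORed with a delayed copy of itself and the results are ORed together; the report confirms only that an EXOR circuit is used. The function name `atd_pulse` and the parameter `delay_steps` are hypothetical.

```python
def atd_pulse(address_samples, delay_steps=3):
    """Return a 0/1 pulse train for a list of sampled address words.

    Each address bit is XORed with a copy of itself delayed by `delay_steps`
    samples, and the per-bit results are ORed together. `delay_steps` is a
    hypothetical parameter standing in for the delay chain of the real circuit.
    """
    pulse = []
    for t, current in enumerate(address_samples):
        delayed = address_samples[max(t - delay_steps, 0)]
        # (current ^ delayed) has a 1 in every bit position that changed
        # within the last `delay_steps` samples; any such bit raises the pulse.
        pulse.append(1 if (current ^ delayed) != 0 else 0)
    return pulse


# Address changes from 0b0101 to 0b0111 at sample 4.
samples = [0b0101] * 4 + [0b0111] * 8
print(atd_pulse(samples))
```

In this model the pulse width equals the delay, which mirrors the idea that the ATD pulse width is set by the delay elements in the circuit.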
Figure 37: Output Waveform of ATD Circuit
The pulse width of the ATD, designed considering the worst-case
delay of each circuit, comes out equal to 823.3 ps. The Monte Carlo
simulation shows a yield of 100% within 6 sigma limits.
3.6: Monte Carlo Simulation of Noise Margins
3.6.1: Read Noise Margin:
RNM= 261.06 mV
SD = 0.9420 mV
3.6.3: Static Noise Margin:
SNM= 540.91 mV
SD = 0.9603 mV
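The spread implied by the reported means and standard deviations can be worked out directly. This is a sketch only: it assumes that a 6 sigma window means the mean plus or minus six standard deviations, and applying that window to the noise margins is an illustration rather than something the report states. The name `sigma_window` is hypothetical.

```python
# Mean and standard deviation (mV) from the Monte Carlo results in the report.
results = {
    "RNM": (261.06, 0.9420),
    "SNM": (540.91, 0.9603),
}


def sigma_window(mean, sd, k=6):
    """Return (low, high) = mean -/+ k standard deviations."""
    return mean - k * sd, mean + k * sd


for name, (mean, sd) in results.items():
    low, high = sigma_window(mean, sd)
    print(f"{name}: {low:.2f} mV to {high:.2f} mV  (sd/mean = {100 * sd / mean:.2f} %)")
```

Both standard deviations are below 1 mV, so even a six-sigma excursion moves either margin by less than 6 mV.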
4: Layout of Memory Array & Timing Analysis
4.1: 16x16 Memory Array
4.2: 32x32 Memory Array
VDD VSS
Tapping
4.3: Timing Circuit Analysis
The output of the ATD is divided into Read, Write and Precharge
signals with the help of the WE signal. At every address transition, the
ATD circuit generates a pulse for the read or write operation. When the
Write Enable signal is low, it generates a Write pulse, and for that
duration it turns off the precharge circuit by making the Precharge
signal high. When the Write Enable signal is high, it generates a Read
pulse, and for that duration it likewise turns off the precharge circuit by
making the Precharge signal high. The Read pulse also drives the
enable input of the sense amplifier.
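The steering of the ATD pulse by WE can be written as a small truth-table model. This is a behavioural sketch of the description above, not the actual gate-level timing circuit; the function name `timing_signals` and the dictionary keys are illustrative.

```python
def timing_signals(atd_pulse, write_enable):
    """Derive control signals from the ATD pulse and the WE input (all 0/1).

    Follows the description in the text: WE low selects a write pulse, WE high
    selects a read pulse, and the precharge signal is driven high (precharge
    off) for the duration of either pulse. Signal names are illustrative.
    """
    write_pulse = atd_pulse & (1 - write_enable)
    read_pulse = atd_pulse & write_enable
    precharge = write_pulse | read_pulse      # high = precharge circuit off
    sense_enable = read_pulse                 # read pulse also enables the sense amplifier
    return {
        "write": write_pulse,
        "read": read_pulse,
        "precharge": precharge,
        "sense_enable": sense_enable,
    }


for atd in (0, 1):
    for we in (0, 1):
        print(atd, we, timing_signals(atd, we))
```

When no ATD pulse is present, every output is 0, so the precharge circuit stays active between accesses.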
4.4: Critical Path Simulation
4.4.1: Read Critical Path Simulation
4.4.2: Write Critical Path Simulation
5. SUMMARY
Sections 2.1 and 2.2 describe the abstraction and characterization of
the SRAM 16KB bank. This is done in order to design the SRAM 1Mb
block, which uses eight 16KB banks. The SRAM 1Mb RTL synthesis is
also discussed; it generates a gate-level netlist and gives an estimate
of the timing details and the worst path based on gate delays and wire
load models.
Sections 3.4 and 3.5 present the design of critical components, namely
the sense amplifier and the address transition detection circuit,
together with their simulation results. The MONTE CARLO simulation
shows that both circuits are robust against process and mismatch
variation. The yield is 100% within the 6 sigma limit.
6. REFERENCES
1. Amrutur, Bharadwaj S., and Mark A. Horowitz. "A Replica Technique for Wordline and
Sense Control in Low-Power SRAM's." IEEE Journal of Solid-State Circuits, vol. 33,
1998, pp. 1208-1219.
2. Seevinck, E., F. J. List, and J. Lohstroh. "Static-Noise Margin Analysis of MOS SRAM
Cells." IEEE Journal of Solid-State Circuits, vol. 22, no. 5, 1987, pp. 748-754.
6. Weste, Neil, and David Harris. CMOS VLSI Design: A Circuits and Systems
Perspective (4th Edition), 2010.
7. Chow, H. C., and S. H. Chang. "High Performance Sense Amplifier Circuit for Low
Power SRAM Applications." IEEE ISCAS'04, 2004, pp. 741-744.