IET Circuits Devices Syst - 2023 - Wang

The document describes a 1 kb 6T SRAM implemented using a 40 nm CMOS process that achieves 1.0 fJ energy per bit. It presents a design with single-ended cells that uses different supply voltages for read/write operations and standby to reduce standby power. Simulations and physical measurements on silicon prototypes demonstrate the low power performance.


ORIGINAL RESEARCH

IET Circuits, Devices & Systems

Received: 12 August 2022 | Revised: 28 November 2022 | Accepted: 22 December 2022

DOI: 10.1049/cds2.12141

A 1.0 fJ energy/bit single‐ended 1 kb 6T SRAM implemented using 40 nm CMOS process

Chua‐Chin Wang1,2 | Ralph Gerard B. Sangalang1 | I‐Ting Tseng1 | Yi‐Jen Chiu3 | Yu‐Cheng Lin4 | Oliver Lexter July A. Jose1,5

1 Department of Electrical Engineering, National Sun Yat‐Sen University, Kaohsiung, Taiwan
2 Institute of Undersea Technology, National Sun Yat‐Sen University, Kaohsiung, Taiwan
3 Department of Photonics, National Sun Yat‐Sen University, Kaohsiung, Taiwan
4 Department of Engineering Science, National Cheng Kung University, Tainan, Taiwan
5 Department of Electronics Engineering, Batangas State University‐The National Engineering University, Batangas City, Philippines

Correspondence
Chua‐Chin Wang, Department of Electrical Engineering, National Sun Yat‐Sen University, No. 70, Lian‐Hai Rd., Kaohsiung City 80424, Taiwan.
Email: [email protected]
Yu‐Cheng Lin, Department of Engineering Science, National Cheng Kung University, No. 1, University Rd., Tainan City 70101, Taiwan.
Email: [email protected]

Funding information
The National Science and Technology Council of Taiwan, Grant/Award Numbers: MOST109‐2218‐E‐110‐007, 108‐2218‐E‐110‐011, 108‐2218‐E‐110‐002, 107‐2218‐E‐110‐002, 110‐2221‐E‐110‐063‐MY2; National Applied Research Laboratories

Abstract
An ultra‐low‐energy SRAM composed of single‐ended cells is demonstrated on silicon in this investigation. More specifically, the supply voltages of cells are gated by wordline (WL) enable, and the voltage mode select (VMS) signals select one of the corresponding supply voltages. A lower voltage is selected to maintain the stored bit state when cells are not accessed, lowering the standby power. When a cell is selected (i.e. WL is enabled) to perform the read or write (R/W) operations, the normal supply voltage is used. A 1‐kb SRAM prototype based on the single‐ended cells with built‐in self‐test (BIST) and power‐delay product (PDP) reduction circuits was realised on silicon using 40‐nm CMOS technology. Theoretical derivations and simulations of all‐PVT‐corner variations are also disclosed to justify the low energy performance. Physical measurements of six prototypes on silicon show that the energy per bit is 1.0 fJ at the 10 MHz system clock.

KEYWORDS
digital integrated circuits, logic design, low‐power electronics, memory architecture, VLSI

1 | INTRODUCTION

Memory devices are known to be second only to CPU/MPU in terms of overall timing parameter performance of digital subsystems in electronic products. Memory will soon occupy 90% of the entire area in SOC (system on chip) according to the ITRS report [1]. Low‐power memory devices will cut the total power consumption of these items, especially those that are battery powered and portable. Unlike DRAM, which is widely used as the main memory mechanism, SRAM has been used in most CPUs/MPUs as cache devices to speed up the access of recently used data. The performance of SRAM, regardless of its usage, has a significant influence on the power dissipation, which affects the overall efficiency of the system. Thus, it is critical that SRAMs used in CPU/MPU have an energy saving feature to minimise their effect on its performance. A variety of SRAM designs have been reported in past decades. Three key design methods were presented, specifically for the energy‐saving and power demands of SRAMs:

1). Current‐mode sense amplification [2, 3]: Since CMOS technology has been scaled down very quickly, the bitline capacitance has become too large for an SRAM cell to drive. During read operations, a sense amplifier predetermines the output result by sensing the differential current on two bitlines, enabling high‐speed and low‐power operation. In this case, the bitline capacitance has less of an impact on the output delay.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is
properly cited.
© 2023 The Authors. IET Circuits, Devices & Systems published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.

IET Circuits Devices Syst. 2023;17:75–87. wileyonlinelibrary.com/journal/cds2 75


17518598, 2023, 2, Downloaded from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cds2.12141, Wiley Online Library on [02/08/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

2). Secondary supply [4]: Adding an extra supply voltage, higher than the nominal VDD, can improve the access speed of the SRAM at the expense of a higher energy cost. The energy or standby power of the cells that are not accessed is often ignored.
3). Current compensation [5, 6]: When the SRAM is turned on, leakage current is detected by a current compensation circuit in each bitline to inject an additional current into the associated bitline. Thus, the SRAM's access speed is enhanced, even if the leakage current is not reduced. This method has little energy‐saving benefit, especially in standby cells.

A 4T loadless SRAM has been reported for lower power usage, where high‐threshold voltage transistors are used in data latches and low‐threshold voltage transistors are used in bitline drivers [7], also called a P‐latch N‐drive 4T SRAM cell. Despite having self‐refreshing paths to keep the stored bit state, instability and read/write disturbance become a threat to the access operations. These threats are particularly strong when loadless designs lack any bitline isolation mechanism. Worse still, the weakening of the static noise margin (SNM), as mentioned in the works by Wang et al. [8], proved to be a hazard for such a cell structure. These SRAMs were found to be even more vulnerable when the supply voltage is lower. To resolve this issue, readout assist circuits were proposed for single‐ended SRAMs [9]. This breakthrough accelerated the research of non‐symmetrical R/W auxiliary circuit designs that are meant to provide disturbance isolation from bitlines, for example, the reports of Chen et al. [10]. The disturbance isolation design becomes even more critical if the SRAM is meant to be fabricated using advanced CMOS technology nodes (e.g. <100 nm), or the SRAM cell is operating near the subthreshold region. The SRAMs that have write‐assist loops were typical examples to demonstrate the disturbance‐free feature [10, 11]. The two examples, however, used the design methodology for symmetrical R/W, and hence cannot be applied to single‐ended SRAM cells. Other methods, such as the usage of asymmetric Schmitt‐trigger inverters, are also possible to improve cell performance [12].

In order to meet the low power dissipation demands for SRAMs implemented in advanced CMOS processes, a supply voltage gate‐control mechanism for every column of SRAMs is proposed in this work, wherein two supplies with different voltages are used and selected by WL (wordline) and associated signals to decrease the power dissipation while on standby. To compensate for the loss of R/W speed caused by reduced supply voltage, a voltage boost is given to the driving gate of the selected SRAM cells to provide speed and slew rate improvement. Detailed post‐layout simulations and physical on‐silicon measurements are demonstrated to justify the low power/energy feature. The proposed SRAM is fabricated using a typical 40‐nm CMOS process, where 1.0 fJ energy/bit is measured at a 10 MHz system clock with an access delay of 52 ns.

2 | LOW ENERGY SRAM WITH SINGLE‐ENDED CELLS

Referring to Figure 1, the proposed SRAM design consists of a memory array, a control circuit, a row and column decoder, a column select circuit, a built‐in self‐test (BIST) circuit, a power‐delay product (PDP) reduction circuit, and a VDD select circuit. The supply voltage of the SRAM cells is selected by the VDD select circuit. A pass‐transistor gate voltage boosting (PVB) circuit and an adaptive voltage detector (AVD) circuit make up the PDP reduction circuit. The functions of major signals in the proposed SRAM are summarised as follows:

1. Bit_Addr[4:0]: bitline addresses
2. Word_Addr[4:0]: wordline addresses
3. WR_EN: write/read enable (1/0)
4. Data_out, Data_in: data output and data input, respectively
5. CLK: system clock
6. BS: boost select
7. VMS: voltage mode select
8. BIST_Pass: BIST pass (or not)
9. BIST_EN: BIST enable

FIGURE 1 1‐kb SRAM with single‐ended cells system block diagram

2.1 | Single‐ended SRAM cell circuit analysis

Work on leakage reduction has been reported for 8T SRAM cells, where low‐Vth auto‐gating transistors minimise leakage at the expense of lower speed [13]. Meanwhile, in the works of Chen et al. [10], a 5T single‐ended cell was reported, which introduced a cell isolation mechanism to prevent noise interference. However, leakage current was a main issue, causing retention faults, particularly for advanced CMOS processes. This issue was not present in the works of Terada et al. [14], because they introduced a transistor in their design to act as a leakage bypass. This was at the expense of a larger area overhead.

All the SRAM cells discussed above still exhibit high standby power, since all of the idle cells are directly coupled to the regular supply. A new cell‐column structure was presented in our previous report [15], as presented in Figure 2. This was to reduce the standby power, and in turn the overall power dissipation, when most cells are not accessed. Though the cell with the associated power‐gated mechanism was described in the mentioned article, operation details, theoretical analysis, and the on‐silicon verification were never disclosed. The proposed design operates as follows:

1). In the event that any cell is being accessed, WLB goes low and WL goes high. Transistor M301 is then turned on, providing the regular VDD to the cells in the same column.
2). If none of the cells in the column is being accessed, WLB goes high and WL goes low. A lower supply voltage, VDD − Vthp(M306), is coupled to all cells' supply nodes in the same column. That is, the voltage supplied to the cells is decreased by the threshold voltage of transistor M306 to save power and still maintain the stored bit states.

Let us consider a typical CMOS process for our SRAM design vehicle. The supply voltage VDD is 0.8 V for the typical 40‐nm CMOS process such that the reduced supply voltage for those unaccessed cells becomes VDD − Vthp = 0.68 V. Referring to Figure 2, the current supplied through the low‐Vth PMOS devices is limited by the width thereof. Thus, auxiliary circuits driven by a VMS signal are needed to prevent possible R/W errors caused by the insufficient supply current.

• Access operation: WL = 1 and WLB = 0. VMS1 = VMS (= 0) or WLB = 0, such that M305 is turned on to supply extra current.
• Hold operation: WL = 0 and WLB = 1. VMS2 = VMS (= 0) or WL = 0, such that M310 and M315 are on to supply extra current.
• No auxiliary: VMS = 1. All of the auxiliary circuits are off. This is for the purpose of testing to validate the proposed power‐gated mechanism.

2.2 | SRAM cell transistor sizing

The transistor sizes of the SRAM cells are determined by the current that will pass through the transistors for every operation. To attain reliable Q and Qb with a symmetric feature, currents through M201 and M202 must be the same. Hence, (W/L)201 = (W/L)202. Transistor M206 will drain the current when Q = 0, so (W/L)206 is chosen to be the minimum size to have the lowest current passing through it. The write‐assist transistor M203 should be able to draw the same current passing through M201 when writing logic ‘1’. Transistors M203 and M204 are equally sized so that the ratio of M201 versus M203 equals that of M202 versus M204. Access transistor M205 is sized equal to M203 and M204, since the same current passes through them during a write operation.

To further decrease power consumption, transistors inside the cells are chosen to have high‐Vth, while the write‐assist and access transistors are chosen to have low‐Vth to have faster access and write operations.
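The column supply selection described in Section 2.1 can be summarised as a small behavioural model. This is only a sketch under simplifying assumptions: the function name `column_supply` and its return format are hypothetical, WLB is taken as the complement of WL, and the auxiliary devices are treated as enabled whenever VMS = 0. The voltages (0.8 V and 0.68 V) and device names (M305, M310, M315) are the ones quoted in the text.

```python
VDD = 0.80        # regular supply in volts, as quoted for the 40-nm process
VDD_HOLD = 0.68   # reduced hold supply, VDD - Vthp, as quoted in the text


def column_supply(wl: int, vms: int) -> dict:
    """Behavioural sketch of the per-column supply selection.

    wl  -- 1 when a cell in the column is accessed (WLB is assumed to be its complement)
    vms -- 0 enables the auxiliary current paths, 1 disables them (test mode)
    """
    accessed = wl == 1
    if vms == 1:
        aux = []                    # "No auxiliary" mode: all auxiliary circuits off
    elif accessed:
        aux = ["M305"]              # access operation: extra current through M305
    else:
        aux = ["M310", "M315"]      # hold operation: extra current through M310 and M315
    return {
        "supply_v": VDD if accessed else VDD_HOLD,
        "aux_devices_on": aux,
    }


if __name__ == "__main__":
    for wl in (1, 0):
        for vms in (0, 1):
            print(f"WL={wl} VMS={vms} ->", column_supply(wl, vms))
```

The model ignores transient behaviour and current limits; it only captures which supply level and which auxiliary devices the text associates with each mode.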

FIGURE 2 Proposed ultra‐low power 6T SRAM cell with power gating

2.3 | Analysis of power dissipation and area overhead

Every column of memory cells in the proposed design has a power‐gated mechanism. The biggest challenge is to optimise the size of the gating transistors to retain the correct operating region and reduce the power dissipation at the same time. The FS corner (fast NMOS, slow PMOS) might be a problem for this power‐gated mechanism, since slow PMOS devices provide weak current. The expected result is that it will be hard for Qb to stay high (as Q = 0). This current shortage can be overcome by analytically determining the minimal PMOS size for a given number of cells. Assume that every column has a total of n cells. Assume that Iact stands for the current needed by the cell being accessed, and Iidl denotes the current required by each idle cell. The power PMOS device's drain current should satisfy the following equation:

$$I_D \geq I_{act} + (n - 1) \times I_{idl} \tag{1}$$

If n = 32 and the implementation is on typical 40‐nm CMOS technology, the total required current for cells being accessed in a column is ID = 39.5 μA = 31.5 μA (1 cell accessed) + 31 × 245 nA (other 31 cells idle). According to the saturation current ID equation for MOS transistors, the minimal width of the power PMOS to supply such a current is 3.75 μm. Therefore, a total of 5 × 750‐nm PMOS transistors in parallel are used, which are M301–M305. Similarly, the number and the size of those power‐gated transistors for idle cells in Figure 2, namely M306–M315, can be determined as well.

Based on the proposed power‐gated mechanism to decrease the standby power for most of the cells, the layouts of the proposed ultra‐low power 6T SRAM cell and the power‐gated mechanism circuit are shown in Figures 3 and 4, respectively, where the area of the single cell is 1.8 × 2.1 μm2, and that of the power‐gated mechanism circuit is 2.8 × 3.6 μm2. Namely, the area overhead cost is $\frac{2.8 \times 3.6}{32 \times 1.8 \times 2.1} = 8.33\%$ for every 32 cells sharing 1 power‐gated mechanism circuit, if the area of wiring is not taken into account. If the length of the cell array is increased to 1024, the area penalty becomes only 0.26%, which is considered relatively small and negligible.

Figure 5 compares the layout size of the proposed cell with that of a conventional 6T cell. Both cells have similar transistor sizes, and the proposed cell occupies a 21% larger area than the conventional cell layout, since the proposed cell uses HVT and LVT devices such that the shared diffusion layout (as is done for the conventional cell) cannot be utilised to minimise the area of the cell.

The proposed ultra‐low‐energy 6T single‐ended SRAM cell design with the power‐gated mechanism does not need a sense amplifier (SA), like prior single‐ended SRAMs [10, 14]. In contrast, traditional SRAMs [13] need SAs to accelerate access operations, because they use the bitline and its complement together to access the data nodes. This is another reason why the proposed SRAM is expected to be more energy efficient.

2.4 | Read/write cycles

Read/write cycles of the proposed SRAM are described as follows. The read cycles are shown in Figure 6. The read operation in the cell is shown in Figure 7.
FIGURE 3 Layout of the proposed ultra‐low‐power 6T SRAM cell

FIGURE 4 Layout of the proposed power‐gated mechanism

FIGURE 5 Layout size comparison (a) proposed 6T cell; (b) conventional 6T cell
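The current budget of Eq. (1) and the area overhead figures in Section 2.3 can be re-derived with a few lines of arithmetic. The sketch below uses only the values quoted in the text; the function names are hypothetical. Note that evaluating Eq. (1) directly with 31.5 μA and 31 × 245 nA gives roughly 39.1 μA, slightly below the quoted 39.5 μA, so the quoted figure may include rounding or a small margin (this is an assumption, not something the text states).

```python
def required_drain_current(i_act: float, i_idl: float, n: int) -> float:
    """Eq. (1): current the power PMOS must supply for one accessed cell and n-1 idle cells."""
    return i_act + (n - 1) * i_idl


def parallel_fingers(total_width_nm: int, finger_width_nm: int) -> int:
    """Number of parallel devices needed to reach the total width (ceiling division)."""
    return -(-total_width_nm // finger_width_nm)


def area_overhead(gate_area_um2: float, cell_area_um2: float, n_cells: int) -> float:
    """Fraction of extra area when n_cells share one power-gating circuit (wiring ignored)."""
    return gate_area_um2 / (n_cells * cell_area_um2)


if __name__ == "__main__":
    i_d = required_drain_current(i_act=31.5e-6, i_idl=245e-9, n=32)
    print(f"Eq. (1) lower bound on I_D: {i_d * 1e6:.3f} uA")

    # 3.75 um total width built from 750-nm devices, as in the text (M301-M305).
    print("Parallel PMOS devices:", parallel_fingers(3750, 750))

    cell = 1.8 * 2.1   # single-cell area in um^2
    gate = 2.8 * 3.6   # power-gating circuit area in um^2
    for n in (32, 1024):
        print(f"n = {n}: area overhead = {100 * area_overhead(gate, cell, n):.2f} %")
```

Integer nanometre widths are used for the finger count to avoid floating-point rounding in the ceiling division.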

FIGURE 6 Read cycle timing diagram

FIGURE 7 Read operation

• Before any operation, predischarge will cause BLB to be grounded to prevent state “0” from being disrupted by leakage and noise.
• The matching decoders select the cell once the row and column addresses are available.
• WA and WL are then pulled high, turning M204 (M203) and M205 on. Regardless of Read1 or Read0, WAB is pulled low to turn M203 off. Qb will then be coupled to BLB through M205 and M204.

The write operation timing diagram is shown in Figure 8. The write‐0 and write‐1 operations are shown in Figures 9 and 10, respectively.

FIGURE 8 Write cycle timing diagram

FIGURE 9 Write‐0 operation

FIGURE 10 Write‐1 operation

• Write‐0: WA is set to low and WAB is then pulled high to turn transistors M203 and M204 on and off, respectively. Q is then pulled down to the ground using the predischarge signal.
• Write‐1: WA is pulled high to turn transistor M204 on. WAB is low to turn transistor M203 off. Predischarge then pulls Q high by pulling Qb down.
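The control-signal levels for each operation can be encoded as a lookup table to check which switches conduct. The sketch below takes the signal levels from Table 1 and the read/write descriptions; it assumes, as the bullets suggest, that M203 conducts when WAB is high, M204 when WA is high, and M205 when WL is high. The names `OPERATIONS` and `conducting_switches` are hypothetical.

```python
# Control-signal levels per operation, as listed in Table 1.
OPERATIONS = {
    "write1":  {"Predischarge": 1, "WL": 1, "WA": 1, "WAB": 0},
    "write0":  {"Predischarge": 1, "WL": 1, "WA": 0, "WAB": 1},
    "read":    {"Predischarge": 0, "WL": 1, "WA": 1, "WAB": 0},
    "standby": {"Predischarge": 1, "WL": 0, "WA": 0, "WAB": 0},
}


def conducting_switches(operation: str) -> set:
    """Return the write-assist/access transistors that are on for an operation.

    Assumption: each switch is active-high and driven by a single signal
    (M203 <- WAB, M204 <- WA, M205 <- WL).
    """
    sig = OPERATIONS[operation]
    gates = {"M203": sig["WAB"], "M204": sig["WA"], "M205": sig["WL"]}
    return {name for name, level in gates.items() if level == 1}


if __name__ == "__main__":
    for op in OPERATIONS:
        print(f"{op:8s} -> on: {sorted(conducting_switches(op))}")
```

Under these assumptions the standby row leaves all three switches off, which matches the isolation of the cell described for the hold state.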

Table 1 tabulates the overall R/W operation and related control signals of the proposed SRAM.

TABLE 1 Read and write operation

Signal         Write 1   Write 0   Read (1/0)   Standby
Predischarge   1         1         0            1
WL             1         1         1            0
WA             1         0         1            0
WAB            0         1         0            0
BL             1         1         1/0          1
BLB            0         0         0/1          0

2.5 | Hold/standby operation

The hold/standby operation is also shown as part of the read cycle in Figure 6. All the access transistors (M203, M204, and M205) are disconnected to isolate the memory cell. During this operation, the power gating circuit is also enabled to reduce the supply voltage of the inactive cells. This is shown in Figure 11. A reduced voltage of around 0.68 V will now be used to supply the cells. This ensures the low power standby operation of the cells.

Since low‐Vth access transistors are used in the design, it is important that these transistors do not leak in the worst process corner. A long transient simulation during hold operation is presented in Section 3 to show that the stored data will not droop in the worst‐case corner.

FIGURE 11 Standby/hold operation

2.6 | PDP reduction circuit

Aside from the power‐gating method used in the preceding sections for each column of cells, a power‐delay product (PDP, namely “energy”) reduction circuit is used to further minimise energy consumption in each R/W operation [11, 15, 16]. Referring to Figure 12, the adaptive voltage detector and pass‐transistor gate voltage boosting are the two sub‐circuits of the PDP reduction circuit [11]. For high‐speed access operations, when boost select (BS) is set to high, the AVD circuit gives a boost enable (Boost_EN). This changes the voltage supply of the cells to be accessed from VDD to V’DD (a voltage greater than VDD).

FIGURE 12 PDP reduction circuit

2.6.1 | Adaptive voltage detector (AVD)

Figure 13 presents the adaptive voltage detector circuit, which generates the boost enable signal for the pass‐transistor voltage boosting circuit. Once the BS signal is enabled, a common source amplifier (composed of M1301, M1302, and R1301) generates the VP0 signal. VP0 is then fed into the current‐starved inverter composed of M1304 and M1305, which has a precise switching voltage to adjust to slight variations in the BS signal. The inverter's output is then latched to keep track of whether the pass‐transistor gate boosting circuit has to be enabled or disabled. The Boost_EN signal will be high if the output of inv1303 and the latched voltage VP2 are both low.

FIGURE 13 AVD circuit

2.6.2 | Pass‐transistor gate voltage boosting (PVB)

The PDP reduction circuit initially operates in the waiting mode. It stays in the waiting mode as long as the AVD circuit has not yet finished the system voltage detection (the inverter switching voltage vs. VP0). The PDP reduction circuit enters the PDP reduction mode after exiting the waiting mode. The operation is as follows, as seen in Figure 14: C1401's top plate will be pulled down to the ground by inv1403, while the bottom plate will be pulled up to VDD by M1401. On the other hand, as soon as Boost_EN is pulled high, the PVB circuit starts to work. If WR_EN is high, meaning one of the SRAM cells is being accessed (read or write), the PVB circuit enters the voltage boosting mode, turning M1401 off. Then the top plate of C1401 is pulled to a higher voltage through the pull‐up circuit of inv1403. Now, the supply voltage V’DD is at a higher level than in the waiting mode, where V’DD = VDD − VDS1401. In the boosting mode, V’DD = VDD + ∆V, which in this 40‐nm CMOS process is 1.0 V. The illustrative timing diagram of the PVB circuit is presented in Figure 15.

FIGURE 14 Pass‐Transistor Voltage Boost (PVB) circuit

FIGURE 15 Gate drive boosting timing diagram

2.7 | Built‐in self‐test (BIST)

In every memory system, as shown in Figure 1, a BIST circuit is essential for high reliability. The BIST circuit block diagram is shown in Figure 16. It is composed of a pattern generator, a controller, and an output response analyser. The BIST circuit implements the control and output response pattern based on the March C− algorithm [17], which has moderate complexity and fault coverage. It tests for transition faults, address‐decoder faults, stuck‐at faults, coupling faults, data retention faults etc. The March C− algorithm, which is the most reliable of BIST circuits, has a complexity of 10 × N, where N specifies the memory size, which in this study is 1024. The March C− algorithm, as shown in Eqn. (2), is outlined as follows:

$$\{\Updownarrow(w0);\ \Uparrow(r0, w1);\ \Uparrow(r1, w0);\ \Downarrow(r0, w1);\ \Downarrow(r1, w0);\ \Updownarrow(r0)\} \tag{2}$$

where w represents a write access operation, r means a read operation, ⇕ represents either up or down count, ⇓ represents counting down, and ⇑ represents counting up. The timing diagram for the BIST is shown in Figure 17; it has two testing modes, a normal testing mode and a retention testing mode. A linear feedback shift register (LFSR) pseudo‐random number generator is used to generate the test patterns with a characteristic equation as follows:

$$f(x) = x^5 + x^4 + x^3 + x + 1 \tag{3}$$

FIGURE 16 BIST block diagram

FIGURE 17 BIST timing diagram

3 | CHIP IMPLEMENTATION AND MEASUREMENT

The prototype SRAM was implemented using a TSMC 40‐nm CMOS process. Figure 18 shows the layout of the entire system, with the floor‐plan at the right‐hand side, where the overall chip area is 0.276 mm2 and the core area is 0.031 mm2. The validation of the design is carried out in the following sections: post‐layout simulations and on‐silicon measurements.

FIGURE 18 Layout of the proposed 1‐kb SRAM

FIGURE 19 Static noise margin
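The BIST ingredients of Section 2.7, the March C− sequence of Eqn. (2) and the LFSR of Eqn. (3), can be sketched in software as follows. This is an illustrative model, not the authors' hardware: the `Memory` class, its stuck-at fault injection, and the Fibonacci-style LFSR wiring are assumptions made for the example. The recurrence s[n+5] = s[n+4] ⊕ s[n+3] ⊕ s[n+1] ⊕ s[n] is the one implied by taking Eqn. (3) as the characteristic polynomial.

```python
def lfsr_states(seed=(1, 0, 0, 0, 0)):
    """Yield successive 5-bit states for f(x) = x^5 + x^4 + x^3 + x + 1."""
    state = list(seed)
    while True:
        yield tuple(state)
        new_bit = state[4] ^ state[3] ^ state[1] ^ state[0]
        state = state[1:] + [new_bit]


def lfsr_period(seed=(1, 0, 0, 0, 0)):
    """Count steps until the LFSR returns to its (non-zero) seed state."""
    gen = lfsr_states(seed)
    first = next(gen)
    for steps, state in enumerate(gen, start=1):
        if state == first:
            return steps


class Memory:
    """Bit-addressable memory model with optional stuck-at faults (hypothetical helper)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = stuck_at or {}   # maps address -> forced bit value

    def __len__(self):
        return len(self.cells)

    def write(self, addr, bit):
        self.cells[addr] = self.stuck_at.get(addr, bit)

    def read(self, addr):
        return self.cells[addr]


def march_c_minus(mem):
    """Run the six March C- elements of Eqn. (2); return (passed, operation_count)."""
    n = len(mem)
    up, down = range(n), range(n - 1, -1, -1)
    elements = [
        (up,   [("w", 0)]),              # either direction; ascending chosen here
        (up,   [("r", 0), ("w", 1)]),
        (up,   [("r", 1), ("w", 0)]),
        (down, [("r", 0), ("w", 1)]),
        (down, [("r", 1), ("w", 0)]),
        (up,   [("r", 0)]),              # either direction; ascending chosen here
    ]
    ops, passed = 0, True
    for order, actions in elements:
        for addr in order:
            for kind, bit in actions:
                ops += 1
                if kind == "w":
                    mem.write(addr, bit)
                elif mem.read(addr) != bit:
                    passed = False
    return passed, ops


if __name__ == "__main__":
    print("LFSR period:", lfsr_period())
    print("Fault-free 1-kb array:", march_c_minus(Memory(1024)))
    print("Stuck-at-1 at address 5:", march_c_minus(Memory(1024, stuck_at={5: 1})))
```

For N = 1024 the operation count equals 10 × N, matching the complexity stated in the text, and a non-zero seed cycles through all 31 non-zero 5-bit states because the polynomial of Eqn. (3) is primitive.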

3.1 | All‐PVT‐corner post‐layout simulations

A post‐layout all process, voltage, and temperature (PVT) variations simulation is required before the system is fabricated on silicon. The corners used in the post‐layout simulations are as follows: 5 process variations (TT, SS, SF, FS, and FF), 3 supply voltage variations (0.9 × VDD, VDD, and 1.1 × VDD), and 3 temperature variations (0, 25, and 75) °C. Monte‐Carlo simulations were executed a total of 100 times. All tests performed showed the correct functionality of the system. It was also ensured that there is high speed access when the AVD circuit is activated.

FIGURE 20 Dynamic noise margin

Two of the most important measures for the quality of
SRAM operations are the static and dynamic noise margins.
The static noise margin of the proposed SRAM cell is assessed
by turning off the write‐assist loop and turning on M604, M605
in Figure 2. Then, the voltage on BLB is changed from logic
“0” (GND) to logic “1” (VDD). The curves of the induced
voltages on Q versus Qb are plotted, where the biggest square
in the generated transfer functions diagram is the SNM of this
cell. Figure 19 shows the static noise margin of the cells, with a worst case of 412.3 mV. The static noise margin has an asymmetrical shape, different from that of traditional differential SRAM cells, since the design employs a single‐ended architecture.
The dynamic noise margin (DNM) is assessed by applying pulses of varying amplitude and pulse width at node WL and checking whether the state of node Q is compromised. The dynamic noise margin of the designed SRAM cell is shown in Figure 20, which shows that VDD for the proposed SRAM cell to operate correctly can be as low as 0.3 V.
Figures 21 and 22 show all the PVT (process, voltage, and
temperature) corners simulations when the AVD is disabled
and enabled. It can be seen that when the AVD circuit is
enabled, the access speed of the SRAM has been improved.
Post‐layout simulation shows that the standby power goes down from 17.734 to 3.432 μW, an 80.65% reduction.

FIGURE 21 Simulations with AVD circuit disabled
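The corner set and the standby power reduction quoted in Section 3.1 can be reproduced with a short script. This is a sketch only: it assumes the nominal VDD of 0.8 V stated earlier for this process, and the variable and function names are hypothetical.

```python
from itertools import product

PROCESS = ["TT", "SS", "SF", "FS", "FF"]
VDD_NOMINAL = 0.8                                  # volts, nominal supply for the 40-nm process
SUPPLY_SCALE = [0.9, 1.0, 1.1]                     # 0.9 x VDD, VDD, 1.1 x VDD
TEMPERATURE_C = [0, 25, 75]

# Every combination of process, supply voltage, and temperature.
corners = [
    (proc, scale * VDD_NOMINAL, temp)
    for proc, scale, temp in product(PROCESS, SUPPLY_SCALE, TEMPERATURE_C)
]


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    print("Number of PVT corners:", len(corners))
    worst = [c for c in corners if c[0] == "SS" and abs(c[1] - 0.72) < 1e-9 and c[2] == 0]
    print("Worst hold corner present:", bool(worst))
    print(f"Standby power reduction: {percent_reduction(17.734, 3.432):.2f} %")
```

The supply values are compared with a tolerance because 0.9 × 0.8 is not exactly representable in floating point.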

FIGURE 22 Simulations with AVD circuit enabled

FIGURE 23 Hold state transient simulation in the worst corner (SS corner, VDD = 0.72 V, T = 0 °C)

Referring to Figure 23, the post‐layout simulation of the proposed design in the worst corner (SS corner, VDD = 0.72 V, and T = 0 °C) is presented. A data bit is written in the cell first, then the data is read, and afterwards, the cell is placed in its hold state. The simulation run shown is 800 ns with the cell in the hold state for over 710 ns. It can be seen that the data bit written in Qb remains at the same value after a long standby state; hence there is no droop in its content. It also shows the lowered voltage level for the cell during standby operation.

3.2 | On‐chip measurements

The die photo of the proposed SRAM array (1 kb) is shown in Figure 24. The details are hard to observe because the prototype is covered by metal layers due to the minimum metal density rules required for the 40‐nm CMOS process. A total of 6 chips, measured 50 times each, were used for testing in the measurement site as shown in Figure 25, where this site is in the Tainan branch of TSRI (Taiwan Semiconductor Research Institute). The power supply is an Agilent N6761A, the Agilent 81250 pattern generator is used as the test vector generator, and the voltage measurements were taken using the Agilent 54855A oscilloscope. The measurements show an improvement in the read delay of the system when boost select is enabled.

FIGURE 24 Die photo of our SRAM prototype

For a system clock of 2 MHz, the read delay improves from 229 to 64 ns. Figure 26a,b show the read/write timing waveforms when BS = 0 and BS = 1, respectively, at the system clock = 2 MHz.

Upon increasing the system clock to 10 MHz, the read delay is reduced from 148 to 61 ns. Figure 27a,b show the read/write timing waveforms when BS = 0 and BS = 1, respectively, at the system clock = 10 MHz. The proposed AVD produced a large reduction in the access delay as expected. It was measured that the write‐0 delay is 0.250 ns, and the write‐1 delay is 0.233 ns. The overall power consumption is found to be 0.8 V × 30 μA = 24.0 μW.

Table 2 tabulates several previous low power/energy SRAM designs using 28, 40, and 65‐nm CMOS technology nodes in the past years. The energy/access is defined as the average energy dissipated while executing write‐0‐read‐0 and write‐1‐read‐1 operations divided by the system clock rate. The energy/bit, on the other hand, is defined as the energy/access divided by the number of bits per word. The proposed SRAM attained the second lowest energy per access with a value of 32 fJ, second only to 20.6 fJ [18].

It does, however, demonstrate the lowest energy per bit among all SRAMs designed and implemented using the 40‐nm CMOS technology and measured on silicon. It even has a lower energy per bit compared to one work implemented in more advanced CMOS technology, that is, 28‐nm CMOS technology [18]. Referring to Figure 28, the SNMs of the SRAMs in the previous decade are compared. The graph is normalised to the respective voltage supply. As observed, the proposed SRAM's SNM is the closest to 50% of the supply voltage, resulting in good noise immunity.

These observations validate the fact that the proposed power‐gating PMOS devices indeed account for the reduction of the standby power for those cells that are not activated.
FIGURE 25 Measurement setup for the SRAM prototype

FIGURE 26 Access operation at (a) 2 MHz (BS = 0) and (b) 2 MHz (BS = 1)

FIGURE 27 Access operation at (a) 10 MHz (BS = 0) and (b) 10 MHz (BS = 1)

Referring to the technology roadmap shown in Figure 29, the
proposed SRAM achieved the second lowest energy
per bit of the last decade compared with previous works. If the
energy per bit is normalised to the CMOS process, namely 40‐nm
(ours) versus 28‐nm [18], as well as to the PDP reduction, the
proposed SRAM is in fact the lowest one to date. The major
reason is the addition of the power gating circuit in the single‐ended
cell, wherein the standby power is significantly reduced.
Meanwhile, the proposed AVD circuit manages to compensate
for the access speed loss by generating a higher supply voltage
when the access operations are asserted. All of the measurement
results substantially prove the energy efficiency of the
proposed design.

4 | CONCLUDING REMARKS

This investigation demonstrates a very low‐power SRAM
architecture on silicon that has power supply gating that responds
to the cell operations. The supply voltage is kept at a
lower level for SRAM cells that are not being accessed, which
in turn creates a substantial decrease in standby power. Aside
from the supply voltage gating, a power delay product
reduction circuit is added to the design to further reduce the
power dissipation by decreasing the transient time of states.
On the other hand, this extra circuit elevates the supply of
the read/write circuit to a higher level when the read/write
circuit is accessed, so that the delay is significantly reduced.
Post‐layout simulations verify the ultra‐low‐power performance,
and the physical measurement also showed the expected
low power/energy performance. The same design
TABLE 2 Performance comparison of current state‐of‐the‐art low‐energy/power SRAM designs

| Year | VLSIC 2011 [19] | ISQED 2012 [14] | TCAS‐I 2014 [11] | CICC 2015 [20] | TVLSI 2016 [16] | TVLSI 2017 [21] | TCAS‐I 2017 [22] |
|---|---|---|---|---|---|---|---|
| CMOS Tech. (nm) | 40 | 40 | 40 | 28 | 40 | 65 | 65 |
| Cell | 8T | 8T | 12T | 8T | 5T | 6T | 9T |
| Supply volt. (V) | 0.5 | 0.6 | 0.35 | 0.7 | 0.8 | 1.2 | 0.35 |
| Verification (a) | Meas. | Meas. | Meas. | Meas. | Meas. | Simu. | Meas. |
| SNM (mV) | N/A | 86 | 119 | 171 | 353 | N/A | N/A |
| Read PDP (fJ) | 88 | N/A | N/A | 650 | N/A | 17.5 | N/A |
| Capacity (kb) | 512 | 4+1 | 4 | 64 | 4+1 | 1 | 4 |
| Word length | 16 | 16 | 16 | 16 | 5 | 32 | 64 |
| Frequency (MHz) | 6.25 | 10 | 11.5 | 50 | 54 | 100 | 0.741 |
| Energy/access (pJ) | 8.8 | 2.24 | 1.91 | 0.65 | 0.941 | 2.2 | 0.229 |
| Energy/bit (fJ) | 550 | 140 | 119.4 | 40.625 | 188.22 | 68.75 | 3.58 |
| Core area (mm²) | 0.73 | 1.278 | 0.018 | 0.73 | 0.024 | 0.013 | 0.011 |

| Year | TCAS‐II 2018 [23] | JCSC 2019 [18] | TCAS‐II 2020 [24] | TCAS‐I 2021 [25] | TCAS‐II 2021 [26] | TCAS‐II 2021 [27] | This work 2022 |
|---|---|---|---|---|---|---|---|
| CMOS Tech. (nm) | 65 | 28 | 65 | 55 | 40 | 16 | 40 |
| Cell | 6T | 6T | 8T | 6T | 6T | 6T | 6T |
| Supply volt. (V) | 1.2 | 0.8 | 0.36 | 1.2 | 0.9 | 0.8 | 0.8 |
| Verification (a) | Simu. | Meas. | Meas. | Meas. | Meas. | Meas. | Meas. |
| SNM (mV) | N/A | 292 | 190 | N/A | 377 | 504.76 | 412.3 |
| Read PDP (fJ) | N/A | 444.5 | 4454.4 | 233.38 | 47.382 | 2.69 | 2.0592 |
| Capacity (kb) | 8 | 1+1 | 32 | 4 | 1 | 1 | 1 |
| Word length | 32 | 32 | 128 | 32 | 32 | 32 | 32 |
| Frequency (MHz) | 20 | 40 | 0.25 | 935 | 200 | 500 | 10 (typ.), 15 (max.) |
| Energy/access (pJ) | 0.592 | 0.026 | 0.3 | 1.04 | 0.2313 | 0.219 | 0.032 |
| Energy/bit (fJ) | 18.5 | 0.6 | 2.34 | 32.5 | 7.23 | 6.8 | 1.0 @10 MHz, 4.3 @2 MHz |
| Core area (mm²) | 0.019 | 0.025 | 0.015 | 0.018 | 0.01 | 0.02 | 0.02 |

(a) Simu.: simulations; Meas.: on‐chip measurements.

methodology is expected to be applied in more advanced
SRAM technology nodes, for example, 22‐nm or even 16‐nm
FinFET nodes.
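
The energy/bit definition used in Section 3.2 (energy/access divided by the number of bits per word) and the normalised SNM comparison (SNM divided by supply voltage) can be re-derived from the Table 2 entries. The following is only a minimal sketch of that arithmetic, not part of the original work: the class and function names are hypothetical, and the list covers only the measured columns of Table 2 that report an SNM. The column for [18] is left out because the text and Table 2 quote different energy/access values for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SramEntry:
    """One column of Table 2 (hypothetical container, only the fields used here)."""
    label: str
    supply_v: float          # supply voltage in V
    snm_mv: float            # static noise margin in mV
    word_length: int         # bits per word
    energy_access_pj: float  # energy per access in pJ


def energy_per_bit_fj(e: SramEntry) -> float:
    """Energy/bit in fJ: energy/access divided by the bits per word."""
    return e.energy_access_pj * 1000.0 / e.word_length


def normalised_snm(e: SramEntry) -> float:
    """SNM expressed as a fraction of the supply voltage."""
    return (e.snm_mv / 1000.0) / e.supply_v


# Values transcribed from Table 2.
TABLE_2 = [
    SramEntry("[14]", 0.6, 86.0, 16, 2.24),
    SramEntry("[11]", 0.35, 119.0, 16, 1.91),
    SramEntry("[20]", 0.7, 171.0, 16, 0.65),
    SramEntry("[16]", 0.8, 353.0, 5, 0.941),
    SramEntry("[24]", 0.36, 190.0, 128, 0.3),
    SramEntry("[26]", 0.9, 377.0, 32, 0.2313),
    SramEntry("[27]", 0.8, 504.76, 32, 0.219),
    SramEntry("This work", 0.8, 412.3, 32, 0.032),
]

# Entry whose SNM/VDD ratio lies closest to 0.5 among the listed columns.
closest_to_half = min(TABLE_2, key=lambda e: abs(normalised_snm(e) - 0.5))

for e in TABLE_2:
    print(f"{e.label:>9}: {energy_per_bit_fj(e):8.3f} fJ/bit, "
          f"SNM/VDD = {normalised_snm(e):.3f}")
print("SNM closest to 50% of VDD:", closest_to_half.label)
```

For the proposed design this gives 0.032 pJ / 32 bits = 1.0 fJ per bit and an SNM of about 51.5% of the 0.8 V supply, in line with the values reported in Table 2.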

AUTHOR CONTRIBUTION
Chua‐Chin Wang: Funding acquisition, Visualisation, Formal
analysis, Investigation, Methodology, Writing – review &
editing. Ralph Gerard B. Sangalang: Formal analysis, Investigation,
Methodology, Writing – review & editing. I‐Ting Tseng:
Conceptualisation, Methodology, Software, Validation. Yi‐Jen
Chiu: Funding acquisition, Visualisation. Yu‐Cheng Lin:
Funding acquisition, Visualisation, co‐corresponding. Oliver
Lexter July A. Jose: Resources, Writing – review & editing.

FIGURE 28 Comparison of SNMs for SRAMs (normalised to VDD)
FIGURE 29 Technology roadmap of energy per bit for recent SRAMs

ACKNOWLEDGEMENT
The National Science and Technology Council of Taiwan
funded this study in part via grants MOST109‐2218‐E‐110‐007,
108‐2218‐E‐110‐011, 108‐2218‐E‐110‐002, 107‐2218‐E‐110‐002,
and 110‐2221‐E‐110‐063‐MY2. The authors would
like to express their profound appreciation to TSRI (Taiwan
Semiconductor Research Institute) in NARL (National Applied
Research Laboratories), Taiwan, for providing EDA tool support,
fabrication service, and measurement setup.

CONFLICT OF INTEREST STATEMENT
None.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available
from the corresponding author upon reasonable request.

PERMISSION TO REPRODUCE MATERIALS FROM OTHER SOURCES
None.

ORCID
Chua‐Chin Wang https://orcid.org/0000-0002-2426-2879
Ralph Gerard B. Sangalang https://orcid.org/0000-0002-4120-382X

REFERENCES
1. Morifuji, E., et al.: Supply and threshold‐voltage trends for scaled logic and SRAM MOSFETs. IEEE Trans. Electron. Dev. 53(6), 1427–1432 (2006). https://doi.org/10.1109/TED.2006.874752
2. Xu, H., et al.: A current mode sense amplifier with self‐compensation circuit for SRAM application. In: Proceedings of the 2013 IEEE 10th International Conference on ASIC; 2013 Oct 28–31; Shenzhen, China, pp. 1–4. IEEE, New York (2013). https://doi.org/10.1109/ASICON.2013.6812020
3. Do, A.‐T., et al.: Design and sensitivity analysis of a new current‐mode sense amplifier for low‐power SRAM. IEEE Trans. VLSI Syst. 19(2), 196–204 (2011). https://doi.org/10.1109/TVLSI.2009.2033110
4. Kim, D., et al.: A 1.85fW/bit ultra low leakage 10T SRAM with speed compensation scheme. In: Proceedings of the 2011 IEEE International Symposium of Circuits and Systems (ISCAS); 2011 May 15–18; Rio de Janeiro, Brazil, pp. 69–72. IEEE, New York (2011). https://doi.org/10.1109/ISCAS.2011.5937503
5. Ruixing, L., et al.: Bitline leakage current compensation circuit for high‐performance SRAM design. In: Proceedings of the 2012 IEEE Seventh International Conference on Networking, Architecture, and Storage; 2012 Jun 28–30; Xiamen, China, pp. 109–113. IEEE, New York (2012). https://doi.org/10.1109/NAS.2012.19
6. Agawa, K., et al.: A bitline leakage compensation scheme for low‐voltage SRAMs. IEEE J. Solid State Circ. 36(5), 726–734 (2001). https://doi.org/10.1109/4.918909
7. Wang, C.‐C., et al.: 4‐kB 500‐MHz 4‐T CMOS SRAM using low‐VTHN bitline drivers and high‐VTHP latches. IEEE Trans. VLSI Syst. 12(9), 901–909 (2004). https://doi.org/10.1109/TVLSI.2004.833669
8. Wang, C.‐C., Lee, C.‐L., Lin, W.‐J.: A 4‐kB low‐power SRAM design with negative word‐line scheme. IEEE Trans. Circuits‐I 54(5), 1069–1076 (2007). https://doi.org/10.1109/TCSI.2006.888767
9. Wang, D.‐S., Su, Y.‐H., Wang, C.‐C.: A readout circuit with cell output slew rate compensation for 5T single‐ended 28 nm CMOS SRAM. Microelectron. J. 70, 107–116 (2017). https://doi.org/10.1016/j.mejo.2017.11.001
10. Chen, S.‐Y., Wang, C.‐C.: Single‐ended disturb‐free 5T loadless SRAM cell using 90 nm CMOS process. In: Proceedings of the 2012 IEEE International Conference on IC Design and Technology; 2012 May 30–Jun 1; Austin, TX, USA, pp. 1–4. IEEE, New York (2012). https://doi.org/10.1109/ICICDT.2012.6232848
11. Chiu, Y.‐W., et al.: 40nm bit‐interleaving 12T subthreshold SRAM with data‐aware write‐assist. IEEE Trans. Circuits‐I 61(9), 2578–2585 (2014). https://doi.org/10.1109/TCSI.2014.2332267
12. Reddy, S., Sangalang, R.G.B., Wang, C.‐C.: Sub‐0.2 pJ/access Schmitt trigger‐based 1‐kb 8T SRAM implemented using 40‐nm CMOS process. In: Proceedings of the 2022 International Conference on IC Design and Technology (ICICDT); 2022 Sep 21–23; Hanoi, Vietnam, pp. 24–27. IEEE, New York (2022). https://doi.org/10.1109/ICICDT56182.2022.9933116
13. Frustaci, F., et al.: Techniques for leakage energy reduction in deep submicrometer cache memories. IEEE Trans. VLSI Syst. 14(11), 1238–1249 (2006). https://doi.org/10.1109/TVLSI.2006.886397
14. Terada, M., et al.: A 40‐nm 256‐kb 0.6‐V operation half‐select resilient 8T SRAM with sequential writing technique enabling 367‐mV VDDmin reduction. In: Proceedings of the 13th International Symposium on Quality Electronic Design (ISQED); 2012 Mar 19–21; Santa Clara, CA,
USA, pp. 489–492. IEEE, New York (2012). https://doi.org/10.1109/ISQED.2012.6187538
15. Wang, C.‐C., Tseng, I.‐T.: Ultra low power single‐ended 6T SRAM using 40 nm CMOS technology. In: Proceedings of the 2019 International Conference on IC Design and Technology (ICICDT); 2019 Jun 17–19; Suzhou, China, pp. 1–4. IEEE, New York (2019). https://doi.org/10.1109/ICICDT.2019.8790848
16. Wang, C.‐C., et al.: A leakage compensation design for low supply voltage SRAM. IEEE Trans. VLSI Syst. 24(5), 1761–1769 (2016). https://doi.org/10.1109/TVLSI.2015.2484386
17. Al‐Harbi, S.M., Gupta, S.K.: An efficient methodology for generating optimal and uniform march tests. In: Proceedings of the 19th IEEE VLSI Test Symposium (VTS 2001); 2001 Apr 29–May 3; Marina Del Rey, CA, USA, pp. 231–237. IEEE, New York (2001). https://doi.org/10.1109/VTS.2001.923444
18. Wang, C.‐C., et al.: A single‐ended 28‐nm CMOS 6T SRAM design with read‐assist path and PDP reduction circuitry. J. Circ. Syst. Comput. 29(6), 2050095 (2020). https://doi.org/10.1142/S0218126620500954
19. Yoshimoto, S., et al.: A 40‐nm 0.5‐V 20.1‐µW/MHz 8T SRAM with low‐energy disturb mitigation scheme. In: Proceedings of the 2011 Symposium on VLSI Circuits ‐ Digest of Technical Papers; 2011 Jun 15–17; Kyoto, Japan, pp. 72–73. IEEE, New York (2011). https://ieeexplore.ieee.org/document/5986220
20. Mori, H., et al.: A 298‐fJ/writecycle 650‐fJ/readcycle 8T three‐port SRAM in 28‐nm FD‐SOI process technology for image processor. In: Proceedings of the 2015 IEEE Custom Integrated Circuits Conference (CICC); 2015 Sep 28–30; San Jose, CA, USA, pp. 1–4. IEEE, New York (2015). https://doi.org/10.1109/CICC.2015.7338360
21. Lee, J., et al.: A 17.5‐fJ/bit energy‐efficient analog SRAM for mixed‐signal processing. IEEE Trans. VLSI Syst. 25(10), 2714–2723 (2017). https://doi.org/10.1109/TVLSI.2017.2664069
22. Shin, K., Choi, W., Park, J.: Half‐select free and bit‐line sharing 9T SRAM for reliable supply voltage scaling. IEEE Trans. Circuits‐I 64(8), 2036–2048 (2017). https://doi.org/10.1109/TCSI.2017.2691354
23. Surana, N., Mekie, J.: Energy efficient single‐ended 6‐T SRAM for multimedia applications. IEEE Trans. Circuits‐II 66(6), 1023–1027 (2019). https://doi.org/10.1109/TCSII.2018.2869945
24. Do, A.‐T., Zeinolabedin, S.M.A., Kim, T.T.‐H.: Energy‐efficient data‐aware SRAM design utilizing column‐based data encoding. IEEE Trans. Circuits‐II 67(10), 2154–2158 (2020). https://doi.org/10.1109/TCSII.2019.2958668
25. Chen, J., et al.: Analysis and optimization strategies toward reliable and high‐speed 6T compute SRAM. IEEE Trans. Circuits‐I 68(4), 1520–1531 (2021). https://doi.org/10.1109/TCSI.2021.3054972
26. Wang, C.‐C., Kuo, C.‐P.: 200‐MHz single‐ended 6T 1‐kb SRAM with 0.2313 pJ energy/access using 40‐nm CMOS logic process. IEEE Trans. Circuits‐II 68(9), 3163–3166 (2021). https://doi.org/10.1109/TCSII.2021.3091973
27. Wang, C.‐C., Sangalang, R.G.B., Tseng, I.‐T.: A single‐ended low power 16‐nm FinFET 6T SRAM design with PDP reduction circuit. IEEE Trans. Circuits‐II 68(12), 3478–3482 (2021). https://doi.org/10.1109/TCSII.2021.3123676

How to cite this article: Wang, C.‐C., et al.: A 1.0 fJ energy/bit single‐ended 1 kb 6T SRAM implemented using 40 nm CMOS process. IET Circuits Devices Syst. 17(2), 75–87 (2023). https://doi.org/10.1049/cds2.12141
