Perspectives of Racetrack Memory Based On Current-Induced Domain Wall Motion: From Device To System
Yue Zhang1, Chao Zhang3, Jacques-Olivier Klein1, Dafine Ravelosona1, Guangyu Sun3, Weisheng Zhao1,2
Email: [email protected] [email protected]
1 Institut d'Electronique Fondamentale, Univ. Paris-Sud/UMR 8622 CNRS, Orsay, France
2 Spintronics Interdisciplinary Center, Beihang University, Beijing, China
3 Center for Energy-Efficient Computing and Applications, Peking University, Beijing, China
I. INTRODUCTION
II.
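$$\dot{\phi}_0 + \frac{\alpha \dot{X}}{\lambda} = \gamma H + \frac{\beta u}{\lambda} + f_{pin} \tag{1}$$
$$\dot{X} - \alpha \lambda \dot{\phi}_0 = v_{\perp} \sin 2\phi_0 + u \tag{2}$$
[Fig. 1 schematic labels: Output, Input, Inputb, Vdda, Vddp, Gnd, ENp, Iref, SA, MTJ0, MTJ1, MN1, MN2, MN3, MP1, MP2]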
Fig. 1. Racetrack memory based on current-induced domain wall motion (CIDWM), composed of one write head (MTJ0), one read head (MTJ1), and one magnetic nanowire. The writing circuit generates Iw to nucleate a magnetic domain (the data), the propagation circuit generates Ish to induce DW motion, and the sense amplifier (SA) generates Ir to detect the magnetization direction. (For SOT, the direction of Ish is opposite to that shown in this figure for STT.)
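The write, shift, and read roles described in the Fig. 1 caption can be illustrated with a minimal bit-level sketch. It is a toy model, not the authors' circuit: the class `Racetrack`, the helper `store_and_read`, and the placement of the write head at index 0 and the read head at the last index are all assumptions made for illustration, and no device physics is modeled.

```python
from collections import deque


class Racetrack:
    """Toy model of one nanowire (hypothetical; layout is an assumption).

    Index 0 sits under the write head (MTJ0) and the last index sits
    under the read head (MTJ1).
    """

    def __init__(self, n_bits):
        # A full deque with maxlen drops the far-end item on appendleft.
        self.domains = deque([0] * n_bits, maxlen=n_bits)

    def write(self, bit):
        # Iw pulse: nucleate a domain under the write head.
        self.domains[0] = bit

    def shift(self):
        # Ish pulse: every domain moves one position toward the read head.
        # The bit that was under the read head leaves the wire; the slot
        # under the write head holds a placeholder 0 until rewritten.
        self.domains.appendleft(0)

    def read(self):
        # Ir: sense the domain currently under the read head.
        return self.domains[-1]


def store_and_read(bits):
    """Write a bit sequence, then read it back in first-in, first-out order."""
    track = Racetrack(len(bits))
    for i, bit in enumerate(bits):
        if i:
            track.shift()
        track.write(bit)
    out = []
    for i in range(len(bits)):
        if i:
            track.shift()
        out.append(track.read())
    return out
```

In this sketch, every access to a bit that is not under a head costs one or more shift pulses, which is why the shift operation appears as a separate cost in system-level evaluations of racetrack memory.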
III.
Parameter       Description                               Default Value
α               Gilbert damping constant                  0.045
β               Nonadiabatic coefficient                  0.02
P               Spin polarization rate                    0.49
Ms              Saturation magnetization                  0.66 MA/m
Hw              Walker breakdown field                    4.4 mT
γ               Gyromagnetic ratio                        0.176 THz/T
λ               DW width                                  10 nm
Ku              Uniaxial anisotropy                       0.41 MJ/m3
TMR             TMR of write head MTJ                     120%
Jc_nucleation   DW nucleation critical current density    57 GA/m2
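As a hedged illustration of how the one-dimensional DW model of equations (1) and (2) can be evaluated with the default values above, the sketch below integrates the two coupled equations with a forward Euler step. Several things are assumptions rather than statements from the text: the equations are taken in the form phi_dot + alpha*X_dot/lambda = gamma*H + beta*u/lambda + f_pin and X_dot - alpha*lambda*phi_dot = v_perp*sin(2*phi) + u; the relation v_perp = gamma*lambda*Hw/alpha is borrowed from the standard 1D model; the spin drift velocity `u` is supplied directly as an input; and the function names are hypothetical.

```python
import math

# Defaults from the parameter table (SI units).
ALPHA = 0.045       # Gilbert damping constant
BETA = 0.02         # nonadiabatic coefficient
GAMMA = 0.176e12    # gyromagnetic ratio, rad/(s*T)
LAMBDA = 10e-9      # DW width, m
H_WALKER = 4.4e-3   # Walker breakdown field, T

# Assumption (not stated in the text): standard 1D-model relation.
V_PERP = GAMMA * LAMBDA * H_WALKER / ALPHA


def derivatives(phi, u, h=0.0, f_pin=0.0):
    """Solve equations (1) and (2) for (X_dot, phi_dot) at angle phi."""
    a = GAMMA * h + BETA * u / LAMBDA + f_pin   # right-hand side of (1)
    b = V_PERP * math.sin(2.0 * phi) + u        # right-hand side of (2)
    phi_dot = (a - ALPHA * b / LAMBDA) / (1.0 + ALPHA ** 2)
    x_dot = b + ALPHA * LAMBDA * phi_dot
    return x_dot, phi_dot


def simulate(u, t_end=20e-9, dt=1e-12, h=0.0, f_pin=0.0):
    """Forward-Euler integration; returns final position, angle, velocity."""
    x, phi = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        x_dot, phi_dot = derivatives(phi, u, h, f_pin)
        x += x_dot * dt
        phi += phi_dot * dt
    x_dot, _ = derivatives(phi, u, h, f_pin)
    return x, phi, x_dot
```

Setting phi_dot = 0 in the two equations gives a steady-state velocity of (gamma*H*lambda + beta*u)/alpha, so with H = 0 and no pinning the simulated velocity should settle near beta*u/alpha, provided `u` is small enough that the wall stays below Walker breakdown. For example, `simulate(50.0)` with an assumed u of 50 m/s relaxes toward roughly 22.2 m/s.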
Fig. 2. Required critical current density and shifting energy versus the number of bits per nanowire in magnetic-field-assisted racetrack memory.
IV.
Fig. 3. Transient simulation of a 16-bit racetrack memory based on chiral DW motion. Iw is used to input data and Ish is used for DW shifting. [Plot: Iw and Ish versus time (ns), 0 to 300 ns, with markers labeled M2(414.5uA) and M0(12.7ns).]
Fig. 4. Nanowire length versus DW shifting current density for Pt and Pd. The dashed line shows the case of the conventional STT-based CoFeB racetrack memory.

Component       Configuration
Processor       4 simple cores, 2 GHz, 1-way issue
L1 I/D Cache    32/32 KB, 2-way, 64 B line, private, LRU
                SRAM, 1/1 cycle, 6.2/2.3 pJ
L2 Cache        16-way, 64 B, shared, LRU
                SRAM: 2 MB, 10/10 cycle, 0.57/0.54 nJ, 3438 mW
                RM: 64 MB, 9/20/5 cycle, 0.50/0.55/0.50 nJ, 1062 mW
Main Memory     8 GB, DDR3, 1600 MHz, 120 cycle, 12.8 GB/s
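The two L2 cache options in the system configuration above can be compared with a small sketch. It rests on assumptions that the text does not confirm: the RM triple "9/20/5 cycle" is read here as read/write/shift latency, each shift is assumed to add its full latency to an access, and the names `L2Config` and `read_latency` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class L2Config:
    name: str
    capacity_mb: int
    read_cycles: int
    write_cycles: int
    shift_cycles: int   # 0 for SRAM, which needs no shifting
    leakage_mw: float


# Values copied from the L2 Cache rows of the configuration table.
SRAM = L2Config("SRAM", 2, 10, 10, 0, 3438.0)
RM = L2Config("RM", 64, 9, 20, 5, 1062.0)   # 9/20/5 assumed read/write/shift


def read_latency(cfg, shifts):
    """Read latency in cycles when `shifts` shift steps precede the read.

    Assumes shift latency simply adds to the read latency.
    """
    return cfg.read_cycles + shifts * cfg.shift_cycles
```

Under these assumptions, an RM read with no shift takes 9 cycles against 10 for SRAM, while a single shift raises it to 14 cycles, so the benefit of the 32 times larger capacity depends on how often accesses need shifting.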
Fig. 5. Comparison of overall system execution time (RM: racetrack memory). [Plot: SRAM and RM bars for blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, freqmine, streamcluster, swaptions, vips, x264, and the average.]
Fig. 6. [Plot: SRAM and RM bars for blackscholes, bodytrack, canneal, dedup, facesim, ferret, fluidanimate, freqmine, streamcluster, swaptions, vips, x264, and the average; vertical axis from 0 to 1.2.]
V. CONCLUSION
ACKNOWLEDGMENT