Chen 2019
Magnetic skyrmion (MS), a vortexlike region with reversed magnetization in nanomagnets, has recently emerged as an exciting
development in the field of spintronics. It has a number of beneficial features, including remarkably high stability, ultralow depinning
current density, and extremely compact size. Due to these benefits, skyrmions have generated great interest in the design of spintronic
memory. In this paper, we evaluate the use of skyrmion-based memory as a last-level cache for general-purpose processors. In the
skyrmion-based memory structure, data can be densely packed as multiple bits in a long magnetic nanotrack. Write operations are
performed by injecting a spin-polarized current in the nanotrack. Since multiple skyrmions (each representing a bit) are packed into
a single nanotrack, they need to be accessed by shifting them along the nanotrack with a charge current passing through a spin-Hall
metal (SHM). We identify the following key challenges associated with MS-based cache design: 1) the high-current requirements
for skyrmion nucleation limit the density benefits offered by these structures, since the transistor supplying write currents is the
limiting factor that determines the bit-cell area; 2) the proposed nanotrack structure results in significant performance overheads
due to the latency arising from the shift operations; 3) the skyrmions move toward the edge of the nanotrack during shift operations
owing to the Magnus force. Hence, an additional idle operation time is required to relax skyrmions back through the repulsive
force from the edge; and 4) to avoid annihilation of skyrmions from the edge, the duration and the current density of the shift
operation have to be well controlled. To overcome these challenges, a multi-bit skyrmion cell with appropriate peripheral circuit
is proposed, considering the heterogeneity in the read/write characteristics. The density benefits are explored by performing the
layout of different multi-bit cells. We perform a systematic device-circuit-architecture co-design to evaluate the feasibility of our
proposal. Our experiments demonstrate the potential of, and the challenges involved in, using skyrmion-based memory as last-level
caches.
Index Terms— Dzyaloshinskii–Moriya interaction (DMI), magnetic skyrmion (MS), magnus force, spin-Hall metal (SHM).
TABLE I
COMPARISON OF HIGH-K MATERIALS USED IN THE PRESENT SIMULATIONS
directly, or by a vertical injection of a spin-polarized current perpendicular to the plane (CPP), which is obtained by injecting a charge current through the SHM layer. We choose the CPP method in our proposed device structure because a skyrmion then undergoes a larger Slonczewski in-plane torque instead of a smaller field-like out-of-plane torque, so higher velocities can be obtained with lower current densities.

The motion of skyrmions can be well explained by Thiele’s equation [20]

$$G \times v_d - D\alpha v_d + j_{spin} = 0 \qquad (1)$$

where $j_{spin}$ represents the vertical spin current generated from the charge current flowing through the SHM (in blue) underlayer. The longitudinal and the transverse velocity can be written as

$$v_d^x = \frac{\alpha D}{G^2 + \alpha^2 D^2}\, j_{spin}, \qquad v_d^y = \frac{G}{G^2 + \alpha^2 D^2}\, j_{spin} \qquad (2)$$

Hence, for $G \neq 0$, the motion of skyrmions deviates from the intended direction. The transverse motion of a skyrmion stops at a certain distance from the edge owing to the skyrmion-edge interaction. The final displacement with respect to the edge decreases as SHM current density increases. Skyrmions are annihilated if the applied charge current density is larger than a certain value ($J_{ani}$), which is a function of the operation time. Fig. 2(a) shows that the critical annihilation current density can be significantly increased by reducing the shift operation time. Moreover, a high energy barrier is induced on the boundaries by adhering high-K materials at the edges, allowing skyrmions to be well confined in the nanotrack with larger current injection [22], [23]. Fig. 2(b) shows the comparison of the critical annihilation current density ($J_{ani}$) for three different high-K materials (FePt, Nd$_2$Fe$_{14}$B, SmCo$_5$) with the edge width ranging from 1 to 5 nm for < 1 ns shift duration. The corresponding material parameters, adopted from [19], [24], are shown in Table I. Utilizing high-K materials at the edges makes switching the spin direction much harder when a skyrmion approaches the edge due to the Magnus force, thereby keeping the skyrmion in the nanotrack. The velocity of skyrmions, which increases with increased current density, is therefore enhanced during shifts, achieving faster shift operations. Note that the induced energy barrier from

C. Detection of the Presence of a Skyrmion (Read Operation)

Electrical detection of skyrmions at room temperature through the magnetoresistance effect has been proposed and recently demonstrated in experiments [31], [32]. In this paper, we use this mechanism to perform a read operation. Specifically, we introduce a read port that includes a read MTJ, a reference MTJ, and two access transistors. A read operation is performed by connecting the read wordlines (RWL) to $V_{DD}$, driving BL to $V_{read}$, and SL to GND. Here, the ferromagnetic nanotrack at the read region serves as the “free layer” of the read MTJ, and the resistance of the read MTJ is denoted as $R_{sk}$ ($R_{ap}$) with the presence (absence) of a skyrmion under the read MTJ. The voltage divider consisting of the reference MTJ with resistance ($R_{ap}$) in series with the read MTJ will drive the output of the inverter high in the presence of a skyrmion, and vice versa. Note that the trip point of the inverter is selected between the maximum voltage (with the absence of a skyrmion, high-resistance state) and the minimum voltage (with the presence of a skyrmion, low-resistance state) at node “A.” However, since the average magnetization of a skyrmion is not parallel ($m_z \neq -1$) to the fixed layer ($m_z = -1$), the resistance change of the read MTJ is lower here compared with a full parallel-to-antiparallel resistance switching of an MTJ. To achieve sufficient resistance change for the read operation, we use an MTJ of diameter 20 nm and ~200% magnetoresistance ratio. We also match the size of the skyrmion to the size of the read MTJ to ensure that the region captured by the read MTJ is closer to $m_z = -1$ (anti-parallel to the fixed layer), which, in turn, leads to higher magnetoresistance change. Table II gives the comparison of the voltage swing (V) at node “A” in Fig. 1 in the presence and the absence of a skyrmion under a read current of ~$1.25 \times 10^{-5}$ A by pulling up the BL voltage to $V_{read}$ (0.8 V) and SL to GND. As shown in Table II, increasing the width of the nanotrack increases the skyrmion dimension (the region captured by the read MTJ is closer to $m_z = -1$), which, in turn, leads to a greater voltage swing as a higher magnetoresistance change could be achieved. However, this also increases the spacing required between consecutive skyrmions to free them from the repulsive force between neighboring skyrmions [33]. Moreover, since the fixed layer of the read MTJ is located
1500309 IEEE TRANSACTIONS ON MAGNETICS, VOL. 55, NO. 8, AUGUST 2019
at the center region of the nanotrack, the deviation between the position of a skyrmion and a read port degrades the resistance change with the absence/presence of a skyrmion. Therefore, a read operation to a specific skyrmion bit requires an idle operation after each shift operation, i.e., the total number of idle operations is equal to the number of shift operations required for reading a skyrmion bit. This operation relaxes the skyrmions back to the center region through edge repulsion. We achieve this by turning all access transistors OFF, which stabilizes the magnetization of the nanotrack.

Fig. 3. Logical view of a multi-bit MS-based cell with (a) single write/read port or (b) single write, multiple read ports. A sequence of bits is stored in the nanotrack.

III. MULTI-BIT SKYRMION CELL DESIGN

Fig. 3 shows the logical representation of data stored along the nanotrack. Depending on the existence of a skyrmion, different logic values can be stored along the nanotrack as multiple bits. We denote the presence of a skyrmion to represent logic “1,” while its absence denotes logic “0.” A current injected into the SHM (blue layer) from the right can shift skyrmions to the right-hand side of the nanotrack, and vice versa. The logical views of a multi-bit cell with a single write/read port and a cell with single write and multiple read ports are shown in Fig. 3(a) and (b), respectively. Note that the read ports can be placed at any location along the nanotrack; however, the write port is placed at the end of the long nanotrack to keep the write operation simple.

Consider Fig. 3(a) as an example. A write port at address “0x0” and a read MTJ at address “0x7” with a sequence of 0’s and 1’s stored in the cell are presented. In the first write cycle, “0” is written into the address “0x0,” and subsequently shifted right to the next address “0x1.” “1” is written into the address “0x0” during the next write cycle, and then the data in the nanotrack are again shifted to the right. By repeatedly writing data into the address “0x0” and subsequently right shifting all stored data to the next address, a sequence of bits can be written to the nanotrack. To read the stored data at, say, address “0x5,” the bit is shifted right by two positions to reach the location under the read MTJ. Similarly, to write data at a specific address, we first shift the bit to the position where the write port is located. Before writing new data into the address, the previously stored data are cleared by injecting a current with spin polarization in the opposite direction to the magnetization of the skyrmion center.

To prevent stored data in the nanotrack from overflowing during shift operations, we extend the nanotrack by having extra data bits (light yellow part in Fig. 3). In the worst case scenario for this example, to access the stored data at address “0x0,” the bit is required to be shifted right by seven positions. Thus, seven extra bits are required to avoid the loss of stored data from address “0x1” to “0x7.” The write/read latency is dependent on the location where a bit is stored. However, the average read latency can be reduced by introducing multiple read ports, as shown in Fig. 3(b). The current location of the read port is referred to as the current port status. In order to access a bit from a multi-bit cell, a shift controller determines the appropriate read port and calculates the number of shift operations required by comparing the input address bits with the current port status. This also results in a reduction of the number of extra bits required to avoid data loss. Table III lists the bias voltage conditions for write/shift/read/clear/idle operations.

A. Density of the Skyrmion-Based Multi-Bit MS Cell

Fig. 4 shows the layout of an 8/16/32 bit MS cell with a single write/read port. As discussed in Section II, the current requirement for the write operation is considerably higher than that for the read and shift operations. Hence, as shown in Fig. 4(a), for an 8 bit MS cell with single write/read port, the cell area is dominated by the peripheral write transistors since the dimension of the write transistors is much larger than the nanotrack. Note that the length of the nanotrack is determined by the number of stored bits and the read ports. The total length of the nanotrack can be reduced by having multiple read/write ports as fewer extra bits are required to prevent the stored data from being destroyed during shift operations (light yellow part in Fig. 3). For the 8 bit MS cell case with a single read port, the write transistors dominate the total cell area. Thus, the density (i.e., cell area per bit) of the
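
As an illustration of the shift-based access scheme of Section III, the following Python sketch counts the shift and idle operations for a read and the number of extra bits needed to avoid data loss. It is a simplified model under stated assumptions, not the shift controller evaluated in this paper: the function names are hypothetical, the track is assumed to be in its home position before each access, only right shifts are modeled, and the second read port at 0x3 used in the usage note is a hypothetical placement.

```python
from typing import Dict, List, Tuple


def shifts_to_read(address: int, read_ports: List[int]) -> Tuple[int, int]:
    """Return (port, right_shifts) needed to read `address`.

    Assumption: the bit with logical address a sits at track position a
    and can only be moved right, toward a read port located at p >= a.
    """
    candidates = [p for p in read_ports if p >= address]
    if not candidates:
        raise ValueError(f"no read port at or right of address {address}")
    port = min(candidates)  # nearest reachable read port
    return port, port - address


def extra_bits_needed(n_bits: int, read_ports: List[int]) -> int:
    """Worst-case right shift over all addresses, i.e. padding positions."""
    return max(shifts_to_read(a, read_ports)[1] for a in range(n_bits))


def operation_counts(address: int, read_ports: List[int]) -> Dict[str, int]:
    """Operations for one read: one idle (relaxation) step per shift."""
    _, shifts = shifts_to_read(address, read_ports)
    return {"shift": shifts, "idle": shifts, "read": 1}


if __name__ == "__main__":
    single_port = [0x7]          # Fig. 3(a): read MTJ at address 0x7
    two_ports = [0x3, 0x7]       # hypothetical second read port
    print(shifts_to_read(0x5, single_port))
    print(extra_bits_needed(8, single_port))
    print(extra_bits_needed(8, two_ports))
    print(operation_counts(0x5, single_port))
```

With the single read port at 0x7, the model gives two shifts for a read of address 0x5 and seven extra bits in the worst case, matching the example in the text. Under the same assumptions, adding a read port at 0x3 would reduce the worst-case shift, and hence the padding, to three positions.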
CHEN et al.: CACHE MEMORY DESIGN WITH MSS IN A LONG NANOTRACK 1500309
TABLE IV
MATERIAL PARAMETERS USED FOR SIMULATION

TABLE V
SYSTEM CONFIGURATION
performance degradation over the SRAM and STT-MRAM cache is attributed to the 2× higher cache capacity offered by the 16 bit configuration. For the 32 bit MS designs, the performance improves by 0.4% with the high-K cache design and degrades by 0.6% for the design with no high-K material, over the SRAM cache. This improvement is mainly because of a reduced number of shift operations performed on average with a higher number of read and write ports in the 32 bit design. Note that the performance reduces by 0.6% and 3.0%, respectively, over the STT-MRAM cache design, since the overall cache capacity does not increase with the 32 bit MS design as discussed earlier.

3) Energy Comparison: Fig. 10 illustrates the L2 cache energy consumed by the proposed cache designs compared to the iso-area SRAM and STT-MRAM caches. The cache energy is normalized to the energy consumed by the STT-MRAM design. On average, we observe a 2.41× and 2.45× reduction in cache energy for the 8 bit MS high-K and 8 bit MS no high-K designs over the SRAM cache. This is due to the reduced leakage energy consumption with non-volatile MSs. The energy benefits are slightly higher for the 8 bit MS no high-K design because of lower shift energy consumed with no high-K material on the nanotrack edges. For the 16 bit designs, the energy benefits were found to be 2.37× and 2.41× for the 16 bit MS high-K and 16 bit MS no high-K designs, respectively. The energy benefits are moderately lower for the 16 bit configurations than for the 8 bit designs due to the higher energy consumed by the shift operations. Note that, for a subset of benchmarks (canneal, ferret, streamclust, and vips), the 16 bit configurations have a lower energy compared to the 8 bit designs. This is because of the lower capacity misses observed in the 16 bit designs that eventually lead to lower write energy. In these benchmarks, the benefits in write energy outweigh the increase in shift energy, thereby leading to improved cache energy. The energy benefits over SRAM reduce to 2.27× and 2.31× with the 32 bit MS high-K and 32 bit MS no high-K designs, respectively. The benefits in energy are lower than for the other two designs (8 bit and 16 bit configurations) since the resistance offered by the nanotrack increases, which, in turn, increases the energy consumed for each shift operation.

In contrast, the energy consumed by the MS-based cache designs is higher than the baseline iso-area STT-MRAM cache in all cases. Specifically, the 8 bit designs consume 1.29× and 1.27×, and the 16 bit designs 1.30× and 1.28×, higher energy than the STT-MRAM cache. Similarly, the 32 bit designs consume 1.37× and 1.34× higher energy than the STT-MRAM cache. This increase in energy is because of the additional shift energy overheads and the reduced cache capacity arising from the larger write transistor requirements for the multi-bit MS cell.

In summary, our results show that skyrmion-based caches offer small improvements in performance with substantial energy reduction over an iso-area SRAM-based cache. They also point to key avenues for improvement in skyrmion-based memory: the high nucleation energy for skyrmions leads to large write transistors, curtailing density benefits, while the latency due to shift operations limits the performance.

VII. CONCLUSION

In this paper, we explored MSs to design last-level caches. We proposed a multi-bit skyrmion-based cell design that packs multiple bits in a nanotrack. Since the size and spacing of skyrmions can be down to nanometer scale, the skyrmion-based nanotrack has the potential to provide significant density benefits compared to other memory technologies. However, the high current requirements for skyrmion nucleation are a
bottleneck to achieving significant density benefits. We analyzed different device tunings and design tradeoffs associated with the proposed bit cell and evaluated the area, performance, and energy benefits while accounting for the peripheral circuit requirements. We designed a device-circuit-architecture framework to evaluate the system-level benefits of the proposed design. Our experiments reveal considerable benefits over an iso-area SRAM cache. However, energy efficiency and performance are lower than those of an iso-area STT-MRAM cache, suggesting the need for mechanisms to lower the current density requirements for skyrmion nucleation.

ACKNOWLEDGMENT

This work was supported in part by the Center for Spintronics, in part by Semiconductor Research Corporation (SRC) and Microelectronics Advanced Research Corporation (MARCO), in part by National Science Foundation, and in part by the Vannevar Bush Faculty Fellowship.

REFERENCES

[1] S. Parkin, M. Hayashi, and L. Thomas, “Magnetic domain-wall racetrack memory,” Science, vol. 320, no. 5873, pp. 190–194, Apr. 2008.
[2] R. Venkatesan, V. Kozhikkottu, C. Augustine, A. Raychowdhury, K. Roy, and A. Raghunathan, “Tapecache: a high density, energy efficient cache based on domain wall memory,” in Proc. ACM/IEEE Int. Symp. Low Power Electron. Design. New York, NY, USA: ACM, Jul. 2012, pp. 185–190.
[3] Z. Sun, W. Wu, and H. Li, “Cross-layer racetrack memory design for ultra high density and low power consumption,” in Proc. 50th ACM/EDAC/IEEE Design Automat. Conf. (DAC). ACM, May 2013, pp. 1–6.
[4] A. Ranjan, S. G. Ramasubramanian, R. Venkatesan, V. Pai, K. Roy, and A. Raghunathan, “DyReCTape: A dynamically reconfigurable cache using domain wall memory tapes,” in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE). Grenoble, France, Mar. 2015, pp. 181–186.
[5] R. Venkatesan, S. G. Ramasubramanian, S. Venkataramani, K. Roy, and A. Raghunathan, “STAG: Spintronic-tape architecture for GPGPU cache hierarchies,” in Proc. ACM/IEEE 41st Int. Symp. Comput. Archit. (ISCA). Jun. 2014, pp. 253–264.
[6] A. Thiaville, Y. Nakatani, J. Miltat, and Y. Suzuki, “Micromagnetic understanding of current-driven domain wall motion in patterned nanowires,” EPL (Europhys. Lett.), vol. 69, no. 6, p. 990, Feb. 2005.
[7] A. Fert, V. Cros, and J. Sampaio, “Skyrmions on the track,” Nature Nanotechnol., vol. 8, no. 3, pp. 152–156, 2013.
[8] N. Nagaosa and Y. Tokura, “Topological properties and dynamics of magnetic skyrmions,” Nature Nanotechnol., vol. 8, no. 12, pp. 899–911, 2013.
[9] R. Wiesendanger, “Nanoscale magnetic skyrmions in metallic films and multilayers: A new twist for spintronics,” Nature Rev. Mater., vol. 1, no. 7, Jul. 2016, Art. no. 016044.
[10] R. Tomasello, E. Martinez, R. Zivieri, L. Torres, M. Carpentieri, and G. Finocchio, “A strategy for the design of skyrmion racetrack memories,” Sci. Rep., vol. 4, p. 6784, Oct. 2014.
[11] I. Dzyaloshinsky, “A thermodynamic theory of weak ferromagnetism of antiferromagnetics,” J. Phys. Chem. Solids, vol. 4, no. 4, pp. 241–255, 1958.
[12] T. Moriya, “New mechanism of anisotropic superexchange interaction,” Phys. Rev. Lett., vol. 4, no. 5, p. 228, Mar. 1960.
[13] S. Mühlbauer et al., “Skyrmion lattice in a chiral magnet,” Science, vol. 323, no. 5916, pp. 915–919, Feb. 2009.
[14] X. Z. Yu et al., “Real-space observation of a two-dimensional skyrmion crystal,” Nature, vol. 465, no. 7300, pp. 901–904, Jun. 2010.
[15] X. Z. Yu et al., “Near room-temperature formation of a skyrmion crystal in thin-films of the helimagnet FeGe,” Nature Mater., vol. 10, no. 2, pp. 106–109, 2011.
[16] S. Heinze et al., “Spontaneous atomic-scale magnetic skyrmion lattice in two dimensions,” Nature Phys., vol. 7, no. 9, pp. 713–718, Sep. 2011.
[17] F. Chen, Z. Li, W. Kang, W. Zhao, H. Li, and Y. Chen, “Process variation aware data management for magnetic skyrmions racetrack memory,” in Proc. 23rd Asia South Pacific Design Automat. Conf. (ASP-DAC), Jan. 2018, pp. 221–226.
[18] R. Venkatesan, M. Sharad, K. Roy, and A. Raghunathan, “DWM-TAPESTRI-An energy efficient all-spin cache using domain wall shift based writes,” in Proc. Design, Automat. Test Eur. Conf. Exhib. (DATE), Mar. 2013, pp. 1825–1830.
[19] J. Sampaio, V. Cros, S. Rohart, A. Thiaville, and A. Fert, “Nucleation, stability and current-induced motion of isolated magnetic skyrmions in nanostructures,” Nature Nanotechnol., vol. 8, no. 11, pp. 839–844, Nov. 2013.
[20] A. A. Thiele, “Steady-state motion of magnetic domains,” Phys. Rev. Lett., vol. 30, no. 6, p. 230, Feb. 1973.
[21] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC benchmark suite: Characterization and architectural implications,” in Proc. Int. Conf. Parallel Archit. Compilation Techn. (PACT), Oct. 2008, pp. 72–81.
[22] P. Lai et al., “An improved racetrack structure for transporting a skyrmion,” Sci. Rep., vol. 7, Mar. 2017, Art. no. 45330.
[23] H. T. Fook, W. L. Gan, I. Purnama, and W. S. Lew, “Mitigation of Magnus force in current-induced skyrmion dynamics,” IEEE Trans. Magn., vol. 51, no. 11, pp. 1–4, Nov. 2015.
[24] G. Zhao, X. Zhang, and F. Morvan, “Theory for the coercivity and its mechanisms in nanostructured permanent magnetic materials,” Rev. Nanoscience Nanotechnol., vol. 4, no. 1, pp. 1–25, Apr. 2015.
[25] C. Reichhardt, D. Ray, and C. J. O. Reichhardt, “Quantized transport for a skyrmion moving on a two-dimensional periodic substrate,” Phys. Rev. B, vol. 91, no. 10, Mar. 2015, Art. no. 104426.
[26] C. Navau, N. Del-Valle, and A. Sanchez, “Interaction of isolated skyrmions with point and linear defects,” J. Magn. Magn. Mater., vol. 465, pp. 709–715, Nov. 2018.
[27] D. Stosic, T. B. Ludermir, and M. V. Milošević, “Pinning of magnetic skyrmions in a monolayer Co film on Pt(111): Theoretical characterization and exemplified utilization,” Phys. Rev. B, Condens. Matter, vol. 96, no. 21, 2017, Art. no. 214403.
[28] X. Ma, C. J. O. Reichhardt, and C. Reichhardt, “Reversible vector ratchets for skyrmion systems,” Phys. Rev. B, vol. 95, no. 10, Mar. 2017, Art. no. 104401.
[29] C. Reichhardt, D. Ray, and C. J. O. Reichhardt, “Magnus-induced ratchet effects for skyrmions interacting with asymmetric substrates,” New J. Phys., vol. 17, no. 7, Jul. 2015, Art. no. 073034.
[30] C. Reichhardt and C. J. O. Reichhardt, “Noise fluctuations and drive dependence of the skyrmion Hall effect in disordered systems,” New J. Phys., vol. 18, no. 9, Sep. 2016, Art. no. 095005.
[31] C. Hanneken et al., “Electrical detection of magnetic skyrmions by tunnelling non-collinear magnetoresistance,” Nature Nanotechnol., vol. 10, no. 12, pp. 1039–1042, Dec. 2015.
[32] D. Maccariello et al., “Electrical detection of single magnetic skyrmions in metallic multilayers at room temperature,” Nature Nanotechnol., vol. 13, no. 3, pp. 233–237, Mar. 2018.
[33] X. Zhang et al., “Skyrmion-skyrmion and skyrmion-edge repulsions in skyrmion-based racetrack memory,” Sci. Rep., vol. 5, p. 7643, Jan. 2015.
[34] M. Najafi et al., “Proposal for a standard problem for micromagnetic simulations including spin-transfer torque,” J. Appl. Phys., vol. 105, no. 11, Jun. 2009, Art. no. 113914.
[35] A. Vansteenkiste, J. Leliaert, M. Dvornik, M. Helsen, F. Garcia-Sanchez, and B. Van Waeyenberge, “The design and verification of MuMax3,” AIP Adv., vol. 4, no. 10, Oct. 2014, Art. no. 107133.
[36] P. J. Metaxas et al., “Creep and flow regimes of magnetic domain-wall motion in ultrathin Pt/Co/Pt films with perpendicular anisotropy,” Phys. Rev. Lett., vol. 99, no. 21, Nov. 2007, Art. no. 217208.
[37] X. Fong, S. K. Gupta, N. N. Mojumder, S. H. Choday, C. Augustine, and K. Roy, “KNACK: A hybrid spin-charge mixed-mode simulator for evaluating different genres of spin-transfer torque MRAM bit-cells,” in Proc. Int. Conf. Simul. Semicond. Processes Devices, Sep. 2011, pp. 51–54.
[38] L. Liu, T. Moriyama, D. C. Ralph, and R. A. Buhrman, “Spin-torque ferromagnetic resonance induced by the spin Hall effect,” Phys. Rev. Lett., vol. 106, no. 3, Jan. 2011, Art. no. 036601.
[39] CACTI. Accessed: Apr. 9, 2019. [Online]. Available: www.hpl.hp.com/research/cacti
[40] N. Binkert et al., “The gem5 simulator,” SIGARCH Comput. Arch. News, vol. 39, no. 2, pp. 1–7, Aug. 2011.
[41] R. Perricone et al., “Advanced spintronic memory and logic for non-volatile processors,” in Proc. Design Automat. Test Eur. Conf. Exhib. (DATE), Mar. 2017, pp. 972–977.