
Chen 2019

This paper evaluates using magnetic skyrmions as cache memory in processors. Skyrmions can store multiple bits densely in a nanotrack but have challenges like high write currents limiting density and variable access times from shifting skyrmions. The paper proposes a multi-bit skyrmion cell and circuit design to address these issues and evaluates skyrmion memory's potential as a cache.


IEEE TRANSACTIONS ON MAGNETICS, VOL. 55, NO. 8, AUGUST 2019 1500309

Cache Memory Design With Magnetic Skyrmions in a Long Nanotrack

Mei-Chin Chen, Ashish Ranjan, Anand Raghunathan, Fellow, IEEE, and Kaushik Roy, Fellow, IEEE
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47906 USA

Magnetic skyrmion (MS), a vortexlike region with reversed magnetization in nanomagnets, has recently emerged as an exciting
development in the field of spintronics. It has a number of beneficial features, including remarkably high stability, ultralow depinning
current density, and extremely compact size. Due to these benefits, skyrmions have generated great interest in the design of spintronic
memory. In this paper, we evaluate the use of skyrmion-based memory as a last-level cache for general-purpose processors. In the
skyrmion-based memory structure, data can be densely packed as multiple bits in a long magnetic nanotrack. Write operations are
performed by injecting a spin-polarized current in the nanotrack. Since multiple skyrmions (each representing a bit) are packed into
a single nanotrack, they need to be accessed by shifting them along the nanotrack with a charge current passing through a spin-Hall
metal (SHM). We identify the following key challenges associated with MS-based cache design: 1) the high-current requirements
for skyrmion nucleation limit the density benefits offered by these structures, since the transistor supplying write currents is the
limiting factor that determines the bit-cell area; 2) the proposed nanotrack structure results in significant performance overheads
due to the latency arising from the shift operations; 3) the skyrmions move toward the edge of the nanotrack during shift operations
owing to the Magnus force. Hence, an additional idle operation time is required to relax skyrmions back through the repulsive
force from the edge; and 4) to avoid annihilation of skyrmions from the edge, the duration and the current density of the shift
operation have to be well controlled. To overcome these challenges, a multi-bit skyrmion cell with appropriate peripheral circuits
is proposed, considering the heterogeneity in the read/write characteristics. The density benefits are explored by performing the
layout of different multi-bit cells. We perform a systematic device-circuit-architecture co-design to evaluate the feasibility of our
proposal. Our experiments demonstrate the potential of, and the challenges involved in, using skyrmion-based memory as last-level
caches.
Index Terms— Dzyaloshinskii–Moriya interaction (DMI), magnetic skyrmion (MS), magnus force, spin-Hall metal (SHM).

Manuscript received April 29, 2018; revised February 4, 2019 and March 11, 2019; accepted March 27, 2019. Date of publication April 24, 2019; date of current version July 18, 2019. Corresponding author: M.-C. Chen (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMAG.2019.2909188
0018-9464 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Increased leakage current and process variations are a major challenge to memories realized using deeply scaled CMOS devices. The need for non-volatility (zero OFF-state leakage), higher density, and robustness has consequently led researchers to explore alternative technologies to replace traditional CMOS-based on-chip memories. Several emerging technologies such as phase change memory (PCM), resistive random access memory (RRAM), spin-transfer torque magnetic RAM (STT-MRAM), and domain wall motion (DWM)-based memory have been proposed as potential substitutes for SRAM and DRAM. One such promising high-density memory technology, DWM-based racetrack memory, was proposed by IBM [1]. In a racetrack memory, multiple data bits can be coded in a sequence of magnetic domains, separated by domain walls, within a nanowire. DWM-based caches [2]–[5] have shown significant improvement in performance (with higher packing density and better energy efficiency) over other spintronic memory devices. However, the motion of domain walls might be pinned by the presence of defects [6], raising concerns about the feasibility of DWM-based memory.

Magnetic skyrmions (MSs) have recently emerged as a promising alternative for future memories [7]–[10]. They can be observed in non-centrosymmetric bulk magnetic materials or ultra-thin magnetic systems with broken inversion symmetry and large spin-orbit coupling. The state of a MS can be explained by the presence of the Dzyaloshinskii–Moriya interaction (DMI) [11], [12]. The DMI between two atomic spins $\mathbf{S}_1$ and $\mathbf{S}_2$ with a neighboring atom can be expressed as $H_{DM} = -\mathbf{D}_{1,2} \cdot (\mathbf{S}_1 \times \mathbf{S}_2)$, where $\mathbf{D}_{1,2}$ is the DM vector [7], [8], [13]–[16]. MSs have been shown to possess several benefits over DWM-based racetrack memory in terms of stability and density, and are less limited by imperfections of the material. Specifically, topological properties prevent the motion of skyrmions from being pinned at defect sites in a magnetic layer, and thus, skyrmions are more robust information carriers.

MSs can be stored as multiple bits in a long nanotrack to realize highly dense memory. Chen et al. [17] first demonstrated the use of MSs to realize on-chip caches. That work proposed the use of a shift-based write mechanism [18] for the creation of skyrmions. However, such an approach is considered to be applicable to DWM-based devices. No experimental (or simulation) results to date have demonstrated the creation of skyrmions using the shift-based mechanism. In our work, a MS is written (or nucleated) by injecting a local spin-polarized current in the nanotrack, whereas the read operation is performed by sensing the change in resistance arising from the presence (or absence) of a skyrmion at a specific location in the nanotrack. In order to read or write a bit stored in the nanotrack, a variable number of shift operations are required depending on the location relative to the read/write port. The noticeably high density and non-volatility offered by MS-based memory are key positives for last-level on-chip cache applications.
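As a small worked example of the DMI energy term quoted in this section, $H_{DM} = -\mathbf{D}_{1,2} \cdot (\mathbf{S}_1 \times \mathbf{S}_2)$, the sketch below evaluates it for a pair of unit spins. The DM vector and spin directions are hypothetical values in arbitrary units, chosen only to show that the term favors one rotation sense of neighboring spins over the other and vanishes for parallel spins. The helper names (`cross`, `dot`, `dmi_energy`) are illustrative and do not come from the paper.

```python
# Illustrative only: evaluates H_DM = -D . (S1 x S2) for unit spins.
# The DM vector and spin directions are hypothetical, in arbitrary units.

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))

def dmi_energy(d_vec, s1, s2):
    """DMI energy of one spin pair: -D . (S1 x S2)."""
    return -dot(d_vec, cross(s1, s2))

D = (0.0, 0.0, 1.0)      # assumed DM vector along +z
S_X = (1.0, 0.0, 0.0)    # spin along +x
S_Y = (0.0, 1.0, 0.0)    # spin along +y

e_ccw = dmi_energy(D, S_X, S_Y)  # rotation x -> y: S1 x S2 is parallel to D
e_cw = dmi_energy(D, S_Y, S_X)   # opposite rotation sense
e_par = dmi_energy(D, S_X, S_X)  # parallel spins: cross product is zero
```

With these assumed inputs one rotation sense lowers the energy and the opposite sense raises it by the same amount, which is the chirality preference that underlies skyrmion formation.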
We explore the use of MSs as last-level on-chip caches in general-purpose processors. We propose a multi-port skyrmion-based cell and evaluate its potential in realizing an on-chip memory array. Despite possessing a number of beneficial attributes such as high stability, non-volatility, high density,¹ and low leakage, MSs pose certain challenges: 1) the current density required for skyrmion nucleation [19] is substantially higher, necessitating large access transistors for writing a skyrmion, in turn limiting the density benefits; 2) the variable access latency arising from packing multiple bits in a single nanotrack leads to energy and performance overheads; 3) the motion of skyrmions drifts away from the direction of electron flow owing to the Magnus force [20]. In order to relax the skyrmions back to the center region of the nanotrack, an idle operation time is needed, which leads to additional shift latency; and 4) skyrmions might suffer annihilation through the edges due to the large drive current density required for high-speed operation. To address these challenges, we perform a design-space exploration for the multi-bit skyrmion cell while considering the peripheral circuits required to perform these operations. We also performed layout to estimate the density benefits of the proposed multi-bit cell. To keep skyrmions enclosed in the nanotrack under high current injection, it is essential to analyze the various possible design choices and their impacts on system energy and performance. We developed a device-circuit-architecture framework to understand these design points for the proposed multi-bit cell.

The key contributions of this paper are as follows.
1) We explore the feasibility of last-level cache design for general-purpose processors with MS-based memory.
2) We propose a MS-based multi-bit cell and utilize suitable circuit and architecture optimizations that mitigate the unique challenges posed by the skyrmion structure.
3) We develop a systematic device-to-architecture co-design framework and perform an in-depth analysis of the density benefits, along with the energy and performance tradeoffs associated with the proposed skyrmion-based cache. Our experiments on the PARSEC benchmark suite [21] demonstrate 2.41× improvement in cache energy with 2% average degradation in cache performance over an iso-area traditional SRAM-based L2 cache.

The rest of this paper is organized as follows. Section II presents the fundamentals of the skyrmion-based device and bit cell. The design and optimization of a multi-bit skyrmion cell are described in Section III. Section IV demonstrates the memory array organization. Section V presents the experimental methodology and the results are presented in Section VI. Finally, we conclude this paper in Section VII.

¹ The sizes of skyrmions and the spacing between them can be potentially shrunk down to the nanometer scale.

Fig. 1. Schematic of MS-based device and bit-cell. The proposed device structure can perform read/write/shift operations. A skyrmion can be nucleated in the nanotrack (yellow layer) by injecting a spin-polarized current through the left MTJ. The motion of skyrmions can be driven by utilizing vertical injection of a spin current generated from a charge current flowing through the SHM layer (blue layer). The reference MTJ is used to form a voltage divider on the read port, and the presence of a skyrmion can be detected by sensing the voltage at the output of the inverter.

II. SKYRMION-BASED MEMORY

Fig. 1 shows the proposed MS-based device structure in which skyrmions are stored in a ferromagnetic nanotrack adjacent to a spin-Hall metal (SHM). To realize a bit-cell using this structure, we need to perform three different operations: 1) a write operation; 2) a shift and an idle operation; and 3) a read operation. In the following paragraphs, we describe these operations in detail along with the peripheral circuits required to perform them.

A. Nucleation of a Skyrmion (Write Operation)

A skyrmion is nucleated in the nanotrack by injecting a local spin-polarized current through the MTJ on the left (write MTJ). This is performed by charging the bitline (BL) to $V_{WRITE}$, the sourceline (SL) to ground (GND), and turning on the write access transistors by driving the write wordlines (WWL) to $V_{DD}$. Nucleating a skyrmion requires that the injected spin-polarized current exceed a certain threshold $J_{th}$ [19]. We exploit the spin-polarized current generated from the electrical current through a 20 nm diameter write MTJ to create a skyrmion. The proposed device structure consists of a 0.4 nm thick ferromagnetic nanotrack adjacent to a 3 nm thick SHM. Detailed material parameters used in our simulations will be covered in Section V. In our simulation, a stable skyrmion can be nucleated in a 60 nm nanotrack by injecting a spin-polarized current through the write MTJ for 50 ps with a current density of $6.8 \times 10^{12}$ A/m$^2$. Note that the presence of DMI necessitates a high current density for nucleation.

B. Motion of Skyrmions (Shift Operation)

Skyrmions are packed as multiple bits in a long nanotrack. Hence, in order to access a specific bit stored in a long nanotrack, the corresponding skyrmion needs to be placed underneath the write (read) port via shift operations. A shift operation is accomplished by connecting the shift wordlines (SWL) to $V_{DD}$, and precharging BL and SL to the appropriate voltage. The motion of skyrmions can be controlled by an in-plane spin-polarized current flowing through the nanotrack
directly, or by a vertical injection of a spin-polarized current perpendicular to the plane (CPP), which is obtained by injecting a charge current through the SHM layer. We choose the CPP method in our proposed device structure because a skyrmion then undergoes a larger Slonczewski in-plane torque instead of a smaller field-like out-of-plane torque, so higher velocities can be obtained with lower current densities.

The motion of skyrmions can be well explained by Thiele's equation [20]

$$\mathbf{G} \times \mathbf{v}_d - D\alpha\,\mathbf{v}_d + \mathbf{j}_{spin} = 0 \qquad (1)$$

where $\mathbf{j}_{spin}$ represents the vertical spin current generated from the charge current flowing through the SHM (in blue) underlayer. The longitudinal and the transverse velocity can be written as

$$v_d^x = \frac{\alpha D}{G^2 + \alpha^2 D^2}\, j_{spin}, \qquad v_d^y = \frac{G}{G^2 + \alpha^2 D^2}\, j_{spin} \qquad (2)$$

Hence, for $G \neq 0$, the motion of skyrmions deviates from the intended direction. The transverse motion of a skyrmion stops at a certain distance from the edge owing to the skyrmion-edge interaction. The final displacement with respect to the edge decreases as the SHM current density increases. Skyrmions are annihilated if the applied charge current density is larger than a certain value ($J_{ani}$), which is a function of the operation time. Fig. 2(a) shows that the critical annihilation current density can be significantly increased by reducing the shift operation time. Moreover, a high energy barrier is induced on the boundaries by adhering high-K materials at the edges, allowing skyrmions to be well confined in the nanotrack with larger current injection [22], [23]. Fig. 2(b) shows the comparison of the critical annihilation current density ($J_{ani}$) for three different high-K materials (FePt, Nd$_2$Fe$_{14}$B, SmCo$_5$) with the edge width ranging from 1 to 5 nm for < 1 ns shift duration. The corresponding material parameters, adopted from [19], [24], are shown in Table I. Utilizing high-K materials at the edges makes switching the spin direction much harder when a skyrmion approaches the edge due to the Magnus force, thereby keeping the skyrmion in the nanotrack. The velocity of skyrmions, which increases with increased current density, is therefore enhanced during shifts, achieving faster shift operations. Note that the induced energy barrier from the high-K materials is known to depend on the width of the adhering high-K material and the material properties [22]. Note that the skyrmion Hall effect could be addressed by having notches, pinning effects, or ratchet geometries. However, these effects are not considered in this paper. Interested readers are referred to [25]–[30] for more detail.

TABLE I
COMPARISON OF HIGH-K MATERIALS USED IN THE PRESENT SIMULATIONS

Fig. 2. (a) Critical annihilation current density versus various shift operation time. (b) Critical annihilation current density in a 1 ns shift operation versus various high-K materials with different width. The annihilation current density is less sensitive to the width of the high-K materials with higher anisotropy constant.

C. Detection of the Presence of a Skyrmion (Read Operation)

Electrical detection of skyrmions at room temperature through the magnetoresistance effect has been proposed and recently demonstrated in experiments [31], [32]. In this paper, we use this mechanism to perform a read operation. Specifically, we introduce a read port that includes a read MTJ, a reference MTJ, and two access transistors. A read operation is performed by connecting the read wordlines (RWL) to $V_{DD}$, driving BL to $V_{read}$, and SL to GND. Here, the ferromagnetic nanotrack at the read region serves as the “free layer” of the read MTJ, and the resistance of the read MTJ is denoted as $R_{sk}$ ($R_{ap}$) with the presence (absence) of a skyrmion under the read MTJ. The voltage divider consisting of the reference MTJ with resistance ($R_{ap}$) in series with the read MTJ will drive the output of the inverter high in the presence of a skyrmion, and vice versa. It is to be noted that the trip point of the inverter is selected between the maximum voltage (with the absence of a skyrmion, high-resistance state) and the minimum voltage (with the presence of a skyrmion, low-resistance state) at node “A.” However, since the average magnetization of a skyrmion is not parallel ($m_z = -1$) to the fixed layer ($m_z = -1$), the resistance change of the read MTJ is lower here compared with a full parallel-to-antiparallel resistance switching of an MTJ. To achieve sufficient resistance change for the read operation, we use an MTJ of diameter 20 nm and ∼200% magnetoresistance ratio. We also match the size of the skyrmion to the size of the read MTJ to ensure that the region captured by the read MTJ is closer to $m_z = -1$ (anti-parallel to the fixed layer), which, in turn, leads to higher magnetoresistance change. Table II gives the comparison of the voltage swing (V) at node “A” in Fig. 1 in the presence and the absence of a skyrmion under a read current of ∼$1.25 \times 10^{-5}$ A by pulling up the BL voltage to $V_{read}$ (0.8 V) and SL to GND. As shown in Table II, changing the width of the nanotrack increases the skyrmion dimension (the region captured by the read MTJ is closer to $m_z = -1$), which, in turn, leads to a greater voltage swing as a higher magnetoresistance change could be achieved. However, this also increases the required reliable spacing between consecutive skyrmions to free them from the repulsive force between neighboring skyrmions [33]. Moreover, since the fixed layer of the read MTJ is located
at the center region of the nanotrack, the deviation between the position of a skyrmion and a read port degrades the resistance change with the absence/presence of a skyrmion. Therefore, a read operation to a specific skyrmion bit requires an idle operation after each shift operation, i.e., the total number of idle operations is equal to the number of shift operations required for reading a skyrmion bit. This operation relaxes the skyrmions back to the center region through edge repulsion. We achieve this by turning all access transistors OFF, which stabilizes the magnetization of the nanotrack.

TABLE II
READ VOLTAGE SWING (V) UNDER A READ CURRENT OF ∼$1.25 \times 10^{-5}$ A BETWEEN THE PRESENCE AND THE ABSENCE OF A SKYRMION FOR DIFFERENT NANOTRACK WIDTHS AND THE CORRESPONDING SKYRMION RADIUS. THE RELIABLE SPACING BETWEEN CONSECUTIVE SKYRMIONS IS ALSO COMPARED

III. MULTI-BIT SKYRMION CELL DESIGN

Fig. 3 shows the logical representation of data stored along the nanotrack. Depending on the existence of a skyrmion, different logic values can be stored along the nanotrack as multiple bits. We denote the presence of a skyrmion to represent logic “1,” while its absence denotes logic “0.” A current injected into the SHM (blue layer) from the right can shift skyrmions to the right-hand side of the nanotrack, and vice versa. The logical views of a multi-bit cell with a single write/read port and a cell with single write and multiple read ports are shown in Fig. 3(a) and (b), respectively. Note that the read ports can be placed at any location along the nanotrack; however, the write port is placed at the end of the long nanotrack to keep the write operation simple.

Consider Fig. 3(a) as an example. A write port at address “0x0” and a read MTJ at address “0x7” with a sequence of 0's and 1's stored in the cell are presented. In the first write cycle, “0” is written into the address “0x0,” and subsequently shifted right to the next address “0x1.” “1” is written into the address “0x0” during the next write cycle, and then the data in the nanotrack are again shifted to the right. By repeatedly writing data into the address “0x0” and subsequently right shifting all stored data to the next address, a sequence of bits can be written to the nanotrack. To read the stored data at, say, address “0x5,” the bit is shifted right by two positions to reach the location under the read MTJ. Similarly, to write data at a specific address, we first shift the bit to the position where the write port is located. Before writing new data into the address, the previously stored data are cleared by injecting a current with spin polarization in the opposite direction to the magnetization of the skyrmion center. To prevent stored data in the nanotrack from overflowing during shift operations, we extend the nanotrack by having extra data bits (light yellow part in Fig. 3). In the worst case scenario for this example, to access the stored data at address “0x0,” the bit is required to be shifted right by seven positions. Thus, seven extra bits are required to avoid the loss of stored data from address “0x1” to “0x7.” The write/read latency is dependent on the location where a bit is stored. However, the average read latency can be alleviated by introducing multiple read ports, as shown in Fig. 3(b). The current location of the read port is referred to as the current port status. In order to access a bit from a multi-bit cell, a shift controller determines the appropriate read port and calculates the number of shift operations required by comparing the input address bits with the current port status. This also results in a reduction of the number of extra bits required to avoid data loss. Table III lists the bias voltage conditions for write/shift/read/clear/idle operations.

Fig. 3. Logical view of a multi-bit MS-based cell with (a) single write/read port or (b) single write, multiple read ports. A sequence of bits is stored in the nanotrack.

TABLE III
BIAS VOLTAGE CONDITIONS FOR VARIOUS OPERATIONS

A. Density of the Skyrmion-Based Multi-Bit MS Cell

Fig. 4 shows the layout of an 8/16/32 bit MS cell with a single write/read port. As discussed in Section II, the current requirement for the write operation is considerably higher than that for the read and shift operations. Hence, as shown in Fig. 4(a), for an 8 bit MS cell with single write/read port, the cell area is dominated by the peripheral write transistors since the dimension of the write transistors is much larger than the nanotrack. Note that the length of the nanotrack is determined by the number of stored bits and the read ports. The total length of the nanotrack can be reduced by having multiple read/write ports as fewer extra bits are required to prevent the stored data from being destroyed during shift operations (light yellow part in Fig. 3). For the 8 bit MS cell case with a single read port, the write transistors dominate the total cell area. Thus, the density (i.e., cell area per bit) of the
8 bit MS cell does not improve further by introducing more read ports (as presented in Fig. 5). On the other hand, for the 16/32 bit MS cell, since the nanotrack dominates the cell area with one read port, the density can be improved by packing more bits within a smaller area. Hence, as shown in Fig. 5, at the 45 nm technology node (F), for a 16 bit cell with one/two read ports, the CMOS transistors require 135.59 and 112.67 $F^2$/bit, respectively. Fig. 5 shows the comparison of the cell size of the proposed MS cells, i.e., 8/16 bit MS cells with one write port and 32 bit MS cell with both a single write port and two write ports while varying the number of read ports. We also show the cell size of SRAM (triangle) and 1T-1R STT-MRAM (star) on the figure for reference. The total area of a multi-bit MS cell is determined by the number of read and write transistors, as well as the length of the nanotrack. For an 8 bit MS cell, the cell size is dominated by the write transistors when the total number of read ports is less than 3. Although having more than 3 read ports shortens the nanotrack because fewer extra bits are required, the area of the read peripheral transistors inevitably increases too. Similarly, for the 16 bit MS cell case with fewer than 5 read ports, the bit-cell area is mainly determined by the nanotrack itself. In the case of the 32 bit MS cell, the multi-bit cell area is further dominated by the nanotrack; therefore, having an extra write port helps to reduce the 15 extra bits required, thereby improving the density (i.e., cell area per bit).

Fig. 4. Layout of a MS cell with single write/read port at the 45 nm technology node (F). (a) 8-bit. (b) 16-bit. (c) 32-bit.

Fig. 5. Bit-cell area comparison for different multi-bit designs.

IV. ARRAY ORGANIZATION

Fig. 6 shows the memory array organization with the proposed multi-bit cell. The wordlines for performing read, write, and shift operations (i.e., RWLs, WWLs, SWLs) are shared among all the multi-bit cells placed in a row. The BL and the SL are shared among all the multi-bit cells placed in a column. In this architecture, multiple words can be placed on the same row and accessed independently. The address decoder is used to select a multi-bit skyrmion cell in the array, with the shift control logic selecting the appropriate word. Note that the sense amplifier shared across the entire column detects the output signal as logic “0” or “1.”

Fig. 6. Memory array organization of skyrmion-based multi-bit cells.

A. Skyrmion-Based Cache Design

To evaluate the benefits of the proposed memory array at the application level, we integrate it as a last-level cache in the memory hierarchy of a general-purpose processor. Toward this end, we follow the DWM-based hybrid cache organization presented in TapeCache [2], i.e., the tag array is designed with SRAM to avoid variable access latency during performance-critical tag lookup operations, and the data array is realized using the proposed multi-bit skyrmion array. The data array is further composed of randomly addressable clusters, each of which stores multiple cache blocks. We assume a bit-interleaved mapping of the cache blocks in each cluster, such that a given cache block can be accessed in parallel after performing an appropriate number of shift operations to all the nanotracks within a cluster. The addressing policy and the cache management policies are also assumed to be similar to those of TapeCache.
V. EXPERIMENTAL METHODOLOGY

In this section, we present a brief description of the simulation framework and the experimental setup used to evaluate our proposal.

A. Simulation Framework

Micromagnetic simulations of the skyrmion device are performed using the tool Mumax3 [34], [35]. The magnetization dynamics of MSs driven by vertical current can be expressed by

$$\tau = \frac{\gamma}{1 + \alpha^2}\left(\mathbf{m} \times \mathbf{H}_{eff} + \alpha\,(\mathbf{m} \times (\mathbf{m} \times \mathbf{H}_{eff}))\right) + \tau_{SL}$$

$$\tau_{SL} = \beta\,\frac{\epsilon - \alpha\epsilon'}{1 + \alpha^2}\,(\mathbf{m} \times (\mathbf{m}_p \times \mathbf{m})) - \beta\,\frac{\epsilon' - \alpha\epsilon}{1 + \alpha^2}\,\mathbf{m} \times \mathbf{m}_p$$

$$\beta = \frac{j_z \hbar}{M_{sat}\, e\, d}$$

$$\epsilon = \frac{P \Lambda^2}{(\Lambda^2 + 1) + (\Lambda^2 - 1)(\mathbf{m} \cdot \mathbf{m}_p)} \qquad (3)$$

where $\mathbf{m}$ is the normalized magnetization vector, $\mathbf{m}_p$ is the fixed-layer polarization, $\gamma$ is the Gilbert gyromagnetic ratio, $\alpha$ is the Gilbert damping parameter, $\mathbf{H}_{eff}$ is the effective field, $j_z$ is the current density along the z-axis, $M_{sat}$ is the saturation magnetization, $e$ is the elementary charge, $d$ is the skyrmion layer thickness, $P$ is the polarization of the conduction electrons, the Slonczewski $\Lambda$ parameter characterizes the spacer layer, and $\epsilon'$ is the secondary spin transfer term. The material parameters used in our simulations correspond to Co/Pt multilayers [36] and are shown in Table IV. We consider a 0.4 nm thick Co nanotrack with perpendicular magnetic anisotropy on a 3 nm Pt substrate inducing DMI. The sample is discretized into an element size of 1 × 1 × 0.4 nm$^3$. The non-equilibrium Green's function (NEGF)-based spin transport simulation has been used in order to obtain the resistance of the MTJ [37]. The charge current ($I_e$) flowing through the SHM and the corresponding spin current ($I_s$) are calculated using [38]

$$I_s = \theta_{sh}\,\frac{A_{MTJ}}{A_{SHM}}\,I_e \qquad (4)$$

where $A_{MTJ}$ and $A_{SHM}$ are the cross-sectional areas of the MTJ and SHM, respectively, and $\theta_{sh}$ is the spin-Hall angle. The spin current from (4) is used to analyze the magnetization dynamics with the generalized Landau-Lifshitz-Gilbert-Slonczewski (LLGS) equation. Magnetization dynamics simulations are performed using the Mumax3 platform [34], [35].

TABLE IV
MATERIAL PARAMETERS USED FOR SIMULATION

B. System-Level Evaluation Framework

The device parameters obtained with the proposed simulation framework are used as technology parameters in a modified version of CACTI [39] to evaluate the read/write characteristics of the skyrmion-based cache. CACTI is an integrated tool that is commonly used by computer architects for modeling dynamic power, access latency, area, and leakage power of caches. It takes as inputs the cache parameters (e.g., capacity, block size, associativity, etc.), the number of read/write ports in the cache, and bitcell-level technology parameters to produce the array-level characteristics mentioned above. These array-level characteristics are then reflected in GEM5 [40], a cycle-accurate architectural simulator that models a wide range of instruction set architectures (ISAs) along with a detailed and flexible memory system. Specifically, we model the skyrmion cache architecture in GEM5 to evaluate the proposed design as an L2 cache. In our experiments, we perform an iso-area replacement of the L2 cache and compare the energy and performance of the proposed design with that of SRAM-based and STT-MRAM-based caches. All the memory technologies considered in the evaluation are based on a 45 nm technology node.² The CMOS baseline system configuration used in our analysis is shown in Table V. We perform a full-system simulation for 1 billion instructions in the regions of interest for caches across a suite of multi-threaded benchmarks from PARSEC [21], a benchmark suite typically used for studies of chip-multiprocessors.

² We used a commercial 45 nm technology that was readily available to us, rather than predictive technology models, for our simulations. We expect the energy to further scale by ∼0.15× from the 45 nm technology node to 15 nm, a state-of-the-art technology node [41].

TABLE V
SYSTEM CONFIGURATION

VI. RESULTS AND DISCUSSION

A. Device-Level and Circuit-Level Results

As we discussed in Section III, shift operations are involved in both write and read operations. However, during shift operations, the trajectory of skyrmions in the nanotrack bends away from the center as a result of the Magnus force. Thus, an idle operation is required to relax skyrmions back to the center region through edge repulsion after every shift operation. Fig. 7 shows the comparison of the required relaxation time and the longitudinal shift distance within various operation times for a current of $1.44 \times 10^{-5}$ A and $5.76 \times 10^{-5}$ A, respectively, with and without adhering high-K materials at both the edges. High-K materials are adhered to prevent skyrmions from annihilation under a current of $5.76 \times 10^{-5}$ A. The longitudinal velocity is proportional to the injection current density, and the required relaxation time is related to the transverse shift distance, which increases with increasing drive current or operation time. Since skyrmions in the
nanotrack stop at a certain distance to the edge owing to the skyrmion-edge interaction, the required relaxation time is the same after 1.2 and 0.8 ns operation time under a current of 1.44 × 10⁻⁵ A and 5.76 × 10⁻⁵ A, respectively. With the aid of adhering high-K materials, skyrmions can be operated under a higher current density, and thus higher transverse velocity can be reached. However, a higher relaxation time is also required, which increases the shift latency. Since the reliable spacing in our case is ∼74 nm, a current of 1.44 × 10⁻⁵ A for 1 ns and 5.76 × 10⁻⁵ A for 0.2 ns is required during the shift operation, leading to a 0.9 and 1.3 ns relaxation time, respectively. We compare the performance evaluation and energy consumption for 8/16/32 bit-MS with either high-K materials at the two edges (for a current of 5.76 × 10⁻⁵ A) or no such material at the edges (for a current of 1.44 × 10⁻⁵ A).

Fig. 7. Comparison of relaxation time and final position of skyrmion under a current of 1.44 × 10⁻⁵ A and 5.76 × 10⁻⁵ A, subjected to the nanotrack without (solid) and with (dashed) high-K, respectively. High-K materials are adhered under the current of 5.76 × 10⁻⁵ A to avoid skyrmions from annihilation from edges.

B. System-Level Results

In this section, we present the array-level analysis of the proposed MS-based cache design and then evaluate the impact on system performance and cache energy.

1) Array-Level Results: Fig. 8 shows the comparison of different energy/latency components of the proposed MS-based array with an STT-MRAM and SRAM array. The MS-based array is realized using an 8 bit multi-bit cell. As shown in the figure, the write energy for the MS-based array is 1.78× higher than STT-MRAM, and 5.5× higher than SRAM. This is because of the high current requirements for skyrmion nucleation. The write latency for the MS-based array is slightly (4%) lower than STT-MRAM but 2.4× higher than SRAM. On the other hand, the read energy (and latency) is identical for both STT-MRAM and the proposed MS-based array due to similar read mechanisms, and 1.1× higher than SRAM. Furthermore, we observe the read latency is identical for the MS-based array, STT-MRAM, and SRAM. Apart from the read and write energies, the MS array also consumes shift energy during read/write operations. The shift energy is a function of the length of the nanotrack, the number of read/write ports in the bit-cell, and the high-K material on the edges of the nanotrack. In our evaluations, the shift energy per operation was found to be 9.51 × 10⁻⁴ and 8.68 × 10⁻⁵ pJ for the 8 bit-based memory array with and without high-K material on the nanotrack edges, respectively.

Fig. 8. Array-level comparison of read and write characteristics with iso-area SRAM and STT-MRAM.

2) Performance Evaluation: Fig. 9 shows the comparison of the instructions per cycle (IPC) for six different skyrmion-based cache configurations with SRAM and STT-MRAM caches. We consider eight different L2 cache designs under iso-area conditions: 1) a 2 MB SRAM cache with one read/write port; 2) an 8 MB STT-MRAM cache with one read/write port; 3) two 2 MB MS-based cache designs with three read ports and one write port, storing 8 bits in the nanotrack, with and without high-K material at the two edges (8 bit-MS high-K and 8 bit-MS no high-K); 4) two 4 MB MS-based cache designs with three read ports and one write port, storing 16 bits in the nanotrack, either having high-K material on the edges or no high-K material (16 bit-MS high-K and 16 bit-MS no high-K); and 5) two 4 MB MS-based cache designs with eight read ports and two write ports, storing 32 bits in the nanotrack with either a high-K material at the edges or no such material (32 bit MS high-K and 32 bit MS no high-K). The IPC is normalized to the 2 MB SRAM-based cache design. Across all benchmarks, the 8 bit MS high-K design leads to an average degradation of 2.0% and 4.3% in performance compared to the SRAM and STT-MRAM designs. This degradation is primarily due to two factors: 1) reduced cache capacity (iso-capacity with respect to SRAM and 0.25× capacity with respect to STT-MRAM) and 2) shift overhead arising from the memory structure. In contrast, for the 8 bit-MS no high-K design, the system performance further reduces by 3.3% and 5.6% compared to the SRAM and the STT-MRAM cache as a result of additional shift latency incurred for each cache access in the absence of high-K material at the edges.

On the other hand, the two 16 bit MS configurations (16 bit MS high-K and 16 bit MS no high-K) degrade the performance by 0.1% and 2.7% compared to the SRAM cache on average, respectively. Furthermore, we observe a 2.4% and 5.0% reduction in performance for the two designs when compared with the STT-MRAM cache design. The smaller
performance degradation over the SRAM and STT-MRAM cache is attributed to the 2× higher cache capacity offered by the 16 bit configuration. For the 32 bit MS designs, the performance improves by 0.4% with the high-K cache design and degrades by 0.6% for the design with no high-K material, over the SRAM cache. This improvement is mainly because of a reduced number of shift operations performed on average with a higher number of read and write ports in the 32 bit design. Note that the performance reduces by 0.6% and 3.0%, respectively, over the STT-MRAM cache design, since the overall cache capacity does not increase with the 32 bit MS design as discussed earlier.

Fig. 9. L2 cache performance comparison across different memory technologies.

3) Energy Comparison: Fig. 10 illustrates the L2 cache energy consumed by the proposed cache designs compared to the iso-area SRAM and STT-MRAM caches. The cache energy is normalized to the energy consumed by the STT-MRAM design. On average, we observe a 2.41× and 2.45× reduction in cache energy for the 8 bit MS high-K and 8 bit MS no high-K designs over the SRAM cache. This is due to the reduced leakage energy consumption with non-volatile MSs. The energy benefits are slightly higher for the 8 bit MS no high-K design because of lower shift energy consumed with no high-K material on the nanotrack edges. For the 16 bit designs, the energy benefits were found to be 2.37× and 2.41× for the 16 bit-MS high-K and 16 bit MS no high-K designs, respectively. The energy benefits are moderately lower for the 16 bit configurations over the 8 bit designs due to a higher energy consumed by the shift operations. Note that, for a subset of benchmarks (canneal, ferret, streamclust, and vips), the 16 bit configurations have a lower energy compared to the 8 bit designs. This is because of the lower capacity misses observed in the 16 bit designs that eventually lead to lower write energy. In these benchmarks, the benefits in write energy outweigh the increase in shift energy, thereby leading to improved cache energy. The energy benefits over SRAM reduce to 2.27× and 2.31× with the 32 bit MS high-K and 32 bit MS no high-K designs, respectively. The benefits in energy are lower than the other two designs (8 bit and 16 bit configurations) since the resistance offered by the nanotrack increases, which, in turn, increases the energy consumed for each shift operation.

In contrast, the energy consumed by the MS-based cache designs is higher than the baseline iso-area STT-MRAM cache in all cases. Specifically, the 8 bit designs consume 1.29× and 1.27×, and the 16 bit designs 1.30× and 1.28×, higher energy than the STT-MRAM cache. Similarly, the 32 bit designs consume 1.37× and 1.34× energy over the STT-MRAM cache. This increase in energy is because of the additional shift energy overheads and the reduced cache capacity arising from the larger write transistor requirements for the multi-bit MS cell.

Fig. 10. Energy trends across different memory technologies.

In summary, our results show that skyrmion-based caches offer small improvements in performance with substantial energy reduction over an iso-area SRAM-based cache. They also point to key avenues for improvement in skyrmion-based memory: the high nucleation energy for skyrmions leads to large write transistors, curtailing density benefits, while the latency due to shift operations limits the performance.

VII. CONCLUSION

In this paper, we explored MSs to design last-level caches. We proposed a multi-bit skyrmion-based cell design that packs multiple bits in a nanotrack. Since the size and spacing of skyrmions can be down to nanometer scale, the skyrmion-based nanotrack has the potential to provide significant density benefits compared to other memory technologies. However, the high current requirements for skyrmion nucleation are a
bottleneck to achieving significant density benefits. We analyzed different device tunings and design tradeoffs associated with the proposed bit cell and evaluated the area, performance, and energy benefits while accounting for the peripheral circuit requirements. We designed a device–circuit–architecture framework to evaluate the system-level benefits of the proposed design. Our experiments reveal considerable benefits over an iso-area SRAM cache. However, the proposed design remains behind an iso-area STT-MRAM cache in both energy and performance, suggesting the need for mechanisms to lower the current density requirements for skyrmion nucleation.

ACKNOWLEDGMENT

This work was supported in part by the Center for Spintronics, in part by the Semiconductor Research Corporation (SRC) and the Microelectronics Advanced Research Corporation (MARCO), in part by the National Science Foundation, and in part by the Vannevar Bush Faculty Fellowship.

REFERENCES

[1] S. Parkin, M. Hayashi, and L. Thomas, “Magnetic domain-wall racetrack memory,” Science, vol. 320, no. 5873, pp. 190–194, Apr. 2008.
[2] R. Venkatesan, V. Kozhikkottu, C. Augustine, A. Raychowdhury, K. Roy, and A. Raghunathan, “Tapecache: a high density, energy efficient cache based on domain wall memory,” in Proc. ACM/IEEE Int. Symp. Low Power Electron. Design. New York, NY, USA: ACM, Jul. 2012, pp. 185–190.
[3] Z. Sun, W. Wu, and H. Li, “Cross-layer racetrack memory design for ultra high density and low power consumption,” in Proc. 50th ACM/EDAC/IEEE Design Automat. Conf. (DAC). ACM, May 2013, pp. 1–6.
[4] A. Ranjan, S. G. Ramasubramanian, R. Venkatesan, V. Pai, K. Roy, and A. Raghunathan, “DyReCTape: A dynamically reconfigurable cache using domain wall memory tapes,” in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE). Grenoble, France, Mar. 2015, pp. 181–186.
[5] R. Venkatesan, S. G. Ramasubramanian, S. Venkataramani, K. Roy, and A. Raghunathan, “STAG: Spintronic-tape architecture for GPGPU cache hierarchies,” in Proc. ACM/IEEE 41st Int. Symp. Comput. Archit. (ISCA). Jun. 2014, pp. 253–264.
[6] A. Thiaville, Y. Nakatani, J. Miltat, and Y. Suzuki, “Micromagnetic understanding of current-driven domain wall motion in patterned nanowires,” EPL (Europhys. Lett.), vol. 69, no. 6, p. 990, Feb. 2005.
[7] A. Fert, V. Cros, and J. Sampaio, “Skyrmions on the track,” Nature Nanotechnol., vol. 8, no. 3, pp. 152–156, 2013.
[8] N. Nagaosa and Y. Tokura, “Topological properties and dynamics of magnetic skyrmions,” Nature Nanotechnol., vol. 8, no. 12, pp. 899–911, 2013.
[9] R. Wiesendanger, “Nanoscale magnetic skyrmions in metallic films and multilayers: A new twist for spintronics,” Nature Rev. Mater., vol. 1, no. 7, Jul. 2016, Art. no. 016044.
[10] R. Tomasello, E. Martinez, R. Zivieri, L. Torres, M. Carpentieri, and G. Finocchio, “A strategy for the design of skyrmion racetrack memories,” Sci. Rep., vol. 4, p. 6784, Oct. 2014.
[11] I. Dzyaloshinsky, “A thermodynamic theory of weak ferromagnetism of antiferromagnetics,” J. Phys. Chem. Solids, vol. 4, no. 4, pp. 241–255, 1958.
[12] T. Moriya, “New mechanism of anisotropic superexchange interaction,” Phys. Rev. Lett., vol. 4, no. 5, p. 228, Mar. 1960.
[13] S. Mühlbauer et al., “Skyrmion lattice in a chiral magnet,” Science, vol. 323, no. 5916, pp. 915–919, Feb. 2009.
[14] X. Z. Yu et al., “Real-space observation of a two-dimensional skyrmion crystal,” Nature, vol. 465, no. 7300, pp. 901–904, Jun. 2010.
[15] X. Z. Yu et al., “Near room-temperature formation of a skyrmion crystal in thin-films of the helimagnet FeGe,” Nature Mater., vol. 10, no. 2, pp. 106–109, 2011.
[16] S. Heinze et al., “Spontaneous atomic-scale magnetic skyrmion lattice in two dimensions,” Nature Phys., vol. 7, no. 9, pp. 713–718, Sep. 2011.
[17] F. Chen, Z. Li, W. Kang, W. Zhao, H. Li, and Y. Chen, “Process variation aware data management for magnetic skyrmions racetrack memory,” in Proc. 23rd Asia South Pacific Design Automat. Conf. (ASP-DAC), Jan. 2018, pp. 221–226.
[18] R. Venkatesan, M. Sharad, K. Roy, and A. Raghunathan, “DWM-TAPESTRI-An energy efficient all-spin cache using domain wall shift based writes,” in Proc. Design, Automat. Test Eur. Conf. Exhib. (DATE), Mar. 2013, pp. 1825–1830.
[19] J. Sampaio, V. Cros, S. Rohart, A. Thiaville, and A. Fert, “Nucleation, stability and current-induced motion of isolated magnetic skyrmions in nanostructures,” Nature Nanotechnol., vol. 8, no. 11, pp. 839–844, Nov. 2013.
[20] A. A. Thiele, “Steady-state motion of magnetic domains,” Phys. Rev. Lett., vol. 30, no. 6, p. 230, Feb. 1973.
[21] C. Bienia, S. Kumar, J. P. Singh, and K. Li, “The PARSEC benchmark suite: Characterization and architectural implications,” in Proc. Int. Conf. Parallel Archit. Compilation Techn. (PACT), Oct. 2008, pp. 72–81.
[22] P. Lai et al., “An improved racetrack structure for transporting a skyrmion,” Sci. Rep., vol. 7, Mar. 2017, Art. no. 45330.
[23] H. T. Fook, W. L. Gan, I. Purnama, and W. S. Lew, “Mitigation of magnus force in current-induced skyrmion dynamics,” IEEE Trans. Magn., vol. 51, no. 11, pp. 1–4, Nov. 2015.
[24] G. Zhao, X. Zhang, and F. Morvan, “Theory for the coercivity and its mechanisms in nanostructured permanent magnetic materials,” Rev. Nanoscience Nanotechnol., vol. 4, no. 1, pp. 1–25, Apr. 2015.
[25] C. Reichhardt, D. Ray, and C. J. O. Reichhardt, “Quantized transport for a skyrmion moving on a two-dimensional periodic substrate,” Phys. Rev. B, vol. 91, no. 10, Mar. 2015, Art. no. 104426.
[26] C. Navau, N. Del-Valle, and A. Sanchez, “Interaction of isolated skyrmions with point and linear defects,” J. Magn. Magn. Mater., vol. 465, pp. 709–715, Nov. 2018.
[27] D. Stosic, T. B. Ludermir, and M. V. Miloševic, “Pinning of magnetic skyrmions in a monolayer Co film on Pt(111): Theoretical characterization and exemplified utilization,” Phys. Rev. B, Condens. Matter, vol. 96, no. 21, 2017, Art. no. 214403.
[28] X. Ma, C. J. O. Reichhardt, and C. Reichhardt, “Reversible vector ratchets for skyrmion systems,” Phys. Rev. B, vol. 95, no. 10, Mar. 2017, Art. no. 104401.
[29] C. Reichhardt, D. Ray, and C. J. O. Reichhardt, “Magnus-induced ratchet effects for skyrmions interacting with asymmetric substrates,” New J. Phys., vol. 17, no. 7, Jul. 2015, Art. no. 073034.
[30] C. Reichhardt and C. J. O. Reichhardt, “Noise fluctuations and drive dependence of the skyrmion Hall effect in disordered systems,” New J. Phys., vol. 18, no. 9, Sep. 2016, Art. no. 095005.
[31] C. Hanneken et al., “Electrical detection of magnetic skyrmions by tunnelling non-collinear magnetoresistance,” Nature Nanotechnol., vol. 10, no. 12, pp. 1039–1042, Dec. 2015.
[32] D. Maccariello et al., “Electrical detection of single magnetic skyrmions in metallic multilayers at room temperature,” Nature Nanotechnol., vol. 13, no. 3, pp. 233–237, Mar. 2018.
[33] X. Zhang et al., “Skyrmion-skyrmion and skyrmion-edge repulsions in skyrmion-based racetrack memory,” Sci. Rep., vol. 5, p. 7643, Jan. 2015.
[34] M. Najafi et al., “Proposal for a standard problem for micromagnetic simulations including spin-transfer torque,” J. Appl. Phys., vol. 105, no. 11, Jun. 2009, Art. no. 113914.
[35] A. Vansteenkiste, J. Leliaert, M. Dvornik, M. Helsen, F. Garcia-Sanchez, and B. Van Waeyenberge, “The design and verification of MuMax3,” AIP Adv., vol. 4, no. 10, Oct. 2014, Art. no. 107133.
[36] P. J. Metaxas et al., “Creep and flow regimes of magnetic domain-wall motion in ultrathin Pt/Co/Pt films with perpendicular anisotropy,” Phys. Rev. Lett., vol. 99, no. 21, Nov. 2007, Art. no. 217208.
[37] X. Fong, S. K. Gupta, N. N. Mojumder, S. H. Choday, C. Augustine, and K. Roy, “KNACK: A hybrid spin-charge mixed-mode simulator for evaluating different genres of spin-transfer torque MRAM bit-cells,” in Proc. Int. Conf. Simul. Semicond. Processes Devices, Sep. 2011, pp. 51–54.
[38] L. Liu, T. Moriyama, D. C. Ralph, and R. A. Buhrman, “Spin-torque ferromagnetic resonance induced by the spin hall effect,” Phys. Rev. Lett., vol. 106, no. 3, Jan. 2011, Art. no. 036601.
[39] CACTI. Accessed: Apr. 9, 2019. [Online]. Available: www.hpl.hp.com/research/cacti
[40] N. Binkert et al., “The gem5 simulator,” SIGARCH Comput. Arch. News, vol. 39, no. 2, pp. 1–7, Aug. 2011.
[41] R. Perricone et al., “Advanced spintronic memory and logic for non-volatile processors,” in Proc. Design Automat. Test Eur. Conf. Exhib. (DATE), Mar. 2017, pp. 972–977.
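
As a supplementary illustration of the spin-current relation (4) in Section V, the following minimal Python sketch evaluates the spin current from the charge current, the spin-Hall angle, and the two cross-sectional areas. The function name `spin_current` and every numerical input are hypothetical placeholders chosen only for illustration; they are not parameters reported in the paper or in Table IV.

```python
def spin_current(charge_current_a: float,
                 spin_hall_angle: float,
                 area_mtj_m2: float,
                 area_shm_m2: float) -> float:
    """Return I_s = theta_sh * (A_MTJ / A_SHM) * I_e, following Eq. (4)."""
    if area_mtj_m2 <= 0 or area_shm_m2 <= 0:
        raise ValueError("cross-sectional areas must be positive")
    return spin_hall_angle * (area_mtj_m2 / area_shm_m2) * charge_current_a


# Hypothetical inputs for illustration only (assumptions, not paper values).
example_charge_current_a = 50e-6   # I_e, charge current through the SHM
example_spin_hall_angle = 0.3      # theta_sh, spin-Hall angle
example_area_mtj_m2 = 1.6e-15      # A_MTJ, cross-sectional area of the MTJ
example_area_shm_m2 = 1.6e-16      # A_SHM, cross-sectional area of the SHM

example_spin_current_a = spin_current(
    example_charge_current_a,
    example_spin_hall_angle,
    example_area_mtj_m2,
    example_area_shm_m2,
)
print(f"I_s = {example_spin_current_a:.3e} A")
```

In this relation the ratio of spin current to charge current equals the spin-Hall angle multiplied by the area ratio, so it scales linearly with both quantities.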

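
As a rough check on the shift-timing figures quoted in Section VI-A, the sketch below adds the drive pulse and the relaxation time for the two operating points and derives the mean longitudinal velocity needed to cover the ∼74 nm spacing within one pulse. Treating the per-shift latency as the simple sum of pulse and relaxation is an assumption made here for illustration, and the class and field names are hypothetical; only the currents, pulse widths, relaxation times, and spacing are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShiftOperation:
    label: str
    drive_current_a: float   # drive current quoted in Section VI-A
    pulse_ns: float          # duration of the shift pulse
    relaxation_ns: float     # idle time for relaxing back to the track center

    def total_latency_ns(self) -> float:
        # Assumption: one shift is a drive pulse followed by an idle relaxation phase.
        return self.pulse_ns + self.relaxation_ns

    def mean_velocity_m_per_s(self, spacing_nm: float) -> float:
        # Average longitudinal velocity needed to move one spacing per pulse.
        return (spacing_nm * 1e-9) / (self.pulse_ns * 1e-9)


SPACING_NM = 74.0  # approximate reliable skyrmion spacing from Section VI-A

operations = [
    ShiftOperation("no high-K", 1.44e-5, 1.0, 0.9),
    ShiftOperation("high-K", 5.76e-5, 0.2, 1.3),
]

for op in operations:
    print(f"{op.label}: {op.total_latency_ns():.1f} ns per shift, "
          f"{op.mean_velocity_m_per_s(SPACING_NM):.0f} m/s mean velocity")
```

Under this assumption the high-K operating point completes a shift in about 1.5 ns against about 1.9 ns without high-K material, even though its relaxation phase is longer, which is in line with the additional shift latency reported for the designs without high-K material.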