IEEE COMPUTER ARCHITECTURE LETTERS, VOL. 17, NO. 2, JULY-DECEMBER 2018

Amoeba: An Autonomous Backup and Recovery SSD for Ransomware Attack Defense

Donghyun Min, Donggyu Park, Jinwoo Ahn, Ryan Walker, Junghee Lee, Sungyong Park, and Youngjae Kim

Abstract: Ransomware is a growing concern in enterprise and government organizations, because it may cause financial damages or loss of important data. Although there are techniques to detect and prevent ransomware, an evolved ransomware may evade them because they are based on monitoring known behaviors. Ransomware can be mitigated if backup copies of data are retained in a safe place. However, existing backup solutions may be under ransomware's control and an intelligent ransomware may destroy backup copies too. They also incur overhead in storage space, performance and network traffic (in case of remote backup). In this paper, we propose an SSD system that supports automated backup, called Amoeba. In particular, Amoeba is armed with a hardware accelerator that can detect the infection of pages by ransomware attacks at high speed and a fine-grained backup control mechanism to minimize space overhead for original data backup. For evaluation, we extended the Microsoft SSD simulator to implement Amoeba and evaluated it using realistic block-level traces, which were collected while running actual ransomware. According to our experiments, Amoeba has negligible overhead and outperforms the state-of-the-art SSD, FlashGuard, which supports data backup within the device, in both performance and space efficiency.

Index Terms: Solid-state drive (SSD), storage security, ransomware attack

D. Min, D. Park, J. Ahn, S. Park, and Y. Kim are with Sogang University, Seoul 04107, South Korea. E-mail: {mdh38112, dgpark, jinu37, parksy, youkim}@sogang.ac.kr.
R. Walker and J. Lee are with the University of Texas at San Antonio, San Antonio, TX 78249. E-mail: {ryan.walker, junghee.lee}@utsa.edu.
Manuscript received 1 July 2018; revised 13 Sept. 2018; accepted 5 Oct. 2018. Date of publication 27 Nov. 2018; date of current version 10 Dec. 2018.
(Corresponding author: Youngjae Kim.)
For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/LCA.2018.2883431
1556-6056 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

1 INTRODUCTION

RANSOMWARE is a type of malware that takes user data files hostage by encrypting them until the victim pays a ransom. In June and July 2017, massive ransomware attacks occurred and more than 12,000 computers were attacked [1]. Since ransomware may cause immediate financial damages, there is an urgent need to find an effective way to mitigate it. For example, Atlanta had to spend more than $2.6M to recover from a ransomware attack in 2018 [2].

Existing techniques detect and prevent ransomware by identifying known behaviors of ransomware, such as frequent access to cryptographic libraries and receiving encryption keys from a remote server [3], [4], [5], [6], [7]. However, these techniques can be evaded [3]. The guaranteed solution to ransomware is data backup. A filesystem may retain backup copies and recover them if they are infected by ransomware [8].

NAND flash memory based solid-state drives (SSDs) are a good candidate to implement a data backup mechanism. Because of its out-of-place update, an SSD always retains a previous version of data until an internal Garbage Collection (GC) process sweeps stale data. If an SSD backs up files internally, the backup is transparent to users, and backup copies cannot be destroyed by ransomware or privileged system software.

FlashGuard [9] and SSD-Insider [10] are systems that manage backup data inside the SSD for ransomware attack protection, but each system has limitations in efficiency in detecting ransomware attacks and managing backup data. For example, the limitations of FlashGuard [9] are manifold: (1) it generates many backup pages, which results in performance degradation, because it triggers frequent garbage collection; (2) it requires frequent communication between the host and the SSD, because the host makes the final decision on the ransomware attack; (3) manual investigation is required for recovery; and (4) it does not have a method to control the backup space. SSD-Insider [10] detects ransomware attacks using the overwrite pattern only. However, this system cannot distinguish between a normal overwrite and a ransomware overwrite attack.

In this paper, we propose Amoeba, an autonomous backup and recovery SSD, to solve these problems. Amoeba automatically performs ransomware infection detection, alerts, backup data management and recovery inside the SSD. Unlike FlashGuard [9] and SSD-Insider [10], Amoeba implements data content-based inspection for high-accuracy ransomware detection. Specifically, first, the ransomware detection is accelerated and the communication with the host is minimized by designing a hardware accelerator to calculate the Ransomware Attack Risk Indicator (RARI) for the write request in the SSD. Second, Amoeba supports fine-grained management of backup pages by selecting the backup page according to the probability of ransomware infection for each page. Consequently, Amoeba manages only one backup page for each page, minimizing GC overhead as well as backup space.

2 RANSOMWARE ATTACKS AND DEFENSE

2.1 Internal Operations for SSDs

SSDs can read and write page by page, and erase block by block. In NAND flash, however, the erase operation unit is a block that is composed of several pages. An in-place update would require erasing a whole block, which causes considerable overhead. So, SSDs perform an out-of-place update that writes data to a new empty page and invalidates the existing page to minimize the overhead of the erase. Invalid pages are reclaimed by erasing them during Garbage Collection. To perform the out-of-place update, the SSD tracks each page in one of three states: free, valid, and invalid. A free page is a blank page that can be written. When data is written to a page, its state changes to the valid state. If data is overwritten on the same logical page afterward, the mapped physical page changes to the invalid state and the data is written to a new free page, which is then mapped to the logical page address. A page in the invalid state cannot be used until it is erased by the GC.

2.2 Data Backup Inside an SSD

Ransomware is malware that encrypts user data files and takes them hostage until the victim pays a ransom. Ransomware reads and encrypts the user data, and then overwrites it. Therefore, pages that are infected with ransomware show a typical I/O pattern of Read After Write. The solution to cope with ransomware is to back up the original data in advance and restore it when infected. However, existing methods, which back up and restore data through a file system, require additional space for backup and incur I/O performance overhead to detect the ransomware infection, and they carry the risk of damage to the backup copy by intelligent ransomware attacks. To solve this problem, FlashGuard [9] proposes a mechanism to perform data backup in the SSD rather than the OS. If any disk page shows a pattern of Read After Write, it saves the backup of that page in the device space to solve the overhead issues of the existing file system.

FlashGuard adds a backup state (backup) between the valid and invalid states, considering the feature of an SSD device in which a valid state page is not deleted immediately but is kept in the invalid state for a certain period of time. If the Read After Write pattern, which is suspected to be an access pattern of ransomware, is found, it switches the valid state to the backup state instead of invalid for all overwrites. So, data backup is possible without allocating additional space. However, the Read After Write detection method used by FlashGuard has a false positive issue, which makes a backup copy even for normal I/O. In particular, a backup is generated for every subsequent write once the Read After Write pattern is executed on a specific page. So, the backup space rapidly grows and the space for invalid state pages decreases, even though there is no ransomware infection. It also increases the number of page moves during GC, resulting in greater performance overhead due to GC.

In addition, multiple backup pages of a logical page should be individually checked during the recovery, to find the one that is not infected by the ransomware. Finally, FlashGuard lacks a proper mechanism to control the backup page space when it grows massively in SSDs. FlashGuard merely keeps backup space for a specific period and then deletes it all. However, the speed of the backup page growth in the SSD varies with the usage frequency, so there is a risk that the backup space occupies a whole SSD within a short period of time.

3 AMOEBA: DESIGN AND IMPLEMENTATION

This section describes the backup mechanism of Amoeba. In particular, Amoeba calculates the risk of ransomware attacks for all page write operations. The RARI hardware module is implemented by extending the DMA controller used for internal writes to NAND flash inside the SSD. The RARI module measures the risk of all pages, and the result is used to determine whether to back up the page.

3.1 Ransomware Risk Calculation

Amoeba's RARI uses intensity, similarity, and entropy as indicators to determine the risk of potential ransomware attacks.

The intensity can be easily obtained by counting the number of write requests. It is implemented by the firmware, the Flash Translation Layer (FTL), at a negligible cost. However, to compute similarity and entropy, every byte of incoming data as well as old data needs to be accessed, which may incur excessive overhead if computed by the firmware. Thus, we propose to compute them in a Direct Memory Access (DMA) controller.

An SSD controller usually has an internal DMA to transfer data from a temporary buffer in the main memory to the destination NAND flash memory. When a write request arrives from the host interface, it is temporarily stored in the main memory of the SSD. The data is transferred to the destination NAND flash memory by a DMA controller. While the data is being transferred, similarity and entropy are calculated.

To calculate similarity, the DMA controller needs to access both new and old data. Thus, it needs to issue an additional page read to the NAND flash memory where the old data is stored. The DMA controller reads new data from the main memory and old data from the flash memory, and computes their similarity. At the same time, the byte occurrence counts are incremented according to the new data to compute the entropy. Then, the DMA controller writes the new data to its destination flash memory. After transferring all of the new data, it finalizes similarity and entropy, and reports them to the firmware (FTL). Since similarity and entropy are computed while data is transferred, most of their performance overhead is hidden by parallelizing data transfer and computation.

It is difficult to judge a ransomware attack by considering similarity, entropy, and intensity individually. Amoeba uses a strong ransomware classifier that considers similarity, entropy, and intensity together. The RARI value is obtained by normalizing the three indicators with the MinMaxScaler method and applying logistic classification. The RARI value can be formulated by the following equation:

$$r = \frac{1}{1 + e^{-z}}, \qquad z = \alpha \cdot \mathrm{SIM} + \beta \cdot \mathrm{ENT} + \gamma \cdot \mathrm{INT} + \delta. \tag{1}$$

In Equation (1), $z$ represents the result of the linear equation of the page for the write request. SIM, ENT, and INT denote similarity, entropy, and write intensity. $\alpha$, $\beta$ and $\gamma$ are weights and $\delta$ is a bias. In particular, we obtained all these values using logistic regression from machine learning. $r$ is the RARI value of the write request page, which indicates the possibility of being ransomware. Amoeba computes the RARI value for each page write operation and identifies ransomware attacks.

3.2 Backup and Recovery

Amoeba uses the Out-of-band (OOB) region of a page to implement the backup and recovery mechanism as described in Fig. 1. The OOB region of a page contains the backup page number (BPN) and the RARI value of the page.

Fig. 1. A description of the OOB metadata area to support the backup mechanism. LPN: logical page number, PPN: physical page number, BPN: backup page number.

Ransomware Detection and Backup Operation. Fig. 2 describes the state transitions of Amoeba logical pages using a Non-deterministic Finite Automaton (NFA). The graphical representation of the NFA consists of states (nodes) and inputs for state transitions (edges), which are described in Table 1. When the first page write occurs to a free page, its state changes from OF to OV through the state transition. Next, when the page is in the OV state, normal write requests on that page overwrite the page.

Fig. 2. Non-deterministic Finite Automata (NFA) for Amoeba.

For every overwrite request, the Amoeba backup mechanism first checks if the write request follows the Read-After-Write pattern. Second, it checks whether the RARI value of the current valid page is larger than the defined threshold value. If both conditions are satisfied, Amoeba will regard it as a write request of a ransomware attack ($OW_{ransomware}$). If either of the two conditions is not satisfied, it will be regarded as a normal write request ($OW_{normal}$).

The state transitions of the ransomware overwrite and normal overwrite depend on the state of the page (OV or VB).

$OW_{ransomware}$ or $OW_{normal}$ upon OV State: Suppose that a logical page $\mathrm{LPN}_{(a,ov)}$ has been mapped to a physical page $\mathrm{PPN}_{(b,valid)}$.
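
As an illustration only, the risk calculation of Equation (1) and the two-condition overwrite check of Section 3.2 could be sketched in software as follows. The byte-match similarity metric, the normalization bounds, the weights, the threshold, and all function names are assumptions made for this sketch; the letter does not specify them, and in Amoeba similarity and entropy are computed by the extended DMA controller rather than in software.

```python
import math
from collections import Counter


def shannon_entropy(page: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not page:
        return 0.0
    total = len(page)
    counts = Counter(page)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def byte_similarity(old: bytes, new: bytes) -> float:
    """Fraction of byte positions that are unchanged (assumed metric)."""
    n = min(len(old), len(new))
    if n == 0:
        return 0.0
    same = sum(1 for a, b in zip(old, new) if a == b)
    return same / n


def min_max(value: float, lo: float, hi: float) -> float:
    """Min-max normalization to [0, 1]; lo and hi are assumed training bounds."""
    if hi <= lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def rari(sim: float, ent: float, inten: float,
         alpha: float, beta: float, gamma: float, delta: float) -> float:
    """Equation (1): logistic function of a weighted sum of the indicators."""
    z = alpha * sim + beta * ent + gamma * inten + delta
    if z < -700.0:  # avoid overflow in exp() for extreme inputs
        return 0.0
    return 1.0 / (1.0 + math.exp(-z))


def classify_overwrite(read_after_write: bool, rari_value: float,
                       threshold: float) -> str:
    """Both conditions must hold for an overwrite to count as ransomware."""
    if read_after_write and rari_value > threshold:
        return "OW_ransomware"
    return "OW_normal"
```

In such a sketch, the weights and bias would come from logistic regression on labeled traces, so their signs and magnitudes are not fixed here; the values passed to `rari` should already be normalized with `min_max`.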
TABLE 1
A Description of the State and Input of the NFA in Amoeba

TABLE 2
SSD Simulator Configurations

TABLE 3
Workload Characteristics for Ransomware Attacks

Type           Number             Size
               Num      %         Avg (KB)     Total (MB)   %
pdf            1114     34.69     926.58       1008.02      20.31
html           307      9.56      51.70        15.50        0.31
image files    357      11.12     166.97       58.21        1.17
xls            318      9.90      363.07       112.75       2.27
ppt            125      3.89      1483.33      181.07       3.65
doc            74       2.30      428.70       30.98        0.62
zip files      123      3.83      12292.58     1476.31      29.75
others         793      24.69     4404.65      2080.24      41.92
Total          3211     100       1582.745     4963.08      100

TABLE 4
Results for Ransomware Detection Accuracy and Recovery Cost Comparisons
5 CONCLUSION

This paper proposes Amoeba, an autonomous backup and recovery SSD to defend against ransomware attacks. Specifically, Amoeba embeds a special RARI computation hardware on the DMA module for fast ransomware detection and performs autonomous backup and recovery within the device. The experimental results demonstrate the backup overhead is negligible while minimizing the error of ransomware attacks.

ACKNOWLEDGMENTS

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) (No. NRF-2018R1A1A1A05079398).