Ransomware Detection Using Machine Learning A Revi
This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3397921
ABSTRACT Ransomware attacks are on the rise in terms of both frequency and impact. The shift to remote
work due to the COVID-19 pandemic has led more people to work online, prompting companies to adapt
quickly. Unfortunately, this increased online activity has provided cybercriminals numerous opportunities to
carry out devastating attacks. One recent method employed by malicious actors involves infecting corporate
networks with ransomware to extract millions of dollars in profits. Ransomware is a category
of malware that encrypts sensitive data and demands payment from victims in exchange for the
keys needed to decrypt it. The prevalence of this type of attack has prompted
governments and organisations worldwide to intensify their efforts to combat ransomware. In response, the
research community has also focused on ransomware detection, leveraging technologies such as machine
learning. Despite this increased attention, practical solutions for real-world applications remain scarce in
the existing literature. Numerous surveys have reviewed the literature in this domain. Still, there is a notable
lack of emphasis on the design of ransomware detection systems and the practical aspects of detection,
including real-time and early detection. Against this backdrop, our review delves into the existing literature
on ransomware detection, specifically examining the machine-learning techniques, detection approaches,
and designs employed. Finally, we highlight the limitations of prior studies and propose future research
directions in this crucial area.
INDEX TERMS Ransomware detection, machine learning, deep learning, early detection, real-time
detection, survey
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3397921
FIGURE 1: Number of global news items published about ransomware and the total cryptocurrency received by ransomware
addresses (in millions) [3].
of lockdowns [4]. This forced companies worldwide to shift their workforce online to continue business operations. In response, organisations adapted rapidly, transforming how business is conducted online [5]. As a result, the attack surface available to criminals has grown, creating more avenues for malicious actors to launch attacks.
• The rise of criminal cooperatives: Cybercriminals no longer work in independent silos. Online criminal organisations have started collaborating for economies of scale. Modern malware is often an amalgamation of malicious scripts or executables written by several parties. For example, one party may focus on creating the ransomware executable while another may specialise in breaching the target network and harvesting sensitive information [6]. This has led to business models such as Ransomware-as-a-Service (RaaS)¹, which pays affiliates a commission for distribution.
¹ Similar to the Software-as-a-Service model, the ransomware backend infrastructure is managed by the malicious actors and distributed by affiliates for a commission.
These factors have created the perfect environment for malicious actors to launch devastating ransomware attacks, which are becoming bolder and more frequent, as seen during the attack on Colonial Pipeline Co., the biggest oil pipeline company in the United States (U.S.). On 7 May 2021, the hacking group Darkside breached the company's network via credentials leaked on the dark web. Once inside, the hackers exfiltrated 100 gigabytes of company data and released ransomware across the network before demanding USD 4.4 million to restore files and cease the attack [7]. To mitigate further damage, the company shut down operations for several days, resulting in supply and demand constraints across the East Coast and inflating the gas price to more than $3 per gallon [8]. Despite recommendations against submitting to ransom demands, the company's CEO released USD 4.4 million worth of cryptocurrency to the Darkside group, of which $2.3 million was later recovered by the Department of Justice from crypto wallets used during the exchange.
The Costa Rican government suffered a similar attack in 2022 from the ransomware group Conti, which demanded $20 million to recover critical files after releasing ransomware across various departments. As a result of the attack, international trade within the country ground to a halt, and over 1,500 servers were infected, forcing government staff to revert to pen and paper to complete operations. A national emergency was declared to mitigate further damage, and although the ransom was not paid, businesses suffered losses of up to $125 million within 48 hours [9].
In response to the growing threat, governments worldwide are pursuing legal avenues² to increase security controls in major organisations, as shown in Table 1. As a result, many countries are enforcing mandatory reporting requirements for organisations managing critical infrastructure such as gas pipelines, electricity distributors and hospitals. By design, many of these laws increase attack transparency and encourage victims to respond to security incidents promptly, ultimately reducing the impact on the public. While organisations are advised not to pay the ransom after an attack, governments have generally stopped short of outlawing ransom payments made to retrieve data. The U.S. has led legal efforts to disrupt ransomware groups, such as establishing sanctions against ransomware operators and designating several virtual currency exchanges as complicit in facilitating financial transactions for ransomware actors. These efforts have led to a dramatic decline in revenue generated by ransomware, as shown in Fig. 1.
² https://www.whitehouse.gov/briefing-room/statements-releases/2021/10/14/joint-statement-of-the-ministers-and-representatives-from-the-counter-ransomware-initiative-meeting-october-2021/
Insurance companies offering cyber extortion policies also play an essential role in mitigating the ransomware threat. In most cases, subscribing to these policies requires organisations to adhere to a minimum standard of security controls to reduce the attack surface. This approach, however, can encourage ransomware groups
to demand higher ransoms since insurance companies cover the cost [10].
The research community has increased its focus on ransomware detection using machine learning (ML) to mitigate the rise of ransomware attacks. The earliest known study containing the keywords "ransomware detection" within Scopus³ appeared in 2016. This aligns with the rise in ransomware-related news items published⁴ worldwide due to increased ransomware attacks in 2016, as shown in Fig. 1.
Several surveys have reviewed the ransomware detection domain over this period, as shown in Table 3. These surveys broadly focus on: (1) the features or input data used to train machine learning algorithms [11]–[32]; (2) ransomware behaviour and trends [13], [14], [16]–[22], [25], [27], [28], [31], [32]; (3) detection techniques [11]–[14], [16]–[27], [29]–[32]; (4) the algorithms used to detect ransomware [11]–[13], [15], [16], [18], [19], [25]–[32]; and (5) ransomware prevention strategies [20]. However, despite recent research efforts, several key limitations emerge from existing surveys:
1) Previously presented ransomware trends have become outdated. Ransomware trends have been covered in several surveys [13], [14], [16]–[22], [25], [27], [31], [32]. Still, the trends covered in previous surveys need to be updated due to the evolving nature of ransomware. Razaulla et al. [27] point out the need for more attention to the top-earning ransomware families of the past few years.
2) The design of ransomware detection systems is understudied. No other survey holistically reviews the design approaches of existing ransomware detection systems through a pragmatic lens. Such an approach is important to determine whether the proposals presented throughout the literature are suitable for "real-world" use within enterprise environments. Urooj et al. [31] reviewed some real-time detection systems in the literature and highlighted the lack of practicality of current proposals. However, little explanation is given as to why. Only Or-Meir et al. [24] provided a comprehensive review of the design aspects of malware analysis environments. However, since their study focuses only on analysis techniques, it does not adequately explore ransomware detection using machine learning.
3) Real-time and early detection techniques are neglected. Although both are important facets of practical detection, no other study has examined the mechanics of real-time and early detection approaches in depth or explained what differentiates them. A handful of studies have attempted to cover early and real-time detection in their work. Consequently, the terms "dynamic detection", "early detection", and "real-time detection" have been used interchangeably despite being separate paradigms. An example can be seen in the survey by Moussaileb et al. [23], which groups delayed detection techniques using dynamic analysis under the "real-time" analysis banner. Another example is the survey by A. Alqahtani and F.T. Sheldon [12]. In their study, the authors investigated the techniques to build early detection ML models and discussed their limitations. As in the aforementioned survey, the "real-time" and "early-detection" paradigms are grouped together.
4) There is a lack of clarity around early detection techniques. The importance of detecting ransomware before files are encrypted has been highlighted in previous surveys [20]. Despite this, most surveys simplistically view early detection as the discovery of ransomware at any time before files are encrypted. As a result, ransomware researchers have little guidance on when and where features can be collected during the ransomware attack lifecycle to detect ransomware early. A. Alqahtani and F.T. Sheldon [12] discuss early detection and its limitations in their survey. The authors highlight the challenges of collecting features in the early phases of a ransomware attack but recommend exploring cryptographic API calls as a solution, even though such calls occur very late in the attack lifecycle. Al-rimy et al. [14] and Urooj et al. [31] highlight early ransomware detection studies throughout the literature and provide recommendations accordingly, without distinguishing between different stages of the attack lifecycle. The most comprehensive survey on early-detection techniques was conducted by Moussaileb et al. [23]. In their survey, the authors group feature types by ransomware attack phase. However, the authors neglect ML-based detection approaches during the delivery phase of an attack, failing to capitalise on early detection techniques before the ransomware reaches the host.
Given this context, our survey examines ML-based ransomware detection systems in the existing literature. The goal is to improve the feasibility of forthcoming proposals. To achieve this objective, we systematically address the previously mentioned limitations in this survey. We take a comprehensive approach by scrutinising ransomware detection systems and exploring machine learning mechanisms, detection approaches, and architecture. Additionally, we identify crucial limitations and provide recommendations for guiding future research.
A. CONTRIBUTIONS
To address the gaps in previous surveys and direct research efforts toward improving the practicality of ransomware detection systems, we offer the following contributions:
1) Review of the latest ransomware families, trends and government responses: We review the most pervasive ransomware families seen worldwide during recent attacks in Section III-B. We also comprehensively review the latest trends and the tactics, techniques and procedures (TTPs) observed in recent ransomware attacks.
³ https://www.scopus.com/
⁴ https://www.dowjones.com/products/factiva
TABLE 1: Various Acts and Bills introduced by governments worldwide to tackle the growing threat of ransomware. † A Bill or Act was proposed but did not proceed.
Country | Document | Description
Australia | Security of Critical Infrastructure Act 2018 | Mandatory reporting requirements for critical infrastructure
Australia | Crimes Legislation Amendment (Ransomware Action Plan) Bill 2022 † | Describes offences involving extortion of data, handling of unauthorised data, and seizing of digital assets. Penalties range from 25 years' imprisonment for attacks against critical infrastructure to 10 years for cyber extortion and 10 years for producing, supplying or obtaining data under an arrangement for payment
Australia | Ransomware Payments Bill 2021 † | Mandatory reporting requirements for entities that make ransomware payments
Brazil | Law 14.155 of the Brazilian Penal Code | Stricter laws for unauthorised tampering with data, with up to 5 years' imprisonment for electronic damage
Canada | Bill C-26 | Amendments to the Telecommunications Act, the establishment of cyber security programs, and reporting of cyber incidents
European Union | Network and Information Security Directive (NIS2 Directive) | An update to the NIS directive to improve supply chain risk management, strengthen reporting requirements and enforce stricter requirements
European Union | General Data Protection Regulation (GDPR) | Mandatory controls to protect sensitive data
India | Information Technology Act of 2000 (Section 70B) | Mandatory reporting of cyber incidents within 6 hours
Saudi Arabia | Personal Data Protection Law (PDPL) | Laws designed to protect and regulate the collection, processing, disclosure, or retention of personal data
U.K. | The Network and Information Systems Regulations 2018 | Mandatory reporting for operators of essential services; fines of up to 17 million pounds for not establishing effective cyber security measures
U.S. | Cyber Incident Reporting Act of 2021 | Mandatory reporting of ransomware payments made by critical infrastructure operators (within 24 hours for companies with more than 50 employees)
U.S. | Cyber Incident Notification Act of 2021 | Mandatory reporting of cyber intrusions within 24 hours for covered entities
U.S. | Sanction and Stop Ransomware Act of 2021 | Designation of state sponsors, imposition of sanctions, mandatory reporting requirements for ransomware attacks and payments, and regulation of cryptocurrency exchanges
U.S. | Ransomware and Financial Stability Act of 2021 | Forbids financial institutions from making ransomware payments without federal approval
U.S. | Ransom Disclosure Act | Mandatory reporting of ransomware payments to the Department of Homeland Security within 48 hours of paying a ransom
cal. As a result, we consult several grey literature data sources to collate information about the latest tactics, techniques and procedures of ransomware observed in the wild. Specifically, these sources included 1) the European Union Agency for Cybersecurity, 2) the Cybersecurity and Infrastructure Security Agency, and 3) the Australian Signals Directorate.
To assemble a collection of ransomware detection studies, we searched several reputable databases to gather appropriate literature. These included: 1) ACM Digital Library; 2) IEEE Xplore; 3) Google Scholar; 4) ScienceDirect; 5) Scopus; 6) Springer; 7) Taylor & Francis; 8) Wiley Online Library.
B. SEARCH KEYWORDS
Search terms were chosen to locate articles on ransomware detection using machine learning. Consequently, we carefully selected the following terms to locate relevant literature within academic archives: 1) 'ransomware'; 2) 'detection'; 3) 'machine learning'; 4) 'deep learning'. While ransomware is a subtype of malware, we limit our search to 'ransomware' only to align with the scope of the survey.
C. SELECTION CRITERIA
We adhered to the methodology illustrated in Figure 2 for the literature review. Given the specific focus on Windows-based ransomware, articles about solutions for IoT and mobile platforms, such as Android, were intentionally excluded. Additionally, to streamline the investigation, articles that did not align with the inclusion criteria outlined in Table 2 were excluded. These criteria encompassed factors such as peer-reviewed status, language (English), and direct relevance to ransomware detection. Through this filtration process, the initial corpus of 497 articles was narrowed down to 75.
FIGURE 4: Literature included in this survey, segregated by year
TABLE 2: A summary of the inclusion and exclusion criteria used to select studies for the survey
# | Inclusions | Exclusions
1 | The study focuses on ransomware detection using machine learning or deep learning within Windows systems | The study is based on IoT systems or ransomware on mobile platforms, or is not Windows based
2 | The article was peer-reviewed and published in a reputable journal or conference | The article was not published in a reputable journal or conference
3 | The articles are published surveys or research papers | Literature published as news articles or magazines
4 | Articles published within the last 5 years | Articles older than 5 years
5 | The study was published in English | Studies not written in English
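The five criteria in Table 2 act as a conjunction: a candidate article is kept only if it passes every row. The minimal sketch below expresses that screening step in Python, assuming a hypothetical `Record` type whose fields (`platform`, `venue_type`, `article_type`, and so on) are introduced purely for illustration; the survey does not publish its screening as code, and the reference year that anchors the 5-year window is an assumption passed in as a parameter. The hypothetical `count_by_year` helper produces the kind of per-year tally plotted in Fig. 4.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record; the field names are illustrative only."""
    title: str
    year: int
    language: str        # e.g. "English"
    peer_reviewed: bool
    venue_type: str      # e.g. "journal", "conference", "news", "magazine"
    article_type: str    # e.g. "survey", "research"
    platform: str        # e.g. "windows", "android", "iot"
    uses_ml: bool        # applies machine learning or deep learning
    topic: str           # e.g. "ransomware detection"


def meets_criteria(r: Record, reference_year: int, window_years: int = 5) -> bool:
    """Return True only if the record satisfies all five Table 2 inclusion criteria."""
    return all([
        # 1) ransomware detection with ML/DL on Windows (IoT and mobile excluded)
        r.topic == "ransomware detection" and r.uses_ml and r.platform == "windows",
        # 2) peer-reviewed journal article or conference paper
        r.peer_reviewed and r.venue_type in {"journal", "conference"},
        # 3) a survey or research paper rather than news or magazine content
        r.article_type in {"survey", "research"},
        # 4) published within `window_years` years before the (assumed) reference year
        0 <= reference_year - r.year <= window_years,
        # 5) written in English
        r.language == "English",
    ])


def screen(records, reference_year, window_years=5):
    """Keep the records that pass every criterion, preserving input order."""
    return [r for r in records if meets_criteria(r, reference_year, window_years)]


def count_by_year(records):
    """Tally included records per publication year (the view plotted in Fig. 4)."""
    return dict(Counter(r.year for r in records))


if __name__ == "__main__":
    # Hypothetical records, not drawn from the survey's corpus.
    candidates = [
        Record("A", 2023, "English", True, "journal", "research", "windows", True, "ransomware detection"),
        Record("B", 2022, "English", True, "conference", "survey", "android", True, "ransomware detection"),
        Record("C", 2015, "English", True, "journal", "research", "windows", True, "ransomware detection"),
        Record("D", 2021, "English", False, "news", "research", "windows", True, "ransomware detection"),
        Record("E", 2020, "German", True, "journal", "research", "windows", True, "ransomware detection"),
        Record("F", 2021, "English", True, "conference", "research", "windows", True, "ransomware detection"),
    ]
    kept = screen(candidates, reference_year=2023)
    print([r.title for r in kept])   # ['A', 'F']
    print(count_by_year(kept))       # {2023: 1, 2021: 1}
```

In practice, fields such as `peer_reviewed` or `uses_ml` would be judged by reading each paper rather than read from database metadata, so the sketch only illustrates the logic of the filter, not how the 497 candidate articles were actually assessed.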
D. RESULTS
Fig. 3 shows the articles grouped by publisher. Together, ACM, Hindawi, IOS, MDPI, PeerJ and Wiley account for 23% (17) of the articles. 17% (13) of articles originated from Springer, from journals such as Applied Intelligence and Cybersecurity. 29% (22) of articles came from ScienceDirect, originating from various journals such as Computers & Security, Cluster Computing and Expert Systems with Applications. The largest share came from IEEE, representing 31% (23) of articles from a range of journals such as IEEE Access and IEEE Transactions on Computing. Since 2018, the number of articles related to
ransomware detection has grown steadily, as shown in Fig. 4. The largest annual share, 28% (21) of articles, was published in 2023, indicating steady growth of the domain.
III. UNDERSTANDING RANSOMWARE
This section examines ransomware in depth. We define ransomware, highlight recent ransomware attacks and discuss observed ransomware behaviour.
A. RANSOMWARE DEFINED
Ransomware has been described throughout the literature as a type of malware designed to extort data and hold it for ransom [33], or as malware that extorts users by locking them out of computer resources and demanding payment to restore access [16]. Overall, ransomware enables cybercriminals to profit from the victim's fear of losing or disclosing sensitive data. Two main types of ransomware facilitate this kind of activity, namely:
1) Crypto-ransomware
Most reported ransomware attacks involve crypto-ransomware [2]. Crypto-ransomware (short for cryptographic ransomware) uses cryptography to encrypt the victim's files. Unlike most other types of malware, which operate in stealth, crypto-ransomware encrypts as many files as possible as quickly as possible. Early crypto-ransomware used symmetric encryption keys to encrypt files [34]; however, recent ransomware also uses asymmetric encryption, typically to protect the symmetric file keys, so the private key needed for recovery stays with the attacker rather than on the victim's machine, which increases the chance of payment. Once the target files are encrypted, the victim is instructed to pay the malicious actor (in cryptocurrency) for the decryption key to recover the files. Researchers have increasingly turned their attention to ransomware detection to address this threat. Most of these studies focus on crypto-ransomware, since it is the type most commonly used by criminal groups.
2) Locker-ransomware
Locker-ransomware, otherwise known as screen-locker ransomware, locks users out of system resources by hijacking system operations such as input devices and applications [14]. Unlike crypto-ransomware, which encrypts user files, locker-ransomware often forces the victim to stay on the ransomware screen while leaving the underlying files intact. Removing locker-ransomware usually means disabling the malicious process or removing the malicious application. For this reason, it is a less effective means of extortion than crypto-ransomware and, hence, less commonly used by cybercriminals.
B. RANSOMWARE FAMILIES USED IN RECENT ATTACKS
This section outlines several prominent ransomware families recently observed in the wild, selected based on their global impact, and provides high-level information about their attacks and techniques.
• Rhysida (2023): Rhysida is a newly identified ransomware strain actively impacting various sectors, including education, healthcare, manufacturing, information technology, and government departments. The actors behind Rhysida specifically target entities with weak authentication controls, using remote services such as virtual private networks (VPNs) to connect externally. Employing "living off the land" techniques for stealth, they use RDP for lateral movement and
PowerShell. Encryption combines a 4096-bit RSA key with the ChaCha20 algorithm, and encrypted files are marked with a .rhysida extension. Rhysida employs a "double extortion" strategy, as explained in Section III-C5, threatening to expose sensitive data unless the ransom is paid, in line with common ransomware practices [35].
• ALPHV Blackcat (2023): Whilst ALPHV Blackcat has been in the wild for a few years, in February 2023 the group announced the release of a new version called "ALPHV Blackcat Ransomware 2.0 Sphynx update". ALPHV Blackcat affiliates infiltrate victim networks, employing remote access tools such as AnyDesk, Mega sync, and Splashtop for data exfiltration. They use legitimate tools like Plink and Ngrok for network access, with Brute Ratel C4 and Cobalt Strike serving as beacons for command and control. Evilginx2 facilitates adversary-in-the-middle attacks to acquire multifactor authentication credentials. To evade detection, they leverage allowlisted applications like Metasploit and clear logs on the Exchange server. Data is transferred through Mega.nz or Dropbox, and the ransomware is then activated, accompanied by an embedded ransom note [36]. Since its inception, ALPHV Blackcat has siphoned over $300 million in ransom payments.
• Lockbit 3.0 (2022): Lockbit stands out as one of the most widespread ransomware strains, functioning under the Ransomware-as-a-Service model. The original Lockbit (version 1.0) surfaced in 2019 and has since gone through several iterations. Affiliates distributing Lockbit ransomware earn a substantial share, between 60% and 75% of the illicit profits. Notably, Lockbit has been linked to 16% of all ransomware attacks in the U.S. [37]. The Lockbit group frequently exploits zero-day vulnerabilities for initial access, exemplified by vulnerabilities like Log4Shell and Remote Desktop Services Remote Code Execution. Mimikatz is employed for privilege escalation, while TightVNC facilitates connections with external Command and Control centres, showcasing typical ransomware tactics. Lockbit incorporates advanced evasion techniques, including disabling antivirus software, log removal, and self-deletion. This ransomware strain has been involved in high-profile attacks, such as the Accenture incident, where hackers breached the system and demanded a $50 million ransom [38]. The latest iteration, Lockbit 3.0, introduces bug bounty programs to enhance its software [39].
• BlackByte (2021): A Ransomware-as-a-Service (RaaS) variant that is delivered by phishing or by exploiting software vulnerabilities to gain access, such as the unpatched ProxyShell vulnerabilities in Microsoft Exchange Servers [40]. It evades detection by "living off the land", using legitimate software such as certutil to download additional components and the Remote Desktop Protocol (RDP) to establish remote connections. It attacks corporate and critical infrastructure sectors such as government, financial, and food and agriculture facilities.
• AvosLocker (2021): Operates under the RaaS model and targets Linux and virtual machines. It often uses well-known tools such as AnyDesk to establish connections, Mimikatz to exploit authentication protocols, and Cobalt Strike to deliver malicious payloads [41]. It appends the ".avos" extension to encrypted files, then directs victims to a ransom note called "GET_YOUR_FILES_BACK.txt" created in every directory. AvosLocker is notable for employing affiliates to exploit Microsoft Exchange Server vulnerabilities such as ProxyShell for ransomware delivery. The malicious actors associated with AvosLocker have a track record of engaging victims directly for ransom negotiations. Moreover, they
resort to Distributed Denial of Service (DDoS) attacks to pressure victims into compliance.
• Hive (2021): Gains access through single-factor logins via RDP, virtual private networks (VPNs) or other remote network protocols. Once in, the malicious actors exploit multiple Microsoft Exchange vulnerabilities and load malicious backdoor scripts (known as webshells) that allow the attackers to execute malicious PowerShell scripts. Hive disables well-known antivirus (AV) products and uses RDP and Windows Management Instrumentation (WMI) to move laterally. The malicious actors leave a ransom note, "HOW_TO_DECRYPT.txt", in every directory and publish data on an Onion site if the victim does not pay [42]. Hive is responsible for large-scale attacks such as the attack against the Costa Rican Government [43] and the disruption of three hospitals in Ohio and West Virginia, which forced staff to use paper charts [44].
• Darkside (2020): Known for its stealthy techniques in the initial stages of an attack and for exploiting compromised contractor accounts and servers to access external servers. It initially establishes Command and Control connections using the Remote Desktop Protocol (RDP) over HTTPS, routed through the TOR network [45], which masks the traffic to evade detection. Darkside uses well-known tools, such as Mimikatz and psexec, to harvest credentials. Malicious payloads are not deployed until later stages in the attack lifecycle, and execution ceases if the malware detects that it is being debugged or run in a virtual machine (VM) [46]. It uses process injection to evade detection and, after successful encryption, deletes all tools to remove traces. It gained notoriety for the Colonial Pipeline attack, coercing the company to pay a USD 5 million ransom [47] and pushing the price of gas above $3 per gallon. The group behind the attacks is known to launch highly customised attacks and is ruthless in its attacks against hospitals, schools and governments [48]. After the attack, the group behind the malware is thought to have disbanded (or rebranded) following pressure from the FBI.
• Clop (2019): Inflicting financial damages exceeding $500 million, the group behind Clop recently faced intervention from the Ukrainian police, resulting in the arrest of gang members involved in laundering illicit funds acquired through their attacks [49]. This hacking group is notorious for exploiting zero-day vulnerabilities to breach remote networks, exemplified by the pre-authentication command injection vulnerability in the GoAnywhere MFT platform [50]. Upon establishing connections with external Command and Control servers (C&Cs), Clop downloads widely recognised hacking tools such as Cobalt Strike and Truebot [51]. Clop employs AES, RSA, and RC4 encryption techniques, appending the ".clop" extension to encrypted files. Like other ransomware strains, Clop scans running processes to identify security software, disabling or uninstalling it [52]. Notably, Clop checks the keyboard language as a precaution to avoid installation on systems configured with specific languages.
• REvil (2019): Also known as Sodinokibi, REvil is one of the most prolific ransomware strains [53]. Operating under a Ransomware-as-a-Service model, REvil introduced the double extortion model. It was responsible for the infamous Kaseya attack, in which the malicious actors demanded $70 million. OFAC has designated sanctions against two perpetrators for their role in the Kaseya attack [54]. The pair received over USD 200 million in ransom payments in Bitcoin [2]. Since REvil operates as a RaaS offering, affiliates have used various mediums to distribute the malicious payload, such as phishing emails, compromised RDP sessions and software vulnerabilities, as seen in the Kaseya attack [55]. Unlike many other ransomware samples, REvil uses elliptic-curve Diffie-Hellman (ECDH) instead of RSA; ECDH is faster by comparison. REvil uses advanced evasion techniques such as process injection, starting in safe mode and disabling security tools. It is known to detect the default system language and terminate itself if the language is found on a list [56].
• Conti (2019): One of the top three ransomware groups, Conti also uses the RaaS model and double extortion. Conti also sells access to organisations if ransoms are not paid [57]. It is known for its attack against the Costa Rican Government, which forced the country's president to declare a state of emergency [9]. The Conti ransomware encrypts its payload to evade detection by security software and uses DLL injection to load malicious payloads into memory. Once established on a target server, it spreads across the network using Server Message Block (SMB) [58].
C. RANSOMWARE ATTACK STAGES AND TRENDS
Understanding ransomware behaviour involves dissecting the attack lifecycle into various stages, but achieving consensus on how these stages are defined remains a challenge. Some studies take a holistic approach, categorising ransomware attack stages into reconnaissance, distribution, execution, encryption, and extortion [48]. In contrast, others focus solely on post-infection steps [16], ignoring activities preceding malware execution, such as delivery to the target system. Certain studies adopt established attack frameworks such as the Lockheed Martin Cyber Kill Chain [59], which outlines phases such as reconnaissance, weaponisation, delivery, exploitation, installation, command and control, and actions on objectives. The reconnaissance phase, which is challenging to detect and often leaves the intended target uncertain, is omitted in some models [60]; such a model is shown in Fig. 6.
For a successful ransomware attack, malicious files must be delivered to the target machine through phishing, spam, drive-by-downloads, exploits, or social engineering [34]. In the case of crypto-ransomware, after execution, the malware paves the way for file encryption, employing evasion techniques to thwart security software as explained further in sec-
8 VOLUME 4, 2016
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3397921
3) Evasion techniques
FIGURE 6: Stages of a ransomware attack [60]
Modern ransomware employs sophisticated techniques to
avoid detection. In the past, early malware utilised obfus-
tion III-C3. Once a foothold is established, the ransomware cation methods like polymorphism and code encryption to
enters a discovery phase, identifying files to encrypt and hide ransomware files. Still, dynamic detection methods dis-
gathering crucial information about the target environment cussed in the literature have proven effective against these
[14], [60]. External servers are often contacted for addi- techniques [62]. In the initial stages of an attack, ransomware
tional modules or file transfers. The ransomware achieves commonly uses evasion techniques such as return-oriented
its primary objective by encrypting sensitive data following programming (ROP) to conceal the malicious file within
these steps. The victim receives instructions for file recov- benign processes [63] or DLL side-loading to inject the
ery through payment, conveyed through a ransom note or malicious payload into a benign process [64]. Additionally,
changes to desktop backgrounds. This overview sets the stage ransomware may utilise fileless malware, such as PowerShell
for discussing key trends observed in recent ransomware scripts, to disable security software and download the ran-
attacks. somware payload [65]. Because fileless malware operates
directly in the system’s memory, it often goes undetected by
1) Ransomware-as-a-Service security software [66]. More recently, ransomware has been
The profitability of ransomware has increased with criminal observed restarting infected systems in safe mode to evade
groups adopting a Ransomware-as-a-Service (RaaS) business detection by security software that may not load during this
model [14]. Similar to Software-as-a-Service, RaaS involves mode [67].
selling or renting ransomware capabilities to affiliates in
exchange for a commission [6]. Ransomware groups em- 4) Encryption
ulate legitimate software companies, even offering product Arguably, one of the most significant functions of ran-
improvement or bug discovery rewards [39]. Group leaders somware lies in its advanced encryption methods. In the
typically maintain strict control over their software and pay- early stages, ransomware utilised symmetric encryption to
ment infrastructure, providing affiliates with commissions of encrypt files rapidly. However, a key issue with symmetric
up to 60-75% in cryptocurrencies [61]. This model allows encryption was using the same key for encryption and de-
ransomware developers to concentrate on their core product cryption. If the encryption key was discovered during system
while aspects like money laundering or reconnaissance are forensics, the extortion could be thwarted [68]. This led to
outsourced to third parties. Despite its advantages, govern- the widespread adoption of asymmetric encryption, where
ment agencies are taking notice, sanctioning well-known different keys are used for encryption and decryption. In this
groups like REvil, responsible for extorting over USD 200 method, the ransomware encrypts files using the public key
million [2]. while holding the private key, which is then sold back to
the victim for file decryption. Although effective, asymmetric
2) Modular nature of ransomware encryption is notably slower than its symmetric counterpart.
Many ransomware detection studies in the literature primar- Since ransomware aims to encrypt as many files as possible,
ily focused on identifying the actual executable responsi- prolonging the encryption process diminishes the attack’s
ble for encrypting files. Consequently, these studies often effectiveness. Recently, ransomware has embraced a hybrid
train and test machine learning models using samples from approach [69], encrypting files with a symmetric algorithm
open repositories like VirusTotal 5 . However, in real-world like AES and employing an asymmetric encryption algorithm
scenarios, executing the actual ransomware executable, re- such as RSA to encrypt the symmetric key. This encrypted
sponsible for file encryption, typically occurs later in the key is then sent to the criminals for sale back to the victim
[70]. A recent innovative approach involves partial encryp-
5 https://www.virustotal.com/ tion of files, exemplified by LockBit. LockBit, which claimed
the fastest encryption time among ransomware, achieved this by encrypting only the first 4 KB of each file [71].

5) Double extortion
Double extortion is a tactic growing in popularity among ransomware groups. In practice, the attacker exfiltrates the victim's data and threatens to sell, auction, or publish it on third-party sites. Since ransomware already establishes external connections with C&C servers, adding data exfiltration as a feature is straightforward. The Rhysida ransomware family was recently observed using this tactic to establish another source of income from the same victim [35]. Data breaches are often reported to government authorities and can incur heavy penalties. The threat of leaking sensitive data to the internet is therefore used as leverage to obtain the requested ransom. However, paying the ransom is no guarantee that the data will not be published.

IV. RANSOMWARE DETECTION USING MACHINE LEARNING
Early malware detection relied on file signatures to identify malicious files. Signature-based detection entails comparing file information, such as the MD5 hash, with a database of known malicious file hashes [72]. Antivirus software widely adopted this method for its high accuracy and ease of implementation. However, it struggled to keep databases updated with new ransomware signatures, and detection could be evaded simply by altering a few lines of code within the file, which changes its hash. In response, researchers turned to behaviour-based detection to identify malicious software. Unlike signature-based detection, behaviour-based detection seeks to uncover a file's intended malicious actions by analysing it either at rest (static detection) or during execution (dynamic detection). Machine learning automates this analysis by training models on the behaviour of known benign and malicious files and using them to classify new files. Although ransomware emerged in 1989 with the AIDS Trojan [73], ransomware detection did not gain prominence until 2016 [22]. By then, machine learning had become a well-explored research area for malware detection. Consequently, machine learning has been a recurrent theme in ransomware detection studies. Early studies relied heavily on traditional ML techniques, while more recent research increasingly adopts deep learning approaches. The taxonomy depicted in Fig. 7 outlines the detection phases, machine learning techniques, and system design approaches used in ransomware detection with machine learning, and serves as a summary for the subsequent sections. A comprehensive overview of studies from the literature is presented in Table 4.

A. RANSOMWARE DETECTION PHASES
In this section, we examine the opportunities for early detection within the ransomware attack lifecycle. To do so, we align studies with the Cyber-Kill-Chain (CKC) [60] to pinpoint the earliest stage at which ransomware could be detected, as shown in Fig. 6. Although "early detection" is a frequently used term, most ransomware detection proposals examined in the literature use it to mean identifying ransomware before it encrypts sensitive data. The following subsections show that researchers have observed ransomware behaviour during its delivery, its installation, and its communication with external servers, and even after ransom payments have been made.

1) Detection during delivery
Numerous studies have proposed solutions for detecting ransomware over the network before it reaches the host. This aligns with the "delivery phase" in the Cyber-Kill-Chain, which covers ransomware transmission from the source to the intended target. As noted above, "early detection" commonly refers to identifying ransomware before it encrypts sensitive data. Yet most studies claiming early detection extract features at the host level, specifically during the installation of ransomware. Given that the ransomware binary is typically deployed in the final stages of an attack, this approach is inherently riskier, leaving only a small window for detection. A different strategy is to detect ransomware activity before or during transmission, before its payload runs on the host. Several authors have demonstrated the success of this approach [74]–[79].
Liu and Patras [74] identify ransomware while it seeks new victims on the network via Server Message Block (SMB) protocol requests on port 445. Their method effectively countered WannaCry ransomware, which exploited the EternalBlue vulnerability to access other systems on the network. Using Bi-ALSTM, the authors detect attack patterns before the network is compromised, achieving a 99.97% detection rate.
Berrueta et al. [75] employ a similar strategy, monitoring the communication between clients and file servers using a network probe. This probe captures and analyses file-sharing traffic, including SMB and NFS traffic. In their study, the authors extract features from network traffic, such as file reads, writes, deletes, and renames. These features are used to train and test various algorithms, including decision trees, tree ensembles, and neural networks, detecting ransomware activity with an accuracy of 99.9%. The advantage of this approach is that the probe analysing the traffic is not directly exposed to the ransomware, as it operates off-path. However, this approach is vulnerable to fileless ransomware.
Maimó et al. [76] have developed a ransomware detection system specifically designed for integrated clinical environments (ICE). Because certain clinical data, like some ransomware traffic, travels across the network in encrypted form, examining individual network packets is challenging. To overcome this, the authors monitor network flows instead of scrutinising each packet separately. Their study uses a sliding window technique to observe network traffic for anomalies. The features captured include TCP/UDP network features such as source IP, destination
IP, and destination port. The study uses the One-Class Support Vector Machine (OC-SVM) anomaly detection algorithm to pinpoint unusual traffic and the Random Forest algorithm to classify the identified traffic patterns as malicious or benign.
An alternative to detecting ransomware behaviour during transit is post-delivery detection [80], that is, detection after the file has reached its destination but before it has been executed. This often involves extracting static features of the file, such as printable strings and opcodes, and classifying these files before installation. However, ransomware can easily evade this approach using obfuscation techniques such as polymorphism and encryption [22].
While detecting ransomware during its delivery has proven effective, it comes with various challenges, including: (1) obfuscation, encrypted payloads, or encrypted tunnels often hinder detection; (2) finding features without the host being infected is challenging; and (3) packet inspection is an expensive and difficult task.

2) Detection while ransomware is installing or running
The prevalent method for ransomware detection involves monitoring its activities while it is running or post-execution, aligning with the installation phase of the Cyber-Kill-Chain. This period spans from the execution of the malicious binary to the encryption of sensitive data and the display of the ransom note. In the existing literature, detection during this phase is triggered by: (1) the initiation or execution of an application or process; (2) observed modifications to files; or (3) the identification of system-wide anomalies, such as changes in CPU activity. Consequently, most studies derive features from the host, incorporating data such as system calls (syscalls) or Windows Application Programming Interface (API) calls. To classify malicious files, studies employ techniques such as monitoring the execution cycles of ransomware or transferring files to secure virtual machines (VMs) for detonation and subsequent analysis.
Syscalls are effective in capturing suspicious behaviour immediately after a malicious file is launched. Ransomware must interact with the executive layer of the operating system (OS) to effect system-wide changes, such as deleting shadow copies. A common approach involves extracting syscalls at set time intervals and then modelling the behaviour based on this data [81]. Studies have demonstrated accurate results in modelling ransomware behaviour using syscalls. However, extracting syscalls is more intricate than capturing API calls and requires a kernel driver for real-time ransomware detection. Despite this complexity, syscalls offer practical features for ransomware detection because: (1) monitoring standard API calls is resource-intensive; (2) API calls provide limited telemetry at the user level; and (3) syscalls offer visibility into rootkits and low-level malware activities, such as the deletion of
backup files [82].
An alternative strategy involves deploying decoy files to identify the presence of ransomware on a host. Shaukat and Ribeiro [83] employed this approach by monitoring for changes to decoy files. This method is accurate because legitimate software has no reason to modify a decoy file, so tampering with one is strong circumstantial evidence of ransomware. However, detection typically occurs only after files have been encrypted, leaving a limited time window for ransomware removal. Another approach involves monitoring system logs for malicious behaviour. Some methods require a trigger for detection, such as the launch of an application. In contrast, holistic system tracking can identify malware even when advanced evasion techniques, such as process injection or hiding behind multiple processes, are employed. Roy and Chen [84] monitor for anomalies using the Windows Logging Service (WLS) to extract Event ID sequences. Once extracted, these sequences are sent to a centralised server on the network, which employs BiLSTM-CRF for classification.
Detecting ransomware during the installation phase of the attack lifecycle has proven effective, given the diverse range of features available for detecting malicious behaviour.

3) Detection while ransomware is communicating with an external C&C
An additional method for ransomware detection involves monitoring communication with an external server, commonly called a Command and Control (C&C) centre. Once persistence is established on a target machine, ransomware frequently communicates with an external server to exchange private encryption keys, download supplementary modules, or exfiltrate sensitive data. To circumvent network-level blocking of external domain names, ransomware often incorporates domain generation algorithms (DGAs). DGAs let the ransomware reach an external server without relying on hardcoded IP addresses, which complicates forensic efforts.
Almashhadani et al. [85] applied machine learning to identify suspicious communications within a network and to detect domain names that resemble those generated by DGAs. The authors used 16 semantic features, including the number of vowels and the entropy of suspicious domain names. They employed KNN for classification and achieved a detection accuracy of 98%. Despite this success, the study acknowledges potential evasion techniques, such as using shortened URLs or altering malicious domain names to reduce their entropy. Most ransomware communicates with an external server, and many studies assume that such communication will occur. Consequently, if ransomware does not communicate with an external C&C, detection during the C&C phase is futile. Encryption poses an additional challenge during this phase, as many malware samples connect through encrypted tunnels or onion networks such as TOR, making detection more difficult.

4) Detection after the ransom has been paid
An emerging area of research involves detecting ransomware activity even after the ransom has been paid. Some studies leverage the transparency of cryptocurrency flows, using machine learning to identify transactions that carry ransom payments to malicious actors. While this method does not prevent ransomware from executing or encrypting files, it provides insight into the extent of infection caused by a particular ransomware strain. This approach is beneficial in threat intelligence scenarios, helping to gauge any uptick in ransomware transactions.
Al-Haija and Alsulami [86] use the Bitcoin network for ransomware activity detection. Their study uses a dataset of around 3 million transactions, comprising 41,413 ransomware transactions and 2.9 million benign transactions. They use 10 features, including Bitcoin address, year, count, and income, to train and test supervised machine learning models such as shallow neural networks (SNN) and optimisable decision trees (ODT). The authors achieve 99.9% accuracy in classifying transactions as benign or malicious and 99.4% accuracy in classifying ransomware into their respective families.
Alsaif [87] adopts a similar approach, extracting 18 payment-related features, such as Bitcoin address, transaction day, transaction amount, and timestamp, from the same dataset. The author then employs supervised machine learning models such as Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost) to classify ransomware transactions, achieving 99.08% accuracy.
Overall, leveraging the Bitcoin network has proven successful in detecting ransomware transactions. This method's utility extends to threat detection, and if integrated into Bitcoin miners, it could add value by flagging suspicious transactions early on.

B. MACHINE LEARNING TECHNIQUES
In this section, we examine the various machine learning (ML) techniques employed in ransomware detection, as documented in the literature. Our objective is to introduce a taxonomy for categorising ransomware detection systems by three criteria: the detection phase, the ML techniques used, and the design approach employed in their construction. We also provide an overview of datasets frequently used in the literature to help guide future research efforts.

1) Features used to detect ransomware
Cybersecurity researchers employ both static and dynamic features for ransomware detection. Static features, extracted from executable files at rest, include printable strings, opcodes, and function calls. While effective in detecting malicious activity, static-based detection can be evaded through obfuscation. Dynamic features acquired at run time offer a more robust detection method. Unlike static features, they are harder to obfuscate because they are extracted after the malware reveals the executable's true nature. Static features
TABLE 4: Overview of studies from the literature
Study | ML techniques | Dataset | Results
(Feature-type columns: Network features, System features, File features, Op Codes, Syscalls, Images, Bytes, API. Results columns: Accuracy, Precision, Recall, F1, False positives; rows that report fewer metrics list their values in table order.)
Roy and Chen [84] | BiLSTM, CRF | 17 ransomware samples | 99.67%
Hsu et al. [88] | SVM | 4 ransomware samples, 22 benign files | 92% 94% 84% 88%
Poudyal and Dasgupta [89] | LR, SVM, RF, J48, AdaBoost (RF/J48), Neural Network | 2,600 malware samples, 550 ransomware samples from VirusTotal, 540 goodware | 99.72% 0.1%
Aurangzeb et al. [90] | DT, RF, Gradient Boosting, Extreme Gradient Boosting | 160 malware samples from VirusShare | 94%
Molina et al. [91] | Naive Bayes, KNN, RF, ANN, LSTM, BiLSTM | 90,364 malware samples from VirusTotal, 39,136 from VirusShare (distilled into 19,499 ransomware samples from 2010-2019 from 21 families) | 94.92% 95.16% 94.44% 94.61%
Almashhadani et al. [85] | DT, Ensemble tree, Naive Bayes, SVM, KNN | 85,000 malicious Domain Generation Algorithms (DGA) from 20 ransomware families, 85,000 goodware | 94.52% 4%
Ahmed et al. [92] | DT, KNN, LR, RF, SVM | 1,354 ransomware samples from 14 families (from VirusShare and VirusTotal), 1,358 goodware | 97.4% 1.6%
Sharmeen et al. [93] | Deep Learning, SVM, RF, Multi-class classifier | 1,232 ransomware samples from 14 families from VirusShare and VirusTotal, 1,308 goodware samples from Windows 7 | 95.96%
Khan et al. [94] | MOGWO (Deep learning Swarm Intelligence based algorithm), Naive Bayes, AdaBoost, Decision Stump | 582 ransomware samples, 942 goodware samples | 87.91% 12%
Kok et al. [95] | RF | 904 ransomware from VirusShare and 942 goodware samples | 100% 0%
Khammas [96] | RF | 840 ransomware from VirusTotal (Cerber, TeslaCrypt and Locky), 840 goodware | 97.74% 0.6%
Kok et al. [97] | RF | 1,846 ransomware and goodware from VirusTotal and theZoo | 99%
Bae et al. [98] | RF, Naive Bayes, LG, KNN, SVM, Stochastic Gradient Descent (SGD) | 1,000 ransomware, 900 malware, 300 goodware samples | 98.65% 98.25% 98.94% 98.54%
Keong et al. [99] | Extra tree classifier, RF, Gradient Boosting, KNN, LR, Gaussian Process, SVM | 1,000 ransomware samples divided into 18 families | 96.53% 96.23% 96.44% 96.25%
Homayoun et al. [81] | LSTM, CNN, MLP | 220 Locky, 220 Cerber, 220 TeslaCrypt, 99 CryptoWall, 28 TorrentLocker, 77 Sage and 220 goodware from 2016-2017 | 97.2% 2.7%
Almashhadani et al. [100] | RF, LibSVM, Bayes Net, RT | Malware Capture Facility Project (MCFP) dataset with Locky samples from VirusShare | 98.72% 2.1%
Lee et al. [101] | KNN, Linear Model, DT, DT Ensemble, Kernel Trick, Neural Network | 1,200 files encrypted by ransomware | 100%
Kok et al. [102] | PEDA, RF, Naive Bayes, Ensemble (RF and Naive Bayes) | 582 ransomware and 942 goodware | 99.3% 1.56%
Shaukat and Ribeiro [83] | LR, SVM, ANN, RF, Gradient Tree Boosting | 574 ransomware from VirusShare from 12 families, 442 goodware | 98.25% 0.56%
Cohen and Nissim [103] | Naive Bayes, Bayes Net, J48, RF, LR, LogitBoost, SMO, Bagging, AdaBoost | Ransomware including Cerber, TeslaCrypt, Vipasana, Chimera and HiddenTear | 92.2% 5.2%
Nissim et al. [104] | DT, RF, Naive Bayes, Bayesian Network, SVM | Ransomware including Cerber, TeslaCrypt, Vipasana, Chimera and HiddenTear | 97.9% 0%
Abbasi et al. [105] | Regularized Logistic Regression (RLR), RF, DT, SVM, kNN | Resilient Information Systems Security (RISS) dataset with 582 ransomware and 942 goodware | 94.33%
Masum et al. [106] | DT, RF, NB, LR, NN | 138,047 ransomware samples from VirusShare | 99% 99% 97% 97%
Chaganti et al. [107] | CNN, CNN-LSTM, DNN | 1,500 malware samples from Virusshare.com and 875 benign files from portableapps.com | 96% 96% 96%
Li et al. [108] | XGBoost | CCF BDCI-21 (5,841 malware samples), Microsoft BIG-15 (10,868 samples) | 99.2%
Ba'abbad and Batarfi [109] | Hoeffding Tree Classifier (HTC) | Resilient Information Systems Security (RISS) dataset containing 582 ransomware and 942 goodware | 99.4%
Woralert et al. [110] | LSTM | Various ransomware samples from MalwareBazaar | n/a
Deng et al. [111] | Deep reinforcement learning (DRL) based on Double Deep Q Network (DDQN) | 35,367 samples from VirusShare, 27,118 benign samples from Windows, 688 samples from [112] | 97.9% 97.4% 97.9% 97.7%
Thummapudi et al. [113] | RF, DT, SVM, kNN, XGBoost, DNN, LSTM | 100 samples from VirusShare | 92.3% 99.5% 97% 95.6% 3%
Gulmez et al. [114] | CNN, XAI (LIME and SHAP) | 5 datasets: 6,263 from VirusShare, 7,703 from Sorel-20M, 668 from ISOT, 6,263 malware from VX Heaven and 14,797 benign samples from informer.com | 98.2% 4.7%
Fernando and Komninos [115] | RF, LR, SVM, J48, Gradient Boosting Trees, BN, MLP, SGD | 720 ransomware samples, 2,000 benign | 94.3% 94.3% 0.8%
Karbab et al. [116] | CNN, LSTM, MLP | 45k benign, 38k ransomware | 93.66% 1.93%
Mohan et al. [117] | Random Forest | 183 ransomware samples from Malware Bazaar | 98.68%
Gazzan and Sheldon [118] | Generative Adversarial Network (GANs), CNNs, LSTM | 8,152 from VirusShare and 1,000 benign samples from informer.com | 98% 96% 0.14%
Li et al. [119] | LightGBM, CNN, RNN (LSTM, GRU), Bidirectional RNN (Bi-LSTM, Bi-GRU) | BGP update messages collected from Jan-21-21 until Jan-31-21 (11 days) from the RIPE remote collector and the Route Views collector TELXATL | 84.27% 84.23% 86.90%
Alqahtani and Sheldon [120] | SVM, LR, Deep Belief Networks (DBN), CNN, MLP | 39,378 ransomware files from VirusShare | 94.60% 94.20% 97.40% 95.90%
Singh et al. [121] | CNN, Pre-trained transformer algorithms | Cloud encrypted dataset | 99.50% 98.50% 97.64%
Alsaif et al. [87] | LR, RF, XGBoost | 41,413 ransomware transactions from the UCI Machine Learning Repository | 99.08% 99.86% 99.16% 99.5%
TABLE 4 (continued)
Study | ML techniques | Dataset | Results
Ayub et al. [122] | Ensemble: DT, RF, AdaBoost, GB, SVM | 215 crypto ransomware from Sophos ReversingLabs (SOREL-20M), 3,014 benign applications | 91.75% 91.99% 90.47% 91.05%
Chaithanya and Brahmananda [123] | AdaBoost, XGBoost, LightGBM, SVM, KNN, NB | VirusTotal dataset with ransomware from 21 different families | 91%
Ganfure et al. [124] | Affinity Propagation (AP) | 1,106 ransomware samples from 20 different families from VirusShare, 11,311 benign files | 0.20%
Ciaramella et al. [125] | CNN, VGG-16 | 15,000 malware samples with 5,000 ransomware samples from VirusShare | 96.90% 97% 96.90%
Singh et al. [77] | DT, SVM, RF, KNN, SGD, ANN | 70 ransomware samples from 31 families | 99.83% 99.82% 99.83% 99.83%
Ganfure et al. [126] | CNN, OC-SVM, RATAFIA, EGB | 515 ransomware samples from 21 families from VirusShare | 98.20% 98.60%
Wazid et al. [127] | RF, LR, DT, KNN | BitcoinHeist Ransomware Address Dataset, containing 2,916,697 benign transactions and 41,413 ransomware-related transactions | 98.98% 99.90%
Prachi and Kumar [128] | RF, SVM, kNN, LR, NB | 50 ransomware samples from 10 families | 99% 99.20% 98.90% 98.80% 0
Zahoora et al. [129] | Deep Contractive Autoencoder (CAE), SVM, RF, LR | RISS dataset containing 582 ransomware and 942 benign samples | 93% 99% 93%
Aurangzeb et al. [130] | SVM, RF, KNN, XGBoost, NN | 582 ransomware (11 families) and 942 benign samples from VirusShare | 98.00% 98% 94% 94%
Berrueta et al. [75] | Decision Trees (DT), Tree Ensembles (TE), Neural Networks (NN) | 70 ransomware programs from Hybrid Analysis and Malware Traffic Analysis (giving 150 traffic traces in total) | 99% 99.7% 100% 99.87%
Zahoora et al. [131] | Deep Contractive Autoencoder (DCAE) with Zero-shot learning, RF, GNB, SVM, LR | 582 ransomware samples and 942 goodware samples from VirusShare | 92.80% 95% 13%
Almashhadani et al. [78] | DT, KNN, SVM, Discriminant analysis, bagged tree | Session-based: 4,128 ransomware, 4,128 benign; time-based: 445 ransomware, 445 benign; from the Malware Capture Facility Project (MCFP) | 99.88% 99.76% 100% 99.88% 0.024%
Kim et al. [132] | SVM, Neural network | 211,807 ransomware files (8lock8, Powerware, Jigsaw, Maktub, CryptoJoker) from […] | 0.994
Du et al. [133] | KNN, RF, DBSCAN | 9,458 ransomware from 25 families | 99% 98% 99% 99%
Zhang et al. [79] | DCGAN (Deep convolutional GAN), TGAN (Transfer GAN) | CICIDS2017, KDD99, SWaT, WADI | 98.10% 9.28% 98.70%
Rhode et al. [134] | RNN, MLP, SVM, NB, DT, GBDT, RF, AdaBoost | 3,604 benign samples and 2,792 malware from VirusTotal and Windows 7 | 81.5% 14%
Wong et al. [135] | Transfer learning: ECOC-SVM | Malimg, MaleVis, Virus-MNIST, Dumpware10 | 98.87%
Maimó et al. [76] | Anomaly detection: OC-SVM for anomaly detection, Naïve Bayes (NB) for classification | Clear traffic; ransomware samples include WannaCry, Petya, BadRabbit, PowerGhost (50,537 ransomware samples), 100,000 clean samples | 99.99% 92.32% 99.97% 95.96% 4.6%
may be ineffective in specific scenarios, such as fileless malware, which doesn't save a file on the hard disk for static feature extraction. Researchers typically utilise combinations of static and dynamic features from the host, network, and file system for comprehensive ransomware detection. Fig. 8 illustrates the distribution of features utilised across the literature.
FIGURE 8: Features used in surveyed studies

a: Host features
The prevalent approach for detecting ransomware activity is at the host level, where the malware executes. Ransomware detection at runtime involves various methods, including API calls [91], [93], [95], [97], [98], [102], system calls (syscalls) [120], [123], system features like running processes, DLLs, and registry entries [91], OpCodes [125], bytes [96], [132], and hardware features [117], [126]. Among these, Windows Application Programming Interface (API) calls are extensively used for ransomware detection [91], [93], [95], [97], [98], [102]. The Windows API facilitates interactions between programs and the operating system. Intel x86 CPUs provide four protection rings, and lower-numbered rings carry higher privileges, with ring 0 reserved for the kernel [136]. In the Windows ecosystem, applications in user space operate with limited system privileges and must request kernel-level access to use system resources. To request these resources, applications use API calls through Kernel32.dll, which are channelled to the kernel via ntdll.dll, as depicted in Fig. 9. This chokepoint is an effective location to capture API calls commonly associated with malicious intent. As file encryption occurs at the system level, API calls provide a means to detect ransomware before the encryption process begins.
API calls have been employed in various ways in the literature to identify ransomware. Alqahtani et al. [120] focus on directly detecting ransomware activity within cryptographic API calls. Cryptographic API calls play a crucial role in ransomware attacks, as they are directly associated with the process ransomware uses to encrypt user files and data. By monitoring and analysing these API calls, the early detection model can recognise the initial stages of ransomware activity, enabling pre-emptive measures before
encryption occurs. Using Deep Belief Networks (DBN), the authors achieve a ransomware detection accuracy of 94.6%. However, this approach assumes that the pre-encryption phase of a ransomware attack is constant and can be defined by static attributes such as time or specific API calls. The limitation is that malicious attackers may use obfuscation techniques to evade detection.
Karbab et al. [116] classify API call sequences to identify ransomware. In their research, the authors execute ransomware in sandboxes, label each report, convert the reports into sequences of words, and then employ a Long Short-Term Memory (LSTM) classifier. The authors observe an F1-score of 93.66% in their production environment and a low false positive rate of 0.99%. However, other studies show that ransomware authors can elude detection by employing code obfuscation, anti-virtualisation or API spoofing. API spoofing is a notable concern, especially when the monitored API calls originate from the user level of the operating system, as applications can easily trigger dummy API calls to hide malicious intentions.
Detecting ransomware through system calls (syscalls) extracted at the executive level of the operating system provides an alternative approach. When invoked by the API, the ntdll.dll file in user space calls the Ntoskrnl.exe file, which then directs the request to the system drivers. Researchers have exploited this bottleneck to identify malicious calls to the kernel. In a study by Nissim et al. [104], volatile memory dumps were utilised to analyse syscall behaviour, and the authors detect ransomware activity with an accuracy of 97.5% using the random forest algorithm. Unlike API calls, syscalls exhibit greater resistance to obfuscation, given that they are disclosed at runtime and cannot be directly invoked.
Cohen and Nissim [103] also employed volatile memory dumps to model host behaviour but focused on 23 operating system features such as DLLs, processes, mutexes, handles, services, and threads, instead of relying solely on APIs or syscalls. Classifying memory dumps with random forest resulted in a notable 92.2% accuracy rate. While volatile memory dumps effectively detect ransomware, the drawback lies in the substantial storage needed for dump files extracted regularly. Because dumps can therefore only be captured periodically, malicious activity may be identified only after ransomware has established a foothold. Moreover, recent advancements in evasion techniques, such as process injection, multiple collaborative processes, and return-oriented programming (ROP), enable malware to conceal itself behind benign processes. Despite the challenges associated with using host-based features for ransomware detection, it remains the most common approach due to its ease of implementation and the abundance of available data for classification.

b: Network features
Detecting ransomware over the network involves monitoring data in transit, aiming for early identification before infection, which is considered low-risk and less invasive. However, in practice, the encryption of malicious payloads by cybercriminals poses a detection challenge. Consequently, cybersecurity researchers have proposed various features to enhance the ability to detect ransomware activity over the network.
Almashhadani et al. [100] introduced an early ransomware detection system leveraging network features. Their system identifies ransomware before it can compromise the host, utilising 18 packet and flow-based features, including TCP, HTTP, DNS, and NBNS traffic from PCAP files. The authors train and test their machine learning model, achieving a noteworthy 98.72% detection accuracy with the random forest algorithm.
Almashhadani et al. [85] adopt a distinct approach, employing machine learning to identify malicious URL calls from within the network. Their study capitalises on the need for ransomware to communicate with external Command and Control (C&C) servers for tasks like downloading additional modules or exfiltrating data. To prolong the infection period, many malware variants use domain generation algorithms (DGA) to avoid detection, which prevents network administrators from simply blocking hardcoded URLs and complicates forensic efforts. The authors extract pertinent features by analysing 16 characteristics of domain name strings in incoming DNS request packets. These characteristics include metrics such as the number of vowels in the domain name and its entropy. Using the K-Nearest Neighbours (KNN) algorithm, they attain a detection accuracy of 94.52%. However, the encryption of packets may hinder the effectiveness of this approach.
Li et al. [119] identify anomalies in the Border Gateway Protocol (BGP) logs obtained during the WestRock ransomware attack, signalling the presence of ransomware activity. BGP is a path-vector routing protocol crucial for determining routes for data packets between different networks. Its significance in internet operations lies in enabling autonomous systems to communicate and make routing decisions based on factors like network policies and path preferences. The authors gathered publicly available BGP records from major internet exchange points globally and extracted 37 features, including metrics like average edit distance and duplicate announcements. After training a LightGBM model, they detect ransomware activity with an accuracy rate of 84.27%.
The studies above employ network-based features to identify ransomware activity when the ransomware communicates with other nodes. However, features like DNS records and BGP records mainly reveal ransomware behaviour after an attack has commenced, and their primary utility lies in curtailing the ransomware's further spread. An alternative approach to detecting ransomware is intercepting and executing payloads before they reach the host, a method adopted by Keong Ng et al. [99]. In their study, the authors intercept executable files through the Suricata inline Intrusion
Prevention System (IPS) at the network's gateway entrance. Upon detecting executable files, 54 static features, including header size and checksum, are extracted and used for classification. Suspicious files are directed to a secure virtual machine for detonation, enabling the classification of their dynamic features. However, a limitation of this approach is its inability to detect encrypted data transmitted through tunnels or ransomware delivered via drive-by downloads.
While network-based features show promise in early ransomware detection, researchers face two challenges: evasion techniques such as encryption and tunnelling, and ensuring that enough features are available to substantiate ransomware activity without its execution on the host.

c: File-system features
Another widely used method for ransomware detection involves analysing static files for suspicious activity. Researchers have employed various techniques to identify ransomware activity within files, focusing on the file content or metadata. The content-based approach involves extracting features from the binary file, such as its code, headers, or API calls. Alternatively, the metadata-based method involves examining attributes like file size, entropy, and file hash.
Lee et al. [101] employed machine learning to identify changes in file entropy within the system. In their study, the authors measure file entropy on the network and apply various algorithms such as SVM, KNN, logistic regression, decision tree, random forest, and gradient boosting. Although the method achieves 100% detection accuracy, the study does not include legitimately encrypted files in its dataset, raising questions about the real-world applicability of this perfect result.
Hsu et al. [88] adopt a comparable approach, using entropy to identify files that have already been encrypted by ransomware. They train an SVM using 17 file-based features, including the compression ratio of the file, and report a detection accuracy of 92% in their experiments. However, the study notes a high false positive rate attributed to … files. One significant challenge associated with this approach is that encrypted files, whether malicious or benign, typically exhibit higher entropy [138]. Consequently, it is inaccurate to assume that files with higher entropy are inherently malicious.
Wong et al. [135] identify dormant ransomware files using a vision-based approach. In their study, binary files are transformed into images, and deep learning is employed to extract features. Finally, an ensemble of Support Vector Machines (SVM) using optimal Error-Correcting Output Codes (ECOC) is employed for classification, resulting in a detection accuracy of 98.87%.
In essence, relying on file-based features for ransomware detection is reactive. Such methods either (1) require the ransomware file to remain stationary long enough to be detected, or (2) act only after the ransomware has already executed and encrypted files. Consequently, systems relying solely on this method are racing against time to safeguard other files from encryption. The effectiveness of this approach may diminish as computing power advances and encryption processes become faster and more efficient. The practical value of this strategy emerges when it is integrated into a real-time detection system that actively prevents further file encryption.

2) Datasets used throughout the literature for ransomware research
One significant challenge researchers face in ransomware detection is the lack of publicly available datasets for training and evaluating machine learning models. Despite calls from previous authors emphasising the importance of dataset sharing [14], [21], up to 96% of datasets used in the literature have yet to be made public [165]. This trend persists in recent literature on ransomware detection, where over 85% of studies constructed their datasets by collecting ransomware samples from repositories like VirusTotal⁶ and VirusShare⁷, detonating them on secure servers, and
6 https://www.virustotal.com/
TABLE 5: A selection of publicly available datasets used throughout the literature for ransomware detection research
Year | Dataset name | Description | Location
2015 | Stratosphere IPS Feeds [139] | Real-time malware traffic captures consisting of malware captures, normal activity and mixed captures | [139]
2015 | The UNSW-NB15 Dataset [140] | 100GB of raw traffic (TCP dumps) generated in the Cyber Range Lab, UNSW Canberra. Consists of real and synthetic attack behaviours | [141]
2015 | Microsoft Malware Classification Challenge (BIG 2015) [142] | Roughly half a terabyte of data from 9 known malware families collated from Microsoft's real-time detection | [143]
2016 | Resilient Information Systems Security (RISS) - Ransomware Dataset [144] | Dynamic analysis of 582 ransomware samples and 942 benign samples | [145]
2018 | MALREC [146] | 66,301 full system recordings (system dumps) from 2014 to 2016, consisting of mixed malware including ransomware | [147]
2018 | Malware Benchmark for Research Dataset (EMBER) [148] | 1 million PE static files scanned in or before 2018 | [149]
2018 | Dynamic Malware Analysis kernel and user-level calls [150] | Kernel and user-level calls extracted from a Cuckoo sandbox for 1000 malware and 1000 clean samples | [150]
2020 | Data breaches and Ransomware attacks from 2004 to 2020 (The University of Queensland) [151] | A public dataset consisting of data breaches and ransomware attacks from 2004 to 2020 | [151]
2020 | ISOT Ransomware Detection Dataset [152] | 420GB of ransomware and benign execution traces, consisting of 669 ransomware samples and 103 benign samples run on a Cuckoo sandbox with Windows 7 (64bit) | [153]
2020 | SOREL-20M: A large scale benchmark dataset for malicious PE detection [154] | A dataset of 10 million disarmed malware files and 20 million extracted features. The dataset contains around 1 million ransomware samples | [155]
2020 | BitcoinHeist Ransomware Address Dataset [156] | Bitcoin transactions from January 2009 until December 2018. Contains 2.9 million transactions, including 41,413 malicious transactions | [156]
2021 | BODMAS [157] | Static features extracted from 57,293 malware samples (821 ransomware samples included) and 77,142 benign samples collected from August 2019 until September 2020 | [158]
2021 | Ransomware and user samples for training and validating ML models [159] | File-sharing traffic analysis of more than 70 ransomware binaries from 26 families and more than 2,500 hours of benign traffic | [159]
2022 | NapierOne [160] | Nearly 500k files encrypted by various ransomware families such as Notpetya, Lockbit, Maze, Phobos, Netwalker, Dharma and Ryuk | [161]
2022 | RanSAP [162] | Storage patterns for 7 ransomware samples (TeslaCrypt, Cerber, WannaCry, GandCrab, Ryuk, Lockbit and Darkside) and 5 benign samples | [163]
2022 | Open repository for the evaluation of Ransomware Detection Tools [164] | PCAP logs extracted from more than 90 ransomware files (since 2015) from different ransomware families. Logs include DNS/TCP connections and Input/Output (I/O) operations | [164]
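Several of the datasets above are heavily imbalanced. For example, Table 4 reports the BitcoinHeist data used by Wazid et al. [127] as 2,916,697 benign and 41,413 ransomware-related transactions. The short worked calculation below illustrates why this matters when comparing reported accuracies. It is a sketch under stated assumptions: it takes the Table 4 counts at face value, and the "balanced" weighting shown is the heuristic behind scikit-learn's class_weight="balanced" option, included as one common input to cost-sensitive classifiers rather than as the method of any surveyed study.

```python
# Class counts as listed for the BitcoinHeist row of Table 4;
# used here only to illustrate the arithmetic.
benign = 2_916_697
ransomware = 41_413
total = benign + ransomware

# Accuracy of a trivial model that labels every transaction benign.
baseline_accuracy = benign / total

# How many majority-class examples there are per minority-class example.
imbalance_ratio = benign / ransomware

# "Balanced" class weights: n_samples / (n_classes * n_samples_in_class).
n_classes = 2
weights = {
    "benign": total / (n_classes * benign),
    "ransomware": total / (n_classes * ransomware),
}

print(f"baseline accuracy: {baseline_accuracy:.4f}")
print(f"imbalance ratio:   {imbalance_ratio:.1f}:1")
print(f"class weights:     benign={weights['benign']:.3f}, "
      f"ransomware={weights['ransomware']:.2f}")
```

With these counts, a model that never flags a transaction already reaches about 98.6% accuracy, and each ransomware example would be weighted roughly 70 times more heavily than a benign one. This is why precision, recall, F1-score and false positive rate are more informative than accuracy alone on such data.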
rithms and compared the results to choose the most accurate classifier [103], [104]. The varying results can be attributed to differences in the features fed into the model and the datasets used.
In contrast to deep learning methods, traditional machine learning (ML) relies on domain experts to perform feature engineering and meticulously label data before feeding it into the machine learning model. Consequently, various studies in the literature distinguish themselves by employing diverse combinations of features and algorithms. Traditional ML techniques face the challenge of requiring constant updates to the underlying model due to the evolving nature of ransomware. This elevates the risk of a ransomware attack, especially given that traditional ML models are typically updated from scratch, and there may be a considerable time gap between updates. Another drawback of traditional machine learning is its difficulty in handling intricate data structures and sequential patterns.
Overall, traditional ML offers ease of implementation, fast detection, low resource consumption, and accurate results at the expense of careful preparation and labelling of the data.

b: Deep learning techniques
Traditional machine learning methods require domain experts to label data and extract meaningful features, making the process time-consuming. In contrast, deep learning approaches can handle raw data and extract significant features without relying on domain experts. Deep learning, a well-explored branch of machine learning, has demonstrated promising outcomes, especially with sequential data and visual object recognition. This is achieved by processing data across multiple layers and utilising general-purpose algorithms to derive higher levels of abstraction from previous layer outputs. Consequently, deep learning models can reveal patterns that manual feature engineering may overlook. The use of deep learning approaches within the ransomware detection domain is an emerging trend. Studies in the literature have employed deep learning algorithms like Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNNs), and Autoencoders to detect ransomware.
LSTM networks are frequently utilised in research due to their gated memory cells, which retain information across long input sequences. This characteristic proves particularly beneficial in ransomware detection, where malicious behaviour unfolds as a sequence of actions rather than a single event. LSTMs excel in recognising patterns within sequential data, such as the time-series data commonly present in logs associated with ransomware detection. In their study, Karbab et al. [116] applied LSTMs by representing run-time behavioural reports as sequences of words using word2vec and feeding them into an LSTM network to identify malicious patterns.
CNNs are also widely employed, especially for their effectiveness in analysing visual data. Ciaramella et al. [125] employed CNNs to classify images derived from binaries as ransomware. They converted ransomware binaries into images, processed them through multiple convolution layers to extract features, such as combinations of neighbouring pixels, and then fed them into a dense layer for classification. This approach achieved a 96.9% accuracy rate in ransomware detection.
Autoencoders have recently gained attention in the ransomware detection domain. As artificial neural networks for unsupervised learning, autoencoders encode input data into a compressed format and decode it back to the original data, minimising the difference between the input and the reconstructed output. Zahoora et al. [129] employed Contractive Autoencoders (CAEs) to extract host-based features from ransomware, encompassing API calls, registry key setups, and binary strings. The authors then used multiple classifiers to detect ransomware activity with high accuracy.
The use of Generative Adversarial Networks (GANs) is an emerging area in ransomware detection research. GANs are designed to generate new data instances resembling a given dataset, which makes them valuable in adversarial learning tasks, such as creating scenarios for zero-day attacks where relevant data may be scarce. Several studies in the literature have employed GANs to generate synthetic datasets [79], [118], augmenting real attack patterns to account for the evolving nature of ransomware and the custom tactics involved in zero-day attacks.
However, despite the recent emphasis on deep learning techniques, these approaches tend to be resource-intensive and time-consuming. This can pose challenges, especially when aiming for real-time ransomware detection. Authors have proposed various solutions to address this limitation, including applying feature selection techniques to reduce the number of features [167] and improving classification speed through algorithm modifications [168].
While deep learning techniques present an exciting paradigm within the ransomware detection domain and show promise in detecting ransomware and generating synthetic attack scenarios, their resource demands should be carefully considered.

c: Ensemble Learning techniques
Ensemble learning models combine multiple base models to enhance detection accuracy. This approach compensates for errors in individual models by leveraging the strengths of others, resulting in improved overall accuracy. Ensemble models address common challenges in machine learning, including class imbalance, concept drift, and overfitting (often linked to the curse of dimensionality) [169].
Random forest is the most widely used algorithm in the ransomware detection domain due to its simplicity of implementation and high accuracy. Boosting is another frequently applied technique, with studies using AdaBoost, Gradient Boosting, or XGBoost. Other studies have adopted a fusion approach, combining various algorithms and weighting their outputs [99].
Ensemble learning methods offer high accuracy and ease of implementation and have proven particularly valuable in
early ransomware detection scenarios where limited data is available for classification [170].

d: Hybrid approaches
Recent studies have embraced hybrid approaches involving different algorithms for distinct tasks. One such approach involves employing deep learning for feature extraction and subsequently utilising traditional machine learning algorithms for classification.
For instance, Zahoora et al. [129] combine Contractive Autoencoders (CAEs) with cost-sensitive base classifiers to detect ransomware with high accuracy. The authors use deep CAEs for unsupervised feature extraction, leveraging their ability to extract robust feature representations. The extracted features are then input into a cost-sensitive Pareto Ensemble of base classifiers, including a cost-sensitive Support Vector Machine (SVM), weighted Logistic Regression (LR), and cost-sensitive Random Forest (RF). This hybrid approach allows the authors to derive features through unsupervised deep learning while benefiting from an ensemble of cost-sensitive traditional ML algorithms, maintaining good classification performance and addressing class imbalance.
Gulmez et al. [114] incorporate Explainable Artificial Intelligence (XAI) with Convolutional Neural Networks (CNNs). While deep learning has proven successful at detecting ransomware activity, it operates as a black box, making it challenging for security professionals to comprehend the patterns leading to specific file classifications. XAI addresses this issue by highlighting the features or characteristics within the data that significantly influence these classifications. The authors first used a CNN to extract useful features and conduct classification. Subsequently, they employ Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) to provide local and global explanations for the detection outcomes. This approach yields a high accuracy rate and offers insight into the features contributing most to detection.
Although hybrid approaches introduce an exciting new paradigm in the ransomware detection domain, further research is required to assess their real-world implications, particularly regarding the trade-offs between model complexity, resource consumption, and classification performance.

4) Approaches to update the ML model
This section describes the methods the examined studies use to update their machine-learning models. The literature predominantly employs two methods: non-incremental approaches, which involve updating the machine-learning model from scratch, and incremental approaches, where the machine-learning model undergoes gradual updates over time.

a: Non-incremental learning approaches
The predominant approach for training and testing machine-learning models in the literature is non-incremental. These studies typically involve executing ransomware samples on virtual machines, conducting feature engineering, and training and testing models in offline batches [73]. While this approach often produces accurate results with sufficient upfront training data, its drawback is the need to train the entire model from scratch when updates are required [171]. When an unknown ransomware sample is encountered in the wild, retraining the model with all available data is imperative to guard against zero-day attacks. In the time window between retraining runs, the model remains exposed to zero-day attacks. The escalating number of newly detected malware samples compounds this challenge. In 2023 alone, AV-Test recorded over 1.1 billion new malware samples, indicating a 10% increase from the previous year⁸. Consequently, retraining non-incremental learning models from scratch with the influx of new ransomware samples becomes an uphill battle that demands time and resources.

b: Incremental Learning approaches
Incremental learning, an alternative method that gradually updates the underlying machine learning model, has succeeded in domains like image classification. However, only a subset of studies have explored its utility in ransomware detection and classification.
Roy and Chen [84] addressed the challenge of evolving ransomware by incorporating incremental learning into their model, using a bidirectional LSTM (Bi-LSTM) to update the model with new data. Because incremental learning models are continuously updated, the underlying model can degrade as the data evolves. This leads to a mismatch between training and real-world data, known as concept drift. To manage this, the authors employed backpropagation with gradient descent to calculate and minimise the loss function.
Al-rimy et al. [170] introduced a novel incremental iBagging technique that updates the dataset to mitigate the lack of sufficient information in the early phases of a ransomware attack. Features were then selected using the Enhanced Semi-Random Subspace (ESRS) technique, and classification was performed by an ensemble of algorithms. However, this approach is susceptible to drift due to the absence of a mechanism for minimising loss over time.
While these studies present positive results with incremental learning approaches, several challenges emerge when using this technique, including (1) resource constraints during model training; (2) the risk of forgetting previously learned ransomware samples when introducing new data; and (3) susceptibility to concept drift. Overall, incremental learning approaches offer a means to frequently update the underlying model without retraining it from scratch.

C. RANSOMWARE DETECTION SYSTEM DESIGN
This section explores the design considerations and architectures of ransomware detection systems proposed in the literature.
8 https://www.av-test.org/en/statistics/malware
TABLE 6: The strengths and limitations of detection approaches and ML models used in ransomware detection systems
Detection approach: Delayed detection
Strengths:
- High accuracy due to the ability of training on the full dataset before classification occurs
- Larger range of data to train ML models due to availability of more data patterns
Limitations:
- The file must be located first before it can be sent to the virtual machine for detonation. This may not be practical in real-world scenarios
…
Strengths:
- Interpretability is easier than Deep learning approaches (easier to understand how the model arrived at a decision)
- High performance
Limitations:
- Susceptible to overfitting
Deep learning
Strengths:
- Deep learning can learn features from raw data eliminating the need for complex feature engineering from domain experts
- Excels in detecting sequential relationships and temporal dependencies in time-series data
- Deep learning models are well-suited for detecting complex patterns such as anomalies
Limitations:
- Resource intensive
- Slower training rate, which may create a vulnerability to zero-day ransomware
- Difficulty in determining how the model arrived at a decision
1) Detection approaches
This section examines detection approaches presented in the
literature, encompassing both delayed and real-time detec-
tion methods. Fig. 11 provides a segmented view of the
detection approaches employed in the literature.
FIGURE 11: A segmented view of detection approaches used throughout the literature.

a: Delayed detection
Delayed detection, as shown in Fig. 12, involves detonating the ransomware sample on a secure virtual machine to extract the features required for classification. It is the most common approach in the literature because it is easy to implement and achieves high detection accuracy, since the classifier has access to more features. Because the sample executes in isolation rather than on the production host, this approach increases the chance of finding ransomware while reducing the likelihood of sensitive data being encrypted. However, the caveat is that suspicious files must first be intercepted before they can be detonated in a secured environment. Traditional ML algorithms can be leveraged, making detection less resource-intensive. Several studies have leveraged delayed detection approaches [12], [73]. Alqahtani and Sheldon [12] transfer suspicious executable files to a Cuckoo Sandbox for further analysis. The authors use Deep Belief Networks (DBN) to classify ransomware after extracting cryptographic APIs and I/O request packets. Zuhair and Selamat [73] adopt a similar approach and extract dynamic features such as API calls, file operations and file settings after detonating the file in a secured sandbox.
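The delayed pipeline described above (detonate the sample, extract dynamic features such as API calls, then classify) can be sketched as follows. This is a hypothetical, minimal illustration rather than the method of [12] or [73]: it assumes the sandbox report has already been reduced to a list of Windows API call names, uses a small hand-picked vocabulary `VOCAB`, relies on invented toy traces in `TOY_TRAINING`, and swaps in a nearest-centroid classifier as a simple stand-in for the traditional ML models (e.g., SVM) or DBNs used in the surveyed studies.

```python
import math
from collections import Counter

# Hypothetical feature vocabulary of Windows API names; a real system would
# derive it from the sandbox reports in its training set.
VOCAB = ["CryptGenKey", "CryptEncrypt", "FindFirstFileW", "FindNextFileW",
         "CreateFileW", "WriteFile", "DeleteFileW", "RegOpenKeyExW"]

# Toy labelled traces invented for this example (not real sandbox output).
TOY_TRAINING = [
    (["FindFirstFileW", "CreateFileW", "CryptGenKey", "CryptEncrypt", "WriteFile",
      "DeleteFileW", "FindNextFileW", "CryptEncrypt", "WriteFile", "DeleteFileW"], "ransomware"),
    (["CryptGenKey", "FindFirstFileW", "CryptEncrypt", "WriteFile", "CryptEncrypt",
      "WriteFile", "DeleteFileW"], "ransomware"),
    (["RegOpenKeyExW", "CreateFileW", "WriteFile", "RegOpenKeyExW"], "benign"),
    (["CreateFileW", "WriteFile", "CreateFileW", "RegOpenKeyExW", "FindFirstFileW"], "benign"),
]


def to_vector(api_calls):
    """Relative frequency of each vocabulary API in one detonation trace."""
    counts = Counter(api_calls)
    total = sum(counts[name] for name in VOCAB) or 1  # avoid division by zero
    return [counts[name] / total for name in VOCAB]


def train(labelled_traces):
    """Nearest-centroid training: average the feature vectors of each class."""
    by_label = {}
    for calls, label in labelled_traces:
        by_label.setdefault(label, []).append(to_vector(calls))
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in by_label.items()}


def classify(model, api_calls):
    """Assign the label whose centroid is closest in Euclidean distance."""
    vector = to_vector(api_calls)
    return min(model, key=lambda label: math.dist(vector, model[label]))


if __name__ == "__main__":
    model = train(TOY_TRAINING)
    detonated = ["FindFirstFileW", "CryptGenKey", "CryptEncrypt",
                 "WriteFile", "DeleteFileW", "CryptEncrypt"]
    print(classify(model, detonated))  # ransomware
```

Training and classification are cheap here, but the classifier only sees features once the sample has finished running in the sandbox, so the end-to-end decision time is bounded by the detonation period. This is why such pipelines, however accurate, are not real-time.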
FIGURE 12: Ransomware detection using a delayed approach.
FIGURE 13: Ransomware detection approach in real-time.
Ransomware researchers have achieved impressive detection accuracies using delayed approaches. Still, there are several challenges, as can be seen in Table 6, such as: (1) the suspicious file must be intercepted and detonated on a secured virtual machine. This is becoming more difficult as malware evasion techniques become more advanced, particularly since ransomware can often detect whether it is being analysed within a virtualised environment and, if so, withhold its malicious payload; and (2) the classification does not occur in real-time, which may degrade the user experience.

b: Real-time detection
Some studies have successfully detected ransomware in real-time; the process is illustrated in Fig. 13. Real-time detection, distinct from dynamic analysis (extracting features at run-time for subsequent analysis), involves live feature extraction and classification from the moment a file is executed until encryption is initiated. Some studies use partial real-time approaches, intercepting the file in real-time and then using delayed detection methods for classification. For instance, Shaukat and Ribeiro [83] scan suspicious files for static "red flags" in real-time and transfer them to a virtualised environment for detonation. Keong et al. [99] adopt a similar method by intercepting executables passing through an IPS gateway. These suspicious files are then forwarded to a sandbox environment for detonation and subsequent classification.

Other studies propose complete real-time ransomware detection systems in which data collection, feature extraction and classification occur in near real-time. Roy and Chen [84] adopt this approach and model run-time event sequences from the Windows Logging Service (WLS) with BiLSTM-CLF. This approach detects anomalous events caused by ransomware and achieves an impressive detection rate of 99.87%. The authors argue that, unlike delayed methods, this approach enables them to detect ransomware on bare-metal servers. Deep learning has demonstrated strong real-time accuracy in detecting malicious sequences, particularly when leveraging algorithms such as LSTM. For instance, Homayoun et al. [81] capture sequences of syscalls 10 seconds after launching applications and classify the events with LSTM, achieving 97.20% detection accuracy.

While real-time detection has shown positive results, several challenges are evident, including: (1) susceptibility to concept drift; (2) the inherent limitation of having less data available for classification than delayed detection approaches, which makes early detection in real-time challenging and riskier (especially if detection occurs on bare-metal systems); and (3) the higher maintenance cost of deep learning approaches due to their resource-intensive training, which often requires systems with multiple CPUs or GPUs in a centralised setup [81]. Overall, several studies have demonstrated the capability of detecting ransomware in real-time.

2) System Architecture
This section explores the architectural approaches employed by studies in the literature, encompassing both centralised and federated architectures.

a: Centralised architecture
Most machine learning-based ransomware detection systems are designed with a centralised architecture, as shown in Fig. 14, meaning that the model-building and classification functions are centralised. This architecture is typically presented in two configurations in the literature: (1) feature extraction and classification occur on the same system, and (2) features are extracted on client systems and then sent to an external server for classification. Both configurations have been extensively discussed [24]. In most studies, feature engineering and classification occur on the same system. To address security concerns related to detonating ransomware on bare-metal servers, researchers often execute ransomware
FIGURE 14: Ransomware detection using a centralised architecture.
FIGURE 15: Ransomware detection using a federated architecture.
precision and recall, disregarding additional metrics [88], [90]. Meanwhile, others only report false positives [174]. Moreover, few studies explain the rationale behind their metric selection. Simply reporting metrics such as accuracy may not provide a full picture, particularly in anomaly detection, where malicious activity constitutes a minority class. In these cases, accuracy alone may not reflect the efficacy of ransomware detection. For example, a classifier that labels every sample as benign achieves 99% accuracy on a dataset in which only 1% of samples are ransomware, yet it detects no ransomware at all. The same caution applies to reporting false positives alone, since in ransomware detection a false negative (a missed attack) could be more detrimental than a false positive. The inconsistency in reported results across studies complicates comparisons and diminishes transparency in evaluating ransomware detection methods.

2) Validation of previous studies
Several studies gauge the effectiveness of their methodologies by comparing their results with metrics reported in other studies. However, this makes it difficult to validate previous findings, because machine learning algorithms, features, and datasets vary among studies. Moreover, the ongoing evolution of malware and advancements in operating system security features cast doubt on the relevance of previous findings and their applicability in the current landscape. For instance, some studies use legacy datasets or employ older versions of Windows for detonating ransomware samples, which may not reflect the effectiveness of malware on newer Windows versions with updated security measures. Validating the results of prior research can provide insights into malware evolution and the relevance of earlier methodologies.

3) Lack of focus on real-time detection
While a substantial body of research on ransomware detection exists, most studies concentrate on enhancing the detection accuracy of machine learning models without considering their practical application in real-world scenarios. As a result, numerous studies utilise "delayed detection" techniques to identify malicious behaviour. This method entails detonating ransomware samples within a secure virtual environment for a specific duration, extracting features, and subsequently inputting the data into an ML model for classification. This approach is frequently adopted due to its straightforward implementation and high accuracy, often employing traditional machine learning models such as SVM. However, achieving real-time detection with this approach is challenging because ransomware samples must be fully detonated before relevant features are extracted. In environments with sensitive data, this poses a risk, as files may become encrypted before classification begins.

4) Resource constraints
Previous studies primarily relied on traditional machine learning algorithms for training and testing due to their ease of implementation and reliable performance. However, there is growing interest in utilising deep learning for ransomware detection, which offers superior accuracy and potential for real-time detection. Nonetheless, deep learning models are computationally demanding and costly to deploy [175]. Some studies have addressed this challenge by deploying their models in environments with ample CPU cores or GPUs [176]. Consequently, most deep learning systems proposed so far are confined to centralised servers capable of providing the substantial hardware resources needed for training and testing.

5) Machine learning models not updated
Most studies in the literature operate in batch mode without incorporating a mechanism for updating their machine-learning models. Given the constant evolution of malware, it is essential to regularly update these models with the latest variants to ensure effective threat detection. However, the prevalent approach in ransomware detection proposals involves retraining the underlying model from scratch. This time-consuming process leaves a window of opportunity for new malware variants to evade detection during the update period.

B. FUTURE DIRECTIONS
1) Reporting with intention
As mentioned, many studies fail to select their reporting metrics carefully and instead rely on standard metrics like accuracy and F1 score without discussing their advantages and limitations. This focus on competing with other studies on metrics like accuracy may not provide a complete understanding of the experiment's constraints. We encourage future researchers to include a variety of metrics that shed light on the study's limitations, such as false positives, false negatives, and the Matthews correlation coefficient (MCC). Researchers need to deliberate on their chosen metrics. For instance, metrics like the area under the receiver operating characteristic curve (AUC ROC) can exhibit high variability when applied to imbalanced datasets [177]. The MCC stands out for its robustness against imbalanced datasets and is favoured as the metric of choice. Because the MCC draws on all four cells of the confusion matrix (true positives, true negatives, false positives and false negatives), it is high only when a classifier performs well on both the ransomware and benign classes. A high MCC consistently corresponds to a high AUC ROC, but the reverse is not always true [178]. Therefore, researchers should carefully consider their needs and justify their metric choices to enhance transparency. Future researchers are also encouraged to provide comprehensive reports on the performance metrics of their experiments, including detection timings. This will help in understanding the potential trade-offs associated with using a particular technique.

2) Synthetic datasets
As ransomware continues to evolve, many studies rely on outdated samples or datasets, making ransomware research primarily reactionary. Keeping datasets up to date is challenging because live samples shared on public repositories, like VirusShare, have a limited lifespan due to disabled C&C servers. Synthetic datasets offer a solution to
this challenge by enabling ransomware researchers to concentrate on improving ransomware detection systems rather than on the laborious task of assembling a collection of active ransomware to build a dataset. Researchers should aim to develop synthetic datasets that are regularly updated, providing valuable resources for security researchers. Additionally, using synthetic datasets standardises detection results across studies, facilitating baseline comparisons.

3) Real-time detection approaches
As observed in this survey, there is a noticeable lack of focus on creating ransomware detection systems suitable for real-world scenarios, particularly in real-time detection. Researchers should prioritise designing ransomware detection systems that are applicable in real-world settings. This involves exploring real-time detection methods that do not require virtual machines to detonate ransomware before classification. Although deep learning approaches show potential in this area, more work is needed to streamline feature extraction techniques and improve detection speed. Researchers could use the insights from this survey to integrate features from different phases of the attack lifecycle (such as those for identifying ransomware delivery and communication with C&C servers) to lessen the dependence on real-time detection at the host level. Finally, future researchers should also report the time taken to detect ransomware activity, demonstrating the practical utility of their models in real-time ransomware detection scenarios.

4) Explore agent-based, incremental and transfer learning approaches
While transfer learning and incremental learning have succeeded in domains like healthcare and traffic control [179], their potential in ransomware detection remains largely unexplored. Transfer learning could aid in detecting previously unseen malware families, reducing the reliance on extensive training datasets. Similarly, incremental learning offers promise by allowing algorithms to learn gradually, reducing the need for full model retraining and saving time and resources. Additionally, the application of multi-agent systems in ransomware detection has yet to be thoroughly investigated. While multi-agent systems have been studied in intrusion detection [180], exploring the use of intelligent agents collaborating within the ransomware domain presents an intriguing avenue for further research.

5) Adversarial learning
While some studies have addressed adversarial learning and its potential threats [34], [181], this area remains relatively unexplored within the ransomware detection domain. As ransomware detection methods continue to advance, future ransomware is expected to exploit adversarial learning techniques to evade detection by mimicking benign behaviours learned from machine learning models. Therefore, research efforts in adversarial learning should be intensified, focusing on its application in generating samples with tailored behaviours and on developing techniques to effectively detect ransomware produced through adversarial learning.

VI. CONCLUSION
In this survey, we thoroughly reviewed ransomware detection studies in the literature. Our pragmatic approach involved a comprehensive examination of ransomware detection system designs. Throughout this review, we covered various aspects, including recent ransomware families observed in the wild, government responses, available datasets, classification based on detection timing, and the machine learning techniques used, and we discussed limitations and future recommendations. Despite ransomware's rapid evolution, many proposed detection systems lack practicality for real-world ransomware attacks. As a result, most systems are reactive and involve delayed detection. To effectively mitigate the ransomware threat, researchers must design detection systems capable of swiftly identifying ransomware early on while addressing the evasion techniques employed by modern malware.

ACKNOWLEDGMENTS
The work has been supported by the Cyber Security Research Centre Limited, whose activities are partially funded by the Australian Government's Cooperative Research Centres Programme.

REFERENCES
[1] J. Ispahany and R. Islam, “Detecting malicious COVID-19 URLs using machine learning techniques,” in 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops). IEEE, 2021, pp. 718–723.
[2] “Treasury continues to counter ransomware as part of whole-of-government effort; sanctions ransomware operators and virtual currency exchange,” Nov 2021. [Online]. Available: https://home.treasury.gov/news/press-releases/jy0471
[3] Chainalysis, “The 2023 crypto crime report,” Feb 2023. [Online]. Available: https://go.chainalysis.com/rs/503-FAP-074/images/Crypto_Crime_Report_2023.pdf
[4] “Australian broadband data demand: Data demand on the NBN continues to reflect high network usage,” Aug 2020. [Online]. Available: https://www.nbnco.com.au/corporate-information/media-centre/media-statements/data-demand-continues-to-reflect
[5] “Gartner CFO survey: 74% to shift some employees to remote work permanently,” Apr 2020. [Online]. Available: https://www.gartner.com/en/newsroom/press-releases/2020-04-03-gartner-cfo-survey-reveals-74-percent-of-orgs-to-shift-some-employees-to-remote-work-permanently
[6] F. Yarochkin, “Ransomware as a service: Enabler of widespread attacks,” Oct 2021. [Online]. Available: https://www.trendmicro.com/vinfo/us/security/news/cybercrime-and-digital-threats/ransomware-as-a-service-enabler-of-widespread-attacks
[7] J. Beerman, D. Berent, Z. Falter, and S. Bhunia, “A review of Colonial Pipeline ransomware attack,” in 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW). IEEE, 2023, pp. 8–15.
[8] J. Dossett, “A timeline of the biggest ransomware attacks,” Nov 2021. [Online]. Available: https://www.cnet.com/personal-finance/crypto/a-timeline-of-the-biggest-ransomware-attacks/
[9] M. Burgess, “Conti’s attack against Costa Rica sparks a new ransomware era,” Jun 2022. [Online]. Available: https://www.wired.com/story/costa-rica-ransomware-conti/
[10] R. Falk and A.-L. Brown, “Underwritten or oversold? - Cyber Security CRC,” Oct 2021. [Online].
Available: https://cybersecuritycrc.org.au/sites/default/files/2021-10/Underwritten%20or%20oversold%20%20-%20DV.pdf
[11] F. Aldauiji, O. Batarfi, and M. Bayousef, “Utilizing cyber threat hunting techniques to find ransomware attacks: A survey of the state of the art,” IEEE Access, vol. 10, pp. 61 695–61 706, 2022.
[12] A. Alqahtani and F. T. Sheldon, “A survey of crypto ransomware attack detection methodologies: An evolving outlook,” Sensors, vol. 22, no. 5, p. 1837, 2022.
[13] A. Alraizza and A. Algarni, “Ransomware detection using machine learning: A survey,” Big Data and Cognitive Computing, vol. 7, no. 3, p. 143, 2023.
[14] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, “Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions,” Computers & Security, vol. 74, pp. 144–166, 2018.
[15] I. Bello, H. Chiroma, U. A. Abdullahi, A. Y. Gital, F. Jauro, A. Khan, J. O. Okesola, and S. M. Abdulhamid, “Detecting ransomware attacks using intelligent algorithms: Recent development and next direction from deep learning and big data perspectives,” Journal of Ambient Intelligence and Humanized Computing, vol. 12, pp. 8699–8717, 2021.
[16] E. Berrueta, D. Morato, E. Magaña, and M. Izal, “A survey on detection techniques for cryptographic ransomware,” IEEE Access, vol. 7, pp. 144 925–144 944, 2019.
[17] N. M. Chayal, A. Saxena, and R. Khan, “A review on spreading and forensics analysis of Windows-based ransomware,” Annals of Data Science, pp. 1–22, 2022.
[18] D. W. Fernando, N. Komninos, and T. Chen, “A study on the evolution of ransomware detection using machine learning and deep learning techniques,” IoT, vol. 1, no. 2, pp. 551–604, 2020.
[19] J. A. Gómez Hernández, P. García Teodoro, R. Magán Carrión, and R. Rodríguez Gómez, “Crypto-ransomware: A revision of the state of the art, advances and challenges,” Electronics, vol. 12, no. 21, p. 4494, 2023.
[20] A. Kapoor, A. Gupta, R. Gupta, S. Tanwar, G. Sharma, and I. E. Davidson, “Ransomware detection, avoidance, and mitigation scheme: A review and future directions,” Sustainability, vol. 14, no. 1, p. 8, 2021.
[21] A. M. Maigida, S. M. Abdulhamid, M. Olalere, J. K. Alhassan, H. Chiroma, and E. G. Dada, “Systematic literature review and metadata analysis of ransomware attacks and detection mechanisms,” Journal of Reliable Intelligent Environments, vol. 5, no. 2, pp. 67–89, 2019.
[22] T. McIntosh, A. Kayes, Y.-P. P. Chen, A. Ng, and P. Watters, “Ransomware mitigation in the modern era: A comprehensive review, research challenges, and future directions,” ACM Computing Surveys (CSUR), vol. 54, no. 9, pp. 1–36, 2021.
[23] R. Moussaileb, N. Cuppens, J.-L. Lanet, and H. L. Bouder, “A survey on Windows-based ransomware taxonomy and detection mechanisms,” ACM Computing Surveys (CSUR), vol. 54, no. 6, pp. 1–36, 2021.
[24] O. Or-Meir, N. Nissim, Y. Elovici, and L. Rokach, “Dynamic malware analysis in the modern era—a state of the art survey,” ACM Computing Surveys (CSUR), vol. 52, no. 5, pp. 1–48, 2019.
[25] H. Oz, A. Aris, A. Levi, and A. S. Uluagac, “A survey on ransomware: Evolution, taxonomy, and defense solutions,” ACM Computing Surveys (CSUR), vol. 54, no. 11s, pp. 1–37, 2022.
[26] N. Rani, S. V. Dhavale, A. Singh, and A. Mehra, “A survey on machine learning-based ransomware detection,” in Proceedings of the Seventh International Conference on Mathematics and Computing: ICMC 2021. Springer, 2022, pp. 171–186.
[27] S. Razaulla, C. Fachkha, C. Markarian, A. Gawanmeh, W. Mansoor, B. C. Fung, and C. Assi, “The age of ransomware: A survey on the evolution, taxonomy, and research directions,” IEEE Access, 2023.
[28] D. Smith, S. Khorsandroo, and K. Roy, “Machine learning algorithms and frameworks in ransomware detection,” IEEE Access, vol. 10, pp. 117 597–117 610, 2022.
[29] J. Singh and J. Singh, “A survey on machine learning-based malware detection in executable files,” Journal of Systems Architecture, vol. 112, p. 101861, 2021.
[30] V. Thangapandian, “Machine learning in automated detection of ransomware: Scope, benefits and challenges,” in Illumination of Artificial Intelligence in Cybersecurity and Forensics. Springer, 2022, pp. 345–372.
[31] U. Urooj, B. A. S. Al-rimy, A. Zainal, F. A. Ghaleb, and M. A. Rassam, “Ransomware detection using the dynamic analysis and machine learning: A survey and research directions,” Applied Sciences, vol. 12, no. 1, p. 172, 2021.
[32] D. Ucci, L. Aniello, and R. Baldoni, “Survey of machine learning techniques for malware analysis,” Computers & Security, vol. 81, pp. 123–147, 2019.
[33] N. Hampton, Z. Baig, and S. Zeadally, “Ransomware behavioural analysis on Windows platforms,” Journal of Information Security and Applications, vol. 40, pp. 44–51, 2018.
[34] L. Caviglione, M. Choraś, I. Corona, A. Janicki, W. Mazurczyk, M. Pawlicki, and K. Wasielewska, “Tight arms race: Overview of current malware threats and trends in their detection,” IEEE Access, 2020.
[35] Federal Bureau of Investigation (FBI), Cybersecurity and Infrastructure Security Agency (CISA), and Multi-State Information Sharing and Analysis Center (MS-ISAC), “StopRansomware: Rhysida ransomware,” Nov 2023. [Online]. Available: https://www.cisa.gov/sites/default/files/2023-11/aa23-319a-stopransomware-rhysida-ransomware_1.pdf
[36] Federal Bureau of Investigation (FBI) and Cybersecurity and Infrastructure Security Agency (CISA), “StopRansomware: ALPHV BlackCat,” Mar 2022. [Online]. Available: https://www.cisa.gov/sites/default/files/2023-12/aa23-353a-stopransomware-alphv-blackcat_0.pdf
[37] Australian Cyber Security Centre (ACSC), “Understanding ransomware threat actors: LockBit,” Jun 2023. [Online]. Available: https://www.cyber.gov.au/about-us/advisories/understanding-ransomware-threat-actors-lockbit
[38] S. Gatlan, “Accenture confirms data breach after August ransomware attack,” Oct 2021. [Online]. Available: https://www.bleepingcomputer.com/news/security/accenture-confirms-data-breach-after-august-ransomware-attack/
[39] L. Abrams, “LockBit 3.0 introduces the first ransomware bug bounty program,” Jun 2022. [Online]. Available: https://www.bleepingcomputer.com/news/security/lockbit-30-introduces-the-first-ransomware-bug-bounty-program/
[40] Federal Bureau of Investigation (FBI) and U.S. Secret Service (USSS), “Indicators of compromise associated with BlackByte ransomware,” Feb 2022. [Online]. Available: https://www.ic3.gov/Media/News/2022/220211.pdf
[41] Federal Bureau of Investigation (FBI) and Cybersecurity and Infrastructure Security Agency (CISA), “Indicators of compromise associated with AvosLocker ransomware,” Mar 2022. [Online]. Available: https://www.ic3.gov/Media/News/2022/220318.pdf
[42] Federal Bureau of Investigation (FBI), Cybersecurity and Infrastructure Security Agency (CISA), and Department of Health and Human Services (HHS), “StopRansomware: Hive ransomware,” Nov 2022. [Online]. Available: https://www.cisa.gov/news-events/cybersecurity-advisories/aa22-321a
[43] S. Gatlan, “Costa Rica’s public health agency hit by Hive ransomware,” Jun 2022. [Online]. Available: https://www.bleepingcomputer.com/news/security/costa-rica-s-public-health-agency-hit-by-hive-ransomware/
[44] I. Ilascu, “Hive ransomware attacks Memorial Health System, steals patient data,” Aug 2021. [Online]. Available: https://www.bleepingcomputer.com/news/security/hive-ransomware-attacks-memorial-health-system-steals-patient-data/
[45] Cybersecurity and Infrastructure Security Agency (CISA) and Federal Bureau of Investigation (FBI), “DarkSide ransomware: Best practices for preventing business disruption from ransomware attacks,” Jul 2021. [Online]. Available: https://www.cisa.gov/news-events/cybersecurity-advisories/aa21-131a
[46] S. B. Shimol, “Return of the DarkSide: Analysis of a large-scale data theft campaign,” Apr 2022. [Online]. Available: https://www.varonis.com/blog/darkside-ransomware
[47] M. Schwirtz and N. Perlroth, “DarkSide, blamed for gas pipeline attack, says it is shutting down,” May 2021. [Online]. Available: https://www.nytimes.com/2021/05/14/business/darkside-pipeline-hack.html
[48] Y. K. B. M. Yunus and S. B. Ngah, “Ransomware: Stages, detection and evasion,” in 2021 International Conference on Software Engineering & Computer Systems and 4th International Conference on Computational Science and Information Management (ICSECS-ICOCSIM). IEEE, 2021, pp. 227–231.
[49] S. Gatlan, “Ukraine arrests Clop ransomware gang members, seizes servers,” Oct 2021. [Online]. Available: https://www.bleepingcomputer.com/news/security/ukraine-arrests-clop-ransomware-gang-members-seizes-servers
[50] National Institute of Standards and Technology (NIST), “CVE-2023-0669 detail,” Jun 2023. [Online]. Available: https://nvd.nist.gov/vuln/detail/CVE-2023-0669
[51] Cybersecurity and Infrastructure Security Agency (CISA) and Federal Bureau of Investigation (FBI), “StopRansomware: Cl0p ransomware gang exploits CVE-2023-34362 MOVEit vulnerability,” Jun 2023. [Online]. Available: https://www.cisa.gov/news-events/cybersecurity-advisories/aa23-158a
[52] A. Mundo, “Clop ransomware,” Jan 2020. [Online]. Available: https://www.mcafee.com/blogs/other-blogs/mcafee-labs/clop-ransomware/
[53] Australian Cyber Security Centre (ACSC), “Annual cyber threat report, July 2021 to June 2022,” Nov 2022. [Online]. Available: https://www.cyber.gov.au/sites/default/files/2023-06/Understanding-Ransomware-Threat-Actors_LockBit.pdf
[54] B. Toulas, “REvil ransomware member extradited to U.S. to stand trial for Kaseya attack,” Mar 2022. [Online]. Available: https://www.bleepingcomputer.com/news/security/revil-ransomware-member-extradited-to-us-to-stand-trial-for-kaseya-attack/
[55] Australian Cyber Security Centre (ACSC), “Kaseya VSA supply-chain ransomware attack,” Jul 2021. [Online]. Available: https://www.cyber.gov.au/about-us/alerts/kaseya-vsa-supply-chain-ransomware-attack
[56] E. Millington, “REvil,” Aug 2020. [Online]. Available: https://attack.mitre.org/software/S0496/
[57] “Ransomware spotlight: Conti,” Dec 2021. [Online]. Available: https://www.trendmicro.com/vinfo/us/security/news/ransomware-spotlight/ransomware-spotlight-conti
[58] Cybersecurity and Infrastructure Security Agency (CISA), “Conti ransomware,” Mar 2022. [Online]. Available: https://www.cisa.gov/news-events/alerts/2021/09/22/conti-ransomware
[59] S. Barnum, “Standardizing cyber threat intelligence information with the Structured Threat Information eXpression (STIX),” MITRE Corporation, vol. 11, pp. 1–22, 2012.
[60] T. Dargahi, A. Dehghantanha, P. N. Bahrami, M. Conti, G. Bianchi, and L. Benedetto, “A cyber-kill-chain based taxonomy of crypto-ransomware features,” Journal of Computer Virology and Hacking Techniques, vol. 15, no. 4, pp. 277–305, 2019.
[61] L. Abrams, “LockBit ransomware self-spreads to quickly encrypt 225 systems,” May 2020. [Online]. Available: https://www.bleepingcomputer.com/news/security/lockbit-ransomware-self-spreads-to-quickly-encrypt-225-systems/
[62] R. Islam, R. Tian, L. M. Batten, and S. Versteeg, “Classification of malware based on integrated static and dynamic features,” Journal of Network and Computer Applications, vol. 36, no. 2, pp. 646–656, 2013.
[76] L. Fernandez Maimo, A. Huertas Celdran, A. L. Perales Gomez, F. J. Garcia Clemente, J. Weimer, and I. Lee, “Intelligent and dynamic ransomware spread detection and mitigation in integrated clinical environments,” Sensors, vol. 19, no. 5, p. 1114, 2019.
[77] J. Singh, K. Sharma, M. Wazid, and A. K. Das, “SINN-RD: Spline interpolation-envisioned neural network-based ransomware detection scheme,” Computers and Electrical Engineering, vol. 106, p. 108601, 2023.
[78] A. O. Almashhadani, D. Carlin, M. Kaiiali, and S. Sezer, “MFMCNS: A multi-feature and multi-classifier network-based system for ransomworm detection,” Computers & Security, vol. 121, p. 102860, 2022.
[79] X. Zhang, J. Wang, and S. Zhu, “Dual generative adversarial networks based unknown encryption ransomware attack detection,” IEEE Access, vol. 10, pp. 900–913, 2021.
[80] A. Buriro, A. B. Buriro, T. Ahmad, S. Buriro, and S. Ullah, “MalWD&C: A quick and accurate machine learning-based approach for malware detection and categorization,” Applied Sciences, vol. 13, no. 4, p. 2508, 2023.
[81] S. Homayoun, A. Dehghantanha, M. Ahmadzadeh, S. Hashemi, R. Khayami, K.-K. R. Choo, and D. E. Newton, “DRTHIS: Deep ransomware threat hunting and intelligence system at the fog layer,” Future Generation Computer Systems, vol. 90, pp. 94–104, 2019.
[82] M. E. Ahmed, H. Kim, S. Camtepe, and S. Nepal, “Peeler: Profiling kernel-level events to detect ransomware,” in European Symposium on Research in Computer Security. Springer, 2021, pp. 240–260.
[83] S. K. Shaukat and V. J. Ribeiro, “RansomWall: A layered defense system against cryptographic ransomware attacks using machine learning,” in 2018 10th International Conference on Communication Systems & Networks (COMSNETS). IEEE, 2018, pp. 356–363.
[84] K. C. Roy and Q. Chen, “DeepRan: Attention-based BiLSTM and CRF for ransomware early detection and classification,” Information Systems Frontiers, vol. 23, no. 2, pp. 299–315, 2021.
[85] A. O. Almashhadani, M. Kaiiali, D. Carlin, and S. Sezer, “MalDomDetector: A system for detecting algorithmically generated domain names with
[63] D. C. D’Elia, L. Invidia, and L. Querzoni, “Rope: Covert multi-process machine learning,” Computers & Security, vol. 93, p. 101787, 2020.
malware execution with return-oriented programming,” in European
[86] Q. A. Al-Haija and A. A. Alsulami, “High performance classification
Symposium on Research in Computer Security. Springer, 2021, pp. 197–
model to identify ransomware payments for heterogeneous bitcoin net-
217.
works,” Electronics, vol. 10, no. 17, p. 2113, 2021.
[64] S. S. Chakkaravarthy, D. Sangeetha, and V. Vaidehi, “A survey on
[87] S. A. Alsaif et al., “Machine learning-based ransomware classification
malware analysis and mitigation techniques,” Computer Science Review,
of bitcoin transactions,” Applied Computational Intelligence and Soft
vol. 32, pp. 1–23, 2019.
Computing, vol. 2023, 2023.
[65] A. Afianian, S. Niksefat, B. Sadeghiyan, and D. Baptiste, “Malware dy-
[88] C.-M. Hsu, C.-C. Yang, H.-H. Cheng, P. E. Setiasabda, and J.-S. Leu,
namic analysis evasion techniques: A survey,” ACM Computing Surveys
“Enhancing file entropy analysis to improve machine learning detection
(CSUR), vol. 52, no. 6, pp. 1–28, 2019.
rate of ransomware,” IEEE Access, vol. 9, pp. 138 345–138 351, 2021.
[66] S. Kumar et al., “An emerging threat fileless malware: a survey and
research challenges,” Cybersecurity, vol. 3, no. 1, pp. 1–12, 2020. [89] S. Poudyal and D. Dasgupta, “Analysis of crypto-ransomware using ml-
[67] “Indicators of compromise associated with avoslocker ransomware,” based multi-level profiling,” IEEE Access, vol. 9, pp. 122 532–122 547,
Mar 2022. [Online]. Available: https://www.ic3.gov/Media/News/2022/ 2021.
220318.pdf [90] S. Aurangzeb, R. N. B. Rais, M. Aleem, M. A. Islam, and M. A. Iqbal,
[68] F. Tang, B. Ma, J. Li, F. Zhang, J. Su, and J. Ma, “Ransomspector: An “On the classification of microsoft-windows ransomware using hardware
introspection-based approach to detect crypto ransomware,” Computers profile,” PeerJ Computer Science, vol. 7, p. e361, 2021.
& Security, vol. 97, p. 101997, 2020. [91] R. M. A. Molina, S. Torabi, K. Sarieddine, E. Bou-Harb, N. Bouguila,
[69] P. Bajpai and R. Enbody, “Attacking key management in ransomware,” and C. Assi, “On ransomware family attribution using pre-attack paranoia
IT Professional, vol. 22, no. 2, pp. 21–27, 2020. activities,” IEEE Transactions on Network and Service Management,
[70] M. Keshavarzi and H. R. Ghaffary, “I2ce3: A dedicated and separated 2021.
attack chain for ransomware offenses as the most infamous cyber extor- [92] Y. A. Ahmed, B. Koçer, S. Huda, B. A. S. Al-rimy, and M. M. Hassan, “A
tion,” Computer Science Review, vol. 36, p. 100233, 2020. system call refinement-based enhanced minimum redundancy maximum
[71] “Ransomware spotlight: Lockbit,” Feb 2022. [Online]. Avail- relevance method for ransomware early detection,” Journal of Network
able: https://www.trendmicro.com/vinfo/us/security/news/ransomware- and Computer Applications, vol. 167, p. 102753, 2020.
spotlight/ransomware-spotlight-lockbit [93] S. Sharmeen, Y. A. Ahmed, S. Huda, B. Ş. Koçer, and M. M. Hassan,
[72] J. Singh and J. Singh, “A survey on machine learning-based malware “Avoiding future digital extortion through robust protection against ran-
detection in executable files,” Journal of Systems Architecture, p. 101861, somware threats using deep learning based adaptive approaches,” IEEE
2020. Access, vol. 8, pp. 24 522–24 534, 2020.
[73] H. Zuhair and A. Selamat, “Rands: A machine learning-based anti- [94] F. Khan, C. Ncube, L. K. Ramasamy, S. Kadry, and Y. Nam, “A digital
ransomware tool for windows platforms,” in Advancing Technology In- dna sequencing engine for ransomware detection using machine learn-
dustrialization Through Intelligent Software Methodologies, Tools and ing,” IEEE Access, vol. 8, pp. 119 710–119 719, 2020.
Techniques. IOS Press, 2019, pp. 573–587. [95] S. Kok, A. Abdullah, and N. Jhanjhi, “Early detection of crypto-
[74] H. Liu and P. Patras, “Netsentry: A deep learning approach to detecting ransomware using pre-encryption detection algorithm,” Journal of King
incipient large-scale network attacks,” Computer Communications, vol. Saud University-Computer and Information Sciences, 2020.
191, pp. 119–132, 2022. [96] B. M. Khammas, “Ransomware detection using random forest tech-
[75] E. Berrueta, D. Morato, E. Magaña, and M. Izal, “Crypto-ransomware nique,” ICT Express, vol. 6, no. 4, pp. 325–331, 2020.
detection using machine learning models in file-sharing network scenar- [97] S. Kok, A. Azween, and N. Jhanjhi, “Evaluation metric for crypto-
ios with encrypted traffic,” Expert Systems with Applications, vol. 209, p. ransomware detection using machine learning,” Journal of Information
118299, 2022. Security and Applications, vol. 55, p. 102646, 2020.
[98] S. I. Bae, G. B. Lee, and E. G. Im, “Ransomware detection using machine learning algorithms,” Concurrency and Computation: Practice and Experience, vol. 32, no. 18, p. e5422, 2020.
[99] C. Keong Ng, S. Rajasegarar, L. Pan, F. Jiang, and L. Y. Zhang, “VoterChoice: A ransomware detection honeypot with multiple voting framework,” Concurrency and Computation: Practice and Experience, vol. 32, no. 14, p. e5726, 2020.
[100] A. O. Almashhadani, M. Kaiiali, S. Sezer, and P. O’Kane, “A multi-classifier network-based crypto ransomware detection system: A case study of Locky ransomware,” IEEE Access, vol. 7, pp. 47 053–47 067, 2019.
[101] K. Lee, S.-Y. Lee, and K. Yim, “Machine learning based file entropy analysis for ransomware detection in backup systems,” IEEE Access, vol. 7, pp. 110 205–110 215, 2019.
[102] S. Kok, A. Abdullah, N. Jhanjhi, and M. Supramaniam, “Prevention of crypto-ransomware using a pre-encryption detection algorithm,” Computers, vol. 8, no. 4, p. 79, 2019.
[103] A. Cohen and N. Nissim, “Trusted detection of ransomware in a private cloud using machine learning methods leveraging meta-features from volatile memory,” Expert Systems with Applications, vol. 102, pp. 158–178, 2018.
[104] N. Nissim, Y. Lapidot, A. Cohen, and Y. Elovici, “Trusted system-calls analysis methodology aimed at detection of compromised virtual machines using sequential mining,” Knowledge-Based Systems, vol. 153, pp. 147–175, 2018.
[105] M. S. Abbasi, H. Al-Sahaf, M. Mansoori, and I. Welch, “Behavior-based ransomware classification: A particle swarm optimization wrapper-based approach for feature selection,” Applied Soft Computing, vol. 121, p. 108744, 2022.
[106] M. Masum, M. J. H. Faruk, H. Shahriar, K. Qian, D. Lo, and M. I. Adnan, “Ransomware classification and detection with machine learning algorithms,” in 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC). IEEE, 2022, pp. 0316–0322.
[107] R. Chaganti, V. Ravi, and T. D. Pham, “A multi-view feature fusion approach for effective malware classification using deep learning,” Journal of Information Security and Applications, vol. 72, p. 103402, 2023.
[108] S. Li, Y. Li, X. Wu, S. Al Otaibi, and Z. Tian, “Imbalanced malware family classification using multimodal fusion and weight self-learning,” IEEE Transactions on Intelligent Transportation Systems, 2022.
[109] I. Ba’abbad and O. Batarfi, “Proactive ransomware detection using extremely fast decision tree (EFDT) algorithm: A case study,” Computers, vol. 12, no. 6, p. 121, 2023.
[110] C. Woralert, C. Liu, and Z. Blasingame, “HARD-Lite: A lightweight hardware anomaly realtime detection framework targeting ransomware,” IEEE Transactions on Circuits and Systems I: Regular Papers, 2023.
[111] X. Deng, M. Cen, M. Jiang, and M. Lu, “Ransomware early detection using deep reinforcement learning on portable executable header,” Cluster Computing, pp. 1–15, 2023.
[112] A. Continella, A. Guagnelli, G. Zingaro, G. De Pasquale, A. Barenghi, S. Zanero, and F. Maggi, “ShieldFS: a self-healing, ransomware-aware filesystem,” in Proceedings of the 32nd Annual Conference on Computer Security Applications, 2016, pp. 336–347.
[113] K. Thummapudi, P. Lama, and R. V. Boppana, “Detection of ransomware attacks using processor and disk usage data,” IEEE Access, 2023.
[114] S. Gulmez, A. G. Kakisim, and I. Sogukpinar, “XRan: Explainable deep learning-based ransomware detection using dynamic analysis,” Computers & Security, p. 103703, 2024.
[115] D. W. Fernando and N. Komninos, “FeSAD ransomware detection framework with machine learning using adaption to concept drift,” Computers & Security, vol. 137, p. 103629, 2024.
[116] E. B. Karbab, M. Debbabi, and A. Derhab, “SwiftR: Cross-platform ransomware fingerprinting using hierarchical neural networks on hybrid features,” Expert Systems with Applications, vol. 225, p. 120017, 2023.
[117] P. M. Anand, P. S. Charan, and S. K. Shukla, “HiPeR-early detection of a ransomware attack using hardware performance counters,” Digital Threats: Research and Practice, vol. 4, no. 3, pp. 1–24, 2023.
[118] M. Gazzan and F. T. Sheldon, “An enhanced minimax loss function technique in generative adversarial network for ransomware behavior prediction,” Future Internet, vol. 15, no. 10, p. 318, 2023.
[119] Z. Li, A. L. G. Rios, and L. Trajković, “Machine learning for detecting the WestRock ransomware attack using BGP routing records,” IEEE Communications Magazine, vol. 61, no. 3, pp. 20–26, 2022.
[120] A. Alqahtani and F. T. Sheldon, “Temporal data correlation providing enhanced dynamic crypto-ransomware pre-encryption boundary delineation,” Sensors, vol. 23, no. 9, p. 4355, 2023.
[121] A. Singh, Z. Mushtaq, H. A. Abosaq, S. N. F. Mursal, M. Irfan, and G. Nowakowski, “Enhancing ransomware attack detection using transfer learning and deep learning ensemble models on cloud-encrypted data,” Electronics, vol. 12, no. 18, p. 3899, 2023.
[122] M. A. Ayub, A. Siraj, B. Filar, and M. Gupta, “RWArmor: a static-informed dynamic analysis approach for early detection of cryptographic Windows ransomware,” International Journal of Information Security, pp. 1–24, 2023.
[123] C. BN and B. SH, “Revolutionizing ransomware detection and criticality assessment: Multiclass hybrid machine learning and semantic similarity-based end2end solution,” Multimedia Tools and Applications, pp. 1–34, 2023.
[124] G. O. Ganfure, C.-F. Wu, Y.-H. Chang, and W.-K. Shih, “RTrap: Trapping and containing ransomware with machine learning,” IEEE Transactions on Information Forensics and Security, vol. 18, pp. 1433–1448, 2023.
[125] G. Ciaramella, G. Iadarola, F. Martinelli, F. Mercaldo, and A. Santone, “Explainable ransomware detection with deep learning techniques,” Journal of Computer Virology and Hacking Techniques, pp. 1–14, 2023.
[126] G. O. Ganfure, C.-F. Wu, Y.-H. Chang, and W.-K. Shih, “DeepWare: Imaging performance counters with deep learning to detect ransomware,” IEEE Transactions on Computers, vol. 72, no. 3, pp. 600–613, 2022.
[127] M. Wazid, A. K. Das, and S. Shetty, “BSFR-SH: Blockchain-enabled security framework against ransomware attacks for smart healthcare,” IEEE Transactions on Consumer Electronics, vol. 69, no. 1, pp. 18–28, 2022.
[128] Prachi and S. Kumar, “An effective ransomware detection approach in a cloud environment using volatile memory features,” Journal of Computer Virology and Hacking Techniques, vol. 18, no. 4, pp. 407–424, 2022.
[129] U. Zahoora, A. Khan, M. Rajarajan, S. H. Khan, M. Asam, and T. Jamal, “Ransomware detection using deep learning based unsupervised feature extraction and a cost sensitive Pareto ensemble classifier,” Scientific Reports, vol. 12, no. 1, p. 15647, 2022.
[130] S. Aurangzeb, H. Anwar, M. A. Naeem, and M. Aleem, “BigRC-EML: big-data based ransomware classification using ensemble machine learning,” Cluster Computing, vol. 25, no. 5, pp. 3405–3422, 2022.
[131] U. Zahoora, M. Rajarajan, Z. Pan, and A. Khan, “Zero-day ransomware attack detection using deep contractive autoencoder and voting based ensemble classifier,” Applied Intelligence, vol. 52, no. 12, pp. 13 941–13 960, 2022.
[132] G. Y. Kim, J.-Y. Paik, Y. Kim, and E.-S. Cho, “Byte frequency based indicators for crypto-ransomware detection from empirical analysis,” Journal of Computer Science and Technology, vol. 37, no. 2, pp. 423–442, 2022.
[133] J. Du, S. H. Raza, M. Ahmad, I. Alam, S. H. Dar, and M. A. Habib, “Digital forensics as advanced ransomware pre-attack detection algorithm for endpoint data protection,” Security and Communication Networks, vol. 2022, pp. 1–16, 2022.
[134] M. Rhode, P. Burnap, and A. Wedgbury, “Real-time malware process detection and automated process killing,” Security and Communication Networks, vol. 2021, pp. 1–23, 2021.
[135] W. Wong, F. H. Juwono, and C. Apriono, “Vision-based malware detection: A transfer learning approach using optimal ECOC-SVM configuration,” IEEE Access, vol. 9, pp. 159 262–159 270, 2021.
[136] M. N. Olaimat, M. A. Maarof, and B. A. S. Al-rimy, “Ransomware anti-analysis and evasion techniques: A survey and research directions,” in 2021 3rd International Cyber Resilience Conference (CRC). IEEE, 2021, pp. 1–6.
[137] P. Yosifovich, D. A. Solomon, and A. Ionescu, Windows Internals, Part 1: System architecture, processes, threads, memory management, and more. Microsoft Press, 2017.
[138] F. De Gaspari, D. Hitaj, G. Pagnotta, L. De Carli, and L. V. Mancini, “The naked sun: Malicious cooperation between benign-looking processes,” in International Conference on Applied Cryptography and Network Security. Springer, 2020, pp. 254–274.
[139] Stratosphere IPS Project, “Stratosphere laboratory datasets,” 2015, retrieved August 30, 2022, from https://www.stratosphereips.org/datasets-overview.
[140] N. Moustafa and J. Slay, “UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set),” in 2015 Military Communications and Information Systems Conference (MilCIS), 2015, pp. 1–6.
[141] N. Moustafa, “The UNSW-NB15 dataset,” 2015. [Online]. Available: https://research.unsw.edu.au/projects/unsw-nb15-dataset
[142] R. Ronen, M. Radu, C. Feuerstein, E. Yom-Tov, and M. Ahmadi, “Microsoft malware classification challenge,” arXiv preprint arXiv:1802.10135, 2018.
[143] Microsoft, “Microsoft malware classification challenge (BIG 2015),” Feb 2015. [Online]. Available: https://www.kaggle.com/c/malware-classification
[144] D. Sgandurra, L. Muñoz-González, R. Mohsen, and E. C. Lupu, “Automated dynamic analysis of ransomware: Benefits, limitations and use for detection,” arXiv preprint arXiv:1609.03020, 2016.
[145] Imperial College London, “Resilient information systems security,” 2016. [Online]. Available: https://rissgroup.org/ransomware-dataset/
[146] G. Severi, T. Leek, and B. Dolan-Gavitt, “MalRec: compact full-trace malware recording for retrospective deep analysis,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 2018, pp. 3–23.
[147] MIT Lincoln Laboratory, Georgia Tech, and NYU, “The MalRec dataset,” 2016. [Online]. Available: https://giantpanda.gtisc.gatech.edu/malrec/dataset/
[148] H. S. Anderson and P. Roth, “EMBER: an open dataset for training static PE malware machine learning models,” arXiv preprint arXiv:1804.04637, 2018.
[149] ——, “https://github.com/elastic/ember,” 2017. [Online]. Available: https://github.com/elastic/ember
[150] M. Nunes, “Dynamic malware analysis kernel and user-level calls,” 2018. [Online]. Available: https://doi.org/10.5281/zenodo.1203289
[151] R. Ko, E. Tsen, and S. Slapnicar, “Dataset of data breaches and ransomware attacks over 15 years from 2004,” 2020. [Online]. Available: https://doi.org/10.14264/dfe5027
[152] B. Jethva, I. Traoré, A. Ghaleb, K. Ganame, and S. Ahmed, “Multilayer ransomware detection using grouped registry key operations, file entropy and file signature monitoring,” Journal of Computer Security, vol. 28, no. 3, pp. 337–373, 2020.
[153] ——, “Botnet and ransomware detection datasets,” 2020. [Online]. Available: https://www.uvic.ca/ecs/ece/isot/datasets/botnet-ransomware/index.php
[154] R. Harang and E. M. Rudd, “SOREL-20M: A large scale benchmark dataset for malicious PE detection,” arXiv preprint arXiv:2012.07634, 2020.
[155] Sophos, “SOREL-20M: Sophos-ReversingLabs 20 million dataset,” 2020, retrieved January 27, 2024, from https://github.com/sophos/SOREL-20M.
[156] “BitcoinHeistRansomwareAddressDataset,” UCI Machine Learning Repository, 2020, DOI: https://doi.org/10.24432/C5BG8V.
[157] L. Yang, A. Ciptadi, I. Laziuk, A. Ahmadzadeh, and G. Wang, “BODMAS: An open dataset for learning based temporal analysis of PE malware,” in 2021 IEEE Security and Privacy Workshops (SPW). IEEE, 2021, pp. 78–84.
[158] ——, “BODMAS malware dataset,” 2021. [Online]. Available: https://whyisyoung.github.io/BODMAS/
[159] E. Berrueta, “Ransomware and user samples for training and validating ML models,” 2021, retrieved January 27, 2024, from DOI:10.17632/yhg5wk39kf.2.
[160] S. R. Davies, R. Macfarlane, and W. J. Buchanan, “NapierOne: A modern mixed file data set alternative to GovDocs1,” Forensic Science International: Digital Investigation, vol. 40, p. 301330, 2022.
[161] ——, “NapierOne,” 2022. [Online]. Available: http://napierone.com.s3.eu-north-1.amazonaws.com/NapierOne/index.html#NapierOne/
[162] M. Hirano, R. Hodota, and R. Kobayashi, “RanSAP: An open dataset of ransomware storage access patterns for training machine learning models,” Forensic Science International: Digital Investigation, vol. 40, p. 301314, 2022.
[163] ——, “RanSAP: An open dataset of ransomware storage access patterns,” 2022. [Online]. Available: https://github.com/manabu-hirano/RanSAP/
[164] E. Berrueta, D. Morató, E. Magaña, and M. Izal, “Open repository for the evaluation of ransomware detection tools,” 2020. [Online]. Available: https://dx.doi.org/10.21227/qnyn-q136
[165] C. Grajeda, F. Breitinger, and I. Baggili, “Availability of datasets for digital forensics–and what is missing,” Digital Investigation, vol. 22, pp. S94–S105, 2017.
[166] S. Abt and H. Baier, “Are we missing labels? A study of the availability of ground-truth in network security research,” in 2014 Third International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS). IEEE, 2014, pp. 40–55.
[167] B. Zhang, W. Xiao, X. Xiao, A. K. Sangaiah, W. Zhang, and J. Zhang, “Ransomware classification using patch-based CNN and self-attention network on embedded n-grams of opcodes,” Future Generation Computer Systems, vol. 110, pp. 708–720, 2020.
[168] A. N. Jahromi, S. Hashemi, A. Dehghantanha, K.-K. R. Choo, H. Karimipour, D. E. Newton, and R. M. Parizi, “An improved two-hidden-layer extreme learning machine for malware hunting,” Computers & Security, vol. 89, p. 101655, 2020.
[169] O. Sagi and L. Rokach, “Ensemble learning: A survey,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 8, no. 4, p. e1249, 2018.
[170] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, “Crypto-ransomware early detection model using novel incremental bagging with enhanced semi-random subspace selection,” Future Generation Computer Systems, vol. 101, pp. 476–491, 2019.
[171] A. A. Darem, F. A. Ghaleb, A. A. Al-Hashmi, J. H. Abawajy, S. M. Alanazi, and A. Y. Al-Rezami, “An adaptive behavioral-based incremental batch learning malware variants detection model using concept drift detection and sequential deep learning,” IEEE Access, vol. 9, pp. 97 180–97 196, 2021.
[172] C. Thapa, K. K. Karmakar, A. H. Celdran, S. Camtepe, V. Varadharajan, and S. Nepal, “FedDICE: A ransomware spread detection in a distributed integrated clinical environment using federated learning and SDN based mitigation,” in International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness. Springer, 2021, pp. 3–24.
[173] D. Morato, E. Berrueta, E. Magaña, and M. Izal, “Ransomware early detection by the analysis of file sharing traffic,” Journal of Network and Computer Applications, vol. 124, pp. 14–32, 2018.
[174] D. Min, Y. Ko, R. Walker, J. Lee, and Y. Kim, “A content-based ransomware detection and backup solid-state drive for ransomware defense,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2021.
[175] S. Mittal, P. Rajput, and S. Subramoney, “A survey of deep learning on CPUs: opportunities and co-optimizations,” IEEE Transactions on Neural Networks and Learning Systems, 2021.
[176] B. A. AlAhmadi and I. Martinovic, “MalClassifier: Malware family classification using network flow sequence behaviour,” in 2018 APWG Symposium on Electronic Crime Research (eCrime). IEEE, 2018, pp. 1–13.
[177] Y. Liu, Y. Li, and D. Xie, “Implications of imbalanced datasets for empirical ROC-AUC estimation in binary classification tasks,” Journal of Statistical Computation and Simulation, pp. 1–21, 2023.
[178] D. Chicco and G. Jurman, “The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification,” BioData Mining, vol. 16, no. 1, pp. 1–23, 2023.
[179] F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, and Q. He, “A comprehensive survey on transfer learning,” Proceedings of the IEEE, vol. 109, no. 1, pp. 43–76, 2020.
[180] K. Sethi, Y. V. Madhav, R. Kumar, and P. Bera, “Attention based multi-agent intrusion detection systems using reinforcement learning,” Journal of Information Security and Applications, vol. 61, p. 102923, 2021.
[181] C. Yang, J. Xu, S. Liang, Y. Wu, Y. Wen, B. Zhang, and D. Meng, “DeepMal: maliciousness-preserving adversarial instruction learning against static malware detection,” Cybersecurity, vol. 4, no. 1, pp. 1–14, 2021.