Applsci 13 10169
sciences
Review
Machine-Learning Forensics: State of the Art in the Use of
Machine-Learning Techniques for Digital Forensic
Investigations within Smart Environments
Laila Tageldin 1, * and Hein Venter 2
1 Department of Computer Science, Sudan University of Science and Technology, Khartoum 11111, Sudan
2 Department of Computer Science, University of Pretoria, Pretoria 0002, South Africa; [email protected]
* Correspondence: [email protected]
Abstract: Recently, a worldwide trend towards adopting smart environments and automation has been
observed across all fields. Smart environments include a wide variety of
Internet-of-Things (IoT) devices, so many challenges face conventional digital forensic investigation
(DFI) in such environments. These challenges include data heterogeneity, data distribution, and
massive amounts of data, which exceed digital forensic (DF) investigators’ human capabilities to
deal with all of these challenges within a short period of time. Furthermore, they significantly
slow down or even incapacitate the conventional DFI process. With the increasing frequency of
digital crimes, better and more sophisticated DFI procedures are desperately needed, particularly
in such environments. Since machine-learning (ML) techniques might be a viable option in smart
environments, this paper presents the integration of ML into DF, through reviewing the most recent
papers concerned with the applications of ML in DF, specifically within smart environments. It
also explores the potential further use of ML techniques in DF in smart environments to reduce
investigators' manual workload, as well as what to expect from future ML applications to the
conventional DFI process.
storage space, but rather communicate their data to other devices. On the other hand, some
of these devices generate such vast amounts of data that, should an investigator not act
fast enough, evidential data might be lost forever. The vast volume of data as well as the
short-lived data created by these smart devices become humanly impossible to sift through.
ML techniques may potentially be employed to assist with this dilemma in order to find
evidence much more effectively in a much shorter time span.
The numerous challenges that face traditional digital forensic investigation (DFI) in
smart environments result from the heterogeneity of, distribution of, and huge amounts of
data involved. This exceeds the capabilities of human DF investigators to cope with all these
challenges in a short time. It severely slows down or even incapacitates the conventional
DFI process. Due to the rapid pace at which digital crimes are committed, better and more
intelligent DFI techniques are sorely needed, especially in smart environments. Machine-
learning (ML) techniques might offer a solution to these challenges [4].
ML has recently been applied in DFI and is still evolving; for example, Ref. [5] designed
a new framework known as IoTDots to help protect the data collected by various smart
devices and applications. This features two main components: the IoTDots analyser and
the IoTDots modifier. The former scans the source code of the applications and detects
forensic information. The latter automatically inserts tracking logs and reports the results.
In an IoT system, particularly in the case of emergent configurations, data might
also be dynamic, making it difficult to classify information during live forensics. In this
sense, live forensics refers to a forensic investigation that is done in near-real time. Hence,
ref. [6] proposed a conceptual framework based on supervised machine-learning techniques.
One of the advantages of using supervised ML techniques in live forensics is the ability
of such techniques to predict possible events based on past occurrences. In addition,
automated feature identification was used to prevent redundancy throughout feature
selection and elimination.
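The idea of removing redundant features before training a supervised model can be illustrated with a minimal sketch. It is not the framework proposed in [6]: the feature columns (packets per second, bytes per second, failed logins), the sample values, the correlation threshold, and the nearest-centroid classifier are all assumptions chosen purely for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(rows, threshold=0.95):
    """Keep a column only if it is not strongly correlated with a kept column."""
    kept = []
    for j in range(len(rows[0])):
        col_j = [r[j] for r in rows]
        if all(abs(pearson(col_j, [r[k] for r in rows])) < threshold for k in kept):
            kept.append(j)
    return kept


def train_centroids(rows, labels, kept):
    """Supervised step: one mean vector per label over the kept columns."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append([row[j] for j in kept])
    return {label: [mean(c) for c in zip(*vecs)] for label, vecs in groups.items()}


def predict(centroids, row, kept):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    v = [row[j] for j in kept]
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, centroids[lab])))


# Hypothetical historical events: [packets_per_s, bytes_per_s, failed_logins]
history = [
    [10, 1000, 0], [12, 1200, 3], [11, 1100, 1],
    [90, 9000, 2], [95, 9500, 9], [85, 8500, 8],
]
labels = ["benign"] * 3 + ["suspicious"] * 3

kept = select_features(history)            # bytes_per_s duplicates packets_per_s
model = train_centroids(history, labels, kept)
verdict = predict(model, [88, 8800, 5], kept)
print(kept, verdict)
```

In this toy data the second column is a scaled copy of the first, so it is dropped before training. A practical system would also scale the features and validate the model on held-out data.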
The importance of ML in DFIs should not be underestimated, since such intelligent
technologies have the potential to support and significantly enhance the conventional DFI
process. ML technologies can potentially assist in the automation of manual DFI processes
when significant volumes and a large variety of data must be analysed. Using more
intelligent techniques will increase the chances of identifying and successfully investigating
cybercrimes in modern smart environments. This will help DF specialists get to the root
cause much faster and more efficiently [6].
For all the reasons mentioned above, ML holds great potential for DFIs. However, it is
a foreign field to most DF investigators, and the scope for new research is vast. That being
said, there exists a small corpus of research where ML technology was used to investigate
digital crimes [4].
ML techniques, which are often used to predict behaviour, rely on pattern recognition
to help investigators analyse huge amounts of data. ML techniques seek to
learn from historical data so as to predict future behaviour. Therefore, by using ML
techniques, investigators may gain the capability to recognise patterns of criminal activity
and learn from the historical data when, where, and how the cybercrime probably took
place.
The remainder of this paper is structured as follows. Section 2 provides some back-
ground on digital forensics, the ISO/IEC 27043 international standard on the DFI process,
smart environments, and ML. Section 3 presents state-of-the-art ML techniques used in
digital forensics. Section 4 discusses the role of ML techniques in the DFI process and future
directions in the use of ML in this process. The paper is concluded in Section 5.
2. Background
This section deals with digital forensics, the internationally standardised DFI process,
smart environments, and machine learning—all the important concepts of which the reader
needs to take cognisance in this paper.
Figure 1. High-level overview of the 27043 international standard [9].
The conventional DF process (i.e., the process that had been followed before ISO/IEC
27043 was imposed) was only concerned with initialisation, acquisitive, and investigative
processes. However, the conventional DF process consisted of various disparate process
models that were not harmonised. Therefore, Valjarevic and Venter [8] considered all
relevant models and other standards so as to address the disparities and harmonise them
into a single standardised model, known as ISO/IEC 27043. In addition to the harmonisation
effort, Valjarevic and Venter [8] added the readiness and concurrent process classes.
However, since it has not been tailored for IoT and smart environments, using the
ISO/IEC 27043 DFI process within the smart environment is still challenging, due to the
wide variety of IoT devices that exist within this environment. The next section briefly
describes smart environments to allow the reader to understand the solutions proposed
by recent research.
2.3. Smart Environments
The smart environment comprises various types of smart devices, sensors, and computers
that are connected to the internet and embedded in numerous objects within this
environment. Smart environments have fast grown into a network of internet-enabled
devices, also known as IoT devices [8]. Currently, IoT devices are adopted in almost all
parts of our lives, for instance, home temperature management, smart lighting, smart
appliances, smart sensors, and smart cities [8].
Although a smart environment may improve our quality of life, it also provides a
new set of previously untapped data with tremendous forensic value, due to the huge
amount of data generated in this environment [4]. The rapid pace at which digital crimes
Appl. Sci. 2023, 13, 10169 4 of 12
are conceptualised and committed makes it essential to develop better and more intelligent
DFI techniques, especially in smart environments. ML techniques might offer a solution to
these challenges [4,10,11].
However, researchers argue that smart-environment DF is still in development, since
an internationally standardised implementation of smart-city infrastructure has
not yet been completed. Meanwhile, this provides an opportunity for law-enforcement
organizations and investigators to swiftly expand their DF solutions and capabilities [10,11].
The following section presents a brief background on ML techniques.
ment are the large volume of data and attack and violation detection. The proposed
solutions are summarised in Figures 2 and 3. The authors decided to split the summary into
two separate figures, since there were two main themes detected in all existing solutions:
the first theme involved MLF solutions for large amounts of data, while the second theme
involved MLF solutions for attack and violation detection.
Figure 2. MLF solutions for large amounts of data in smart environments.
Figure 2 summarises the applications of MLF that were reported in research papers
from 2018 to 2023 to serve as proposed solutions for dealing with the large amounts of
data generated in smart environments. The following list explains the elements of Figure
2 in more detail:
• The IoTDots framework was proposed as a solution to deal with the large amounts of
data collected by IoT devices and sensors.
• Automatic prioritisation of suspicious file artefacts was proposed as a solution to deal
with the growing volume of data and manual retrieval of suspicious files.
• Intelligent methods to automate problem-solving were proposed as a solution to deal with
the massive amounts of data that must be analysed for digital evidence.
• Automation using ML techniques for classification and AI techniques for prioritising suspi-
cious devices was proposed as a solution to deal with the growing number of cases
needing DF competence and the large volumes of data to be processed.
• Automatic text analysis to detect online sexual predatory talks was proposed as a solution
to deal with the growth of cybercrime targeting minors, the large volume of data, and
the DFI process, which is done primarily by hand.
• The “VERITAS” mechanism to automatically collect and extract forensic evidence from
smart environments was proposed as a solution to deal with the large amounts of data
that is generated in smart environments.
Figure 3. MLF solutions for attack and violation detection in smart environments.
Figure 3 summarises the applications of ML in DF as proposed in research published
between 2018 and 2023 for detecting data attacks and violations in smart environments.
The following list explains Figure 3 in more detail:
• An intelligent intrusion detection system to detect regular and malicious attacks on data
created in smart environments was proposed as a solution to deal with the simple and
complex attacks that face IoT networks in particular.
• A blockchain-assisted shared audit framework for identifying data-scavenging attacks in
virtualised resources was proposed as a solution to deal with attack and violation
detection in smart environments.
• An intelligent forensic analysis mechanism was proposed as a solution to deal with the
probability of continual attacks on IoT devices and the low processing power and
memory of these devices.
The following section discusses the impact of MLF on the DFI process.
4. The Impact of MLF on the DFI Process
As can be seen in Section 3, a review of research papers examines the contributions
of ML techniques to DF in smart environments. It also identifies digital forensic issues that
each of the reviewed papers addresses and proposes solutions that are based on machine
learning to improve the DFI process.
Table 1 summarises the role of ML techniques in the DFI process. The column headers
Table 1 presents the role of ML techniques in ISO/IEC 27043:2015 DFI processes and
highlights gaps where ML techniques may be used to improve the processes. An “X”
in a particular cell indicates that the specific ML technique was applied in the processes
indicated. These techniques contributed mainly to the initialisation and investigative
processes of the ISO/IEC 27043:2015 set of standards, and there was a lack of application of
ML techniques to the other process areas of this standard.
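The gap analysis described above can be sketched as a simple coverage check. The five process class names follow ISO/IEC 27043 as used in this paper, but the technique labels and their assignments below are placeholders and do not reproduce the actual entries of Table 1.

```python
# Process classes of ISO/IEC 27043 as named in this paper.
PROCESS_CLASSES = ["readiness", "initialisation", "acquisitive",
                   "investigative", "concurrent"]

# Placeholder entries (NOT the real Table 1): technique -> classes marked "X".
coverage = {
    "technique_A": {"initialisation"},
    "technique_B": {"investigative"},
    "technique_C": {"initialisation", "investigative"},
}


def render_matrix(coverage, classes):
    """Render an X-matrix with one row per technique and one column per class."""
    lines = ["technique    " + " ".join(f"{c:>14}" for c in classes)]
    for name, marked in coverage.items():
        cells = " ".join(f"{'X' if c in marked else '':>14}" for c in classes)
        lines.append(f"{name:<13}{cells}")
    return "\n".join(lines)


def uncovered(coverage, classes):
    """Return the process classes to which no technique has been applied."""
    covered = set().union(*coverage.values())
    return [c for c in classes if c not in covered]


gaps = uncovered(coverage, PROCESS_CLASSES)
print(render_matrix(coverage, PROCESS_CLASSES))
print("Gaps:", gaps)
```

With these placeholder entries, the uncovered classes are readiness, acquisitive, and concurrent, which mirrors the pattern the review reports.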
Applying ML and AI techniques in the areas of ISO/IEC 27043:2015 can automate
and improve the DFI process, since the uncovered areas are currently mostly processed
manually. For example, the Markov chain model (see Table 1 [5]) already automates
the analysis process through two main components, referred to as the ‘modifier’ and
‘analyser’ components. The ‘modifier’ component examines smart applications in search
Furthermore, DFR principles already assure the forensic soundness of the information
gathered, making it appropriate for litigation. Therefore, ML techniques can be applied
in the DFR process, making use of techniques such as noise-resistant algorithms, support
vector machines, and neural networks. By making use of such ML techniques, investigators
could deduce the rules of classification from existing and historical datasets and scenarios
to learn and train the readiness model. Clustering could then be applied to improve the
accuracy of classification and allow the model to make decisions by itself.
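As a rough illustration of the clustering step, the sketch below groups historical observations with a plain k-means loop. The two features (logins per hour, megabytes transferred), the data points, and the fixed starting centroids are hypothetical; the paper does not prescribe a particular clustering algorithm.

```python
from statistics import mean


def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points, recompute centroids, stop when stable."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [tuple(mean(c) for c in zip(*cl)) if cl else centroids[i]
                   for i, cl in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical historical observations: (logins_per_hour, mb_transferred)
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

The resulting groups could then serve as candidate classes, or as an extra feature, for a subsequent classifier in a readiness model.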
ML techniques are mainly used for prediction and classification. Therefore, the ac-
quisitive and concurrent processes in ISO/IEC 27043:2015 can be automated using ML
techniques. This will benefit the readiness and initialisation processes in this set of stan-
dards to predict and detect incidents using decision trees and neural networks.
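To make the decision-tree idea concrete, the following sketch fits a one-level decision tree (a decision stump) to labelled historical events and uses it to flag a possible incident. The single feature (failed logins per hour) and the sample data are assumptions for illustration only; a full decision tree would apply the same split search recursively over several features.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most common 0/1 label; ties resolve to 1 (incident)."""
    return int(2 * sum(labels) >= len(labels))


def fit_stump(values, labels):
    """Find the threshold that minimises the weighted Gini impurity.

    Returns (threshold, label_if_at_or_below, label_if_above).
    """
    best = None
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, t, majority(left), majority(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict(stump, value):
    threshold, below, above = stump
    return below if value <= threshold else above


# Hypothetical history: failed logins per hour, and whether an incident followed (1) or not (0).
failed_logins = [0, 1, 2, 1, 8, 9, 7, 10]
incident = [0, 0, 0, 0, 1, 1, 1, 1]

stump = fit_stump(failed_logins, incident)
print(stump, predict(stump, 6))
```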
On the other hand, although the incorporation of ML techniques can be powerful for DFI,
a lack of interpretability and inadequate training data may lead to
weak and poorly understood models [29].
5. Conclusions
By presenting an overview of MLF research papers from 2018 until 2023, this paper
shows how ML techniques have recently been used across different areas of the DFI process
in smart environments. Common challenges for DF in these environments were also
highlighted. Although intelligent technologies such as ML have the potential to aid in DFI,
these technologies mainly facilitate the automation of manual DFI processes. However,
this paper reports that numerous research papers found that ML techniques are applied in
DF in a bid to improve the efficiency of the DFI process by means of automation, which
decreases investigators’ manual effort and hard work. It also investigated numerous ways
to highlight what to expect in the future from MLF applications. Finally, it discussed the
role of ML techniques in the DFI process as advocated in ISO/IEC 27043:2015. This was
done to highlight gaps that need more attention and where ML techniques can also be
applied to improve the current DFI process.
In other words, the main contribution of this paper is to show the reader what has
been done in this area and where gaps are still evident. This spares researchers
from having to conduct an extensive survey themselves to learn what the current gaps are.
A researcher can then easily identify these gaps and decide which of
them to address in their own future research.
As mentioned in Section 4, the main limitations of ML are a lack
of interpretability and inadequate training data, which may lead to weak and poorly
understood models. Thus, it would be valuable to simulate results and then compare
the performance of different ML techniques in future work. In addition, according to the
state of the art in applying ML techniques in DFI within smart environments (which mainly
involves the initialisation and investigative processes, as shown in Table 1), ML techniques
should be applied more prominently in the readiness processes, acquisitive processes, and
concurrent processes of DFI.
Author Contributions: Conceptualization, L.T.; methodology, L.T. and H.V.; validation, H.V. and
L.T.; formal analysis, L.T.; investigation, L.T.; resources, L.T.; data curation, L.T.; writing—original
draft preparation, L.T.; writing—review and editing, H.V. and L.T.; visualization, L.T.; supervision,
H.V.; project administration, H.V.; funding acquisition, H.V. All authors have read and agreed to the
published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Popescul, D.; Radu, L.D. Data Security in Smart Cities: Challenges and Solutions. Inform. Econ. 2016, 20, 29–38. [CrossRef]
2. Quick, D.; Choo, K.-K.R. Big forensic data management in heterogeneous distributed systems: Quick analysis of multimedia
forensic data. Software Pract. Exp. 2016, 47, 1095–1109. [CrossRef]
3. Watson, S.; Dehghantanha, A. Digital forensics: The missing piece of the Internet of Things promise. Comput. Fraud. Secur. 2016,
2016, 5–8. [CrossRef]
4. Du, X.; Hargreaves, C.; Sheppard, J.; Anda, F.; Sayakkara, A.; Le-Khac, N.A.; Scanlon, M. SoK. In Proceedings of the 15th
International Conference on Availability, Reliability and Security, Virtual, 25–28 August 2020. [CrossRef]
5. Babun, L.; Sikder, A.; Acar, A.; Uluagac, A. IoTDots: A Digital Forensics Framework for Smart Environments. arXiv 2022,
arXiv:1809.00745. Available online: https://arxiv.org/abs/1809.00745 (accessed on 1 March 2023).
6. Kebande, V.R.; Ikuesan, R.A.; Karie, N.M.; Alawadi, S.; Choo, K.-K.R.; Al-Dhaqm, A. Quantifying the need for supervised
machine learning in conducting live forensic analysis of emergent configurations (ECO) in IoT environments. Forensic Sci. Int.
Rep. 2020, 2, 100122. [CrossRef]
7. Valjarevic, A.; Venter, H.S. A Comprehensive and Harmonized Digital Forensic Investigation Process Model. J. Forensic Sci. 2015,
60, 1467–1483. [CrossRef]
8. Conti, M.; Dehghantanha, A.; Franke, K.; Watson, S. Internet of Things security and forensics: Challenges and opportunities.
Futur. Gener. Comput. Syst. 2018, 78, 544–546. [CrossRef]
9. Valjarevic, A.; Venter, H.; Petrovic, R. ISO/IEC 27043:2015—Role and application. In Proceedings of the 2016 IEEE 24th
Telecommunications Forum (TELFOR), Belgrade, Serbia, 22–23 November 2016; pp. 1–4. [CrossRef]
10. Tok, Y.C.; Chattopadhyay, S. Identifying threats, cybercrime and digital forensic opportunities in Smart City Infrastructure via
threat modeling. Forensic Sci. Int. Digit. Investig. 2023, 45, 301540. [CrossRef]
11. Sahib, H.I.; AlSudani, M.Q.; Ali, M.H.; Abbas, H.Q.; Moorthy, K.; Adnan, M.M. Proposed intelligence systems based on digital
Forensics: Review paper. Mater. Today Proc. 2023, 80, 2647–2651. [CrossRef]
12. Qadir, A.M.; Varol, A. The role of machine learning in Digital Forensics. In Proceedings of the 2020 8th International Symposium
on Digital Forensics and Security (ISDFS), Beirut, Lebanon, 1–2 June 2020. [CrossRef]
13. Goni, I.; Gumpy, J.M.; Maigari, T.U.; Muhammad, M.; Saidu, A. Cybersecurity and Cyber Forensics: Machine Learning Approach.
Mach. Learn. Res. 2020, 5, 46. [CrossRef]
14. Iqbal, S.; Alharbi, S.A. Advancing Automation in Digital Forensic Investigations Using Machine Learning Forensics. Digit.
Forensic Sci. 2020. [CrossRef]
15. Jarrett, A.; Choo, K.R. The impact of automation and artificial intelligence on digital forensics. WIREs Forensic Sci. 2021, 3, e1418.
[CrossRef]
16. Du, X.; Scanlon, M. Methodology for the automated metadata-based classification of incriminating digital forensic artefacts. In
Proceedings of the 14th International Conference on Availability, Reliability and Security, Canterbury, UK, 26–29 August 2019; pp.
1–8. Available online: https://bit.ly/2Oqh6u6 (accessed on 9 March 2023).
17. Krivchenkov, A.; Misnevs, B.; Pavlyuk, D. Intelligent Methods in Digital Forensics: State of the Art. In Lecture Notes in Networks
and Systems; Springer: Berlin/Heidelberg, Germany, 2019; pp. 274–284. [CrossRef]
18. Babun, L.; Sikder, A.; Acar, A.; Uluagac, S. The Truth Shall Set Thee Free: Enabling Practical Forensic Capabilities in Smart
Environments. In Proceedings of the 2022 Network and Distributed System Security Symposium, San Diego, CA, USA, 24–28
April 2022. [CrossRef]
19. Shakeel, P.M.; Baskar, S.; Fouad, H.; Manogaran, G.; Saravanan, V.; Montenegro-Marin, C.E. Internet of things forensic data
analysis using machine learning to identify roots of data scavenging. Futur. Gener. Comput. Syst. 2021, 115, 756–768. [CrossRef]
20. Adam, I.Y.; Varol, C. Intelligence in digital forensics process. In Proceedings of the 2020 8th International Symposium on Digital
Forensics and Security (ISDFS), Beirut, Lebanon, 1–2 June 2020. [CrossRef]
21. Ngejane, C.; Eloff, J.; Sefara, T.; Marivate, V. Digital forensics supported by machine learning for the detection of online sexual
predatory chats. Forensic Sci. Int. Digit. Investig. 2021, 36, 301109. [CrossRef]
22. Kalnoor, G.; Gowrishankar, S. IoT-based smart environment using intelligent intrusion detection system. Soft Comput. 2021,
25, 11573–11588. [CrossRef]
23. Mazhar, M.S.; Saleem, Y.; Almogren, A.; Arshad, J.; Jaffery, M.H.; Rehman, A.U.; Shafiq, M.; Hamam, H. Forensic Analysis on
Internet of Things (IoT) Device Using Machine-to-Machine (M2M) Framework. Electronics 2022, 11, 1126. [CrossRef]
24. Koroniotis, N.; Moustafa, N.; Slay, J. A new Intelligent Satellite Deep Learning Network Forensic framework for smart satellite
networks. Comput. Electr. Eng. 2022, 99, 107745. [CrossRef]
25. Palmese, F.; Redondi, A.E.; Cesana, M. Feature-Sniffer: Enabling IoT Forensics in OpenWrt based Wi-Fi Access Points. arXiv 2023,
arXiv:2302.06991. Available online: https://arxiv.org/abs/2302.06991 (accessed on 5 June 2023).
26. Salih, K.M.M.; Dabagh, N.B.I. Digital Forensic Tools: A Literature Review. J. Educ. Sci. 2023, 32, 109–124. [CrossRef]
27. Shahbazi, Z.; Byun, Y.-C. NLP-Based Digital Forensic Analysis for Online Social Network Based on System Security. Int. J. Environ.
Res. Public Health 2022, 19, 7027. [CrossRef] [PubMed]
28. Ferreira, S.; Antunes, M.; Correia, M.E. A Dataset of Photos and Videos for Digital Forensics Analysis Using Machine Learning
Processing. Data 2021, 6, 87. [CrossRef]
29. Balushi, Y.A.; Shaker, H.; Kumar, B. The use of machine learning in digital forensics: Review paper. In Proceedings of
the 1st International Conference on Innovation in Information Technology and Business (ICIITB 2022); Atlantis Press: Amsterdam, The
Netherlands, 2023; pp. 96–113. [CrossRef]
30. Baig, Z.; Khan, M.A.; Mohammad, N.; Ben Brahim, G. Drone Forensics and Machine Learning: Sustaining the Investigation
Process. Sustainability 2022, 14, 4861. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.