Using a text-mining and bibliometric approach, this study reviews the literature on the evolution
of malware classification using machine learning. It examines work published from 2008 to 2022
to understand how machine learning has influenced malware classification. Throughout this study,
we seek to answer three main
research questions: RQ1: Is the application of machine learning for malware classification
growing? RQ2: What is the most common machine-learning application for malware
classification? RQ3: What are the outcomes of the most common machine learning
applications? An analysis of 2186 articles collected from peer-reviewed databases traces the
trajectory of machine learning's application to malware classification and reveals trends in both
the machine learning and malware classification fields of study. To answer the research questions,
the study performs quantitative and qualitative analyses using statistical and N-gram analysis
techniques together with a formal literature review.
The results show that methods such as support vector machines and random forests are standard
machine learning approaches to malware classification, used both to detect malicious software and
to categorize malware by family. Machine learning is a heavily researched technology with many
applications in malware classification and beyond.
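As an illustration of the kind of N-gram analysis mentioned above, the following minimal Python
sketch counts the most frequent word bigrams in a collection of article titles. It is not the
study's pipeline: the stopword list, the tokenization rule, the function names (`tokenize`,
`ngrams`, `top_ngrams`), and the example titles are all hypothetical assumptions.

```python
import re
from collections import Counter

# Hypothetical stopword list chosen for illustration; not taken from the study.
STOPWORDS = {"a", "an", "and", "based", "for", "in", "of", "on", "the", "to", "using", "with"}


def tokenize(text):
    """Lowercase text and split it into word tokens, keeping hyphenated words whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(docs, n=2, k=10):
    """Count n-grams across docs, skipping any n-gram that contains a stopword,
    and return the k most frequent as (ngram, count) pairs."""
    counts = Counter()
    for doc in docs:
        grams = ngrams(tokenize(doc), n)
        counts.update(g for g in grams if not STOPWORDS.intersection(g))
    return counts.most_common(k)


# Hypothetical titles, invented for this example.
example_titles = [
    "Malware classification using support vector machines",
    "Random forests for malware detection",
    "Deep learning for malware classification",
]

print(top_ngrams(example_titles, n=2, k=3))
# → [(('malware', 'classification'), 2), (('support', 'vector'), 1), (('vector', 'machines'), 1)]
```

Filtering out n-grams that contain a stopword, rather than deleting stopwords before forming the
windows, avoids producing bigrams from words that were never adjacent in the original title.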