IEEE Review Paper Template

This document discusses a seven-phase pipeline for vulnerability detection and exploitability prediction in device code, utilizing various datasets and machine learning models to enhance cybersecurity. It emphasizes the importance of understanding exploitability to prioritize vulnerabilities effectively, particularly in high-stakes sectors like healthcare. The work critiques existing vulnerability management frameworks and advocates for more adaptive scoring systems to improve the accuracy and reliability of vulnerability assessments.

Uploaded by

Shilpa K

Compliant Classification and Solution Generation

Fadil Fazal Rahiman1, Jishnu Das Anjam Kudy2, Neha Ann John3,
Sreelekshmi Satheesh4, Dr. Ansamma John5, Dr. Manu J Pillai6
Department of Computer Science and Engineering
TKM College of Engineering
December 2024

Abstract—Vulnerabilities are weaknesses in software, hardware, or systems that attackers can exploit to compromise security and functionality. These weaknesses, often due to coding errors, configuration issues, or design flaws, create potential access points for malicious actors. Exploitability, which measures the ease or likelihood of successful exploitation, is critical for prioritizing vulnerabilities; some are easily exploitable, while others require complex conditions or specific resources. Understanding and predicting exploitability helps cybersecurity teams focus on high-risk threats, enhancing security posture. This work presents a seven-phase pipeline for vulnerability detection and exploitability prediction in device code, using diverse datasets and machine learning models. The approach leverages datasets such as REVEAL, Big-Vul, DEVIGN, and the National Vulnerability Database (NVD) to support detection accuracy. The pipeline begins with code input and parsing to extract syntax structures, followed by the generation of control flow graphs to model execution paths for analysis. Execution path extraction then identifies routes for vulnerability detection, and code representation uses pretrained models like CodeBERT to convert paths into feature vectors. A Convolutional Neural Network (CNN) model analyzes these vectors to detect vulnerabilities, while exploit prediction employs the Exploit Prediction Scoring System (EPSS) and logistic regression to assign risk scores, prioritizing vulnerabilities. Finally, testing and validation ensure the pipeline’s reliability using tools like pytest and Coverage.py. This structured approach enhances vulnerability assessment and exploitability prediction, enabling better risk prioritization and remediation for cybersecurity in healthcare systems.

Index Terms—EPSS, EPVD, Vulnerability, CVSS

I. INTRODUCTION

In the domain of cybersecurity, vulnerabilities are critical weaknesses in software, hardware, or systems that can be exploited by malicious actors to compromise security, integrity, or functionality. As digital infrastructures grow increasingly complex and interconnected, particularly in high-stakes sectors like healthcare, the task of identifying and mitigating these vulnerabilities has become both urgent and challenging. The potential consequences of unaddressed vulnerabilities—ranging from data breaches to operational failures—underscore the pressing need for robust detection and management mechanisms.

Existing vulnerability detection and management frameworks largely rely on cataloging known vulnerabilities or assessing risks using standardized metrics such as the Common Vulnerability Scoring System (CVSS). These tools provide organizations with an essential foundation for prioritizing vulnerabilities based on factors like severity, exploitability, and impact. However, their efficacy is often constrained by limitations such as the lack of real-world context, inadequate handling of complex code structures, and a focus on known vulnerabilities, leaving many organizations ill-prepared for emerging threats.

II. METHODOLOGY

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Curabitur pretium tincidunt lacus. Nulla gravida orci a odio. Nullam varius, turpis et commodo

III. LITERATURE REVIEW

The literature on vulnerability management and exploit prediction spans a wide spectrum of research themes, reflecting continuous advancements in cybersecurity practices. This section synthesizes insights from different references, categorized into key thematic areas.

A. Introduction to Vulnerability Management

Vulnerability management is a critical component of cybersecurity, focusing on the identification, evaluation, and mitigation of software risks. The process begins with discovering vulnerabilities—weaknesses in software that attackers can exploit. Early studies, such as Toloudis et al. (2016) [1], highlighted the importance of predicting the severity of these vulnerabilities by analyzing textual descriptions, which laid the groundwork for severity prediction systems. They emphasized the role of descriptive metrics, which helped organizations assess the potential impact of a vulnerability.

However, as cybersecurity threats evolve, traditional methods like the Common Vulnerability Scoring System (CVSS) have been criticized for their static nature. Jacobs et al. (2023) [2] criticized CVSS for failing to adapt to rapidly changing threat landscapes. Their study, along with contributions from others [3, 4], suggested transitioning to more adaptive scoring systems that can dynamically respond to evolving threats. These improvements enable more accurate and timely responses to new vulnerabilities.
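To make the contrast between a static score and an adaptive one concrete, the following is a minimal sketch, not the method of any cited work. It assumes a logistic model over three hypothetical features; the feature names, weights, and bias are invented for illustration and would normally be learned from data. The static CVSS base score stays fixed, while the estimated exploitation probability changes as new evidence arrives.

```python
import math

# Hypothetical weights, chosen by hand for illustration only.
# A real model would estimate them from labeled exploitation data.
WEIGHTS = {
    "cvss_base": 0.35,           # static severity on the 0-10 scale
    "exploit_public": 2.0,       # 1 if exploit code has been published, else 0
    "mentions_last_week": 0.15,  # count of recent threat-intelligence mentions
}
BIAS = -5.0


def exploit_probability(features):
    """Logistic model: sigmoid of a weighted sum of the feature values."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# The same vulnerability at two points in time. The CVSS base score
# does not change; only the time-varying signals do.
at_disclosure = {"cvss_base": 7.5, "exploit_public": 0, "mentions_last_week": 0}
after_exploit = {"cvss_base": 7.5, "exploit_public": 1, "mentions_last_week": 6}

p_before = exploit_probability(at_disclosure)
p_after = exploit_probability(after_exploit)

print(f"at disclosure:         {p_before:.3f}")
print(f"after exploit release: {p_after:.3f}")
```

Under these assumed weights, the estimate rises once exploit code is public even though the severity rating is unchanged, which is the behavior a static score cannot express.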
With the rise of big data, machine learning techniques have shown promise in enhancing vulnerability management. Data mining methods, such as classification, clustering, and regression, have enabled the prioritization of vulnerabilities within massive datasets. Bhatt et al. (2020) [5] demonstrated how machine learning techniques such as support vector machines (SVMs), decision trees, and ensemble models can enhance risk prediction, making it more scalable and efficient.

In addition, natural language processing (NLP) has become a powerful tool in analyzing vulnerability descriptions. Verma et al. (2019) [6] applied NLP techniques to extract useful information from vulnerability reports, improving the prediction of exploitability and severity. By analyzing unstructured data, NLP allows automated systems to understand and classify vulnerabilities with minimal human intervention.

Vulnerability management relies heavily on standardized frameworks like the Common Vulnerability Scoring System (CVSS) to assess the severity and prioritize the remediation of software vulnerabilities. However, Howland (2022) [7] critiques the effectiveness of CVSS, highlighting several limitations that undermine its reliability. These include a lack of transparency in its scoring formula, inconsistencies in its specification document, and poor correlation between CVSS scores and real-world exploitability. These issues raise concerns about the potential misprioritization of remediation efforts, leaving critical vulnerabilities unaddressed. Howland calls for a re-evaluation of current scoring methodologies and advocates for empirically validated systems to improve the accuracy and practicality of vulnerability management frameworks. This critique underscores the need for a deeper understanding of vulnerability management tools and the development of more robust and adaptive systems.

B. Exploit Prediction Scoring and Models

Exploit prediction scoring systems (EPSS) are designed to predict the likelihood that a given vulnerability will be exploited in the real world. This is particularly important as vulnerabilities with high exploitability pose a more immediate threat than those that are less likely to be targeted. Jacobs et al. (2021) [4] introduced hybrid models that combined statistical analysis and machine learning techniques to enhance the accuracy of exploit prediction. These models focus on real-world data to better capture the conditions under which an exploit is likely to occur.

The development of specialized models like FastEmbed (Fang et al., 2020) [8] and DarkEmbed (Tavabi et al., 2018) [9] has helped improve the predictive power of these systems by using deep learning techniques. FastEmbed and DarkEmbed use neural networks to analyze and identify patterns in exploit data, leading to better risk prioritization.

Furthermore, research into zero-day vulnerabilities—those that are not yet publicly known—has become a major focus. Movahedi et al. (2021) [10] proposed the use of unsupervised learning models for the detection of zero-day exploits, which are particularly challenging due to the lack of prior knowledge. These systems rely on detecting unusual patterns or behaviors in systems to predict potential exploits.

Recent advancements have emphasized integrating domain-specific features into exploit prediction systems. While models like FastEmbed and DarkEmbed use neural networks for improved accuracy, domain-specific adaptations, such as the CVSSCAV model for automotive cybersecurity, address unique challenges by incorporating metrics like safety, finance, and privacy impacts. Wang et al. (2023) [11] demonstrated how Bayesian Networks (BNs) can handle limited datasets through probabilistic inference, enabling dynamic and realistic severity predictions. Hybrid methods, combining decision trees, random forests, and neural networks, further enhance these systems. Looking ahead, emerging technologies like graph neural networks (GNNs) and transformers promise to revolutionize exploit prediction by capturing complex relationships with greater precision and scalability.

Traditional scoring systems like CVSS struggle to predict exploitability accurately, prompting the development of advanced models like ExBERT. Leveraging transfer learning with fine-tuned BERT on cybersecurity-specific corpora, ExBERT overcomes dataset limitations by extracting rich semantic features through a Pooling Layer and capturing dependencies with an LSTM-based Classification Layer (Yin et al., 2020) [12]. Achieving 91.12% accuracy and 91.82% precision on 46,176 vulnerabilities, ExBERT significantly outperforms prior models, highlighting the power of combining natural language processing and machine learning for prioritizing vulnerabilities. Future advancements may incorporate additional factors, such as user privilege and confidentiality impacts, to enhance predictive accuracy further (Yin et al., 2020) [12].

Predicting exploitability is a critical component of vulnerability management. Costa et al. (2022) [13] introduced a predictive framework that utilizes natural language processing (NLP) techniques to derive Common Vulnerability Scoring System (CVSS) metrics, such as Exploitability and Impact scores, directly from unstructured vulnerability descriptions. Their model standardizes and interprets textual data from databases like the National Vulnerability Database (NVD), achieving high accuracy in aligning predictions with expert-assigned CVSS values. This approach highlights the potential of combining semantic analysis and domain knowledge for automated scoring. However, challenges remain, including inconsistencies in description quality and biases in labeled data, which affect reliability. Costa et al. called for more robust datasets and frameworks to address these limitations and enhance predictive accuracy.

C. Datasets and Contextual Analysis

A key challenge in vulnerability management is the availability of high-quality datasets that can be used for training machine learning models and validating predictive tools. The National Vulnerability Database (NVD), REVEAL, and Big-Vul datasets have been instrumental in the development of predictive models. Jiménez et al. (2019) [14] emphasized the need for datasets that are not only large but also accurately
labeled and rich in context. Contextual metadata, such as the environment in which a vulnerability is found, plays a significant role in determining its exploitability and impact.

Guo et al. (2022) [15] proposed the use of machine learning to standardize vulnerability descriptions and address inconsistencies in existing datasets. Their work helped improve data quality, making it more useful for predictive modeling.

Gonzalez et al. (2019) [16] also utilized the REVEAL dataset to benchmark their exploit prediction models. This dataset allowed for a more thorough evaluation of how well machine learning models could predict real-world exploitations based on historical data.

Zhang et al. (2015) [17] demonstrated how structured data, such as that found in the NVD, could be used to improve predictive models. Structured data allows models to better recognize patterns and prioritize vulnerabilities that are most likely to lead to exploits.

The integration of multiple datasets has further advanced the field by providing a richer context for vulnerability analysis. For example, Nikitopoulos et al. (2021) [18] introduced the CrossVul dataset, which combines NVD data with GitHub commit histories to create a comprehensive resource for vulnerability research. This dataset links vulnerabilities to their corresponding code changes, enabling researchers to study patterns in software development that lead to vulnerabilities.

Another significant contribution is the Big-Vul dataset, developed by Fan et al. (2020) [19], which focuses on C/C++ vulnerabilities mined from open-source projects on GitHub. This dataset stands out for its inclusion of real-world vulnerabilities rather than synthetic ones, making it invaluable for training and evaluating machine learning models. The authors emphasized the importance of leveraging real-world data to improve the generalizability and accuracy of exploit prediction models.

While structured data is crucial, unstructured data, such as vulnerability descriptions and advisories, also plays a vital role in contextual analysis. Techniques such as natural language processing (NLP) have been used to extract insights from unstructured text. For instance, Williams et al. (2018) [20] employed topic modeling to analyze trends in vulnerability types and their evolution over time. This approach revealed valuable insights into emerging threats and the shifting focus of attackers.

Moreover, contextual metadata—such as Common Platform Enumeration (CPE) identifiers and Common Configuration Enumeration (CCE) entries—has proven to be an essential aspect of datasets. Fedorchenko et al. (2017) [21] highlighted the use of ontologies to link contextual metadata with vulnerability records. This approach not only enhances the utility of existing datasets but also provides a framework for creating hybrid datasets that integrate multiple sources of information.

Despite the progress made, several challenges remain. Inconsistent reporting standards across datasets, incomplete vulnerability descriptions, and a lack of real-time updates hinder the full potential of these resources. Jiang et al. (2021) [22] pointed out the need for dynamic datasets that evolve in tandem with the threat landscape. Such datasets would enable models to adapt to new vulnerabilities and provide more accurate predictions.

The work of Costa et al. [13] relied heavily on datasets such as the National Vulnerability Database (NVD), which provides structured vulnerability data along with textual descriptions. They emphasized the importance of contextual metadata, such as attack vectors and impact metrics, in enabling accurate predictions. Their research demonstrated that structured elements like CVSS scores, combined with unstructured descriptions, provide the necessary context for advanced machine learning models. Costa et al. also addressed issues related to dataset quality, noting that the variability in descriptions across vulnerabilities often introduces noise in predictive tasks. By using techniques such as text preprocessing and standardization, they improved the consistency of data inputs, enabling their model to achieve higher predictive accuracy. Furthermore, their findings suggest that enhancing datasets with additional contextual metadata, such as remediation details or exploit patterns, could further improve the performance of prediction models [13].

Recent advancements in power system security highlight the growing role of data-driven approaches in state estimation and anomaly detection. Reda et al. (2023) [23] proposed a two-stage framework combining predictive state estimation and adaptive detection of False Data Injection (FDI) attacks, leveraging datasets from IEEE benchmark systems and real-time power load data. By integrating contextual metadata, such as network topology configurations and temporal correlations among state variables, their approach improved the accuracy of predictions and detection rates. The framework utilized preprocessing techniques like normalization and noise reduction to ensure data consistency, enabling deep neural networks to effectively capture patterns in both attack-free and manipulated scenarios. Experimental results demonstrated the superiority of this method over traditional Weighted Least Squares (WLS)-based approaches, achieving higher detection accuracy while minimizing false positives. This work emphasizes the importance of enriched datasets and adaptive learning mechanisms in advancing the reliability of modern power systems against sophisticated cyberattacks.

Web scraping techniques play a crucial role in gathering datasets for research and analysis, especially from diverse domains such as e-commerce, education, and real estate. Bale et al. (2022) [24] evaluated seven web scraping approaches, including Python requests, Selenium, and undetected Chromedriver, across 120 websites spanning eight categories. The study analyzed key parameters such as the number of requests before bot detection, time spent on websites, and success rates for data extraction. Results revealed that undetected Chromedriver outperformed other methods, successfully extracting data while bypassing anti-bot mechanisms. The analysis also uncovered significant domain-specific trends: websites in educational and e-commerce categories demonstrated stronger anti-scraping defenses compared to others. This research highlights the importance of tailoring web
scraping methods to the characteristics of target datasets and underscores the need for improved website protections against automated data collection.

D. Sector-Specific Vulnerability Analysis

Different industries face unique cybersecurity challenges, necessitating sector-specific approaches to vulnerability management. For instance, the healthcare sector is often burdened by legacy systems and the need to protect sensitive patient data. Islam et al. (2022) [25] analyzed vulnerabilities in healthcare systems and found that outdated software often exacerbates security risks. This highlights the need for customized risk management solutions that can address the unique demands of the healthcare industry.

In contrast, Beyrouti et al. (2023) [26] focused on the Internet of Things (IoT), proposing a tailored framework for assessing IoT vulnerabilities. IoT devices, due to their interconnectivity and often weak security, are prime targets for exploitation. The framework they developed helps organizations assess the risks associated with IoT devices and implement targeted mitigation strategies.

Ariffin et al. (2020) [27] studied vulnerabilities in cloud computing environments, specifically focusing on API vulnerabilities. Cloud environments are unique due to their distributed nature, and API vulnerabilities pose a significant threat because they serve as the gateway between systems. They recommended automated detection strategies to identify and mitigate these vulnerabilities promptly.

E. Machine Learning and Automated Systems

Gonzalez et al. (2019) [28] applied semantic analysis to vulnerability reports, showing how NLP could help rank vulnerabilities by severity, providing organizations with a prioritized list of vulnerabilities to address. These techniques are essential in managing the ever-growing volume of vulnerabilities reported daily.

The use of deep learning techniques, such as CNNs and RNNs, has further enhanced the capabilities of vulnerability prediction systems. CNNs are particularly effective at analyzing large datasets for patterns, while RNNs are better suited for understanding sequential data, such as attack vectors or exploit chains. These methods have made predictive models more accurate and capable of identifying vulnerabilities that might otherwise go unnoticed [29].

Automated systems like Autosploit (Moscovich et al., 2020) [30] have demonstrated the power of machine learning in real-time exploit prediction. These tools use advanced algorithms to predict exploitability based on the latest threat intelligence and automatically trigger mitigations or alerts when a potential exploit is detected.

The use of machine learning in vulnerability detection has shown significant promise with systems like VulDeePecker. Introduced by Li et al. (2018) [31], VulDeePecker leverages deep learning to identify software vulnerabilities by representing programs as “code gadgets,” which are semantically related code fragments. These code gadgets are converted into vectorized representations, allowing the system to detect complex patterns associated with vulnerabilities without relying on manually engineered features. Experimental results demonstrate that VulDeePecker reduces false negatives while maintaining reasonable false positive rates. Notably, it detected four previously unreported vulnerabilities in real-world software products, including Xen and Seamonkey, emphasizing its practical utility. This work underscores the potential of deep learning to revolutionize automated vulnerability detection and reduce reliance on traditional manual methods.

Lightweight assisted tools for vulnerability detection have gained prominence with the increasing complexity of software systems. Li et al. (2019) [32] introduced LAVDNN, a deep neural network-based framework for vulnerability discovery. LAVDNN uses function names as semantic features to classify weak and benign functions in source code, significantly improving efficiency by narrowing the scope of manual auditing. The system employs a Bidirectional Long Short-Term Memory (BLSTM) network for sequence analysis, achieving an F2-score of 0.91 for C/C++ and 0.915 for Python programs. Experimental evaluation on projects like FFmpeg and LibTIFF demonstrated that LAVDNN effectively detects vulnerabilities while maintaining low false positive rates. By reducing reliance on human-defined patterns and preprocessing steps, LAVDNN showcases the potential of machine learning to enhance vulnerability discovery and streamline cybersecurity workflows.

Recent advancements in vulnerability detection emphasize the use of enhanced graph representation learning. Xiao et al. (2024) [33] proposed the EnGS2F framework, which leverages program dependency graphs (PDGs) combined with context relationship graphs (CRGs) to represent code contextual structures more comprehensively. The framework integrates abstract syntax tree (AST) and paragraph embeddings for multidimensional feature representation, capturing both syntactic and semantic features. Additionally, the Gated Graph Neural Attention Network (GGNAT) is employed to model long-distance dependencies and refine node representations. Experimental evaluations using datasets such as SARD and NVD showed EnGS2F achieving a 6% higher F1 score than existing methods, with notable improvements in accuracy and stability across diverse vulnerability types. These results highlight the potential of graph-based learning techniques to outperform traditional token-based methods and provide a more nuanced approach to detecting complex software vulnerabilities.

Recent advancements in vulnerability detection highlight the importance of explainability in machine learning systems. Chu et al. (2024) [34] introduced CFExplainer, a counterfactual reasoning-based framework for enhancing the explainability of Graph Neural Networks (GNNs) used in vulnerability detection. Unlike traditional factual reasoning-based methods that identify sub-graphs contributing to a model’s prediction, CFExplainer generates counterfactual explanations by identifying minimal perturbations to the input code graph that alter the detection outcome. This approach not only pinpoints the root causes of detected vulnerabilities but also provides actionable
insights for developers to address them. Extensive evaluations while maintaining accuracy and reducing false positives. This
demonstrated CFExplainer’s superior performance over state- work underscores the potential of path-prioritized analysis
of-the-art explainers, achieving higher accuracy and clarity in in improving the scalability and reliability of smart contract
identifying critical features contributing to vulnerabilities. This vulnerability detection.
work underscores the potential of counterfactual reasoning in
improving the transparency and utility of GNN-based systems G. Collaborative Insights and Chronological Advances
for software security. The role of collaboration in vulnerability management cannot
F. Advanced Methods in Vulnerability Detection be overstated. Crowdsourced data, especially from Bug Bounty
programs, has proven invaluable in identifying and resolv-
Cao et al. (2021) introduced a Bidirectional Graph Neural
ing vulnerabilities more quickly. Komarkova et al. (2018)
Network (BGNN) model to enhance vulnerability detection
[37] highlighted the effectiveness of crowdsourced data in
by combining syntax and semantic information from ASTs,
categorizing vulnerabilities and accelerating the discovery of
CFGs, and DFGs. The bidirectional edge propagation enables
new threats. Bug Bounty programs, where ethical hackers are
improved context capture across graph nodes, while a CNN
rewarded for finding vulnerabilities, have become a critical
module refines extracted features for better classification.
component of modern cybersecurity strategies.
Experimental evaluations on datasets like Linux Kernel and
Chronologically, the shift from descriptive severity metrics
FFmpeg showed up to 11.0% accuracy and 8.4% precision
to predictive modeling represents a significant evolution in
improvements over baseline methods, highlighting BGNN’s
vulnerability management. Early efforts, such as those by
ability to detect vulnerabilities efficiently and accurately [35].
Toloudis et al. (2016) [1], focused on severity ratings, but
The application of deep learning techniques has significantly
as the threat landscape became more dynamic, predictive
advanced vulnerability detection systems. Zhang et al. (2023)
modeling methods began to take center stage. The integration
proposed a novel approach that addresses key limitations in
of real-time data and machine learning has further advanced
traditional methods, such as handling irrelevant information
these systems, allowing for more accurate and timely threat
and managing long code snippets. Their method leverages
predictions.
syntax-based Control Flow Graphs (CFG) to decompose code
Collaborative frameworks, integrating insights from diverse
snippets into execution paths, allowing models to focus on
stakeholders—including security researchers, private compa-
vulnerability-relevant features while mitigating issues caused
nies, and government agencies—have also improved the over-
by code length limitations [29].
all accuracy of vulnerability management tools. Suciu et al.
Specifically, the model constructs a syntax-based CFG using
(2021) [38] observed that adaptive risk models that incorporate
Abstract Syntax Trees (AST) and employs a greedy-based
real-time data streams are essential in tackling the challenges
algorithm to extract multiple representative paths. These paths
posed by an ever-changing cybersecurity landscape.
are processed with pre-trained code models like CodeBERT
and fused using convolutional neural networks (CNN) to
H. Challenges and Future Directions
capture both intra- and inter-path features. The experimental
evaluation, conducted on over 231K C/C++ code snippets, Despite the advancements in vulnerability management, sig-
demonstrates substantial improvements over existing baselines, achieving up to 22.30% higher Precision, 42.92% Recall, and a 32.58% boost in F1-Score [29]. This demonstrates the efficacy of path decomposition in simplifying complex code structures, enhancing the accuracy of automated vulnerability detection systems.

Fu et al. (2019) [36] proposed a critical-path-coverage-based framework to address inefficiencies caused by path explosion in symbolic execution. Unlike traditional methods that analyze all program paths indiscriminately, this approach prioritizes critical paths containing sensitive instructions, such as Ether transfer operations, by integrating static analysis with a multi-objective oriented path search (MOPS) strategy. This targeted approach enhances detection efficiency by avoiding blind traversal and focusing on paths with potential security risks. By combining symbolic execution with taint analysis, the framework tracks the propagation of taint values to hazard points, enabling precise identification of vulnerabilities such as reentrancy and integer overflow. Experimental evaluations on over 1000 smart contracts demonstrated a 35% reduction in detection time compared to mainstream tools like Mythril,

nificant challenges remain. One of the biggest hurdles is adapting to emerging threats, such as zero-day vulnerabilities. Movahedi et al. (2021) [10] emphasized the importance of developing adaptive risk models to handle these types of vulnerabilities. Zero-day exploits, which are unknown until they are discovered in the wild, pose a unique challenge for traditional vulnerability management systems.

Chawla et al. (2019) [39] proposed augmenting static scoring systems like CVSS with dynamic, real-time metrics to improve vulnerability prioritization. This would allow organizations to better respond to threats as they emerge.

Looking forward, the integration of interdisciplinary approaches is expected to be crucial. Huang and Wu (2020) [40] suggested that combining cybersecurity with behavioral analysis could help predict attacker behavior, improving proactive defenses.

The future of vulnerability management lies in continuous collaboration, the integration of AI and machine learning, and the ongoing enrichment of datasets. Together, these advancements will help organizations stay one step ahead of cybercriminals in an increasingly complex threat landscape.
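To make the taint-tracking idea described for Fu et al. [36] more concrete, the following is a minimal sketch of forward taint propagation along a single execution path. It is not the authors' implementation: the `Instr` record, the operation names (`INPUT`, `CONST`, `ADD`, `CALL_VALUE`), and the choice of which operations count as taint sources or hazard points are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """One step on a single execution path (hypothetical three-address form)."""
    op: str                   # e.g. "INPUT", "CONST", "ADD", "CALL_VALUE"
    dest: Optional[str]       # variable written by this step, if any
    srcs: Tuple[str, ...]     # variables read by this step


# Assumed labels, not taken from the paper: ops that introduce taint
# and ops that are treated as hazard points (e.g. a value transfer).
TAINT_SOURCES = {"INPUT"}
HAZARD_OPS = {"CALL_VALUE"}


def find_tainted_hazards(path: List[Instr]) -> List[int]:
    """Return indices of hazard instructions that read a tainted variable."""
    tainted = set()
    hits = []
    for i, ins in enumerate(path):
        reads_taint = any(s in tainted for s in ins.srcs)
        if ins.op in HAZARD_OPS and reads_taint:
            hits.append(i)
        if ins.dest is not None:
            if ins.op in TAINT_SOURCES or reads_taint:
                tainted.add(ins.dest)      # taint flows into the destination
            else:
                tainted.discard(ins.dest)  # overwritten with an untainted value
    return hits


# Hypothetical path: user input reaches the first transfer but not the second.
path = [
    Instr("INPUT", "amount", ()),
    Instr("CONST", "fee", ()),
    Instr("ADD", "total", ("amount", "fee")),
    Instr("CALL_VALUE", None, ("total",)),
    Instr("CALL_VALUE", None, ("fee",)),
]
print(find_tainted_hazards(path))  # [3]
```

Only the first transfer is reported, because `total` is derived from the tainted `amount` while `fee` is a constant. A real analyzer would run this kind of propagation over symbolic states on each selected critical path rather than over a flat instruction list.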
IV. CONCLUSION

REFERENCES
[1] D. Toloudis, G. Spanos, and L. Angelis, “Associating the severity of vulnerabilities with their description,” in Advanced Information Systems Engineering Workshops: CAiSE 2016 International Workshops, Ljubljana, Slovenia, June 13-17, 2016, Proceedings 28. Springer, 2016, pp. 231–242.
[2] J. Jacobs, S. Romanosky, O. Suciu, B. Edwards, and A. Sarabi, “Enhancing vulnerability prioritization: Data-driven exploit predictions with community-driven insights,” in 2023 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). IEEE, 2023, pp. 194–206.
[3] J. Jacobs, S. Romanosky, B. Edwards, I. Adjerid, and M. Roytman, “Exploit prediction scoring system (EPSS),” Digital Threats: Research and Practice, vol. 2, no. 3, pp. 1–17, 2021.
[4] J. Jacobs, S. Romanosky, I. Adjerid, and W. Baker, “Improving vulnerability remediation through better exploit prediction,” Journal of Cybersecurity, vol. 6, no. 1, p. tyaa015, 2020.
[5] A. A. Bhatt, N. and V. Yadavalli, “Exploitability prediction of software vulnerabilities,” Quality and Reliability Engineering International, vol. 37, no. 2, pp. 648–663, 2020. [Online]. Available: https://doi.org/10.1002/qre.2754
[6] B. K. Verma and A. K. Yadav, “Software security with natural language processing and vulnerability scoring using machine learning approach,” Journal of Ambient Intelligence and Humanized Computing, vol. 15, no. 4, pp. 2641–2651, 2024.
[7] H. Howland, “CVSS: Ubiquitous and broken,” vol. 4, no. 1, 2022. [Online]. Available: https://doi.org/10.1145/3491263
[8] Y. Fang, Y. Liu, C. Huang, and L. Liu, “Fastembed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm,” PLoS ONE, vol. 15, no. 2, p. e0228439, 2020.
[9] N. Tavabi, P. Goyal, M. Almukaynizi, P. Shakarian, and K. Lerman, “Darkembed: Exploit prediction with neural language models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, 2018.
[10] Y. Movahedi, M. Cukier, and I. Gashi, “Predicting the discovery pattern of publically known exploited vulnerabilities,” IEEE Transactions on Dependable and Secure Computing, vol. 19, no. 2, pp. 1181–1193, 2020.
[11] Y. Wang, B. Yu, H. Yu, L. Xiao, H. Ji, and Y. Zhao, “Automotive cybersecurity vulnerability assessment using the common vulnerability scoring system and Bayesian network model,” IEEE Systems Journal, vol. 17, no. 2, pp. 2880–2891, 2023.
[12] J. Yin, M. Tang, J. Cao, and H. Wang, “Apply transfer learning to cybersecurity: Predicting exploitability of vulnerabilities by description,” Knowledge-Based Systems, vol. 210, p. 106529, 2020.
[13] J. C. Costa, T. Roxo, J. B. Sequeiros, H. Proenca, and P. R. Inacio, “Predicting CVSS metric via description interpretation,” IEEE Access, vol. 10, pp. 59125–59134, 2022.
[14] M. Jimenez, R. Rwemalika, M. Papadakis, F. Sarro, Y. Le Traon, and M. Harman, “The importance of accounting for real-world labelling when predicting software vulnerabilities,” in Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2019, pp. 695–705.
[15] H. Guo, S. Chen, Z. Xing, X. Li, Y. Bai, and J. Sun, “Detecting and augmenting missing key aspects in vulnerability descriptions,” ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 31, no. 3, pp. 1–27, 2022.
[16] D. Gonzalez, H. Hastings, and M. Mirakhorli, “Automated characterization of software vulnerabilities,” in 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 2019, pp. 135–139.
[17] S. Zhang, X. Ou, and D. Caragea, “Predicting cyber risks through national vulnerability database,” Information Security Journal: A Global Perspective, vol. 24, no. 4-6, pp. 194–206, 2015.
[18] G. Nikitopoulos, K. Dritsa, P. Louridas, and D. Mitropoulos, “Crossvul: a cross-language vulnerability dataset with commit data,” 08 2021, pp. 1565–1569.
[19] J. Fan, Y. Li, S. Wang, and T. N. Nguyen, “A C/C++ code vulnerability dataset with code changes and CVE summaries,” 2020 IEEE/ACM 17th International Conference on Mining Software Repositories (MSR), pp. 508–512, 2020. [Online]. Available: https://api.semanticscholar.org/CorpusID:221784842
[20] M. Williams, S. Dey, R. Camacho Barranco, S. M. Naim, M. Hossain, and M. Akbar, “Analyzing evolving trends of vulnerabilities in national vulnerability database,” 12 2018, pp. 3011–3020.
[21] K. Jetinai, N. Arch-int, and S. Arch-int, “Ontology-based metadata integration approach for learning resource interoperability,” 12 2010, pp. 195–202.
[22] Y. Jiang, M. Jeusfeld, and J. Ding, “Evaluating the data inconsistency of open-source vulnerability repositories.” New York, NY, USA: Association for Computing Machinery, 2021. [Online]. Available: https://doi.org/10.1145/3465481.3470093
[23] H. T. Reda, A. Anwar, A. Mahmood, and N. Chilamkurti, “Data-driven approach for state prediction and detection of false data injection attacks in smart grid,” Journal of Modern Power Systems and Clean Energy, vol. 11, no. 2, pp. 455–467, 2022.
[24] B. Zhao, “Web scraping,” in Encyclopedia of big data. Springer, 2022, pp. 951–953.
[25] S. Islam, A. Abba, U. Ismail, H. Mouratidis, and S. Papastergiou, “Vulnerability prediction for secure healthcare supply chain service delivery,” Integrated Computer-Aided Engineering, vol. 29, no. 4, pp. 389–409, 2022.
[26] M. Beyrouti, A. Lounis, B. Lussier, A. Bouadallah, and A. E. Samhat, “Vulnerability and threat assessment framework for internet of things systems,” in 2023 6th Conference on Cloud and Internet of Things (CIoT). IEEE, 2023, pp. 62–69.
[27] M. A. M. Ariffin, M. F. Ibrahim, and Z. Kasiran, “API vulnerabilities in cloud computing platform: attack and detection,” International Journal of Engineering Trends and Technology, vol. 1, pp. 8–14, 2020.
[28] Y.-j. Zhang, P. Liao, K.-z. Huang, and Y.-l. Liu, “An automatic approach for scoring vulnerabilities in risk assessment,” in 2nd International Conference on Electrical and Electronic Engineering (EEE 2019). Atlantis Press, 2019, pp. 256–261.
[29] J. Zhang, Z. Liu, X. Hu, X. Xia, and S. Li, “Vulnerability detection by learning from syntax-based execution paths of code,” IEEE Transactions on Software Engineering, vol. 49, no. 8, pp. 4196–4212, 2023.
[30] N. Moscovich, R. Bitton, Y. Mallah, M. Inokuchi, T. Yagyu, M. Kalech, Y. Elovici, and A. Shabtai, “Autosploit: A fully automated framework for evaluating the exploitability of security vulnerabilities,” arXiv preprint arXiv:2007.00059, 2020.
[31] Z. Li, D. Zou, S. Xu, X. Ou, H. Jin, S. Wang, Z. Deng, and Y. Zhong, “Vuldeepecker: A deep learning-based system for vulnerability detection,” arXiv preprint arXiv:1801.01681, 2018.
[32] R. Li, C. Feng, X. Zhang, and C. Tang, “A lightweight assisted vulnerability discovery method using deep neural networks,” IEEE Access, vol. 7, pp. 80079–80092, 2019.
[33] P. Xiao, Q. Xiao, X. Zhang, Y. Wu, and F. Yang, “Vulnerability detection based on enhanced graph representation learning,” IEEE Transactions on Information Forensics and Security, 2024.
[34] Z. Chu, Y. Wan, Q. Li, Y. Wu, H. Zhang, Y. Sui, G. Xu, and H. Jin, “Graph neural networks for vulnerability detection: A counterfactual explanation,” in Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2024, pp. 389–401.
[35] S. Cao, X. Sun, L. Bo, Y. Wei, and B. Li, “Bgnn4vd: Constructing bidirectional graph neural-network for vulnerability detection,” Information and Software Technology, vol. 136, p. 106576, 2021.
[36] M. Fu, L. Wu, Z. Hong, F. Zhu, H. Sun, and W. Feng, “A critical-path-coverage-based vulnerability detection method for smart contracts,” IEEE Access, vol. 7, pp. 147327–147344, 2019.
[37] J. Komárková, L. Sadlek, and M. Laštovička, “Community based platform for vulnerability categorization,” in NOMS 2018-2018 IEEE/IFIP Network Operations and Management Symposium. IEEE, 2018, pp. 1–2.
[38] O. Suciu, C. Nelson, Z. Lyu, T. Bao, and T. Dumitras, “Expected exploitability: Predicting the development of functional vulnerability exploits,” in 31st USENIX Security Symposium (USENIX Security 22), 2022, pp. 377–394.
[39] G. Chawla, N. Sharma, and N. K. Rawal, “Ivsv: An improved CVSS base score mechanism with vulnerability type,” International Journal of Engineering and Advanced Technology, vol. 8, no. 6, pp. 4946–4950, 2019.
[40] S.-Y. Huang and Y. Wu, “Poster: dynamic software vulnerabilities threat prediction through social media contextual analysis,” in Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, 2020, pp. 892–894.
