Paper notes: Large Language Model for Vulnerability Detection: Emerging Results and Future Directions

Paper link: https://dl.acm.org/doi/abs/10.1145/3639476.3639762

Source code: https://github.com/soarsmu/ChatGPT-VulDetection

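This paper uses ChatGPT (GPT-3.5) and GPT-4 to build a vulnerability detection method. Although LLM output is not stable from one run to the next, the results averaged over runs are still good: GPT-3.5 is competitive with existing detection methods, and GPT-4 outperforms them.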
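Averaging over unstable runs can be pictured with a minimal sketch. It assumes each run yields one 0/1 prediction per sample and uses F1 purely as an illustrative metric; the function names and the toy labels are hypothetical and are not taken from the paper or its repository.

```python
from statistics import mean, stdev

def f1_score(y_true, y_pred):
    """F1 for the positive (vulnerable = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def summarize_runs(y_true, runs):
    """Return (mean F1, sample standard deviation of F1) over repeated runs."""
    scores = [f1_score(y_true, preds) for preds in runs]
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread

# Toy data (hypothetical): two runs of the same prompt on four samples.
labels = [1, 0, 1, 0]
runs = [[1, 0, 1, 0], [1, 1, 0, 0]]
avg_f1, spread = summarize_runs(labels, runs)
```

Reporting the spread next to the mean makes it visible how much of a difference between two prompts could be run-to-run noise.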

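The paper puts most of its effort into prompting the LLM, but the idea is still an interesting one for tackling IoT vulnerability detection. The detection itself is presumably still static. Its main contributions are: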

• We conduct experiments with diverse prompts for LLMs, encompassing task and role descriptions, project information, and examples from Common Weakness Enumeration (CWE) and the training set. We recognize LLMs as promising models for vulnerability detection.

• We pinpoint several promising future directions for leveraging LLMs in vulnerability detection, and we encourage the community to delve into these possibilities.
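The prompt ingredients named in the first contribution (task description, role description, project information, examples) can be pictured as optional pieces that are joined into one prompt. The sketch below is only an illustration: the function names, the separator, and all prompt wording are assumptions, not the prompts used in the paper, and no LLM API is called.

```python
import re
from typing import Optional, Sequence, Tuple

def build_prompt(code: str,
                 task: str,
                 role: Optional[str] = None,
                 project: Optional[str] = None,
                 examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Join the optional prompt components into a single prompt string."""
    parts = []
    if role:
        parts.append(role)                      # role description
    parts.append(task)                          # task description (always present)
    if project:
        parts.append(f"The code comes from the project: {project}.")
    for snippet, answer in examples:            # few-shot examples, e.g. from CWE or the training set
        parts.append(f"Example code:\n{snippet}\nAnswer: {answer}")
    parts.append(f"Code to analyze:\n{code}\nAnswer:")
    return "\n\n".join(parts)

def parse_label(reply: str) -> Optional[int]:
    """Map a free-text reply to 1 (vulnerable), 0 (not vulnerable), or None if unclear."""
    match = re.match(r"\W*(yes|no)\b", reply.strip().lower())
    if match is None:
        return None
    return 1 if match.group(1) == "yes" else 0

# Hypothetical wording and inputs, for illustration only.
prompt = build_prompt(
    code="char buf[8]; strcpy(buf, input);",
    task="Is the following code vulnerable? Answer yes or no.",
    role="You are an experienced security reviewer.",
    project="example-project",
    examples=[("free(p); free(p);", "yes")],
)
```

Keeping each component optional makes it straightforward to evaluate the components separately and in combination, which is the kind of comparison the post describes.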

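At its core, the paper treats the task as a binary classification problem (vulnerable or not). The core prompt designs are shown in the table below: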

In the experiments, the prompt types above are then combined in different ways, which gives comparatively good results:

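Because ChatGPT is a closed-source model, the experiments inevitably face possible overlap between the evaluation data and the model's training data, as well as fairly high cost. In addition, real-world vulnerabilities follow a long-tailed distribution, among other issues, so a balanced, high-quality dataset is hard to obtain.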

As future work, the authors plan to 1) explore whether LLMs, specifically ChatGPT, can effectively detect these infrequent vulnerabilities or not and 2) propose a solution (e.g., generating more samples for the less common types via data augmentation) to address the impact of the long-tailed distribution of vulnerability data.
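To make the long-tail problem concrete, the sketch below counts samples per vulnerability type and pads the rare types up to a minimum count. Note that it uses naive random oversampling (duplicating existing samples) as a stand-in; the paper only names data augmentation as a possible direction and does not specify a method. The function names, the threshold, and the toy data are hypothetical.

```python
import random
from collections import Counter

def tail_types(labels, min_count):
    """Return the vulnerability types that have fewer than min_count samples."""
    counts = Counter(labels)
    return {cwe for cwe, n in counts.items() if n < min_count}

def oversample(samples, min_count, seed=0):
    """Duplicate random samples of rare types until each type has min_count.

    samples is a list of (code, cwe_id) pairs. This is plain random
    oversampling, not a real augmentation technique.
    """
    rng = random.Random(seed)
    by_type = {}
    for sample in samples:
        by_type.setdefault(sample[1], []).append(sample)
    balanced = list(samples)
    for group in by_type.values():
        for _ in range(max(0, min_count - len(group))):
            balanced.append(rng.choice(group))
    return balanced

# Toy data (hypothetical): one frequent type and one rare type.
samples = [("memcpy(dst, src, n);", "CWE-119")] * 5 + [("free(p); use(p);", "CWE-416")]
rare = tail_types([cwe for _, cwe in samples], min_count=3)
balanced = oversample(samples, min_count=3)
```

A real augmentation step would generate new, varied samples for the rare types rather than copies, since duplicates add no new information.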

The paper also raises a challenge in applying LLMs: trust and synergy with developers. AI-powered solutions for vulnerability detection, including this work, have limited interaction with developers, and they may face challenges in establishing trust and synergy with developers during practical use. To overcome this, future work should investigate more effective strategies to foster trust and collaboration between developers and AI-powered solutions. By nurturing trust and synergy, AI-powered solutions may evolve into smart workmates that better assist developers.

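In summary, the paper is mainly an application of LLMs, with the application scenario narrowed to vulnerability detection; its approach of extracting features and matching them to vulnerabilities still falls under static detection. The paper's summary of earlier vulnerability detection methods based on machine learning and deep learning is nonetheless insightful: they either (1) rely mainly on medium-sized pre-trained models such as CodeBERT, or (2) train smaller neural networks (such as graph neural networks) from scratch.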