VulSim: Leveraging similarity of multi-dimensional neighbor embeddings for vulnerability detection

August 14, 2024

Conference Paper

Author:

Samiha Shimmi

…

R&D Group:

Secure Resilient Systems and Technology

VulSim: Leveraging similarity of multi-dimensional neighbor embeddings for vulnerability detection

Summary

Despite decades of research in vulnerability detection, vulnerabilities in source code remain a growing problem, and more effective techniques are needed in this domain. To enhance software vulnerability detection, in this paper, we first show that various vulnerability classes in the C programming language share common characteristics, encompassing semantic, contextual, and syntactic properties. We then leverage this knowledge to enhance the learning process of Deep Learning (DL) models for vulnerability detection when only sparse data is available. To achieve this, we extract multiple dimensions of information from the available, albeit limited, data. We then consolidate this information into a unified space, allowing for the identification of similarities among vulnerabilities through nearest-neighbor embeddings. The combination of these steps allows us to improve the effectiveness and efficiency of vulnerability detection using DL models. Evaluation results demonstrate that our approach surpasses existing State-of-the-art (SOTA) models and exhibits strong performance on unseen data, thereby enhancing generalizability.