The Impact of Hard and Easy Negative Training Data on Vulnerability Prediction Performance

Debeyan, Fahad and Madeyski, Lech and Hall, Tracy and Bowes, David (2024) The Impact of Hard and Easy Negative Training Data on Vulnerability Prediction Performance. Journal of Systems and Software, 211: 112003. ISSN 0164-1212

Full text not available from this repository.

Abstract

Vulnerability prediction models have been shown to perform poorly in the real world. We examine how the composition of negative training data influences vulnerability prediction model performance. Inspired by other disciplines (e.g. image processing), we focus on whether the distinction between negative training data that is ‘easy’ to tell apart from positive data (very different from positive data) and negative training data that is ‘hard’ to tell apart from positive data (very similar to positive data) affects vulnerability prediction performance. We use a range of popular machine learning algorithms, including deep learning, to build models based on vulnerability patch data curated by Reis and Abreu, as well as the MSR dataset. Our results suggest that models trained on higher ratios of easy negatives perform better, plateauing at 15 easy negatives per positive instance. We also report that which ML algorithm works best depends on the negative sample used. Overall, we found that the negative sampling approach used significantly impacts model performance, potentially leading to overly optimistic results. The ratio of ‘easy’ versus ‘hard’ negative training data should be explicitly considered when building vulnerability prediction models for the real world.
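
To make the idea of a negatives-per-positive ratio concrete, the sketch below assembles a training set with a chosen number of easy and hard negatives for each positive instance. It is an illustration only: the function name `build_training_set`, its parameters, and the toy data are hypothetical and are not taken from the paper, and this record does not describe how the authors actually separate easy from hard negatives or how they sample them.

```python
import random


def build_training_set(positives, easy_negatives, hard_negatives,
                       easy_per_positive, hard_per_positive=1, seed=0):
    """Return shuffled (sample, label) pairs with a fixed number of
    negatives per positive instance.

    Hypothetical helper: the easy and hard pools are assumed to have been
    separated beforehand by some similarity criterion.
    """
    rng = random.Random(seed)
    n_easy = easy_per_positive * len(positives)
    n_hard = hard_per_positive * len(positives)
    if n_easy > len(easy_negatives) or n_hard > len(hard_negatives):
        raise ValueError("not enough negatives for the requested ratio")

    data = [(p, 1) for p in positives]                             # label 1 = vulnerable
    data += [(n, 0) for n in rng.sample(easy_negatives, n_easy)]   # easy negatives
    data += [(n, 0) for n in rng.sample(hard_negatives, n_hard)]   # hard negatives
    rng.shuffle(data)
    return data


# Toy data (made up for illustration only).
positives = [f"vuln_{i}" for i in range(4)]
easy_pool = [f"easy_{i}" for i in range(100)]
hard_pool = [f"hard_{i}" for i in range(10)]

train = build_training_set(positives, easy_pool, hard_pool, easy_per_positive=15)
print(len(train))  # 4 positives + 60 easy + 4 hard = 68
```

Varying `easy_per_positive` while holding the rest fixed is one simple way to explore how the easy-negative ratio changes a model's measured performance.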

Item Type:
Journal Article
Journal or Publication Title:
Journal of Systems and Software
Uncontrolled Keywords:
Research Output Funding/yes_externally_funded
Subjects:
software vulnerability prediction; vulnerability datasets; machine learning; yes - externally funded; no; hardware and architecture; software; information systems
ID Code:
215093
Deposited On:
22 Feb 2024 13:50
Refereed?:
Yes
Published?:
Published
Last Modified:
16 Jul 2024 12:14