Debeyan, Fahad and Hall, Tracy and Bowes, David (2022) Improving the Performance of Code Vulnerability Prediction using Abstract Syntax Tree Information. In: PROMISE 2022: Proceedings of the 18th International Conference on Predictive Models and Data Analytics in Software Engineering. ACM, SGP, pp. 2-11. ISBN 9781450398602
Abstract
The recent emergence of the Log4jshell vulnerability demonstrates the importance of detecting code vulnerabilities in software systems. Software Vulnerability Prediction Models (VPMs) are a promising tool for vulnerability detection. Recent studies have focused on improving the performance of models to predict whether a piece of code is vulnerable or not (binary classification). However, such approaches are limited because they do not provide developers with information on the type of vulnerability that needs to be patched. We present our multiclass classification approach to improve the performance of vulnerability prediction models. Our approach uses abstract syntax tree n-grams to identify code clusters related to specific vulnerabilities. We evaluated our approach using real-world Java software vulnerability data. We report increased predictive performance compared to a variety of other models, for example, F-measure increases from 55% to 75% and MCC increases from 48% to 74%. Our results suggest that clustering software vulnerabilities using AST n-gram information is a promising approach to improve vulnerability prediction and enable specific information about the vulnerability type to be provided.
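
As a rough illustration of what "abstract syntax tree n-grams" can mean, the sketch below counts n-grams over the sequence of node types visited in a depth-first walk of a small tree. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the n-grams are extracted, so the pre-order traversal, the `(node_type, children)` tuple representation, the node type names, and the helper names `preorder` and `ast_ngrams` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy AST as (node_type, [children]). The node type names are
# illustrative only and are not the output of any particular Java parser.
TOY_AST = (
    "MethodDeclaration",
    [
        ("Parameter", []),
        ("Block", [
            ("MethodInvocation", [("NameExpr", []), ("StringLiteral", [])]),
            ("ReturnStatement", [("NameExpr", [])]),
        ]),
    ],
)


def preorder(node):
    """Yield node type names in depth-first pre-order (an assumed ordering)."""
    node_type, children = node
    yield node_type
    for child in children:
        yield from preorder(child)


def ast_ngrams(node, n):
    """Count n-grams over the pre-order sequence of node types."""
    seq = list(preorder(node))
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


bigrams = ast_ngrams(TOY_AST, 2)
for gram, count in sorted(bigrams.items()):
    print(" -> ".join(gram), count)
```

Counts like these could serve as feature vectors for a clustering or multiclass classification step. Which algorithm, which value of n, and which feature weighting the paper uses are not stated in the abstract.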