The state of machine learning methodology in software fault prediction

Hall, T. and Bowes, D. (2012) The state of machine learning methodology in software fault prediction. In: Proceedings 2012 11th International Conference on Machine Learning and Applications, ICMLA 2012. IEEE, pp. 308-313. ISBN 9781467346511

Full text not available from this repository.

Abstract

The aim of this paper is to investigate the quality of methodology in software fault prediction studies using machine learning. Over two hundred studies of fault prediction have been published in the last 10 years. There is evidence to suggest that the quality of methodology used in some of these studies does not allow us to have confidence in the predictions they report. We evaluate the machine learning methodology used in 21 fault prediction studies, all of which use NASA data sets. We score each study from 1 to 10 on the quality of its machine learning methodology (e.g. whether or not the study reports randomising its cross-validation folds). Only 10 of the 21 studies scored 5 or more out of 10, and 1 study scored only 1 out of 10. When we plot these scores over time, there is no evidence that the quality of machine learning methodology is better in recent studies. Our results suggest that much remains to be done by both researchers and reviewers to improve the quality of machine learning methodology used in software fault prediction. We conclude that the results reported in some studies need to be treated with caution.

Item Type: Contribution in Book/Report/Proceedings
ID Code: 132045
Deposited By:
Deposited On: 18 Mar 2019 11:45
Refereed?: Yes
Published?: Published
Last Modified: 26 May 2020 14:00