Reflections on the NASA MDP data sets

Gray, D., Bowes, D., Davey, N., Sun, Y. and Christianson, B. (2012) Reflections on the NASA MDP data sets. IET Software, 6 (6). 549-558. ISSN 1751-8806

Full text not available from this repository.

Abstract

Background: The NASA metrics data program (MDP) data sets have been heavily used in software defect prediction research. Aim: To highlight the data quality issues present in these data sets, and the problems that can arise when they are used in a binary classification context. Method: A thorough exploration of all 13 original NASA data sets, followed by various experiments demonstrating the potential impact of duplicate data points when data mining. Conclusions: Firstly, researchers need to analyse the data that forms the basis of their findings in the context of how it will be used. Secondly, the bulk of defect prediction experiments based on the NASA MDP data sets may have led to erroneous findings. This is mainly because of repeated/duplicate data points potentially causing substantial amounts of training and testing data to be identical.

Item Type:
Journal Article
Journal or Publication Title:
IET Software
Uncontrolled Keywords:
/dk/atira/pure/subjectarea/asjc/1700/1704
ID Code:
132038
Deposited On:
18 Mar 2019 11:35
Refereed?:
Yes
Published?:
Published
Last Modified:
27 Nov 2020 06:10