The Jinx on the NASA software defect data sets

Petrić, Jean and Bowes, David and Hall, Tracy and Christianson, Bruce and Baddoo, Nathan (2016) The Jinx on the NASA software defect data sets. In: EASE '16 Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering. Association for Computing Machinery, Inc, IRL. ISBN 9781450336918

nasa_paper.pdf - Accepted Version
Available under License Creative Commons Attribution-NonCommercial.


Abstract

Background: The NASA datasets have been used extensively in studies of software defects. In 2013 Shepperd et al. presented an essential set of rules for removing erroneous data from the NASA datasets, making the data more reliable to use. Objective: We have now found additional rules, not identified by Shepperd et al., that are necessary for removing problematic data. Results: In this paper, we demonstrate the level of erroneous data still present even after cleaning with Shepperd et al.'s rules, and we apply our new rules to remove this erroneous data. Conclusion: Even after systematic data cleaning of the NASA MDP datasets, we found new erroneous data. Researchers should always explicitly consider data quality before using a dataset.

Item Type:
Contribution in Book/Report/Proceedings
Additional Information:
© 2016 ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in EASE '16 Proceedings of the 20th International Conference on Evaluation and Assessment in Software Engineering http://dx.doi.org/10.1145/2915970.2916007
Uncontrolled Keywords:
/dk/atira/pure/subjectarea/asjc/1700/1709
Subjects:
data quality, machine learning, software defect prediction, human-computer interaction, computer networks and communications, computer vision and pattern recognition, software
ID Code:
127413
Deposited By:
Deposited On:
11 Sep 2018 14:04
Refereed?:
Yes
Published?:
Published
Last Modified:
20 Oct 2024 23:23