Kidmose, Brooke and Kidmose, Andreas and Meng, Weizhi (2025) can-sleuth: Sleuthing out the capabilities, limitations, and performance impacts of automotive intrusion detection datasets. International Journal of Information Security, 24 (5): 193. ISSN 1615-5262
Accepted Version
Available under License Creative Commons Attribution.
Abstract
Modern automobiles are made up of networks of computers, one of which is the inherently insecure Controller Area Network (CAN). Over the years, automotive security has been enhanced by secure gateways and new protocols such as automotive Ethernet, but the CAN protocol has remained the weak link. Automotive researchers have been exploring intrusion detection systems (IDSs) as a potential solution to the problem of CAN bus insecurity. To build and evaluate an IDS, however, researchers need adequate training and testing data. In this paper, we analyze and compare the following automotive intrusion detection datasets: (1) HCRL Car Hacking, (2) HCRL Survival Analysis, (3) can-train-and-test-v1.5, (4) UNIMORE Bus-Off, (5) UNIMORE DAGA, and (6) UNIMORE Ventus. The two HCRL datasets are well-established in the literature, whereas can-train-and-test-v1.5 is a promising new dataset—and the three UNIMORE datasets lie somewhere in between. In our evaluation, we pit sixteen machine learning IDSs against each dataset and analyze the results. In addition, we conduct a feature evaluation of can-train-and-test-v1.5, and we investigate the impact of train-test interdependence in the three UNIMORE datasets. We find that, when pitted against the five comparison datasets, can-train-and-test-v1.5 paints a clearer picture of an IDS’s true capabilities; in fact, can-train-and-test-v1.5’s testing scenarios can reveal when an IDS has overfitted to a particular vehicle type—unlike the UNIMORE datasets. Furthermore, unlike the HCRL datasets, can-train-and-test-v1.5 provides more than enough data to train a complex machine learning model—an order of magnitude more—reducing the risk of underfitting. 
Moreover, can-train-and-test-v1.5 maintains ample differentiation power; the standard deviation of the models’ F1-scores was 0.2392 (excluding suppress attacks), whereas the standard deviations for the remaining datasets—HCRL Car Hacking, HCRL Survival Analysis, UNIMORE Bus-Off, UNIMORE DAGA, and UNIMORE Ventus—were 0.2254, 0.2333, 0.1824, 0.2121, and 0.2100 (excluding suppress attacks), respectively.
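The differentiation-power figures above are standard deviations of the models' F1-scores on a given dataset. As a rough illustration only, the sketch below computes such a spread in Python. The function name `f1_spread`, the model names, and the F1 values are all hypothetical placeholders, not results from the paper. The abstract also does not say whether a population or sample standard deviation was used, so the population form is assumed here.

```python
from statistics import pstdev


def f1_spread(f1_scores):
    """Return the population standard deviation of per-model F1-scores.

    Assumption: population form (divide by N); the abstract does not
    specify which estimator the authors used.
    """
    return pstdev(list(f1_scores))


# Hypothetical F1-scores for four IDS models on one dataset.
# These values are made up for illustration and are NOT from the paper.
hypothetical_f1 = {
    "model_a": 0.95,
    "model_b": 0.60,
    "model_c": 0.35,
    "model_d": 0.90,
}

spread = f1_spread(hypothetical_f1.values())
print(f"F1 standard deviation: {spread:.4f}")
```

A larger spread means the dataset separates strong detectors from weak ones more clearly, whereas a spread near zero means every model scores about the same and the dataset offers little basis for comparison.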