can-sleuth : Sleuthing out the capabilities, limitations, and performance impacts of automotive intrusion detection datasets

Kidmose, Brooke and Kidmose, Andreas and Meng, Weizhi (2025) can-sleuth : Sleuthing out the capabilities, limitations, and performance impacts of automotive intrusion detection datasets. International Journal of Information Security, 24 (5): 193. ISSN 1615-5262

Accepted Version (PDF)
Available under License Creative Commons Attribution.

Abstract

Modern automobiles are made up of networks of computers, one of which is the inherently insecure Controller Area Network (CAN). Over the years, automotive security has been enhanced by secure gateways and new protocols such as automotive Ethernet, but the CAN protocol has remained the weak link. Automotive researchers have been exploring intrusion detection systems (IDSs) as a potential solution to the problem of CAN bus insecurity. To build and evaluate an IDS, however, researchers need adequate training and testing data. In this paper, we analyze and compare the following automotive intrusion detection datasets: (1) HCRL Car Hacking, (2) HCRL Survival Analysis, (3) can-train-and-test-v1.5, (4) UNIMORE Bus-Off, (5) UNIMORE DAGA, and (6) UNIMORE Ventus. The two HCRL datasets are well-established in the literature, whereas can-train-and-test-v1.5 is a promising new dataset—and the three UNIMORE datasets lie somewhere in between. In our evaluation, we pit sixteen machine learning IDSs against each dataset and analyze the results. In addition, we conduct a feature evaluation of can-train-and-test-v1.5, and we investigate the impact of train-test interdependence in the three UNIMORE datasets. We find that, when pitted against the five comparison datasets, can-train-and-test-v1.5 paints a clearer picture of an IDS’s true capabilities; in fact, can-train-and-test-v1.5’s testing scenarios can reveal when an IDS has overfitted to a particular vehicle type—unlike the UNIMORE datasets. Furthermore, unlike the HCRL datasets, can-train-and-test-v1.5 provides more than enough data to train a complex machine learning model—an order of magnitude more—reducing the risk of underfitting. 
Moreover, can-train-and-test-v1.5 maintains ample differentiation power; the standard deviation of the models’ F1-scores was 0.2392 (excluding suppress attacks), whereas the standard deviations for the remaining datasets—HCRL Car Hacking, HCRL Survival Analysis, UNIMORE Bus-Off, UNIMORE DAGA, and UNIMORE Ventus—were 0.2254, 0.2333, 0.1824, 0.2121, and 0.2100 (excluding suppress attacks), respectively.
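As an illustration of the differentiation-power figure quoted above, the following minimal sketch computes the standard deviation of per-model F1-scores for one dataset. It is not the authors' code. The function name `differentiation_power`, the dataset labels, and all F1 values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import stdev


def differentiation_power(f1_scores):
    """Return the sample standard deviation of per-model F1-scores.

    A larger spread means the dataset separates strong IDSs from weak
    ones more clearly. Assumption: sample (n - 1) estimator.
    """
    if len(f1_scores) < 2:
        raise ValueError("need F1-scores from at least two models")
    return stdev(f1_scores)


# Hypothetical F1-scores for four models on two made-up datasets;
# these are NOT results from the paper.
example_f1 = {
    "easy_dataset": [0.99, 0.98, 0.99, 0.97],
    "harder_dataset": [0.95, 0.40, 0.75, 0.10],
}

# Spread of model performance per dataset.
spread = {name: differentiation_power(scores)
          for name, scores in example_f1.items()}

for name, value in spread.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions, a dataset on which nearly every model scores the same yields a spread close to zero, which is why a higher standard deviation is read as greater power to tell IDSs apart.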

Item Type: Journal Article
Journal or Publication Title: International Journal of Information Security
Uncontrolled Keywords: Research Output Funding/no_not_funded
Subjects: automotive; machine learning; intrusion detection system; controller area network; in-vehicle network; no - not funded; software; computer networks and communications; information systems; safety, risk, reliability and quality
Departments: Faculty of Science and Technology > School of Computing & Communications
ID Code: 232827
Deposited By: ep_importer_pure
Deposited On: 06 Oct 2025 10:05
Refereed?: Yes
Published?: Published
Last Modified: 11 Dec 2025 09:07
URI: https://eprints.lancs.ac.uk/id/eprint/232827