Li, Yi and Angelov, Plamen and Suri, Neeraj (2024) Adversarial Attack Detection via Fuzzy Predictions. IEEE Transactions on Fuzzy Systems, 32 (12). pp. 7015-7024. ISSN 1063-6706
TFS_final.pdf - Accepted Version
Available under License Creative Commons Attribution.
Abstract
Image processing using neural networks acts as a tool to speed up predictions for users, specifically on large-scale image samples. To guarantee clean data for training accuracy, various deep learning-based adversarial attack detection techniques have been proposed. These crisp set-based detection methods directly determine whether an image is clean or attacked, but their loss calculation is non-differentiable and hinders training through normal back-propagation. Motivated by the recent success of fuzzy systems, in this work we present an attack detection method that further improves detection performance and is suitable for any pre-trained neural network classifier. A fuzzification network is used to obtain feature maps, from which fuzzy sets describing the degree of difference between clean and attacked images are produced. Fuzzy rules constitute the intelligence that determines the detection boundaries. Different from previous fuzzy systems, we propose a fuzzy mean-intelligence mechanism with new support and confidence functions to improve the quality of the fuzzy rules. In the defuzzification layer, the fuzzy prediction from the intelligence is mapped back into crisp model predictions for the images, and the loss between prediction and label drives the rules to train the fuzzy detector. We show that the fuzzy rule-based network learns richer feature information than binary outputs and yields an overall performance gain. Experimental results show that, compared to various benchmark fuzzy systems and adversarial attack detection methods, our fuzzy detector achieves better detection performance over a wide range of images.
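To illustrate the general fuzzification–rules–defuzzification pipeline the abstract describes, here is a minimal sketch in plain NumPy. The Gaussian membership functions, the three-rule base, and the weighted-average defuzzifier below are illustrative assumptions for a generic fuzzy inference step, not the paper's mean-intelligence mechanism or its learned support and confidence functions.

```python
import numpy as np

def gaussian_membership(x, center, sigma):
    """Membership degree of x in a Gaussian fuzzy set (illustrative choice)."""
    return np.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def fuzzy_attack_score(feature_diff):
    """Map a scalar clean-vs-attacked difference degree in [0, 1] to a
    crisp attack score via fuzzification, a small rule base, and
    weighted-average defuzzification. All parameters are hypothetical."""
    # Fuzzification: membership in 'low', 'medium', 'high' difference sets.
    sets = {"low": (0.0, 0.15), "medium": (0.5, 0.15), "high": (1.0, 0.15)}
    memberships = {name: gaussian_membership(feature_diff, c, s)
                   for name, (c, s) in sets.items()}
    # Rule base: each rule maps a difference level to a consequent score,
    # e.g. "IF difference is high THEN attack score is 1.0".
    consequents = {"low": 0.0, "medium": 0.5, "high": 1.0}
    # Defuzzification: firing-strength-weighted average of consequents.
    num = sum(memberships[name] * consequents[name] for name in sets)
    den = sum(memberships.values())
    return num / den

# A small difference degree yields a score near 0 (clean);
# a large one yields a score near 1 (attacked).
clean_score = fuzzy_attack_score(0.05)
attacked_score = fuzzy_attack_score(0.95)
```

In the paper's setting the membership parameters and rules would be trained end to end from the loss between the defuzzified prediction and the label; this sketch only fixes them by hand to show the data flow.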