SSLDefender: Backdoor Defense in Self-Supervised Learning via Distillation-guided Unlearning

Zhang, Jiale and Zhu, Wanquan and Wang, Kai and Zhu, Chengcheng and Sun, Xiaobing and Meng, Weizhi and Luo, Xiapu (2025) SSLDefender: Backdoor Defense in Self-Supervised Learning via Distillation-guided Unlearning. IEEE Transactions on Information Forensics and Security, 20. pp. 13159-13172. ISSN 1556-6013

SSDefender.pdf - Accepted Version
Available under License Creative Commons Attribution.

Abstract

Self-supervised learning uses unlabelled data to train encoders that learn high-quality representations of the input, and it has significantly advanced the field of computer vision. However, recent studies have shown that self-supervised learning is vulnerable to numerous adversarial attacks. Among them, backdoor attacks are a central concern, because downstream classifiers inherit the backdoor behavior of the pre-trained encoder. Existing defenses against backdoor attacks focus primarily on supervised learning; they rely heavily on labeled data and cannot be directly transferred to self-supervised settings. Furthermore, existing defenses against self-supervised backdoors aim to separate poisoned samples within an assumed small-scale dataset and then retrain to obtain a clean encoder. However, these approaches are ineffective against encoders in which a backdoor has already been implanted. To address these issues, we propose SSLDefender, a novel image-based backdoor mitigation method designed specifically for self-supervised learning, which removes backdoor attributes directly from the backdoored encoder. Specifically, we employ a trigger recovery method based on mutual information maximization to efficiently obtain a trigger that reproduces the influence of the target backdoor. Additionally, we design a distillation-guided unlearning strategy that steadily purifies backdoor features while retaining clean knowledge to prevent over-forgetting. Extensive experimental evaluations on six benchmark datasets demonstrate that SSLDefender reduces the attack success rate of BadEncoder to around 2% while maintaining high model accuracy on the main task, outperforming state-of-the-art methods.
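The abstract does not state SSLDefender's actual loss, so the following is only a minimal sketch, under stated assumptions, of how a distillation-guided unlearning objective could balance forgetting against retention. It assumes a frozen copy of the backdoored encoder acts as the teacher, that the student is the encoder being purified, and that feature agreement is measured with cosine similarity. The forgetting term pulls the student's features on trigger-stamped inputs toward the teacher's features on the same clean inputs (so the recovered trigger loses its effect), and the retention term keeps clean features close to the teacher to avoid over-forgetting. The names `unlearning_loss`, `cosine`, and the weight `lam` are hypothetical; this is not the authors' implementation, and trigger recovery via mutual information maximization is not shown.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def unlearning_loss(
    student_triggered: Sequence[Sequence[float]],
    student_clean: Sequence[Sequence[float]],
    teacher_clean: Sequence[Sequence[float]],
    lam: float = 1.0,  # hypothetical weight on the retention term
) -> float:
    """Hypothetical distillation-guided unlearning loss for one batch.

    student_triggered: student features of inputs stamped with a recovered trigger
    student_clean:     student features of the same inputs without the trigger
    teacher_clean:     frozen teacher features of the clean inputs
    """
    if not (len(student_triggered) == len(student_clean) == len(teacher_clean)):
        raise ValueError("batch sizes must match")
    n = len(teacher_clean)
    # Forgetting term: triggered features should look like clean teacher features.
    forget = sum(1.0 - cosine(s, t) for s, t in zip(student_triggered, teacher_clean)) / n
    # Retention term: clean features should stay aligned with the teacher.
    retain = sum(1.0 - cosine(s, t) for s, t in zip(student_clean, teacher_clean)) / n
    return forget + lam * retain


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    collapsed = [[1.0, 1.0], [1.0, 1.0]]  # triggered features collapsed to one target
    print(unlearning_loss(teacher, teacher, teacher))
    print(unlearning_loss(collapsed, teacher, teacher))
```

With the toy teacher batch above, passing the teacher's own features for all three arguments gives a loss of 0, while triggered features that collapse to a single target vector such as [1.0, 1.0] leave a forgetting term of about 0.293 (that is, 1 - 1/sqrt(2)) that minimization would drive down. Keeping the retention term separate from the forgetting term is what lets `lam` trade off backdoor removal against preserving clean accuracy.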

Item Type:
Journal Article
Journal or Publication Title:
IEEE Transactions on Information Forensics and Security
Uncontrolled Keywords:
Research Output Funding/no_not_funded
Subjects:
no - not funded; Computer Networks and Communications; Safety, Risk, Reliability and Quality
ID Code:
234400
Deposited On:
18 Dec 2025 09:00
Refereed?:
Yes
Published?:
Published
Last Modified:
18 Dec 2025 09:00