Cross-Modal Contrastive Pre-training for Few-Shot Skeleton Action Recognition

Lu, Mingqi and Yang, Siyuan and Lu, Xiaobo and Liu, Jun (2024) Cross-Modal Contrastive Pre-training for Few-Shot Skeleton Action Recognition. IEEE Transactions on Circuits and Systems for Video Technology. ISSN 1051-8215

Text (Cross-Modal Contrastive)
Cross-Modal_Contrastive.pdf - Accepted Version
Available under License Creative Commons Attribution.

Abstract

This paper proposes a novel approach for few-shot skeleton action recognition that comprises two stages: cross-modal pre-training of a skeleton encoder, followed by fine-tuning of a cosine classifier on the support set. The pre-training and fine-tuning approach has been shown to handle few-shot tasks more effectively than more intricate meta-learning methods. However, its success relies on the availability of a large-scale training dataset, which is difficult to obtain. To address this challenge, we introduce a cross-modal pre-training framework based on Bootstrap Your Own Latent (BYOL), which treats skeleton sequences and their corresponding videos as augmented views of the same action in different modalities. Using a simple regression loss, the framework transfers robust, high-quality vision-language representations to the skeleton encoder. This allows the skeleton encoder to gain a comprehensive understanding of action sequences and to benefit from the prior knowledge of a vision-language pre-trained model. The representation transfer enhances the feature extraction capability of the skeleton encoder, compensating for the lack of large-scale skeleton datasets. Extensive experiments on the NTU RGB+D, NTU RGB+D 120, PKU-MMD, NW-UCLA, and MSR Action Pairs datasets demonstrate that the proposed approach achieves state-of-the-art performance for few-shot skeleton action recognition.
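
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the "simple regression loss" takes the standard BYOL form (mean squared error between L2-normalized vectors, which equals 2 - 2 * cosine similarity), and that the cosine classifier scores a feature by scaled cosine similarity to per-class weight vectors. The function names and the `scale` parameter are hypothetical, and plain Python lists stand in for encoder outputs.

```python
import math


def l2_normalize(v):
    """Return v scaled to unit length (unchanged if it is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def byol_regression_loss(prediction, target):
    """Standard BYOL loss: squared distance between L2-normalized vectors.

    Assumption: `prediction` comes from the skeleton branch and `target`
    from a video feature of a vision-language model; the abstract does
    not spell out these details.
    """
    p = l2_normalize(prediction)
    t = l2_normalize(target)
    cosine = sum(a * b for a, b in zip(p, t))
    return 2.0 - 2.0 * cosine


def cosine_classify(feature, class_weights, scale=10.0):
    """Score a feature against each class by scaled cosine similarity.

    `scale` is a hypothetical temperature; returns (predicted index, logits).
    """
    f = l2_normalize(feature)
    logits = []
    for w in class_weights:
        w_hat = l2_normalize(w)
        logits.append(scale * sum(a * b for a, b in zip(f, w_hat)))
    predicted = max(range(len(logits)), key=logits.__getitem__)
    return predicted, logits
```

Under these assumptions the loss is zero when the skeleton prediction and the video target point in the same direction, regardless of their magnitudes, which is why only the direction of the transferred representation matters.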

Item Type:
Journal Article
Journal or Publication Title:
IEEE Transactions on Circuits and Systems for Video Technology
Uncontrolled Keywords:
/dk/atira/pure/subjectarea/asjc/2200/2214
Subjects:
Media Technology; Electrical and Electronic Engineering
ID Code:
224207
Deposited By:
Deposited On:
09 Oct 2024 09:45
Refereed?:
Yes
Published?:
Published
Last Modified:
19 Nov 2024 02:08