LLMs are Good Action Recognizers

Qu, Haoxuan and Cai, Yujun and Liu, Jun (2024) LLMs are Good Action Recognizers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024. IEEE. ISBN 9798350353013

Qu_LLMs_are_Good_Action_Recognizers_CVPR_2024_paper.pdf - Accepted Version
Available under License Creative Commons Attribution.


Abstract

Skeleton-based action recognition has attracted substantial research attention. Recently, a variety of works have been proposed to build accurate skeleton-based action recognizers. Among them, some works use large model architectures as backbones of their recognizers to boost the skeleton data representation capability, while other works pre-train their recognizers on external data to enrich the knowledge. In this work, we observe that large language models, which have been extensively used in various natural language processing tasks, generally hold both large model architectures and rich implicit knowledge. Motivated by this, we propose a novel LLM-AR framework, in which we investigate treating the Large Language Model as an Action Recognizer. In our framework, we propose a linguistic projection process to project each input action signal (i.e., each skeleton sequence) into its "sentence format" (i.e., an "action sentence"). Moreover, we also incorporate several designs into our framework to further facilitate this linguistic projection process. Extensive experiments demonstrate the efficacy of our proposed framework.
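The abstract's core idea — projecting a skeleton sequence into an "action sentence" of discrete tokens — can be illustrated with a minimal sketch. This is not the paper's actual method: the codebook, the `<act_k>` token names, and the nearest-prototype quantization are all hypothetical stand-ins assumed here for illustration.

```python
import numpy as np

def project_to_action_sentence(skeleton_seq, codebook):
    """Map each skeleton frame to its nearest codebook entry (a 'word'),
    turning the sequence into an 'action sentence' of discrete tokens.

    skeleton_seq: (T, D) array of flattened per-frame joint coordinates.
    codebook:     (K, D) array of prototype vectors (hypothetical; in
                  practice such codewords would be learned, not fixed).
    """
    # Squared Euclidean distance from every frame to every codeword,
    # via broadcasting: (T, 1, D) - (1, K, D) -> (T, K, D) -> (T, K).
    dists = ((skeleton_seq[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    token_ids = dists.argmin(axis=1)  # nearest codeword per frame
    return " ".join(f"<act_{i}>" for i in token_ids)

# Toy usage: 4 frames of 6-dim poses against a 3-entry codebook.
rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 6))
codebook = rng.normal(size=(3, 6))
print(project_to_action_sentence(frames, codebook))
```

The resulting token string has a "sentence format" that an LLM can consume alongside ordinary text, which is the premise the framework builds on.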

Item Type:
Contribution in Book/Report/Proceedings
ID Code:
227548
Deposited By:
Deposited On:
28 Nov 2025 13:55
Refereed?:
Yes
Published?:
Published
Last Modified:
28 Nov 2025 23:15