Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation

Peng, Duo and Zhang, Zhengbo and Hu, Ping and Ke, Qiuhong and Yau, David K. Y. and Liu, Jun (2024) Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation. In: Computer Vision – ECCV 2024. Lecture Notes in Computer Science. Springer, Cham, pp. 342-360. ISBN 9783031726231

Full text not available from this repository.

Abstract

Category-Agnostic Pose Estimation (CAPE) aims to detect keypoints of an arbitrary unseen category in images, based on several provided examples of that category. This is a challenging task, as the limited data of unseen categories makes it difficult for models to generalize effectively. To address this challenge, previous methods typically train models on a set of predefined base categories with extensive annotations. In this work, we propose to harness the rich knowledge in an off-the-shelf text-to-image diffusion model to effectively address CAPE, without training on carefully prepared base categories. To this end, we propose a Prompt Pose Matching (PPM) framework, which learns pseudo prompts corresponding to the keypoints in the provided few-shot examples via the text-to-image diffusion model. These learned pseudo prompts capture the semantic information of keypoints, which can then be used to locate the same type of keypoints in other images. We also design a Category-shared Prompt Training (CPT) scheme to further boost PPM's performance. Extensive experiments demonstrate the efficacy of our approach.
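The core idea of the abstract — learn a prompt embedding from a few-shot support example so that matching it against image features localizes the same keypoint in a query image — can be illustrated with a minimal toy sketch. The actual PPM framework optimizes pseudo prompts through a frozen text-to-image diffusion model; here, purely as an assumption-laden stand-in, synthetic features replace the diffusion features and a least-squares fit replaces the prompt optimization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for diffusion image features: each of the H*W spatial
# locations carries a d-dimensional feature, and locations of the same
# semantic part share a common direction. (The real method extracts
# features from a frozen diffusion UNet; this is only illustrative.)
H, W, d = 8, 8, 16
part_direction = rng.normal(size=d)

def features_for(keypoint):
    """Synthesize an (H*W, d) feature map whose row at `keypoint`
    is dominated by the shared part direction."""
    F = rng.normal(scale=0.1, size=(H * W, d))
    F[keypoint] += part_direction
    return F

# --- "Training": learn a pseudo prompt from one support example ------
support_kp = 3 * W + 5                 # annotated keypoint location
F_support = features_for(support_kp)
target = np.zeros(H * W)
target[support_kp] = 1.0               # one-hot keypoint heatmap
# Least-squares pseudo prompt: make F_support @ prompt match the heatmap
# (a stand-in for gradient-based prompt learning).
prompt, *_ = np.linalg.lstsq(F_support, target, rcond=None)

# --- "Inference": locate the same keypoint in a query image ----------
query_kp = 6 * W + 2
F_query = features_for(query_kp)
scores = F_query @ prompt              # prompt-feature matching
pred = int(np.argmax(scores))
```

Because the learned prompt aligns with the shared part direction, the highest matching score in the query feature map falls at the query image's keypoint, mirroring how the learned pseudo prompts transfer keypoint semantics to unseen images.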

Item Type:
Contribution in Book/Report/Proceedings
ID Code:
235194
Deposited On:
29 Jan 2026 11:25
Refereed?:
Yes
Published?:
Published
Last Modified:
29 Jan 2026 11:25