Unleashing the Power of Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation

Peng, Duo and Zhang, Zhengbo and Hu, Ping and Ke, Qiuhong and Soh, De Wen and Bennamoun, Mohammed and Liu, Jun (2026) Unleashing the Power of Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 48 (5). pp. 5195-5211. ISSN 0162-8828

output.pdf - Accepted Version
Available under License Creative Commons Attribution.


Abstract

Category-Agnostic Pose Estimation (CAPE) aims to detect keypoints of unseen object categories in a few-shot setting, where the scarcity of labeled data makes generalization difficult. In this work, we propose Prompt Pose Matching (PPM), a novel framework that unleashes the power of off-the-shelf text-to-image diffusion models for CAPE. PPM learns pseudo prompts from few-shot examples via the text-to-image diffusion model. These learned pseudo prompts capture the semantic information of keypoints and can then be used to locate the same type of keypoints in new images. To give the prompts a representative initialization, we introduce a category-agnostic pre-training strategy that captures the foreground prior shared across categories and keypoints. To support reliable prompt pre-training, we propose a Foreground-Aware Region Aggregation (FARA) module that provides a robust and consistent supervision signal. Building on the foreground prior, we further propose a Foreground-Guided Attention Refinement (FGAR) module that reinforces cross-attention responses for accurate keypoint localization. For efficiency, a Prompt Ensemble Inference (PEI) scheme enables joint prediction of multiple keypoints. Unlike previous methods that rely heavily on annotated base-category data, our PPM framework can operate in a base-category-free setting while retaining strong performance. Code will be available at: https://github.com/DuoPeng-CVer/Prompt-Pose-Matching.
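
The idea of locating a keypoint from the cross-attention response of a learned prompt can be illustrated with a toy example. The sketch below is not the authors' implementation: the function names (cross_attention_map, locate_keypoint), the hand-written 2-D feature vectors, and the plain scaled dot-product attention are all assumptions made for illustration. A real system would presumably take the prompt embedding and image features from the diffusion model's cross-attention layers.

```python
import math


def cross_attention_map(prompt, features):
    """Toy cross-attention: softmax over all spatial positions of the
    scaled dot product between one prompt embedding and each feature vector.

    prompt   -- list of floats (hypothetical learned pseudo-prompt embedding)
    features -- H x W grid of feature vectors, each the same length as prompt
    """
    dim = len(prompt)
    scores = [
        [sum(p * f for p, f in zip(prompt, vec)) / math.sqrt(dim) for vec in row]
        for row in features
    ]
    # Subtract the maximum score before exponentiating for numerical stability.
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def locate_keypoint(attn):
    """Return the (row, col) position with the strongest attention response."""
    positions = [(r, c) for r, row in enumerate(attn) for c in range(len(row))]
    return max(positions, key=lambda rc: attn[rc[0]][rc[1]])


# A 2 x 3 feature grid; only position (1, 2) aligns strongly with the prompt.
features = [
    [[0.0, 1.0], [0.1, 0.9], [0.0, 1.0]],
    [[0.2, 0.8], [0.0, 1.0], [5.0, 0.0]],
]
prompt = [1.0, 0.0]

attn = cross_attention_map(prompt, features)
keypoint = locate_keypoint(attn)
```

Here keypoint is the grid position whose feature vector best matches the prompt. Predicting several keypoints jointly, as the PEI scheme in the abstract suggests, could be sketched by calling cross_attention_map once per prompt, although the abstract does not describe how the actual scheme works.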

Item Type:
Journal Article
Journal or Publication Title:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Uncontrolled Keywords:
/dk/atira/pure/subjectarea/asjc/1700/1702
Subjects:
Artificial Intelligence; Computational Theory and Mathematics; Software; Applied Mathematics; Computer Vision and Pattern Recognition
ID Code:
237512
Deposited By:
Deposited On:
20 May 2026 12:30
Refereed?:
Yes
Published?:
Published
Last Modified:
20 May 2026 21:35