Zou, Minghao and Liu, Shangkun and Zeng, Qingtian and Zhang, Xue and Yuan, Guiyuan and Hao, Xiaoshuai and Liu, Jun and Zhou, Wei (2026) Pose-Guided Multi-Cue Explicit Query Construction for Disambiguating Human-Object Interactions. IEEE Transactions on Circuits and Systems for Video Technology. ISSN 1051-8215
paper.pdf - Accepted Version
Available under License Creative Commons Attribution.
Abstract
Human-Object Interaction (HOI) detection remains challenging due to the semantic ambiguity of interaction categories and the limited discriminability of their feature representations. Existing approaches often improve recognition by employing sophisticated models or auxiliary textual annotations. While these solutions yield certain gains, they incur additional computational or annotation costs and struggle to capture intrinsic interaction regularities. To address these issues, we propose Pose-Guided Multi-Cue Explicit Query Construction (PM-EQC), a unified Transformer-based framework built upon collaborative modeling of appearance, spatial, and pose cues for discriminative interaction reasoning. At its core, the Collaborative Multi-Cue Query Constructor (CM-CQC) jointly models dependencies among visual cues to generate explicit query embeddings. CM-CQC further incorporates a hierarchical pose contextualization mechanism: global body configurations adaptively guide attention to local critical joints, yielding fine-grained pose embeddings and more precise interaction disambiguation. Owing to its modular design, PM-EQC integrates seamlessly with diverse backbones and benefits from their advances. Extensive experiments on the PhysLab, HICO-DET, and V-COCO datasets demonstrate that PM-EQC achieves state-of-the-art performance. The code is publicly available at https://github.com/ZMHSDUST/PM-EQC.
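The abstract describes two mechanisms: global body configurations attending to local critical joints to produce a fine-grained pose embedding, and the fusion of appearance, spatial, and pose cues into an explicit query embedding. The sketch below illustrates one plausible reading of that pipeline; it is not the authors' implementation, and all function names, dimensions, and the mean-pooled "global configuration" are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax over the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pose_guided_query(appearance, spatial, joints, d_out=64, seed=0):
    """Illustrative sketch of pose-guided multi-cue query construction.

    appearance : (d_a,) appearance-cue embedding
    spatial    : (d_s,) spatial-cue embedding (e.g. box-pair layout)
    joints     : (K, d_j) per-joint pose embeddings for K keypoints
    Returns a (d_out,) explicit query embedding.
    """
    rng = np.random.default_rng(seed)

    # Global body configuration: here simply mean-pooled joint
    # features (an assumption; the paper's mechanism is hierarchical).
    global_pose = joints.mean(axis=0)

    # Global configuration attends to local joints, so critical
    # joints receive higher weights in the pose embedding.
    scores = joints @ global_pose / np.sqrt(joints.shape[-1])
    weights = softmax(scores)                # (K,)
    fine_pose = weights @ joints             # (d_j,)

    # Fuse the three cues into one explicit query embedding via a
    # (randomly initialized, stand-in) linear projection.
    fused = np.concatenate([appearance, spatial, fine_pose])
    W = rng.standard_normal((fused.size, d_out)) / np.sqrt(d_out)
    return fused @ W

# Example: 17 COCO-style keypoints with 16-d joint features.
query = pose_guided_query(np.ones(32), np.ones(8), np.ones((17, 16)))
```

In an actual Transformer-based HOI detector, a query embedding like this would replace the learned, content-free queries of a DETR-style decoder, which is what makes the queries "explicit".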