Zhao, J. and Su, J. and Liu, J. and Wang, M. and Liu, Y. (2026) APNet: Accurate Prompting Network with Modality Guidance and Structural Awareness for RGB-D Semantic Segmentation. IEEE Signal Processing Letters, 33. pp. 2131-2135. ISSN 1070-9908
SPL.pdf - Accepted Version
Available under License Creative Commons Attribution.
Abstract
Parameter-efficient fine-tuning (PEFT) is promising for RGB-D semantic segmentation: lightweight prompters let a frozen backbone pre-trained on RGB data exploit the knowledge gained from large-scale RGB pretraining without full fine-tuning on limited paired RGB-D data. However, existing PEFT methods have two critical limitations. First, static modal fusion ignores the scene-dependent reliability of RGB and depth, leading to suboptimal performance in complex environments. Second, conventional prompts lack structural awareness, so the edge and texture details essential for dense prediction are lost. To address these problems, we propose the Accurate Prompting Network (APNet), which injects prompts precisely into a frozen backbone through two core modules. The Modality Effectiveness Guider (MEG) assesses modal reliability at the input level and dynamically generates scene-adaptive modality weights by capturing scene characteristics (e.g., illumination and texture richness). The Structural Awareness Prompter (SAP) injects directional structural priors into the prompts via multi-directional gating convolution, giving them explicit edge and texture information suited to semantic segmentation. Together, MEG and SAP form a precise prompting mechanism that allocates modal contributions dynamically and preserves structural detail, enabling efficient and accurate cross-modal knowledge transfer to the frozen backbone. Extensive experiments on NYUDv2 and SUN RGB-D show that APNet achieves state-of-the-art mIoU of 59.6% and 52.6%, respectively, with only 6.2M trainable parameters, a superior trade-off between segmentation accuracy and parameter efficiency.
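The abstract does not give the internals of MEG or SAP, so the following is only a toy NumPy sketch of the two ideas it names, under loudly stated assumptions. For the MEG-like part, we assume hand-crafted scene cues (mean brightness as an illumination proxy, mean gradient magnitude as a texture-richness proxy) mapped by a linear layer to two scores and normalized with a softmax into RGB/depth weights. For the SAP-like part, we assume fixed 3x3 difference kernels in four directions, each response passed through a sigmoid gate on its magnitude and added to a prompt map. All names (`scene_descriptors`, `modality_weights`, `fuse`, `structural_prompt`, `DIRECTIONAL_KERNELS`) and the `gain`/`thresh` parameters are hypothetical; this is not the authors' implementation, where the weights and kernels would presumably be learned.

```python
import numpy as np

# ---- Hypothetical MEG-style modality weighting (illustrative only, not APNet's code) ----

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def scene_descriptors(rgb, depth):
    """Assumed toy scene cues: RGB brightness, RGB gradient energy, depth gradient energy."""
    gray = rgb.astype(float).mean(axis=-1)  # (H, W) luminance proxy for illumination

    def grad_energy(x):
        gy, gx = np.gradient(x.astype(float))  # per-axis finite differences
        return float(np.mean(np.hypot(gx, gy)))  # mean gradient magnitude as texture proxy

    return np.array([gray.mean(), grad_energy(gray), grad_energy(depth)])

def modality_weights(desc, W, b):
    """Map descriptors to (RGB, depth) scores with a linear layer, then softmax to weights."""
    return softmax(W @ desc + b)

def fuse(f_rgb, f_depth, w):
    """Convex combination of the two modality feature maps using the scene weights."""
    return w[0] * f_rgb + w[1] * f_depth

# ---- Hypothetical SAP-style directional gating (illustrative only, not APNet's code) ----

DIRECTIONAL_KERNELS = [
    np.array([[0, 0, 0], [-1, 0, 1], [0, 0, 0]], dtype=float),  # horizontal difference
    np.array([[0, -1, 0], [0, 0, 0], [0, 1, 0]], dtype=float),  # vertical difference
    np.array([[-1, 0, 0], [0, 0, 0], [0, 0, 1]], dtype=float),  # main-diagonal difference
    np.array([[0, 0, -1], [0, 0, 0], [1, 0, 0]], dtype=float),  # anti-diagonal difference
]

def conv2d_same(x, k):
    """3x3 cross-correlation with zero padding; output has the same shape as x."""
    h, w = x.shape
    p = np.pad(x.astype(float), 1)  # zero padding of width 1 on every side
    out = np.zeros((h, w), dtype=float)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + h, j:j + w]
    return out

def structural_prompt(prompt, guide, gain=10.0, thresh=0.1):
    """Add gated directional edge responses of `guide` to a 2-D `prompt` map."""
    out = prompt.astype(float).copy()
    for k in DIRECTIONAL_KERNELS:
        r = conv2d_same(guide, k)
        gate = 1.0 / (1.0 + np.exp(-gain * (np.abs(r) - thresh)))  # sigmoid gate suppresses weak edges
        out += gate * r
    return out

# Usage on random toy data; W and b stand in for learned parameters.
rng = np.random.default_rng(0)
rgb, depth = rng.random((8, 8, 3)), rng.random((8, 8))
w = modality_weights(scene_descriptors(rgb, depth), rng.normal(size=(2, 3)), np.zeros(2))
fused = fuse(rgb.mean(axis=-1), depth, w)

step = np.zeros((8, 8))
step[:, 4:] = 1.0  # guide image containing a vertical step edge
sp = structural_prompt(np.zeros((8, 8)), step)
```

The softmax makes the two modality weights positive and sum to one, so the fusion is a convex combination whose balance shifts with the scene cues. In the directional part, flat regions of the guide produce zero response and leave the prompt unchanged, while pixels near the step edge receive an added structural signal.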