Better Integrating Vision and Semantics for Improving Few-shot Classification

Li, Zhuoling and Wang, Yong (2023) Better Integrating Vision and Semantics for Improving Few-shot Classification. In: MM '23: Proceedings of the 31st ACM International Conference on Multimedia. ACM, New York, pp. 4737-4746. ISBN 9798400701085

Full text not available from this repository.

Abstract

Some recent methods address few-shot classification by integrating visual and semantic prototypes. However, they usually ignore the difference in feature structure between the visual and semantic modalities, which leads to limited performance improvements. In this paper, we propose a novel method, called bimodal integrator (BMI), to better integrate visual and semantic prototypes. In BMI, we first construct a latent space for each modality via a variational autoencoder, and then align the semantic latent space to the visual latent space. Through this semantics-to-vision alignment, the semantic modality is mapped to the visual latent space and acquires the same feature structure as the visual modality. As a result, the visual and semantic prototypes can be better integrated. In addition, a data augmentation scheme based on multivariate Gaussian distributions and prompt engineering is designed to ensure the accuracy of modality alignment during training. Experimental results demonstrate that BMI significantly improves few-shot classification, enabling simple baselines to outperform state-of-the-art methods on the miniImageNet and tieredImageNet datasets.
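The core idea in the abstract, mapping semantic prototypes into the visual feature space before fusing them with visual prototypes, can be sketched minimally. The paper aligns VAE latent spaces; in this hypothetical simplification, a least-squares linear map stands in for that alignment, and the two prototype sets are then combined by a convex mixture. All array names, dimensions, and the mixing weight are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, d_vis, d_sem = 5, 16, 8

# Toy prototypes (hypothetical): visual prototypes would come from averaged
# support-set features; semantic prototypes from class-name text embeddings.
visual_protos = rng.normal(size=(n_classes, d_vis))
semantic_protos = rng.normal(size=(n_classes, d_sem))

# Semantics-to-vision alignment, simplified: fit a linear map W so that
# semantic prototypes land near their visual counterparts. (BMI instead
# aligns VAE latent spaces; this linear stand-in only illustrates the idea.)
W, *_ = np.linalg.lstsq(semantic_protos, visual_protos, rcond=None)
aligned_sem = semantic_protos @ W  # now lives in the visual space

# Integrate the two modalities, e.g. by a convex combination.
lam = 0.5  # mixing weight, a free hyperparameter in this sketch
integrated = lam * visual_protos + (1 - lam) * aligned_sem

print(integrated.shape)  # → (5, 16)
```

Once the semantic prototypes share the visual space's structure, a nearest-prototype classifier can score query features against `integrated` exactly as it would against purely visual prototypes.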

Item Type:
Contribution in Book/Report/Proceedings
ID Code:
227296
Deposited By:
Deposited On:
26 Nov 2025 11:10
Refereed?:
Yes
Published?:
Published
Last Modified:
26 Nov 2025 11:10