Better Integrating Vision and Semantics for Improving Few-shot Classification

Li, Zhuoling and Wang, Yong (2023) Better Integrating Vision and Semantics for Improving Few-shot Classification. In: MM '23: Proceedings of the 31st ACM International Conference on Multimedia. ACM, New York, pp. 4737-4746. ISBN 9798400701085

Full text not available from this repository.

Abstract

Some recent methods address few-shot classification by integrating visual and semantic prototypes. However, they usually ignore the difference in feature structure between the visual and semantic modalities, which leads to limited performance improvements. In this paper, we propose a novel method, called bimodal integrator (BMI), to better integrate visual and semantic prototypes. In BMI, we first construct a latent space for each modality via a variational autoencoder, and then align the semantic latent space to the visual latent space. Through this semantics-to-vision alignment, the semantic modality is mapped to the visual latent space and acquires the same feature structure as the visual modality. As a result, the visual and semantic prototypes can be better integrated. In addition, a data augmentation scheme based on multivariate Gaussian distributions and prompt engineering is designed to ensure the accuracy of modality alignment during training. Experimental results demonstrate that BMI significantly improves few-shot classification, enabling simple baselines to outperform state-of-the-art methods on the miniImageNet and tieredImageNet datasets.
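The core idea in the abstract, mapping semantic prototypes into the visual feature space before fusing them with visual prototypes, can be sketched minimally. The paper aligns VAE latent spaces; in this hypothetical simplification, a least-squares linear map stands in for that alignment, and the two prototype sets are then combined by a convex mixture. All array names, dimensions, and the mixing weight are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, d_vis, d_sem = 5, 16, 8

# Toy prototypes (hypothetical): visual prototypes would come from averaged
# support-set features; semantic prototypes from class-name text embeddings.
visual_protos = rng.normal(size=(n_classes, d_vis))
semantic_protos = rng.normal(size=(n_classes, d_sem))

# Semantics-to-vision alignment, simplified: fit a linear map W so that
# semantic prototypes land near their visual counterparts. (BMI instead
# aligns VAE latent spaces; this linear stand-in only illustrates the idea.)
W, *_ = np.linalg.lstsq(semantic_protos, visual_protos, rcond=None)
aligned_sem = semantic_protos @ W  # now lives in the visual space

# Integrate the two modalities, e.g. by a convex combination.
lam = 0.5  # mixing weight, a free hyperparameter in this sketch
integrated = lam * visual_protos + (1 - lam) * aligned_sem

print(integrated.shape)  # → (5, 16)
```

Once the semantic prototypes share the visual space's structure, a nearest-prototype classifier can score query features against `integrated` exactly as it would against purely visual prototypes.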

Item Type:
Contribution in Book/Report/Proceedings
ID Code:
227296
Deposited By:
Deposited On:
26 Nov 2025 11:10
Refereed?:
Yes
Published?:
Published
Last Modified:
26 Nov 2025 11:10