LLaFS++: Few-Shot Image Segmentation With Large Language Models

Zhu, Lanyun and Chen, Tianrun and Ji, Deyi and Xu, Peng and Ye, Jieping and Liu, Jun (2025) LLaFS++: Few-Shot Image Segmentation With Large Language Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47 (9). pp. 7715-7732. ISSN 0162-8828

Text (llafs_plus_plus)
llafs_plus_plus.pdf - Accepted Version
Available under License Creative Commons Attribution.

Abstract

Despite rapid advances in few-shot segmentation (FSS), most existing methods in this domain are hampered by their reliance on the limited and biased information available from only a small number of labeled samples. This limitation inherently restricts the performance they can achieve. To address this issue, this paper proposes a pioneering framework named LLaFS++, which, for the first time, applies large language models (LLMs) to FSS and achieves notable success. LLaFS++ leverages the extensive prior knowledge embedded in LLMs to guide the segmentation process, effectively compensating for the limited information contained in the few-shot labeled samples and thereby achieving superior results. To enhance the effectiveness of text-based LLMs in FSS scenarios, we present several innovative, task-specific designs within the LLaFS++ framework. Specifically, we introduce an input instruction that allows the LLM to directly produce segmentation results represented as polygons, and we propose a region-attribute corresponding table to simulate the human visual system and provide multi-modal guidance. We also synthesize pseudo samples and use curriculum learning during pretraining to augment the data and achieve better optimization, and we propose a novel inference method to mitigate potential oversegmentation hallucinations caused by the regional guidance information. Incorporating these designs, LLaFS++ constitutes an effective framework that achieves state-of-the-art results on multiple datasets, including PASCAL-5ⁱ, COCO-20ⁱ, and FSS-1000. Our superior performance showcases the remarkable potential of applying LLMs to few-shot vision tasks.
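The abstract states that the LLM emits segmentation results as polygons. As a rough illustration of what consuming such output could involve, the sketch below rasterizes a list of polygon vertices into a binary mask and scores it against a reference mask with intersection-over-union. This is not the authors' implementation: the vertex format, the pixel-centre convention, and the helper names `point_in_polygon`, `polygon_to_mask`, and `iou` are all assumptions made for this example, since this record does not describe the paper's actual output format.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates; an assumed format


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd (ray casting) test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(polygon: List[Point], height: int, width: int) -> List[List[int]]:
    """Rasterize a polygon into a binary mask by testing each pixel centre."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


def iou(pred: List[List[int]], target: List[List[int]]) -> float:
    """Intersection-over-union of two binary masks of equal size."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    union = sum(p | t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    return inter / union if union else 1.0


# Hypothetical polygon, as if parsed from a model's text output.
predicted_polygon = [(1, 1), (4, 1), (4, 3), (1, 3)]
mask = polygon_to_mask(predicted_polygon, height=4, width=5)
```

Testing pixel centres keeps the sketch dependency-free. In practice a library rasterizer would be the more usual choice for full-size images.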

Item Type:

Journal Article

Journal or Publication Title:

IEEE Transactions on Pattern Analysis and Machine Intelligence

Uncontrolled Keywords:

/dk/atira/pure/subjectarea/asjc/1700/1702

Subjects:

artificial intelligence; computational theory and mathematics; software; applied mathematics; computer vision and pattern recognition

Departments:

Faculty of Science and Technology > School of Computing & Communications

ID Code:

229990

Deposited By:

ep_importer_pure

Deposited On:

11 Jun 2025 14:45

Refereed?:

Yes

Published?:

Published

Last Modified:

11 Dec 2025 08:59

URI:

https://eprints.lancs.ac.uk/id/eprint/229990