Jamali, A. and Roy, S.K. and Lu, B. and Beni, L.H. and Kakhani, N. and Ghamisi, P. (2025) MSHCCT: A Multiscale Compact Convolutional Network for High-Resolution Aerial Scene Classification. IEEE Geoscience and Remote Sensing Letters, 22: 5001205. ISSN 1545-598X
Abstract
The growing popularity of vision transformers (ViTs) in remote sensing image classification is due to their ability to effectively capture long-range dependencies. However, their high computational cost and memory footprint limit their applicability, particularly for small-scale datasets and resource-constrained environments. To address these challenges, we propose the multiscale multihead compact convolutional transformer (MSHCCT), a lightweight yet powerful model that integrates convolutional tokenization with small-scale ViTs to enhance multiscale feature representation while maintaining computational efficiency. Despite a modest increase in parameters and training time, MSHCCT achieves superior classification accuracy and robustness on high-resolution aerial scenes. Importantly, our approach eliminates the need for model pretraining, additional datasets, or multisensor data fusion, ensuring a computationally efficient and practical solution for remote sensing applications. The code will be made publicly available at https://github.com/aj1365/MSHCCT
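As a rough illustration of the convolutional tokenization idea mentioned in the abstract, the sketch below counts how many tokens parallel convolutional branches with different kernel sizes would pass to a transformer encoder. The kernel sizes (3, 5, 7), the pooling settings, the 64-pixel input, and all function names are assumptions chosen for illustration; they are not taken from the paper or its repository.

```python
def conv_out(size, kernel, stride, padding):
    """Output length along one spatial axis for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def tokens_from_conv_tokenizer(image_size, kernel, pool_kernel=3, pool_stride=2):
    """Token count for one hypothetical branch: stride-1 conv, then max pooling.

    Assumes a square input and 'same'-style padding (kernel // 2) for odd kernels.
    """
    after_conv = conv_out(image_size, kernel, stride=1, padding=kernel // 2)
    after_pool = conv_out(after_conv, pool_kernel, pool_stride, pool_kernel // 2)
    return after_pool * after_pool  # one token per remaining spatial position


def multiscale_token_counts(image_size, kernels=(3, 5, 7)):
    """Token count per branch when each branch uses a different kernel size."""
    return {k: tokens_from_conv_tokenizer(image_size, k) for k in kernels}


def attention_pairs(num_tokens):
    """Number of token pairs scored by full self-attention (quadratic in tokens)."""
    return num_tokens * num_tokens


counts = multiscale_token_counts(64)
print(counts)
print(attention_pairs(counts[3]))
```

Under these assumptions every branch yields the same number of tokens, because the padding compensates for the larger kernels. Branches with different receptive fields could therefore be combined position by position, and the pooling step keeps the quadratic attention cost far below what one token per pixel would require.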