Smith, Abraham George and Lamprinidis, Sotiris and Seethepalli, Anand and York, Larry M and Han, Eusun and Möhl, Patrick and Boulata, Kyriaki and Thorup-Kristensen, Kristian and Petersen, Jens (2026) A systematic comparison of transformers and ConvNets for root segmentation across nine datasets. Plant Methods. ISSN 1746-4811
Abstract
Root segmentation is a fundamental yet challenging task in image-based plant phenotyping. Accurate segmentation is a prerequisite for extracting root traits relevant to plant physiology, breeding, and agronomy. While U-Net and other convolutional neural network (ConvNet) architectures have been applied to root segmentation, no systematic comparison of multiple Transformer and ConvNet architectures has been conducted across diverse root imaging conditions. We evaluated 21 segmentation architectures across nine diverse root image datasets, training 1511 models to assess all combinations of architecture, dataset, pre-training strategy, and learning rate, producing over 3 million segmentations for evaluation. Transformer-based models significantly outperformed ConvNets on Dice score (mean Dice 0.679 vs 0.659; [Formula: see text]). Root-diameter and root-length correlations were also higher for Transformers, but the differences were not statistically significant ([Formula: see text] and [Formula: see text], respectively). Pre-training significantly improved mean Dice from 0.623 to 0.666 ([Formula: see text]), with Transformers benefiting more from pre-training than ConvNets (Dice improvement +0.072 vs +0.021; [Formula: see text]), supporting the hypothesis that fine-tuned Transformers transfer more effectively across large domain gaps. MobileSAM achieved the highest Dice score (0.693) while maintaining computational efficiency. Both architecture families underestimated thin root length compared to manual annotations. Dataset choice explained 70.9% of performance variance, far exceeding model architecture (6.7%). Transformer architectures significantly outperform ConvNets for root segmentation accuracy, and pre-training significantly improves performance, particularly for Transformers. Pre-trained MobileSAM offers the best accuracy at competitive computational cost. Dataset choice dominates performance variance, suggesting practitioners should prioritize data curation over architecture selection.
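The Dice score reported throughout the abstract is the standard overlap metric for binary segmentation masks, 2|A ∩ B| / (|A| + |B|). The sketch below is a minimal illustration of how that metric is typically computed for a predicted root mask against a manual annotation; it assumes binary NumPy arrays and is not taken from the paper's code.

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks: 2*|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    # eps guards against division by zero when both masks are empty.
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Illustrative usage with synthetic masks (stand-ins for real annotations).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    annotation = rng.random((256, 256)) > 0.9   # hypothetical labelled root mask
    prediction = annotation.copy()
    prediction[:10] = False                     # simulate roots missed near the border
    print(f"Dice: {dice_score(prediction, annotation):.3f}")
```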