Gatineau, Guillaume and De Gruttola, Michele and Hind, Karen and Davies, Madeleine and Krueger, Diane and Binkley, Neil and Kužma, Martin and Payer, Juraj and Guglielmi, Giuseppe and Fahrleitner-Pammer, Astrid and Chun, Kwang and Bencardino, Jenny and Sherman, Alain and Jones, Kai and Hans, Didier (2026) Validation of a deep learning model for bone fragility detection from conventional radiographs: an international cohort study. eClinicalMedicine, 95: 103974. ISSN 2589-5370
Full text not available from this repository.
Abstract
Background: Artificial Intelligence (AI)-based opportunistic risk stratification solutions can help to counter rising fragility fracture rates. Existing tools estimate bone mineral density (BMD) alone, while the present study incorporates trabecular bone score (TBS), a surrogate of bone microarchitecture, more closely mirroring fracture pathophysiology. We aimed to evaluate the performance of an AI tool that estimates bone fragility directly from standard radiographs to identify individuals at highest risk of fracture.
Methods: This retrospective, multinational cohort study included 18,858 paired radiographs and lumbar spine dual-energy X-ray absorptiometry (DXA) scans from adult patients (aged at least 20 years) from three clinical sites in Europe and two sites in the United States. Routine clinical radiographs of the spine, abdomen, chest, or pelvis acquired in the anteroposterior or posteroanterior view and including visualisation of the lumbar spine were included. Eligible radiographs had an in-plane spatial resolution of ≤0.2 mm per pixel, independent of vendor, had a corresponding DXA examination within 6 months, and included at least two lumbar vertebrae (L1–L4). Model training used a composite Bone Fragility Index, combining TBS and BMD. Training, internal validation and testing were performed on two European sites (n = 10,692; Italy and Austria), and external validation involved three sites with ethnically diverse populations (n = 7,079): Slovakia, US site 1 (Wisconsin) and US site 2 (New York). Model performance for identifying very high bone fragility (characterised by degraded TBS and osteoporosis) prioritised specificity (in line with the intended clinical use, which favours low false-positive rates) and was evaluated with accuracy, sensitivity, specificity, AUC and precision.
Findings: Between Jan 1, 2010 and Dec 31, 2023, 18,858 paired radiographs and lumbar spine DXA scans from 11,138 participants across five international sites were retrospectively aggregated. Internal testing on two European sites demonstrated an accuracy of 0.86 (95% CI: 0.78, 0.92), specificity of 0.93 (0.85, 0.99), and sensitivity of 0.53 (0.41, 0.67). External testing on three sites demonstrated consistently high specificity: 0.88 (0.77, 0.99) in the European cohort, 0.94 (0.91, 0.97) in the American White dataset, and 0.96 (0.81, 0.99) in the American Non-White dataset. External sensitivity ranged from 0.53 (0.41, 0.67) to 0.64 (0.50, 0.88).
Interpretation: The proposed approach provides rapid identification of individuals with very high bone fragility from routine radiographs. Its specificity across diverse populations supports clinical use for opportunistic osteoporosis screening in real-world settings. Future work should assess the model’s performance in more sex-balanced cohorts without prior DXA assessment, and evaluate its ability to predict incident fractures.
Funding: The Swiss National Science Foundation, the Foundation of the Orthopaedic Hospital of the Vaudois University Hospital (Lausanne, Switzerland), and Medimaps Group SA, Switzerland.
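For readers less familiar with the threshold-based metrics named in the Methods (accuracy, sensitivity, specificity and precision), the following is a minimal sketch of how they are conventionally derived from confusion-matrix counts. It is not the study's evaluation code: the `ConfusionCounts` class, the `classification_metrics` function and the example counts are hypothetical and chosen only for illustration. AUC is omitted because it requires continuous model scores rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary label such as 'very high bone fragility' (hypothetical container)."""
    tp: int  # positives correctly flagged
    fp: int  # negatives wrongly flagged
    tn: int  # negatives correctly left unflagged
    fn: int  # positives missed


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard definitions of the four count-based metrics."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true-positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true-negative rate
        "precision": _ratio(c.tp, c.tp + c.fp),    # positive predictive value
    }


# Illustrative counts only; these are NOT data from the study.
example = ConfusionCounts(tp=30, fp=10, tn=140, fn=20)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A tool tuned for high specificity, as described above, keeps `fp` small relative to `tn`, usually at the cost of a larger `fn` and therefore lower sensitivity.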