Zheng, Zhihan and Tan, Yu-an and Chen, Zihan and Lu, Zheng and Meng, Weizhi and Wang, Shuo (2026) Empowering IoT Security: Automated Identification of Standard Library Functions in RTOS Firmware with LLM and RAG. IEEE Internet of Things Journal. ISSN 2327-4662
IoT-59503-2025_Proof_hi.pdf - Accepted Version
Available under License Creative Commons Attribution.
Abstract
Reverse engineering embedded firmware for Real-Time Operating Systems (RTOS) is a formidable challenge in Internet of Things (IoT) security. Firmware is commonly stripped of symbolic information to optimize performance and storage, which makes identifying standard library functions exceptionally difficult and creates a significant bottleneck for functional analysis and vulnerability discovery. This paper introduces a novel, automated method for identifying these functions in RTOS firmware by leveraging Large Language Models (LLMs). Our approach operates directly on raw binary code, requires no symbolic or debugging information, and generalizes across diverse hardware architectures and compilers. At its core, the method combines Retrieval-Augmented Generation (RAG), which enhances identification accuracy, with a newly designed Adaptive Iterative Screening Algorithm (AISA), which reduces token costs by prioritizing candidate functions according to a weighted score of call frequency, call depth, and address proximity. We validated our method through rigorous experimentation on Zephyr RTOS firmware spanning nine architectures (e.g., ARM, MIPS, RISC-V), where it achieves 90.59% accuracy in identifying standard library function names. Moreover, its application to commercial IoT firmware confirms its high efficiency: it identifies functions significantly faster and more economically than traditional heuristic techniques. This work contributes a powerful, general-purpose, and cost-effective solution for automated firmware analysis.
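The abstract says only that AISA ranks candidate functions by a weighted score of call frequency, call depth, and address proximity; it does not give the formula. The following is a minimal sketch of one way such a score could be computed, not the authors' implementation. The `Candidate` type, the weights, the min-max style normalization, the choice to favor deeper functions, and the `anchor_address` (taken here to be the address of an already identified library function) are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one unnamed function in a stripped image."""
    address: int      # entry address of the function
    call_count: int   # number of call sites targeting the function
    call_depth: int   # distance from the entry point in the call graph


def priority_scores(candidates, anchor_address, weights=(0.5, 0.3, 0.2)):
    """Return {address: score in [0, 1]}; a higher score means analyze earlier.

    The weights and the normalization are assumptions made for illustration.
    """
    if not candidates:
        return {}
    w_freq, w_depth, w_addr = weights
    # Normalize each feature by its maximum; "or 1" avoids division by zero.
    max_calls = max(c.call_count for c in candidates) or 1
    max_depth = max(c.call_depth for c in candidates) or 1
    max_dist = max(abs(c.address - anchor_address) for c in candidates) or 1

    scores = {}
    for c in candidates:
        freq = c.call_count / max_calls
        # Assumption: library routines tend to sit deep in the call graph.
        depth = c.call_depth / max_depth
        # Assumption: library code is laid out near other library code.
        proximity = 1.0 - abs(c.address - anchor_address) / max_dist
        scores[c.address] = w_freq * freq + w_depth * depth + w_addr * proximity
    return scores


def rank(candidates, anchor_address, weights=(0.5, 0.3, 0.2)):
    """Sort candidates so the highest-scoring ones are queried first."""
    scores = priority_scores(candidates, anchor_address, weights)
    return sorted(candidates, key=lambda c: scores[c.address], reverse=True)
```

Under these assumptions, a frequently called, deep function located next to a known library routine would be sent to the LLM before a rarely called, shallow one, so the token budget is spent first on the most likely standard library functions.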