Wang, Xiqi and Zheng, Shunyi and Zhang, Ce and Li, Rui and Gui, Li (2021) R-YOLO: A Real-Time Text Detector for Natural Scenes with Arbitrary Rotation. Sensors, 21 (3): 888. pp. 1-20. ISSN 1424-8220
sensors_1063450.docx - Accepted Version
Available under License Creative Commons Attribution.
Abstract
Accurate and efficient text detection in natural scenes is a fundamental yet challenging task in computer vision, especially when dealing with arbitrarily oriented text. Currently, the majority of text detection methods are designed to identify horizontal or near-horizontal text, which cannot satisfy the practical requirements of real-time detection in image streams or videos. To address this gap, we propose Rotational You Only Look Once (R-YOLO), a robust real-time convolutional neural network (CNN) model for detecting arbitrarily oriented text in natural scene images. First, rotated anchor boxes with angle information are used to represent text bounding boxes at different orientations. Second, features at different scales are extracted from the input image to predict the probability, confidence, and inclined bounding boxes of the text. Finally, Rotational Distance Intersection over Union Non-Maximum Suppression (RDIoU-NMS) is proposed to eliminate redundant detections and retain the most accurate results. Benchmark comparisons were conducted on four popular datasets: ICDAR2015, ICDAR2013, MSRA-TD500, and HRSC2016. For example, the proposed R-YOLO method obtains an F-measure of 82.3% at 62.5 fps at 720p resolution on the ICDAR2015 dataset. The results demonstrate that the proposed R-YOLO method can significantly outperform state-of-the-art methods in terms of detection efficiency and accuracy. The code will be released at: https://github.com/wxq-888/R-YOLO.
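To make the rotated-box representation and the suppression step more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a box is the tuple (cx, cy, w, h, angle in radians) and scores a pair of boxes with a DIoU-style measure: the rotated IoU minus the squared centre distance divided by the squared diagonal of an enclosing box. The abstract does not give the exact RDIoU formula, so the axis-aligned enclosing box, the 0.5 threshold, and all function names here (`corners`, `clip`, `rdiou`, `rdiou_nms`) are illustrative assumptions.

```python
import math

def corners(box):
    """Corner points (counter-clockwise) of a box (cx, cy, w, h, angle_rad)."""
    cx, cy, w, h, a = box
    c, s = math.cos(a), math.sin(a)
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in local]

def area(poly):
    """Shoelace area of a polygon; 0 for an empty point list."""
    pairs = zip(poly, poly[1:] + poly[:1])
    return 0.5 * abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs))

def clip(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` by a convex, CCW `clipper`."""
    out = subject
    for (ax, ay), (bx, by) in zip(clipper, clipper[1:] + clipper[:1]):
        inp, out = out, []
        if not inp:
            break
        for p, q in zip(inp, inp[1:] + inp[:1]):
            # Signed side of p and q relative to the clip edge a -> b.
            sp = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
            sq = (bx - ax) * (q[1] - ay) - (by - ay) * (q[0] - ax)
            if sp >= 0:
                out.append(p)
            if sp * sq < 0:  # the segment p -> q crosses the clip edge
                t = sp / (sp - sq)
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def rdiou(a, b):
    """Rotated IoU minus a normalised centre-distance penalty (assumed form)."""
    pa, pb = corners(a), corners(b)
    inter = area(clip(pa, pb))
    union = a[2] * a[3] + b[2] * b[3] - inter
    xs, ys = zip(*(pa + pb))
    # Assumption: axis-aligned enclosing box of all eight corners.
    diag2 = (max(xs) - min(xs)) ** 2 + (max(ys) - min(ys)) ** 2
    dist2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return inter / union - dist2 / diag2

def rdiou_nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep a box only if its RDIoU with every kept box <= thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(rdiou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep

boxes = [(50, 50, 100, 20, 0.5), (52, 51, 100, 20, 0.5), (200, 200, 100, 20, -0.3)]
print(rdiou_nms(boxes, [0.9, 0.8, 0.7]))  # → [0, 2]
```

In this sketch the second box nearly coincides with the first and is suppressed, while the distant third box is kept. The distance penalty lowers the score of overlapping boxes whose centres are far apart, so such boxes are less likely to be suppressed than under plain IoU.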