De Souza Ramos, Washington and Soriano Marcolino, Leandro and Nascimento, Erickson R. (2024) Text-driven Video Acceleration. In: 37th Conference on Graphics, Patterns and Images (SIBGRAPI) : Workshop of Theses and Dissertations (WTD). UNSPECIFIED, BRA, pp. 35-41.
WashingtonRamos_2024_WTD_SIBGRAPI_CameraReady.pdf - Accepted Version
Available under License Creative Commons Attribution.
Abstract
From the dawn of the digital revolution until today, the volume of data has grown exponentially, especially images and videos. Smartphones and wearable devices with large storage capacity and long battery life enable continuous recording and massive uploads to social media. This rapid increase in visual data, combined with users' limited time, demands methods that produce shorter videos conveying the same information. Semantic Fast-Forwarding reduces viewing time by adaptively accelerating a video and slowing down during relevant segments. However, current methods require predefined visual concepts or user supervision, which is costly and time-consuming. This work explores using textual data to create text-driven fast-forwarding methods that generate semantically meaningful videos without explicit user input. Our proposed approaches outperform baselines, achieving F1 Score improvements of up to 12.8 percentage points over the best competitors. Comprehensive user and ablation studies, along with quantitative and qualitative evaluations, confirm their superiority. Visual results are available at https://youtu.be/cOYqumJQOY and https://youtu.be/u6ODTv7-9C4 .
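As a rough illustration of the adaptive acceleration idea described in the abstract, the sketch below selects frames with a skip that shrinks as relevance grows. It is not the authors' text-driven method: it assumes per-frame relevance scores in [0, 1] have already been computed by some other means, and the names `select_frames`, `relevance`, `min_skip`, and `max_skip` are hypothetical.

```python
def select_frames(relevance, min_skip=1, max_skip=10):
    """Return indices of frames to keep, jumping farther when relevance is low.

    relevance: list of per-frame scores in [0, 1] (assumed to be given).
    min_skip / max_skip: hypothetical bounds on the jump between kept frames.
    """
    selected = []
    i = 0
    while i < len(relevance):
        selected.append(i)
        # Clamp the score so out-of-range inputs cannot produce invalid skips.
        score = min(max(relevance[i], 0.0), 1.0)
        # High relevance gives a small skip (slow playback);
        # low relevance gives a large skip (fast playback).
        skip = round(max_skip - score * (max_skip - min_skip))
        i += max(skip, 1)
    return selected


if __name__ == "__main__":
    # 20 irrelevant frames, 5 relevant frames, 20 irrelevant frames.
    scores = [0.0] * 20 + [1.0] * 5 + [0.0] * 20
    print(select_frames(scores))
```

With these settings, the irrelevant stretches are sampled once every ten frames while every frame of the relevant stretch is kept, which is the behaviour the abstract attributes to semantic fast-forwarding in general.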