Optimizing CNN Inference Speed over Big Social Data through Efficient Model Parallelism for Sustainable Web of Things

Hu, Yuhao and Xu, Xiaolong and Bilal, Muhammad and Zhong, Weiyi and Liu, Yuwen and Kou, Huaizhen and Kong, Lingzhen (2024) Optimizing CNN Inference Speed over Big Social Data through Efficient Model Parallelism for Sustainable Web of Things. Journal of Parallel and Distributed Computing, 192: 104927. ISSN 0743-7315

Text (Optimizing CNN inference speed over big social data through efficient model parallelism for sustainable web of things-final)
Optimizing_CNN_inference_speed_over_big_social_data_through_efficient_model_parallelism_for_sustainable_web_of_things-final.pdf - Accepted Version
Available under License Creative Commons Attribution.


Abstract

The rapid development of artificial intelligence and networking technologies has catalyzed the popularity of intelligent services based on deep learning in recent years, which in turn fosters the advancement of the Web of Things (WoT). Big social data (BSD) plays an important role in the processing of intelligent services in WoT. However, intelligent BSD services are computationally intensive and require ultra-low latency, and end or edge devices with limited computing power cannot meet those latency requirements on their own. Distributed inference of deep neural networks (DNNs), which allocates the computing load of a DNN across several devices, is considered a feasible solution. In this work, an efficient model parallelism method that couples convolution layer (Conv) splitting with resource allocation is proposed. First, given a random computing resource allocation strategy, the Conv split decision is made through mathematical analysis to realize parallel inference of convolutional neural networks (CNNs). Next, deep reinforcement learning is used to learn the optimal computing resource allocation strategy that maximizes the resource utilization rate and minimizes CNN inference latency. Finally, simulation results show that our approach outperforms the baselines and is applicable to BSD services in WoT under high workloads.
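The Conv split described in the abstract relies on the fact that a convolution's output can be computed piecewise: each device processes a slice of the input (padded with a small "halo" of overlapping rows) and the partial outputs tile the full result exactly. The following is a minimal sketch of that idea, assuming row-wise spatial splitting with stride 1 and a "valid" convolution; the function names and partitioning scheme are illustrative, not taken from the paper.

```python
import numpy as np

def conv2d(x, k):
    """Naive single-channel 'valid' 2D convolution (cross-correlation), stride 1."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def split_conv_rows(x, k, n_parts):
    """Emulate model-parallel Conv inference: split the input along rows,
    giving each partition (kh - 1) extra halo rows so its valid-convolution
    output tiles the full output with no seam."""
    kh = k.shape[0]
    H_out = x.shape[0] - kh + 1
    # Partition the OUTPUT rows; each device would compute one slice.
    bounds = np.linspace(0, H_out, n_parts + 1).astype(int)
    pieces = []
    for s, e in zip(bounds[:-1], bounds[1:]):
        # Output rows s..e-1 depend on input rows s..(e - 1 + kh - 1).
        pieces.append(conv2d(x[s:e + kh - 1], k))
    return np.vstack(pieces)

# Sanity check: the tiled result matches the monolithic convolution.
x = np.random.default_rng(0).standard_normal((12, 10))
k = np.random.default_rng(1).standard_normal((3, 3))
assert np.allclose(split_conv_rows(x, k, 3), conv2d(x, k))
```

In a real deployment each `pieces` entry would run on a different edge device, and the per-device slice sizes would follow the resource allocation strategy (here chosen by deep reinforcement learning) rather than an even split.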

Item Type:
Journal Article
Journal or Publication Title:
Journal of Parallel and Distributed Computing
Uncontrolled Keywords:
Research Output Funding / yes - externally funded
Subjects:
artificial intelligence; hardware and architecture; theoretical computer science; software; computer networks and communications
ID Code:
223228
Deposited By:
Deposited On:
19 Aug 2024 13:25
Refereed?:
Yes
Published?:
Published
Last Modified:
12 Nov 2024 01:38