Integrating Clustering and Regression for Workload Estimation in the Cloud

Yu, Yongjia and Jindal, Vasu and Yen, I-Ling and Bastani, Farokh and Xu, Jie and Garraghan, Peter (2020) Integrating Clustering and Regression for Workload Estimation in the Cloud. Concurrency and Computation: Practice and Experience, 32 (23). ISSN 1532-0626

Cloud_Workload_Prediction.pdf - Accepted Version
Available under License Creative Commons Attribution-NonCommercial.


Workload prediction has been widely researched in the literature. However, existing techniques are per-job based and are useful mainly for service-like tasks whose workloads exhibit seasonality and trend. Cloud jobs have many different workload patterns, and some do not exhibit recurring patterns. We consider job-pool-based workload estimation, which analyzes the characteristics of existing tasks' workloads to estimate the workloads of currently running tasks. We first cluster existing tasks based on their workloads. For a new task J, we collect the initial workload of J, determine which cluster J may belong to, and then use that cluster's characteristics to estimate J's workload. The algorithm is evaluated experimentally on the Google dataset, and its effectiveness is confirmed. However, the workload patterns of some tasks do have seasonality and trend, and for these tasks conventional per-job-based regression methods may yield better prediction results. Also, in some cases, new tasks may not follow the workload patterns of existing tasks in the pool. We therefore develop an integrated scheme that combines clustering and regression and uses the best of both for workload prediction. An experimental study shows that the combined approach can further improve the accuracy of workload prediction.
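The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the regression model, or how the integrated scheme chooses between them. The sketch assumes plain k-means with a deterministic farthest-first initialisation, a least-squares straight line as the per-job regression, and a hypothetical `max_fit_error` threshold that falls back to regression when the new task's initial workload does not match any cluster well. All function names are illustrative.

```python
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length workload series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(series, k, iters=20):
    """Cluster workload series; returns the k centroids (assumed algorithm)."""
    # Farthest-first initialisation keeps the sketch deterministic.
    centers = [series[0]]
    while len(centers) < k:
        centers.append(max(series, key=lambda s: min(sq_dist(s, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for s in series:
            nearest = min(range(k), key=lambda i: sq_dist(s, centers[i]))
            groups[nearest].append(s)
        # Element-wise mean of each group; an empty group keeps its old centre.
        centers = [[mean(col) for col in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def estimate_by_cluster(centers, prefix):
    """Match the initial workload to a centroid; return its tail and the fit error."""
    n = len(prefix)
    best = min(centers, key=lambda c: sq_dist(c[:n], prefix))
    return best[n:], sq_dist(best[:n], prefix) / n


def estimate_by_regression(prefix, horizon):
    """Extrapolate a least-squares line fitted to the prefix (needs >= 2 points)."""
    n = len(prefix)
    x_bar, y_bar = (n - 1) / 2, mean(prefix)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    slope = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(prefix)) / sxx
    intercept = y_bar - slope * x_bar
    return [intercept + slope * (n + h) for h in range(horizon)]


def estimate(centers, prefix, horizon, max_fit_error=0.01):
    """Assumed selection rule: use the cluster if it fits, else regression."""
    tail, err = estimate_by_cluster(centers, prefix)
    if err <= max_fit_error:
        return "cluster", tail[:horizon]
    return "regression", estimate_by_regression(prefix, horizon)


# Toy pool of completed tasks (made-up CPU utilisation values, not Google data).
pool = [
    [0.20, 0.20, 0.20, 0.20, 0.20, 0.20],
    [0.22, 0.18, 0.20, 0.22, 0.18, 0.20],
    [0.80, 0.80, 0.80, 0.80, 0.80, 0.80],
    [0.78, 0.82, 0.80, 0.78, 0.82, 0.80],
]
centers = kmeans(pool, k=2)
print(estimate(centers, [0.21, 0.19, 0.20], horizon=3))  # resembles the low cluster
print(estimate(centers, [0.30, 0.40, 0.50], horizon=3))  # trending, matches no cluster
```

In this toy setup the first task's initial workload matches a centroid closely, so the estimate is read from that centroid's remaining values, while the second task shows a trend absent from the pool and is handled by the regression fallback. A real system would need a principled selection rule and a seasonality-aware regression model; the threshold here only stands in for that choice.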

Item Type:
Journal Article
Journal or Publication Title:
Concurrency and Computation: Practice and Experience
Additional Information:
This is the peer reviewed version of the following article: Yu, Y, Jindal, V, Yen, I‐L, Bastani, F, Xu, J, Garraghan, P. Integrating clustering and regression for workload estimation in the cloud. Concurrency Computat Pract Exper. 2020; e5931, which has been published in final form. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for self-archiving.
Deposited On:
12 Jun 2020 09:25
Last Modified:
17 Sep 2023 02:50