Holistic energy and failure aware workload scheduling in Cloud datacenters

Li, Xiang and Jiang, Xiaohong and Garraghan, Peter and Wu, Zhaohui (2018) Holistic energy and failure aware workload scheduling in Cloud datacenters. Future Generation Computer Systems, 78 (3). pp. 887-900. ISSN 0167-739X

PDF (FGCS - Energy-aware Failure-Aware Scheduling - Accepted)
FGCS_Energy_aware_Failure_Aware_Scheduling_Accepted.pdf - Accepted Version
Available under License Creative Commons Attribution-NonCommercial-NoDerivs.


The global uptake of Cloud computing has attracted increased interest within both academia and industry, resulting in the formation of large-scale and complex distributed systems. This has led to increased failure occurrence within computing systems, which has a substantial negative impact on the system performance and task reliability perceived by users. Such systems also consume vast quantities of power, resulting in significant operational costs for providers. Virtualization, a commonly deployed technology within Cloud datacenters, can enable flexible scheduling of virtual machines to maximize system reliability and energy efficiency. However, existing work addresses these two objectives separately, providing limited understanding of the explicit trade-offs involved in building dependable and energy-efficient compute infrastructure. In this paper, we propose two failure-aware energy-efficient scheduling algorithms that exploit the holistic operational characteristics of the Cloud datacenter, comprising the cooling unit, computing infrastructure and server failures. By comprehensively modeling the power and failure profiles of a Cloud datacenter, we propose the workload scheduling algorithms Ella-W and Ella-B, which are capable of reducing cooling and compute energy while minimizing the impact of system failures. We also propose a novel overall metric that combines energy efficiency and reliability to characterize the performance of the various algorithms. We evaluate our algorithms against Random, MaxUtil, TASA, MTTE and OBFIT under various system conditions of failure prediction accuracy and workload intensity. Evaluation results demonstrate that Ella-W can reduce energy usage by 29.5% and improve task completion rate by 3.6%, while Ella-B reduces energy usage by 32.7% with no degradation to task completion rate.

Item Type:
Journal Article
Journal or Publication Title:
Future Generation Computer Systems
Additional Information:
This is the author’s version of a work that was accepted for publication in Future Generation Computer Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Future Generation Computer Systems, 78, 3, 2017 DOI: 10.1016/j.future.2017.07.044
Uncontrolled Keywords:
energy efficiency, thermal management, reliability, failures, workload scheduling, cloud computing, hardware and architecture, software, computer networks and communications
Deposited On:
24 Jul 2017 09:28
Last Modified:
20 Apr 2024 01:51