Lancaster EPrints

Holistic energy and failure aware workload scheduling in Cloud datacenters

Li, Xiang and Jiang, Xiaohong and Garraghan, Peter and Wu, Zhaohui (2018) Holistic energy and failure aware workload scheduling in Cloud datacenters. Future Generation Computer Systems, 78 (3). pp. 887-900. ISSN 0167-739X

PDF (FGCS - Energy-aware Failure-Aware Scheduling - Accepted) - Submitted Version
Restricted to Repository staff only until 22 July 2018.
Available under License Creative Commons Attribution-NonCommercial-NoDerivs.

    Abstract

    The global uptake of Cloud computing has attracted increased interest within both academia and industry, resulting in the formation of large-scale and complex distributed systems. This has led to more frequent failures within computing systems, which substantially degrade the system performance and task reliability perceived by users. Such systems also consume vast quantities of power, resulting in significant operational costs for providers. Virtualization, a commonly deployed technology within Cloud datacenters, can enable flexible scheduling of virtual machines to maximize system reliability and energy efficiency. However, existing work addresses these two objectives separately, providing limited understanding of the explicit trade-offs involved in building dependable and energy-efficient compute infrastructure. In this paper, we propose two failure-aware, energy-efficient scheduling algorithms that exploit the holistic operational characteristics of the Cloud datacenter, comprising the cooling unit, computing infrastructure and server failures. By comprehensively modeling the power and failure profiles of a Cloud datacenter, we propose the workload scheduling algorithms Ella-W and Ella-B, which are capable of reducing cooling and compute energy while minimizing the impact of system failures. A novel overall metric is proposed that combines energy efficiency and reliability to quantify the performance of the various algorithms. We evaluate our algorithms against Random, MaxUtil, TASA, MTTE and OBFIT under various system conditions of failure prediction accuracy and workload intensity. Evaluation results demonstrate that Ella-W can reduce energy usage by 29.5% and improve task completion rate by 3.6%, while Ella-B reduces energy usage by 32.7% with no degradation to task completion rate.

    Item Type: Journal Article
    Journal or Publication Title: Future Generation Computer Systems
    Additional Information: This is the author’s version of a work that was accepted for publication in Future Generation Computer Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Future Generation Computer Systems, 78, 3, 2017 DOI: 10.1016/j.future.2017.07.044
    Uncontrolled Keywords: Energy efficiency ; Thermal management ; Reliability ; Failures ; Workload scheduling ; Cloud computing
    Departments: Faculty of Science and Technology > School of Computing & Communications
    ID Code: 87132
    Deposited By: ep_importer_pure
    Deposited On: 24 Jul 2017 10:28
    Refereed?: Yes
    Published?: Published
    Last Modified: 15 Apr 2018 02:24
    URI: http://eprints.lancs.ac.uk/id/eprint/87132
