Hatch: Self-Distributing Systems for Data Centers

Rodrigues Filho, Roberto and Porter, Barry (2022) Hatch: Self-Distributing Systems for Data Centers. Future Generation Computer Systems, 132. pp. 80-92. ISSN 0167-739X

HATCH.pdf (Accepted Version, 718kB)
Restricted to Repository staff only until 18 February 2023.
Available under License Creative Commons Attribution-NonCommercial-NoDerivs.

Abstract

Designing and maintaining distributed systems remains highly challenging: there is a high-dimensional design space of potential ways to distribute a system’s sub-components over a large-scale infrastructure, and the deployment environment for a system tends to change in unforeseen ways over time. For engineers, gauging which distributed design may best suit a given environment is therefore a complex prediction problem. We present the concept of self-distributing systems, in which any local system built using our framework can learn, at runtime, the most appropriate distributed design given its perceived operating conditions. Our concept abstracts distribution of a system’s sub-components to a list of simple actions in a reward matrix of distributed design alternatives to be used by reinforcement learning algorithms. By doing this, we enable software to experiment, in a live production environment, with different ways in which to distribute its software modules by placing them in different hosts throughout the system’s infrastructure. We implement this concept in a framework we call Hatch, which has three major elements: (i) a transparent and generalized RPC layer that supports seamless relocation of any local component to a remote host during execution; (ii) a set of primitives, including relocation, replication and sharding, from which to create an action/reward matrix of possible distributed designs of a system; and (iii) a decentralized reinforcement learning approach to converge towards more optimal designs in real time. Using an example of a self-distributing webserving infrastructure, Hatch is able to autonomously select the most suitable distributed design from among ≈700,000 alternatives in about 5 minutes.
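The abstract's central idea, treating each distributed design as an action whose reward is measured live, maps naturally onto a multi-armed bandit. The sketch below is a minimal illustration under stated assumptions, not Hatch's implementation: the four per-component primitives, the component names, the synthetic reward function, and the use of UCB1 as the learning algorithm are all hypothetical choices made here. The abstract itself only states that primitives such as relocation, replication and sharding form an action/reward matrix explored by a decentralized reinforcement learning approach.

```python
import math
import random
from itertools import product

# Hypothetical per-component primitives (assumption, not Hatch's actual API).
PRIMITIVES = ("local", "relocate", "replicate", "shard")


def enumerate_designs(components):
    """Each combination of one primitive per component is one design alternative."""
    return [dict(zip(components, choice))
            for choice in product(PRIMITIVES, repeat=len(components))]


class UCB1:
    """UCB1 bandit (swapped-in technique): each arm is one distributed design."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.total = 0

    def select(self):
        # Try every arm once before relying on the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        # Pick the arm with the highest mean plus exploration bonus.
        return max(range(len(self.counts)),
                   key=lambda a: self.means[a]
                   + math.sqrt(2 * math.log(self.total) / self.counts[a]))

    def update(self, arm, reward):
        self.total += 1
        self.counts[arm] += 1
        # Incremental update of the arm's mean observed reward.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def observe_reward(design, rng):
    """Hypothetical stand-in for a live measurement such as inverse response time."""
    base = {"local": 0.2, "relocate": 0.4, "replicate": 0.7, "shard": 0.5}
    score = sum(base[p] for p in design.values()) / len(design)
    return score + rng.gauss(0, 0.05)  # synthetic measurement noise


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical component names for a small web-serving stack.
    designs = enumerate_designs(["cache", "compression", "http_handler"])
    learner = UCB1(len(designs))
    for _ in range(2000):
        arm = learner.select()
        learner.update(arm, observe_reward(designs[arm], rng))
    best = max(range(len(designs)), key=lambda a: learner.means[a])
    print(len(designs), designs[best])
```

Even this toy space holds 4^3 = 64 designs. If each primitive were also parameterized by a target host, as relocation onto different hosts implies, the number of alternatives would grow far faster, which illustrates why the design space quickly becomes too large to evaluate by hand.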

Item Type:
Journal Article
Journal or Publication Title:
Future Generation Computer Systems
Additional Information:
This is the author’s version of a work that was accepted for publication in Future Generation Computer Systems. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Future Generation Computer Systems, 132, 80-92, 2022 DOI: 10.1016/j.future.2022.02.008
Uncontrolled Keywords:
Computer Networks and Communications (ASJC 1705)
ID Code:
166121
Deposited On:
15 Feb 2022 11:30
Refereed?:
Yes
Published?:
Published
Last Modified:
05 May 2022 02:35