PRISM: An Experiment Framework for Straggler Analytics in Containerized Clusters

Lindsay, Dominic (2019) PRISM: An Experiment Framework for Straggler Analytics in Containerized Clusters. In: Proceedings of the 5th International Workshop on Container Technologies and Container Clouds - WOC '19. ACM, USA, pp. 13-18. ISBN 9781450370332

Full text not available from this repository.

Abstract

Large-scale containerized clusters that provision Cloud services are encountering substantial difficulties with stragglers, whereby a small subset of task executions degrades overall system performance. Stragglers remain an unsolved challenge because of their wide variety of root causes and their stochastic behavior. While there have been efforts to mitigate their effects, few works have attempted to ascertain empirically how system operational scenarios influence straggler occurrence and severity. This challenge is compounded by the difficulty of conducting experiments within real-world containerized clusters. System maintenance and experiment design are often error-prone and time-consuming, and many of the tools created for workload submission and straggler injection are bespoke to specific clusters, which limits experiment reproducibility. In this paper we propose PRISM, a framework that automates containerized cluster setup, experiment design, and experiment execution. The framework handles deployment, configuration, and execution of containerized application frameworks, as well as transformation and aggregation of their performance traces, enabling scripted execution of diverse workloads and cluster configurations. It reduces the time required for cluster setup and experiment execution from hours to minutes. We use PRISM to conduct automated experimentation across system operational conditions and find that straggler manifestation is affected by resource contention, input data size, and scheduler architecture limitations.

Item Type:
Contribution in Book/Report/Proceedings
ID Code:
151375
Deposited On:
10 Jun 2021 14:05
Refereed?:
Yes
Published?:
Published
Last Modified:
01 Feb 2024 00:53