Tolerating transient late-timing faults in cloud-based real-time stream processing

Garraghan, Peter and Perks, Stuart and Ouyang, Xue and McKee, David and Moreno, Ismael Solis (2016) Tolerating transient late-timing faults in cloud-based real-time stream processing. In: 2016 IEEE 19th International Symposium on Real-Time Distributed Computing (ISORC). IEEE, pp. 108-115. ISBN 9781467390323

PDF (Submitted ISORC - Real-time Stream Processing)
Submitted_ISORC_Real_time_Stream_Processing.pdf - Accepted Version
Available under License Creative Commons Attribution.

Abstract

Real-time stream processing is a frequently deployed application within Cloud datacenters and is required to deliver high levels of performance and reliability. Numerous fault-tolerant approaches have been proposed to achieve this objective in the presence of crash failures. However, such systems struggle with transient late-timing faults, a fault class that is difficult to tolerate effectively and that manifests increasingly within large-scale distributed systems. Such faults pose a significant threat to minimizing the execution time of soft real-time streaming applications in the presence of failures. This work proposes a fault-tolerant approach that uses QoS-aware data prediction to tolerate transient late-timing faults. At run-time, the approach can determine the most effective data prediction algorithm for the QoS constraints imposed on a failed stream processor. We integrated our approach into Apache Storm, and experimental results show that it reduces stream processor end-to-end execution time by 61% compared to other fault-tolerant approaches. The approach incurs 12% additional CPU utilization while reducing network usage by 44%.

Item Type:
Contribution in Book/Report/Proceedings
Additional Information:
© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Subjects:
prediction algorithms, real-time systems, fault tolerance, fault tolerant systems, transient analysis, quality of service, predictive models
ID Code:
82341
Deposited By:
Deposited On:
21 Oct 2016 13:40
Refereed?:
Yes
Published?:
Published
Last Modified:
11 Nov 2024 01:43