A predictive fault-tolerance framework for IoT systems

Power, Alexander (2020) A predictive fault-tolerance framework for IoT systems. PhD thesis, UNSPECIFIED.


Abstract

As Internet of Things (IoT) systems scale, attributes such as availability, reliability, safety, maintainability, security, and performance become increasingly important. A key challenge in realising IoT is how to provide a dependable infrastructure for the billions of expected IoT devices. A dependable IoT system is one that can defensibly be trusted to deliver its intended service within a given time period. To define a fault-tolerance (FT) support solution that is applicable to all IoT systems, it is important that error definition is a generic, language-agnostic process, so that FT can be applied as a software pattern. The solution must also be interoperable, so that FT support can be easily 'plugged into' any existing IoT system, which is facilitated by adherence to standards and protocols. Lastly, it is important that FT support is itself fault tolerant, so that it can be depended on to provide correct support for IoT systems. The work in this thesis considers how real-time and historical data analysis techniques can be combined to monitor an IoT environment and analyse its short- and long-term data to make the system as resilient to failure as possible. Specifically, complex event processing (CEP) is proposed for real-time error detection based on the analysis of stream data in an IoT system, where errors are defined as nondeterministic finite automata (NFAs). For long-term error analysis, machine learning (ML) is proposed to predict when an error is likely to occur and to mitigate imminent system faults based on previous experience of erroneous system behaviour in the IoT system.
The contribution is threefold: (1) a language-agnostic approach to error definition using NFAs, designed to provide 'FT as a service' for easy deployment and integration into existing IoT systems; (2) an implementation of NFAs on a bespoke CEP system, BoboCEP, that provides distributed, resilient event processing at the network edge via active replication; and (3) an ML approach to intelligent FT that can learn from system errors over time to ensure correct long-term FT support. The proposed solution was evaluated using two vertical-farming testbeds and a dataset from a real-world vertical farm. Results showed that the proposed solution could detect erroneous system behaviours and predict their successful detection and recovery. A performance analysis of BoboCEP was conducted with favourable results.
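
To illustrate what defining an error as an NFA over stream data can look like, the following is a minimal Python sketch. It is not taken from the thesis and does not use BoboCEP's API: the names ErrorNFA and detect, the event fields temp and fan_on, and the 30-degree threshold are all hypothetical, chosen only for this example. The pattern reads "a hot reading, followed at any later point by the fan being off".

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical event shape: a flat mapping of sensor fields to numbers.
Event = Dict[str, float]
Predicate = Callable[[Event], bool]


@dataclass(frozen=True)
class ErrorNFA:
    """An error pattern expressed as a nondeterministic finite automaton."""
    start: str
    accepting: FrozenSet[str]
    # Each transition is (source state, predicate over an event, target state).
    transitions: Tuple[Tuple[str, Predicate, str], ...]

    def step(self, states: Set[str], event: Event) -> Set[str]:
        """Follow every enabled transition from every active state."""
        reached: Set[str] = set()
        for src, pred, dst in self.transitions:
            if src in states and pred(event):
                reached.add(dst)
        return reached


def detect(nfa: ErrorNFA, stream: Iterable[Event]) -> List[int]:
    """Return the indices of events at which the error pattern completes."""
    active: Set[str] = set()
    hits: List[int] = []
    for i, event in enumerate(stream):
        # A new run may begin at any event, so the start state is always active.
        active = nfa.step(active | {nfa.start}, event)
        if active & nfa.accepting:
            hits.append(i)
            active = set()  # Report once, then wait for a fresh run.
    return hits


# Assumed predicates for the illustrative pattern.
def is_hot(e: Event) -> bool:
    return e["temp"] > 30.0


def fan_off(e: Event) -> bool:
    return e["fan_on"] == 0.0


def anything(e: Event) -> bool:
    return True


OVERHEAT = ErrorNFA(
    start="start",
    accepting=frozenset({"error"}),
    transitions=(
        ("start", is_hot, "warned"),
        ("warned", anything, "warned"),  # Self-loop: skip unrelated events.
        ("warned", fan_off, "error"),
    ),
)

STREAM = [
    {"temp": 22.0, "fan_on": 1.0},
    {"temp": 31.0, "fan_on": 1.0},
    {"temp": 29.0, "fan_on": 1.0},
    {"temp": 28.0, "fan_on": 0.0},
    {"temp": 27.0, "fan_on": 0.0},
]

if __name__ == "__main__":
    print(detect(OVERHEAT, STREAM))
```

The nondeterminism sits in the "warned" state: on a fan-off event the automaton both stays in "warned" and moves to "error", so the set of active states is tracked rather than a single state. Because the pattern is plain data (states, predicates, transitions), the same definition could in principle be serialised and evaluated by an engine written in any language.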

Item Type:
Thesis (PhD)
ID Code:
146630
Deposited By:
Deposited On:
20 Aug 2020 17:45
Refereed?:
No
Published?:
Published
Last Modified:
25 Oct 2020 00:04