A Combined System Metrics Approach to Cloud Service Reliability Using Artificial Intelligence

Tek Raj Chhetri, Chinmaya Kumar Dehury*, Artjom Lind, Satish Narayana Srirama, Anna Fensel

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

3 Citations (Scopus)


Identifying and anticipating potential failures in the cloud is an effective way to increase cloud reliability and to manage failures proactively. Many studies have sought to predict potential failures, but none has combined SMART (self-monitoring, analysis, and reporting technology) hard drive metrics with other system metrics, such as central processing unit (CPU) utilisation. We therefore propose a combined system metrics approach to failure prediction, based on artificial intelligence, to improve reliability. We evaluated four artificial intelligence algorithms (random forest, gradient boosting, long short-term memory, and gated recurrent unit) on data from over 100 cloud servers, and also performed a correlation analysis. The correlation analysis sheds light on the relationships between system metrics and failure, and the experimental results demonstrate the advantages of combining system metrics, with the combined approach outperforming the state of the art.
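
As an illustration only, the kind of correlation analysis mentioned in the abstract can be sketched as a Pearson correlation between each metric and a binary failure label. This is not the authors' code: the column names (`smart_5_reallocated`, `smart_187_uncorrectable`, `cpu_util`, `failed`), the helper functions, and the toy values are all hypothetical and chosen only to show SMART and CPU metrics sitting side by side in one record.

```python
from statistics import mean, pstdev

# Hypothetical toy observations (made-up values, not data from the paper).
# Each row combines SMART drive counters with a CPU utilisation reading.
records = [
    {"smart_5_reallocated": 0, "smart_187_uncorrectable": 0, "cpu_util": 0.21, "failed": 0},
    {"smart_5_reallocated": 2, "smart_187_uncorrectable": 0, "cpu_util": 0.35, "failed": 0},
    {"smart_5_reallocated": 1, "smart_187_uncorrectable": 1, "cpu_util": 0.40, "failed": 0},
    {"smart_5_reallocated": 40, "smart_187_uncorrectable": 6, "cpu_util": 0.88, "failed": 1},
    {"smart_5_reallocated": 55, "smart_187_uncorrectable": 9, "cpu_util": 0.93, "failed": 1},
]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def correlations_with_failure(rows, label="failed"):
    """Correlate every non-label column with the failure label."""
    features = [name for name in rows[0] if name != label]
    labels = [row[label] for row in rows]
    return {name: pearson([row[name] for row in rows], labels) for name in features}


if __name__ == "__main__":
    for name, value in correlations_with_failure(records).items():
        print(f"{name}: {value:+.3f}")
```

In a sketch like this, the same combined rows could then be passed as feature vectors to a classifier; which features and models actually perform best is a question the article itself addresses.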

Original language: English
Article number: 26
Journal: Big Data and Cognitive Computing
Issue number: 1
Publication status: Published - 1 Mar 2022


  • Artificial intelligence
  • Cloud computing
  • Failure prediction
  • Fault tolerance
  • Reliability

