AI-Assisted Adaptive Circuit Breaker Monitoring and Failure Prediction for Distributed Cloud Platforms
Abstract
Fast, intelligent fault detection and recovery is a major concern for cloud systems, which are distributed and composed of dynamically changing microservices, virtualised resources and geographically dispersed nodes. This paper presents a framework for adaptive monitoring and failure prediction of circuit breakers, aimed at increasing the resilience of distributed cloud environments. Beyond collecting real-time telemetry, the framework monitors services, identifies anomalies, models failure prediction and adaptively controls circuit breakers. Operational metrics such as latency, error rate, request throughput, CPU and RAM load, network delay and the dependency behaviour of called services are continuously captured and analysed using machine learning methods. The predictive layer identifies failure trends before they lead to service outages, while the adaptive decision layer dynamically adjusts circuit breaker thresholds to account for traffic variability, past failures and workload trends. This architecture enables proactive forecasting and automated fault isolation, which limits cascading failures and supports high service availability and self-healing in cloud services. In summary, this work demonstrates the potential of AI to move beyond the conventional fixed circuit breaker paradigm by bringing intelligence, context awareness and reliability to it. It offers a scalable and efficient way to make systems more fault tolerant, reduce downtime and ensure reliable performance in today's cloud-deployed distributed systems.
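As a rough illustration of how a breaker threshold might follow recent traffic instead of staying fixed, the sketch below trips when the current error rate exceeds the recent mean plus a multiple of the standard deviation. It is not the framework described in this paper: the class name `AdaptiveBreaker`, the parameters `window`, `k`, `floor` and `ceiling`, and the mean-plus-deviation rule are all assumptions made for illustration, and the half-open probing and cooldown of a production breaker are omitted.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class AdaptiveBreaker:
    """Hypothetical breaker whose trip threshold follows recent error rates."""

    window: int = 20       # number of healthy intervals kept as the baseline
    k: float = 3.0         # trip when rate > mean + k * standard deviation
    floor: float = 0.05    # the threshold is never lower than this error rate
    ceiling: float = 0.50  # the threshold is never higher than this error rate
    state: str = "closed"
    history: deque = field(init=False)

    def __post_init__(self) -> None:
        # A bounded deque discards the oldest interval automatically.
        self.history = deque(maxlen=self.window)

    def threshold(self) -> float:
        # With too little history, fall back to the fixed ceiling.
        if len(self.history) < 2:
            return self.ceiling
        adaptive = mean(self.history) + self.k * pstdev(self.history)
        return min(max(adaptive, self.floor), self.ceiling)

    def observe(self, errors: int, requests: int) -> str:
        """Record one monitoring interval and return the breaker state."""
        rate = errors / requests if requests else 0.0
        if rate > self.threshold():
            # Failing intervals are not added, so they do not inflate the baseline.
            self.state = "open"
        else:
            self.state = "closed"
            self.history.append(rate)
        return self.state


quiet = AdaptiveBreaker()
for _ in range(10):
    quiet.observe(errors=1, requests=100)        # steady 1% error rate
print(quiet.observe(errors=20, requests=100))    # 20% against a quiet baseline

noisy = AdaptiveBreaker()
for i in range(10):
    noisy.observe(errors=20 if i % 2 else 0, requests=100)  # bursty baseline
print(noisy.observe(errors=20, requests=100))    # same 20% against a noisy baseline
```

Under these assumptions, the same 20% error rate opens the breaker for the quiet service but not for the bursty one, which is the kind of context sensitivity a fixed threshold cannot provide.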
Article Information
| Journal | International Journal of Advanced Engineering Science and Information Technology (IJAESIT) |
|---|---|
| Volume (Issue) | Vol. 9 No. 2 (2026): International Journal of Advanced Engineering Science and Information Technology (IJAESIT) |
| DOI | https://doi.org/10.15662/IJAESIT.2026.0902004 |
| Pages | 421-429 |
| Published | April 20, 2026 |
| Copyright | |
| Open Access | This work is licensed under a Creative Commons Attribution 4.0 International License. |
| How to Cite | Dr Somasundaram Krishnan (2026). AI-Assisted Adaptive Circuit Breaker Monitoring and Failure Prediction for Distributed Cloud Platforms. International Journal of Advanced Engineering Science and Information Technology (IJAESIT), Vol. 9 No. 2 (2026): International Journal of Advanced Engineering Science and Information Technology (IJAESIT), pp. 421-429. https://doi.org/10.15662/IJAESIT.2026.0902004 |
References
[2] M. Ganesan, “Transforming home electronics customer self-installation experience with AI,” International Journal of Research Publications in Engineering, Technology and Management (IJRPETM), vol. 7, no. 4, pp. 14319–14327, 2024.
[3] N. Zhao et al., “Identifying bad software changes via multimodal anomaly detection for online service systems,” in Proc. 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Aug. 2021, pp. 527–539.
[4] P. Chen, Y. Qi, and D. Hou, “CauseInfer: Automated end-to-end performance diagnosis with hierarchical causality graph in cloud environment,” IEEE Transactions on Services Computing, vol. 12, no. 2, pp. 214–230, Mar. 2019.
[5] M. Jin et al., “An anomaly detection algorithm for microservice architecture based on robust principal component analysis,” IEEE Access, vol. 8, pp. 226397–226408, 2020.
[6] T. Wang, W. Zhang, J. Xu, and Z. Gu, “Workflow-aware automatic fault diagnosis for microservice-based applications with statistics,” IEEE Transactions on Network and Service Management, vol. 17, no. 4, pp. 2350–2363, Dec. 2020.
[7] M. Raeiszadeh, A. Ebrahimzadeh, A. Saleem, R. H. Glitho, J. Eker, and R. A. F. Mini, “Real-time anomaly detection using distributed tracing in microservice cloud applications,” in Proc. IEEE 12th International Conference on Cloud Networking (CloudNet), Nov. 2023, pp. 36–44.
[8] Y. Gan et al., “Seer: Leveraging big data to navigate the complexity of performance debugging in cloud microservices,” in Proc. ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2019, pp. 19–33.
[9] S. Nedelkoski, J. Cardoso, and O. Kao, “Anomaly detection and classification using distributed tracing and deep learning,” in Proc. 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), May 2019, pp. 241–250.
[10] J. Bogatinovski, S. Nedelkoski, J. Cardoso, and O. Kao, “Self-supervised anomaly detection from distributed traces,” in Proc. IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC), Dec. 2020, pp. 342–347.
[11] P. Liu et al., “Unsupervised detection of microservice trace anomalies through service-level deep Bayesian networks,” in Proc. IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), Oct. 2020, pp. 48–58.
[12] L. Meng, F. Ji, Y. Sun, and T. Wang, “Detecting anomalies in microservices with execution trace comparison,” Future Generation Computer Systems, vol. 116, pp. 291–301, Mar. 2021.