Resilience Engineering for Intelligent Enterprise Platforms

Authors

  • Sumanth Reddy Anumula, Senior Software Engineer, Dallas, TX, USA

DOI:

https://doi.org/10.15662/IJEETR.2023.0501004

Keywords:

resilience engineering, enterprise reliability, fault tolerance, graceful degradation, intelligent systems

Abstract

As enterprise platforms incorporate more automation and intelligence, they face new operational risks, including decision instability, data drift, and more frequent system breakdowns. This paper examines how resilience engineering concepts can be applied to intelligent enterprise systems and introduces a model for improving system reliability as complexity increases. The proposed architectural strategies focus on fault containment, graceful degradation, and recoverability. The main attributes of the framework are the decoupling of learning and optimization elements from implementation paths and the addition of safety nets such as limited autonomy, backup logic, and controlled shutdown. Treating intelligence as a controlled capability rather than a necessity helps keep the platform stable and reliable during periods of uncertainty and operational pressure. The article is relevant to the development of enterprise platforms that must sustain performance and reliability as automation and complexity continue to grow, and it offers a blueprint for future resilient enterprise architectures.
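The safety nets named in the abstract (limited autonomy, backup logic, controlled shutdown) can be illustrated with a minimal Python sketch. This is not the paper's implementation; the class name GuardedExecutor, its parameters, and the failure threshold are hypothetical and chosen only for illustration. The sketch assumes the intelligent component is a callable that proposes a numeric value, that "limited autonomy" means accepting proposals only within fixed bounds, and that "controlled shutdown" means disabling the intelligent path after repeated failures while the deterministic backup keeps serving.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardedDecision:
    value: float
    source: str  # "model" or "fallback"


class GuardedExecutor:
    """Hypothetical wrapper: keeps an optional model behind deterministic safety nets."""

    def __init__(
        self,
        model: Callable[[float], float],     # learning/optimization element (assumed interface)
        fallback: Callable[[float], float],  # deterministic backup logic
        lower: float,                        # autonomy bounds: proposals outside are rejected
        upper: float,
        max_failures: int = 3,               # assumed threshold for shutting the model path down
    ) -> None:
        self.model = model
        self.fallback = fallback
        self.lower = lower
        self.upper = upper
        self.max_failures = max_failures
        self.failures = 0
        self.model_enabled = True

    def decide(self, x: float) -> GuardedDecision:
        if self.model_enabled:
            try:
                proposal = self.model(x)
                # Limited autonomy: accept the proposal only inside the bounds.
                if self.lower <= proposal <= self.upper:
                    self.failures = 0
                    return GuardedDecision(proposal, "model")
            except Exception:
                # Fault containment: a model error never reaches the caller.
                pass
            self.failures += 1
            if self.failures >= self.max_failures:
                # Controlled shutdown of the intelligent path only.
                self.model_enabled = False
        # Graceful degradation: the backup logic always produces an answer.
        return GuardedDecision(self.fallback(x), "fallback")
```

In this sketch the platform keeps answering through the backup logic whether the model misbehaves, raises an error, or has been switched off, which is one way to read the abstract's point that intelligence is a controlled capability rather than a necessity.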

Published

2023-02-11

How to Cite

Resilience Engineering for Intelligent Enterprise Platforms. (2023). International Journal of Engineering & Extended Technologies Research (IJEETR), 5(1), 5954-5965. https://doi.org/10.15662/IJEETR.2023.0501004