AI in Operations (Part 2 of 2)

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Part 1 covered the use cases for AI in development. This blog covers the final four stages of the DevOps lifecycle (release, deploy, operate and monitor) and looks at how AI can be used in operations at scale.

Release: The release stage in DevOps refers to the phase in the Software Development Lifecycle (SDLC) where a new version or iteration of a product is cut and made available to the end users. Here are two examples of where AI can help in this stage:

Automated release note generation:

Natural Language Processing (NLP) and Generative AI for release notes: Writing release notes can be an arduous task, and it is important to get them right. Using NLP and generative AI, you can analyse code changes and automatically generate release notes in natural language for end users. This helps keep the release documentation comprehensive, up to date and easy for your user base to understand.
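As a rough illustration of the first half of that workflow, here is a minimal sketch that groups commit messages and builds a prompt for a generative model. The function name `build_release_notes_prompt` and the `feat:` / `fix:` prefixes are my own assumptions (they follow the common Conventional Commits style), and the call to the model itself is left out because it depends on the provider you use.

```python
def build_release_notes_prompt(version, commit_messages):
    """Group commit messages by type and build a prompt for a generative model.

    Assumes commits use 'feat:' and 'fix:' prefixes; anything else is
    listed under 'Other changes'. This is a hypothetical helper, not part
    of any specific tool.
    """
    groups = {"feat": [], "fix": [], "other": []}
    for message in commit_messages:
        prefix, separator, rest = message.partition(":")
        key = prefix.strip().lower()
        if separator and key in ("feat", "fix"):
            groups[key].append(rest.strip())
        else:
            groups["other"].append(message.strip())

    lines = [
        f"Write release notes for version {version} in plain language for end users.",
        "",
    ]
    sections = (("New features", "feat"), ("Bug fixes", "fix"), ("Other changes", "other"))
    for title, key in sections:
        if groups[key]:
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in groups[key])
    return "\n".join(lines)


prompt = build_release_notes_prompt(
    "2.1.0",
    ["feat: add dark mode", "fix: crash on login", "bump deps"],
)
print(prompt)
```

The resulting text would then be sent to whichever generative model you use, and a human should review the generated notes before they are published.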

Deployment risk assessment:

Machine Learning for risk prediction: Releasing a new iteration of a product is always exciting, but it comes with inherent risk. Machine learning models trained on historical release data can assess the risk associated with a release. This gives the team insight into potential risks, so that mitigations and contingencies can be put in place ahead of time.
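To make this more concrete, here is a minimal sketch of a risk model: a small logistic regression trained with gradient descent. The history data, the two features (thousands of lines changed and number of services touched) and the function names are all made up for illustration. A real model would use your own release history and most likely an established machine learning library.

```python
import math

# Hypothetical release history:
# (thousands of lines changed, services touched, 1 if the release caused an incident)
HISTORY = [
    (0.1, 1, 0), (0.3, 1, 0), (0.5, 2, 0), (0.8, 2, 0),
    (1.5, 3, 1), (2.0, 4, 1), (2.5, 3, 1), (0.4, 1, 0),
    (1.8, 5, 1), (0.6, 2, 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(history, epochs=2000, learning_rate=0.1):
    """Fit a two-feature logistic regression with batch gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(history)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for lines_k, services, failed in history:
            error = sigmoid(w[0] * lines_k + w[1] * services + b) - failed
            grad_w[0] += error * lines_k
            grad_w[1] += error * services
            grad_b += error
        w[0] -= learning_rate * grad_w[0] / n
        w[1] -= learning_rate * grad_w[1] / n
        b -= learning_rate * grad_b / n
    return w, b


def release_risk(model, lines_k, services):
    """Return the estimated probability (0 to 1) that a release causes an incident."""
    w, b = model
    return sigmoid(w[0] * lines_k + w[1] * services + b)


model = train(HISTORY)
small_release = release_risk(model, 0.2, 1)
large_release = release_risk(model, 2.2, 4)
print(f"small release risk: {small_release:.2f}")
print(f"large release risk: {large_release:.2f}")
```

A score like this is best treated as one input to the release decision, for example to trigger an extra review or a slower rollout, rather than as a verdict on its own.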

Deploy: The deploy stage in DevOps refers to the process of deploying the tested software or infrastructure changes from a development / test / pre-production environment to a production environment. Here are two examples of how AI can assist in this stage:

Dynamic rollback strategies:

AI-Driven rollbacks: At one point or another, you will need to roll back a recently deployed environment. Mistakes happen and that is ok. The hard part is that a rollback is not always handled automatically, and the “what” and “why” are not always clear. Here you can use AI models to analyse real-time performance metrics during and after deployment. If anomalies or performance issues are detected, the model can autonomously decide whether to initiate a rollback, giving a quick response to potential issues.
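The decision logic can be sketched with a simple statistical check standing in for a trained model. The example below compares post-deployment latency samples against a pre-deployment baseline using z-scores. The function name, thresholds and sample values are assumptions for illustration only.

```python
from statistics import mean, stdev


def should_roll_back(baseline, post_deploy, z_threshold=3.0, min_breaches=3):
    """Return True if enough post-deployment samples are anomalously high.

    A sample counts as a breach when it sits more than `z_threshold`
    standard deviations above the baseline mean. Requiring several
    breaches avoids rolling back on a single noisy reading.
    """
    mu = mean(baseline)
    sigma = stdev(baseline) or 1e-9  # guard against a perfectly flat baseline
    breaches = sum(1 for value in post_deploy if (value - mu) / sigma > z_threshold)
    return breaches >= min_breaches


# Hypothetical p95 latency samples in milliseconds
baseline_latency = [120, 118, 125, 122, 119, 121, 123, 120]
healthy_deploy = [121, 124, 119, 122]
degraded_deploy = [180, 175, 190, 121]

print(should_roll_back(baseline_latency, healthy_deploy))
print(should_roll_back(baseline_latency, degraded_deploy))
```

In practice the result would feed your deployment tooling, and it is worth recording which metric and which samples triggered the decision so that the “what” and “why” are clear afterwards.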

Deployment Optimisation:

Using AI for optimal traffic routing: Several deployment methods are widely used, including canary, all-at-once, shadow and blue-green deployments. Blue-green is one of the most commonly used in production systems, but that does not always mean it yields the expected results. With AI tools, you can dynamically optimise the traffic distribution between the blue and green environments based on live metrics, potentially more effectively than a regular load balancer. This helps the new version receive sufficient traffic for testing and validation without impacting user experience.
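As a simplified stand-in for an AI-driven router, the sketch below uses a rule-based feedback loop: it shifts more traffic to green while its error rate stays close to blue's, and halves green's share when it does not. The function name, step size, tolerance and floor are assumptions chosen for illustration.

```python
def next_green_weight(current, blue_error_rate, green_error_rate,
                      step=0.1, tolerance=0.005, floor=0.05):
    """Return the share of traffic (0 to 1) to send to green next.

    - If green's error rate is within `tolerance` of blue's, add `step`.
    - Otherwise halve green's share to limit the impact on users.
    - Never drop below `floor`, so green still gets enough traffic
      to be tested and validated.
    """
    if green_error_rate <= blue_error_rate + tolerance:
        proposed = current + step
    else:
        proposed = current / 2
    return round(min(1.0, max(floor, proposed)), 4)


# Green is healthy: its share grows from 20% to 30%
print(next_green_weight(0.2, blue_error_rate=0.01, green_error_rate=0.011))
# Green is failing more often than blue: its share is halved from 40% to 20%
print(next_green_weight(0.4, blue_error_rate=0.01, green_error_rate=0.05))
```

A learned model could replace the fixed rule with one that weighs several signals at once, such as latency, error rate and user segment, but the feedback loop around it would look much the same.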

Operate: In this stage the focus is on maintaining and managing the production environment. Here is where you will triage and address any incidents that occur and yes, AI can help!

Cognitive incident analysis:
- Cognitive AI for incident triage: When an incident occurs, as a DevOps engineer (or in a similar role) you need to categorise it and explain it in plain language, both to report it and to help other team members understand it. This can be hard and time-consuming, especially under pressure. This is a good place for cognitive AI tooling that can understand and categorise incidents from natural language input, such as incident descriptions and application logs. This supports faster and reasonably accurate incident triage, allowing the team to prioritise and address critical issues promptly.
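To show the shape of such a triage step, here is a minimal keyword-based sketch. It is a deliberately simple stand-in for a trained language model, and the categories, keywords and severity scale are hypothetical.

```python
import re
from collections import Counter

# Hypothetical categories and keywords; a real system would learn these from data
CATEGORIES = {
    "database": {"database", "sql", "deadlock", "query"},
    "network": {"timeout", "dns", "unreachable", "refused"},
    "capacity": {"memory", "oom", "disk", "cpu", "throttled"},
}
SEVERITY_WORDS = {"fatal": 3, "critical": 3, "error": 2, "warning": 1}


def triage(description):
    """Return (category, severity) for a log line or incident description."""
    tokens = re.findall(r"[a-z]+", description.lower())
    counts = Counter(tokens)
    # Score each category by how many of its keywords appear in the text
    scores = {
        category: sum(counts[word] for word in words)
        for category, words in CATEGORIES.items()
    }
    best = max(scores, key=scores.get)
    category = best if scores[best] > 0 else "uncategorised"
    # Severity is the highest-ranked severity word found, or 0 if none
    severity = max((SEVERITY_WORDS.get(token, 0) for token in tokens), default=0)
    return category, severity


print(triage("FATAL: deadlock detected while running SQL query on orders"))
print(triage("warning: dns lookup timeout for api host"))
```

Anything that comes back as "uncategorised" is a good candidate for a human to look at first, which keeps people in the loop where the tooling is least certain.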

Monitor: In this stage of DevOps you continuously check the health, performance and behaviour of the service. This can be time-consuming and costly. You can do it in a few ways, from cherry-picking logs and analysing them to reading user feedback and calculating costs. Here is how AI can help you:

Predictive cost analysis:

Cost prediction and optimisation: Building on cloud infrastructure often comes with the anxiety that you may be charged for a service or tool you did not realise you were using. With AI integrated into your monitoring tools, you can predict future resource allocation and the associated costs without working through a manual cost calculator. This enables proactive cost management and optimisation with very little effort from you, the end user.
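The simplest form of cost prediction is a trend line. The sketch below fits a least-squares line to past monthly costs and extends it forward. The function name and the cost figures are made up, and a real forecasting tool would also account for seasonality and planned changes.

```python
def forecast_costs(monthly_costs, months_ahead=3):
    """Forecast future monthly costs with a least-squares linear trend."""
    n = len(monthly_costs)
    if n < 2:
        raise ValueError("need at least two months of history")
    x_mean = (n - 1) / 2  # mean of the month indices 0..n-1
    y_mean = sum(monthly_costs) / n
    slope = sum(
        (x - x_mean) * (y - y_mean) for x, y in enumerate(monthly_costs)
    ) / sum((x - x_mean) ** 2 for x in range(n))
    intercept = y_mean - slope * x_mean
    # Extend the fitted line to the months after the last observed one
    return [round(intercept + slope * (n + k), 2) for k in range(months_ahead)]


# Hypothetical monthly spend, in whatever currency you are billed in
history = [100, 110, 120, 130]
print(forecast_costs(history))
```

Comparing the forecast against a budget gives you an early warning before the bill arrives, rather than after.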

Sentiment analysis of user feedback:

AI-Based sentiment analysis: User feedback is essential for improving the product or service you provide, and this is the stage in the DevOps lifecycle where you review it and begin to plan actionable items into the next sprint. By applying sentiment analysis to user feedback, alongside your logs, you can get an overall picture of how the product is being perceived and how it is behaving at any given time. This shortens the feedback loop and helps you prioritise feature improvements, bug fixes or infrastructure changes.
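For a feel of how this works, here is a minimal lexicon-based sketch. It is a simple stand-in for a trained sentiment model, and the word lists and function names are hypothetical.

```python
import re

# Hypothetical word lists; a trained model would replace these
POSITIVE = {"great", "love", "fast", "helpful", "easy", "reliable"}
NEGATIVE = {"slow", "crash", "broken", "confusing", "bug", "unreliable"}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(text):
    """Return a score: above 0 is positive, below 0 is negative."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # Flip the polarity when the previous word is a negation ("not helpful")
        if polarity and i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    return score


def summarise(feedback):
    """Count how many feedback items are positive, negative or neutral."""
    scores = [sentiment_score(item) for item in feedback]
    return {
        "positive": sum(1 for s in scores if s > 0),
        "negative": sum(1 for s in scores if s < 0),
        "neutral": sum(1 for s in scores if s == 0),
    }


feedback = [
    "Love the new dashboard, it is fast",
    "The export is slow and the page is broken",
    "The search is not helpful",
    "It works",
]
print(summarise(feedback))
```

Tracking these counts per release makes it easier to see whether a change moved sentiment in the right direction.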

By incorporating AI into your DevOps and Software Development Lifecycle, you can speed up and improve your delivery of services in several ways, as shown in this blog and in part 1. When using AI tools there must always be human involvement and oversight to ensure that what the models change, provide or report is correct.

To fully immerse yourself in the different AI tools available to help at these different stages of operations, I would suggest visiting the Microsoft AI website.
