So far in this series, we’ve explored the possibilities of diagnostic AIOps and assistive AIOps. Now, let’s look at where we are on the road to autonomous operations, and how we reach the next stage of the journey.

Autonomous Operations as They Currently Stand

For years, AI systems have introduced limited forms of automation that enhance IT operations but still require significant human oversight. These functions rely on established runbooks: predefined scripts that dictate specific actions to address known issues. When a technical problem arises, an agent may recommend a relevant runbook based on previous incidents. While this can assist in identifying problems and suggesting solutions, the final decision-making and execution often still rest with human operators.

Let’s take an example. Site Reliability Engineering (SRE) teams monitor the release of new software updates, features, and fixes for websites and applications. They check everything from online services to mobile apps to ensure each one works properly after changes are made. When issues arise, they might use AI to help pinpoint the problem and automate the rollback process. If the rollback doesn’t resolve the issue, a human must intervene and take more nuanced actions.
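
The rollback-then-escalate pattern in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real SRE toolchain: `handle_failed_release`, `health_check`, `rollback`, and `page_on_call` are hypothetical names, and the callables stand in for whatever deployment and paging tools a team actually uses.

```python
from typing import Callable


def handle_failed_release(
    health_check: Callable[[], bool],
    rollback: Callable[[], None],
    page_on_call: Callable[[str], None],
) -> str:
    """Try an automated rollback first; hand over to a human if it does not help.

    All three callables are hypothetical stand-ins for real tooling.
    """
    if health_check():
        return "healthy"  # nothing to do

    rollback()  # automated step: revert to the previous release

    if health_check():
        return "rolled_back"  # the rollback resolved the issue

    # The rollback did not help, so a human has to take over.
    page_on_call("Rollback did not restore service health; manual action needed.")
    return "escalated"


if __name__ == "__main__":
    # Simulated service that stays unhealthy even after a rollback.
    pages = []
    outcome = handle_failed_release(
        health_check=lambda: False,
        rollback=lambda: None,
        page_on_call=pages.append,
    )
    print(outcome, pages)
```

The automation covers only the known, scripted step. Anything beyond it is routed to a person, which is the boundary the example describes.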

In cases like this, AI systems remain a tool that augments human capabilities rather than replaces them.

Evolving AI to Operate With More Independence

Enter agentic AI, the next stage in AI evolution, which promises genuinely autonomous operations. The architecture of agentic AI is based on distributed or multi-agent systems, which enable individual AI agents to focus on specific components of a problem while retaining an awareness of the overall context. By integrating multiple AI frameworks, agentic AI can combine the linguistic strengths of large language models (LLMs) with the analytical prowess of other AI methodologies. It can:

  • Understand intricate human requests
  • Analyze many factors at once
  • Formulate appropriate responses without significant human involvement

SolarWinds Tech Evangelist Sascha Giese uses logistics to illustrate the power of agentic AI:

“If a shipment starts traveling from A to B, there are many moving parts in between, and many things could go wrong. There was a delay here, a missed connection there, and some trouble with documentation at the customs office. There are multiple little roadblocks, but it is still a problem that a machine can solve once it can make decisions like altering the route or changing the means of transportation. That decision is based on multiple agents working on individual moving parts, providing a range of possible resolutions, while the greater construct keeps its eye on the big picture. Only exceptional cases demand human intervention.”

This gives us a sense of what sets agentic AI apart from more traditional forms of artificial intelligence. But how will it impact how we manage IT infrastructures?

The Benefits of Agentic AI in Infrastructure Observability

  • Outcome-oriented workflows: Traditional AI usually follows predefined rules and does not adapt as conditions change. Agentic AI systems assess the current system state, plan appropriate actions, interact with relevant tools or data sources, execute tasks, and iteratively refine their strategies, so each decision is informed by the results of previous actions.
  • Automation of remediation workflows: When agentic AI detects potential bottlenecks, it can automatically scale resources or reroute traffic. In the event of service degradation, it can restart a service or roll back a deployment, with human approvals and guardrails integrated as needed.
  • Predictive maintenance: Instead of waiting for alerts to trigger investigations, agentic AI forecasts potential failures and schedules maintenance tasks or redeployments ahead of time. For example, it can adjust a blue/green rollout before a predicted issue reaches users.
  • Dynamic resource allocation: Agentic AI adjusts infrastructure in real time based on usage patterns, scaling up during high-traffic periods and scaling down during low-traffic periods.
  • Continuous feedback loop: Agentic AI operates on a continuous feedback loop, learning from previous incidents or near misses to refine its decision-making models. Over time, it becomes more adept at predicting issues and selecting effective remediation strategies.
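
The assess, act, and observe cycle behind several of these benefits can be sketched as a small control loop. The example below is a deliberately simple illustration, assuming a toy `Service` model and a threshold-based `ScalingAgent`. Both classes, the capacity figures, and the 0.7 utilization target are hypothetical, and the agent only records outcomes in `history` rather than learning from them as a real agentic system would.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Toy model of a service; the capacity numbers are illustrative only."""
    replicas: int
    load: float                          # requests per second
    capacity_per_replica: float = 100.0  # assumed capacity of one replica

    def utilization(self) -> float:
        return self.load / (self.replicas * self.capacity_per_replica)


@dataclass
class ScalingAgent:
    """Hypothetical agent that scales a service toward a target utilization."""
    target: float = 0.7      # desired utilization
    max_replicas: int = 10   # guardrail: never scale beyond this
    history: list = field(default_factory=list)  # (action, before, after)

    def step(self, service: Service) -> str:
        before = service.utilization()  # assess the current state
        if before > self.target and service.replicas < self.max_replicas:
            service.replicas += 1       # act: add capacity
            action = "scale_up"
        elif before < self.target / 2 and service.replicas > 1:
            service.replicas -= 1       # act: release idle capacity
            action = "scale_down"
        else:
            action = "hold"
        after = service.utilization()   # observe the outcome
        self.history.append((action, before, after))
        return action

    def run(self, service: Service, max_steps: int = 20) -> int:
        # Repeat until the agent decides no further action is needed.
        for _ in range(max_steps):
            if self.step(service) == "hold":
                break
        return service.replicas


if __name__ == "__main__":
    svc = Service(replicas=2, load=450.0)
    agent = ScalingAgent()
    print("final replicas:", agent.run(svc))
    for record in agent.history:
        print(record)
```

Each pass re-reads the state produced by the previous action, so the next decision depends on the last outcome. The `max_replicas` cap is a simple stand-in for the guardrails mentioned above.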

Barriers to Integrating Agentic AI

The potential benefits are enticing, but several key challenges hinder the seamless adoption of agentic AI:

  • Compatibility and integration issues are likely, as legacy systems often lack the flexibility, speed, and scalability needed for modern AI technologies.
  • The autonomous nature of agentic AI introduces new security risks. When agents connect to external APIs and third-party tools, it’s important to implement strong security measures and comply with industry standards.
  • As ever, data integrity is critical. Legacy systems often have outdated processes and inefficiencies that must be addressed to ensure data is clean, accessible, and standardized.
  • Finally, governance and control complexities involve balancing AI autonomy with appropriate oversight, defining clear boundaries for autonomous actions, and establishing audit trails and escalation protocols.
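
The governance point can be made concrete with a small policy check. The sketch below is one possible shape, not a prescribed design: the action names, the two policy sets, and the `GovernedExecutor` class are all hypothetical. It shows the three elements named above: boundaries for autonomous actions, an escalation path, and an audit trail.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical policy: actions the agent may take alone,
# and actions that need a named human approver first.
AUTONOMOUS_ACTIONS = {"restart_service", "scale_up"}
APPROVAL_REQUIRED = {"rollback_deployment", "reroute_traffic"}


@dataclass
class GovernedExecutor:
    """Decides whether a requested action runs, escalates, or is denied."""
    audit_log: list = field(default_factory=list)

    def request(self, action: str, target: str,
                approved_by: Optional[str] = None) -> str:
        if action in AUTONOMOUS_ACTIONS:
            decision = "executed"    # inside the autonomous boundary
        elif action in APPROVAL_REQUIRED:
            # Run only with a named approver; otherwise escalate to a human.
            decision = "executed" if approved_by else "escalated"
        else:
            decision = "denied"      # unknown actions are never run

        # Every request is recorded, whatever the decision.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "target": target,
            "approved_by": approved_by,
            "decision": decision,
        })
        return decision


if __name__ == "__main__":
    executor = GovernedExecutor()
    print(executor.request("restart_service", "web-1"))
    print(executor.request("rollback_deployment", "web-1"))
    print(executor.request("rollback_deployment", "web-1", approved_by="alice"))
    print(executor.request("drop_database", "db-1"))
    print(len(executor.audit_log), "audit entries")
```

Denying anything outside the two sets by default keeps the boundary explicit: a new action has to be added to the policy deliberately before an agent can use it.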

What IT Departments Can Do Now

Organizations should think carefully about what kind of AI solution matches their needs. Could the problems they hope to solve with AI be addressed by hiring people with the right skills or by improving processes? Whether the goal is diagnostic, assistive, or autonomous AI, a few principles apply across the board: keep the data estate in good condition, and implement strong governance guidelines to build trust in the systems. By getting these basics right now, IT departments will be well placed to capitalize on developments in AI as they happen and to maintain a competitive edge.

Read more about the different approaches to adopting AIOps.