How to Prevent IT Incidents with Agentic AI: Webinar Recap
Reactive firefighting drains engineering teams, burns revenue, and makes predictable operations feel out of reach. In our recent webinar, Flavia Batista (Delivery Manager at e-Core) and Martin Druda (Cloud and AI Strategy at e-Core) walked through why IT operations keep losing the battle against incident floods, and what it takes to move from constant firefighting to governed, AI-powered operations.
Here is a recap of the key points, the real-world results, and the practical steps to get there.
The real cost of reactive IT Operations
Flavia opened with a scenario most IT leaders know well. It is 2 a.m. Your phone is going off. You stumble to your desk, only to find out 20 minutes later that the alert was a false positive. You go back to sleep, but your focus for the next day is already gone.
The alternative scenario looks different. You sleep through the night. At 8 a.m. you read a report from an integrated command center showing that an incident happened overnight, with the actions taken and the root cause already identified.
That second scenario is the goal of every IT operations manager, and it is achievable.
Reaching it starts with understanding why so many teams are stuck in reactive mode in the first place.
As companies grow, their tool stacks grow with them: observability platforms, ITSM systems, pager tools, collaboration tools, and now AI agents embedded natively inside many of those products. Each tool adds value in isolation.
Together, they add noise. Flavia shared that the average organization runs 4.4 monitoring tools. Those signals rarely connect, which makes issue detection harder rather than easier.
The root problem lives in the operating model. Adding more people to a broken model scales the problem. Adding more tools increases the noise.

Flávia Batista
Regional Delivery Manager
Flávia is an IT service management specialist with over 15 years of experience in service delivery. A certified service availability manager, she has spent over a decade leading technical specialist teams for global technology providers. Flávia excels at aligning operational processes with business value, ensuring that AI implementations follow rigorous governance standards to reduce burnout and improve service quality.
Automation and AI: Use both, in the right order
Martin pushed back on the instinct to apply AI everywhere. Many problems can be solved with simple automation. A service that hangs and needs to be recycled is deterministic. That is a textbook automation use case.
AI earns its place when automation hits its limits. If the recycle fails repeatedly, an agent can step in to do anomaly correlation (when did this last successfully recycle, what changed since then, is there a new dependency), triage, and root cause analysis.
Given read access to a GitHub repository, an agent can review source code and surface recommendations before a developer is even paged. The developer walks into an already informed investigation instead of starting cold.
The best implementations combine automation for deterministic work and AI for cases that require reasoning.
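The split between deterministic automation and agent reasoning can be sketched as a simple escalation rule. This is a minimal illustration rather than e-Core's implementation: `handle_hung_service`, `always_fails`, `fake_agent`, and the three-attempt limit are all hypothetical names and values chosen for the example.

```python
from typing import Callable


def handle_hung_service(
    recycle: Callable[[], bool],
    escalate_to_agent: Callable[[int], str],
    max_attempts: int = 3,  # assumed limit, not a figure from the webinar
) -> str:
    """Try the deterministic fix first; hand off to an agent only if it keeps failing."""
    for attempt in range(1, max_attempts + 1):
        if recycle():
            return f"recovered by automation on attempt {attempt}"
    # Automation has hit its limit, so pass the failure context to a reasoning step.
    return escalate_to_agent(max_attempts)


# Hypothetical stand-ins for a real restart command and a real agent call.
def always_fails() -> bool:
    return False


def fake_agent(failed_attempts: int) -> str:
    return f"agent investigating after {failed_attempts} failed recycles"


print(handle_hung_service(lambda: True, fake_agent))
print(handle_hung_service(always_fails, fake_agent))
```

The agent is only invoked on the failure path, so the common case stays cheap and predictable.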

Martin Druda
Cloud and AI Strategy
With over 25 years of technology leadership, Martin specializes in bridging the gap between complex technical architecture and executive business strategy. He has led large-scale digital transformations and high-availability projects for global icons in the entertainment and logistics sectors. At e-Core, Martin helps organizations move beyond foundational cloud environments into high-impact agentic AI solutions that drive long-term business resilience.
The maturity journey: Three levels
You cannot start at agentic operations. You need to get there.
1. Level one is a reactive chatbot. A user reports an issue, the chatbot collects information, opens a ticket, and hands it off to a human. It takes basic load off engineers and enables self-service.
2. Level two adds retrieval-augmented generation. The chatbot references your knowledge bases, past incidents, and runbooks to suggest possible solutions.
3. Level three is agentic coordination. Agents hand off work to each other across domains. In Martin's example, a user hits a cloud service limit while deploying. The frontline chatbot does basic triage, identifies the service limit, and hands off to a specialized Cloud CoE agent. That agent checks whether the request fits company guidelines and either approves it or explains why it does not.
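The level-three handoff in Martin's example might look something like the sketch below. Everything here is an assumption made for illustration: the keyword-based triage, the `LimitRequest` type, and the `MAX_AUTO_APPROVED_LIMIT` guideline are hypothetical and stand in for whatever policy a real Cloud CoE would define.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical company guideline: the largest limit the agent may approve on its own.
MAX_AUTO_APPROVED_LIMIT = 100


@dataclass
class LimitRequest:
    service: str
    current_limit: int
    requested_limit: int


def frontline_triage(message: str) -> str:
    """Very rough triage: route service-limit issues to the Cloud CoE agent."""
    text = message.lower()
    if "limit" in text or "quota" in text:
        return "cloud_coe_agent"
    return "human_service_desk"


def cloud_coe_agent(request: LimitRequest) -> Tuple[bool, str]:
    """Approve requests inside the guideline, otherwise explain the refusal."""
    if request.requested_limit <= MAX_AUTO_APPROVED_LIMIT:
        return True, f"approved: {request.service} limit raised to {request.requested_limit}"
    return False, (
        f"declined: {request.requested_limit} exceeds the guideline maximum "
        f"of {MAX_AUTO_APPROVED_LIMIT}"
    )


route = frontline_triage("Deployment failed: service limit exceeded")
if route == "cloud_coe_agent":
    print(cloud_coe_agent(LimitRequest("container-service", 50, 80)))
    print(cloud_coe_agent(LimitRequest("container-service", 50, 500)))
```

The refusal path returns an explanation rather than a bare rejection, which matches the behavior described for the Cloud CoE agent.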
Breaking down the silos
Standalone tools often have their own agentic or AI capabilities. Jira may serve your dev team. ServiceNow may serve your operations team. Each works well inside its own boundary. The question is how signals connect across those tools and how handoffs happen when an issue spans multiple domains.
That is where an orchestration layer comes in. Communication protocols like MCP solve part of the challenge. Orchestration handles the rest by knowing what to do next and what to do when things fail.
The flywheel: Observe, Engage, Act
Martin described the operating model as a flywheel with three stages: Observe, Engage, and Act.
What you observe informs how you engage. What you learn from acting improves how you observe. The loop compounds.
This foundation rests on organizational knowledge. If a runbook has not been updated in five years, the agent will act on outdated information. Concise, current runbooks let the agent behave like one of your most experienced engineers.
Process matters just as much: change approval, downstream notifications, backups, and scheduled maintenance windows. If the agent knows your operating guidelines, it can work within them.
Pilot the most mature process first. If anyone can execute the process with repeatable outcomes, the agent can add value.
Verify, then trust
In classic change management, the rule was trust but verify. Given how fast and how broadly agents can operate, Martin recommended flipping that to verify, then trust.
Keep a human in the loop, especially early on. Agents should propose actions while humans review and approve. As results prove repeatable, confidence builds. At that point you can start automating more of the workflow, which frees engineering staff to focus on work that grows the business.
Inside the e-Core AIOps Solution
Flavia introduced e-Core's AIOps offering, an end-to-end solution built on three layers.
The first layer is an extended IT ops team: human experts with cross-product experience across visibility, ITSM, monitoring, collaboration, and cloud platforms. They adapt to your existing tools and workflows rather than replacing them.
The second layer is a set of specialized AI agents. Examples include:
- Anomaly correlation agent that groups related incidents
- Triage assistant that categorizes and assigns tickets
- Root cause analysis agent that reviews logs and traces and proposes resolutions
- Developer advisor that points engineers to likely code-related causes and explains the consequences of proposed fixes
- Resource optimization agent that flags cost reduction opportunities
- Monitoring designer agent that refines alert rules so you stop generating noise at the source
Agents support multi-cloud environments, multi-LLM configurations, and private hosting for compliance or data residency needs.
The third layer is OpsHub, an ITSM-agnostic command center. OpsHub connects to the tools you already use (Datadog, PagerDuty, Splunk, ServiceNow, Jira, Slack, AWS, and others) with no rip-and-replace and no vendor lock-in. Because it is agnostic, it preserves the expertise already built into each tool and adds an orchestration layer across them.
A real result: Food Delivery Platform
Flavia shared a case from a leading food delivery platform running 24/7 with over 120 million transactions per month. At that scale, a degraded API at dinner time translates directly into measurable revenue loss.
Before e-Core's AIOps solution, the team was handling over 100,000 alerts per month. Engineer fatigue was high. Alert quality was low. Escalations during the night were routine.
After implementation, alert volume dropped by approximately 73%. Critical incidents were isolated and prioritized, which accelerated response and resolution for the issues that mattered. Once operations reached a stable state, the company had enough confidence in their IT foundation to expand into financial services as a new line of business.
That expansion was possible because the team trusted that operations could absorb the additional load.
What OpsHub looks like in practice
During the live demo, Flavia walked through the OpsHub command center. A few data points from the platform view:
- Mean time to detect: 12 minutes, down from 18 the prior month
- Mean time to resolve: 18 minutes, down from 27 the prior month
- 89% of alert noise filtered by the triage agent
- 87% success rate on the remediation agent
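To put the two time metrics on a common scale, the month-over-month change can be expressed as a percentage. The helper name `percent_reduction` is ours; the inputs are the figures quoted above.

```python
def percent_reduction(previous: float, current: float) -> float:
    """Relative improvement from the previous value to the current one, in percent."""
    return (previous - current) / previous * 100


mttd = percent_reduction(previous=18, current=12)  # mean time to detect, minutes
mttr = percent_reduction(previous=27, current=18)  # mean time to resolve, minutes

print(f"MTTD reduced by {mttd:.1f}%")
print(f"MTTR reduced by {mttr:.1f}%")
```

Both metrics work out to roughly a one-third reduction compared with the prior month.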
The Observe view shows active incidents, which agents are working on them, and runbooks pending approval. The Engage view surfaces what each agent is doing, with traceability from detection through triage to investigation.

In one correlation example, the anomaly correlation agent linked three alerts to the same database connection pool exhaustion with 95% confidence, citing a shared spike pattern at 7:08 a.m. Another related alert was flagged at 65% confidence because the link was plausible but not conclusive.
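One way to read those confidence scores is as a threshold that separates alerts the agent treats as the same incident from alerts it only flags for review. The sketch below assumes a cutoff of 0.90 and made-up alert IDs; neither comes from the demo, and the actual scoring logic inside OpsHub was not shown.

```python
from typing import Dict, List, Tuple

# Assumed cutoff for illustration only.
LINK_THRESHOLD = 0.90


def split_by_confidence(
    scored_alerts: List[Tuple[str, float]],
    threshold: float = LINK_THRESHOLD,
) -> Dict[str, List[str]]:
    """Separate alerts confidently tied to an incident from merely plausible ones."""
    result: Dict[str, List[str]] = {"linked": [], "possible": []}
    for alert_id, confidence in scored_alerts:
        bucket = "linked" if confidence >= threshold else "possible"
        result[bucket].append(alert_id)
    return result


# Hypothetical alert IDs mirroring the example: three at 95%, one at 65%.
scored = [("alert-1", 0.95), ("alert-2", 0.95), ("alert-3", 0.95), ("alert-4", 0.65)]
print(split_by_confidence(scored))
```

Keeping the lower-confidence alert visible, rather than discarding it, lets a human decide whether it belongs to the same incident.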
The Act view shows proposed runbook executions. Each proposed action includes the steps the agent plans to take (for example, scaling an ECS service from 3 to 9 tasks), the estimated impact (cost, outage risk), and an approve or reject control.
Agents wait for human approval before executing disruptive changes, the same way a human engineer would need change management approval before modifying production.
This was the point Flavia emphasized: AI agents are IT assets, and they should follow the same governance process as your engineers.
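The approval gate can be sketched as a check that runs before any step executes. This is a hedged illustration, not OpsHub's data model: the `ProposedAction` fields, the step descriptions, and the impact text are hypothetical, and `run` stands in for whatever actually performs a step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProposedAction:
    description: str
    steps: List[str]
    estimated_impact: str
    disruptive: bool
    approved: bool = False


def execute(action: ProposedAction, run: Callable[[str], None]) -> str:
    """Run the steps only if the action is non-disruptive or a human has approved it."""
    if action.disruptive and not action.approved:
        return "pending approval"
    for step in action.steps:
        run(step)
    return "executed"


executed_steps: List[str] = []
scale_out = ProposedAction(
    description="Scale ECS service from 3 to 9 tasks",
    steps=["set desired task count to 9", "wait for tasks to become healthy"],
    estimated_impact="illustrative: higher cost, low outage risk",
    disruptive=True,
)

print(execute(scale_out, executed_steps.append))  # held until a human approves
scale_out.approved = True
print(execute(scale_out, executed_steps.append))  # now the steps run
```

Putting the check inside `execute` means an agent cannot bypass it by calling the steps in a different order.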
Governance and the Agent Catalog
OpsHub also provides a rules view where teams can see every agent, its integrations, its confidence level, and its success rate. Before deploying a new agent, teams can search the existing catalog to avoid duplicating capabilities.
The orchestration view shows every active incident and where it sits in the flow: ingested, needing information, under triage, pending security review, being correlated, or under root cause analysis.
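As a rough model of that view, each incident can carry one of the listed stages, and the board becomes a tally per stage. The `Stage` enum below simply names the six states mentioned above; the incident IDs and the `summarize` helper are hypothetical, and no ordering between stages is assumed.

```python
from collections import Counter
from enum import Enum
from typing import Dict


class Stage(Enum):
    INGESTED = "ingested"
    NEEDS_INFORMATION = "needing information"
    TRIAGE = "under triage"
    SECURITY_REVIEW = "pending security review"
    CORRELATION = "being correlated"
    ROOT_CAUSE_ANALYSIS = "under root cause analysis"


def summarize(incidents: Dict[str, Stage]) -> Dict[str, int]:
    """Count active incidents per stage, including stages that are currently empty."""
    counts = Counter(incidents.values())
    return {stage.value: counts.get(stage, 0) for stage in Stage}


# Hypothetical incident IDs for illustration.
active = {
    "INC-1": Stage.TRIAGE,
    "INC-2": Stage.TRIAGE,
    "INC-3": Stage.ROOT_CAUSE_ANALYSIS,
}
for stage_name, count in summarize(active).items():
    print(f"{stage_name}: {count}")
```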
Integrations are visible along with their connection status. Some tools appear as connected, others as degraded and requiring attention.
Practical steps to prevent IT incidents with Agentic AI
Flavia closed with a short checklist for teams preparing to apply agentic AI to IT operations:
- Assess your maturity. If your data, monitoring rules, runbooks, and knowledge base are not reliable, agents will act on bad information.
- Remove silos by connecting agents across tools.
- Tie agents to the process your engineers already follow.
- Plan cost from the start. AI at scale gets expensive if you do not model it early.
- Start small. Pilot a well-understood process before scaling to complex ones.
Treat AI as a force multiplier for your tools and people rather than a replacement.
Q&A Highlights
Can agents follow custom processes?
Yes. Flavia noted that while ITIL is common, agents adapt to whatever process an organization already uses.
When do you stop requiring human approval?
Martin pointed to three signals: a consistently high confidence score on proposed actions, low blast radius, and recoverability. Updating a record is low risk. Dropping a database is not. Flavia added that even senior engineers face approvals for certain actions, and the same standard should apply to agents.
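Martin's three signals can be combined into a single policy check. The sketch below is one possible encoding under stated assumptions: the 0.95 confidence floor, the "low/medium/high" blast-radius labels, and the example scores are hypothetical, and a single score is a simplification of "consistently high" confidence over time.

```python
from dataclasses import dataclass

# Assumed threshold for illustration; a real policy would be set per organization.
MIN_CONFIDENCE = 0.95


@dataclass
class ActionProfile:
    name: str
    confidence: float   # agent's confidence in the proposed action, 0.0 to 1.0
    blast_radius: str   # "low", "medium", or "high"
    recoverable: bool   # can the change be undone if it goes wrong?


def can_skip_human_approval(action: ActionProfile) -> bool:
    """All three signals must hold before an action runs without review."""
    return (
        action.confidence >= MIN_CONFIDENCE
        and action.blast_radius == "low"
        and action.recoverable
    )


update_record = ActionProfile("update a record", 0.98, "low", True)
drop_database = ActionProfile("drop a database", 0.98, "high", False)

print(can_skip_human_approval(update_record))
print(can_skip_human_approval(drop_database))
```

High confidence alone is not enough here: the database drop scores the same as the record update but still requires approval because of its blast radius and irreversibility.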
How is this different from native AI in tools like Atlassian Rovo?
Rovo runs inside the Atlassian ecosystem. The agents in the e-Core solution work across tools, so correlation can happen between Jira and Datadog, for example, without the silo.
How do you identify the most mature process to pilot first?
Look for a process with a clear, current runbook that documents every required step, including approvals, notifications, and backups. If the runbook is short and relies on tribal knowledge, it is not ready for an agent to own.
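A readiness check along those lines could be as simple as confirming that a runbook documents each required element. The dictionary layout and the `REQUIRED_SECTIONS` names below are assumptions made for this sketch; they mirror the items named in the answer and are not a schema from the webinar.

```python
from typing import Dict, List

# Sections taken from the answer above; the key names themselves are hypothetical.
REQUIRED_SECTIONS = ("steps", "approvals", "notifications", "backups")


def missing_sections(runbook: Dict[str, List[str]]) -> List[str]:
    """Return required sections that are absent or empty in the runbook."""
    return [section for section in REQUIRED_SECTIONS if not runbook.get(section)]


def ready_for_agent(runbook: Dict[str, List[str]]) -> bool:
    return not missing_sections(runbook)


complete = {
    "steps": ["drain traffic", "restart service", "verify health"],
    "approvals": ["change manager sign-off"],
    "notifications": ["notify downstream teams"],
    "backups": ["snapshot configuration"],
}
tribal = {"steps": ["ask the on-call engineer what to do"]}

print(ready_for_agent(complete), missing_sections(complete))
print(ready_for_agent(tribal), missing_sections(tribal))
```

A check like this only verifies that sections exist; whether their content is current still needs human review.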
Missed the live session? Watch the full webinar recording to see the OpsHub architecture in action, or book a strategic AI Ops assessment with our experts.
