How to Prevent IT Incidents with Agentic AI: Webinar Recap

Reactive firefighting drains engineering teams, burns revenue, and makes predictable operations feel out of reach. In our recent webinar, Flavia Batista (Delivery Manager at e-Core) and Martin Druda (Cloud and AI Strategy at e-Core) walked through why IT operations keep losing the battle against incident floods, and what it takes to move from constant firefighting to governed, AI-powered operations.

Here is a recap of the key points, the real-world results, and the practical steps to get there.

The real cost of reactive IT Operations

Flavia opened with a scenario most IT leaders know well. It is 2 a.m. Your phone is going off. You stumble to your desk, only to find out 20 minutes later that the alert was a false positive. You go back to sleep, but your focus for the next day is already gone.

The alternative scenario looks different. You sleep through the night. At 8 a.m. you read a report from an integrated command center showing that an incident happened overnight, with the actions taken and the root cause already identified.

That second scenario is the goal of every IT operations manager, and it is achievable.

Reaching it starts with understanding why so many teams are stuck in reactive mode in the first place.

As companies grow, their tool stacks grow with them: observability platforms, ITSM systems, pager tools, collaboration tools, and now AI agents embedded natively inside many of those products. Each tool adds value in isolation.

Together, they add noise. Flavia shared that the average organization runs 4.4 monitoring tools. Those signals rarely connect, which makes issue detection harder rather than easier.

The root problem lives in the operating model. Adding more people to a broken model scales the problem. Adding more tools increases the noise.

Automation and AI: Use both, in the right order

Martin pushed back on the instinct to apply AI everywhere. Many problems can be solved with simple automation. A service that hangs and needs to be recycled is deterministic. That is a textbook automation use case.

AI earns its place when automation hits its limits. If the recycle fails repeatedly, an agent can step in to do anomaly correlation (when did this last successfully recycle, what changed since then, is there a new dependency), triage, and root cause analysis.

Given read access to a GitHub repository, an agent can review source code and surface recommendations before a developer is even paged. The developer walks into an already informed investigation instead of starting cold.

The best implementations combine automation for deterministic work and AI for cases that require reasoning.
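The order Martin described can be sketched as a small escalation policy: try the deterministic fix a bounded number of times, and hand off to an agent only when it keeps failing. This is a minimal Python sketch under stated assumptions, not e-Core's implementation; `recycle_service`, `escalate_to_agent`, and `max_attempts` are hypothetical names introduced for illustration.

```python
from typing import Callable


def remediate(
    recycle_service: Callable[[], bool],
    escalate_to_agent: Callable[[int], str],
    max_attempts: int = 3,
) -> str:
    """Try the deterministic fix first; escalate only if it keeps failing."""
    for attempt in range(1, max_attempts + 1):
        # recycle_service is a hypothetical callable that returns True on success.
        if recycle_service():
            return f"resolved by automation on attempt {attempt}"
    # Automation hit its limit: hand the case to an agent for reasoning
    # (correlation, triage, root cause analysis).
    return escalate_to_agent(max_attempts)
```

Keeping the retry limit explicit means the agent is only invoked for the cases that automation could not close, which also keeps AI usage bounded.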

Breaking down the silos

Standalone tools often have their own agentic or AI capabilities. Jira may serve your dev team. ServiceNow may serve your operations team. Each works well inside its own boundary. The question is how signals connect across those tools and how handoffs happen when an issue spans multiple domains.

That is where an orchestration layer comes in. Communication protocols like MCP solve part of the challenge. Orchestration handles the rest by knowing what to do next and what to do when things fail.

What you observe informs how you engage. What you learn from acting improves how you observe. The loop compounds.

This foundation rests on organizational knowledge. If a runbook has not been updated in five years, the agent will act on outdated information. Concise, current runbooks let the agent behave like one of your most experienced engineers.

Process matters just as much: change approval, downstream notifications, backups, and scheduled maintenance windows. If the agent knows your operating guidelines, it can work within them.

Pilot the most mature process first. If anyone can execute the process with repeatable outcomes, the agent can add value.
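The point about runbook currency can be made measurable with a simple staleness check before an agent is allowed to rely on a runbook. The sketch below is illustrative only: the `Runbook` type, the `stale_runbooks` helper, and the 365-day default threshold are assumptions, not something presented in the webinar.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List


@dataclass
class Runbook:
    name: str
    last_updated: date


def stale_runbooks(
    runbooks: Iterable[Runbook], today: date, max_age_days: int = 365
) -> List[str]:
    """Return names of runbooks last updated before the cutoff date."""
    cutoff = today - timedelta(days=max_age_days)
    return [r.name for r in runbooks if r.last_updated < cutoff]
```

A list like this gives the team a concrete review backlog to clear before handing those runbooks to an agent.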

Inside the e-Core AIOps Solution

Flavia introduced e-Core's AIOps offering, an end-to-end solution built on three layers.

The first layer is an extended IT ops team: human experts with cross-product experience across visibility, ITSM, monitoring, collaboration, and cloud platforms. They adapt to your existing tools and workflows rather than replacing them.

The second layer is a set of specialized AI agents. Examples include:

Anomaly correlation agent that groups related incidents
Triage assistant that categorizes and assigns tickets
Root cause analysis agent that reviews logs and traces and proposes resolutions
Developer advisor that points engineers to likely code-related causes and explains the consequences of proposed fixes
Resource optimization agent that flags cost reduction opportunities
Monitoring designer agent that refines alert rules so you stop generating noise at the source

Agents support multi-cloud environments, multi-LLM configurations, and private hosting for compliance or data residency needs.

The third layer is OpsHub, an ITSM-agnostic command center. OpsHub connects to the tools you already use (Datadog, PagerDuty, Splunk, ServiceNow, Jira, Slack, AWS, and others) with no rip-and-replace and no vendor lock-in. Because it is agnostic, it preserves the expertise already built into each tool and adds an orchestration layer across them.

A real result: Food Delivery Platform

Flavia shared a case from a leading food delivery platform running 24/7 with over 120 million transactions per month. At that scale, a degraded API at dinner time translates directly into measurable revenue loss.

Before e-Core's AIOps solution, the team was handling over 100,000 alerts per month. Engineer fatigue was high. Alert quality was low. Escalations during the night were routine.

After implementation, alert volume dropped by approximately 73%. Critical incidents were isolated and prioritized, which accelerated response and resolution for the issues that mattered. Once operations reached a stable state, the company had enough confidence in their IT foundation to expand into financial services as a new line of business.

That expansion was possible because the team trusted that operations could absorb the additional load.
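For a rough sense of scale, the two alert figures above can be combined. This sketch treats the baseline as exactly 100,000 alerts per month, which is an assumption: the webinar said "over 100,000" and "approximately 73%", so the result is an order-of-magnitude illustration rather than a reported number.

```python
def remaining_alerts(baseline: int, reduction_percent: int) -> int:
    """Alerts left after a percentage reduction, using integer arithmetic."""
    return baseline * (100 - reduction_percent) // 100


# Assumed round baseline; the webinar only says "over 100,000".
print(remaining_alerts(100_000, 73))  # 27000
```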

What OpsHub looks like in practice

During the live demo, Flavia walked through the OpsHub command center. A few data points from the platform view:

Mean time to detect: 12 minutes, down from 18 the prior month
Mean time to resolve: 18 minutes, down from 27 the prior month
89% of alert noise filtered by the triage agent
87% success rate on the remediation agent
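The month-over-month change implied by the two timing figures can be worked out directly. The helper name `percent_reduction` is ours; the inputs are the demo values listed above.

```python
def percent_reduction(previous: float, current: float) -> float:
    """Relative improvement from the previous value to the current one."""
    return (previous - current) / previous * 100


mttd = percent_reduction(18, 12)  # mean time to detect, in minutes
mttr = percent_reduction(27, 18)  # mean time to resolve, in minutes
print(f"MTTD reduced by {mttd:.1f}%")  # MTTD reduced by 33.3%
print(f"MTTR reduced by {mttr:.1f}%")  # MTTR reduced by 33.3%
```

Both metrics improved by about one third relative to the prior month.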

The Observe view shows active incidents, which agents are working on them, and runbooks pending approval. The Engage view surfaces what each agent is doing, with traceability from detection through triage to investigation.

In one correlation example, the anomaly correlation agent linked three alerts to the same database connection pool exhaustion with 95% confidence, citing a shared spike pattern at 7:08 a.m. Another related alert was flagged at 65% confidence because the link was plausible but not conclusive.
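One way to read the two confidence values is as bands that drive different handling. The sketch below is a guess at that kind of policy, not OpsHub's actual logic: the cutoffs (`link_at=0.90`, `review_at=0.50`) and the labels are assumptions chosen so that the demo's 95% and 65% examples land in different bands.

```python
def classify_link(
    confidence: float, link_at: float = 0.90, review_at: float = 0.50
) -> str:
    """Map a correlation confidence (0 to 1) to a handling band."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= link_at:
        return "linked"      # treat as the same underlying incident
    if confidence >= review_at:
        return "possible"    # plausible but not conclusive; flag for review
    return "unrelated"
```

With these assumed cutoffs, `classify_link(0.95)` returns `"linked"` and `classify_link(0.65)` returns `"possible"`.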

The Act view shows proposed runbook executions. Each proposed action includes the steps the agent plans to take (for example, scaling an ECS service from 3 to 9 tasks), the estimated impact (cost, outage risk), and an approve or reject control.

Agents wait for human approval before executing disruptive changes, the same way a human engineer would need change management approval before modifying production.

This was the point Flavia emphasized: AI agents are IT assets, and they should follow the same governance process as your engineers.
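The approval behavior described above amounts to a gate in front of disruptive actions. Here is a minimal sketch of that idea, assuming a hypothetical `ProposedAction` record and `execute` helper; it does not reflect how OpsHub is built.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    description: str
    disruptive: bool
    approved: bool = False  # set by a human via an approve/reject control


def execute(action: ProposedAction, run: Callable[[], None]) -> str:
    """Run the action only if it is non-disruptive or already approved."""
    if action.disruptive and not action.approved:
        return "pending approval"
    run()
    return "executed"


# Example based on the demo: scaling is treated as disruptive (an assumption).
scale_up = ProposedAction("scale ECS service from 3 to 9 tasks", disruptive=True)
```

Until someone sets `scale_up.approved = True`, `execute` returns `"pending approval"` and never calls `run`.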

-> Read the full case: How an on-demand delivery platform reduced MTTR by 70% with AI-driven operations

Governance and the Agent Catalog

OpsHub also provides a rules view where teams can see every agent, its integrations, its confidence level, and its success rate. Before deploying a new agent, teams can search the existing catalog to avoid duplicating capabilities.
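Searching the catalog before deploying a new agent can be pictured as a capability lookup. The data model below is hypothetical: `AgentEntry`, its fields, and `find_agents` are illustrative names, and the real catalog may be organized differently.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class AgentEntry:
    name: str
    capabilities: Set[str]
    integrations: List[str] = field(default_factory=list)
    success_rate: float = 0.0


def find_agents(catalog: Iterable[AgentEntry], capability: str) -> List[str]:
    """Return names of agents that already cover the requested capability."""
    wanted = capability.strip().lower()
    return [
        agent.name
        for agent in catalog
        if wanted in {c.lower() for c in agent.capabilities}
    ]
```

An empty result suggests the capability is not covered yet; a non-empty one is a prompt to reuse or extend an existing agent instead of deploying a duplicate.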

The orchestration view shows every active incident and where it sits in the flow: ingested, needing information, under triage, pending security review, being correlated, or under root cause analysis.
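The stages named above map naturally onto an enumeration, with the orchestration view acting as a count of incidents per stage. This is an illustrative sketch only: the `Stage` enum and `stage_summary` helper are assumptions, and the incident IDs in the usage note are made up.

```python
from collections import Counter
from enum import Enum
from typing import Dict


class Stage(Enum):
    INGESTED = "ingested"
    NEEDS_INFORMATION = "needing information"
    TRIAGE = "under triage"
    SECURITY_REVIEW = "pending security review"
    CORRELATION = "being correlated"
    ROOT_CAUSE_ANALYSIS = "under root cause analysis"


def stage_summary(incidents: Dict[str, Stage]) -> Dict[str, int]:
    """Count incidents per stage, listing every stage even when empty."""
    counts = Counter(incidents.values())
    return {stage.value: counts.get(stage, 0) for stage in Stage}
```

For example, `stage_summary({"INC-1": Stage.TRIAGE, "INC-2": Stage.INGESTED})` reports one incident under triage, one ingested, and zero in every other stage.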

Integrations are visible along with their connection status. Some tools appear as connected, others as degraded and requiring attention.

-> Read about: AI and governance: Rethink IT control and accountability

Practical steps to prevent IT incidents with Agentic AI

Flavia closed with a short checklist for teams preparing to apply agentic AI to IT operations:

Assess your maturity. If your data, monitoring rules, runbooks, and knowledge base are not reliable, agents will act on bad information.
Remove silos by connecting agents across tools.
Tie agents to the process your engineers already follow.
Plan cost from the start. AI at scale gets expensive if you do not model it early.
Start small. Pilot a well-understood process before scaling to complex ones.

Treat AI as a force multiplier for your tools and people rather than a replacement.
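The "plan cost from the start" item can begin as a back-of-the-envelope model. The formula below is a simplifying assumption (cost scales linearly with events and tokens), and every parameter name is hypothetical; real pricing depends on the model, the provider, and how many agent steps each event triggers.

```python
def monthly_agent_cost(
    events_per_month: int,
    tokens_per_event: int,
    usd_per_1k_tokens: float,
) -> float:
    """Estimate monthly LLM spend for agent-handled events (linear model)."""
    return events_per_month * tokens_per_event / 1000 * usd_per_1k_tokens
```

Plugging in your own measured volumes and your provider's published rates, then re-running the estimate whenever alert volume changes, keeps the cost conversation grounded before the rollout scales.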

Q&A Highlights

Can agents follow custom processes?

When do you stop requiring human approval?

How is this different from native AI in tools like Atlassian Rovo?

How do you identify the most mature process to pilot first?
