Teaching Machines to Debug Machines: Building an AI-Powered RCA Agent for Sylva-Core

by Mihai Zaharia, Orange

📍 Atlas 2 · AI / ML Advanced

12:30 – 13:00

In modern cloud-native environments, operators face hundreds of alerts daily from Kubernetes clusters, network functions, and infrastructure components. Traditional approaches require manual investigation, correlation across multiple tools, and, let's face it, tribal knowledge that doesn't quite scale well.

This talk covers our journey building a root cause analysis (RCA) agent that autonomously investigates alerts in a Kubernetes-based telco cloud environment and tries to identify their root cause.
As the base infrastructure, we used `sylva-core` to deploy our Kubernetes clusters and manage the monitoring stack.

We'll go through the steps we took to build it, focusing on:
- Choosing the right framework
- Prompt engineering and design patterns
- Model Context Protocol (MCP)
- Memory systems

We'll walk through real examples of the agent investigating Kubernetes pod failures, correlating them with similar past incidents, and producing an actionable RCA without human intervention.