Teaching Machines to Debug Machines: Building an AI-Powered RCA Agent for Sylva-Core
by Mihai Zaharia, Orange
12:30 – 13:00
In modern cloud-native environments, operators face hundreds of alerts daily from Kubernetes clusters, network functions, and infrastructure components. Traditional approaches require manual investigation, correlation across multiple tools, and, let's face it, tribal knowledge that doesn't quite scale well.
This talk covers our journey building an RCA agent that autonomously investigates alerts and attempts to identify their root cause in a Kubernetes-based telco cloud environment.
As the base infrastructure, we used `sylva-core` to deploy our Kubernetes clusters and manage the monitoring stack.
We'll go through the steps we took to build it, focusing on:
- Choosing the right framework
- Prompt engineering and design patterns
- Model Context Protocol
- Memory systems
We'll walk through real examples of the agent investigating Kubernetes pod failures, correlating them with similar past incidents, and providing actionable RCA without human intervention.
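To give a flavour of what "correlating with similar past incidents" can mean, here is a minimal sketch, not the agent presented in the talk. It assumes a tiny in-memory incident store and plain token overlap (Jaccard similarity) as the matching rule; the names `Incident`, `MEMORY`, and `most_similar`, the 0.3 threshold, and the sample incidents are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """A past incident kept in the agent's memory (hypothetical schema)."""
    summary: str
    root_cause: str


def tokens(text: str) -> set[str]:
    # Lowercase whitespace tokenization; a real system would likely use embeddings.
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    # Size of the intersection divided by size of the union; 0.0 for two empty sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(alert: str, memory: list[Incident], threshold: float = 0.3) -> Incident | None:
    """Return the best-matching past incident, or None if nothing is close enough."""
    alert_tokens = tokens(alert)
    best, best_score = None, 0.0
    for incident in memory:
        score = jaccard(alert_tokens, tokens(incident.summary))
        if score > best_score:
            best, best_score = incident, score
    return best if best_score >= threshold else None


# Illustrative data only; not taken from any real cluster.
MEMORY = [
    Incident("pod crashloopbackoff after oomkilled in namespace monitoring",
             "memory limit too low"),
    Incident("node notready kubelet stopped posting status",
             "disk pressure on node"),
]

match = most_similar("pod oomkilled crashloopbackoff in namespace telemetry", MEMORY)
no_match = most_similar("certificate expired on ingress", MEMORY)
```

With this sample data, `match` is the out-of-memory incident and `no_match` is `None`, since the second alert shares no tokens with any stored summary. Token overlap is deliberately crude; it stands in for whatever retrieval method a real memory system would use.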