You Can’t Prevent Every Outage, But You Can See Them Coming

by Florin Loghiade, UiPath

📍 Atlas 1 Platform Engineering Intermediate

12:30 – 13:00

You do not see the next outage during the incident. You see it after, if you learn correctly.

This talk shows how post-incident data can be turned into early warning signals using Azure-native telemetry and automation. Instead of long postmortems, each incident produces a minimal, structured record capturing what failed, how it was detected, response time, recent changes, and customer impact.
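As an illustration only, a minimal structured record of this kind might look like the sketch below. The abstract does not specify a schema, so the class name `IncidentRecord`, every field name, and the reading of "response time" as detection-to-mitigation time are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class IncidentRecord:
    """Hypothetical minimal incident record; all field names are assumptions."""

    incident_id: str
    failure_mode: str          # what failed
    detected_by: str           # how it was detected, e.g. "alert" or "customer-report"
    started_at: datetime
    detected_at: datetime
    mitigated_at: datetime
    recent_changes: List[str] = field(default_factory=list)
    customer_impact: str = ""

    @property
    def time_to_detect(self) -> timedelta:
        # Time between the start of the failure and its detection.
        return self.detected_at - self.started_at

    @property
    def time_to_mitigate(self) -> timedelta:
        # Assumed meaning of "response time": detection to mitigation.
        return self.mitigated_at - self.detected_at

    def to_json(self) -> str:
        # Serialize with ISO 8601 timestamps and derived durations in seconds.
        data = asdict(self)
        for key in ("started_at", "detected_at", "mitigated_at"):
            data[key] = data[key].isoformat()
        data["time_to_detect_s"] = self.time_to_detect.total_seconds()
        data["time_to_mitigate_s"] = self.time_to_mitigate.total_seconds()
        return json.dumps(data)
```

Keeping the durations as derived properties rather than stored fields means they cannot drift out of sync with the timestamps they come from.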

We will see how this data can be derived from Azure Monitor, Application Insights, Log Analytics, deployment stages, and Service Health, and then reused to surface patterns such as recurring failure modes, risky deployments, slow detection paths, and near misses.
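To make two of these patterns concrete, here is a rough sketch that works on incident records already exported as plain dictionaries. It does not query any Azure service; the dictionary keys (`failure_mode`, `detected_by`, `minutes_to_detect`) and both thresholds are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from statistics import median
from typing import Dict, Iterable, Mapping


def recurring_failure_modes(
    incidents: Iterable[Mapping], min_count: int = 2
) -> Dict[str, int]:
    """Return failure modes seen at least `min_count` times, with their counts."""
    counts = Counter(incident["failure_mode"] for incident in incidents)
    return {mode: n for mode, n in counts.items() if n >= min_count}


def slow_detection_paths(
    incidents: Iterable[Mapping], threshold_minutes: float = 15
) -> Dict[str, float]:
    """Return detection paths whose median time to detect exceeds the threshold."""
    by_path = defaultdict(list)
    for incident in incidents:
        by_path[incident["detected_by"]].append(incident["minutes_to_detect"])

    slow = {}
    for path, minutes in by_path.items():
        typical = median(minutes)
        if typical > threshold_minutes:
            slow[path] = typical
    return slow
```

The median is used instead of the mean so that a single unusually long incident does not label an otherwise fast detection path as slow.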

These patterns are fed into a simple change risk model weighted by error budget trends, blast radius, rollback capability, and historical failure rate. The output is not an approval, but a clear green, yellow, or red signal before future changes go live.
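One possible shape for such a model is a weighted score mapped onto three bands, sketched below. The abstract names the four inputs but not how they are combined, so the weights, the thresholds, the 0-to-1 normalization, and all names (`ChangeContext`, `risk_score`, `risk_signal`) are assumptions, not the speaker's model.

```python
from dataclasses import dataclass


@dataclass
class ChangeContext:
    """Hypothetical inputs for one planned change, each normalized to 0..1."""

    error_budget_burn: float        # proxy for the error budget trend; 1.0 = burning fast
    blast_radius: float             # share of customers or regions the change can reach
    historical_failure_rate: float  # share of similar past changes that caused incidents
    can_rollback: bool


# Assumed weights; they sum to 1.0 so the score also stays within 0..1.
WEIGHTS = {
    "error_budget_burn": 0.35,
    "blast_radius": 0.30,
    "historical_failure_rate": 0.25,
    "no_rollback": 0.10,
}


def _clamp(value: float) -> float:
    # Keep out-of-range inputs from pushing the score outside 0..1.
    return max(0.0, min(1.0, value))


def risk_score(change: ChangeContext) -> float:
    """Weighted sum of the four risk factors; higher means riskier."""
    return (
        WEIGHTS["error_budget_burn"] * _clamp(change.error_budget_burn)
        + WEIGHTS["blast_radius"] * _clamp(change.blast_radius)
        + WEIGHTS["historical_failure_rate"] * _clamp(change.historical_failure_rate)
        + WEIGHTS["no_rollback"] * (0.0 if change.can_rollback else 1.0)
    )


def risk_signal(change: ChangeContext, yellow: float = 0.35, red: float = 0.60) -> str:
    """Map the score to an advisory signal; it informs a decision, it does not gate it."""
    score = risk_score(change)
    if score >= red:
        return "red"
    if score >= yellow:
        return "yellow"
    return "green"
```

A linear score is used here because each factor's contribution stays easy to explain to the team making the change; in practice the weights and thresholds would need tuning against real incident history.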

AI is used to correlate signals and summarize trends, reducing analysis toil and improving reliability without adding process or friction.