How SRE Teams Use Cost of Delay to Prioritize Reliability Work

Lucas and Luna explore how SRE teams at companies like Spotify and Etsy use 'cost of delay', a concept borrowed from product management, to quantify the business impact of reliability work. Lucas walks through the math behind deferring a reliability project, using a real-world example: a payment-processing team deciding whether to fix a latency issue or build a new feature. Luna pushes back on how hard delay costs are to estimate, and they discuss a practical framework, weighted shortest job first (WSJF), that helps teams rank reliability initiatives alongside feature work. In one concrete example, deferring an SRE project by one quarter costs $200,000 in incident-related losses; the team converts that figure into a cost of delay per week and compares it to the effort required. Listeners learn how to present reliability investments in the language executives understand: dollars and time. The conversation closes with a reflection on how cost of delay shifts the question from 'how reliable should we be?' to 'what happens if we defer this work?'

#SiteReliabilityEngineering #CostOfDelay #WSJF #Spotify #Etsy #SREPrioritization #ReliabilityEngineering #IncidentResponse #Technology #BusinessCase #ProductManagement #WeightedShortestJobFirst #SREMetrics #LatencyOptimization #FexingoBusiness #BusinessPodcast #TechPodcast #SREPodcast

Keep every episode free: buymeacoffee.com/fexingo
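
For readers who want to see the arithmetic, the episode's example of a $200,000 loss per quarter of deferral can be sketched in a few lines of Python. This is a minimal illustration, not the hosts' own method: it assumes a 13-week quarter, uses a simplified dollar-based form of WSJF (weekly cost of delay divided by effort in weeks), and the effort estimates and the competing feature's numbers are hypothetical placeholders.

```python
from dataclasses import dataclass

WEEKS_PER_QUARTER = 13  # assumption: one quarter is treated as 13 weeks


def weekly_cost_of_delay(quarterly_loss: float) -> float:
    """Convert a per-quarter loss into a per-week cost of delay."""
    return quarterly_loss / WEEKS_PER_QUARTER


@dataclass
class WorkItem:
    name: str
    cost_of_delay_per_week: float  # dollars lost for each week the work is deferred
    effort_weeks: float            # estimated time to complete the work

    @property
    def wsjf(self) -> float:
        # Simplified WSJF: cost of delay divided by job duration.
        return self.cost_of_delay_per_week / self.effort_weeks


# $200,000 per quarter comes from the episode description.
# The effort figures and the feature's cost of delay are hypothetical.
latency_fix = WorkItem("latency fix", weekly_cost_of_delay(200_000), effort_weeks=4)
new_feature = WorkItem("new feature", cost_of_delay_per_week=10_000, effort_weeks=6)

# Highest score first: the work that is most expensive to defer per week of effort.
ranked = sorted([latency_fix, new_feature], key=lambda item: item.wsjf, reverse=True)

for item in ranked:
    print(f"{item.name}: ${item.cost_of_delay_per_week:,.0f}/week, WSJF {item.wsjf:,.0f}")
```

Under these assumed inputs the latency fix ranks ahead of the feature, because each week of its effort avoids more loss. Changing either effort estimate can flip the order, which is the estimation difficulty Luna raises.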