How Datadog Rebuilt Its Observability Pipeline for 100 Trillion Events Daily

Episode 79 of The CTO Podcast dives into the engineering behind Datadog's core pipeline. Hosts Lucas and Luna unpack how Datadog re-architected its ingestion, processing, and storage layers to handle over 100 trillion events per day by mid-2026.

They explore the shift from a monolithic intake to a sharded, stream-oriented architecture, the decision to build custom compression rather than use off-the-shelf codecs, and how the team maintained sub-second query latencies while scaling throughput by 10x over three years. Along the way, they discuss tradeoffs between consistency and availability, the role of probabilistic data structures for sampling, and why Datadog eventually rewrote parts of its query engine in Rust.

This episode offers a concrete look at what it takes to keep observability observant when the data never stops growing. Perfect for engineering leaders and senior architects wrestling with scale.

#Datadog #Observability #DataPipeline #Architecture #Engineering #Scalability #StreamProcessing #Compression #Rust #QueryEngine #APM #Telemetry #CloudInfrastructure #BigData #SRE #BusinessAndTechnology #FexingoBusiness #BusinessPodcast

Keep every episode free: buymeacoffee.com/fexingo
