How Data Teams Are Validating Pipelines with Schema-on-Read

Episode 80 of The Data Business Podcast explores how data teams use schema-on-read validation to catch pipeline failures before they corrupt downstream analytics. Lucas and Luna discuss a real case at a mid-sized e-commerce company, where a misclassified field in a Parquet file caused a $200,000 reporting error.

They break down the difference between schema-on-write and schema-on-read, explain how tools like Apache Iceberg and Delta Lake enable late-binding schema enforcement, and walk through the central trade-off between flexibility and performance. The episode also covers how this approach fits into broader data observability and data contract strategies, with practical advice on when to use a schema registry such as Confluent Schema Registry versus file-level validation. Listeners come away with one concrete technique they can apply to their own pipelines.

#SchemaOnRead #DataValidation #ApacheIceberg #DeltaLake #DataObservability #DataContracts #DataPipelines #Parquet #SchemaRegistry #DataQuality #DataEngineering #Analytics #EcommerceData #PipelineReliability #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #TheDataBusinessPodcast

Keep every episode free: buymeacoffee.com/fexingo