How Data Teams Are Validating Pipelines with Schema-on-Read

Episode 80 of The Data Business Podcast explores how data teams are using schema-on-read validation to catch pipeline failures before they corrupt downstream analytics. Lucas and Luna discuss a real case at a mid-sized e-commerce company where a misclassified field in a Parquet file caused a $200,000 reporting error. They break down the difference between schema-on-write and schema-on-read, explain how tools like Apache Iceberg and Delta Lake enable late-binding schema enforcement, and walk through the trade-offs: flexibility versus performance. The episode also covers how this approach fits into broader data observability and data contract strategies, with practical advice on when to use schema registries like Confluent Schema Registry versus file-level validation. Listeners learn one concrete technique they can apply to their own pipelines.

#SchemaOnRead #DataValidation #ApacheIceberg #DeltaLake #DataObservability #DataContracts #DataPipelines #Parquet #SchemaRegistry #DataQuality #DataEngineering #Analytics #EcommerceData #PipelineReliability #BusinessAndTechnology #FexingoBusiness #BusinessPodcast #TheDataBusinessPodcast

Keep every episode free: buymeacoffee.com/fexingo