Tech Talk: Architecting for Data Quality in the Lakehouse with Delta Lake and PySpark

Delta Lake

Thursday, April 21, 2022, 4:00 – 5:00 PM UTC

306 RSVPs


About this event

Join us for a live tech talk on architecting for data quality in the lakehouse with Delta Lake and PySpark. After the presentation, we’ll have time for questions. We’re excited to have you join us!

From null values and duplicate rows to modeling errors and schema changes, data can break for millions of reasons. To combat this, teams are increasingly adopting best practices from DevOps and software engineering to identify, resolve, and even prevent this "data downtime" from happening in the first place. Join Prateek Chawla and Ryan Kearns as they walk through how data and ML engineers can improve data quality across the data lakehouse by applying data observability techniques. Topics include how to optimize for data reliability across your lakehouse's metadata, storage, and query engine tiers; how to build your own data observability monitors with PySpark; and how tools like Delta Lake help scale this design.

Organizer

  • Carly Akerly

    Marketing Manager
