Tech Talk: Architecting for Data Quality in the Lakehouse with Delta Lake and PySpark

Delta Lake
Thu, Apr 21, 9:00 AM (PDT)

About this event

Join us for a live tech talk and learn about architecting for data quality in the lakehouse with Delta Lake and PySpark. After the presentation, we’ll have time for questions. Excited to have you join us!

From null values and duplicate rows to modeling errors and schema changes, data can break for millions of reasons. To combat this, teams are increasingly adopting best practices from DevOps and software engineering to identify, resolve, and even prevent this "data downtime" from happening in the first place. Join Prateek Chawla and Ryan Kearns as they walk through how data and ML engineers can solve for data quality across the data lakehouse by applying data observability techniques. Topics to be discussed include:

  • How to optimize for data reliability across your lakehouse's metadata, storage, and query engine tiers

  • Building your own data observability monitors with PySpark

  • The role of tools like Delta Lake in scaling this design
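As a taste of the monitoring topic above, here is a minimal sketch of two common data observability checks (null rate and duplicate rows). To keep the example runnable without a Spark cluster it operates on plain Python rows; the equivalent PySpark DataFrame calls are noted in the docstrings. The function names are illustrative, not part of any library or of the talk's material.

```python
def null_rate(rows, column):
    """Fraction of rows where `column` is None.
    PySpark equivalent: df.filter(F.col(column).isNull()).count() / df.count()
    """
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def duplicate_count(rows, key_columns):
    """Number of rows beyond the first occurrence of each key.
    PySpark equivalent: df.count() - df.dropDuplicates(key_columns).count()
    """
    seen = set()
    dupes = 0
    for r in rows:
        key = tuple(r.get(c) for c in key_columns)
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return dupes


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 1, "email": "a@example.com"},  # duplicate of the first row
]
print(null_rate(rows, "email"))        # 1 of 3 rows has a null email
print(duplicate_count(rows, ["id"]))   # one repeated id
```

In production these metrics would be computed on each table refresh and compared against historical baselines, so that an anomalous spike (rather than any single bad row) is what triggers an alert.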

Speakers

  • Prateek Chawla

  • Ryan Kearns

Hosts

  • Karen Bajza-Terlouw

    Databricks

    Community Program Manager

  • Denny Lee

    Databricks

    Developer Advocate