Tech Talk: Architecting for Data Quality in the Lakehouse with Delta Lake and PySpark

Delta Lake

Apr 21, 2022, 4:00 – 5:00 PM

306 RSVPs

deltalake

About this event

Join us for a live tech talk on architecting for data quality in the lakehouse with Delta Lake and PySpark. After the presentation, we’ll have time for questions. We're excited to have you join us!

From null values and duplicate rows to modeling errors and schema changes, data can break for millions of reasons. To combat this, teams are increasingly adopting best practices from DevOps and software engineering to identify, resolve, and even prevent this "data downtime" before it happens. Join Prateek Chawla and Ryan Kearns as they walk through how data and ML engineers can address data quality across the data lakehouse by applying data observability techniques. Topics include how to optimize for data reliability across your lakehouse's metadata, storage, and query engine tiers; how to build your own data observability monitors with PySpark; and the role of tools like Delta Lake in scaling this design.
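To make the idea of a data observability monitor concrete, here is a minimal sketch of two of the checks named above: null values and duplicate rows. It is written in plain Python over a list of dictionaries so it runs without a cluster; it is not taken from the talk. The function names (null_rate, duplicate_count, check_batch), the sample columns, and the 10% threshold are all assumptions chosen for illustration. In PySpark, the same aggregations would typically be expressed over a DataFrame instead.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def null_rate(rows: List[Row], column: str) -> float:
    """Fraction of rows whose value in `column` is missing (None)."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def duplicate_count(rows: List[Row], key_columns: Sequence[str]) -> int:
    """Number of rows beyond the first occurrence of each key."""
    keys = Counter(tuple(row.get(c) for c in key_columns) for row in rows)
    return sum(n - 1 for n in keys.values() if n > 1)


def check_batch(
    rows: List[Row],
    column: str,
    key_columns: Sequence[str],
    max_null_rate: float = 0.1,  # hypothetical threshold, tune per table
) -> Dict[str, Any]:
    """Compute both metrics for one batch and flag threshold breaches."""
    rate = null_rate(rows, column)
    dupes = duplicate_count(rows, key_columns)
    return {
        "null_rate": rate,
        "duplicates": dupes,
        "null_rate_ok": rate <= max_null_rate,
        "duplicates_ok": dupes == 0,
    }


if __name__ == "__main__":
    # Hypothetical sample batch: one missing email, one repeated id.
    batch = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": None},
        {"id": 2, "email": "b@example.com"},
        {"id": 3, "email": "c@example.com"},
    ]
    print(check_batch(batch, column="email", key_columns=["id"]))
```

A monitor of this kind would usually run on each new batch or table version and raise an alert when a flag turns false, rather than blocking the pipeline outright.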

Speakers

  • Prateek Chawla

    Monte Carlo

    Founding Engineer and Technical Lead

  • Ryan Kearns

    Monte Carlo

    Founding Data Scientist

Hosts

  • Karen Bajza-Terlouw

    Databricks

    Community Program Manager

  • Denny Lee

    Databricks

    Developer Advocate

Organizer

  • Carly Akerly

    Marketing Manager
