Tech Talk: Architecting for Data Quality in the Lakehouse with Delta Lake and PySpark

Start: 2022-04-21T09:00:00-07:00
End: 2022-04-21T10:00:00-07:00

Delta Lake

Apr 21, 2022, 4:00 – 5:00 PM UTC

306 RSVPs


About this event

Join us for a live tech talk on architecting for data quality in the lakehouse with Delta Lake and PySpark. After the presentation, there will be time for questions. We're excited to have you join us!
From null values and duplicate rows to modeling errors and schema changes, data can break for countless reasons. To combat this, teams are increasingly adopting best practices from DevOps and software engineering to identify, resolve, and even prevent this "data downtime" from happening in the first place. Join Prateek Chawla and Ryan Kearns as they walk through how data and ML engineers can solve for data quality across the data lakehouse by applying data observability techniques. Topics include how to optimize for data reliability across your lakehouse's metadata, storage, and query engine tiers; how to build your own data observability monitors with PySpark; and the role of tools like Delta Lake in scaling this design.
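To make "data observability monitors" concrete, here is a minimal sketch of the kinds of checks the abstract mentions: null rates, duplicate rows, schema changes, and row-count drops, compared against a stored baseline. It is written in plain Python so it runs without a Spark cluster; a PySpark version would compute the same metrics on a DataFrame (for example with `df.count()`, `F.col(c).isNull()`, and `df.dropDuplicates()`). The function names `profile` and `check` and the thresholds are hypothetical illustrations, not the speakers' implementation.

```python
from typing import Any, Dict, List
from collections import Counter


def profile(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Compute simple health metrics for one batch of records (hypothetical helper)."""
    columns = sorted({key for row in rows for key in row})
    # A missing key is treated the same as an explicit null.
    null_rate = {
        c: sum(1 for r in rows if r.get(c) is None) / len(rows) if rows else 0.0
        for c in columns
    }
    # Count rows that exactly repeat an earlier row (values must be hashable).
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return {
        "row_count": len(rows),
        "columns": columns,
        "null_rate": null_rate,
        "duplicate_rows": duplicates,
    }


def check(current: Dict[str, Any], baseline: Dict[str, Any],
          max_null_rate: float = 0.05) -> List[str]:
    """Compare a batch profile against a baseline; thresholds are assumptions."""
    issues = []
    added = set(current["columns"]) - set(baseline["columns"])
    removed = set(baseline["columns"]) - set(current["columns"])
    if added or removed:
        issues.append(f"schema change: added={sorted(added)} removed={sorted(removed)}")
    for col, rate in current["null_rate"].items():
        if rate > max_null_rate:
            issues.append(f"null rate {rate:.0%} in column {col!r}")
    if current["duplicate_rows"]:
        issues.append(f"{current['duplicate_rows']} duplicate row(s)")
    # Volume check: flag a batch that is less than half the baseline size.
    if current["row_count"] < 0.5 * baseline["row_count"]:
        issues.append("row count dropped by more than half")
    return issues


baseline = profile([{"id": 1, "amount": 10.0}, {"id": 2, "amount": 12.5}])
today = profile([
    {"id": 1, "amount": None, "currency": "USD"},
    {"id": 1, "amount": None, "currency": "USD"},
])
issues = check(today, baseline)
for issue in issues:
    print(issue)
```

Run against this sample batch, the monitor reports the new `currency` column, the fully null `amount` column, and one duplicate row. Profiling each batch and comparing it to a baseline, rather than hard-coding expectations per table, is one common way to catch unexpected changes early.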

Speakers

Prateek Chawla

Monte Carlo

Founding Engineer and Technical Lead

Ryan Kearns

Monte Carlo

Founding Data Scientist

Hosts

Karen Bajza-Terlouw

Databricks

Community Program Manager

Denny Lee

Databricks

Developer Advocate

Organizer

Carly Akerly

Marketing Manager
