D3L2: Massive Data Processing in Adobe Experience Platform Using Delta Lake

Start: 2023-03-07T09:00:00-08:00
End: 2023-03-07T09:45:00-08:00

Delta Lake

Mar 7, 2023, 5:00 – 5:45 PM

94 RSVPs

In this session, Yeshwanth Vijaykumar, Senior Engineering Manager and Architect at Adobe, and our host Denny Lee will discuss how the data lakehouse architecture at Adobe Experience Platform combines with the Real-time Customer Profile architecture to increase the throughput of our Apache Spark batch workloads and reduce costs while maintaining functionality with Delta Lake.

deltalake

About this event

At Adobe Experience Platform, we ingest terabytes of data every day and manage petabytes of data for our customers as part of the Unified Profile offering. At the heart of this is the complex ingestion of normalized and denormalized data, with various linkage scenarios powered by a central Identity Linking Graph. This powers marketing scenarios that are activated across multiple platforms and channels, such as email and advertisements. In this session, Yeshwanth Vijaykumar, Senior Engineering Manager and Architect at Adobe, and our host Denny Lee will go over how we built a cost-effective and scalable data pipeline using Apache Spark and Delta Lake:
What are we storing?
Multi Source – Multi Channel Problem
Data Representation and Nested Schema Evolution
Performance Trade-offs with Various Formats
Anti-patterns Used
Data Manipulation Using UDFs
Writer Worries and How to Wipe Them Away
Staging Tables FTW
Data Lake Replication Lag Tracking
Performance Metrics!

Speaker

Yeshwanth Vijaykumar

Adobe

Senior Engineering Manager / Architect

Host

Denny Lee

Databricks

Developer Advocate

Organizer

Carly Akerly

Marketing Manager
