D3L2: Massive Data Processing in Adobe Experience Platform Using Delta Lake

Delta Lake

Mar 7, 2023, 5:00 – 5:45 PM


In this session, Yeshwanth Vijaykumar, Senior Engineering Manager and Architect at Adobe, and our host Denny Lee will discuss how the lakehouse architecture at Adobe Experience Platform combines with the Real-time Customer Profile architecture to increase our Apache Spark batch workload throughput and reduce costs while maintaining functionality with Delta Lake.


About this event

At Adobe Experience Platform, we ingest terabytes of data every day and manage petabytes of data for our customers as part of the Unified Profile Offering. At the heart of this is the complex ingestion of normalized and denormalized data, with various linkage scenarios powered by a central Identity Linking Graph. This powers marketing scenarios that are activated across multiple platforms and channels, such as email and advertisements. In this session, Yeshwanth Vijaykumar, Senior Engineering Manager and Architect at Adobe, and our host Denny Lee will go over how we built a cost-effective and scalable data pipeline using Apache Spark and Delta Lake:

  • What are we storing?
    • Multi Source – Multi Channel Problem
  • Data Representation and Nested Schema Evolution
    • Performance Trade-offs with Various Formats
      • Anti-patterns Used
    • Data Manipulation using UDFs
  • Writer Worries and How to Wipe them Away
  • Staging Tables FTW
  • Data Lake Replication Lag Tracking
  • Performance Metrics!

Speaker

  • Yeshwanth Vijaykumar

    Adobe

    Senior Engineering Manager / Architect

Host

  • Denny Lee

    Databricks

    Developer Advocate

Organizer

  • Carly Akerly

    Marketing Manager
