SAP ERP data integration to Databricks

SAP ECC or S/4 data integration to Databricks

Motivation

  • make raw SAP data available in the data lake for advanced analytics and ML
    • SAP extracts and the SAP BW data model contain only a limited number of tables and columns/fields: nothing more than what reporting needs
    • an analytics team with SQL/Python skills can build the analytical data model end to end, without depending on an SAP team with CDS view, ABAP extractor, … skills
    • ELT rather than ETL → a real data lake approach
  • combine SAP and non-SAP data easily
  • (near) real-time data availability if necessary
  • incremental data loads are always available via CDC (change data capture)
    • this is not always possible with SAP extractors or CDS views
  • reporting can be done via Power BI/Tableau
  • more stable performance on large data volumes and complex reports
  • selected BW components can be moved/re-engineered to Databricks

Our approach

  1. Ingesting raw SAP S/4 tables
  2. Metadata extraction (data types, table & column descriptions, associations, …)
  3. Data model building and reports/dashboards creation
    1. By using pre-built CDS views in S/4 and converting them into Databricks views
    2. By using our pre-built SAP data model
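
To illustrate how steps 2 and 3 can fit together, here is a minimal sketch in Python that turns extracted column metadata into a view definition over a raw table. It is not the actual implementation: the `ColumnMeta` shape, the helper names, and the schema names `sap_raw` and `sap_model` are assumptions made for the example. Only the SAP table MARA and its fields MATNR, MTART and ERSDA are real SAP names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMeta:
    """One column of an SAP table as extracted in step 2 (hypothetical shape)."""
    technical_name: str  # SAP field name, e.g. "MATNR"
    description: str     # human-readable label, e.g. "Material Number"


def to_alias(description: str) -> str:
    """Turn a column description into a snake_case SQL alias."""
    # Replace every non-alphanumeric character with a space, then join the words.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in description)
    return "_".join(cleaned.lower().split())


def build_view_ddl(source_table: str, view_name: str, columns: list[ColumnMeta]) -> str:
    """Render a CREATE OR REPLACE VIEW statement that renames technical columns."""
    if not columns:
        raise ValueError("at least one column is required")
    select_list = ",\n".join(
        f"  `{col.technical_name}` AS `{to_alias(col.description)}`"
        for col in columns
    )
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT\n{select_list}\n"
        f"FROM {source_table}"
    )


# Example metadata for the SAP material master table MARA.
# The schema names below are assumptions, not part of the original setup.
mara_columns = [
    ColumnMeta("MATNR", "Material Number"),
    ColumnMeta("MTART", "Material Type"),
    ColumnMeta("ERSDA", "Created On"),
]

ddl = build_view_ddl("sap_raw.mara", "sap_model.material", mara_columns)
print(ddl)
```

In a Databricks notebook the resulting string could be passed to `spark.sql(ddl)` to create the view. Generating the statement from metadata rather than writing it by hand keeps the readable column names in sync with the SAP data dictionary.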