SAP ERP data integration to Databricks

SAP ECC or S/4HANA → Databricks integration

Motivation & Best Practices

  • Make SAP data sources available in the Lakehouse for advanced analytics and ML
  • Combine SAP and non-SAP data easily
  • Make data available in (near) real-time
  • Use Power BI/Tableau for reporting

1. SAP ERP → Bronze

  • Support for Azure Data Factory and SAP Datasphere (more connectors to come)
  • Support for raw tables, Extractors, and CDS views
  • Automated creation of extraction pipelines

2. Bronze → Silver

  • COPY INTO Unity Catalog tables from Bronze
  • Generation of human-readable column names (for raw SAP tables)
  • Transfer of table and column descriptions; definition of primary keys
  • Currency conversion and hierarchy transition
  • Normalization of data types
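
To make the column renaming and data type normalization steps more concrete, here is a minimal Python sketch. It is not the product's actual implementation: the mapping `READABLE_NAMES`, the `CONVERTERS` table, and the helper functions are hypothetical names chosen for illustration. It assumes SAP dates arrive as `YYYYMMDD` strings (with `00000000` as the initial value) and that some exports write negative numbers with a trailing minus sign.

```python
from datetime import date
from decimal import Decimal

# Hypothetical mapping from SAP technical field names to readable names.
# In practice this would presumably be derived from SAP dictionary metadata.
READABLE_NAMES = {
    "MANDT": "client",
    "MATNR": "material_number",
    "ERSDA": "created_on",
    "BRGEW": "gross_weight",
}


def parse_sap_date(value):
    """Convert an SAP date string (YYYYMMDD) to a date; empty/initial values become None."""
    text = (value or "").strip()
    if text in ("", "00000000"):
        return None
    return date(int(text[0:4]), int(text[4:6]), int(text[6:8]))


def parse_sap_number(value):
    """Convert a numeric string to Decimal, moving a trailing minus sign to the front."""
    text = value.strip()
    if text.endswith("-"):
        text = "-" + text[:-1]
    return Decimal(text)


# Assumed per-column converters for this illustration only.
CONVERTERS = {"ERSDA": parse_sap_date, "BRGEW": parse_sap_number}


def normalize_row(raw_row):
    """Rename columns and normalize values for one raw (Bronze) record."""
    clean = {}
    for field, value in raw_row.items():
        convert = CONVERTERS.get(field, lambda v: v)
        # Fields without a readable name fall back to the lower-cased technical name.
        clean[READABLE_NAMES.get(field, field.lower())] = convert(value)
    return clean
```

Calling `normalize_row({"MATNR": "M-01", "ERSDA": "20240131", "BRGEW": "12.500-"})` would yield a record keyed by `material_number`, `created_on`, and `gross_weight`, with the date and the negative weight converted to typed values. In a Lakehouse the same logic would more likely be expressed as Spark column expressions; plain Python is used here only to keep the sketch self-contained.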

3. Silver → Gold

  • Integration of texts and master data (associating master tables with their text tables)
  • Automated surrogate key creation (single-column PK)
    • Texts → Master Data
    • Facts → Master Data
  • Creation of reporting views
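
The source does not say how the single-column surrogate keys are generated. One common option, shown here purely as an illustrative sketch, is to hash the parts of the composite business key. The function `surrogate_key`, the column name `material_sk`, and the sample rows are hypothetical.

```python
import hashlib


def surrogate_key(*key_parts):
    """Derive a single-column key from the parts of a composite business key."""
    # Joining with a separator keeps ("AB", "C") distinct from ("A", "BC").
    joined = "\x1f".join(str(part) for part in key_parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# A master data row and a text row sharing the same business key (client, material).
material = {"client": "100", "material_number": "M-01"}
material_text = {
    "client": "100",
    "material_number": "M-01",
    "language": "E",
    "description": "Steel bolt",
}

# Both rows receive the same surrogate key, so they can be joined on one column.
material["material_sk"] = surrogate_key(
    material["client"], material["material_number"]
)
material_text["material_sk"] = surrogate_key(
    material_text["client"], material_text["material_number"]
)
```

Because the key is computed deterministically from the business key, text and fact tables can each derive it independently and still join to the master data on a single column. A sequence-based or identity-column key would be an equally valid design, at the cost of needing a lookup during loading.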

4. Power BI reporting

  • Definition of KPIs
  • Pre-built Power BI templates
  • See examples (GitHub)

SAP ECC or S/4HANA → Databricks Data Extraction (Bronze)
Eviden’s SAP Data Model for Databricks