Azure Data Factory Data Extraction

Data Ingestion from SAP ERP via Azure Data Factory

1. Scaling data ingestion to hundreds or thousands of SAP data sources

Scaling an Azure Data Factory SAP setup to a large number of tables is challenging. For each data source, the following settings must be defined manually:

  • all primary keys of each incrementally updated table
  • SAP ODP source name (for CDS Views)
  • ingest mode (full or incremental, depending on the data source type)

Our Accelerator fully automates all of these steps from a simple configuration.
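
As a rough illustration of what such a configuration might capture, the sketch below models the three per-source settings listed above in plain Python. The class name `SapSourceConfig`, its fields, and the `Z_EXAMPLE_VIEW` entry are hypothetical; this is not the Accelerator's actual configuration format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SapSourceConfig:
    """Hypothetical per-source settings; not the Accelerator's real schema."""
    name: str                               # table or view to ingest
    ingest_mode: str                        # "full" or "incremental"
    primary_keys: List[str] = field(default_factory=list)
    odp_source_name: Optional[str] = None   # only needed for CDS Views

    def validate(self) -> None:
        if self.ingest_mode not in ("full", "incremental"):
            raise ValueError(f"{self.name}: unknown ingest mode {self.ingest_mode!r}")
        # Incremental loads need keys to merge changes into the target table.
        if self.ingest_mode == "incremental" and not self.primary_keys:
            raise ValueError(f"{self.name}: incremental ingest requires primary keys")


# Example entries; the CDS View name below is a placeholder.
sources = [
    SapSourceConfig(name="TCURR", ingest_mode="full"),
    SapSourceConfig(
        name="Z_EXAMPLE_VIEW",
        ingest_mode="incremental",
        primary_keys=["MANDT", "ID"],
        odp_source_name="Z_EXAMPLE_VIEW",
    ),
]

for source in sources:
    source.validate()  # raises ValueError on an inconsistent entry
```

Validating every entry up front means that a missing key definition fails early, before any pipeline is generated from the configuration.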


2. Automate incremental data load

We’ve prepared a Python package that merges CDC events into a Delta table, applies SCD2 transformations to create flat tables registered in the Databricks Unity Catalog, and more.
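
To make these two steps concrete, here is a minimal pure-Python sketch of the underlying logic: collapsing CDC events into the latest state per key (the effect a merge into the target table would have), and deriving SCD2 validity intervals from the same events. It does not use the package's API or Delta itself; the event fields `key`, `seq`, `op`, and `data` are assumptions chosen for illustration.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]  # assumed shape: {"key", "seq", "op", "data"}


def merge_cdc(current: Dict[Any, dict], events: List[Event]) -> Dict[Any, dict]:
    """Apply insert/update/delete events in sequence order; return the new state."""
    result = dict(current)
    for ev in sorted(events, key=lambda e: e["seq"]):
        if ev["op"] == "D":
            result.pop(ev["key"], None)      # delete, ignoring unknown keys
        else:
            result[ev["key"]] = ev["data"]   # insert or update (upsert)
    return result


def to_scd2(events: List[Event]) -> List[dict]:
    """Turn events into SCD2 rows with valid_from / valid_to / is_current."""
    rows: List[dict] = []
    open_rows: Dict[Any, dict] = {}          # key -> currently open version
    for ev in sorted(events, key=lambda e: e["seq"]):
        previous = open_rows.pop(ev["key"], None)
        if previous is not None:             # close the previous version
            previous["valid_to"] = ev["seq"]
            previous["is_current"] = False
        if ev["op"] != "D":                  # open a new version
            row = {"key": ev["key"], **ev["data"],
                   "valid_from": ev["seq"], "valid_to": None, "is_current": True}
            rows.append(row)
            open_rows[ev["key"]] = row
    return rows


# Illustrative events: "I" insert, "U" update, "D" delete (assumed codes).
events = [
    {"key": "1000", "seq": 1, "op": "I", "data": {"name": "Sales"}},
    {"key": "1000", "seq": 2, "op": "U", "data": {"name": "Sales EU"}},
    {"key": "2000", "seq": 3, "op": "I", "data": {"name": "IT"}},
    {"key": "2000", "seq": 4, "op": "D", "data": None},
]

latest = merge_cdc({}, events)
history = to_scd2(events)
```

In this sketch `latest` holds only the surviving key with its most recent values, while `history` keeps every version, including the closed version of the deleted record.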


3. Enrich Databricks schema with SAP metadata, fix types

Supports SAP objects ingested using either CDS Views or extractors.

ADF-ingested table schema (TCURR table)

image

Table schema fixed by the Eviden SAP Accelerator

image

Automatically applied schema changes

  • Primary key definition
  • Missing nullability constraints (NULL or NOT NULL) applied
  • Column comments transfer
  • Data type fixes:
    • DATE fields, most of which are ingested as strings (in multiple formats), are converted to proper dates:
      • DATS date (YYYYMMDD)
      • DATUM_INV integer-based date values
    • DECIMAL length fixes
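
The date fixes listed above can be sketched as two small converters. This is an illustration only, not the Accelerator's code: the function names are hypothetical, and the inverted-date formula (`99999999 - YYYYMMDD`, as used for the `GDATU` field of TCURR) is an assumption about how the integer-based values are encoded.

```python
from datetime import date, datetime
from typing import Optional, Union


def parse_dats(value: Optional[str]) -> Optional[date]:
    """Convert an SAP DATS string (YYYYMMDD) to a date; initial values map to None."""
    if value is None or value.strip() in ("", "00000000"):
        return None
    return datetime.strptime(value.strip(), "%Y%m%d").date()


def parse_inverted_date(value: Union[int, str, None]) -> Optional[date]:
    """Convert an inverted date, assumed to be stored as 99999999 - YYYYMMDD."""
    if value is None or str(value).strip() == "":
        return None
    # Undo the inversion, then reuse the DATS parser on the zero-padded result.
    return parse_dats(f"{99999999 - int(value):08d}")


print(parse_dats("20240131"))
print(parse_inverted_date(79759868))
```

Mapping the initial value `00000000` to `None` lets the target column be a nullable DATE instead of carrying an invalid placeholder date.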

4. Enable hierarchies extraction

Extracting SAP hierarchies for Cost Centers, Profit Centers and others is not supported (see issue). Neither SAPI Extractors (0COSTCENTER_0101_HIER, 0GL_ACCOUNT_T011_HIER, …) nor extraction CDS Views (I_CostCenterHierarchyNode) work.

Solution: Create a custom CDS View that allows ODP-based extraction.

💡
Contact us to obtain custom CDS Views for all SAP hierarchies.
