
    SAP Data Post-processing Framework

    We developed a simple-but-powerful framework to simplify and automate everyday tasks when working with SAP data in Databricks.

    Automated conversion of CDC event stream to SCD2 tables

If you use a CDC (Change Data Capture) based data ingestion tool such as Azure Data Factory or SAP Data Intelligence, you will need to convert all those change events into a historized fact table.

By adding a few lines to the framework's YAML config file, you can flatten the CDC events automatically. In the following example, the CDC events are flattened using the primary key column MANDT.

    The OPTYPE column values describe whether the event is an insert, update or delete.

OPTIME           | MANDT | BUKRS  | LIFNR    | CNTR | AR_YEAR | OPTYPE
-----------------|-------|--------|----------|------|---------|-------
20220718140000.0 | 100   | 465465 | 32714986 | CZ   | 2022    | i
20220718150000.0 | 100   | 132132 | 32714986 | CZ   | 2022    | u
20220718151500.0 | 000   | 123    | 1684684  | DE   | 2022    | i
cdc_ingest:
  my_table:
    source_path: "stage/raw/my_table.parquet"
    primary_keys: ["MANDT"]
    partition_by: ["AR_YEAR"]
    transformation: ["currency_shifting"] # fixing SAP floating point incompatibility

All the events get automatically converted into the fact table my_table:

OPTIME           | MANDT | BUKRS  | LIFNR    | CNTR | AR_YEAR
-----------------|-------|--------|----------|------|--------
20220718150000.0 | 100   | 132132 | 32714986 | CZ   | 2022
20220718151500.0 | 000   | 123    | 1684684  | DE   | 2022
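The flattening above can be sketched in plain Python (the framework runs on Spark; this is an illustrative model of the logic, not its actual implementation): for each primary-key value, keep only the latest event by OPTIME, drop keys whose last event is a delete, and strip the CDC bookkeeping column.

```python
# Illustrative sketch of CDC flattening, assuming OPTIME is a sortable
# fixed-width timestamp string and OPTYPE is one of "i", "u", "d".
events = [
    {"OPTIME": "20220718140000.0", "MANDT": "100", "BUKRS": "465465", "OPTYPE": "i"},
    {"OPTIME": "20220718150000.0", "MANDT": "100", "BUKRS": "132132", "OPTYPE": "u"},
    {"OPTIME": "20220718151500.0", "MANDT": "000", "BUKRS": "123", "OPTYPE": "i"},
]

def flatten_cdc(events, primary_keys):
    # Keep the latest event per primary-key tuple.
    latest = {}
    for ev in sorted(events, key=lambda e: e["OPTIME"]):
        key = tuple(ev[k] for k in primary_keys)
        latest[key] = ev
    # Drop deleted keys and the OPTYPE bookkeeping column.
    return [
        {k: v for k, v in ev.items() if k != "OPTYPE"}
        for ev in latest.values()
        if ev["OPTYPE"] != "d"
    ]

print(flatten_cdc(events, ["MANDT"]))
```

With the sample events this yields two rows: the update for MANDT 100 supersedes its insert, matching the fact table above.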

    Automated data type conversion, schema enforcement

Raw data is usually stored as string values only. Our framework supports loading the original SAP schemas and data types and applying them to the raw data in the data lake to produce Hive/Unity Catalog tables with the correct schema.

    # generic data types mapping
    data_type_conversions:
    - abap_type: ACCP
      raw_type: STRING
      hive: VARCHAR
    - abap_type: CHAR
      raw_type: STRING
      hive: VARCHAR
    - abap_type: CURR
      raw_type: DECIMAL
      hive: DECIMAL

SAP table and column metadata to be transferred to Databricks:

table_name      | column_name | column_comment                | source_type | decimals | length
----------------|-------------|-------------------------------|-------------|----------|-------
/BEV3/CHVTMANFO | RI_AVANF    | Rule Identifier               | NUMC        | 0        | 6
CKI_A21_ML      | BUKRS       | Company Code                  | CHAR        | 0        | 4
06FC_S_TF123    | SALK3_2KA   | Value of total valuated stock | CURR        | 2        | 21
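A minimal sketch of how such metadata rows could be turned into Hive/Unity Catalog column definitions. The mapping and the helper function are illustrative assumptions, not the framework's actual API (NUMC is mapped to VARCHAR here for simplicity):

```python
# Assumed ABAP-to-Hive mapping, following the generic mapping shown earlier.
ABAP_TO_HIVE = {"NUMC": "VARCHAR", "CHAR": "VARCHAR", "CURR": "DECIMAL"}

def hive_column_ddl(meta):
    """Build one column definition from a metadata row (hypothetical helper)."""
    hive_type = ABAP_TO_HIVE[meta["source_type"]]
    if hive_type == "DECIMAL":
        # CURR columns carry precision (length) and scale (decimals).
        return f'{meta["column_name"]} DECIMAL({meta["length"]},{meta["decimals"]})'
    return f'{meta["column_name"]} VARCHAR({meta["length"]})'

print(hive_column_ddl({"column_name": "SALK3_2KA", "source_type": "CURR",
                       "decimals": 2, "length": 21}))
# SALK3_2KA DECIMAL(21,2)
```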

    Simplified stream processing

Stream processing pipelines can likewise be defined with a few lines of YAML configuration:

    id: "abc123_billing_doc"
    description: "Billing documents"
    source_systems: ["vfq", "lena", "mariot", "xf99"]
    primary_keys: ["ID_MA", "VB_ID", "XA"]
    transformation_function: ["mycomp.billing_stream_transform"]
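The transformation_function entry names a function by its dotted module path. One way such a string could be resolved at runtime is an importlib-based lookup; this sketch is an assumption about the mechanism, not the framework's documented behavior:

```python
import importlib

def resolve_function(dotted_path):
    """Resolve a dotted path like "mycomp.billing_stream_transform"
    into a callable (hypothetical helper)."""
    module_name, func_name = dotted_path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), func_name)

# Example with a standard-library function instead of the (fictional)
# mycomp module:
fn = resolve_function("os.path.basename")
print(fn("/data/stage/raw/my_table.parquet"))  # my_table.parquet
```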

    Data Reconciliation

