SAP data postprocessing

SAP Data Post-processing Framework

We developed a simple-but-powerful framework to simplify and automate everyday tasks when working with SAP data in Databricks.

Automated conversion of CDC event stream to SCD2 tables

If you use CDC (Changed Data Capture) based data ingestion tool like Azure Data Factory or SAP Data Intelligence, you will need to convert all those change events into a historized fact table.

By adding a few lines to the YAML config file of our framework, you can flatten the CDC events automatically. In the following example, the CDC events will be flattened using the primary key column MANDT.

The OPTYPE column values describe whether the event is an insert, update or delete.

OPTIME
MANDT
BUKRS
LIFNR
CNTR
AR_YEAR
OPTYPE
20220718140000.0
100
465465
32714986
CZ
2022
i
20220718150000.0
100
132132
32714986
CZ
2022
u
20220718151500.0
000
123
1684684
DE
2022
i
cdc_ingest:
	my_table:
    source_path: "stage/raw/my_table.parquet"
		primary_keys: ["MANDT"]
    partition_by: ["AR_YEAR"]
		transformation: ["currency_shifting"] # fixing SAP floating point incompatibility

All the events get automatically converted into a fact table my_table

OPTIME
MANDT
BUKRS
LIFNR
CNTR
AR_YEAR
20220718150000.0
100
132132
32714986
CZ
2022
20220718151500.0
000
123
1684684
DE
2022

Automated data type conversion, schema enforcement

RAW data are usually stored as string values only. Our framework supports loading original SAP schemas and data types and applying them to the datalake raw data to produce Hive/Unity Catalog tables with the correct schema.

# generic data types mapping
data_type_conversions:
- abap_type: ACCP
  raw_type: STRING
  hive: VARCHAR
- abap_type: CHAR
  raw_type: STRING
  hive: VARCHAR
- abap_type: CURR
  raw_type: DECIMAL
  hive: DECIMAL

SAP tables & columns metadata to be transferred to Databricks

table_name
column_name
column_comment
source_type
decimals
length
/BEV3/CHVTMANFO
RI_AVANF
Rule Identifier
NUMC
0
6
CKI_A21_ML
BUKRS
Company Code
CHAR
0
4
06FC_S_TF123
SALK3_2KA
Value of total valuated stock
CURR
2
21

Simplified stream processing

By stream processing simplification

id: "abc123_billing_doc"
description: "Billing documents"
source_systems: ["vfq", "lena", "mariot", "xf99"]
primary_keys: ["ID_MA", "VB_ID", "XA"]
transformation_function: ["mycomp.billing_stream_transform"]

Data Reconciliation

Data Reconciliation