SAP Data Post-processing Framework

We developed a simple-but-powerful framework to simplify and automate everyday tasks when working with SAP data in Databricks.

Automated conversion of CDC event stream to SCD2 tables

If you use CDC (Changed Data Capture) based data ingestion tool like Azure Data Factory or SAP Data Intelligence, you will need to convert all those change events into a historized fact table.

By adding a few lines to the YAML config file of our framework, you can flatten the CDC events automatically. In the following example, the CDC events will be flattened using the primary key column MANDT.

The OPTYPE column values describe whether the event is an insert, update or delete.

OPTIME	MANDT	BUKRS	LIFNR	CNTR	AR_YEAR	OPTYPE
20220718140000.0	100	465465	32714986	CZ	2022	i
20220718150000.0	100	132132	32714986	CZ	2022	u
20220718151500.0	000	123	1684684	DE	2022	i

cdc_ingest:
	my_table:
    source_path: "stage/raw/my_table.parquet"
		primary_keys: ["MANDT"]
    partition_by: ["AR_YEAR"]
		transformation: ["currency_shifting"] # fixing SAP floating point incompatibility

All the events get automatically converted into a fact table my_table

OPTIME	MANDT	BUKRS	LIFNR	CNTR	AR_YEAR
20220718150000.0	100	132132	32714986	CZ	2022
20220718151500.0	000	123	1684684	DE	2022

Automated data type conversion, schema enforcement

RAW data are usually stored as string values only. Our framework supports loading original SAP schemas and data types and applying them to the datalake raw data to produce Hive/Unity Catalog tables with the correct schema.

# generic data types mapping
data_type_conversions:
- abap_type: ACCP
  raw_type: STRING
  hive: VARCHAR
- abap_type: CHAR
  raw_type: STRING
  hive: VARCHAR
- abap_type: CURR
  raw_type: DECIMAL
  hive: DECIMAL

SAP tables & columns metadata to be transferred to Databricks

table_name	column_name	column_comment	source_type	decimals	length
/BEV3/CHVTMANFO	RI_AVANF	Rule Identifier	NUMC	0	6
CKI_A21_ML	BUKRS	Company Code	CHAR	0	4
06FC_S_TF123	SALK3_2KA	Value of total valuated stock	CURR	2	21

Simplified stream processing

By stream processing simplification

id: "abc123_billing_doc"
description: "Billing documents"
source_systems: ["vfq", "lena", "mariot", "xf99"]
primary_keys: ["ID_MA", "VB_ID", "XA"]
transformation_function: ["mycomp.billing_stream_transform"]

Data Reconciliation

← Return to homepage

SAP data postprocessing

SAP Data Post-processing Framework

Automated conversion of CDC event stream to SCD2 tables

Automated data type conversion, schema enforcement

Simplified stream processing

Data Reconciliation