SAP Data Post-processing Framework
We developed a simple-but-powerful framework to simplify and automate everyday tasks when working with SAP data in Databricks.
Automated conversion of CDC event stream to SCD2 tables
If you use CDC (Changed Data Capture) based data ingestion tool like Azure Data Factory or SAP Data Intelligence, you will need to convert all those change events into a historized fact table.
By adding a few lines to the YAML config file of our framework, you can flatten the CDC events automatically. In the following example, the CDC events will be flattened using the primary key column MANDT
.
The OPTYPE
column values describe whether the event is an insert, update or delete.
OPTIME | MANDT | BUKRS | LIFNR | CNTR | AR_YEAR | OPTYPE |
20220718140000.0 | 100 | 465465 | 32714986 | CZ | 2022 | i |
20220718150000.0 | 100 | 132132 | 32714986 | CZ | 2022 | u |
20220718151500.0 | 000 | 123 | 1684684 | DE | 2022 | i |
cdc_ingest:
my_table:
source_path: "stage/raw/my_table.parquet"
primary_keys: ["MANDT"]
partition_by: ["AR_YEAR"]
transformation: ["currency_shifting"] # fixing SAP floating point incompatibility
All the events get automatically converted into a fact table my_table
OPTIME | MANDT | BUKRS | LIFNR | CNTR | AR_YEAR |
20220718150000.0 | 100 | 132132 | 32714986 | CZ | 2022 |
20220718151500.0 | 000 | 123 | 1684684 | DE | 2022 |
Automated data type conversion, schema enforcement
RAW data are usually stored as string values only. Our framework supports loading original SAP schemas and data types and applying them to the datalake raw data to produce Hive/Unity Catalog tables with the correct schema.
# generic data types mapping
data_type_conversions:
- abap_type: ACCP
raw_type: STRING
hive: VARCHAR
- abap_type: CHAR
raw_type: STRING
hive: VARCHAR
- abap_type: CURR
raw_type: DECIMAL
hive: DECIMAL
SAP tables & columns metadata to be transferred to Databricks
table_name | column_name | column_comment | source_type | decimals | length |
/BEV3/CHVTMANFO | RI_AVANF | Rule Identifier | NUMC | 0 | 6 |
CKI_A21_ML | BUKRS | Company Code | CHAR | 0 | 4 |
06FC_S_TF123 | SALK3_2KA | Value of total valuated stock | CURR | 2 | 21 |
Simplified stream processing
By stream processing simplification
id: "abc123_billing_doc"
description: "Billing documents"
source_systems: ["vfq", "lena", "mariot", "xf99"]
primary_keys: ["ID_MA", "VB_ID", "XA"]
transformation_function: ["mycomp.billing_stream_transform"]