Data Ingestion Tools from SAP
Integrating data from SAP into Databricks can be tricky, especially when working with large amounts of data (10+ TB/month). We’ve prepared a comprehensive comparison to help you decide which tool best fits your scenarios and use cases.
Update: Current situation (Q2/2024)
Currently, the most capable, compliant and easy-to-set-up SAP data extraction tools are:
| | | | |
|---|---|---|---|
| costs per year (100 data sources) | ~$60k¹ + $5k per 20 GB of outbound data transfer | ~$50k | ~$50k |
| operational complexity | medium | low | low |
| advantages | best compatibility, most features; pipeline creation can be automated via API | ABAP add-on installed on NetWeaver (little additional management needed); support for all important data sources (tables, extractors, CDS views, SAP BW, …); API for CI/CD integration; reasonable pricing | ABAP add-on installed on NetWeaver (little additional management needed); delta extraction support for tables and CDS views |
| disadvantages | very expensive; separate application that requires more management | no CDS view delta extraction (in development); supports SAP Rise private cloud edition only | no support for extractors or BW extraction; no API for automated pipeline creation (in development); no Parquet/Delta format support (CSV only); supports SAP Rise private cloud edition only |
Qlik Replicate is a mature SAP data extraction tool. However, it supports only trigger-based replication of HANA tables, and it does not support compliant (non-ODP) extraction of CDS views. Because it runs as a standalone application, it also requires more management than solutions based on an ABAP add-on.
The Azure Data Factory SAP CDC connector is not compliant with the recent SAP Note restricting third-party use of ODP-based extraction. As a result, most companies are hesitant to use it for new projects. Azure does provide a newer, compliant OData-based connector, but it has several limitations, especially when extracting larger datasets.