Data Ingestion Tools Comparison
There are two main ways to get data out of an SAP ERP system reliably: through the SAP application layer (extractors, CDS Views, ODP) or directly at the database layer (change data capture). The tools below are compared on how they use these paths, plus their setup, cost, and behaviour when writing to the target.
1. Environment and setup
How is the tool installed and operated?
- SaaS: multitenant application managed by the tool provider
- on-prem: application installed inside the client's environment and managed by the client
- agent: a small piece of software installed inside the client's environment that processes and pushes data into the target cloud space
What needs to be installed inside the client's environment?
- some type of agent
- full app installation
- no installation (full SaaS)
| Name | Environment | Run options | Installation needed | Independent of SAP SLT |
|---|---|---|---|---|
| Fivetran HVA | SaaS + agent | | | ✔️ (but for low-level tables only) |
| | on-prem | Azure, AWS, GCP | | |
| | SaaS + agent | Azure only | integration runtime needs to be installed on an on-premises computer or on a virtual machine (VM) | ✔️ (SLT is recommended and needed for SAP tables extraction) |
| | on-prem | | installed as ABAP add-on on the ERP server (no additional hardware required) | ✅ (but can work with SLT if already in place) |
| Qlik Replicate | on-prem / SaaS + agent | | full app installation (on-prem version) | |
| | on-prem / SaaS | Azure, AWS, GCP | full app installation (on-prem version, Kubernetes) | |
| | SaaS | AWS only | ❌ (fully SaaS) | ✔️ |
| | SaaS | GCP only | ❌ (fully SaaS) | ✔️ |
2. Costs
How is each tool priced?
- consumption-based
- number of pipeline runs, per hour of run
- number of active pipelines
Tiers
- pay-as-you-go (you pay for every pipeline, GB, minute of run, …)
- tiered packages (250 pipelines, 500 pipelines, 1000 pipelines); a hypothetical comparison of the two models follows below
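To make the difference concrete, here is a purely hypothetical sketch; the prices, package sizes, and volumes are invented for illustration and do not reflect any vendor's actual list price.

```python
# Hypothetical numbers only - illustrates consumption-based vs. tiered pricing.

def pay_as_you_go(rows_processed_millions: float, price_per_million: float = 1.50) -> float:
    """Consumption-based: cost scales directly with usage."""
    return rows_processed_millions * price_per_million

def tiered(active_pipelines: int) -> float:
    """Tiered packages: you buy the smallest bundle that covers your pipelines."""
    packages = {250: 2_000.0, 500: 3_500.0, 1_000: 6_000.0}  # made-up monthly prices
    for size, price in sorted(packages.items()):
        if active_pipelines <= size:
            return price
    raise ValueError("more pipelines than the largest package covers")

# Example: 120 active pipelines moving ~800M rows per month.
print(pay_as_you_go(800))   # 1200.0 - grows with volume
print(tiered(120))          # 2000.0 - flat until the next tier is needed
```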
Does the tool require additional computing (HW/VMs) to be used?
| Name | Pricing | Free of additional costs | Independent of SAP SLT |
|---|---|---|---|
| Fivetran HVA | consumption-based (# of rows processed) | ❌ (agent needs additional HW) | ✔️ (but for low-level tables only) |
| | | ❌ (installation needs additional HW) | |
| | | ❌ (integration runtime needs additional HW) | ✔️ (SLT is recommended and needed for SAP tables extraction) |
| | # of ingestion pipelines / ingested tables (tiered) | ✅ (installed as ABAP add-on onto an existing SAP NetWeaver machine) | ✅ (but can work with SLT if already in place) |
| Qlik Replicate | | ❌ (installation needs additional HW) | |
| | | ❌ (installation needs additional HW) | |
| | | ✅ (it's SaaS) | ✔️ |
| | | ✅ (it's SaaS) | ✔️ |
3. Historical SAP data extraction
| Name | Historical data load (primary system) | Filtered historical load |
|---|---|---|
| Fivetran HVA | via the SAP Application Layer, High Volume Agent | |
| | via the SAP Application Layer and LDP Agent | |
| | via the SAP Application Layer | ❌ (source) |
| | utilizes standard SAP select options | |
| Qlik Replicate | | |
| | ODP via OData only (slower) | |
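To make "standard SAP select options" concrete: a filtered historical load restricts the initial extract with the same WHERE-style criteria SAP uses internally, so only the required slice of history is read. The sketch below is a minimal illustration, not any vendor's actual implementation; it uses the open-source pyrfc library with the generic RFC_READ_TABLE function module, and the connection details, table, and filter are assumptions.

```python
# Minimal sketch (not a vendor implementation): filtered historical read of an
# SAP table through the application layer, using select-option style criteria.
from pyrfc import Connection  # requires the SAP NW RFC SDK + pyrfc

conn = Connection(            # hypothetical connection details
    ashost="sap-erp.example.com", sysnr="00",
    client="100", user="EXTRACT_USER", passwd="***",
)

result = conn.call(
    "RFC_READ_TABLE",
    QUERY_TABLE="VBAK",                         # sales document headers (example table)
    DELIMITER="|",
    OPTIONS=[{"TEXT": "ERDAT GE '20200101'"}],  # select-option filter: created on/after 2020-01-01
    FIELDS=[{"FIELDNAME": "VBELN"}, {"FIELDNAME": "ERDAT"}, {"FIELDNAME": "NETWR"}],
    ROWCOUNT=0,                                 # 0 = no row limit
)

for row in result["DATA"]:
    print(row["WA"].split("|"))                 # each row is returned as one delimited string
```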
4. Continuous SAP data extraction
Does the tool support (near) real-time data ingestion?
- real-time (<1s)
- near real-time (~1 - 5s)
- micro-batches (e.g. every 5+ minutes)
Is the tool capable of connecting to standard higher-level SAP data extraction interfaces provided by ECC or S/4?
- BW Extractors (via ODP/SAPI)
- CDSViews - the new extraction interface for SAP S/4
- Stream tables
- Custom ABAP extractors - business logic, ABAP APIs, function modules, etc.
Is the tool capable of connecting to standard higher-level SAP data extraction interfaces in BW?
- InfoProviders - DSOs, ADSOs, Cubes
- InfoObjects
- HANA Calculation Views
- BEx Queries
- RAW tables
Does the tool have the ability to process new records only (deltas) from CDSViews or HANA Calculation Views, or does it always require a full load? (A generic sketch of delta processing follows the table below.)
| Name | Real-time support | SAP Extractors, CDSViews support | BW data extraction support | New records (deltas) processing for CDSViews/HANA Calculation Views | Application layer extraction | Database layer extraction |
|---|---|---|---|---|---|---|
| Fivetran HVA | ✅ (via CDC) | ❌ (low-level tables only) | ❌ (low-level tables only) | ❌ | ✅ | ✅ (via CDC) |
| | ✅ (via CDC) | ✅ | ❌ (low-level tables only) | ❌ | ✅ | ✅ (via CDC) |
| | ❌ (batches only, every 5 minutes) | ✅ | ✅ (via ODP) | ✅ | ✅ | ✅ (via SLT and ODP) |
| | ✅ (near real-time) | ✅ | ✅ | ✅ (via ODP) | ✅ | ✅ (via triggers) |
| Qlik Replicate | ✅ | ✅ | ✅ (via ODP) | ✅ (for the ODP connector only?) | ✅ | ✅ (via CDC, or via triggers for HANA) |
| | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| | ❌ (micro-batches only) | ✅ | ✅ (via ODP) | ✅ (ODP via OData) | | ❌ (ODP only) |
| | ✅ (via SLT) | ✅ | ✅ (via ODP) | ✅ | | ✅ (via SLT) |
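For context on the deltas column above: "processing new records only" means the tool keeps a bookmark (for example the last replicated change timestamp or an ODP pointer) and requests only rows changed since that bookmark, instead of re-reading the whole CDSView or HANA Calculation View. The sketch below is a generic, assumption-based illustration of that pattern; the change-timestamp column and the read function are placeholders, not any connector's real API.

```python
# Generic sketch of delta (new-records-only) processing - not any tool's actual logic.
# Assumes the source view exposes a change timestamp column (here: LAST_CHANGED_AT).
import json
from pathlib import Path

BOOKMARK_FILE = Path("bookmark.json")  # where the last successful watermark is persisted

def load_bookmark() -> str:
    if BOOKMARK_FILE.exists():
        return json.loads(BOOKMARK_FILE.read_text())["last_changed_at"]
    return "1900-01-01T00:00:00"  # no bookmark yet -> behaves like a full load

def save_bookmark(value: str) -> None:
    BOOKMARK_FILE.write_text(json.dumps({"last_changed_at": value}))

def extract_delta(read_view, view_name: str) -> list[dict]:
    """read_view is a placeholder for however the view is queried (ODP, OData, SQL...)."""
    watermark = load_bookmark()
    rows = read_view(view_name, where=f"LAST_CHANGED_AT > '{watermark}'")
    if rows:
        save_bookmark(max(r["LAST_CHANGED_AT"] for r in rows))
    return rows
```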
5. Writing data to target
Are the data and associated column types ingested from SAP converted into Databricks-compatible column types?
Is the data ingestion tool capable of writing data in the Databricks Delta format into the landing zone of the data lake?
How does the incoming data processing work?
The ingestion tool produces smaller (real-time/micro-batch) or larger (hourly/daily) batches of data, serializes them as CSV, Parquet, Delta or another format, and writes the files to cloud storage (AWS S3, Azure Blob Storage, GCP Cloud Storage).
As soon as a file (e.g. the P&L planning data in the diagram) lands in the bronze/landing zone of the data lake, Databricks can pick up the file and append it to the overall P&L planning table stored in the silver layer.
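To illustrate that last step on the Databricks side, the sketch below uses Auto Loader to pick up newly landed files from the bronze/landing zone and append them to a silver Delta table. It is a minimal example with assumed paths and table names, not any particular tool's configuration, and it presumes the ingestion tool has already mapped SAP column types to Spark/Delta-compatible types.

```python
# Minimal sketch (assumed paths/names): ingest newly landed files from the
# bronze/landing zone and append them to a silver Delta table on Databricks.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

landing_path = "abfss://lake@storageaccount.dfs.core.windows.net/bronze/pnl_planning/"       # hypothetical
checkpoint_path = "abfss://lake@storageaccount.dfs.core.windows.net/_checkpoints/pnl_planning/"

# Auto Loader (cloudFiles) incrementally discovers new files in the landing zone.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")       # or "csv"/"json", depending on what the tool writes
    .option("cloudFiles.schemaLocation", checkpoint_path)
    .load(landing_path)
)

# Append every discovered batch of files to the silver Delta table.
query = (
    df.writeStream
    .format("delta")
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)                   # process all pending files, then stop
    .toTable("silver.pnl_planning")               # hypothetical target table
)
```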