SAP to Databricks ingestion tools comparison

Data Ingestion Tools Comparison

Feel free to contact us so we can help you select the best tool for your use cases.

There are two main ways to get data out of an SAP ERP system reliably:

  • application-level replication: extraction through SAP application-layer interfaces such as ODP, extractors, CDS Views, or SLT
  • database-level replication: reading the underlying database tables directly, typically via log-based or trigger-based CDC

Both approaches appear in the comparison below as "Application layer extraction" and "Database layer extraction", and each is described in more detail in the pages linked at the end of this document.

1. Environments and setup

Run options

How the tool is installed and operated:

  • SaaS: multitenant application managed by the tool provider
  • on-prem: application installed inside the client’s environment and managed by the client
  • agent: a small piece of software installed inside the client’s environment that processes and sends (pushes) data into the target cloud space

Installation needed

What needs to be installed inside the client’s environment:

  • some type of agent
  • full application installation
  • no installation needed (full SaaS)
| Name | Environment | Run options | Installation needed | Independent of SAP SLT |
| --- | --- | --- | --- | --- |
| Fivetran HVA | Azure, AWS, GCP | SaaS + agent | | β˜‘οΈ (but for low-level tables only) |
| ? | Azure, AWS, GCP | on-prem | | |
| ? | Azure only | SaaS + agent | integration runtime needs to be installed on an on-premises computer or on a virtual machine (VM) | β˜‘οΈ (SLT is recommended and needed for SAP tables extraction) |
| ? | Azure, AWS, GCP | on-prem | installed as ABAP add-on on ERP server (no additional hardware required) | βœ… (but can work with SLT if already in place) |
| Qlik Replicate | | on-prem, SaaS + agent | full app installation (on-prem version) | βœ… (see release log) |
| ? | Azure, AWS, GCP | on-prem, SaaS | full app installation (on-prem version, Kubernetes) | |
| ? | AWS only | SaaS | ❌ (fully SaaS) | β˜‘οΈ |
| ? | GCP only | SaaS | ❌ (fully SaaS) | β˜‘οΈ |

2. Costs

Pricing

How each tool is priced:

  • consumption-based
  • number of pipeline runs / per hour of run
  • number of active pipelines

Tiers:

  • pay-as-you-go (you pay for every pipeline, GB, minute of run, …)
  • tiered packages (250 pipelines, 500 pipelines, 1000 pipelines)

Free of additional costs

Does the tool require additional computing resources (HW/VMs) to be used?
| Name | Pricing | Free of additional costs | Independent of SAP SLT |
| --- | --- | --- | --- |
| Fivetran HVA | consumption based (# of rows processed) | ❌ (agent needs additional HW) | β˜‘οΈ (but for low-level tables only) |
| ? | | ❌ (installation needs additional HW) | |
| ? | | ❌ (integration runtime needs additional HW) | β˜‘οΈ (SLT is recommended and needed for SAP tables extraction) |
| ? | # of ingestion pipelines/ingested tables (tiered) | βœ… (installed as ABAP addon onto an existing SAP Netweaver machine) | βœ… (but can work with SLT if already in place) |
| Qlik Replicate | | ❌ (installation needs additional HW) | βœ… (see release log) |
| ? | | ❌ (installation needs additional HW) | |
| ? | | βœ… (it’s SaaS) | β˜‘οΈ |
| ? | | βœ… (it’s SaaS) | β˜‘οΈ |

3. Historical SAP data extraction

| Name | Historical data load (primary system) | Filtered historical load |
| --- | --- | --- |
| Fivetran HVA | via the SAP Application Layer, High Volume Agent | |
| ? | via the SAP Application Layer and LDP Agent | |
| ? | via the SAP Application Layer | ❌ (source) |
| ? | | utilize standard SAP select options |
| Qlik Replicate | βœ… | βœ… |
| ? | ODP via oData only (slower) | |

4. Continuous SAP data extraction

Real-time support

Does the tool support (near) real-time data ingestion?

  • real-time (<1 s)
  • near real-time (~1–5 s)
  • micro-batches (e.g. every 5+ minutes)

SAP Extractors, CDS Views support

Is the tool capable of connecting to the standard higher-level SAP data extraction interfaces provided by ECC or S/4?

  • BW Extractors (via ODP/SAPI)
  • CDS Views: the new extraction interface for SAP S/4
  • Stream tables
  • Custom ABAP Extractors: business logic, ABAP API, function module, etc.

BW data extraction support

Is the tool capable of connecting to the standard higher-level SAP data extraction interfaces in BW?

  • InfoProviders: DSOs, ADSOs, Cubes
  • InfoObjects
  • HANA Calculation Views
  • BEx Queries
  • RAW tables

New records (deltas) processing for CDS Views/HANA Calculation Views

Does the tool have the ability to process only new records from CDS Views or HANA Calculation Views, or does it always require a full load? (A generic workaround sketch follows the table below.)
| Name | Real-time support | SAP Extractors, CDS Views support | BW data extraction support | New records (deltas) processing for CDS Views/HANA Calculation Views | Application layer extraction | Database layer extraction |
| --- | --- | --- | --- | --- | --- | --- |
| Fivetran HVA | βœ… (via CDC) | ❌ (low-level tables only) | ❌ (low-level tables only) | ❌ | βœ… | βœ… (via CDC) |
| ? | βœ… (via CDC) | βœ… | ❌ (low-level tables only) | ❌ | βœ… | βœ… (via CDC) |
| ? | ❌ (batches only, every 5 minutes) | βœ… | βœ… (via ODP) | βœ… | βœ… | βœ… (via SLT and ODP) |
| ? | βœ… (near-real time) | βœ… | βœ… | βœ… (via ODP) | βœ… | βœ… (via triggers) |
| Qlik Replicate | βœ… (log-based + trigger based) | βœ… | βœ… (via ODP) | βœ… (for the ODP connector only?) | βœ… | βœ… (via CDC, or via triggers for HANA) |
| ? | βœ… | βœ… | βœ… | βœ… | βœ… | βœ… |
| ? | ❌ (micro-batches only) | βœ… | βœ… (via ODP) | βœ… (ODP via OData) | ❌ (ODP only) | βœ… (via SLT) |
| ? | βœ… | βœ… (via ODP) | βœ… | βœ… (via SLT) | | |
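Where a tool only supports full loads from CDS Views or HANA Calculation Views, a common generic workaround is to filter on a change timestamp exposed by the view and keep a watermark between runs. The sketch below is not specific to any tool in the table above; the catalog, table, and column names are hypothetical, and it assumes the view is already mirrored or otherwise readable as a Spark table and carries a reliable last-changed timestamp.

```python
# Minimal watermark-based delta sketch (hypothetical names throughout).
# 'spark' is the SparkSession provided by the Databricks runtime.
from pyspark.sql import functions as F

# Normally persisted somewhere (a control table, a file, ...) after each run.
last_watermark = "2024-01-31 23:59:59"

# Hypothetical mirror of a CDS view / HANA calculation view exposed as a table.
cds_mirror = spark.read.table("sap_mirror.sales_orders_cds")

# Keep only records changed since the previous run instead of reloading everything.
new_rows = cds_mirror.where(F.col("last_changed_at") > F.lit(last_watermark))

new_rows.write.format("delta").mode("append").saveAsTable("silver.sales_orders")

# The watermark for the next run would be max(last_changed_at) of this batch.
```

Note that this only approximates true delta handling (it typically misses deletions); tools that read ODP delta queues get inserts, updates, and deletes natively.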

5. Writing data to target

Schema conversion

Are the data and associated column types ingested from SAP converted into Databricks-compatible column types?

DeltaLake support

Is the data ingestion tool capable of writing data in the Databricks Delta format into the landing zone of the data lake? (A minimal conversion sketch follows the table below.)

| Name | Schema conversion | DeltaLake support |
| --- | --- | --- |
| Fivetran HVA | βœ… (source) | βœ… (source) |
| ? | βœ… | βœ… (source) |
| ? | βœ… | βœ… (source) |
| ? | βœ… | βœ… |
| Qlik Replicate | βœ… (source) | βœ… (source) |
| ? | ❌ | ❌ (custom connector needed) |
| ? | | ❌ (parquet or CSV only) |
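To make the two columns above concrete: a tool that lands plain parquet or CSV with raw SAP-style character columns leaves both the type conversion and the Delta write to you. A minimal sketch, with hypothetical paths and field names (a DATS-style date arriving as a 'yyyyMMdd' string and an amount arriving as text):

```python
# Minimal schema-conversion + Delta write sketch (hypothetical paths and columns).
# 'spark' is the SparkSession provided by the Databricks runtime.
from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

landing_path = "abfss://lake@storageacct.dfs.core.windows.net/landing/sap_documents/"       # hypothetical
bronze_path  = "abfss://lake@storageacct.dfs.core.windows.net/bronze/sap_documents_delta/"  # hypothetical

raw = spark.read.parquet(landing_path)

converted = (
    raw
    # SAP DATS fields arrive as 'yyyyMMdd' character strings -> proper dates
    .withColumn("posting_date", F.to_date(F.col("posting_date"), "yyyyMMdd"))
    # amounts exported as text -> Databricks-compatible decimals
    .withColumn("amount_lc", F.col("amount_lc").cast(DecimalType(23, 2)))
)

# Tools with native DeltaLake support perform this step for you.
converted.write.format("delta").mode("append").save(bronze_path)
```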

How does the incoming data processing work?

The data ingestion tool generates smaller (real-time/micro-batching) or larger (hourly/daily) batches of data, writes them out as CSV, parquet, Delta, or another format, and lands the resulting files in cloud storage (AWS S3, Azure Blob Storage, GCP Cloud Storage).

As soon as a file (the P&L planning data in the diagram below) lands in the bronze/landing zone of the data lake, Databricks can pick it up and append it to the overall P&L planning table stored in the silver layer.
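On the Databricks side, one common way to pick up those landed files incrementally is Auto Loader; a minimal sketch, assuming a hypothetical bronze path, checkpoint location, and target table name:

```python
# Minimal Auto Loader sketch: append newly landed files to the silver table.
# Paths and table names are hypothetical; 'spark' is the Databricks SparkSession.
bronze_path     = "abfss://lake@storageacct.dfs.core.windows.net/bronze/pnl_planning/"
checkpoint_path = "abfss://lake@storageacct.dfs.core.windows.net/_checkpoints/pnl_planning/"

incoming = (
    spark.readStream
    .format("cloudFiles")                               # Databricks Auto Loader
    .option("cloudFiles.format", "parquet")             # whatever format the ingestion tool lands
    .option("cloudFiles.schemaLocation", checkpoint_path)  # where inferred schema is tracked
    .load(bronze_path)
)

(
    incoming.writeStream
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)       # process everything new, then stop (micro-batch style)
    .toTable("silver.pnl_planning")   # append to the overall P&L planning table
)
```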

(Diagram: P&L planning data flowing from the ingestion tool into the bronze/landing zone and being appended to the silver-layer P&L planning table in Databricks.)
Feel free to contact us so we can help you select the best tool for your use cases.

Application-level SAP data replication
Database-level SAP data replication