SAP DataSphere and Databricks
SAP recently announced SAP DataSphere - the next generation of SAP Data Warehouse Cloud (DWC). This article covers the newly introduced SAP integration with Databricks.

The initial version of the integration is based on JDBC and relies on older but proven integration components (the SDI CamelJDBCAdapter). In a follow-up article, SAP also promises "a bi-directional integration between SAP Datasphere—with SAP data's complete business context" as part of an improved version of the integration connector.

DataSphere is nothing really new (yet)

Based on the information from our SAP experts, DataSphere is essentially "SAP Data Warehouse Cloud with a new name".

  • Despite being on the market for more than 3 years, SAP Data Warehouse Cloud has a very low adoption rate among our clients.
  • The SDI Camel JDBCAdapter is nothing new either - it has been on the market for 7+ years.
  • Neither a roadmap nor any further technical details for the future bi-directional integration have been published by Databricks or SAP yet. The question is whether it will really be much better than JDBC.
  • No information on SAP license-based data access restrictions yet.

Current options for ad-hoc SAP HANA ← Databricks connection

A Databricks cluster can connect to HANA either via the HANA JDBC driver or via hdbcli, the SAP HANA Python database client.
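The JDBC route can be sketched as follows. This is a minimal, hedged example: the host, port, user, and table names are placeholders, and it assumes the HANA JDBC driver (ngdbc.jar) is installed on the cluster.

```python
# Sketch: reading a HANA table from Databricks via the HANA JDBC driver.
# All connection values below are placeholders - adjust for your landscape.

def hana_jdbc_options(host: str, port: int, user: str, password: str, table: str) -> dict:
    """Build the option map for spark.read.format("jdbc") against SAP HANA."""
    return {
        "url": f"jdbc:sap://{host}:{port}/",
        "driver": "com.sap.db.jdbc.Driver",  # requires ngdbc.jar on the cluster
        "user": user,
        "password": password,
        "dbtable": table,
    }

# On a Databricks cluster you would then run:
#   df = spark.read.format("jdbc").options(**hana_jdbc_options(...)).load()
opts = hana_jdbc_options("hana.example.com", 30015, "READ_USER", "secret", "SCHEMA.SALES")
```

The hdbcli route is analogous: `from hdbcli import dbapi; conn = dbapi.connect(address=..., port=..., user=..., password=...)` gives a standard DB-API connection for cursor-based reads.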

Please note that the most common SAP runtime license prohibits direct connections to data stored in HANA (an enterprise license is needed).

Alternatively, the NetWeaver RFC SDK can be used to connect directly to the SAP NetWeaver application platform or to the Business Warehouse (BW). See also the supported platforms and requirements of the pyrfc package.
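The RFC route typically goes through the classic RFC_READ_TABLE function module. The sketch below only assembles the call parameters so it stays runnable without the SDK; the table and field names are illustrative, and the actual call requires the NetWeaver RFC SDK installed on every cluster node.

```python
# Sketch of preparing an RFC_READ_TABLE call for pyrfc.
# Table/field names are examples only.

def read_table_params(table: str, fields: list, max_rows: int = 1000) -> dict:
    """Assemble the parameter map for the classic RFC_READ_TABLE call."""
    return {
        "QUERY_TABLE": table,
        "DELIMITER": "|",                              # column separator in the result
        "FIELDS": [{"FIELDNAME": f} for f in fields],  # restrict returned columns
        "ROWCOUNT": max_rows,
    }

# With the SDK in place you would then run:
#   from pyrfc import Connection
#   conn = Connection(ashost="sap.example.com", sysnr="00", client="100",
#                     user="RFC_USER", passwd="secret")
#   result = conn.call("RFC_READ_TABLE", **read_table_params("MARA", ["MATNR", "MTART"]))
params = read_table_params("MARA", ["MATNR", "MTART"])
```

Note that RFC_READ_TABLE has well-known limitations (row width, type handling), which is one reason this path is not production-grade.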

None of the approaches mentioned above is considered reliable for production-ready deployments.

Our recommendation for production-ready deployments

Perform a full data extraction from the primary SAP systems and BW into the data lake using the tools mentioned on the data ingestion page.