Full Data Ingestion

Data Ingestion Project Schedule

This page describes the expected schedule of a full SAP → Databricks data ingestion project. You may also prefer our alternative, use-case-driven approach.

1. PoC Phase

  • Initial scope and dataset definition, plus other requirements
  • Check the on-prem → cloud connection, bandwidth, and availability of Azure resources
  • Check the available versions of SAP tools and the expected data size
  • Outcome: a working SAP → datalake connection, with the first sample table transferred (see the sketch after this list)
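
A minimal PoC validation sketch, assuming the SAP extraction tool lands a sample extract (e.g. the KNA1 customer master) as CSV in an ADLS landing container; all paths and table names below are hypothetical placeholders:

```python
# PoC sketch: confirm the SAP → datalake path works end to end.
# All paths and names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Landing path where the SAP extraction tool drops the sample extract (hypothetical).
landing_path = "abfss://landing@<storage-account>.dfs.core.windows.net/sap/kna1/"

# Read the raw extract to verify connectivity and inspect the data.
df = spark.read.option("header", "true").csv(landing_path)
print(f"Sample extract row count: {df.count()}")

# Persist the first sample table as a Delta table
# (assumes a "bronze" schema already exists).
df.write.format("delta").mode("overwrite").saveAsTable("bronze.sap_kna1")
```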

2. Initial Load phase

  • Set up and fine-tune the remaining pieces: firewalls, permissions, connection, bandwidth
  • Full performance testing
  • Outcome: the required dataset ingested from SAP into the datalake bronze layer (see the sketch after this list)
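
One way to run the initial load is Databricks Auto Loader, which tracks already-processed files so an interrupted load can simply be re-run. A sketch, assuming full extracts are landed as Parquet files (paths and table names hypothetical):

```python
# Initial-load sketch using Databricks Auto Loader (hypothetical names).
# Runs only on Databricks, where the "cloudFiles" format is available.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

source = "abfss://landing@<storage-account>.dfs.core.windows.net/sap/mara/"      # hypothetical
checkpoint = "abfss://bronze@<storage-account>.dfs.core.windows.net/_chk/mara/"  # hypothetical

(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", checkpoint)
    .load(source)
    .writeStream
    .option("checkpointLocation", checkpoint)
    .trigger(availableNow=True)  # batch semantics: process all pending files, then stop
    .toTable("bronze.sap_mara"))
```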

3. Go-To-Production phase

  • Incremental load setup (see the sketch after this list)
  • Access controls, governance, and compliance setup
  • Monitoring, alerting, and operations setup
  • Definition of standard procedures ("How to add a new dataset", "How to change an existing one", …)
  • Outcome: new data is ingested into the datalake reliably, and everything is properly governed and compliant
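
For the incremental load, one common pattern is a Delta Lake MERGE that upserts change records into the bronze table. A minimal sketch, assuming changes arrive in a staging table keyed by MANDT and KUNNR (all table and column names hypothetical):

```python
# Incremental-load sketch using Delta Lake MERGE (hypothetical names).
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Change records (inserts/updates) delivered since the last run (hypothetical table).
changes = spark.table("staging.sap_kna1_changes")

# Upsert the changes into the bronze table on the business key.
target = DeltaTable.forName(spark, "bronze.sap_kna1")
(target.alias("t")
    .merge(changes.alias("s"), "t.MANDT = s.MANDT AND t.KUNNR = s.KUNNR")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```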

4. Business-as-usual

  • Running the standard procedures
  • Changing and improving existing standard procedures, and creating new ones (see the sketch after this list)
  • Outcome: ongoing change management and iterative improvement
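
A sketch of what a config-driven standard procedure can look like in practice: onboarding a new dataset becomes a one-entry config change rather than new code. Same hypothetical naming as above:

```python
# Config-driven ingestion sketch (all names hypothetical).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Standard procedure "How to add a new dataset": append one entry to this list.
TABLES = [
    {"sap_table": "kna1", "target": "bronze.sap_kna1"},
    {"sap_table": "mara", "target": "bronze.sap_mara"},
]

def ingest(cfg: dict) -> None:
    """Load one SAP extract from the landing zone into its bronze table."""
    path = f"abfss://landing@<storage-account>.dfs.core.windows.net/sap/{cfg['sap_table']}/"
    df = spark.read.parquet(path)
    df.write.format("delta").mode("append").saveAsTable(cfg["target"])

for cfg in TABLES:
    ingest(cfg)
```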

Expected project roadmap

[Roadmap image]