Retry and Failure Handling Strategy for CDC Merge Pipeline from Kafka to Databricks and Hyperscale
In our CDC ingestion architecture, we are processing incremental changes (3,000-30,000 events/sec) across 800 topics for 800 tables from IBM DB2 using Kafka topics (via IBM InfoSphere CDC), with the following two stages: Kafka to Databricks Silver Layer: We…
Azure Databricks
When to Use MERGE INTO vs APPLY CHANGES INTO in Databricks CDC Pipelines
Background: In our CDC pipeline, we use Databricks to process Kafka CDC data (I/U/D events) into Delta tables. We’re evaluating whether to continue using MERGE INTO or shift to APPLY CHANGES INTO. ❓ Questions for Microsoft: When should we prefer APPLY…
Azure Databricks
Kafka Partitioning vs DB Partitions
We are working on a large-scale CDC ingestion pipeline after completing a one-time historical migration, in which we have already imported 80 TB of data via ADF to the bronze layer, where: Source: IBM DB2 (on-prem) CDC Tool: IBM InfoSphere CDC publishes to…
Azure Databricks
Guidance on Connecting Azure Databricks to External Kafka Cluster (GCP-Hosted) for Structured Streaming Ingestion
We are implementing a real-time ingestion pipeline where Azure Databricks (in our tenant) consumes CDC data directly from a Kafka cluster hosted on GCP (external to Azure). The Kafka topics are populated by IBM InfoSphere CDC and are available in…
Azure Databricks
Hash calculation strategy for data type mismatches
In our current project, we are migrating data from an on-premises IBM DB2 system to Azure SQL Hyperscale, using Azure Databricks for transformation and reconciliation. This includes both batch and CDC-based pipelines. Our project requirement is not just…
Azure Databricks
Essential Data Cleaning steps to Succeed Recon
In our project, we are migrating 80TB of production-grade data from DB2 to Azure SQL Hyperscale using ADF and Databricks. While the primary transformation is data type conversion, what essential data cleaning steps should be performed to ensure…
Azure Databricks
Handling DATETIME2 compatibility issue in Databricks during Hyperscale type alignment
In our project, we are transforming data in Azure Databricks coming from source systems (DB2 via CDC or snapshots) and storing it temporarily in Delta Lake. We later load this data into Azure SQL Hyperscale. To align with Hyperscale’s expected schema, we…
Azure Databricks
Unable to create or bring up the cluster on Azure Databricks - Failed to perform resource identity operation
Hi, We have set up an Azure Databricks service along with supporting services, and it was working fine until the changes below were made to the Azure subscription. Details of changes: the subscription was moved to a different directory. After this change,…
Azure Databricks
Impact of Kafka Partition Size on Databricks Streaming Performance When Writing to Azure SQL Hyperscale
In our project, we are using Databricks (not ADF) for both catch-up and real-time CDC ingestion from Kafka topics and writing the output directly to Azure SQL Hyperscale via JDBC. Some of our source Kafka topics (originating from DB2 CDC) may have large…
Azure Databricks
Best Practice for Schema Enforcement and Type Casting in Databricks Ingestion Pipeline
In our data pipeline, I'm considering importing source data into Databricks in raw string format (regardless of original data types), performing column-level validation, and only then applying explicit type casting to desired data types before writing to…
Azure Databricks
Integrity checks via Databricks
In our pipeline, the source data is coming from DB2 production systems and is assumed to be highly reliable. During transformation in Databricks, we are already performing column-level checks (e.g., presence of mandatory fields, null validation,…
Azure Databricks
I can't get Databricks to talk to my storage account. Error 403
I can't get Databricks to mount my data lake storage. I get error 403 no matter what I do.
Azure Databricks
CDC pipeline schema handling
We completed historical migration from DB2 to Azure SQL Hyperscale and ADLS Gen2 (Delta format, partitioned). Now building Catch-Up CDC pipelines using Kafka (via IBM CDC), ADF (orchestration), and Databricks (Delta processing). CDC data is merged with…
Azure Databricks
CDC Kafka recon strategy
We are implementing a CDC-based ingestion using IBM InfoSphere CDC pushing data to Kafka. Downstream, the data is consumed in Databricks, processed, and written to Azure SQL Hyperscale. We use run-wise ingestion into the Bronze layer and perform a…
Azure Databricks
I am not able to create a cluster. I am trying to create a single node cluster. However, when I try to select a node type, all options seem disabled. I also get a message that "cluster cannot be created because no node is enabled for this subscription".
I am trying to create a single node cluster using my free trial account. I selected West India as my region while creating the resource group. When I try to create an "all purpose cluster" and then select a node type, all VM…
Azure Databricks
Guidance on Designing Control Tables Across Historical Migration, Catch-up, and Streaming Phases
In our project, we are handling data ingestion in three phases: (1) one-time historical migration (via ADF), (2) catch-up CDC (Kafka from IBM CDC), and (3) real-time streaming (Structured Streaming with Databricks). We have already designed separate control tables for…
Azure Databricks
How to proactively avoid micro-batch data loss or duplication during Structured Streaming in high-volume Kafka-to-Azure SQL pipeline?
We are currently implementing a near real-time streaming architecture as part of a modernization project. In our streaming phase, we are consuming data from Kafka topics (one per table, approx. 800 total) using Databricks Structured Streaming and…
Azure Databricks
EIGHT HUNDRED KAFKA TOPICS PROCESSING BY DBR
We are working on a large-scale Change Data Capture (CDC) implementation where: The source system is IBM DB2. IBM InfoSphere CDC pushes changes to Kafka, with each Kafka topic representing one DB2 table. There are 800 Kafka topics in total, please note…
Azure Databricks
Azure pricing calculator
Why am I not able to use the pricing calculator after logging in to my account?
Azure Databricks
Unable to launch a single node Databricks cluster in a Free Trial subscription
Hello, I am unable to create or launch a single node cluster in Azure Databricks. I believe that in a free trial subscription one can try out Databricks by creating a single node (4 vCPU core). I have even tried creating the Databricks service in…