Hi @Janice Chi
You're navigating a complex CDC pipeline involving Kafka, ADF, and Databricks, and you've made great progress so far. Here's how to handle dynamic schemas and CDC message structures effectively:
1. Dynamic Schema Handling:
- Use a control table to store each Kafka topic's schema (as JSON or DDL).
- In Databricks, retrieve the schema by topic name and apply it dynamically with from_json(). This supports schema evolution and avoids hardcoding; see the sketch after this list.
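For illustration, here's a minimal PySpark sketch of that lookup. The control table name (schema_control), its columns (topic_name, schema_json), the topic name, and the broker address are all hypothetical placeholders to adapt to your environment:

```python
import json
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType

spark = SparkSession.builder.getOrCreate()

def schema_for_topic(topic: str) -> StructType:
    # Control table stores one JSON-serialized Spark schema per topic
    row = (spark.table("schema_control")
                .filter(col("topic_name") == topic)
                .select("schema_json")
                .first())
    return StructType.fromJson(json.loads(row["schema_json"]))

topic = "orders.cdc"  # hypothetical topic name
raw = (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", "<broker>:9092")
            .option("subscribe", topic)
            .load())

# Parse the Kafka value bytes with the schema fetched for this topic
parsed = raw.select(
    from_json(col("value").cast("string"), schema_for_topic(topic)).alias("payload")
)
```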
2. Flattening CDC Messages (before/after blocks):
- CDC messages from tools like IBM InfoSphere CDC usually follow a { before, after, op } structure.
- In update events, the after block may include only the changed columns rather than the entire row, so verify that primary keys are always included.
- Use Spark transformations such as selectExpr, withColumn, and col("after.field").alias("field") to flatten the structure and normalize the schema across all events; a sketch follows this list.
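As an illustration, here's a minimal flattening sketch that builds on the parsed stream above. The envelope field names (op, before, after) mirror a typical CDC payload, and the business columns (order_id, customer_id, amount) are assumptions to replace with your actual schema:

```python
from pyspark.sql.functions import col, coalesce

flattened = parsed.select(
    col("payload.op").alias("op"),
    # For deletes the after block is null, so fall back to before for the key
    coalesce(col("payload.after.order_id"),
             col("payload.before.order_id")).alias("order_id"),
    col("payload.after.customer_id").alias("customer_id"),
    col("payload.after.amount").alias("amount"),
)
```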
This setup ensures schema-aligned CDC processing, which is critical for safe MERGE INTO operations in Delta Lake or Azure SQL.
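On the Delta Lake side, one common pattern is to apply each micro-batch with foreachBatch and a MERGE. This is only a sketch: the target table name, key column, op code ('D' for delete), and checkpoint path are all assumptions to verify against your CDC tool's conventions:

```python
from delta.tables import DeltaTable

def upsert_batch(batch_df, batch_id):
    # Merge the flattened CDC rows into the target Delta table by key
    target = DeltaTable.forName(spark, "silver.orders")
    (target.alias("t")
           .merge(batch_df.alias("s"), "t.order_id = s.order_id")
           .whenMatchedDelete(condition="s.op = 'D'")
           .whenMatchedUpdateAll(condition="s.op != 'D'")
           .whenNotMatchedInsertAll(condition="s.op != 'D'")
           .execute())

(flattened.writeStream
          .foreachBatch(upsert_batch)
          .option("checkpointLocation", "/chk/orders")  # hypothetical path
          .start())
```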
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.
Thank you.