Handling DATETIME2 compatibility issue in Databricks during Hyperscale type alignment
In our project, we transform data coming from source systems (DB2, via CDC or snapshots) in Azure Databricks and store it temporarily in Delta Lake. We later load this data into Azure SQL Hyperscale. To align with Hyperscale's expected schema, we convert source data types in Databricks (e.g., string → `DATETIME2`, decimal → `numeric`).
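For context, this is roughly how we align types in the Delta staging layer today (a minimal PySpark sketch; `raw_df`, the column names, and the paths are placeholders, not our real pipeline):

```python
from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

# raw_df is the frame read from the DB2 CDC feed / snapshot (placeholder).
staged_df = (
    raw_df
    # The source delivers timestamps as strings; since Spark has no DATETIME2,
    # we parse them into Spark's TimestampType (format string is illustrative).
    .withColumn("event_ts", F.to_timestamp("event_ts", "yyyy-MM-dd HH:mm:ss.SSSSSS"))
    # Align decimals with the NUMERIC(18,4) column Hyperscale expects.
    .withColumn("amount", F.col("amount").cast(DecimalType(18, 4)))
)

staged_df.write.format("delta").mode("overwrite").save("/mnt/staging/orders")
```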
However, we are facing a compatibility issue: Databricks does not support certain SQL Server-specific data types such as `DATETIME2`. When we attempt to cast or store such values in Delta format, we encounter errors because `DATETIME2` is not natively supported in Spark.
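To illustrate the failure mode: `DATETIME2` is not a recognized Spark SQL type, so an attempt like the following (table and column names are illustrative) is rejected at parse/analysis time rather than executed:

```python
# Fails in Databricks: DATETIME2 is a SQL Server type, not a Spark SQL type.
spark.sql("SELECT CAST(event_ts AS DATETIME2) FROM staged_orders")
```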
Can you guide us on the following:

1. What is the recommended approach for handling SQL Server types like `DATETIME2` inside Databricks?
2. Should we store such values in Spark-compatible types like `TimestampType` in Delta and only cast to `DATETIME2` at the point of writing into Hyperscale? (A rough sketch of this idea follows the list.)
3. In doing so, is there any risk of data truncation or precision mismatch between Spark's `TimestampType` and SQL `DATETIME2`?
4. Is there any best-practice guidance for type casting and schema enforcement in the intermediate layers between source and Hyperscale? (See the second sketch below.)
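For question 2, this is roughly what we have in mind, assuming the generic Spark JDBC writer (server, database, table, and secret names are placeholders; note `createTableColumnTypes` only takes effect when Spark itself creates the target table):

```python
(
    spark.read.format("delta").load("/mnt/staging/orders")
    .write.format("jdbc")
    .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
    .option("dbtable", "dbo.orders")
    # Spark timestamps carry microsecond precision, so DATETIME2(6) is a
    # lossless target; DATETIME2(7) would only pad a trailing zero digit.
    .option("createTableColumnTypes", "event_ts DATETIME2(6), amount NUMERIC(18,4)")
    .option("user", dbutils.secrets.get("scope", "sql-user"))
    .option("password", dbutils.secrets.get("scope", "sql-password"))
    .mode("append")
    .save()
)
```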
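And for question 4, this is the kind of schema enforcement we currently apply in the intermediate layer (schema and paths are illustrative):

```python
from pyspark.sql.types import (
    StructType, StructField, StringType, TimestampType, DecimalType,
)

expected_schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("event_ts", TimestampType(), nullable=True),
    StructField("amount", DecimalType(18, 4), nullable=True),
])

# Enforce on read so malformed input surfaces here, not at Hyperscale load time.
df = spark.read.schema(expected_schema).parquet("/mnt/landing/orders")

# Delta enforces the existing table schema on append by default; a mismatched
# frame raises an AnalysisException instead of silently changing columns.
df.write.format("delta").mode("append").save("/mnt/staging/orders")
```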