Hi @Anirudh Bharti
You need to ingest semi-structured JSON from Azure Data Lake Storage and persist it in a relational format at large scale (terabytes to petabytes), while avoiding heavy compute overhead and SQL Server's limitations. Here are a few scalable, RDBMS-like options to consider:
Recommended Azure Services:
Azure Synapse Analytics (Dedicated SQL Pools)
- Why: Designed for large-scale analytics workloads and able to store relational data at petabyte scale.
- Supports: Partitioning, indexing, columnstore tables, and materialized views.
- Ideal for: Centralized relational data warehouse from multiple JSON sources.
- Note: Requires an ingestion and transformation pipeline (e.g., ADF, Synapse pipelines, or Azure Functions) to turn the JSON into relational tables.
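To make the transformation step above more concrete, here is a minimal Python sketch of how one nested JSON document could be split into a parent row and child rows before loading. The field names (`order_id`, `customer`, `items`) and the helper names `flatten` and `to_rows` are hypothetical and only for illustration; your real documents and pipeline tooling will differ.

```python
import json
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Collapse nested objects into one level, e.g. {"a": {"b": 1}} -> {"a_b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            # Scalars (and any remaining lists) are kept as-is.
            flat[name] = value
    return flat


def to_rows(doc: dict[str, Any], key: str, child_field: str):
    """Split one document into a parent row and child rows linked by `key`."""
    children = doc.get(child_field, [])
    parent = flatten({k: v for k, v in doc.items() if k != child_field})
    child_rows = [
        {key: doc[key], "line_no": i, **flatten(child)}
        for i, child in enumerate(children, start=1)
    ]
    return parent, child_rows


# Hypothetical sample document; the field names are illustrative only.
raw = (
    '{"order_id": 7, "customer": {"id": "c1", "country": "IN"}, '
    '"items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]}'
)
parent_row, item_rows = to_rows(json.loads(raw), key="order_id", child_field="items")
print(parent_row)
print(item_rows)
```

The parent row would map to one table and the item rows to a second table that references it through `order_id`, which is the usual way to normalize a JSON array into relational form.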
Azure Data Explorer (ADX)
- Why: Highly optimized for log, telemetry, and semi-structured data ingestion at high volume.
- Supports: Querying structured and semi-structured data (such as JSON), time series analysis, and relational-style joins.
- Ideal for: Scenarios with high ingestion rates and low-latency query needs.
- Note: Not a traditional RDBMS, but it stores data in tables and is queried with Kusto Query Language (KQL).
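Azure Database for PostgreSQL - Hyperscale (Citus)
- Why: If you want an open-source RDBMS feel with horizontal scalability, Citus (a PostgreSQL extension) shards and distributes relational data across nodes.
- Supports: Complex joins, indexes, and constraints, much like a classic RDBMS but at scale (constraints that span shards generally need to include the distribution column).
- Ideal for: Hybrid relational OLTP and analytics use cases, especially JSONB workloads.
- Note: Hyperscale (Citus) was a separate deployment option, not part of Flexible Server, and Microsoft has since rebranded it as Azure Cosmos DB for PostgreSQL. It is therefore the same managed Citus service as the Azure Cosmos DB for PostgreSQL option below.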
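To illustrate what "shards and distributes relational data across nodes" means, here is a simplified Python sketch of hash distribution on a distribution column. It is only a conceptual model: the `tenant_id` column, the shard count, and the `shard_for` helper are assumptions for the example, and `zlib.crc32` with a modulo is a stand-in, not the hashing or shard-range scheme Citus actually uses.

```python
import zlib
from collections import defaultdict

SHARD_COUNT = 4  # hypothetical; kept small for readability


def shard_for(distribution_value: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard using a stable hash."""
    # zlib.crc32 is a stand-in; it is NOT the hash function Citus uses.
    return zlib.crc32(distribution_value.encode("utf-8")) % shard_count


# Hypothetical rows; `payload` stands for a JSONB column.
rows = [
    {"tenant_id": "t1", "payload": {"event": "login"}},
    {"tenant_id": "t2", "payload": {"event": "view"}},
    {"tenant_id": "t1", "payload": {"event": "logout"}},
    {"tenant_id": "t3", "payload": {"event": "login"}},
]

# Group rows by the shard their distribution value hashes to.
shards = defaultdict(list)
for row in rows:
    shards[shard_for(row["tenant_id"])].append(row)

for shard_id in sorted(shards):
    print(shard_id, [r["tenant_id"] for r in shards[shard_id]])
```

The point of the sketch is that every row with the same distribution value lands on the same shard, so joins and filters on that column can stay local to one node. Choosing a distribution column that most queries filter or join on is the main design decision.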
Azure Cosmos DB for PostgreSQL (powered by Citus)
- Why: Managed, highly available distributed PostgreSQL (PostgreSQL + Citus), with read replicas that can be placed in other regions.
- Supports: JSON storage, relational queries, and scaling out write-heavy workloads.
- Ideal for: Relational workloads that need to scale out across nodes, with reads in other regions served from replicas.
Additionally,
- Transformation Layer: You can still leverage Azure Functions, ADF, or Synapse Pipelines to extract and transform JSON into structured tables.
- Avoid over-indexing in SQL Server: If you stay with SQL Server, consider refactoring indexes or using partitioned views. Note that Stretch Database is deprecated as of SQL Server 2022, so it is not a good choice for new work. Even with these changes, SQL Server may not scale well to petabyte-level data.
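Whichever target you pick, the transformation layer mentioned above usually works best when it writes batches of delimited rows for bulk loading rather than inserting row by row. Below is a minimal Python sketch of that idea, assuming JSON Lines input and a hypothetical three-column target table; `COLUMNS`, `to_csv_batches`, and the sample records are illustrative and not taken from your data.

```python
import csv
import io
import json
from typing import Iterable, Iterator

COLUMNS = ["device_id", "ts", "temperature"]  # hypothetical target table columns


def to_csv_batches(lines: Iterable[str], batch_size: int) -> Iterator[str]:
    """Yield CSV text chunks of at most `batch_size` rows from JSON Lines input."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Missing keys become empty fields instead of raising an error.
        writer.writerow([record.get(col, "") for col in COLUMNS])
        count += 1
        if count == batch_size:
            yield buffer.getvalue()
            buffer = io.StringIO()
            writer = csv.writer(buffer, lineterminator="\n")
            count = 0
    if count:
        yield buffer.getvalue()  # flush the final partial batch


# Hypothetical sample input; the second record has no temperature.
sample = [
    '{"device_id": "d1", "ts": "2024-01-01T00:00:00Z", "temperature": 21.5}',
    '{"device_id": "d2", "ts": "2024-01-01T00:00:05Z"}',
    '{"device_id": "d3", "ts": "2024-01-01T00:00:09Z", "temperature": 19.0}',
]
batches = list(to_csv_batches(sample, batch_size=2))
for batch in batches:
    print(batch, end="")
```

In a real pipeline each batch would be written back to the lake as a file and picked up by the target's bulk-load path; the batch size here is tiny only to keep the example readable.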
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.