Need suggestion for good storage of data in azure like an RDBMS

Anirudh Bharti 0

Hi,

We need to extract information from multiple JSON payload files that many internal products push into Azure Data Lake Storage (Blob Storage). Our goal is to extract the data from these JSON files and maintain it in relational tables. We first considered Azure Databricks with PySpark SQL tables, but that option was ruled out because of the heavy compute it required in Databricks. We can use Azure Functions or Logic Apps for the extraction and keep the data in SQL Server, which is what we do today, but storage growth and partitioning data with multiple indexes are becoming a bottleneck with SQL Server. Can you suggest a storage service in Azure that can easily handle terabytes to petabytes of data and is relational (RDBMS-like) in nature?

Smaran Thoomu 24,095 Reputation points Microsoft External Staff Moderator

2025-06-13T09:40:11.2533333+00:00

@Anirudh Bharti We haven’t heard back from you on the last response and are checking to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, please respond with more details and we will try to help.

Answer 1

Hi @Anirudh Bharti
Given your need to ingest semi-structured JSON data from Azure Data Lake Storage, extract it, and persist it in a relational format at large scale (terabytes to petabytes) - while also avoiding heavy compute overhead and SQL Server limitations - here are a few scalable, RDBMS-like options to consider:

Recommended Azure Services:

Azure Synapse Analytics (Dedicated SQL Pools)

Why: Designed for large-scale analytics workloads and can store massive volumes of relational data (scale to petabytes).
Supports: Partitioning, indexing, columnstore tables, and materialized views.
Ideal for: Centralized relational data warehouse from multiple JSON sources.
Note: Requires data ingestion + transformation pipeline (e.g., ADF, Synapse pipelines, or Azure Functions).

Azure Data Explorer (ADX)

Why: Highly optimized for log, telemetry, and semi-structured data ingestion at high volume.
Supports: Querying structured/semi-structured data (like JSON), time series analysis, and can perform relational-style joins.
Ideal for: Scenarios with high ingestion rates and low-latency query needs.
Note: Not a traditional RDBMS but supports tabular structure and Kusto Query Language (KQL).

Azure Database for PostgreSQL flexible server (elastic clusters, powered by Citus)

Why: If you're looking for an open source RDBMS feel with horizontal scalability, Citus (extension of Postgres) shards and distributes relational data across nodes.
Supports: Complex joins, indexes, constraints — just like a classic RDBMS, but at scale.
Ideal for: Relational OLTP + analytics hybrid use cases, especially for JSONB workloads.
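To make the idea of sharding by a distribution column concrete, here is a minimal conceptual sketch in Python. It is not how Citus is implemented (Citus uses its own hash function and shard placement); the `SHARD_COUNT` value, the `shard_for` helper, and the `product_id` column are all hypothetical names chosen for illustration. It only shows that rows with the same distribution key always land on the same shard, which is what lets joins and filters on that key stay local to one node.

```python
import hashlib
from collections import defaultdict

SHARD_COUNT = 4  # hypothetical number of shards, for illustration only


def shard_for(distribution_key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution key to a shard index using a stable hash."""
    digest = hashlib.sha256(distribution_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take the modulus.
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical rows extracted from JSON payloads pushed by internal products.
rows = [
    {"product_id": "billing", "event": "created"},
    {"product_id": "billing", "event": "updated"},
    {"product_id": "inventory", "event": "created"},
    {"product_id": "shipping", "event": "created"},
]

# Group rows by shard; every row for a given product_id ends up together.
shards = defaultdict(list)
for row in rows:
    shards[shard_for(row["product_id"])].append(row)
```

The choice of distribution column matters: a column that most queries filter or join on (here, the hypothetical `product_id`) keeps related rows on one shard.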

Azure Cosmos DB for PostgreSQL (powered by Citus)

Why: Globally distributed, highly available with RDBMS support (via PostgreSQL + Citus).
Supports: JSON storage, relational queries, scaling out write-heavy workloads.
Ideal for: Multi-region or globally scaled relational workloads.

Additionally,

Transformation Layer: You can still leverage Azure Functions, ADF, or Synapse Pipelines to extract and transform JSON into structured tables.
Avoid over-indexing in SQL Server: If you stay with SQL Server, consider refactoring indexes or using partitioned views for scale. Note that Stretch Database is deprecated as of SQL Server 2022, and SQL Server in general may not scale well to petabyte-level data.
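As an illustration of the transformation step, the following is a minimal Python sketch of flattening one nested JSON payload into a parent row and child rows, the kind of logic that could run inside an Azure Function before loading into relational tables. The payload shape and every field name (`order_id`, `product`, `customer`, `items`, `sku`, `qty`) are assumptions made up for this example, not the actual schema from the question.

```python
import json


def flatten_payload(raw: str) -> tuple[dict, list[dict]]:
    """Split one JSON payload into a parent row and a list of child rows."""
    doc = json.loads(raw)
    order_id = doc["order_id"]

    # Parent row: scalar fields, with the nested customer object flattened.
    parent = {
        "order_id": order_id,
        "product": doc["product"],
        "customer_name": doc["customer"]["name"],
    }

    # Child rows: one per array element, carrying the parent key as a foreign key.
    children = [
        {"order_id": order_id, "line_no": i, "sku": item["sku"], "qty": item["qty"]}
        for i, item in enumerate(doc.get("items", []), start=1)
    ]
    return parent, children


# Hypothetical payload, standing in for a file read from the data lake.
sample = json.dumps(
    {
        "order_id": "A-100",
        "product": "billing",
        "customer": {"name": "Contoso"},
        "items": [{"sku": "X1", "qty": 2}, {"sku": "Y7", "qty": 1}],
    }
)

order_row, item_rows = flatten_payload(sample)
```

Each returned dictionary corresponds to one row; `order_row` would go to a parent table and `item_rows` to a child table keyed by `order_id`.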

I hope this information helps. Please do let us know if you have any further queries.

Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

Anirudh Bharti 0 Reputation points

2025-06-17T04:46:45.7933333+00:00

Hi Smaran, thank you for the valuable insights. The team is looking to use Azure Table Storage integrated with Power BI. Table Storage will persist both aggregate records and transaction records. If we bring multiple tables into Power BI, is there a way to establish relationships between them using the tables' keys, and what do you think of this approach? The relationships are not very hierarchical; most of them are 1 to 1.
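On the key question: a Table Storage entity is identified by the pair `PartitionKey` and `RowKey`, while a Power BI model relationship links a single column to a single column, so one possible approach is to derive one combined key column per table and relate the tables on it. The sketch below illustrates that idea in Python only to show the logic; in Power BI itself the same step would be done in the query editor. It assumes both tables share the same `PartitionKey`/`RowKey` scheme, and the `Key` column name, the `|` separator, and the sample entities are hypothetical.

```python
def with_key(entities: list[dict], key_name: str = "Key") -> list[dict]:
    """Add a single combined key column built from PartitionKey and RowKey."""
    # Assumes "|" never appears inside the key values themselves.
    return [
        {**e, key_name: f"{e['PartitionKey']}|{e['RowKey']}"} for e in entities
    ]


def cardinality(left_keys: list[str], right_keys: list[str]) -> str:
    """Classify a relationship by whether the key is unique on each side."""
    left_unique = len(left_keys) == len(set(left_keys))
    right_unique = len(right_keys) == len(set(right_keys))
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique:
        return "one-to-many"
    if right_unique:
        return "many-to-one"
    return "many-to-many"


# Hypothetical entities from an aggregate table and a transaction table.
aggregates = with_key([
    {"PartitionKey": "billing", "RowKey": "2025-06-01", "Total": 120},
    {"PartitionKey": "billing", "RowKey": "2025-06-02", "Total": 80},
])
transactions = with_key([
    {"PartitionKey": "billing", "RowKey": "2025-06-01", "Amount": 120},
    {"PartitionKey": "billing", "RowKey": "2025-06-02", "Amount": 80},
])

relationship = cardinality(
    [e["Key"] for e in aggregates], [e["Key"] for e in transactions]
)
```

Checking uniqueness on both sides before modelling is worthwhile, because a relationship that is assumed to be 1 to 1 but has duplicate keys on one side would need to be modelled as one-to-many instead.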
