Need a suggestion for good RDBMS-like data storage in Azure

Anirudh Bharti 0 Reputation points
2025-06-12T16:44:15.2833333+00:00

Hi,

We have a requirement to extract information from multiple JSON payload files in blob storage; many internal products will be pushing these files to Azure Data Lake Storage. Our goal is to extract the data from these JSON files and maintain it in relational tables. We first thought of using Azure Databricks and PySpark SQL tables for this use case, but that was ruled out because of the heavy compute it required in Databricks. We can use Azure Functions or Logic Apps for data extraction and keep the data in SQL Server, which is what we do today, but storage limits and partitioning data with multiple indexes are becoming a bottleneck with SQL Server. Can you suggest a good storage service in Azure that can easily handle terabytes to petabytes of data and is relational (RDBMS-like) in nature?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

1 answer

  1. Smaran Thoomu 24,095 Reputation points Microsoft External Staff Moderator
    2025-06-12T17:27:57.8933333+00:00

    Hi @Anirudh Bharti
    Given your need to ingest semi-structured JSON data from Azure Data Lake Storage, extract it, and persist it in a relational format at large scale (terabytes to petabytes) - while also avoiding heavy compute overhead and SQL Server limitations - here are a few scalable, RDBMS-like options to consider:

    Recommended Azure Services:

    Azure Synapse Analytics (Dedicated SQL Pools)

    • Why: Designed for large-scale analytics workloads and can store massive volumes of relational data (scale to petabytes).
    • Supports: Partitioning, indexing, columnstore tables, and materialized views.
    • Ideal for: Centralized relational data warehouse from multiple JSON sources.
    • Note: Requires data ingestion + transformation pipeline (e.g., ADF, Synapse pipelines, or Azure Functions).

    Azure Data Explorer (ADX)

    • Why: Highly optimized for log, telemetry, and semi-structured data ingestion at high volume.
    • Supports: Querying structured/semi-structured data (like JSON), time series analysis, and can perform relational-style joins.
    • Ideal for: Scenarios with high ingestion rates and low-latency query needs.
    • Note: Not a traditional RDBMS but supports tabular structure and Kusto Query Language (KQL).

    Azure Database for PostgreSQL Flexible Server (with Citus)

    • Why: If you're looking for an open source RDBMS feel with horizontal scalability, Citus (extension of Postgres) shards and distributes relational data across nodes.
    • Supports: Complex joins, indexes, constraints — just like a classic RDBMS, but at scale.
    • Ideal for: Relational OLTP + analytics hybrid use cases, especially for JSONB workloads.
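    To make the sharding idea concrete, here is a minimal conceptual sketch in Python of hash distribution on a distribution column. It is not how Citus is implemented (Citus uses its own hash function and shard ranges); `SHARD_COUNT`, `shard_for`, `distribute`, and the `tenant_id` column are all hypothetical names chosen for illustration.

```python
import hashlib
from collections import defaultdict

SHARD_COUNT = 4  # hypothetical number of shards, chosen only for illustration


def shard_for(distribution_key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard index using a stable hash."""
    digest = hashlib.sha256(distribution_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(rows, key_column, shard_count=SHARD_COUNT):
    """Group rows by shard so rows sharing a key value land on the same shard."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(str(row[key_column]), shard_count)].append(row)
    return dict(shards)


# Hypothetical rows; "tenant_id" stands in for whichever column you distribute on.
rows = [
    {"tenant_id": "billing", "event": "created"},
    {"tenant_id": "crm", "event": "updated"},
    {"tenant_id": "billing", "event": "closed"},
]
placement = distribute(rows, "tenant_id")
```

    Because every row with the same key value hashes to the same shard, joins and filters on that key can be answered by one node, which is why the choice of distribution column matters so much in a sharded relational store.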

    Azure Cosmos DB for PostgreSQL (powered by Citus)

    • Why: Globally distributed, highly available with RDBMS support (via PostgreSQL + Citus).
    • Supports: JSON storage, relational queries, scaling out write-heavy workloads.
    • Ideal for: Multi-region or globally scaled relational workloads.

    Additionally,

    1. Transformation Layer: You can still leverage Azure Functions, ADF, or Synapse Pipelines to extract and transform JSON into structured tables.
    2. Avoid over-indexing in SQL Server: If sticking with SQL Server, consider refactoring indexes or using partitioned views or Stretch Database (note that Stretch Database is deprecated as of SQL Server 2022) for scale - but it may not scale well for petabyte-level data.
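    As a rough illustration of the transformation layer in item 1, the sketch below flattens one nested JSON payload into a parent row and child rows that could be loaded into two related tables. The payload shape, the column names, and the `flatten_payload` helper are assumptions made up for this example; real payloads from each product will differ, and the load step into the target database is left out.

```python
import json
from typing import Any

# Hypothetical payload; the field names are assumptions for illustration only.
SAMPLE = """
{
  "order_id": "A-100",
  "source_product": "billing",
  "customer": {"id": 7, "name": "Contoso"},
  "items": [
    {"sku": "X1", "qty": 2},
    {"sku": "Y9", "qty": 1}
  ]
}
"""


def flatten_payload(payload: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split one nested payload into a parent row and its child rows."""
    customer = payload.get("customer", {})
    parent = {
        "order_id": payload["order_id"],
        "source_product": payload.get("source_product"),
        "customer_id": customer.get("id"),
        "customer_name": customer.get("name"),
    }
    children = [
        {
            "order_id": payload["order_id"],  # foreign key back to the parent row
            "line_no": line_no,
            "sku": item["sku"],
            "qty": item["qty"],
        }
        for line_no, item in enumerate(payload.get("items", []), start=1)
    ]
    return parent, children


parent_row, item_rows = flatten_payload(json.loads(SAMPLE))
```

    A function like this can run inside an Azure Function triggered per file, with the resulting rows batched and bulk-loaded into whichever store you pick, so the extraction logic stays independent of the storage choice.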

    I hope this information helps. Please do let us know if you have any further queries.



