How to resolve Azure OpenAI rate limit errors when uploading large PDFs

Question

ca 0

I am attempting to use the OpenAI GPT-4.1 model to upload a large PDF document. Despite sending the document in several smaller chunks, timeouts are still occurring. Could this issue be related to request limits?

Saideep Anchuri 8,320 Reputation points Microsoft External Staff Moderator

2025-06-12T01:44:52.4466667+00:00

Hi ca

Did you get a chance to check the response above?

Thank You.
Saideep Anchuri 8,320 Reputation points Microsoft External Staff Moderator

2025-06-13T01:12:37.1333333+00:00

Hi ca

We haven’t heard from you since the last response and are just checking back to see whether you have a resolution yet.

Thank You.
ca 0 Reputation points

2025-06-13T02:00:35.7566667+00:00

Can you provide more detail on how to use chunking, or how to send smaller chunks? I want to do this via the API.
Saideep Anchuri 8,320 Reputation points Microsoft External Staff Moderator

2025-06-13T02:10:02.17+00:00
Hi ca

Here is how to implement chunking via the API:

Chunk Size: The default chunk size is 1,024 tokens, but you can adjust this based on your data. Smaller chunk sizes (e.g., 256 or 512 tokens) can be more effective for datasets with direct facts, while larger sizes might be better for more contextual information.

Chunking Techniques: You can use fixed-size chunking with overlapping text to preserve context. For example, if you set a chunk size of 256 tokens, consider overlapping 25 tokens between chunks to maintain semantic richness.

API Integration: When using the API, ensure that your documents are processed by splitting them into chunks before ingestion. This can be done by specifying the chunk size in your API requests.
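
As a rough illustration of the fixed-size chunking with overlap described above, here is a minimal Python sketch. It assumes that whitespace-separated words are an acceptable stand-in for model tokens (real token counts will differ), and the function names `chunk_tokens` and `chunk_text` are hypothetical helpers, not part of any Azure SDK.

```python
def chunk_tokens(tokens, chunk_size=256, overlap=25):
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


def chunk_text(text, chunk_size=256, overlap=25):
    # Assumption: words approximate tokens; swap in a real tokenizer if needed.
    words = text.split()
    return [" ".join(c) for c in chunk_tokens(words, chunk_size, overlap)]
```

Each returned chunk could then be sent as its own request, with the last 25 "tokens" of one chunk repeated at the start of the next to preserve context.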

Kindly refer to the links below:

how-to-chunk-documents

best-practices

Thank You.
Saideep Anchuri 8,320 Reputation points Microsoft External Staff Moderator

2025-06-14T05:27:42.63+00:00

Hi ca

We haven’t heard from you since the last response and are just checking back to see whether you have a resolution yet.

Thank You.

1 answer

Answer 1

Hi ca

Yes, rate limits could be causing the timeouts when uploading large PDFs to Azure OpenAI.

Here are some steps to try:

Use smaller chunks: Send chunks of about 500–1,000 tokens instead of large sections.
Use the Uploads API for large files: You can upload files up to 8 GB in total, but larger sizes require the Uploads API. Breaking the file into chunks smaller than 512 MB is also good practice.
Increase quota: Submit a quota increase request via the Azure Quota Increase Portal.
Try the Playground: Azure OpenAI Studio (Playground) supports larger uploads and longer processing times than the API.
Monitor usage: Use the Metrics tab in the Azure portal to track token and request usage.
Check per-request token limits: OpenAI models have a maximum token limit per request. If the content from your PDF exceeds it, the request may fail or trigger rate limits.
Know the default limits: The GPT-4.1 model has a limit of 1 million tokens per minute and 1,000 requests per minute for the default tier.
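
To make the limits above concrete, here is a minimal Python sketch of retrying with exponential backoff when a request is rate limited, plus a small helper that estimates how many requests per minute fit under the quoted limits. `RateLimitError` here is a hypothetical stand-in for whatever HTTP 429 error your client raises (with the `openai` Python package this would typically be `openai.RateLimitError`), and `send` is an assumed zero-argument callable that performs one request.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's HTTP 429 error."""


def call_with_backoff(send, max_retries=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call send(); on RateLimitError, wait and retry with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the final retry
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


def max_requests_per_minute(tokens_per_request,
                            tpm_limit=1_000_000, rpm_limit=1_000):
    """Requests per minute allowed by whichever limit is reached first."""
    return min(rpm_limit, tpm_limit // tokens_per_request)
```

For example, under the default-tier figures quoted above, requests of about 4,000 tokens each would be capped by the token limit at roughly 250 requests per minute, well before the 1,000 requests-per-minute limit applies.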

Kindly refer to the link below: quotas-limits

Thank You.
