If the number of your requests exceeds the capacity allocated to process
requests, then error code 429 is returned. The following table displays the
error message generated by each type of quota framework:
Quota framework | Message |
---|---|
Pay-as-you-go | Resource exhausted, please try again later. |
Provisioned Throughput | Too many requests. Exceeded the Provisioned Throughput. |
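
As a minimal sketch of how the table above could be used in client code, the following helper maps a 429 error message to the quota framework that produced it. The function name, the returned labels, and the substring matching are illustrative assumptions, not part of any Vertex AI client library.

```python
from typing import Optional

# Message texts are copied from the table above. The framework labels on the
# right are hypothetical names chosen for this example.
_MESSAGE_TO_FRAMEWORK = {
    "Resource exhausted, please try again later.": "pay-as-you-go",
    "Too many requests. Exceeded the Provisioned Throughput.": "provisioned-throughput",
}


def quota_framework_for_429(message: str) -> Optional[str]:
    """Return the framework whose documented 429 text appears in `message`.

    Matching is a case-insensitive substring check, so the documented text can
    be wrapped in extra context (for example a status prefix). Returns None
    when no documented text is found.
    """
    lowered = message.lower()
    for text, framework in _MESSAGE_TO_FRAMEWORK.items():
        if text.lower() in lowered:
            return framework
    return None
```

Knowing which framework raised the error tells you which of the remedies described later on this page applies.
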
With a Provisioned Throughput subscription, you can reserve an
amount of throughput for specific generative AI models. If you don't have a
Provisioned Throughput subscription and resources aren't available
to your application, then error code 429 is returned. Although you don't
have reserved capacity, you can try your request again. The failed request
isn't counted against your error rate as described in your service level
agreement (SLA).
For projects that have purchased Provisioned Throughput, Vertex AI measures a project's throughput and reserves the purchased amount of throughput for the project's actual usage.
For standard Provisioned Throughput, when you use less than your
purchased amount, errors that might otherwise be 429 are returned as 5XX and
count toward the SLA error rate. For Single Zone Provisioned Throughput,
when you use less than your purchased amount, capacity-related 429 errors are
treated as 5XX but don't count toward the SLA error rate. When you exceed your
purchased amount, the additional requests are processed on-demand as pay-as-you-go.
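
The rules above can be summarized as a small decision function. This is only a sketch of the paragraph's logic: the names `Outcome` and `capacity_error_outcome` and the `"standard"` / `"single_zone"` labels are hypothetical. Two points are assumptions rather than statements from this page: usage exactly equal to the purchased amount is treated as overage, and overage requests are assumed to follow the pay-as-you-go behavior described earlier (a 429 that isn't counted against the SLA error rate).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    status: str              # "429" or "5XX"
    counts_toward_sla: bool  # whether the error counts toward the SLA error rate


def capacity_error_outcome(pt_type: str, used: float, purchased: float) -> Outcome:
    """Describe how a capacity-related error is surfaced for a project.

    pt_type is "standard" or "single_zone" (hypothetical labels). `used` and
    `purchased` are throughput amounts in the same unit.
    """
    if pt_type not in ("standard", "single_zone"):
        raise ValueError(f"unknown Provisioned Throughput type: {pt_type!r}")
    if used < purchased:
        # Under the purchased amount the error is surfaced as 5XX. Only
        # standard Provisioned Throughput counts it toward the SLA error rate.
        return Outcome("5XX", counts_toward_sla=(pt_type == "standard"))
    # Assumption: overage is processed as pay-as-you-go, so a capacity error
    # is a 429 that isn't counted against the SLA error rate.
    return Outcome("429", counts_toward_sla=False)
```
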
Pay-as-you-go
On the pay-as-you-go quota framework, you have the following options for
resolving 429 errors:
- Use the global endpoint instead of a regional endpoint whenever possible.
- Implement a retry strategy by using truncated exponential backoff.
- If your model uses quotas, you can submit a Quota Increase Request (QIR). If your model uses dynamic shared quota (DSQ), smoothing traffic and reducing large spikes can help. For more information, see Dynamic shared quota (DSQ).
- Subscribe to Provisioned Throughput for a more consistent level of service. For more information, see Provisioned Throughput.
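
The retry option in the list above can be sketched as follows. This is a generic illustration of truncated exponential backoff with jitter, not code from a Vertex AI SDK: `TooManyRequests` is a hypothetical stand-in for whatever exception your client raises on HTTP 429, and the default delays and attempt count are assumptions that you should tune for your workload.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TooManyRequests(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def call_with_backoff(
    fn: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on TooManyRequests with truncated exponential backoff.

    The wait before retry n (counting from 0) is
    min(max_delay, base_delay * 2**n + jitter), where jitter is a random value
    between 0 and 1 second. The last failure is re-raised after `max_attempts`.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except TooManyRequests:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the 429 to the caller.
            jitter = random.uniform(0, 1)
            # "Truncated" means the delay stops growing once it hits max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt + jitter))
    raise ValueError("max_attempts must be at least 1")
```

To use it, wrap the request in a zero-argument callable, for example `call_with_backoff(lambda: send_request(payload))`, where `send_request` is your own function that raises `TooManyRequests` on a 429 response. The jitter keeps many clients from retrying at the same instant, which also helps smooth traffic spikes.
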
Provisioned Throughput
To correct the 429 error generated by Provisioned Throughput, do the following:
- Use the Default behavior example, which doesn't set a header in prediction requests. Any overages are processed on-demand and billed as pay-as-you-go.
- Increase the number of GSUs in your Provisioned Throughput subscription.
What's next
- To learn more about dynamic shared quota, see Dynamic shared quota.
- To learn more about Provisioned Throughput, see Provisioned Throughput.
- To learn about quotas and limits for Vertex AI, see Vertex AI quotas and limits.
- To learn more about Google Cloud quotas and system limits, see the Cloud Quotas documentation.
- To learn more about API errors, see API errors.