OPERA DSWX Rate limiting due to common s3 prefix

evanclimate
Posts: 1
Joined: Mon Oct 14, 2024 3:40 pm America/New_York
Answers: 0

by evanclimate » Mon Oct 14, 2024 3:55 pm America/New_York

Hello,

I've been trying to catalog the OPERA DSWX rasters to make the data easily queryable and accessible to our production pipelines via Airflow. My jobs have been getting S3 "503 Please reduce your request rate" errors.

I've observed that for this product all rasters are collected under a single directory. For example:
s3://podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/OPERA_L3_DSWx-HLS_T46SBC_20230404T043701Z_20230407T140723Z_S2A_30_v1.0_B01_WTR.tif

and

s3://podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/OPERA_L3_DSWx-HLS_T56WNE_20230409T013623Z_20230413T013817Z_L8_30_v1.0_B01_WTR.tif

These two rasters cover very different geographic areas and come from different dates, yet they share the common prefix s3://podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/

Organizing such a volume of files under a single prefix is considered bad practice by AWS, as it caps GET/HEAD throughput at 5,500 requests per second for that prefix, shared across all users. See https://repost.aws/knowledge-center/http-5xx-errors-s3

Would PO.DAAC consider reorganizing this data source on S3 into distinct prefixes by MGRS tile and date, similar to other geospatial holdings such as Sentinel-2? This would remove concerns about user rate limiting and allow both internal and external users to leverage the volume of data as intended.
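For anyone building their own catalog in the meantime, here is a minimal sketch of how a tile-and-date key could be derived from the file names shown above. The `tile/YYYY/MM/DD/` layout and the function name `tile_date_prefix` are hypothetical, and the sketch assumes the first timestamp in the file name is the acquisition time.

```python
import re

# Matches the leading part of the file names quoted above:
# OPERA_L3_DSWx-HLS_T<MGRS tile>_<YYYYMMDD>T<HHMMSS>Z_...
NAME_PATTERN = re.compile(
    r"OPERA_L3_DSWx-HLS_T(?P<tile>\d{2}[A-Z]{3})_(?P<date>\d{8})T\d{6}Z_"
)


def tile_date_prefix(key):
    """Return a hypothetical 'tile/YYYY/MM/DD/' prefix for an S3 key or URL."""
    name = key.rsplit("/", 1)[-1]  # keep only the file name
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name}")
    date = match["date"]
    return f"{match['tile']}/{date[:4]}/{date[4:6]}/{date[6:8]}/"
```

Grouping catalog entries this way only organizes your own index; it does not change how the objects are stored in the PO.DAAC bucket.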

Thank you

vmcdonald
Posts: 4
Joined: Mon May 08, 2023 3:33 pm America/New_York
Answers: 0

Re: OPERA DSWX Rate limiting due to common s3 prefix

by vmcdonald » Wed Oct 16, 2024 3:34 pm America/New_York

Hi there, thanks for sharing your use case and bringing this issue to our attention.

At this point in the data record for the OPERA DSWx-HLS dataset, it is prohibitively complex for us to change the structure of the prefix and reorganize the data, but we recognize the limitation this imposes and will take it into consideration for future datasets archived at PO.DAAC. Our recommendation is to slow down your request rate to avoid the 503 errors.
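One common way to slow down is to retry throttled requests with exponential backoff and jitter. The sketch below is illustrative only: `SlowDown` is a hypothetical stand-in for whatever exception your S3 client raises on a 503, and `fetch` is any zero-argument callable that performs one request.

```python
import random
import time


class SlowDown(Exception):
    """Hypothetical stand-in for an S3 503 'reduce your request rate' error."""


def call_with_backoff(fetch, max_attempts=6, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call fetch(), retrying on SlowDown with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except SlowDown:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random wait so parallel workers do not retry in step.
            sleep(random.uniform(0, ceiling))
```

In an Airflow setting it may also help to cap the number of tasks that read from the bucket concurrently, since retries alone do not lower the steady-state request rate.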

Something else to note for this collection is that the name was changed to OPERA_L3_DSWX-HLS_V1 when the data was confirmed to be validated in May 2024, so rasters from 2024-05-14 onward are located under the OPERA_L3_DSWX-HLS_V1 prefix instead of OPERA_L3_DSWX-HLS_PROVISIONAL_V1. Using NASA's Common Metadata Repository (CMR - https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html) to find the S3 paths should return the correct URL for each file.
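As a rough sketch of that CMR lookup, the code below builds a granule search URL and pulls the s3:// links out of a UMM-JSON response. It assumes the collection name above is also its CMR short name and that the S3 paths appear in each granule's `RelatedUrls`. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CMR_GRANULES = "https://cmr.earthdata.nasa.gov/search/granules.umm_json"


def build_query(short_name, start, end, page_size=100):
    """Build a CMR granule search URL for one collection and time range."""
    params = {
        "short_name": short_name,      # assumed to match the collection name
        "temporal": f"{start},{end}",  # ISO 8601 start and end times
        "page_size": page_size,
    }
    return CMR_GRANULES + "?" + urllib.parse.urlencode(params)


def extract_s3_urls(response):
    """Collect every s3:// link from a parsed UMM-JSON granule response."""
    urls = []
    for item in response.get("items", []):
        for related in item.get("umm", {}).get("RelatedUrls", []):
            url = related.get("URL", "")
            if url.startswith("s3://"):
                urls.append(url)
    return urls


def search(short_name, start, end):
    """Run one search request and return the S3 paths it lists."""
    with urllib.request.urlopen(build_query(short_name, start, end)) as resp:
        return extract_s3_urls(json.load(resp))
```

For example, `search("OPERA_L3_DSWX-HLS_V1", "2024-05-14T00:00:00Z", "2024-05-15T00:00:00Z")` would request the first page of granules for that day. Larger result sets need paging, which this sketch leaves out.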

Victoria
PO.DAAC Data Engineer
