Hello,
I've been trying to catalog the OPERA DSWX rasters to make the data easily queryable and accessible to our production pipelines via Airflow. My jobs have been getting S3 "503 Please reduce your request rate" errors.
I've observed that for this product all rasters are collected under a single directory. For example:
s3://podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/OPERA_L3_DSWx-HLS_T46SBC_20230404T043701Z_20230407T140723Z_S2A_30_v1.0_B01_WTR.tif
and
s3://podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/OPERA_L3_DSWx-HLS_T56WNE_20230409T013623Z_20230413T013817Z_L8_30_v1.0_B01_WTR.tif
These two rasters represent significantly different geospatial areas and are from different dates but share the common prefix s3://podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/
Keeping this many files under a single prefix runs against AWS guidance: S3 supports about 5,500 GET/HEAD requests per second per prefix, and that budget is shared by every user reading from it. See https://repost.aws/knowledge-center/http-5xx-errors-s3
Would PO.DAAC consider reorganizing this data source on S3 into distinct prefixes by MGRS tile and date, similar to other geospatial holdings such as Sentinel-2? This would ease the rate limiting and let both internal and external users work with the full volume of data as intended.
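To make the request concrete, here is a rough sketch of how a tile-and-date prefix could be derived from the existing file names. The `tiles/<zone>/<band>/<square>/<year>/<month>/<day>/` layout is only my assumption, loosely modeled on how Sentinel-2 holdings are organized, and `proposed_prefix` is a hypothetical helper, not anything PO.DAAC provides. I am also assuming the first timestamp in the file name is the one to partition on.

```python
import re

# Matches the tile ID and the first timestamp in a DSWx-HLS file name, e.g.
# OPERA_L3_DSWx-HLS_T46SBC_20230404T043701Z_..._B01_WTR.tif
PATTERN = re.compile(
    r"OPERA_L3_DSWx-HLS_T(?P<zone>\d{2})(?P<band>[A-Z])(?P<square>[A-Z]{2})"
    r"_(?P<date>\d{8})T\d{6}Z_"
)


def proposed_prefix(filename):
    """Return a hypothetical tile/date prefix for one DSWx-HLS file name."""
    match = PATTERN.search(filename)
    if match is None:
        raise ValueError(f"not a recognized DSWx-HLS file name: {filename}")
    date = match["date"]  # YYYYMMDD from the first timestamp
    return (
        f"tiles/{match['zone']}/{match['band']}/{match['square']}/"
        f"{date[:4]}/{date[4:6]}/{date[6:8]}/"
    )
```

With a layout like this, the two example rasters above would land under different prefixes instead of sharing one.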
Thank you
OPERA DSWX Rate limiting due to common s3 prefix
Re: OPERA DSWX Rate limiting due to common s3 prefix
Hi there, thanks for sharing your use case and bringing this issue to our attention.
At this point in the data record for the OPERA DSWx-HLS dataset, it is prohibitively complex for us to change the structure of the prefix and reorganize the data, but we recognize the limitation this imposes and will take it into consideration for future datasets archived at PO.DAAC. Our recommendation is to slow down your request rate to avoid the 503 errors.
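One common way to slow down is to retry throttled requests with exponential backoff and jitter. The snippet below is only a minimal sketch of that idea, not a PO.DAAC-provided tool: `SlowDownError` is a hypothetical stand-in for however your S3 client reports a 503, and `fetch` is whatever callable performs a single GET in your pipeline.

```python
import random
import time


class SlowDownError(Exception):
    """Hypothetical stand-in for an S3 '503 Please reduce your request rate' response."""


def fetch_with_backoff(fetch, max_attempts=6, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff and jitter on SlowDownError."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except SlowDownError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # The cap doubles each attempt (0.5 s, 1 s, 2 s, ...) up to max_delay;
            # the actual wait is a random value between 0 and that cap.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

The random wait keeps parallel workers (for example, several Airflow tasks) from all retrying at the same instant. Limiting how many tasks read from the bucket concurrently helps in the same way.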
Something else to note for this collection is that the name was changed to OPERA_L3_DSWX-HLS_V1 when the data was confirmed as validated in May 2024, so rasters from 2024-05-14 onward are located under the OPERA_L3_DSWX-HLS_V1 prefix instead of OPERA_L3_DSWX-HLS_PROVISIONAL_V1. Using NASA's Common Metadata Repository (CMR - https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html) to find the S3 paths should return the correct URL for each file.
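As a rough illustration of looking up paths through CMR rather than building them by hand, the sketch below queries the granule search endpoint by collection short name and time range and keeps any link that starts with s3://. The function names are hypothetical, and the response layout it walks (items, then umm, then RelatedUrls, then URL) is my reading of the UMM-JSON format, so please check it against the API documentation linked above before relying on it.

```python
import json
import urllib.parse
import urllib.request

CMR_GRANULES = "https://cmr.earthdata.nasa.gov/search/granules.umm_json"


def build_cmr_url(short_name, start, end, page_size=100):
    """Build a granule search URL for one collection and time range."""
    params = {
        "short_name": short_name,
        "temporal": f"{start},{end}",  # e.g. 2024-05-14T00:00:00Z,2024-05-15T00:00:00Z
        "page_size": page_size,
    }
    return CMR_GRANULES + "?" + urllib.parse.urlencode(params)


def extract_s3_urls(response):
    """Collect every s3:// link from a parsed UMM-JSON search response (assumed layout)."""
    urls = []
    for item in response.get("items", []):
        for related in item.get("umm", {}).get("RelatedUrls", []):
            url = related.get("URL", "")
            if url.startswith("s3://"):
                urls.append(url)
    return urls


def find_s3_urls(short_name, start, end):
    """Query CMR once and return the S3 links from the first page of results."""
    with urllib.request.urlopen(build_cmr_url(short_name, start, end)) as resp:
        return extract_s3_urls(json.load(resp))
```

This only reads the first page of results; a larger time range would need paging. Because the paths come from the catalog, the switch from the PROVISIONAL prefix to OPERA_L3_DSWX-HLS_V1 does not have to be hard-coded in your jobs.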
Victoria
PO.DAAC Data Engineer