OPERA DSWx rate limiting due to common S3 prefix
Posted: Mon Oct 14, 2024 3:55 pm America/New_York
Hello,
I've been trying to catalog the OPERA DSWX rasters to make the data easily queryable and accessible to our production pipelines via Airflow. My jobs have been getting S3 "503 Please reduce your request rate" errors.
I've observed that for this product all rasters are stored under a single S3 key prefix (one "directory"). For example:
s3://podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/OPERA_L3_DSWx-HLS_T46SBC_20230404T043701Z_20230407T140723Z_S2A_30_v1.0_B01_WTR.tif
and
s3://podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/OPERA_L3_DSWx-HLS_T56WNE_20230409T013623Z_20230413T013817Z_L8_30_v1.0_B01_WTR.tif
These two rasters represent significantly different geospatial areas and are from different dates but share the common prefix s3://podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/
Organizing such a volume of files under a single prefix goes against AWS guidance, since S3 request rates are scaled per partitioned prefix: roughly 5,500 GET/HEAD requests per second, shared across all users reading from that prefix. See https://repost.aws/knowledge-center/http-5xx-errors-s3
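For anyone hitting the same 503s in the meantime, the usual client-side stopgap is to retry with exponential backoff and jitter. Below is a minimal sketch of that idea, not a definitive implementation. The `SlowDown` exception and the `fetch` callable are hypothetical stand-ins for whatever your S3 client raises and calls; they are not part of any AWS library.

```python
import random
import time


class SlowDown(Exception):
    """Hypothetical stand-in for an S3 '503 Please reduce your request rate' error."""


def with_backoff(fetch, max_attempts=6, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call fetch() and retry on SlowDown with capped exponential backoff.

    fetch is any zero-argument callable that performs one GET (assumption).
    The sleep parameter is injectable so the helper can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except SlowDown:
            # Give up and re-raise once the last attempt has failed.
            if attempt == max_attempts - 1:
                raise
            # "Full jitter": wait a random time up to the capped exponential delay,
            # so many workers do not all retry at the same instant.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

This only spreads the load out over time. It does not raise the per-prefix ceiling, which is why the request below is about the key layout itself.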
Would PODAAC consider reorganizing this data source on S3 into distinct prefixes by MGRS tile and date, similar to other geospatial holdings such as Sentinel-2? This would remove the rate-limiting concern and let both internal and external users work with the full volume of data as intended.
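To make the request concrete, here is a rough sketch of how the existing flat keys could map to a tile/date layout. The target layout (UTM zone / latitude band / grid square / year / month / day) is only my suggestion, loosely modeled on how Sentinel-2 tiles are commonly laid out. The filename pattern is inferred from the two examples above, not from an official naming spec, and `partitioned_key` is a hypothetical helper.

```python
import re

# Pattern inferred from the two example filenames in this post (assumption):
# OPERA_L3_DSWx-HLS_T<zone><band><square>_<YYYYMMDD>T<HHMMSS>Z_...
_NAME = re.compile(
    r"^OPERA_L3_DSWx-HLS_T(?P<zone>\d{2})(?P<band>[C-X])(?P<square>[A-Z]{2})"
    r"_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})T\d{6}Z_"
)


def partitioned_key(flat_key: str) -> str:
    """Map a flat object key to a hypothetical tile/date-partitioned key."""
    # Split "collection/filename"; collection is empty if there is no slash.
    collection, _, filename = flat_key.rpartition("/")
    m = _NAME.match(filename)
    if m is None:
        raise ValueError(f"unrecognized filename: {filename!r}")
    parts = [m["zone"], m["band"], m["square"],
             m["year"], m["month"], m["day"], filename]
    if collection:
        parts.insert(0, collection)
    return "/".join(parts)
```

With this mapping, the first example above would land under OPERA_L3_DSWX-HLS_PROVISIONAL_V1/46/S/BC/2023/04/04/ and the second under OPERA_L3_DSWX-HLS_PROVISIONAL_V1/56/W/NE/2023/04/09/, so requests for different tiles and dates would no longer share one prefix.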
Thank you