Number of parallel reads from CMR API?
-
- Posts: 87
- Joined: Mon Jan 27, 2020 10:36 am America/New_York
- Has thanked: 3 times
- Been thanked: 2 times
Number of parallel reads from CMR API?
Question: how many parallel reads from the CMR API are you okay with? We are currently limiting our catalog scans to 10 parallel workers, and with the increasing number of datasets we read, it's starting to take longer and longer to just find the next batch of assets.
Would 20 parallel workers be okay? 50?
Our queries look like this:
https://cmr.earthdata.nasa.gov/search/granules.json?downloadable=true&collection_concept_id[]=C1000000320-LPDAAC_ECS&page_size=1000&revision_date[]=2021-03-10T19:43:14.473000,2025-05-15T01:28:56.878221&sort_key=revision_date
Though I'm currently seeing some 500 errors, so maybe using more workers is not a good idea.
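For reference, a query like the one above can be assembled from its parameters with Python's standard library. This is only a minimal sketch: the helper name `build_granule_query` is made up for illustration, and the parameter values are copied from the URL in this post.

```python
from urllib.parse import urlencode

CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_granule_query(collection_concept_id, revision_start, revision_end,
                        page_size=1000):
    """Return a granule search URL (hypothetical helper mirroring the query above)."""
    params = [
        ("downloadable", "true"),
        ("collection_concept_id[]", collection_concept_id),
        ("page_size", str(page_size)),
        ("revision_date[]", f"{revision_start},{revision_end}"),
        ("sort_key", "revision_date"),
    ]
    # safe="[],:" leaves brackets, commas and colons unescaped so the
    # result reads exactly like the hand-written URL.
    return CMR_GRANULES_URL + "?" + urlencode(params, safe="[],:")


url = build_granule_query(
    "C1000000320-LPDAAC_ECS",
    "2021-03-10T19:43:14.473000",
    "2025-05-15T01:28:56.878221",
)
print(url)
```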
Thanks,
Simon
-
- User Services
- Posts: 89
- Joined: Tue Dec 03, 2024 2:37 pm America/New_York
- Has thanked: 23 times
- Been thanked: 2 times
Re: Number of parallel reads from CMR API?
Hello @earthengine_urs, the CMR team has a follow-up question for you. We have sent you an email; please check your inbox when you have a chance. Thanks -- Danielle
Subscribe to the LP DAAC listserv by sending a blank email to lpdaac-join@lists.nasa.gov.
Sign up for the Landsat listserv to receive the most up to date information about Landsat data: https://public.govdelivery.com/accounts/USDOIGS/subscriber/new#tab1.
-
- Posts: 87
- Joined: Mon Jan 27, 2020 10:36 am America/New_York
- Has thanked: 3 times
- Been thanked: 2 times
Re: Number of parallel reads from CMR API?
Hi Danielle - I don't see any emails (neither personal nor to the relevant group). Could you cc me directly on the thread?
-
- User Services
- Posts: 89
- Joined: Tue Dec 03, 2024 2:37 pm America/New_York
- Has thanked: 23 times
- Been thanked: 2 times
Re: Number of parallel reads from CMR API?
Hi Simon, I will resend it. It should be coming from the lpdaac @ usgs.gov email but may also have come from the custserv @ usgs.gov email. Thanks -- Danielle
Re: Number of parallel reads from CMR API?
Hi Simon,
Thanks very much for contacting us about this.
The short answer to your question is yes: I think you certainly have room to expand your granule search volume to the CMR, as measured by the number of concurrent requests (that is, searches per second or per minute handled by the CMR).
Over the past 30 days, EarthEngineBot has issued an average of 11.7K requests/day, with a single-day max of 23.6K. During the largest single week in the past 3 months (the week of 5/4/25), there was an average of 53 requests/min, with spikes >= 159 requests/min 5% of the time and a max of 504/min. During this week, the CMR handled >= 5 requests/second 5% of the time, with a max of 12 requests/second.
This is not a large percentage of CMR Search operations overall; the CMR regularly sees more than 3M requests/day.
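To put those figures side by side, here is a quick back-of-the-envelope calculation using only the numbers quoted above. Because the CMR total is given as "more than 3M", the shares are upper bounds. Note also that the 30-day average rate is not directly comparable to the 53 requests/min figure, which covers only the busiest week.

```python
# Figures quoted in this reply.
avg_per_day = 11_700            # EarthEngineBot, 30-day average
max_per_day = 23_600            # EarthEngineBot, single-day max
cmr_total_per_day = 3_000_000   # "more than 3M", so shares are upper bounds

avg_share = avg_per_day / cmr_total_per_day
max_share = max_per_day / cmr_total_per_day
avg_per_min = avg_per_day / (24 * 60)

print(f"average share of CMR search traffic: {avg_share:.2%}")
print(f"peak-day share of CMR search traffic: {max_share:.2%}")
print(f"30-day average rate: {avg_per_min:.1f} requests/min")
```

Both shares come out below 1% of daily CMR search traffic.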
While the overall volume of EarthEngineBot queries is significantly lower than that of many other clients, the per-request search latency for EarthEngineBot is actually higher than the overall search statistics for all other clients. This doesn't necessarily mean you can't increase your throughput, only that, on average, EarthEngineBot search requests are more costly. For the month of April, your search requests took an average of 1.3 seconds to complete, with 5% taking 4+ seconds. That is something to be aware of as you increase the number of concurrent search threads.
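One way to see why latency matters here: if every worker keeps exactly one request in flight at all times, throughput is bounded by workers divided by average latency (Little's law). The sketch below applies that to the 1.3-second average. It assumes latency stays constant as workers are added, which may well not hold under load, and the function name is illustrative only.

```python
def max_throughput_per_sec(workers, avg_latency_s):
    """Upper bound on requests/second when each worker always has one request in flight."""
    return workers / avg_latency_s


AVG_LATENCY_S = 1.3  # April average quoted above; assumed constant here

for workers in (10, 20, 50):
    rps = max_throughput_per_sec(workers, AVG_LATENCY_S)
    print(f"{workers} workers: at most {rps:.1f} requests/s "
          f"({rps * 60:.0f} requests/min)")
```

If latency climbs as concurrency increases, the real gain from extra workers will be smaller than this bound suggests.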
The larger question of how much volume is acceptable vs too much is harder to answer. The CMR does not provide specific search volume guidance due to the variability of the search signatures and target search space. Simply, searches against some data sets, due primarily to the volume of available data, are more costly than others. Some of your search requests may be more readily handled by the CMR than others, so even across your target datasets, some volume increase may be barely noticeable while others are more impactful.
Therefore, we don't give a one-size-fits-all threshold number for search volume. At some point a client could load the system so heavily that per-search latency rises, the added volume yields diminishing returns for the client, and overall system performance and stability suffer for all users, but it's difficult to predict that threshold in general.
I would be cautious about increasing the number of worker threads if it meant you were going to be increasing the number of concurrent queries against the same datasets. We're working to improve this, but we've seen issues with overloading a single dataset when a client simply runs more threads against different temporal ranges of the same dataset. If you could structure your harvesting queries so that the expansion of workers included more target datasets, rather than more subsets of the same datasets, that would help your performance and the system's stability.
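As a rough illustration of spreading workers across datasets, the sketch below reorders a work queue round-robin by collection, so that tasks picked up close together tend to target different collections. Everything in it is hypothetical (the function name, the task shape, the placeholder collection IDs), and ordering alone does not strictly cap per-collection concurrency; a per-collection limit would be needed for that.

```python
from collections import defaultdict, deque


def interleave_by_collection(tasks):
    """Round-robin (collection_id, payload) tasks across collections.

    Neighbouring tasks in the result target different collections
    whenever more than one collection still has work left.
    """
    queues = defaultdict(deque)
    for collection_id, payload in tasks:
        queues[collection_id].append((collection_id, payload))

    ordered = []
    while queues:
        # Take one task from each collection per pass; iterate over a
        # snapshot of the keys because exhausted queues are deleted.
        for collection_id in list(queues):
            ordered.append(queues[collection_id].popleft())
            if not queues[collection_id]:
                del queues[collection_id]
    return ordered


# Placeholder collection IDs and time-range labels, for illustration only.
tasks = [("COLL_A", "range-1"), ("COLL_A", "range-2"), ("COLL_A", "range-3"),
         ("COLL_B", "range-1"), ("COLL_C", "range-1"), ("COLL_C", "range-2")]
for task in interleave_by_collection(tasks):
    print(task)
```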
I suggest that you could double your workers from 10 to 20, run for a while and see how it performs. Please feel free to contact us at support@earthdata.nasa.gov for a performance and stability check. If that looks ok, then you should be able to increase it again.
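The "double, observe, then increase again" approach could be encoded on the client side roughly as follows. The thresholds are assumptions chosen purely for illustration, not CMR guidance: the 4-second baseline echoes the "5% taking 4+ seconds" figure above, the ceiling of 50 comes from the original question, and the 1% error-rate limit and 1.5x latency tolerance are arbitrary.

```python
def next_worker_count(current, p95_latency_s, error_rate,
                      baseline_p95_s=4.0, max_error_rate=0.01,
                      latency_tolerance=1.5, ceiling=50):
    """Suggest the next worker count after an observation period.

    Hypothetical rule: back off by half if errors or tail latency look
    worse than the (assumed) limits, otherwise double up to the ceiling.
    """
    degraded = (error_rate > max_error_rate
                or p95_latency_s > latency_tolerance * baseline_p95_s)
    if degraded:
        return max(1, current // 2)
    return min(ceiling, current * 2)


print(next_worker_count(10, p95_latency_s=3.8, error_rate=0.0))   # healthy: step up
print(next_worker_count(20, p95_latency_s=9.0, error_rate=0.0))   # slow tail: back off
print(next_worker_count(20, p95_latency_s=3.0, error_rate=0.05))  # many errors: back off
```

A rule like this only complements, and does not replace, the performance and stability check offered above.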
If it helps, as a general reference to any published guidance we offer, here are a couple of documentation links:
https://wiki.earthdata.nasa.gov/display/ED/CMR+Client+Partner+User+Guide#CMRClientPartnerUserGuide-BestPracticesforCMRClientOperations
https://wiki.earthdata.nasa.gov/display/CMR/CMR+Harvesting+Best+Practices
I hope this helps.
Regards,
John Teague
CMR Operations
-
- Posts: 87
- Joined: Mon Jan 27, 2020 10:36 am America/New_York
- Has thanked: 3 times
- Been thanked: 2 times
Re: Number of parallel reads from CMR API?
Thank you, John! I was using an old CMR support email address, cmr-support@earthdata.nasa.gov, that no longer works. I'll continue with support@earthdata.nasa.gov.
Best,
Simon