Unexpected behaviour with HLS in LPCLOUD
Posted: Mon May 27, 2024 8:37 am America/New_York
by kristianbodolai
Hi! I was trying to inspect the HLS STAC collection in LPCLOUD, and I came across an (I think) unexpected behavior, as shown in the attachment.
I tried to get the HLS Sentinel-2 collection. There's a warning about the server not conforming to the /collections STAC spec, which I think causes the issue, but the truly unexpected thing is that it returns one of the ECOSTRESS collections.
Is there a way to get the Collection object?
Thanks!
K.
Re: Unexpected behaviour with HLS in LPCLOUD
Posted: Tue May 28, 2024 8:33 am America/New_York
by mitchbon
I have not seen this issue before, but I access the HLS collections using this method:
Code:
itemsS30 = catalog.search(bbox = bboxLL, datetime = f'{start}/{end}', collections = ['HLSS30.v2.0'], limit = 100).item_collection()
Where bboxLL is a set of lat/lon coordinates, start and end are date strings (e.g., '2018-01-01').
After this I build lazy xarrays with stackstac (e.g.,
https://stackstac.readthedocs.io/en/v0.5.0/basic.html).
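To make the search call above self-contained, here is a minimal sketch of how its arguments could be assembled. The helper name `build_search_kwargs` and the example coordinates and dates are hypothetical; only the keyword names (`bbox`, `datetime`, `collections`, `limit`) and the collection ID come from the snippet above.

```python
def build_search_kwargs(bbox, start, end, collection="HLSS30.v2.0", limit=100):
    """Assemble keyword arguments for catalog.search() (hypothetical helper)."""
    west, south, east, north = bbox  # STAC bbox order: west, south, east, north
    # Simple sanity check; does not handle boxes crossing the antimeridian.
    if not (west < east and south < north):
        raise ValueError("bbox must be (west, south, east, north)")
    return {
        "bbox": list(bbox),
        "datetime": f"{start}/{end}",  # closed interval, e.g. '2018-01-01/2018-12-31'
        "collections": [collection],
        "limit": limit,  # items per page of results
    }


# Example values are made up for illustration.
bboxLL = (-122.5, 37.5, -122.0, 38.0)
kwargs = build_search_kwargs(bboxLL, "2018-01-01", "2018-12-31")
# With an open pystac-client catalog:
#   itemsS30 = catalog.search(**kwargs).item_collection()
```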
Re: Unexpected behaviour with HLS in LPCLOUD
Posted: Tue May 28, 2024 8:54 am America/New_York
by kristianbodolai
Yes, I normally do something similar, but I was trying to explore the collection to see if there was a more efficient way of running searches over large spatio-temporal extents (e.g. as in the Planetary Computer tutorial for bulk STAC item queries:
https://planetarycomputer.microsoft.com/docs/quickstarts/stac-geoparquet/)
It's a bit strange. I was trying to diagnose an issue where a search over a large-ish area (comprising about 40 MGRS tiles) and about 8 months took a long time (about 20 minutes to return 4900 items). The same query using the CMR search API takes about 8 seconds.
Anyway, I don't know how much this is related to the HLS collections not being listed as collections in the STAC catalog (see screenshot in attachment). It also seems to break the HLS tutorial:
https://git.earthdata.nasa.gov/projects/LPDUR/repos/hls-tutorial/browse/HLS_Tutorial.ipynb
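For comparison with the STAC search, a CMR granule query for the same kind of request might be built roughly as below. This is only a sketch: the exact query used for the 8 second result is not given in the thread, the helper name and the example bounding box and dates are hypothetical, and the parameters shown (`short_name`, `version`, `bounding_box`, `temporal`, `page_size`) are standard CMR granule search parameters.

```python
from urllib.parse import urlencode

CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_cmr_granule_query(short_name, version, bbox, start, end, page_size=2000):
    """Build a CMR granule search URL (hypothetical helper, no request is sent)."""
    params = {
        "short_name": short_name,
        "version": version,
        "bounding_box": ",".join(str(v) for v in bbox),  # west,south,east,north
        "temporal": f"{start}T00:00:00Z,{end}T23:59:59Z",  # start,end
        "page_size": page_size,  # results beyond this need further paging
    }
    return f"{CMR_GRANULES_URL}?{urlencode(params)}"


# Example values are made up for illustration.
url = build_cmr_granule_query(
    "HLSS30", "2.0", (-122.5, 37.5, -122.0, 38.0), "2023-01-01", "2023-08-31"
)
print(url)
```

A result set larger than `page_size` would still need several requests, so this does not by itself explain the gap in search times.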
Re: Unexpected behaviour with HLS in LPCLOUD
Posted: Tue May 28, 2024 1:12 pm America/New_York
by mitchbon
I have generally kept my searches to smaller areas (I generally use a fishnet of tiles and do my processing one tile at a time - currently using 60 km tiles), but longer/more dense in time (e.g., all available time-steps). I have similarly run into issues when there are thousands of items.
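As a rough illustration of the fishnet idea, the sketch below splits a bounding box into a regular grid of smaller boxes that can be searched one at a time. It is a simplification: it works in degrees rather than the 60 km tiles mentioned above, and the function name `fishnet` is hypothetical.

```python
def fishnet(bbox, n_cols, n_rows):
    """Split (west, south, east, north) into n_cols x n_rows equal tiles."""
    west, south, east, north = bbox
    dx = (east - west) / n_cols
    dy = (north - south) / n_rows
    tiles = []
    for row in range(n_rows):  # south to north
        for col in range(n_cols):  # west to east
            tiles.append((
                west + col * dx,
                south + row * dy,
                west + (col + 1) * dx,
                south + (row + 1) * dy,
            ))
    return tiles


# Each tile can then be passed as the bbox of its own search.
for tile in fishnet((0.0, 0.0, 2.0, 1.0), n_cols=2, n_rows=1):
    print(tile)
```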
The main thing that sped it up for me was setting the limit parameter in catalog.search() to 100. I am not sure what the default is, but at 100 you see a many-fold speedup for searches that return many items. I believe the limit parameter controls how many items are on each page of the search results JSON. The max is 250 I think, but above 100 I started getting intermittent errors on the search call.
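If limit really is the page size, the effect on the number of requests is simple arithmetic. The sketch below assumes one HTTP request per page, and the page size of 10 is only a hypothetical small value for comparison, not a confirmed default. The 4900 items figure is taken from the earlier post.

```python
def pages_needed(total_items, limit):
    """Number of result pages needed to return total_items at `limit` per page."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return (total_items + limit - 1) // limit  # integer ceiling division


for limit in (10, 100, 250):
    print(f"limit={limit}: {pages_needed(4900, limit)} pages")
```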
Hopefully that helps some! I know it doesn't exactly fit what you are talking about.
I do remember the HLS tutorial breaking because the HLS collections are not on the same page they were on when that tutorial was created.
Re: Unexpected behaviour with HLS in LPCLOUD
Posted: Wed May 29, 2024 3:54 am America/New_York
by kristianbodolai
Definitely, using smaller tiles helps. It also leaves fewer NaNs in the time dimension when you ingest as an xarray. However, for some workflows it feels a bit cumbersome to have to loop through the fishnet.
Ultimately, I don't think these searches should take that long (as is evidenced by the fact that CMR search returns the same query in a matter of seconds), which may point to an issue with the stac search itself.
Thanks for the pointer on the limit parameter, I'll try that today and see what happens. Interesting that you get intermittent errors. I never got an error in the search call; the searches just took an excessive amount of time.
Re the HLS tutorial breaking, do you know where the HLS collection can be found now? I'm struggling to understand how it's not listed as a collection in LPCLOUD, yet you can access it using pystac...
Re: Unexpected behaviour with HLS in LPCLOUD
Posted: Thu May 30, 2024 8:21 am America/New_York
by mitchbon
The errors I get in search may be related to my network (I work on a restricted government network, which has made working with STAC data a bit cumbersome in some cases - requiring SSL workarounds etc.).
For that tutorial, I manually searched through the STAC links for the HLS catalogs. At the time, I found it on page 3. There may be an automated way to do it.
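Code:
import requests as r  # assumption: r is the requests library

# Fetch the 3rd page of the LPCLOUD catalog, where the HLS links were found at the time
lp_cloud3 = r.get('https://cmr.earthdata.nasa.gov/stac/LPCLOUD?page=3').json()
lp_links = lp_cloud3['links'] # Note: changed to new variable representing 3rd page of LPCLOUD
for l in lp_links:
    try:
        print(f"{l['href']} is the {l['title']}")
    except KeyError:  # some links have no title
        print(f"{l['href']}")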
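One possible way to automate the manual page search is to walk the catalog pages and collect matching links. The sketch below is not a tested recipe for CMR STAC: it assumes each page carries a link with rel "next" pointing at the following page, and the function name `find_links` and the `fetch` argument are hypothetical.

```python
def find_links(start_url, needle, fetch, max_pages=50):
    """Follow rel='next' links from start_url and collect links mentioning needle.

    fetch is any callable that takes a URL and returns the parsed JSON page.
    """
    url = start_url
    matches = []
    for _ in range(max_pages):  # hard stop in case the paging never ends
        if url is None:
            break
        links = fetch(url).get("links", [])
        for link in links:
            text = (link.get("title") or "") + " " + (link.get("href") or "")
            if needle in text:
                matches.append(link)
        # Assumption: the next page, if any, is advertised with rel == "next".
        url = next((l["href"] for l in links if l.get("rel") == "next"), None)
    return matches


# Possible usage against the live catalog (not run here):
#   import requests
#   hls = find_links('https://cmr.earthdata.nasa.gov/stac/LPCLOUD', 'HLS',
#                    lambda u: requests.get(u).json())
```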
You can also directly search for the L30 and S30 catalogs, e.g.,
Code:
l30 = r.get('https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/HLSL30.v2.0').json()
l30
Additionally, you may want to try out the new NASA Earthdata HLS tutorial instead:
https://github.com/nasa/HLS-Data-Resources/blob/main/python/tutorials/HLS_Tutorial.ipynb
Re: Unexpected behaviour with HLS in LPCLOUD
Posted: Tue Jun 18, 2024 11:44 am America/New_York
by kristianbodolai
Hi Mitch, thanks for your help, the `limit` parameter is an absolute game changer, and thanks for the pointer for the other tutorial.
I just wanted to bring attention back to the fact that the STAC catalogue doesn't seem to properly index the HLS data (e.g. it doesn't appear when you call .get_collections()). Is this intentional in the LPCLOUD implementation, or is it something that needs to be addressed?
Thanks!
K.
Re: Unexpected behaviour with HLS in LPCLOUD
Posted: Thu Jul 18, 2024 11:05 am America/New_York
by LP DAAC-EDL - dgolon
Hi @kristianbodolai, apologies for the delay in response. I did want to let you know that this is something we (the LP) have been looking into. Someone from the LP will follow up when we have more details or a solution. Thanks -- Danielle
Re: Unexpected behaviour with HLS in LPCLOUD
Posted: Tue Aug 27, 2024 4:52 pm America/New_York
by LP DAAC - afriesz
Hey @kristianbodolai,
Concerning your question about the .get_collections() function, the current implementation of the CMR STAC API does lack collection conformance. We'll bring this up with the team that manages the API. The .get_collections() function does seem to return collections eventually, but it's very slow. This GitHub issue has some good discussion around this topic:
https://github.com/stac-utils/pystac-client/issues/320
The code below works pretty well for me as a workaround.
Code:
from pystac_client import Client
STAC_URL = 'https://cmr.earthdata.nasa.gov/stac'
cat = Client.open(f'{STAC_URL}/LPCLOUD')
cat.add_conforms_to('COLLECTIONS')
[c for c in cat.get_collections() if 'HLS' in c.id] # Get collections with HLS in ID
Re: Unexpected behaviour with HLS in LPCLOUD
Posted: Wed Sep 11, 2024 4:24 am America/New_York
by kristianbodolai
Hey @afriesz, thanks for your reply!
The workaround does indeed work. Thanks for pointing me in the right direction.
Interestingly enough, it seems like that warning is no longer appearing without the `add_conforms_to('COLLECTIONS')` bit, and a lot more datasets are listed, including the HLS products. It also takes about the same time with and without it.
Re one of the other questions that got buried in the thread (the difference in search times between the CMR search API and the STAC API): is there any reason for this speed difference?