
HLS data access and spatial subsetting

Posted: Tue Aug 19, 2025 8:33 pm America/New_York
by thaleslc
I am trying to programmatically access the "HLSL30_2.0" and "HLSS30_2.0" products. My goal is to efficiently retrieve data (from the oldest available up to the present) for several hundred geometries daily. The geometries are defined as H3 level 5 hexagons (~250 km² each, depending on latitude).
I need all images for bands 2 (blue), 4 (red), NIR, and Fmask to calculate an EVI time series for each hexagon. I am not using the vegetation index products (HLSS30_VI and HLSL30_VI) directly because they do not yet cover the full historical period I need (2015–present).
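For reference, the EVI calculation described above can be sketched as follows. This is a minimal sketch, not a definitive implementation. It assumes the standard EVI coefficients (G = 2.5, C1 = 6, C2 = 7.5, L = 1), the HLS v2.0 reflectance scale factor of 0.0001 and fill value of -9999, and that Fmask bits 1 to 4 flag cloud, adjacent to cloud/shadow, cloud shadow, and snow/ice; these should be checked against the HLS User Guide. The function name and constants are illustrative. Note that NIR is B05 in HLSL30 and B8A in HLSS30.

```python
import numpy as np

SCALE = 0.0001          # assumed HLS v2.0 reflectance scale factor
FILL = -9999            # assumed HLS v2.0 reflectance fill value
BAD_BITS = 0b00011110   # assumed Fmask bits 1-4: cloud, adjacent, shadow, snow/ice


def evi_from_hls(blue, red, nir, fmask):
    """Return EVI as float64, with NaN where a pixel is filled or flagged."""
    blue, red, nir = (np.asarray(b, dtype="float64") for b in (blue, red, nir))
    fmask = np.asarray(fmask, dtype="uint8")

    # Pixels to discard: fill values in any band, or any flagged Fmask bit.
    invalid = (blue == FILL) | (red == FILL) | (nir == FILL)
    invalid |= (fmask & BAD_BITS) != 0

    # Convert scaled integers to surface reflectance.
    b, r, n = blue * SCALE, red * SCALE, nir * SCALE

    # EVI = 2.5 * (NIR - Red) / (NIR + 6 * Red - 7.5 * Blue + 1)
    denom = n + 6.0 * r - 7.5 * b + 1.0
    with np.errstate(divide="ignore", invalid="ignore"):
        evi = 2.5 * (n - r) / denom

    evi[invalid | (denom == 0)] = np.nan
    return evi
```

A per-hexagon time series would then be something like np.nanmean(evi) (or a median) over the pixels that fall inside each hexagon, for each acquisition date.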
The main Python-based approaches I found are:


1 - CMR-STAC API + Earthdata login: Using pystac_client (connected to https://cmr.earthdata.nasa.gov/stac/LPCLOUD) to search assets, then odc.stac.load() to subset/download. I also tried using rasterio to crop granules, but performance did not improve.

2 - Harmony: Seems promising, but the collections I need (C2021957295-LPCLOUD and C2021957657-LPCLOUD) do not support spatial subsetting, forcing full granule downloads, which is slow.

3 - AppEEARS API: Likely the least suitable, as requests can take hours for a single year of data per region.

4 - Direct access to COGs on AWS (us-west-2): I have not fully tested this. Using Dask + Coiled in the correct region failed, giving the error:
"User: arn:aws:sts::643705676985:assumed-role/s3-same-region-access-role/thaleslc is not authorized to perform: s3:GetObject on resource: "arn:aws:s3:::lp-prod-protected/HLSS30.020/HLS.S30.T21LXG.2025227T140101.v2.0/HLS.S30.T21LXG.2025227T140101.v2.0.B12.tif" with an explicit deny in an identity-based policy"
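As a side note, the object key in the error message above encodes the product, tile, and acquisition time, which is handy when grouping granules into a time series. A minimal sketch, assuming the naming pattern HLS.<sensor>.T<tile>.<YYYYDDD>T<HHMMSS>.v<major>.<minor>.<band>.tif seen in that key; the helper and type names are hypothetical:

```python
from datetime import datetime
from typing import NamedTuple


class HlsName(NamedTuple):
    product: str        # e.g. "HLSS30"
    tile: str           # MGRS tile id without the leading "T"
    acquired: datetime  # acquisition time parsed from year + day of year
    version: str        # e.g. "v2.0"
    band: str           # e.g. "B12" or "Fmask"


def parse_hls_name(key: str) -> HlsName:
    """Parse an HLS file name (or a full S3 key ending in one)."""
    filename = key.rsplit("/", 1)[-1]
    parts = filename.split(".")
    # Expected pieces: HLS, S30, T21LXG, 2025227T140101, v2, 0, B12, tif
    if len(parts) != 8 or parts[0] != "HLS" or parts[-1] != "tif":
        raise ValueError(f"unexpected HLS file name: {filename!r}")
    _, sensor, tile, stamp, v_major, v_minor, band, _ = parts
    # %j is the day of year, so 2025227 is day 227 of 2025.
    acquired = datetime.strptime(stamp, "%Y%jT%H%M%S")
    return HlsName(f"HLS{sensor}", tile[1:], acquired, f"{v_major}.{v_minor}", band)
```

For the key in the error message, this gives product HLSS30, tile 21LXG, and band B12.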


Given these four options, I would like to know:
1 - Are these four methods really the only/best options for my use case?
2 - Which approach is likely to offer the best performance? I suspect that accessing and subsetting the COGs directly in the S3 bucket would be fastest, but my attempt using Dask + Coiled in that region failed, and I have not yet been able to test it otherwise.

Re: HLS data access and spatial subsetting

Posted: Wed Aug 20, 2025 10:26 am America/New_York
by LP DAAC - lien
Hi,
I talked with some of the developers here, and they think that Dask + Coiled would probably perform the best (fastest). AppEEARS would certainly do the job, but as you say it would take a long time for the entire project. It looks like you placed 2 identical orders for 3 years of data for an area of HLSS30 and they took about 2 hours. So, if you have several hundred areas and 10+ years it would take some time.

As for identifying the granules, earthaccess would be faster than CMR-STAC, but the data would still need to be processed. AppEEARS is ours, so that is the one we support; however, the thinking here is that Dask + Coiled would be more efficient, though there would be fees.
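A granule search with earthaccess might look roughly like the sketch below. This is only a sketch under stated assumptions: it assumes the earthaccess package is installed and Earthdata Login credentials are available, and the helper names, the default limit, and the example bounding box are hypothetical. The query is built separately from the network call so it can be inspected without logging in.

```python
def hls_query(short_name, bbox, start, end, limit=100):
    """Build keyword arguments for earthaccess.search_data (no network access).

    bbox is (west, south, east, north) in degrees; start and end are
    ISO date strings such as "2015-01-01".
    """
    west, south, east, north = bbox
    if south >= north:
        raise ValueError("bbox must be (west, south, east, north)")
    return {
        "short_name": short_name,        # "HLSS30" or "HLSL30"
        "version": "2.0",
        "bounding_box": (west, south, east, north),
        "temporal": (start, end),
        "count": limit,                  # cap on the number of results
    }


def find_granules(query):
    """Run the search; requires earthaccess and an Earthdata Login."""
    import earthaccess  # third-party, imported lazily so hls_query works without it

    earthaccess.login()
    return earthaccess.search_data(**query)


# Example query for one hexagon's bounding box (coordinates are made up).
query = hls_query("HLSS30", (-56.0, -12.0, -55.8, -11.8), "2015-01-01", "2025-08-19")
```

Calling find_granules(query) returns the matching granules; reading only the pixels inside each hexagon from the COGs is still a separate step.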

We do not know of other options, but there may well be some. We defer to the user community, as someone may have other ideas that might work better.
Thanks,
Brett