Get only the latest version of a granule with earthaccess.search_data()

fbennitt
Posts: 2
Joined: Mon Aug 05, 2024 4:35 pm America/New_York
Answers: 0

Get only the latest version of a granule with earthaccess.search_data()

by fbennitt » Mon Aug 05, 2024 5:08 pm America/New_York

Hi there,

I am working with the SWOT Water Mask Pixel Product (PIXC). I am accessing and downloading granules locally as shown here using earthaccess.search_data() in a Jupyter notebook: https://podaac.github.io/tutorials/notebooks/datasets/SWOTHR_localmachine.html#water-mask-pixel-cloud-netcdf

Is there a way to only retrieve the latest version of each granule? For example, when I use the following search, I get two granules (_02 and _03) for SWOT_L2_HR_PIXC_018_298_080L_20240720T131306_20240720T131317_PIC0.

pixc_results = earthaccess.search_data(short_name='SWOT_L2_HR_PIXC_2.0',
                                       temporal=('2024-05-01 00:00:00', '2024-08-05 23:59:59'),
                                       granule_name='*_298_080L_*')

I checked the docs for the API, but I don't see a kwarg for this:
https://earthaccess.readthedocs.io/en/stable/user-reference/api/api/#earthaccess.api.search_data

Is there an easy way to filter my results before download?

Thank you,
Fiona

NSIDC-EDL - mbeig
Posts: 22
Joined: Tue Dec 07, 2021 11:49 am America/New_York
Answers: 0

Re: Get only the latest version of a granule with earthaccess.search_data()

by NSIDC-EDL - mbeig » Tue Aug 06, 2024 4:19 pm America/New_York

Hi Fiona,

Thank you for reaching out on the forum regarding the earthaccess Python library. It's a great tool!

When using earthaccess.search_data() you can add "version" as an argument.

Example:
pixc_results = earthaccess.search_data(short_name='SWOT_L2_HR_PIXC_2.0',
                                       version='03',
                                       temporal=('2024-05-01 00:00:00', '2024-08-05 23:59:59'),
                                       granule_name='*_298_080L_*')

CMR does not support the option to search by most recent version because of the heterogeneity of versioning syntax across EOSDIS data, so you do have to have some prior knowledge of which version you are wishing to filter on.

Hope that helps!

Kind regards,

Mikala

PODAAC - celiaoued
Subject Matter Expert
Posts: 51
Joined: Fri May 28, 2021 1:30 pm America/New_York
Answers: 0
Been thanked: 2 times

Re: Get only the latest version of a granule with earthaccess.search_data()

by PODAAC - celiaoued » Tue Aug 06, 2024 8:04 pm America/New_York

Hello Fiona,

Unfortunately, I do not have the solution you are looking for, because the 'version' argument does not distinguish between the different versions of a granule within a collection. I also cannot recommend simply searching for granules ending in '03', since, as you have already seen, the latest granules do not all end in the same number. For now, I can only suggest sorting through the repeated granules manually, or writing some code that detects when the same granule appears more than once in the earthaccess search results.
Let me consult with PO.DAAC colleagues about any future plans regarding the repeated granules and get back to you on this.
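
As a rough illustration of the "write some code" option, here is a minimal sketch that keeps only the highest-numbered copy of each granule name. It assumes the names follow the pattern seen in this thread, <base>_<CRID>_<counter> (for example ..._PIC0_02 and ..._PIC0_03), optionally followed by .nc. The function name keep_latest_counter is made up for this example and is not part of earthaccess.

```python
import re
from typing import Dict, List, Tuple

# Assumed naming pattern: <base>_<CRID>_<counter>[.nc], e.g. ..._PIC0_03.nc
# The CRID is taken to be four upper-case letters/digits, the counter two digits.
_NAME_RE = re.compile(r"^(?P<stem>.+_[A-Z0-9]{4})_(?P<counter>\d{2})(?:\.nc)?$")


def keep_latest_counter(names: List[str]) -> List[str]:
    """Return the names with only the highest counter kept for each stem."""
    best: Dict[str, Tuple[int, str]] = {}
    for name in names:
        match = _NAME_RE.match(name)
        if match is None:
            # Unrecognized name: keep it rather than silently dropping data.
            best[name] = (-1, name)
            continue
        stem = match["stem"]
        counter = int(match["counter"])
        # Replace the stored name only when this copy has a higher counter.
        if stem not in best or counter > best[stem][0]:
            best[stem] = (counter, name)
    return [name for _, name in best.values()]
```

With earthaccess results, the names would have to be pulled out of each granule first, for example from granule["meta"]["native-id"] (an assumption about the result structure that is worth checking against your own output), and the original results list could then be filtered down to the names this function returns.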

Best,
Celia

fbennitt
Posts: 2
Joined: Mon Aug 05, 2024 4:35 pm America/New_York
Answers: 0

Re: Get only the latest version of a granule with earthaccess.search_data()

by fbennitt » Tue Oct 01, 2024 12:37 pm America/New_York

Hi Celia and Mikala,

Thanks for the help! I would also love a way to preferentially select the PGC0 products over the PIC0 as the former represent the reprocessed granules.

Do you have any code you could point me towards to parse pixc_results and filter them myself? I am not familiar with 'Collection' objects.

Thank you,
Fiona

PODAAC - celiaoued
Subject Matter Expert
Posts: 51
Joined: Fri May 28, 2021 1:30 pm America/New_York
Answers: 0
Been thanked: 2 times

Re: Get only the latest version of a granule with earthaccess.search_data()

by PODAAC - celiaoued » Tue Oct 01, 2024 2:55 pm America/New_York

Hi,
With earthaccess, you can use the granule_name option, because PGC0 is part of the filename:

test_results = earthaccess.search_data(short_name='SWOT_L2_HR_PIXC_2.0',
                                       temporal=('2023-08-01 00:00:00', '2024-01-05 23:59:59'),
                                       granule_name='*_298_080L_*PGC0*')
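
If you would rather search without the PGC0 wildcard and fall back to PIC0 where no reprocessed granule exists, the choice can be made after the search. The sketch below is only an illustration under stated assumptions: it assumes the filename layout SWOT_L2_HR_PIXC_<cycle>_<pass>_<tile>_<start>_<end>_<CRID>_<counter> and that there is one granule of interest per cycle, pass and tile. The function name prefer_reprocessed is made up for this example.

```python
import re
from typing import Dict, List, Tuple

# Assumed layout:
# SWOT_L2_HR_PIXC_<cycle>_<pass>_<tile>_<start>_<end>_<CRID>_<counter>
_PIXC_RE = re.compile(
    r"^SWOT_L2_HR_PIXC_(?P<cycle>\d{3})_(?P<pass_>\d{3})_(?P<tile>\d{3}[LR])_"
    r"\d{8}T\d{6}_\d{8}T\d{6}_(?P<crid>[A-Z0-9]{4})_(?P<counter>\d{2})"
)


def prefer_reprocessed(names: List[str], preferred: str = "PGC0") -> List[str]:
    """Pick one name per (cycle, pass, tile), preferring the given CRID."""
    chosen: Dict[Tuple[str, str, str], Tuple[Tuple[bool, int], str]] = {}
    for name in names:
        match = _PIXC_RE.match(name)
        if match is None:
            continue  # names that do not fit the assumed layout are skipped
        key = (match["cycle"], match["pass_"], match["tile"])
        # Rank first by "is the preferred CRID", then by the product counter.
        rank = (match["crid"] == preferred, int(match["counter"]))
        if key not in chosen or rank > chosen[key][0]:
            chosen[key] = (rank, name)
    return [name for _, name in chosen.values()]
```

Grouping on cycle, pass and tile instead of the full timestamp is a design choice: it avoids assuming that the original and reprocessed files carry identical start and end times.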
Alternative: PO.DAAC has a Python command line tool that can select PGC0 granules: https://github.com/podaac/data-subscriber

You'll first need to set up a .netrc file to store your Earthdata Login credentials, following the tool's instructions.
Then, I'd use the granule name option to search by a filename pattern that includes PGC0, like so:

podaac-data-downloader -c SWOT_L2_HR_PIXC_2.0 -d ./myfolder -gr "SWOT_L2_HR_PIXC_00?_348_073L_*_PGC0_*" --dry-run
Unfortunately, the granule name option does not work together with start-date and end-date options, so to narrow down according to time frame, you'll have to rely on wildcards in the cycle number or dates within the filename, like above. Narrowing down by time may even need multiple commands, each using a different wildcard pattern to search by time/cycle.
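
To illustrate the "multiple commands" idea, here is a small sketch that builds one podaac-data-downloader command per cycle number, using only the flags shown above (-c, -d, -gr, --dry-run). The function name downloader_commands and its default arguments are made up for this example, and you still need to know which cycle numbers fall inside your time frame. The commands are only printed, not run.

```python
import shlex
from typing import Iterable, List


def downloader_commands(
    cycles: Iterable[int],
    pass_tile: str = "348_073L",  # pass and tile from the example above
    crid: str = "PGC0",
    out_dir: str = "./myfolder",
) -> List[List[str]]:
    """Build one downloader command (as an argument list) per cycle number."""
    commands = []
    for cycle in cycles:
        # Cycle numbers are zero-padded to three digits in the filenames.
        pattern = f"SWOT_L2_HR_PIXC_{cycle:03d}_{pass_tile}_*_{crid}_*"
        commands.append([
            "podaac-data-downloader",
            "-c", "SWOT_L2_HR_PIXC_2.0",
            "-d", out_dir,
            "-gr", pattern,
            "--dry-run",
        ])
    return commands


if __name__ == "__main__":
    # Print shell-quoted commands for cycles 1 to 3 so they can be reviewed first.
    for cmd in downloader_commands(range(1, 4)):
        print(shlex.join(cmd))
```

Once the printed commands look right, they can be pasted into a terminal, and --dry-run can be dropped to download the files.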
