Get only the latest version of a granule with earthaccess.search_data()
Hi there,
I am working with the SWOT Water Mask Pixel Product (PIXC). I am accessing and downloading granules locally as shown here using earthaccess.search_data() in a Jupyter notebook: https://podaac.github.io/tutorials/notebooks/datasets/SWOTHR_localmachine.html#water-mask-pixel-cloud-netcdf
Is there a way to only retrieve the latest version of each granule? For example, when I use the following search, I get two granules (_02 and _03) for SWOT_L2_HR_PIXC_018_298_080L_20240720T131306_20240720T131317_PIC0.
pixc_results = earthaccess.search_data(short_name = 'SWOT_L2_HR_PIXC_2.0',
                                       temporal = ('2024-05-01 00:00:00', '2024-08-05 23:59:59'),
                                       granule_name = '*_298_080L_*')
I checked the docs for the API, but I don't see a kwarg for this:
https://earthaccess.readthedocs.io/en/stable/user-reference/api/api/#earthaccess.api.search_data
Is there an easy way to filter my results before download?
Thank you,
Fiona
Re: Get only the latest version of a granule with earthaccess.search_data()
Hi Fiona,
Thank you for reaching out on the forum regarding the earthaccess Python library. It's a great tool!
When using earthaccess.search_data() you can add "version" as an argument.
Example:
pixc_results = earthaccess.search_data(short_name = 'SWOT_L2_HR_PIXC_2.0', version = '03',
                                       temporal = ('2024-05-01 00:00:00', '2024-08-05 23:59:59'),
                                       granule_name = '*_298_080L_*')
CMR does not support searching by most recent version because versioning syntax is heterogeneous across EOSDIS data, so you need to know in advance which version you want to filter on.
Hope that helps!
Kind regards,
Mikala
Re: Get only the latest version of a granule with earthaccess.search_data()
Hello Fiona,
Unfortunately, I do not have the solution you are looking for, because the different versions of granules within a collection cannot be distinguished through the 'version' argument. I also cannot recommend simply searching for granules ending in '03', since, as you have already seen, the latest granules do not all end in the same number. For now I can only suggest sorting through the repeated granules manually, or writing some code that catches duplicates in the earthaccess search results.
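A minimal sketch of the "write some code" option, not an official PO.DAAC or earthaccess recipe. It assumes granule names end in _<CRID>_<counter> (for example _PIC0_03, optionally followed by .nc) and that a higher counter means a newer version. The function name latest_by_counter and the name_of parameter are hypothetical, and the second granule name in the sample list is made up for illustration.

```python
import re

# Assumed layout: <base>_<CRID>_<counter>, e.g. ..._PIC0_03 (optionally ending in .nc)
NAME_RE = re.compile(
    r"^(?P<base>.+)_(?P<crid>[A-Z0-9]{4})_(?P<counter>\d{2})(?:\.nc)?$"
)


def latest_by_counter(items, name_of=str):
    """Keep one item per (base name, CRID): the one with the highest counter.

    name_of turns an item into its granule name. Items whose name does not
    match the assumed layout are skipped, so check the result count.
    """
    best = {}
    for item in items:
        match = NAME_RE.match(name_of(item))
        if match is None:
            continue
        key = (match["base"], match["crid"])
        counter = int(match["counter"])
        if key not in best or counter > best[key][0]:
            best[key] = (counter, item)
    return [item for _, item in best.values()]


names = [
    "SWOT_L2_HR_PIXC_018_298_080L_20240720T131306_20240720T131317_PIC0_02",
    "SWOT_L2_HR_PIXC_018_298_080L_20240720T131306_20240720T131317_PIC0_03",
    "SWOT_L2_HR_PIXC_017_298_080L_20240629T162801_20240629T162812_PIC0_01",  # made-up name
]
latest = latest_by_counter(names)
```

To filter the search results themselves rather than plain strings, something like latest_by_counter(pixc_results, name_of=lambda g: g["umm"]["GranuleUR"]) should work, assuming each result exposes its name under that key; the filtered list could then be passed on to the download step.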
Let me consult with PO.DAAC colleagues about any future plans regarding the repeated granules and get back to you on this.
Best,
Celia
Re: Get only the latest version of a granule with earthaccess.search_data()
Hi Celia and Mikala,
Thanks for the help! I would also love a way to preferentially select the PGC0 products over the PIC0 as the former represent the reprocessed granules.
Do you have any code you could point me towards to parse pixc_results and filter them myself? I am not familiar with 'Collection' objects.
Thank you,
Fiona
Re: Get only the latest version of a granule with earthaccess.search_data()
Hi,
With earthaccess, you can use the granule name option because PGC0 is denoted in the filename:
Code: Select all
test_results = earthaccess.search_data(short_name = 'SWOT_L2_HR_PIXC_2.0',
                                       temporal = ('2023-08-01 00:00:00', '2024-01-05 23:59:59'),
                                       granule_name = '*_298_080L_*PGC0*')
Alternative: PO.DAAC has a Python command line tool that can select from PGC0: https://github.com/podaac/data-subscriber
You'll first need to set up a .netrc file to store your Earthdata Login credentials according to the instructions.
Then, I'd use the granule name option to search by a filename pattern that includes PGC0 like so:
Code: Select all
podaac-data-downloader -c SWOT_L2_HR_PIXC_2.0 -d ./myfolder -gr SWOT_L2_HR_PIXC_00?_348_073L_*_PGC0_* --dry-run
Unfortunately, the granule name option does not work together with the start-date and end-date options, so to narrow down by time frame you'll have to rely on wildcards in the cycle number or in the dates within the filename, like above. Narrowing down by time may even require multiple commands, each using a different wildcard pattern to search by time/cycle.
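If a search returns both PIC0 and PGC0 granules, the preference can also be applied afterwards in Python. The following is only a sketch under stated assumptions: names follow SWOT_L2_HR_PIXC_<cycle>_<pass>_<tile>_<start>_<end>_<CRID>_<counter>, one (cycle, pass, tile) combination identifies one granule, PGC0 should win over PIC0, and a higher counter is newer. The names CRID_RANK and pick_preferred are hypothetical, and all sample names except the first are made up for illustration.

```python
import re

# Assumed SWOT PIXC name layout; the timestamps are not part of the grouping key
# in case a reprocessed granule carries slightly different times.
PIXC_RE = re.compile(
    r"^SWOT_L2_HR_PIXC_(?P<cycle>\d{3})_(?P<pass_num>\d{3})_(?P<tile>\d{3}[LR])"
    r"_\d{8}T\d{6}_\d{8}T\d{6}_(?P<crid>[A-Z0-9]{4})_(?P<counter>\d{2})(?:\.nc)?$"
)

# Assumption: reprocessed (PGC0) beats forward-processed (PIC0); unknown CRIDs rank last.
CRID_RANK = {"PIC0": 0, "PGC0": 1}


def pick_preferred(items, name_of=str):
    """Keep one item per (cycle, pass, tile), ranked by CRID first, then counter."""
    best = {}
    for item in items:
        match = PIXC_RE.match(name_of(item))
        if match is None:
            continue  # name does not follow the assumed layout
        key = (match["cycle"], match["pass_num"], match["tile"])
        rank = (CRID_RANK.get(match["crid"], -1), int(match["counter"]))
        if key not in best or rank > best[key][0]:
            best[key] = (rank, item)
    return [item for _, item in best.values()]


names = [
    "SWOT_L2_HR_PIXC_018_298_080L_20240720T131306_20240720T131317_PIC0_03",
    "SWOT_L2_HR_PIXC_018_298_080L_20240720T131306_20240720T131317_PGC0_01",  # made-up name
    "SWOT_L2_HR_PIXC_019_298_080L_20240810T095811_20240810T095822_PIC0_02",  # made-up name
]
preferred = pick_preferred(names)
```

Ranking by a (CRID, counter) tuple means a PGC0 granule is chosen even when a PIC0 granule for the same tile has a higher counter; swap the tuple order if the counter should take priority instead.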