examples/instructs for virtual Zarr stores


by mdsumner » Thu May 08, 2025 3:00 am America/New_York

I'm exploring the virtual JSON kerchunk stores, and I can't find out how to authorize for data reads. I can connect to a remote store and see the dataset metadata fine:

```python
import xarray

u = "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-docs/ccmp/open/L4_V3.1/docs/CCMP_WINDS_10M6HR_L4_V3.1_combined-ref.json"
ds = xarray.open_dataset(u, engine="kerchunk")
ds
```

```
<xarray.Dataset> Size: 775GB
Dimensions: (time: 46696, latitude: 720, longitude: 1440)
Coordinates:
* latitude (latitude) float32 3kB -89.88 -89.62 -89.38 ... 89.38 89.62 89.88
* longitude (longitude) float32 6kB 0.125 0.375 0.625 ... 359.4 359.6 359.9
* time (time) datetime64[ns] 374kB 1993-01-02 ... 2024-12-31T18:00:00
Data variables:
ws (time, latitude, longitude) float32 194GB ...
nobs (time, latitude, longitude) float32 194GB ...
uwnd (time, latitude, longitude) float32 194GB ...
vwnd (time, latitude, longitude) float32 194GB ...
```

But at read time there's no authorization for the actual files referenced (full trace below).

I thought that this would work, but no luck. Are there any examples or documentation on these new JSON reference files? I understand that this is experimental, and I'm happy to explore it in depth.

```python
import earthaccess
earthaccess.login()
fs = earthaccess.get_s3_filesystem(daac="podaac")
so = {"remote_options": fs.storage_options}
ds = xarray.open_dataset(u, engine="kerchunk", storage_options=so)
```

fwiw, it works in (very) recent GDAL, where I just use env vars for earthdata creds:

```python
## use GDAL_HTTP_HEADER_FILE or GDAL_HTTP_HEADERS for earthdata auth
from osgeo import gdal
gdal.UseExceptions()
gdal.OpenEx("ZARR:https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-docs/ccmp/open/L4_V3.1/docs/CCMP_WINDS_10M6HR_L4_V3.1_combined-ref.json", gdal.OF_MULTIDIM_RASTER)
ds = gdal.OpenEx("ZARR:\"/vsicurl/https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-docs/ccmp/open/L4_V3.1/docs/CCMP_WINDS_10M6HR_L4_V3.1_combined-ref.json\"", gdal.OF_MULTIDIM_RASTER)
[dim.GetSize() for dim in ds.GetRootGroup().OpenMDArray("uwnd").GetDimensions()]
#[46696, 720, 1440]

slc = ds.GetRootGroup().OpenMDArray("uwnd").GetView("[1,:,:]")
buf = slc.Read()  # raw bytes; named `buf` to avoid shadowing the built-in `bytes`
buf[0:10]
# bytearray(b'\x00<\x1c\xc6\x00<\x1c\xc6\x00<')
```
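
For reference, one way to set that up from Python is to write the header file and export the variable before GDAL opens anything. This is only a minimal sketch, assuming an Earthdata Login bearer token is available in an environment variable; the variable name `EARTHDATA_TOKEN` and the helper `write_gdal_header_file` are made up for illustration.

```python
import os
import tempfile


def write_gdal_header_file(token: str) -> str:
    """Write a "key: value" HTTP header file and point GDAL at it.

    Hypothetical helper: assumes the server accepts a bearer token
    in the Authorization header.
    """
    # mkstemp creates the file readable only by the current user.
    fd, path = tempfile.mkstemp(suffix=".txt", text=True)
    with os.fdopen(fd, "w") as f:
        f.write(f"Authorization: Bearer {token}\n")
    # GDAL reads this config option when it makes HTTP requests.
    os.environ["GDAL_HTTP_HEADER_FILE"] = path
    return path


# EARTHDATA_TOKEN is an assumed variable name; the fallback is a placeholder.
header_path = write_gdal_header_file(os.environ.get("EARTHDATA_TOKEN", "<your-token>"))
```

The environment variable has to be set before the `gdal.OpenEx` call for it to take effect on that request.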

Thanks!

```
ds.isel(time = -1).sel(longitude = 100, method = "nearest").uwnd.values
Traceback (most recent call last):
File "/workenv/lib/python3.12/site-packages/fsspec/implementations/reference.py", line 825, in _cat_file
return await self.fss[protocol]._cat_file(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 1160, in _cat_file
return await _error_wrapper(_call_and_read, retries=self.retries)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 146, in _error_wrapper
raise err
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 114, in _error_wrapper
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 1147, in _call_and_read
resp = await self._call_s3(
^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 371, in _call_s3
return await _error_wrapper(
^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 146, in _error_wrapper
raise err
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 114, in _error_wrapper
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/client.py", line 394, in _make_api_call
http, parsed_response = await self._make_request(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/client.py", line 420, in _make_request
return await self._endpoint.make_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/endpoint.py", line 86, in _send_request
request = await self.create_request(request_dict, operation_model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/endpoint.py", line 74, in create_request
await self._event_emitter.emit(
File "/workenv/lib/python3.12/site-packages/aiobotocore/hooks.py", line 68, in _emit
response = await resolve_awaitable(handler(**kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/_helpers.py", line 6, in resolve_awaitable
return await obj
^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/signers.py", line 24, in handler
return await self.sign(operation_name, request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/signers.py", line 90, in sign
auth.add_auth(request)
File "/workenv/lib/python3.12/site-packages/botocore/auth.py", line 424, in add_auth
raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/workenv/lib/python3.12/site-packages/xarray/core/dataarray.py", line 823, in values
return self.variable.values
^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/variable.py", line 508, in values
return _as_array_or_item(self._data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/variable.py", line 302, in _as_array_or_item
data = np.asarray(data)
^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 510, in __array__
return np.asarray(self.get_duck_array(), dtype=dtype, copy=copy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/dask/array/core.py", line 1748, in __array__
x = self.compute()
^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/dask/base.py", line 370, in compute
(result,) = compute(self, traverse=False, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/dask/base.py", line 656, in compute
results = schedule(dsk, keys, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 574, in __array__
return np.asarray(self.get_duck_array(), dtype=dtype, copy=copy)
^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 579, in get_duck_array
return self.array.get_duck_array()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 790, in get_duck_array
return self.array.get_duck_array()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 660, in get_duck_array
array = array.get_duck_array()
^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/coding/common.py", line 76, in get_duck_array
return self.func(self.array.get_duck_array())
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 653, in get_duck_array
array = self.array[self.key]
~~~~~~~~~~^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/backends/zarr.py", line 223, in __getitem__
return indexing.explicit_indexing_adapter(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 1014, in explicit_indexing_adapter
result = raw_indexing_method(raw_key.tuple)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/backends/zarr.py", line 213, in _getitem
return self._array[key]
~~~~~~~~~~~^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/array.py", line 2425, in __getitem__
return self.get_orthogonal_selection(pure_selection, fields=fields)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/_compat.py", line 43, in inner_f
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/array.py", line 2867, in get_orthogonal_selection
return sync(
^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/sync.py", line 163, in sync
raise return_result
File "/workenv/lib/python3.12/site-packages/zarr/core/sync.py", line 119, in _runner
return await coro
^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/array.py", line 1287, in _get_selection
await self.codec_pipeline.read(
File "/workenv/lib/python3.12/site-packages/zarr/core/codec_pipeline.py", line 464, in read
await concurrent_map(
File "/workenv/lib/python3.12/site-packages/zarr/core/common.py", line 68, in concurrent_map
return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/common.py", line 66, in run
return await func(*item)
^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/codec_pipeline.py", line 265, in read_batch
chunk_bytes_batch = await concurrent_map(
^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/common.py", line 68, in concurrent_map
return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/common.py", line 66, in run
return await func(*item)
^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/storage/_common.py", line 124, in get
return await self.store.get(self.path, prototype=prototype, byte_range=byte_range)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/storage/_fsspec.py", line 230, in get
value = prototype.buffer.from_bytes(await self.fs._cat_file(path))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/fsspec/implementations/reference.py", line 829, in _cat_file
raise ReferenceNotReachable(path, part_or_url) from e
fsspec.implementations.reference.ReferenceNotReachable: Reference "uwnd/46695.0.0" failed to fetch target s3://podaac-ops-cumulus-protected/CCMP_WINDS_10M6HR_L4_V3.1/CCMP_Wind_Analysis_20241231_V03.1_L4.nc
```
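
The last line of the trace shows the chunk reference resolving to an `s3://` object in a protected bucket, so whatever credentials are supplied have to reach the filesystem used for those targets, not just the one used to fetch the JSON. A quick way to check which protocols a reference file points at is to scan it directly. This is a minimal sketch, assuming the kerchunk version 1 JSON layout (a top-level `"refs"` mapping whose chunk entries are `[url, offset, length]` lists) and no URL templates; the helper name `target_protocols`, and the offset and length in the sample, are made up for illustration.

```python
from urllib.parse import urlparse


def target_protocols(reference: dict) -> set:
    """Collect the URL schemes of all chunk targets in a kerchunk reference dict."""
    # Version 1 files nest the references under "refs"; fall back to the
    # top-level mapping for the older flat layout.
    refs = reference.get("refs", reference)
    protocols = set()
    for value in refs.values():
        # Chunk references are lists starting with a URL; inline metadata
        # (e.g. ".zgroup") is stored as a plain string and skipped here.
        if isinstance(value, list) and value and isinstance(value[0], str):
            protocols.add(urlparse(value[0]).scheme or "file")
    return protocols


# Illustrative sample: the target URL is the one from the trace above,
# the byte offset and length are invented.
sample = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "uwnd/46695.0.0": [
            "s3://podaac-ops-cumulus-protected/CCMP_WINDS_10M6HR_L4_V3.1/"
            "CCMP_Wind_Analysis_20241231_V03.1_L4.nc",
            1000,
            2000,
        ],
    },
}
print(target_protocols(sample))
```

For the real file, the same function could be applied to the parsed JSON downloaded from the reference URL.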


Re: examples/instructs for virtual Zarr stores

by PODAAC - deanh808 » Mon May 12, 2025 6:34 pm America/New_York

Hi,

Dean here from PO.DAAC. A few thoughts:

1. This is the function I usually use for opening a dataset with a kerchunk virtual reference file; give it a try. I think it has a few more steps and kwargs than what you tried.

```python
import earthaccess
import fsspec
import xarray as xr

def opendf_withref(ref, fs_data):
    """
    "ref" is the path to a reference file or object. "fs_data" is a filesystem with credentials to
    access the actual data files.
    """
    # Point the reference filesystem at the JSON and pass the S3 credentials for the data files.
    storage_opts = {"fo": ref, "remote_protocol": "s3", "remote_options": fs_data.storage_options}
    fs_ref = fsspec.filesystem("reference", **storage_opts)
    # Expose the references as a mapping that the zarr engine can read.
    m = fs_ref.get_mapper("")
    data = xr.open_dataset(
        m, engine="zarr", chunks={},
        backend_kwargs={"consolidated": False}
    )
    return data
```

So in your case you would call:

```python
opendf_withref(u, fs)
```

2. You can also try out this PR we're working on for earthaccess, which has the functionality built in, e.g.:

```
pip install git+https://github.com/DeanHenze/earthaccess.git
```

Then:

```python
import earthaccess
import xarray as xr

earthaccess.login()
mapper_ccmp = earthaccess.get_virtual_reference(short_name="CCMP_WINDS_10M6HR_L4_V3.1", format="json")
ds_ccmp = xr.open_dataset(mapper_ccmp, engine="zarr", chunks={}, backend_kwargs={"consolidated": False})
```

Let me know if either of those options works.

Thanks,

- Dean
