I'm exploring the virtual JSON kerchunk stores, and I can't figure out how to authorize data reads. I can connect to a remote store and see the dataset metadata fine:
```python
import xarray
u = "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-docs/ccmp/open/L4_V3.1/docs/CCMP_WINDS_10M6HR_L4_V3.1_combined-ref.json"
ds = xarray.open_dataset(u, engine="kerchunk")
print(ds)
# <xarray.Dataset> Size: 775GB
# Dimensions:    (time: 46696, latitude: 720, longitude: 1440)
# Coordinates:
#   * latitude   (latitude) float32 3kB -89.88 -89.62 -89.38 ... 89.38 89.62 89.88
#   * longitude  (longitude) float32 6kB 0.125 0.375 0.625 ... 359.4 359.6 359.9
#   * time       (time) datetime64[ns] 374kB 1993-01-02 ... 2024-12-31T18:00:00
# Data variables:
#     ws         (time, latitude, longitude) float32 194GB ...
#     nobs       (time, latitude, longitude) float32 194GB ...
#     uwnd       (time, latitude, longitude) float32 194GB ...
#     vwnd       (time, latitude, longitude) float32 194GB ...
```
But at read time there's no authorization for the actual referenced files (full traceback below). Are there any examples or documentation for these new JSON files? I understand they're experimental, and I'm able to dig into this deeply. I thought the following would work, but no luck:
```python
import earthaccess
earthaccess.login()
fs = earthaccess.get_s3_filesystem(daac="podaac")
so = {"remote_options": fs.storage_options}
ds = xarray.open_dataset(u, engine="kerchunk", storage_options=so)
```
FWIW, it works in (very) recent GDAL, where I just use environment variables for the Earthdata credentials:
```python
## use GDAL_HTTP_HEADER_FILE or GDAL_HTTP_HEADERS for earthdata auth
from osgeo import gdal
gdal.UseExceptions()
gdal.OpenEx("ZARR:https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-docs/ccmp/open/L4_V3.1/docs/CCMP_WINDS_10M6HR_L4_V3.1_combined-ref.json", gdal.OF_MULTIDIM_RASTER)
ds = gdal.OpenEx("ZARR:\"/vsicurl/https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-docs/ccmp/open/L4_V3.1/docs/CCMP_WINDS_10M6HR_L4_V3.1_combined-ref.json\"", gdal.OF_MULTIDIM_RASTER)
[dim.GetSize() for dim in ds.GetRootGroup().OpenMDArray("uwnd").GetDimensions()]
#[46696, 720, 1440]
slc = ds.GetRootGroup().OpenMDArray("uwnd").GetView("[1,:,:]")
buf = slc.Read()  # raw bytes of the selected time slice (avoid shadowing the builtin `bytes`)
buf[0:10]
#bytearray(b'\x00<\x1c\xc6\x00<\x1c\xc6\x00<')
```
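For reference, the header-file route mentioned in the comment above can be set up from Python before opening the dataset. A minimal sketch, assuming you already have an Earthdata Login bearer token (the `EARTHDATA_TOKEN` variable name in the usage note is hypothetical). `GDAL_HTTP_HEADER_FILE` is a GDAL configuration option, and GDAL falls back to environment variables for config options, so setting it in `os.environ` before the first `gdal.OpenEx` call should be enough:

```python
import os
import tempfile


def write_gdal_header_file(token, path=None):
    """Write an HTTP header file for GDAL and point GDAL_HTTP_HEADER_FILE at it.

    `token` is assumed to be an Earthdata Login bearer token.
    """
    if path is None:
        # mkstemp creates the file readable only by the current user
        fd, path = tempfile.mkstemp(suffix=".txt")
        os.close(fd)
    with open(path, "w") as f:
        # one "Name: value" header per line
        f.write(f"Authorization: Bearer {token}\n")
    # GDAL reads this config option from the environment if it isn't set via gdal.SetConfigOption
    os.environ["GDAL_HTTP_HEADER_FILE"] = path
    return path
```

Usage would be something like `write_gdal_header_file(os.environ["EARTHDATA_TOKEN"])` followed by the same `gdal.OpenEx("ZARR:...", gdal.OF_MULTIDIM_RASTER)` call as above.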
Thanks!
```
ds.isel(time = -1).sel(longitude = 100, method = "nearest").uwnd.values
Traceback (most recent call last):
File "/workenv/lib/python3.12/site-packages/fsspec/implementations/reference.py", line 825, in _cat_file
return await self.fss[protocol]._cat_file(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 1160, in _cat_file
return await _error_wrapper(_call_and_read, retries=self.retries)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 146, in _error_wrapper
raise err
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 114, in _error_wrapper
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 1147, in _call_and_read
resp = await self._call_s3(
^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 371, in _call_s3
return await _error_wrapper(
^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 146, in _error_wrapper
raise err
File "/workenv/lib/python3.12/site-packages/s3fs/core.py", line 114, in _error_wrapper
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/client.py", line 394, in _make_api_call
http, parsed_response = await self._make_request(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/client.py", line 420, in _make_request
return await self._endpoint.make_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/endpoint.py", line 86, in _send_request
request = await self.create_request(request_dict, operation_model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/endpoint.py", line 74, in create_request
await self._event_emitter.emit(
File "/workenv/lib/python3.12/site-packages/aiobotocore/hooks.py", line 68, in _emit
response = await resolve_awaitable(handler(**kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/_helpers.py", line 6, in resolve_awaitable
return await obj
^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/signers.py", line 24, in handler
return await self.sign(operation_name, request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/aiobotocore/signers.py", line 90, in sign
auth.add_auth(request)
File "/workenv/lib/python3.12/site-packages/botocore/auth.py", line 424, in add_auth
raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/workenv/lib/python3.12/site-packages/xarray/core/dataarray.py", line 823, in values
return self.variable.values
^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/variable.py", line 508, in values
return _as_array_or_item(self._data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/variable.py", line 302, in _as_array_or_item
data = np.asarray(data)
^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 510, in __array__
return np.asarray(self.get_duck_array(), dtype=dtype, copy=copy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/dask/array/core.py", line 1748, in __array__
x = self.compute()
^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/dask/base.py", line 370, in compute
(result,) = compute(self, traverse=False, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/dask/base.py", line 656, in compute
results = schedule(dsk, keys, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 574, in __array__
return np.asarray(self.get_duck_array(), dtype=dtype, copy=copy)
^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 579, in get_duck_array
return self.array.get_duck_array()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 790, in get_duck_array
return self.array.get_duck_array()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 660, in get_duck_array
array = array.get_duck_array()
^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/coding/common.py", line 76, in get_duck_array
return self.func(self.array.get_duck_array())
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 653, in get_duck_array
array = self.array[self.key]
~~~~~~~~~~^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/backends/zarr.py", line 223, in __getitem__
return indexing.explicit_indexing_adapter(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/core/indexing.py", line 1014, in explicit_indexing_adapter
result = raw_indexing_method(raw_key.tuple)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/xarray/backends/zarr.py", line 213, in _getitem
return self._array[key]
~~~~~~~~~~~^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/array.py", line 2425, in __getitem__
return self.get_orthogonal_selection(pure_selection, fields=fields)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/_compat.py", line 43, in inner_f
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/array.py", line 2867, in get_orthogonal_selection
return sync(
^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/sync.py", line 163, in sync
raise return_result
File "/workenv/lib/python3.12/site-packages/zarr/core/sync.py", line 119, in _runner
return await coro
^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/array.py", line 1287, in _get_selection
await self.codec_pipeline.read(
File "/workenv/lib/python3.12/site-packages/zarr/core/codec_pipeline.py", line 464, in read
await concurrent_map(
File "/workenv/lib/python3.12/site-packages/zarr/core/common.py", line 68, in concurrent_map
return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/common.py", line 66, in run
return await func(*item)
^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/codec_pipeline.py", line 265, in read_batch
chunk_bytes_batch = await concurrent_map(
^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/common.py", line 68, in concurrent_map
return await asyncio.gather(*[asyncio.ensure_future(run(item)) for item in items])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/core/common.py", line 66, in run
return await func(*item)
^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/storage/_common.py", line 124, in get
return await self.store.get(self.path, prototype=prototype, byte_range=byte_range)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/zarr/storage/_fsspec.py", line 230, in get
value = prototype.buffer.from_bytes(await self.fs._cat_file(path))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workenv/lib/python3.12/site-packages/fsspec/implementations/reference.py", line 829, in _cat_file
raise ReferenceNotReachable(path, part_or_url) from e
fsspec.implementations.reference.ReferenceNotReachable: Reference "uwnd/46695.0.0" failed to fetch target s3://podaac-ops-cumulus-protected/CCMP_WINDS_10M6HR_L4_V3.1/CCMP_Wind_Analysis_20241231_V03.1_L4.nc
```
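The last line of the traceback is the informative one: zarr asked the reference filesystem for chunk `uwnd/46695.0.0`, and the JSON maps that key to a byte range inside a NetCDF file in the protected bucket `s3://podaac-ops-cumulus-protected/`. Metadata keys such as `.zarray` are stored inline in the JSON, which would explain why opening the dataset works without any credentials. A minimal sketch of that lookup, assuming the kerchunk version-1 layout (`{"version": 1, "refs": {...}}`) and a chunk shape of `(1, 720, 1440)`, which is consistent with the key in the traceback but not confirmed; the offset and length values are placeholders:

```python
import json

# Hypothetical excerpt of a kerchunk version-1 reference file.
# The offset/length numbers are placeholders, not the real values.
REFS = {
    "version": 1,
    "refs": {
        # metadata is inline JSON text (trimmed to the fields used here): no remote read needed
        "uwnd/.zarray": json.dumps(
            {"shape": [46696, 720, 1440], "chunks": [1, 720, 1440], "dtype": "<f4"}
        ),
        # data chunks are [url, offset, length] byte ranges in the original NetCDF files
        "uwnd/46695.0.0": [
            "s3://podaac-ops-cumulus-protected/CCMP_WINDS_10M6HR_L4_V3.1/"
            "CCMP_Wind_Analysis_20241231_V03.1_L4.nc",
            1000,
            2000,
        ],
    },
}


def chunk_key(var, index, chunks):
    """Zarr v2 chunk key (default "." separator) for an element index."""
    return var + "/" + ".".join(str(i // c) for i, c in zip(index, chunks))


def resolve(refs, key):
    """Return ("inline", text) or ("remote", url, offset, length) for one reference key."""
    target = refs["refs"][key]
    if isinstance(target, str):
        return ("inline", target)
    if len(target) == 1:  # whole-file reference: [url]
        return ("remote", target[0], None, None)
    url, offset, length = target
    return ("remote", url, offset, length)


meta = json.loads(resolve(REFS, "uwnd/.zarray")[1])
# ds.isel(time=-1) is time index 46695 (= 46696 - 1); lat/lon indices are arbitrary here
key = chunk_key("uwnd", (46695, 360, 400), meta["chunks"])
kind, url, offset, length = resolve(REFS, key)
print(key, kind, url)
```

So every chunk read goes through s3fs against that bucket, which means the credentials have to reach the reference filesystem's `remote_options` rather than just the HTTPS request for the JSON.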
examples/instructs for virtual Zarr stores
Hi,
Dean here from PO.DAAC. A few thoughts:
1. This is the function I usually use for opening a dataset with a kerchunk virtual reference file; give it a try. I think it has a few more steps and kwargs than what you tried.
```python
import earthaccess
import fsspec
import xarray as xr

def opendf_withref(ref, fs_data):
    """
    "ref" is the path to a reference file or object. "fs_data" is a filesystem with credentials to
    access the actual data files.
    """
    storage_opts = {"fo": ref, "remote_protocol": "s3", "remote_options": fs_data.storage_options}
    fs_ref = fsspec.filesystem('reference', **storage_opts)
    m = fs_ref.get_mapper('')
    data = xr.open_dataset(
        m, engine="zarr", chunks={},
        backend_kwargs={"consolidated": False}
    )
    return data
```
So in your case you would run:
```python
ds = opendf_withref(u, fs)
```
2. You can also try out this PR we're working on for earthaccess, which has the functionality built in. Install it with:
```
pip install git+https://github.com/DeanHenze/earthaccess.git
```
Then:
```python
import earthaccess
import xarray as xr

earthaccess.login()
mapper_ccmp = earthaccess.get_virtual_reference(short_name="CCMP_WINDS_10M6HR_L4_V3.1", format="json")
ds_ccmp = xr.open_dataset(mapper_ccmp, engine="zarr", chunks={}, backend_kwargs={"consolidated": False})
```
Let me know if either of those options works.
Thanks,
- Dean
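The key part of `opendf_withref` is the option dict handed to fsspec's `reference` filesystem: `fo` is the JSON, `remote_protocol` says the chunk targets are `s3://` URLs, and `remote_options` carries the credentials s3fs uses for each chunk read. If you would rather build that dict from raw temporary credentials, for example the dict returned by `earthaccess.get_s3_credentials(daac="podaac")` (assumed here to use the `accessKeyId`/`secretAccessKey`/`sessionToken` keys), a minimal sketch might look like this. As far as I know, these temporary credentials are only accepted for direct S3 access from compute in AWS `us-west-2`.

```python
def reference_storage_options(ref, creds):
    """Build kwargs for fsspec.filesystem("reference", ...) from temporary S3 credentials.

    `creds` is assumed to have the keys accessKeyId, secretAccessKey and sessionToken.
    """
    return {
        "fo": ref,                # path/URL of the kerchunk JSON
        "remote_protocol": "s3",  # chunk targets are s3:// URLs
        "remote_options": {       # passed through to s3fs.S3FileSystem
            "key": creds["accessKeyId"],
            "secret": creds["secretAccessKey"],
            "token": creds["sessionToken"],
            "anon": False,
        },
    }


# Possible usage (needs earthaccess, fsspec, s3fs, xarray and network access):
# creds = earthaccess.get_s3_credentials(daac="podaac")
# fs_ref = fsspec.filesystem("reference", **reference_storage_options(u, creds))
# ds = xr.open_dataset(fs_ref.get_mapper(""), engine="zarr", chunks={},
#                      backend_kwargs={"consolidated": False})
```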