
Earthaccess Download timeouts

Posted: Wed Oct 30, 2024 9:43 pm America/New_York
by zachary.fasnacht
I'm running the following code to download PACE L1b with earthaccess. Right now it's just hanging and not downloading any data. Any idea what might be happening?

import earthaccess

min_lon = -130; max_lon = -100; min_lat = 20; max_lat = 60
earthaccess.login(persist=True)

for day in range(1,31):
    print('DAY: ',day)

    start_date = '2024-05-'+str(day).zfill(2)+' 00:00:00'
    end_date = '2024-05-'+str(day).zfill(2)+' 23:59:00'

    results = earthaccess.search_data(short_name = 'PACE_OCI_L1B_SCI',cloud_hosted=True,temporal=(start_date,end_date),count=50,bounding_box=(min_lon,min_lat,max_lon,max_lat),version='2')
    earthaccess.download(results,'')

Re: Earthaccess Download timeouts

Posted: Wed Oct 30, 2024 9:46 pm America/New_York
by zachary.fasnacht
And now it's timing out...

Error while downloading the file PACE_OCI.20240501T183631.L1B.V2.nc
Traceback (most recent call last):
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connection.py", line 198, in _new_conn
sock = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connectionpool.py", line 793, in urlopen
response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connectionpool.py", line 491, in _make_request
raise new_e
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
conn.connect()
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connection.py", line 616, in connect
self.sock = sock = self._new_conn()
^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connection.py", line 207, in _new_conn
raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7fa2b80dcad0>, 'Connection to obdaac-tea.earthdatacloud.nasa.gov timed out. (connect timeout=None)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connectionpool.py", line 847, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='obdaac-tea.earthdatacloud.nasa.gov', port=443): Max retries exceeded with url: /ob-cumulus-prod-public/PACE_OCI.20240501T183631.L1B.V2.nc (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fa2b80dcad0>, 'Connection to obdaac-tea.earthdatacloud.nasa.gov timed out. (connect timeout=None)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/earthaccess/store.py", line 602, in _download_file
with session.get(
^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/requests/adapters.py", line 688, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='obdaac-tea.earthdatacloud.nasa.gov', port=443): Max retries exceeded with url: /ob-cumulus-prod-public/PACE_OCI.20240501T183631.L1B.V2.nc (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fa2b80dcad0>, 'Connection to obdaac-tea.earthdatacloud.nasa.gov timed out. (connect timeout=None)'))

Re: Earthaccess Download timeouts

Posted: Thu Oct 31, 2024 9:35 am America/New_York
by dschuck
I believe this is a transient issue. I was able to download a file in that range (I did not attempt to download all of the files that your code does).

I suggest you simply try again. If you still get a timeout, let us know.

Re: Earthaccess Download timeouts

Posted: Thu Oct 31, 2024 11:46 am America/New_York
by dschuck
For reference, I posted a workaround in a comment for the earthaccess issue related to this problem: https://github.com/nsidc/earthaccess/issues/600#issuecomment-2450210273

Re: Earthaccess Download timeouts

Posted: Fri Nov 01, 2024 7:43 pm America/New_York
by zachary.fasnacht
Thanks for the information. You are correct, the problem is random and it occurs when downloading multiple files. I've implemented your fix and tried to download a spatial subset of PACE files for a 30-day period. Your code snippet is working for now: it adds a retry so that the needed files are eventually downloaded. Long term, though, I'm concerned about the practicality of downloading data from Earthdata for operational processing.
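For readers who cannot follow the link, here is a minimal sketch of the general retry idea described above. It is not the snippet from the GitHub comment. The helper name download_with_retry and its parameters are hypothetical, and it assumes the download call raises an exception when a file fails.

```python
import time


def download_with_retry(download, granules, max_attempts=5, base_delay=2.0,
                        sleep=time.sleep):
    """Call download(granules) until it succeeds or attempts run out.

    Hypothetical helper for illustration only; it is not part of earthaccess.
    The wait doubles after each failure (2 s, 4 s, 8 s, ... by default).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return download(granules)
        except Exception as err:
            # Out of attempts: let the caller see the last error.
            if attempt == max_attempts:
                raise
            wait = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({err!r}); retrying in {wait:.0f} s")
            sleep(wait)
```

In the loop from the first post, this could replace the direct download call, for example download_with_retry(lambda g: earthaccess.download(g, ''), results). Catching every exception keeps the sketch short; narrowing it to connection errors would avoid retrying on problems that a retry cannot fix.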

Re: Earthaccess Download timeouts

Posted: Sat Nov 02, 2024 10:11 am America/New_York
by dschuck
I'm glad the workaround does the trick.

However, if you wouldn't mind sharing, is there a reason you need to fully download these massive files instead of using a library to read only the parts of the files your processing requires? In general, we want to discourage such downloads, so I'd like to better understand your use case to see if we can offer advice on how to avoid them.
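As a rough illustration of reading directly instead of downloading, the sketch below opens a single group of one granule as a remote file. It assumes that earthaccess.open and xarray (with an engine that accepts file-like objects, such as h5netcdf) are available. The function name open_group_lazily is hypothetical, and the default group name is an assumption about the PACE OCI L1B file layout that should be checked against a real file.

```python
def open_group_lazily(granules, group="observation_data"):
    """Open one group of the first granule without downloading the whole file.

    Hypothetical helper for illustration. The default group name is an
    assumption about the PACE OCI L1B layout, not something confirmed here.
    """
    # Imported inside the function so the sketch loads even where the
    # libraries are not installed.
    import earthaccess
    import xarray as xr

    # earthaccess.open returns file-like objects that are read over HTTPS.
    files = earthaccess.open(granules[:1])
    # xarray loads variables lazily, so data should only be fetched when a
    # variable is actually read.
    return xr.open_dataset(files[0], group=group)
```

With the results list from the first post, open_group_lazily(results) would return an xarray Dataset for the first granule. Selecting a variable and a spatial slice from it before loading values is what keeps the transfer small.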

Re: Earthaccess Download timeouts

Posted: Mon Dec 16, 2024 6:18 pm America/New_York
by NSIDC-EDL - mbeig
For those who may be interested, some discussion of this issue continued in the GitHub Issue: https://github.com/nsidc/earthaccess/issues/600#issuecomment-2450210273