Earthaccess Download timeouts

zachary.fasnacht
Posts: 6
Joined: Wed Nov 14, 2018 3:45 pm America/New_York
Answers: 0

Earthaccess Download timeouts

by zachary.fasnacht » Wed Oct 30, 2024 9:43 pm America/New_York

I'm running the following code to download PACE L1B data with earthaccess. Right now it's just hanging and not downloading any data. Any idea what might be happening?

import earthaccess

min_lon = -130; max_lon = -100; min_lat = 20; max_lat = 60
earthaccess.login(persist=True)

for day in range(1,31):
    print('DAY: ',day)

    start_date = '2024-05-'+str(day).zfill(2)+' 00:00:00'
    end_date = '2024-05-'+str(day).zfill(2)+' 23:59:00'

    results = earthaccess.search_data(
        short_name='PACE_OCI_L1B_SCI',
        cloud_hosted=True,
        temporal=(start_date,end_date),
        count=50,
        bounding_box=(min_lon,min_lat,max_lon,max_lat),
        version='2',
    )
    earthaccess.download(results,'')

zachary.fasnacht
Posts: 6
Joined: Wed Nov 14, 2018 3:45 pm America/New_York
Answers: 0

Re: Earthaccess Download timeouts

by zachary.fasnacht » Wed Oct 30, 2024 9:46 pm America/New_York

And now it's timing out...

Error while downloading the file PACE_OCI.20240501T183631.L1B.V2.nc
Traceback (most recent call last):
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connection.py", line 198, in _new_conn
sock = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connectionpool.py", line 793, in urlopen
response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connectionpool.py", line 491, in _make_request
raise new_e
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1099, in _validate_conn
conn.connect()
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connection.py", line 616, in connect
self.sock = sock = self._new_conn()
^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connection.py", line 207, in _new_conn
raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7fa2b80dcad0>, 'Connection to obdaac-tea.earthdatacloud.nasa.gov timed out. (connect timeout=None)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/requests/adapters.py", line 667, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/connectionpool.py", line 847, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='obdaac-tea.earthdatacloud.nasa.gov', port=443): Max retries exceeded with url: /ob-cumulus-prod-public/PACE_OCI.20240501T183631.L1B.V2.nc (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fa2b80dcad0>, 'Connection to obdaac-tea.earthdatacloud.nasa.gov timed out. (connect timeout=None)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/earthaccess/store.py", line 602, in _download_file
with session.get(
^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/explore/nobackup/people/zfasnach/miniconda3/envs/gpu/lib/python3.11/site-packages/requests/adapters.py", line 688, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='obdaac-tea.earthdatacloud.nasa.gov', port=443): Max retries exceeded with url: /ob-cumulus-prod-public/PACE_OCI.20240501T183631.L1B.V2.nc (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fa2b80dcad0>, 'Connection to obdaac-tea.earthdatacloud.nasa.gov timed out. (connect timeout=None)'))

dschuck
Posts: 6
Joined: Thu Oct 31, 2024 9:32 am America/New_York
Answers: 0
Been thanked: 1 time

Re: Earthaccess Download timeouts

by dschuck » Thu Oct 31, 2024 9:35 am America/New_York

I believe this is a transient issue. I was able to download a file in that range (I did not attempt to download all of the files that your code does).

I suggest you simply try again. If you still get a timeout, let us know.

dschuck
Posts: 6
Joined: Thu Oct 31, 2024 9:32 am America/New_York
Answers: 0
Been thanked: 1 time

Re: Earthaccess Download timeouts

by dschuck » Thu Oct 31, 2024 11:46 am America/New_York

For reference, I posted a workaround in a comment for the earthaccess issue related to this problem: https://github.com/nsidc/earthaccess/issues/600#issuecomment-2450210273

zachary.fasnacht
Posts: 6
Joined: Wed Nov 14, 2018 3:45 pm America/New_York
Answers: 0

Re: Earthaccess Download timeouts

by zachary.fasnacht » Fri Nov 01, 2024 7:43 pm America/New_York

Thanks for the information. You are correct: the problem is random, and it occurs when downloading multiple files. I've implemented your fix and tried downloading a spatial subset of PACE files for a 30-day period. Your code snippet is working for now; it adds a retry so that the needed files are eventually downloaded. Long term, though, I'm concerned about the practicality of downloading data from Earthdata for operational processing.
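
For anyone reading along, the retry idea can be sketched roughly as below. This is a minimal illustration, not the snippet from the linked GitHub comment: the helper name retry_call and its parameters (attempts, base_delay, retry_on, sleep) are made up for this example, and it assumes that the wrapped call raises an exception when a download fails.

```python
import time


def retry_call(func, *args, attempts=5, base_delay=2.0,
               retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying with exponential backoff.

    Hypothetical helper for illustration only; it is not part of earthaccess.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on as exc:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.0f} s")
            sleep(delay)
```

In the loop from the first post, the download line would then become something like retry_call(earthaccess.download, results, ''). If the installed earthaccess version logs a failed file instead of raising, this wrapper will not notice the failure, and the returned file list would need to be compared against the search results instead.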

dschuck
Posts: 6
Joined: Thu Oct 31, 2024 9:32 am America/New_York
Answers: 0
Been thanked: 1 time

Re: Earthaccess Download timeouts

by dschuck » Sat Nov 02, 2024 10:11 am America/New_York

I'm glad the workaround does the trick.

However, if you wouldn't mind sharing, is there a reason you need to download these massive files in full, rather than using a library to read only the parts of each file that your processing requires? In general, we want to discourage such full downloads, so I'd like to better understand your use case and see whether we can suggest a way to avoid them.
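
As a rough sketch of what reading directly could look like, the snippet below opens each granule remotely and loads only one group. It makes several assumptions: xarray and h5netcdf are installed alongside earthaccess, the function name open_geolocation is made up for this example, and "geolocation_data" is the group of interest in the L1B files (check the actual file layout before relying on it).

```python
def open_geolocation(granules):
    """Yield one xarray Dataset per granule, opened remotely (no full download).

    Hypothetical helper for illustration; the group name is an assumption.
    """
    # Imported inside the function so the sketch can be defined even where
    # these third-party packages are not installed.
    import earthaccess
    import xarray as xr

    # earthaccess.open returns file-like objects that fetch bytes on demand.
    for fileobj in earthaccess.open(granules):
        # Only the requested group's metadata is read here; variable data
        # are loaded lazily when they are actually accessed.
        yield xr.open_dataset(fileobj, engine="h5netcdf", group="geolocation_data")
```

It would be used with the same search results as before, for example: for ds in open_geolocation(results): print(ds). Whether this is faster than downloading depends on how much of each file the processing actually touches.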

NSIDCx - mbeig
Posts: 22
Joined: Tue Dec 07, 2021 11:49 am America/New_York
Answers: 0

Re: Earthaccess Download timeouts

by NSIDCx - mbeig » Mon Dec 16, 2024 6:18 pm America/New_York

For those who may be interested, some discussion of this issue continued in the GitHub Issue: https://github.com/nsidc/earthaccess/issues/600#issuecomment-2450210273