Use this Forum to find information on, or ask a question about, NASA Earth Science data.
by OB.DAACx - SeanBailey » Wed Jan 29, 2020 2:22 pm America/New_York
It's not impossible. It just seemed prudent to keep crawlers out of APIs that require some a priori knowledge to be meaningful.
I'll make the request of the network gurus that the deny entry be removed for file_search.
That said, if you are writing something that is intended to get data from multiple DAACs, then you probably do want to target CMR for consistency.
It is the reason CMR exists, after all.
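If it helps, here is a minimal sketch of the kind of CMR granule query I mean, against the public search endpoint (the short_name and temporal window below are placeholders, not a recommendation for any particular collection):

import requests

# CMR granule search endpoint (public; no authentication needed for discovery)
CMR_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"

params = {
    "short_name": "MODISA_L2_OC",  # placeholder collection short name
    "temporal": "2020-01-01T00:00:00Z,2020-01-02T00:00:00Z",  # placeholder window
    "page_size": 10,
}

resp = requests.get(CMR_URL, params=params)
resp.raise_for_status()

# The Atom-style JSON response lists matching granules under feed/entry
for entry in resp.json()["feed"]["entry"]:
    print(entry["title"])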
Sean
Edit: ask and ye shall receive...
> Done. Googlebot IPs should now be able to see indexed pages and the api/file_search
by earthengine_urs » Wed Jan 29, 2020 4:56 pm America/New_York
Thank you, these URLs now work. Last question: if a file has different creation and modification times, which of them shows up in the cdate field? (Hopefully the modification time?)
by OB.DAACx - SeanBailey » Wed Jan 29, 2020 5:35 pm America/New_York
Short answer: it is the time you should care about.

Long answer: No, it's not "modification time", because in our database world, modification time is set for actions that have nothing to do with the file being modified :grin: It's not the mtime in the sense of a filesystem mtime; it's more akin to ctime. If the *file* is modified (i.e. contents changed), then the creation time is changed, because the only time we modify files is when we create (or recreate) them. So, the time reported in the cdate field is the time you should care about... which is why it's the time we report.

Sean
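As a rough filesystem analogy for the mtime/ctime distinction drawn above (illustrative only; the DAAC database is not a filesystem):

import os
import time

# Create a file, then change only its metadata: on Unix this bumps
# ctime (inode change time) while mtime (content change time) stays put.
with open("demo.txt", "w") as f:
    f.write("original contents\n")

time.sleep(1)
os.chmod("demo.txt", 0o600)  # metadata-only change

st = os.stat("demo.txt")
print("mtime:", time.ctime(st.st_mtime))  # last content modification
print("ctime:", time.ctime(st.st_ctime))  # last metadata change (Unix)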
by earthengine_urs » Fri Apr 03, 2020 1:29 pm America/New_York
Sean, I am able to read the file listing from NASA CMR, but the lack of the file size in the HEAD request is still a problem. Our generic downloading code looks at the expected file size to make sure the download was not interrupted. I can turn this check off or propagate the known file size from elsewhere, but the Content-Length header is fairly standard, so I was hoping your server could be configured to send it.
Thanks, Simon
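For context, the check looks roughly like this sketch (names are illustrative, not our actual code), which is why a missing Content-Length forces a special case:

import requests

def download_with_size_check(url: str, dest: str) -> None:
    """Stream url to dest and verify the byte count against the server's
    Content-Length header, if one was sent. Illustrative sketch only."""
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        expected = resp.headers.get("Content-Length")
        written = 0
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)
                written += len(chunk)
    if expected is not None and written != int(expected):
        raise IOError(f"truncated download: {written} of {expected} bytes")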
by OB.DAACx - SeanBailey » Wed Apr 15, 2020 9:14 am America/New_York
Simon,
I'm not sure what your issue is... our server *does* report the Content-Length, e.g.:
$ curl --head -b ~/.urs_cookies -c ~/.urs_cookies -L -n https://oceandata.sci.gsfc.nasa.gov/ob/getfile/T2017004001500.L1A_LAC.bz2
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 15 Apr 2020 13:12:31 GMT
Content-Type: application/octet-stream
Content-Length: 45004319
Connection: keep-alive
Keep-Alive: timeout=60
Set-Cookie: app-obdaac=ede629ca49fc9f69786b7b0801846946112c18bb; path=/; secure
Last-Modified: Wed, 04 Jan 2017 08:05:27 GMT
Content-Disposition: attachment; filename=T2017004001500.L1A_LAC.bz2
X-Username: <it's me!>
Referrer-Policy: no-referrer
Expect-CT: max-age=31536000, enforce
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Content-Security-Policy: upgrade-insecure-requests; default-src 'self' oceancolor.gsfc.nasa.gov data:; script-src 'self' 'unsafe-inline' 'unsafe-eval' www.google-analytics.com www.googletagmanager.com cdn.earthdata.nasa.gov dap.digitalgov.gov data:; style-src 'self' 'unsafe-inline' code.jquery.com cdn.earthdata.nasa.gov; img-src 'self' data: oceancolor.gsfc.nasa.gov www.google-analytics.com cdn.earthdata.nasa.gov
by gnwiii » Wed Apr 15, 2020 11:02 am America/New_York
From Fedora 31 (also tried Linux Mint 19) I get "400 Bad Request":
% curl --version
curl 7.66.0 (x86_64-redhat-linux-gnu) libcurl/7.66.0 OpenSSL/1.1.1d-fips zlib/1.2.11 brotli/1.0.7 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.2.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0
Release-Date: 2019-09-11
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS brotli GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz Metalink NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP UnixSockets

% curl --head -b ~/.urs_cookies -c ~/.urs_cookies -L -n https://oceandata.sci.gsfc.nasa.gov/ob/getfile/T2017004001500.L1A_LAC.bz2
HTTP/2 302
server: nginx
date: Wed, 15 Apr 2020 14:49:01 GMT
___location: https://urs.earthdata.nasa.gov/oauth/authorize?response_type=code&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&client_id=Z0u-MdLNypXBjiDREZ3roA
expires: Mon, 01 Jan 1970 00:00:00 GMT
cache-control: no-cache, must-revalidate, max-age=0, no-store
pragma: no-cache
set-cookie: app-obdaac=2237029341fdafdbf55ebfcb015dbc5632d67b75; path=/; secure
referrer-policy: no-referrer
expect-ct: max-age=31536000, enforce
strict-transport-security: max-age=31536000; includeSubDomains; preload
content-security-policy: upgrade-insecure-requests; default-src 'self' oceancolor.gsfc.nasa.gov data:; script-src 'self' 'unsafe-inline' 'unsafe-eval' www.google-analytics.com www.googletagmanager.com cdn.earthdata.nasa.gov dap.digitalgov.gov data:; style-src 'self' 'unsafe-inline' code.jquery.com cdn.earthdata.nasa.gov; img-src 'self' data: oceancolor.gsfc.nasa.gov www.google-analytics.com cdn.earthdata.nasa.gov

HTTP/1.1 400 Bad Request
Server: nginx/1.17.5
Date: Wed, 15 Apr 2020 14:49:01 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-Download-Options: noopen
X-Permitted-Cross-Domain-Policies: none
Referrer-Policy: strict-origin-when-cross-origin
Access-Control-Allow-Origin: null
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, POST
Access-Control-Expose-Headers: true
Cache-Control: no-cache
Set-Cookie: urs_user_already_logged=yes; ___domain=earthdata.nasa.gov; path=/; expires=Thu, 16 Apr 2020 14:49:01 -0000
Set-Cookie: _urs-gui_session=266848e6e89b82c5c0cb2873323ecc62; path=/; expires=Thu, 16 Apr 2020 14:49:01 -0000; HttpOnly
X-Request-Id: 405584a6-f946-44a5-bc97-9cac8b84fc0c
X-Runtime: 0.191130
Strict-Transport-Security: max-age=31536000
Using your Python script works:
% obdaac_download.py T2017004001500.L1A_LAC.bz2
% bzip2 -t T2017004001500.L1A_LAC.bz2
% [no news is good news]
It seems curl wants an entry for oceandata.sci.gsfc.nasa.gov in .netrc.
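For anyone hitting the same thing, the two .netrc entries look something like this (usernames and passwords are placeholders):

machine urs.earthdata.nasa.gov login YOUR_USERNAME password YOUR_PASSWORD
machine oceandata.sci.gsfc.nasa.gov login YOUR_USERNAME password YOUR_PASSWORD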
$ vi .netrc
$ curl --head -b ~/.urs_cookies -c ~/.urs_cookies -L -n https://oceandata.sci.gsfc.nasa.gov/ob/getfile/T2017004001500.L1A_LAC.bz2
HTTP/2 200
server: nginx
date: Wed, 15 Apr 2020 15:23:29 GMT
content-type: application/octet-stream
content-length: 45004319
set-cookie: app-obdaac=b1c9efa0e92c905814033b978d5faf9025567914; path=/; secure
last-modified: Wed, 04 Jan 2017 08:05:27 GMT
content-disposition: attachment; filename=T2017004001500.L1A_LAC.bz2
x-username: <...>
referrer-policy: no-referrer
expect-ct: max-age=31536000, enforce
strict-transport-security: max-age=31536000; includeSubDomains; preload
content-security-policy: upgrade-insecure-requests; default-src 'self' oceancolor.gsfc.nasa.gov data:; script-src 'self' 'unsafe-inline' 'unsafe-eval' www.google-analytics.com www.googletagmanager.com cdn.earthdata.nasa.gov dap.digitalgov.gov data:; style-src 'self' 'unsafe-inline' code.jquery.com cdn.earthdata.nasa.gov; img-src 'self' data: oceancolor.gsfc.nasa.gov www.google-analytics.com cdn.earthdata.nasa.gov
by OB.DAACx - SeanBailey » Wed Apr 15, 2020 11:17 am America/New_York
Odd, but my point that the server does report the Content-Length is still valid, since it does :razz:
As for cURL not working for you, what if you try -i instead of --head? e.g.:
$ curl -i -b ~/.urs_cookies -c ~/.urs_cookies -L -n https://oceandata.sci.gsfc.nasa.gov/ob/getfile/T2017004001500.L1A_LAC.bz2
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 15 Apr 2020 15:13:44 GMT
Content-Type: application/octet-stream
Content-Length: 45004319
Connection: keep-alive
Keep-Alive: timeout=60
Set-Cookie: app-obdaac=f706ab6ee85bbc3507b775433974d8527134e1dd; path=/; secure
Last-Modified: Wed, 04 Jan 2017 08:05:27 GMT
Content-Disposition: attachment; filename=T2017004001500.L1A_LAC.bz2
X-Username: <it's me!>
Referrer-Policy: no-referrer
Expect-CT: max-age=31536000, enforce
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Content-Security-Policy: upgrade-insecure-requests; default-src 'self' oceancolor.gsfc.nasa.gov data:; script-src 'self' 'unsafe-inline' 'unsafe-eval' www.google-analytics.com www.googletagmanager.com cdn.earthdata.nasa.gov dap.digitalgov.gov data:; style-src 'self' 'unsafe-inline' code.jquery.com cdn.earthdata.nasa.gov; img-src 'self' data: oceancolor.gsfc.nasa.gov www.google-analytics.com cdn.earthdata.nasa.gov
by gnwiii » Wed Apr 15, 2020 2:36 pm America/New_York
curl -i works without the second entry in ~/.netrc.