Issues with wget authentication

Posted: Mon Jan 27, 2020 10:52 am America/New_York
by earthengine_urs
Hi,

I'm having the same problems as above: wget only works with --auth-no-challenge=on, but since this method sends the password in plaintext, it's not great. LPDAAC downloads (e.g., https://e4ftl01.cr.usgs.gov/VIIRS/VNP13A1.001/2016.08.28/VNP13A1.A2016241.h24v06.001.2018162005736.h5) also require authentication, but work with just "wget --user --password". Is it possible to configure this site in a similar way?

Thanks,
Simon

Posted: Mon Jan 27, 2020 3:03 pm America/New_York
by OB.DAACx - SeanBailey
Simon,

True, using --auth-no-challenge=on is not ideal, but it is going over an HTTPS connection, so it isn't as bad as it could be :eek:
You probably should use the .netrc/urs_cookie approach described on https://oceancolor.gsfc.nasa.gov/data/download_methods/ instead of the command line username/password approach.
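
For anyone scripting this rather than calling wget directly, here is a minimal Python sketch of reading a URS entry from a .netrc file with the standard-library netrc module. The host name urs.earthdata.nasa.gov, the helper name read_urs_credentials, and the USERNAME/PASSWORD placeholders are assumptions for illustration; the download_methods page linked above is the authoritative description of the approach.

```python
import netrc
import os
import tempfile

# Assumption: Earthdata Login (URS) is served from this host.
URS_HOST = "urs.earthdata.nasa.gov"


def read_urs_credentials(netrc_path=None, host=URS_HOST):
    """Return (login, password) for host, or None if the file has no entry.

    With netrc_path=None the standard library falls back to ~/.netrc.
    """
    entry = netrc.netrc(netrc_path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


# Demo against a throwaway file with placeholder credentials, so nothing
# here touches a real ~/.netrc or the network.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "netrc")
    with open(demo_path, "w") as handle:
        handle.write(f"machine {URS_HOST} login USERNAME password PASSWORD\n")
    os.chmod(demo_path, 0o600)  # keep the file private, as for a real .netrc
    demo_credentials = read_urs_credentials(demo_path)

print(demo_credentials)
```

The one-line "machine ... login ... password ..." entry written in the demo is the same format a real ~/.netrc would hold.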

BTW, I ran your example file through wget with the --verbose option set, and it seems that the login fails (yes, I did pass it proper credentials :wink:), but the download proceeds anyway - which suggests to me that they're not verifying the URS response.  This would explain why it *works* for them but not us (we verify).

Sean

Posted: Mon Jan 27, 2020 10:57 pm America/New_York
by earthengine_urs
Sean,

Thank you, but the problem is that I actually can't use either wget or curl, as my binaries do not have access to the Internet. Instead, we have an internal system that proxies HTTP requests, and I don't think I'd be able to use plaintext auth there. I was hoping to use this system for HEAD requests to get file sizes - any chance you could turn off auth for HEAD, maybe? (The actual downloads go through another system, but it's too cumbersome to use for HEAD.)

The .netrc approach also might not be trivial with the internal system.

What auth failure do you see with the LP DAAC file? What about https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/2000.02.24/MOD10A1.A2000055.h34v10.006.2016061160522.hdf?
(They definitely require the correct credentials.)

Posted: Tue Jan 28, 2020 12:46 pm America/New_York
by OB.DAACx - SeanBailey
Simon,

Yes, upon closer inspection, it does indeed seem to require a valid login.
It also spits out a 401 amid a flurry of 302s, so that is odd...I've asked folks to dig deeper to see if we can get wget to be happy without the --auth-no-challenge=on option set.
Perhaps there is something in the way we're making the authentication request to URS...

You do not need to make a HEAD request to get a file size. In fact, it is bad form to do so (in my opinion :grin:).

The file_search API does not require authentication and can be used to retrieve information about a file.
For example:
https://oceandata.sci.gsfc.nasa.gov/api/file_search?search=V2020001020000.L1A_SNPP.nc&format=json
will return:
{"V2020001020000.L1A_SNPP.nc":{"cdate":"2020-01-03 13:34:00","checksum":"sha1:54ab2d38004208ad9612ba4581357686dc8071d0","getfile":"https:\/\/oceandata.sci.gsfc.nasa.gov\/ob\/getfile","size":386400944}}

If you are less specific in the search parameters, you'll get a JSON output with the information for all the files that match your search.
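
A rough Python sketch of pulling the size and checksum out of such a reply, assuming the JSON shape shown in the example above. The endpoint and the sample reply come from this post; the helper names parse_file_search and fetch_file_info are made up for illustration, and fetch_file_info is defined but not called here because it needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example URL above.
FILE_SEARCH_URL = "https://oceandata.sci.gsfc.nasa.gov/api/file_search"


def parse_file_search(text):
    """Map each filename in a file_search JSON reply to (size, algorithm, digest)."""
    result = {}
    for name, info in json.loads(text).items():
        # The checksum field looks like "sha1:<hex digest>" in the sample reply.
        algorithm, _, digest = info["checksum"].partition(":")
        result[name] = (int(info["size"]), algorithm, digest)
    return result


def fetch_file_info(search):
    """Query the API without credentials and parse the reply.

    Assumes the reply has the same shape as the sample below.
    """
    query = urlencode({"search": search, "format": "json"})
    with urlopen(f"{FILE_SEARCH_URL}?{query}", timeout=30) as response:
        return parse_file_search(response.read().decode("utf-8"))


# Offline check against the sample reply quoted in the post.
SAMPLE = (
    '{"V2020001020000.L1A_SNPP.nc":{"cdate":"2020-01-03 13:34:00",'
    '"checksum":"sha1:54ab2d38004208ad9612ba4581357686dc8071d0",'
    '"getfile":"https:\\/\\/oceandata.sci.gsfc.nasa.gov\\/ob\\/getfile",'
    '"size":386400944}}'
)
info = parse_file_search(SAMPLE)
print(info["V2020001020000.L1A_SNPP.nc"])
```

Because the parser loops over every key in the reply, the same function should also handle a less specific search that matches several files.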

Regards,
Sean

Posted: Tue Jan 28, 2020 5:48 pm America/New_York
by earthengine_urs
Thanks, Sean. This looks promising, but I'm hitting a weird snag. Using wget on such URLs works fine, but fetching them using our internal system returns a 403 and this:

<!DOCTYPE html><html lang="en-US"><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta name="ROBOTS" content="NOARCHIVE"><title>ERROR @ OceanColor Biology Processing Group (OBPG)</title></head><body link=#323232 vlink=#323232 alink=#323232 style="background-color:#ffffff; color:#323232; font-size:175%"><br><hr color=#323232><center><h1><b>.:. ERROR .:.</b></h1><h2>OceanColor Biology Processing Group (OBPG)</h2><blockquote>Sorry, an error has occurred. Use the back button to return to the previous page or go to the <a href="https://oceancolor.gsfc.nasa.gov">Ocean Color Home Page</a>.</blockquote><br><hr color= #323232></body></html>

Do you happen to have some IP blocks, maybe?

Thanks,
Simon

Posted: Tue Jan 28, 2020 6:34 pm America/New_York
by OB.DAACx - SeanBailey
It is possible to get hit by a network block, but if you're seeing the error page, you're not (yet) blocked (hit it enough and you may get blocked).  More likely there is an issue with the request you're making.  Without seeing exactly what you're asking for, I can't say what that issue would be.

Sean

Posted: Tue Jan 28, 2020 7:54 pm America/New_York
by earthengine_urs
I'm able to repeat the request with the exact same headers without issues from my desktop, so I'm kinda stumped. Could you look in nginx's error logs to see if there are any details? The URL I tried was:

https://oceandata.sci.gsfc.nasa.gov/api/file_search?search=A2020014.L3m_DAY_CHL_chlor_a_4km.nc&format=json

BTW, is the information about file size/date exported to NASA CMR?

Posted: Wed Jan 29, 2020 12:08 pm America/New_York
by OB.DAACx - SeanBailey
Simon,

Would your internal system be seen as crawl-????.googlebot.com?  If so, then yes, you're being denied access to our search API.

Yes, the information returned by the API for filesize, etc. should match the corresponding information we provide to CMR.

Sean

Posted: Wed Jan 29, 2020 12:13 pm America/New_York
by earthengine_urs
Sean,

Yes, crawl-????.googlebot.com sounds right. If it's impossible to unblock it, I'll try CMR.

Posted: Wed Jan 29, 2020 1:19 pm America/New_York
by gnwiii
GNU wget's bugzilla has a new take on the use of --auth-no-challenge=on, arguing that there is no real security advantage, that the extra request has low cost/benefit, and that curl already defaults to wget's --auth-no-challenge=on behaviour. I expect such a change is more likely to appear in wget2.
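
To make the "plaintext" concern concrete, here is a small Python sketch of what an HTTP Basic Authorization header contains. With --auth-no-challenge, wget sends this header on the first request instead of waiting for a 401 challenge. The function names are hypothetical, and the credentials are the well-known example pair from the Basic auth RFC, not real ones.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth_header(value):
    """Recover (user, password) from a Basic header value.

    Base64 is an encoding, not encryption, so anyone who can read the
    header can do this.
    """
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(decode_basic_auth_header(header))
```

Since the header is trivially reversible, the only protection in transit is the HTTPS connection itself, which is the point Sean makes earlier in the thread.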