Issues with wget authentication
-
- Posts: 80
- Joined: Mon Jan 27, 2020 10:36 am America/New_York
- Has thanked: 3 times
- Been thanked: 1 time
Issues with wget authentication
Hi,
I'm having the same problems as above - wget only works with --auth-no-challenge=on, but since this method sends the password in plaintext, it's not great. LPDAAC downloads (e.g., https://e4ftl01.cr.usgs.gov/VIIRS/VNP13A1.001/2016.08.28/VNP13A1.A2016241.h24v06.001.2018162005736.h5) also require authentication, but work with just "wget --user --password". Is it possible to configure this site in a similar way?
Thanks,
Simon
-
- Posts: 1519
- Joined: Wed Sep 18, 2019 6:15 pm America/New_York
- Been thanked: 9 times
Issues with wget authentication
Simon,
True, using --auth-no-challenge=on is not ideal, but it is going over an HTTPS connection, so it isn't as bad as it could be :eek:
You probably should use the .netrc/urs_cookie approach described on https://oceancolor.gsfc.nasa.gov/data/download_methods/ instead of the command line username/password approach.
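For reference, here is a minimal sketch of the .netrc side of that approach, written in Python so the entry can be checked with the standard `netrc` module. The host name urs.earthdata.nasa.gov (Earthdata Login), the helper names, and the placeholder credentials are assumptions for illustration; the download_methods page linked above remains the authoritative source for the exact wget/curl invocation.

```python
import netrc
import os
import tempfile

# Assumption: Earthdata Login (URS) is served from this host.
URS_HOST = "urs.earthdata.nasa.gov"


def write_netrc(path, login, password, host=URS_HOST):
    """Write a single-entry netrc file (hypothetical helper)."""
    # A newly created file gets mode 0600 so other users cannot read it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(f"machine {host} login {login} password {password}\n")


def read_credentials(path, host=URS_HOST):
    """Return (login, password) for host, or None if there is no entry."""
    entry = netrc.netrc(path).authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than the real ~/.netrc.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "netrc")
        write_netrc(demo, "USERNAME", "PASSWD")
        print(read_credentials(demo))
```

The point of the file is that the password lives in a file readable only by its owner instead of on the command line, where it can show up in shell history and process listings.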
BTW, I ran your example file through wget with the --verbose option set, and it seems that the login fails (yes, I did pass it proper credentials :wink:), but the download proceeds anyway - which suggests to me that they're not verifying the URS response. This would explain why it *works* for them but not us (we verify).
Sean
-
- Posts: 80
- Joined: Mon Jan 27, 2020 10:36 am America/New_York
- Has thanked: 3 times
- Been thanked: 1 time
Issues with wget authentication
Sean,
Thank you, but the problem is I actually can't use either wget or curl, as my binaries do not have access to the Internet. Instead, we have an internal system that proxies HTTP requests, and I don't think I'd be able to use plaintext auth there. I was hoping to use this system for HEAD requests to get file sizes - any chance you could turn off auth for HEAD, maybe? (The actual downloads go through another system, but it's too cumbersome to use for HEAD.)
The .netrc approach also might not be trivial with the internal system.
What auth failure do you see with the LP DAAC file? What about https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/2000.02.24/MOD10A1.A2000055.h34v10.006.2016061160522.hdf?
(They definitely require the correct credentials.)
-
- Posts: 1519
- Joined: Wed Sep 18, 2019 6:15 pm America/New_York
- Been thanked: 9 times
Issues with wget authentication
Simon,
Yes, upon closer inspection, it does indeed seem to require a valid login.
It also spits out a 401 amid a flurry of 302s, so that is odd... I've asked folks to dig deeper to see if we can get wget to be happy without the --auth-no-challenge=on option set.
Perhaps there is something in the way we're making the authentication request to URS...
You do not need to make a HEAD request to get a file size. In fact, it is bad form to do so (in my opinion :grin:).
The file_search API does not require authentication and can be used to retrieve information about a file.
For example:
https://oceandata.sci.gsfc.nasa.gov/api/file_search?search=V2020001020000.L1A_SNPP.nc&format=json
will return:
{"V2020001020000.L1A_SNPP.nc":{"cdate":"2020-01-03 13:34:00","checksum":"sha1:54ab2d38004208ad9612ba4581357686dc8071d0","getfile":"https:\/\/oceandata.sci.gsfc.nasa.gov\/ob\/getfile","size":386400944}}
If you are less specific in the search parameters, you'll get a JSON output with the information for all the files that match your search.
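As a rough illustration of consuming that output, the Python sketch below builds the search URL and pulls the size and checksum out of a reply. It works on the sample reply quoted above rather than making a network call; the function names are hypothetical, and the field names (`size`, `checksum`) are taken from that sample only.

```python
import json
from urllib.parse import urlencode

API_URL = "https://oceandata.sci.gsfc.nasa.gov/api/file_search"


def build_search_url(pattern):
    """Build a file_search URL that asks for JSON output."""
    return API_URL + "?" + urlencode({"search": pattern, "format": "json"})


def parse_file_info(body):
    """Map each file name in a file_search JSON reply to (size, checksum)."""
    reply = json.loads(body)
    return {name: (info["size"], info["checksum"]) for name, info in reply.items()}


# Sample reply copied from the example above.
SAMPLE = (
    '{"V2020001020000.L1A_SNPP.nc":{"cdate":"2020-01-03 13:34:00",'
    '"checksum":"sha1:54ab2d38004208ad9612ba4581357686dc8071d0",'
    '"getfile":"https:\\/\\/oceandata.sci.gsfc.nasa.gov\\/ob\\/getfile",'
    '"size":386400944}}'
)

if __name__ == "__main__":
    print(build_search_url("V2020001020000.L1A_SNPP.nc"))
    for name, (size, checksum) in parse_file_info(SAMPLE).items():
        print(name, size, checksum)
```

Because the reply is keyed by file name, the same parsing handles a broader search that matches many files.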
Regards,
Sean
-
- Posts: 80
- Joined: Mon Jan 27, 2020 10:36 am America/New_York
- Has thanked: 3 times
- Been thanked: 1 time
Issues with wget authentication
Thanks, Sean. This looks promising, but I'm hitting a weird snag. Using wget on such URLs works fine, but fetching them using our internal system returns a 403 and this:
<!DOCTYPE html><html lang="en-US"><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta name="ROBOTS" content="NOARCHIVE"><title>ERROR @ OceanColor Biology Processing Group (OBPG)</title></head><body link=#323232 vlink=#323232 alink=#323232 style="background-color:#ffffff; color:#323232; font-size:175%"><br><hr color=#323232><center><h1><b>.:. ERROR .:.</b></h1><h2>OceanColor Biology Processing Group (OBPG)</h2><blockquote>Sorry, an error has occurred. Use the back button to return to the previous page or go to the <a href="https://oceancolor.gsfc.nasa.gov">Ocean Color Home Page</a>.</blockquote><br><hr color= #323232></body></html>
Do you happen to have some IP blocks, maybe?
Thanks,
Simon
-
- Posts: 1519
- Joined: Wed Sep 18, 2019 6:15 pm America/New_York
- Been thanked: 9 times
Issues with wget authentication
It is possible to get hit by a network block, but if you're seeing the error page, you're not (yet) blocked (hit it enough and you may get blocked). More likely there is an issue with the request you're making. Without seeing exactly what you're asking for, I can't say what that issue would be.
Sean
-
- Posts: 80
- Joined: Mon Jan 27, 2020 10:36 am America/New_York
- Has thanked: 3 times
- Been thanked: 1 time
Issues with wget authentication
I'm able to repeat the request with the exact same headers without issues from my desktop, so I'm kinda stumped. Could you look in nginx's error logs to see if there are any details? The URL I tried was:
https://oceandata.sci.gsfc.nasa.gov/api/file_search?search=A2020014.L3m_DAY_CHL_chlor_a_4km.nc&format=json
BTW, is the information about file size/date exported to NASA CMR?
-
- Posts: 1519
- Joined: Wed Sep 18, 2019 6:15 pm America/New_York
- Been thanked: 9 times
Issues with wget authentication
Simon,
Would your internal system be seen as crawl-????.googlebot.com? If so, then yes, you're being denied access to our search API.
Yes, the information returned by the API for filesize, etc. should match the corresponding information we provide to CMR.
Sean
-
- Posts: 80
- Joined: Mon Jan 27, 2020 10:36 am America/New_York
- Has thanked: 3 times
- Been thanked: 1 time
Issues with wget authentication
Sean,
Yes, crawl-????.googlebot.com sounds right. If it's impossible to unblock it, I'll try CMR.
Issues with wget authentication
GNU wget's bugzilla has a new take on the use of --auth-no-challenge=on, arguing that there is no real security advantage, that the extra request has a poor cost/benefit ratio, and that curl already defaults to wget's --auth-no-challenge=on behaviour. I expect such a change of default is more likely to appear in wget2.
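To make that argument concrete: with HTTP Basic authentication, the credentials sent after a 401 challenge are the same base64-encoded user:password string that --auth-no-challenge=on sends up front, so skipping the challenge changes when the header is sent, not how well it is protected. A small Python sketch with placeholder credentials (it assumes the server uses the Basic scheme and does not apply to Digest; the helper names are made up for illustration):

```python
import base64


def basic_auth_header(user, password):
    """Return the Authorization header value for HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def decode_basic_auth(header):
    """Recover (user, password) from a Basic Authorization header value."""
    scheme, token = header.split(" ", 1)
    if scheme != "Basic":
        raise ValueError("not a Basic auth header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


if __name__ == "__main__":
    header = basic_auth_header("user", "pass")
    print(header)
    # Anyone who sees the header can reverse it; only TLS keeps it private.
    print(decode_basic_auth(header))
```

Either way, the header is only as private as the HTTPS connection carrying it.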