xz compressed, not bzip2
-
- Posts: 338
- Joined: Wed Apr 06, 2005 12:11 pm America/New_York
- Has thanked: 10 times
- Been thanked: 3 times
I have two downloaded files from 2018 that have the wrong compression type.
This is one of them (and how I 'fixed' them):
[seadas_l1a_geo_extract_h5]$ file /cms_zfs/sat_data/modis/l0/2018/070/MOD00.P2018070.0140_1.PDS.bz2
/cms_zfs/sat_data/modis/l0/2018/070/MOD00.P2018070.0140_1.PDS.bz2: xz compressed data
[070]$ mv MOD00.P2018070.0140_1.PDS.bz2 MOD00.P2018070.0140_1.PDS.xz
[070]$ xz --decompress MOD00.P2018070.0140_1.PDS.xz
[070]$ ll MOD00.P2018070.0140_1.PDS
-rw-rw-r-- 1 bmurch cms_optics 396889536 Jul 30 01:30 MOD00.P2018070.0140_1.PDS
[070]$ bzip2 MOD00.P2018070.0140_1.PDS
[070]$ file /cms_zfs/sat_data/modis/l0/2018/070/MOD00.P2018070.0140_1.PDS.bz2
/cms_zfs/sat_data/modis/l0/2018/070/MOD00.P2018070.0140_1.PDS.bz2: bzip2 compressed data, block size = 900k
-
- Subject Matter Expert
- Posts: 451
- Joined: Fri Feb 05, 2021 9:17 am America/New_York
- Been thanked: 7 times
Our data provider changed to using xz compression for their long-term storage.
Our ingest code was not expecting it when we replaced some corrupted files with new copies.
I'll fix these on the server.
Thanks,
Tommy
-
- Subject Matter Expert
- Posts: 451
- Joined: Fri Feb 05, 2021 9:17 am America/New_York
- Been thanked: 7 times
I just checked the server, and the file there has the correct xz extension: MOD00.P2018070.0140_1.PDS.xz
Is your code renaming it to bz2?
Tommy
-
- Posts: 338
- Joined: Wed Apr 06, 2005 12:11 pm America/New_York
- Has thanked: 10 times
- Been thanked: 3 times
Tommy,
I use the L1/2 browser to generate an L0 list, save it to a file, and append .bz2 to each name. I normally download the files like this, where x00 is the list:
time curl --interface 2607:fe50:0:6330::100 --retry 5 --retry-delay 2 --max-time 0 --remote-name-all https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/{$(sed ':a;N;$!ba;s/\n/,/g' /cms_zfs/work_orders/modis/PDS/2018/x00)}
However, I just noticed this:
[bin]$ time curl --interface 2607:fe50:0:6330::100 --retry 5 --retry-delay 2 --max-time 0 --remote-name-all https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2000364.1045_1.PDS
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 289M 100 289M 0 0 20.7M 0 0:00:13 0:00:13 --:--:-- 21.8M
real 0m13.955s
user 0m0.362s
sys 0m0.258s
[bin]$ ll MOD00.A2000364.1045_1.PDS
-rw-rw-r-- 1 bmurch bmurch 303855723 Aug 5 14:23 MOD00.A2000364.1045_1.PDS
[bin]$ file MOD00.A2000364.1045_1.PDS
MOD00.A2000364.1045_1.PDS: bzip2 compressed data, block size = 900k
[bin]$ time curl --interface 2607:fe50:0:6330::100 --retry 5 --retry-delay 2 --max-time 0 --remote-name-all https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2000364.1045_1.PDS.bz2
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 289M 100 289M 0 0 21.3M 0 0:00:13 0:00:13 --:--:-- 21.8M
real 0m13.591s
user 0m0.330s
sys 0m0.254s
[bin]$ file MOD00.A2000364.1045_1.PDS*
MOD00.A2000364.1045_1.PDS: bzip2 compressed data, block size = 900k
MOD00.A2000364.1045_1.PDS.bz2: bzip2 compressed data, block size = 900k
[bin]$ diff MOD00.A2000364.1045_1.PDS MOD00.A2000364.1045_1.PDS.bz2
[bin]$
So, it appears that the same file is returned regardless of the extension in the above cases.
BUT not with the xz extension:
time curl --interface 2607:fe50:0:6330::100 --retry 5 --retry-delay 2 --max-time 0 --remote-name-all https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2000364.1045_1.PDS.xz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
110 665 110 665 0 0 119 0 0:00:05 0:00:05 --:--:-- 197
real 0m5.581s
user 0m0.051s
sys 0m0.051s
[bin]$ cat MOD00.A2000364.1045_1.PDS.xz
<!DOCTYPE html><html lang="en-US"><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta name="ROBOTS" content="NOARCHIVE"><title>ERROR @ OceanColor Biology Processing Group (OBPG)</title></head><body link=#323232 vlink=#323232 alink=#323232 style="background-color:#ffffff; color:#323232; font-size:175%"><br><hr color=#323232><center><h1><b>.:. ERROR .:.</b></h1><h2>OceanColor Biology Processing Group (OBPG)</h2><blockquote>Sorry, an error has occurred. Use the back button to return to the previous page or go to the <a href="https://oceancolor.gsfc.nasa.gov">Ocean Color Home Page</a>.</blockquote><br><hr color= #323232></body></html>
So do you suggest that I test every downloaded file (with the file command) and determine the type of compression from that?
-
- Posts: 1519
- Joined: Wed Sep 18, 2019 6:15 pm America/New_York
- Been thanked: 9 times
I suggest you don't append the .bz2. The file search is based on the uncompressed filename - which is why it pulls down the .bz2 file even if you don't append the extension.
.xz is not one of the compression extensions (currently) recognized by the script, so it doesn't know to strip it off when doing the lookup, and so doesn't find the file. If you don't go to the effort to guess the extension, you won't have to, well, guess the extension :grin:
Let cURL assign the filename from the Content-Disposition header:
$ curl -O -J https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.A2000364.1045_1.PDS
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 289M 100 289M 0 0 6980k 0 0:00:42 0:00:42 --:--:-- 7370k
curl: Saved to filename 'MOD00.A2000364.1045_1.PDS.bz2'
$ curl -O -J https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/MOD00.P2018070.0140_1.PDS
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 277M 100 277M 0 0 8695k 0 0:00:32 0:00:32 --:--:-- 8933k
curl: Saved to filename 'MOD00.P2018070.0140_1.PDS.xz'
Sean
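For a whole list of names rather than a single file, the same idea could be sketched roughly as below. This is only an illustration: strip_suffix and fetch_list are hypothetical helper names, the list is assumed to hold one filename per line, and the retry flags are simply carried over from the command earlier in the thread.

```shell
#!/bin/sh
# Base URL as used earlier in this thread.
GETFILE_URL="https://oceandata.sci.gsfc.nasa.gov/cgi/getfile"

# strip_suffix (hypothetical helper): drop a trailing .bz2 or .xz, if present,
# so the request is made with the uncompressed filename.
strip_suffix() {
    case "$1" in
        *.bz2) printf '%s\n' "${1%.bz2}" ;;
        *.xz)  printf '%s\n' "${1%.xz}" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

# fetch_list (hypothetical helper): download every name listed in file $1
# (assumed: one name per line). -O -J lets curl take the saved filename,
# including the real compression extension, from Content-Disposition.
fetch_list() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue   # skip blank lines
        curl --retry 5 --retry-delay 2 -O -J \
            "$GETFILE_URL/$(strip_suffix "$name")"
    done < "$1"
}
```

It would be called as, for example, fetch_list /cms_zfs/work_orders/modis/PDS/2018/x00. Nothing is downloaded until fetch_list is invoked.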
-
- Posts: 338
- Joined: Wed Apr 06, 2005 12:11 pm America/New_York
- Has thanked: 10 times
- Been thanked: 3 times
But, I will have to decide what to do with the downloaded file.
So do I bunzip2 it? Or unxz?
I guess test to ensure it is a bzip2 file?
Brock
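One way to test a file without trusting its name is to look at its leading magic bytes, which is essentially what the file command does. The sketch below is an assumption-laden illustration, not part of anyone's actual pipeline: detect_compression is a hypothetical helper, and it only knows the bzip2 signature ("BZh") and the xz signature (FD 37 7A 58 5A 00). Anything else, such as the HTML error page shown earlier, is reported as unknown.

```shell
#!/bin/sh
# detect_compression (hypothetical helper): print bzip2, xz, or unknown
# based on the first bytes of file $1, ignoring the filename entirely.
detect_compression() {
    # Read the first 6 bytes and render them as a lowercase hex string.
    magic=$(head -c 6 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        425a68*)      echo bzip2 ;;    # "BZh"
        fd377a585a00) echo xz ;;       # FD "7zXZ" 00
        *)            echo unknown ;;  # e.g. an HTML error page
    esac
}
```

A caller could then branch on the result, for instance running bunzip2 for bzip2, unxz for xz, and flagging unknown files for a fresh download.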
-
- Posts: 1519
- Joined: Wed Sep 18, 2019 6:15 pm America/New_York
- Been thanked: 9 times
> But, I will have to decide what to do with the downloaded file.
Yes, you will, but the file extension should clue you in as to which decompression utility to use.
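Choosing the utility from the extension could look something like the following sketch. The function names decompressor_for and decompress_file are hypothetical, and the sketch assumes the file was saved under the name the server supplied (for example via curl -O -J), so the extension can be trusted.

```shell
#!/bin/sh
# decompressor_for (hypothetical helper): print the utility matching the
# extension of $1, or fail if the extension is not recognized.
decompressor_for() {
    case "$1" in
        *.bz2) echo bunzip2 ;;
        *.xz)  echo unxz ;;
        *)     return 1 ;;
    esac
}

# decompress_file (hypothetical helper): decompress $1 in place with the
# utility chosen above; report and fail on an unrecognized extension.
decompress_file() {
    tool=$(decompressor_for "$1") || {
        echo "unrecognized extension: $1" >&2
        return 1
    }
    "$tool" "$1"
}
```

For example, decompress_file MOD00.P2018070.0140_1.PDS.xz would run unxz, while the same call on a .bz2 file would run bunzip2.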