S3 to Azure Blob Storage using azcopy
-
- Posts: 1
- Joined: Wed Jul 17, 2024 1:47 pm America/New_York
S3 to Azure Blob Storage using azcopy
Hello,
We're using the bulk download script (https://git.earthdata.nasa.gov/projects/LPDUR/repos/hls-bulk-download/browse/getHLS.sh) to transfer HLS data into an Azure Storage account. With more than 7,000 Tile IDs to download, the process is quite time-consuming. To accelerate it, we're running the script in parallel across multiple VMs and considering other methods such as `azcopy`. `Azcopy` is a tool designed for transferring data to and from Azure Storage accounts; it offers relatively fast performance and supports transfers from S3, as documented here: https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-s3
Using the following link: https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials, we successfully obtained temporary S3 credentials and configured them as detailed below:
export aws_access_key_id=XXXX
export aws_secret_access_key=XXXX
export aws_session_token=XXXX
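For completeness, this is roughly how the temporary credentials can be fetched and exported non-interactively; it assumes jq is installed, an Earthdata Login entry for urs.earthdata.nasa.gov is in ~/.netrc, and the JSON field names below are our assumption about the usual s3credentials response:
# Fetch temporary credentials from the LP DAAC endpoint (Earthdata Login
# entry for urs.earthdata.nasa.gov assumed to be in ~/.netrc).
creds=$(curl -s -L -n -b ~/.urs_cookies -c ~/.urs_cookies https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials)
# Field names below are assumed from the typical s3credentials JSON response.
export AWS_ACCESS_KEY_ID=$(echo "$creds" | jq -r .accessKeyId)
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | jq -r .secretAccessKey)
export AWS_SESSION_TOKEN=$(echo "$creds" | jq -r .sessionToken)
# The credentials are short-lived (roughly an hour), so long jobs need to refresh them.
One thing we are not sure of is whether azcopy itself reads AWS_SESSION_TOKEN; the Microsoft documentation primarily mentions AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, so the session-token handling may need checking.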
Upon attempting to transfer a sample file from S3 directly to Blob storage using the `azcopy` command, we encountered the following error:
Error:
failed to perform copy command due to error: cannot start job due to error: cannot list objects, Access Denied
azcopy command executed:
azcopy cp 'https://lp-prod-protected.s3.us-west-2.amazonaws.com/HLSS30.020/HLS.S30.T37QGC.2024001T075239.v2.0/HLS.S30.T37QGC.2024001T075239.v2.0.SAA.tif' 'https://store1.blob.core.windows.net/user-test' --recursive=true
We are currently investigating whether a "requester pays" configuration is required to copy the data from the S3 bucket, or whether other settings need adjustment to facilitate the data transfer to Blob storage. Please share any updates on this.
Additionally, if you have any recommendations on enhancing the efficiency of bulk downloads using the script, we would greatly appreciate your input.
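As a point of reference, a single-VM fan-out over the HTTPS links can look like the sketch below; urls.txt is a hypothetical file with one granule HTTPS URL per line, and ~/.netrc is assumed to hold the Earthdata Login:
# Fan out up to 8 simultaneous HTTPS downloads; urls.txt is a hypothetical
# list with one granule URL per line, ~/.netrc holds the Earthdata Login.
xargs -n 1 -P 8 wget -q -c --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --keep-session-cookies < urls.txt
The local directory can then be pushed to the Blob container in one shot with azcopy, which is the part azcopy handles well.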
-
- Posts: 422
- Joined: Mon Sep 30, 2019 10:00 am America/New_York
- Has thanked: 31 times
- Been thanked: 8 times
- Contact:
Re: S3 to Azure Blob Storage using azcopy
Hi @karthick_rn, we are looking into this, but just to confirm: are you working in us-west-2? If not, that could be causing the Access Denied error you are seeing.
Subscribe to the LP DAAC listserv by sending a blank email to lpdaac-join@lists.nasa.gov.
Sign up for the Landsat listserv to receive the most up to date information about Landsat data: https://public.govdelivery.com/accounts/USDOIGS/subscriber/new#tab1.
-
- Subject Matter Expert
- Posts: 71
- Joined: Tue Nov 12, 2019 4:02 pm America/New_York
- Been thanked: 3 times
Re: S3 to Azure Blob Storage using azcopy
@karthick_rn,
Hi, my understanding of how azcopy works is that it attempts to move/copy data from S3 to Blob storage. I don't think this would work with data in Earthdata Cloud because: 1) direct access to data in S3 is restricted to access methods executed within AWS us-west-2 only, and 2) Earthdata Cloud assets cannot be 'pulled' out of the cloud using the S3 URI. Data can be accessed/downloaded from outside the cloud using the available HTTPS links for each asset, but I suspect this is not what azcopy is set up to use.
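If it helps, a rough two-step workaround along those lines is to pull each asset over HTTPS and then push it into the Blob container with azcopy. In the sketch below, the curl options follow the usual Earthdata Login cookie pattern, the HTTPS path is inferred from the S3 key in your command (so verify it against the granule's actual download link), and the SAS token is a placeholder:
# Step 1: download the asset over HTTPS with Earthdata Login (~/.netrc + cookie jar).
# The HTTPS path is inferred from the S3 key; verify it against the granule's download link.
curl -O -L -n -b ~/.urs_cookies -c ~/.urs_cookies https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T37QGC.2024001T075239.v2.0/HLS.S30.T37QGC.2024001T075239.v2.0.SAA.tif

# Step 2: push the local file into the Blob container (the SAS token is a placeholder).
azcopy copy ./HLS.S30.T37QGC.2024001T075239.v2.0.SAA.tif 'https://store1.blob.core.windows.net/user-test?<SAS-token>'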
Re: S3 to Azure Blob Storage using azcopy
Just use tools like Goodsync and Gs Richcopy 360 to copy directly from Blob to S3; both are easy, fast, and straightforward.
Re: S3 to Azure Blob Storage using azcopy
margarite wrote:
> Just use tools like Goodsync and Gs Richcopy 360 to copy directly from the blob
> to S3, both are easy, fast and straightforward
I already use Gs Richcopy 360 for cloud data migration; it is a good choice.
But as an alternative, I prefer Syncback Pro, because GoodSync is extremely expensive.
Re: S3 to Azure Blob Storage using azcopy
@karthick_rn, the issue you're running into with the "list objects" error is similar to what I just posted here: viewtopic.php?t=6406
Could you try copying this file (159 MB) from the ORNL DAAC bucket and see if you get a similar error:
s3://ornl-cumulus-prod-protected/gedi/GEDI_L3_LandSurface_Metrics_V2/data/GEDI03_counts_2019108_2020287_002_02.tif
You can get the temporary token credentials from here: https://data.ornldaac.earthdata.nasa.gov/s3credentialsREADME
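If it is easier, the same check can be done with the AWS CLI once those temporary credentials are exported as the uppercase AWS_* variables (and keep in mind that direct S3 access is generally only expected to work from within us-west-2):
# Export the ORNL temporary credentials first (uppercase AWS_ACCESS_KEY_ID,
# AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN), then try the ~159 MB test file.
aws s3 cp s3://ornl-cumulus-prod-protected/gedi/GEDI_L3_LandSurface_Metrics_V2/data/GEDI03_counts_2019108_2020287_002_02.tif . --region us-west-2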
Re: S3 to Azure Blob Storage using azcopy
karthick_rn wrote:
> Hello,
>
> We're using the bulk download script
> (https://git.earthdata.nasa.gov/projects/LPDUR/repos/hls-bulk-download/browse/getHLS.sh)
> to transfer the HLS data into an Azure Storage account. With more than
> 7,000 Tile IDs to download, the process is quite time-consuming. To
> accelerate it, we're running the script in parallel across multiple VMs and
> considering other methods like `azcopy`. `Azcopy` is a tool designed for
> transferring data to and from Azure Storage accounts, offering relatively
> faster performance and supporting transfers from S3, as documented here
> https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-s3
>
> Using the following link:
> https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials, we successfully
> obtained temporary S3 credentials and configured them as detailed below:
>
> export aws_access_key_id=XXXX
> export aws_secret_access_key=XXXX
> export aws_session_token=XXXX
>
> Upon attempting to transfer a sample file from S3 directly to a Blob
> storage using the `azcopy` command, we encountered the following error:
>
> Error:
> failed to perform copy command due to error: cannot start job due to error:
> cannot list objects, Access Denied
>
> azcopy command executed:
> azcopy cp
> 'https://lp-prod-protected.s3.us-west-2.amazonaws.com/HLSS30.020/HLS.S30.T37QGC.2024001T075239.v2.0/HLS.S30.T37QGC.2024001T075239.v2.0.SAA.tif'
> 'https://store1.blob.core.windows.net/user-test' --recursive=true
>
> We are currently investigating whether a "requester pays"
> configuration is required to copy the data from the S3 bucket, or if there
> are other settings that need adjustment to facilitate the data transfer to
> the Blob storage. Please share if you have any update on this.
>
> Additionally, if you have any recommendations on enhancing the efficiency
> of bulk downloads using the script, we would greatly appreciate your input.
Try running your script again to see if it now works. The LP DAAC team resolved the problem I brought up in my post, which had to do with the bucket guidance.
Re: S3 to Azure Blob Storage using azcopy
Downloading that example file from "ornl-cumulus-prod-protected" worked for me, but trying to download or list anything from "lp-prod-protected" gives me an Access Denied error. Is it possible the lp-prod-protected bucket isn't included in the list of buckets that https://data.ornldaac.earthdata.nasa.gov/s3credentials provides access to? Thanks!
-Marc
Last edited by 777arc on Thu Feb 27, 2025 12:40 pm America/New_York, edited 1 time in total.
-
- User Services
- Posts: 88
- Joined: Tue Dec 03, 2024 2:37 pm America/New_York
- Has thanked: 23 times
- Been thanked: 2 times
Re: S3 to Azure Blob Storage using azcopy
Hi @777arc, apologies for the delay in response. Please try using the LP DAAC S3 credentials link instead of ORNL's: https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials
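Once those credentials are exported as the uppercase AWS_* variables, a quick sanity check (ideally run from us-west-2) is to list the HLS prefix before retrying the transfer:
# Sanity check with the LP DAAC temporary credentials exported as uppercase AWS_* variables.
aws s3 ls s3://lp-prod-protected/HLSS30.020/ --region us-west-2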
Subscribe to the LP DAAC listserv by sending a blank email to lpdaac-join@lists.nasa.gov.
Sign up for the Landsat listserv to receive the most up to date information about Landsat data: https://public.govdelivery.com/accounts/USDOIGS/subscriber/new#tab1.