S3 to Azure Blob Storage using azcopy
Posted: Thu Jul 18, 2024 7:54 am America/New_York
Hello,
We're using the bulk download script (https://git.earthdata.nasa.gov/projects/LPDUR/repos/hls-bulk-download/browse/getHLS.sh) to transfer HLS data into an Azure Storage account. With more than 7,000 tile IDs to download, the process is quite time-consuming. To speed it up, we're running the script in parallel across multiple VMs (a rough sketch of that setup is included below) and also evaluating other methods such as `azcopy`, a tool for transferring data to and from Azure Storage accounts that is generally faster and supports copying directly from S3, as documented at https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-s3
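For reference, this is roughly how we split the work across the VMs. The file names are placeholders and the getHLS.sh invocation is only indicative, since the script's actual arguments may differ (check its usage notes):

# Split the full tile-ID list into 8 roughly equal chunks, one per VM.
split -n l/8 all_tile_ids.txt tiles_chunk_
# Each VM then runs the bulk download script against its own chunk, e.g.:
# ./getHLS.sh tiles_chunk_aa ...   (placeholder; see getHLS.sh for its actual arguments)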
Using the temporary S3 credentials endpoint (https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials), we successfully obtained temporary S3 credentials and configured them as environment variables (note that `azcopy` and the AWS CLI read the uppercase variable names):
export AWS_ACCESS_KEY_ID=XXXX
export AWS_SECRET_ACCESS_KEY=XXXX
export AWS_SESSION_TOKEN=XXXX
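For completeness, this is roughly how we fetch and export the credentials in one step. It assumes Earthdata Login credentials are already set up in ~/.netrc and that the endpoint returns accessKeyId/secretAccessKey/sessionToken fields; adjust the jq paths if the response differs:

# Fetch temporary credentials (assumes Earthdata Login is configured in ~/.netrc;
# the JSON field names below are assumptions -- adjust if the response differs).
creds=$(curl -s -n -L -c /tmp/edl_cookies -b /tmp/edl_cookies https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials)
export AWS_ACCESS_KEY_ID=$(echo "$creds" | jq -r '.accessKeyId')
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | jq -r '.secretAccessKey')
export AWS_SESSION_TOKEN=$(echo "$creds" | jq -r '.sessionToken')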
Upon attempting to transfer a sample file from S3 directly to Blob Storage using `azcopy`, we encountered the following error:
Error:
failed to perform copy command due to error: cannot start job due to error: cannot list objects, Access Denied
azcopy command executed:
azcopy cp 'https://lp-prod-protected.s3.us-west-2.amazonaws.com/HLSS30.020/HLS.S30.T37QGC.2024001T075239.v2.0/HLS.S30.T37QGC.2024001T075239.v2.0.SAA.tif' 'https://store1.blob.core.windows.net/user-test' --recursive=true
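To narrow down where the Access Denied comes from, we also plan to check whether the temporary credentials can read the object directly with the AWS CLI. This assumes the credentials above are exported in the same shell; the object key is copied from the azcopy command:

# Sanity check: can the temporary credentials list and fetch the object at all?
aws s3 ls s3://lp-prod-protected/HLSS30.020/HLS.S30.T37QGC.2024001T075239.v2.0/ --region us-west-2
aws s3 cp s3://lp-prod-protected/HLSS30.020/HLS.S30.T37QGC.2024001T075239.v2.0/HLS.S30.T37QGC.2024001T075239.v2.0.SAA.tif . --region us-west-2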
We are currently investigating whether a "requester pays" configuration is required to copy the data from the S3 bucket, or whether other settings need to be adjusted to enable the transfer to Blob Storage. Please let us know if you have any updates on this.
Additionally, if you have any recommendations on enhancing the efficiency of bulk downloads using the script, we would greatly appreciate your input.