Using Amazon S3 Glacier in the AWS CLI

Last Updated : 10 Oct, 2024

Amazon S3 Glacier is a cost-effective, durable, and secure service for archiving data in cold storage. It is available both as a set of Amazon S3 storage classes (S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, and S3 Glacier Deep Archive) and as the original vault-based service that the aws glacier commands in this article manage. It targets long-term storage of infrequently accessed data, with retrieval times that typically range from minutes to hours. Glacier is known as "cold storage" because it is optimized for data that is rarely accessed but must remain retrievable when needed.

Table of Contents

Primary Terminologies

Step-by-Step Process for Glacier Management Using AWS CLI

Step-by-Step Guide to Use AWS CLI for Glacier Management: AWS CLI Commands for Cold Storage
Best Practices for Efficient Glacier Management Using AWS CLI

1. Optimizing Costs
2. Setting Up Lifecycle Policies
3. Securing Data
4. Data Management and Retrieval

Amazon S3 Glacier: Vault Inventory Listing with SNS Topic Integration
Conclusion
Glacier Management - FAQs

Primary Terminologies

To effectively manage Glacier storage, it’s important to understand the core terminologies associated with the service.

Vault
A vault is a container for storing archives. A vault can hold many archives, and you can create multiple vaults in each AWS Region. Each vault has a name that is unique within its Region and account.
Archive
An archive is any object, such as a file, stored in a vault. Each archive has a unique ID, and it is the basic unit of data that you upload to and download from Glacier.
Job
A job is an asynchronous operation that you submit to Glacier to retrieve an archive or generate a vault inventory. Glacier retrievals are not instantaneous; a job can take minutes to hours to complete.
Retrieval Tier
Glacier provides three retrieval tiers:
- Expedited: data is typically available within 1-5 minutes.
- Standard: data is typically available within 3-5 hours.
- Bulk: the least expensive tier; retrieval can take up to 12 hours.
AWS CLI
The AWS CLI is a unified tool for managing AWS services from the terminal. You can use it to work with S3 Glacier and automate workflows.
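
The retrieval tier described above is chosen per job through the Tier field of the job parameters. The following is a minimal sketch, assuming a hypothetical helper named write_retrieval_params and a placeholder archive ID; neither comes from the AWS CLI itself.

```shell
# Write a job-parameters file for an archive retrieval at a chosen tier.
# write_retrieval_params is a hypothetical helper, not an AWS CLI command.
write_retrieval_params() {
  archive_id="$1"   # ID returned when the archive was uploaded
  tier="$2"         # Expedited, Standard, or Bulk
  out_file="$3"     # path of the JSON file to write
  case "$tier" in
    Expedited|Standard|Bulk) ;;
    *) echo "unknown tier: $tier" >&2; return 1 ;;
  esac
  printf '{"Type": "archive-retrieval", "ArchiveId": "%s", "Tier": "%s"}\n' \
    "$archive_id" "$tier" > "$out_file"
}

# Example: request the cheapest tier for a placeholder archive ID.
write_retrieval_params "EXAMPLE_ARCHIVE_ID" "Bulk" bulk-retrieval.json
```

The resulting file can then be passed to initiate-job with --job-parameters file://bulk-retrieval.json.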

Step-by-Step Process for Glacier Management Using AWS CLI

1. Configure AWS CLI

Open the AWS Console and navigate to Amazon S3 Glacier.

2. Creating a Glacier Vault

To create a Glacier vault, use the following command. Make sure to replace <vault-name> with your desired vault name and <region> with the appropriate AWS Region.

aws glacier create-vault --account-id - --vault-name <vault-name> --region <region>

3. Listing Vaults

To list all Glacier vaults in your AWS account, use the following command.

aws glacier list-vaults --account-id -

This will return a JSON list of all your Glacier vaults, including their names, ARNs, and creation dates.
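
When only the vault names are needed, the JSON response can be filtered on the client side with the CLI's --query option. This is a small sketch; list_vault_names is a hypothetical wrapper name, and the Region argument is whatever Region your vaults live in.

```shell
# Print one vault name per line for the current account.
# list_vault_names is a hypothetical wrapper around the real list-vaults command.
list_vault_names() {
  region="$1"   # for example us-east-1
  aws glacier list-vaults --account-id - --region "$region" \
    --query 'VaultList[].VaultName' --output text | tr '\t' '\n'
}

# Usage (requires configured credentials):
#   list_vault_names us-east-1
```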

4. Deleting a Glacier Vault

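Before you can delete a vault, ensure that the vault is empty (i.e., no archives are stored). Then delete it with the following command.

aws glacier delete-vault --account-id - --vault-name <vault-name>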
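
One way to guard against deleting a vault that still holds data is to check the archive count reported by describe-vault first. The sketch below assumes a hypothetical helper name, delete_vault_if_empty; note that the reported count comes from the most recent vault inventory, so it can lag behind recent uploads or deletions.

```shell
# Delete a vault only when describe-vault reports zero archives.
# delete_vault_if_empty is a hypothetical helper, not an AWS CLI command.
delete_vault_if_empty() {
  vault="$1"
  # NumberOfArchives reflects the last inventory, not necessarily the live state.
  count=$(aws glacier describe-vault --account-id - --vault-name "$vault" \
    --query 'NumberOfArchives' --output text) || return 1
  if [ "$count" != "0" ]; then
    echo "vault $vault still reports $count archive(s); not deleting" >&2
    return 1
  fi
  aws glacier delete-vault --account-id - --vault-name "$vault"
}

# Usage (requires configured credentials):
#   delete_vault_if_empty demovault
```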

Step-by-Step Guide to Use AWS CLI for Glacier Management: AWS CLI Commands for Cold Storage

Step 1: AWS CLI Installation

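Ensure that you have the AWS CLI installed. The command below installs AWS CLI version 1 through pip; AWS also provides standalone installers for version 2.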

pip install awscli

Step 2: AWS Configuration

Configure the AWS CLI with your credentials

aws configure
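
For scripted setups, the same settings can be written without the interactive prompts. The sketch below is one possible approach; the helper names and the profile name glacier-demo are assumptions for illustration, while aws configure set and aws sts get-caller-identity are standard CLI commands.

```shell
# Set the default Region and output format for a named profile non-interactively.
# setup_profile and check_identity are hypothetical helper names.
setup_profile() {
  profile="$1"
  region="$2"
  aws configure set region "$region" --profile "$profile"
  aws configure set output json --profile "$profile"
}

# Print the account ID the profile's credentials resolve to, as a quick sanity check.
check_identity() {
  aws sts get-caller-identity --profile "$1" --query 'Account' --output text
}

# Usage (requires credentials for the profile):
#   setup_profile glacier-demo us-east-1
#   check_identity glacier-demo
```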

Step 3: Create a Glacier Vault

To create a Glacier vault, use the create-vault command.

aws glacier create-vault --account-id - --vault-name namvault

Replace namvault with your desired vault name.

Step 4: List Vaults

To list all Glacier vaults in your account, use the list-vaults command.

aws glacier list-vaults --account-id -

Step 5: Describe a Vault

To get details about a specific vault, use the describe-vault command.

aws glacier describe-vault --account-id - --vault-name namvault

Step 6: Upload an Archive to a Vault

To upload a file (archive) to a Glacier vault, use the upload-archive command.

aws glacier upload-archive --account-id - --vault-name namvault --body two.txt

Replace namvault with your vault name and two.txt with the path to the file you want to upload.
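
Glacier identifies an archive only by the ID returned at upload time, so it is worth recording that ID alongside the local file name. The following sketch assumes a hypothetical helper, upload_and_record, and a tab-separated log file of your choosing.

```shell
# Upload a file and append "archive-id <TAB> path" to a local log.
# upload_and_record is a hypothetical helper, not an AWS CLI command.
upload_and_record() {
  vault="$1"
  file="$2"
  log="$3"
  archive_id=$(aws glacier upload-archive --account-id - --vault-name "$vault" \
    --archive-description "$(basename "$file")" --body "$file" \
    --query 'archiveId' --output text) || return 1
  printf '%s\t%s\n' "$archive_id" "$file" >> "$log"
  echo "$archive_id"
}

# Usage (requires configured credentials):
#   upload_and_record namvault two.txt archives.tsv
```

Keeping this log avoids waiting hours for an inventory job just to look up an archive ID later.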

Step 7: List Archives in a Vault

To list the archives in a vault, you first need to initiate an inventory-retrieval job.

aws glacier initiate-job --account-id - --vault-name namvault --job-parameters file://inventory-retrieval.json

This command will return a job ID. You need to wait for the job to complete, which can take several hours.
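
The inventory-retrieval.json file referenced above is not shown in the article; a minimal version might look like the one below. The sketch also includes a hypothetical wait_for_job helper that polls describe-job until the job leaves the InProgress state. The default polling interval of 900 seconds is an arbitrary choice.

```shell
# A minimal job-parameters file for an inventory job (contents are an assumption).
cat > inventory-retrieval.json <<'EOF'
{
  "Type": "inventory-retrieval",
  "Description": "Vault inventory",
  "Format": "JSON"
}
EOF

# Poll describe-job until the job succeeds (return 0) or fails (return 1).
# wait_for_job is a hypothetical helper, not an AWS CLI command.
wait_for_job() {
  vault="$1"
  job_id="$2"
  interval="${3:-900}"   # seconds between polls
  while :; do
    status=$(aws glacier describe-job --account-id - --vault-name "$vault" \
      --job-id "$job_id" --query 'StatusCode' --output text) || return 1
    case "$status" in
      Succeeded) return 0 ;;
      Failed) return 1 ;;
    esac
    sleep "$interval"
  done
}

# Usage (requires configured credentials):
#   wait_for_job namvault "<job-id>" 900
```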

Step 8: Retrieve an Archive

To retrieve an archive, initiate a retrieval job.

aws glacier initiate-job --account-id - --vault-name namvault --job-parameters file://job-archive-retrieval.json

In job-archive-retrieval.json, replace ARCHIVE_ID with the ID of the archive you want to retrieve.
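
The job-archive-retrieval.json file is referenced but not shown; the sketch below writes a plausible minimal version with the ARCHIVE_ID placeholder and wraps the command so that only the job ID is printed. The file contents and the helper name start_retrieval are assumptions for illustration.

```shell
# A minimal job-parameters file for an archive retrieval (contents are an assumption).
cat > job-archive-retrieval.json <<'EOF'
{
  "Type": "archive-retrieval",
  "ArchiveId": "ARCHIVE_ID",
  "Description": "Retrieve one archive"
}
EOF

# Start the retrieval job and print only its job ID.
# start_retrieval is a hypothetical helper, not an AWS CLI command.
start_retrieval() {
  vault="$1"
  aws glacier initiate-job --account-id - --vault-name "$vault" \
    --job-parameters file://job-archive-retrieval.json \
    --query 'jobId' --output text
}

# Usage (requires configured credentials and a real archive ID in the file):
#   job_id=$(start_retrieval namvault)
```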

Step 9: Delete an Archive

To delete an archive from a vault, use the delete-archive command.

aws glacier delete-archive --account-id - --vault-name namvault \
  --archive-id L3_Ey_PSsQEaeIh8_iIGWXNOx4hmaGCE4NPPHz5UVUnOJGycNJHq7DiPXY2Vdg5u4W3U17YP_uCSryVsCZ_1yV00xVNojc1py_VP_zUxEHb4X4sFY_6vhCirhh80QJwVAW7PPWlUaA

Replace the --archive-id value with the ID of the archive you want to delete.

Step 10: Delete a Vault

To delete a Glacier vault, ensure it is empty first. Then, use the delete-vault command.

aws glacier delete-vault --account-id - --vault-name demovault

Replace demovault with the name of your vault.

Best Practices for Efficient Glacier Management Using AWS CLI

1. Optimizing Costs

Use S3 Glacier Storage Classes

S3 Glacier and S3 Glacier Deep Archive are cost-effective storage classes designed for long-term data archiving. Choose the appropriate class based on your retrieval needs.

Monitor Storage Usage

Regularly monitor the storage usage and costs using AWS Cost Explorer or the AWS CLI.

Review Retrieval Patterns

Avoid unnecessary retrievals and batch your retrieval requests to minimize costs.

Select Appropriate Retrieval Options

Use Standard, Bulk, or Expedited retrieval options based on urgency and cost considerations.

2. Setting Up Lifecycle Policies

Define Lifecycle Policies

Automate data transitions between storage classes using lifecycle policies.

Example JSON for Lifecycle Policy

{
  "Rules": [
    {
      "ID": "Move to Glacier after 30 days",
      "Prefix": "",
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "GLACIER"
        }
      ],
      "NoncurrentVersionTransitions": [
        {
          "NoncurrentDays": 30,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}

Command to apply the lifecycle policy

aws s3api put-bucket-lifecycle-configuration --bucket your-bucket --lifecycle-configuration file://lifecycle.json
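
After applying a policy, it can be useful to read it back and confirm which transitions are active. The following is a small sketch; show_transitions is a hypothetical wrapper name around the standard get-bucket-lifecycle-configuration command.

```shell
# Print the ID, status, and transitions of each lifecycle rule on a bucket.
# show_transitions is a hypothetical wrapper, not an AWS CLI command.
show_transitions() {
  bucket="$1"
  aws s3api get-bucket-lifecycle-configuration --bucket "$bucket" \
    --query 'Rules[].{ID: ID, Status: Status, Transitions: Transitions}' \
    --output json
}

# Usage (requires configured credentials):
#   show_transitions your-bucket
```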

Review and Update Policies

Regularly review and update lifecycle policies to ensure they align with your data retention and archival requirements.

3. Securing Data

Enable Default Encryption

Enable default encryption on your S3 bucket to ensure all objects are automatically encrypted when stored in Glacier.

Use AWS Key Management Service (KMS)

For enhanced security, use AWS KMS to manage encryption keys.
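
As a rough illustration of the two points above, default encryption with a KMS key can be set on a bucket through put-bucket-encryption. The helper name enable_kms_encryption and the key identifier you pass in are assumptions; substitute a key that exists in your account.

```shell
# Turn on default SSE-KMS encryption for a bucket using the given KMS key.
# enable_kms_encryption is a hypothetical helper, not an AWS CLI command.
enable_kms_encryption() {
  bucket="$1"
  key_id="$2"   # KMS key ID or ARN from your own account
  aws s3api put-bucket-encryption --bucket "$bucket" \
    --server-side-encryption-configuration \
    "{\"Rules\": [{\"ApplyServerSideEncryptionByDefault\": {\"SSEAlgorithm\": \"aws:kms\", \"KMSMasterKeyID\": \"$key_id\"}, \"BucketKeyEnabled\": true}]}"
}

# Usage (requires configured credentials):
#   enable_kms_encryption your-bucket "<kms-key-arn>"
```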

Implement Access Controls

Use IAM policies, bucket policies, and ACLs to restrict access to your Glacier archives.

Example IAM Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::your-bucket/*"
    }
  ]
}

Enable Logging and Monitoring

Enable S3 server access logging and configure CloudWatch alarms to monitor access and activities in your Glacier storage.

4. Data Management and Retrieval

Inventory Reports

Use S3 Inventory reports to manage and track objects stored in Glacier.

Efficient Retrieval Planning

Plan retrievals ahead of time and use the appropriate retrieval option to minimize costs and meet your performance requirements.
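
For objects archived through the S3 Glacier storage classes, a planned retrieval is a restore request on the object. The sketch below assumes hypothetical helper names and defaults (7 days, Bulk tier); restore-object and head-object are standard s3api commands.

```shell
# Request a temporary restored copy of an archived S3 object.
# restore_object and restore_status are hypothetical helper names.
restore_object() {
  bucket="$1"
  key="$2"
  days="${3:-7}"      # how long the restored copy stays available
  tier="${4:-Bulk}"   # Expedited, Standard, or Bulk
  aws s3api restore-object --bucket "$bucket" --key "$key" \
    --restore-request "{\"Days\": $days, \"GlacierJobParameters\": {\"Tier\": \"$tier\"}}"
}

# Show the Restore field of the object, which reports whether a restore is ongoing.
restore_status() {
  aws s3api head-object --bucket "$1" --key "$2" --query 'Restore' --output text
}

# Usage (requires configured credentials):
#   restore_object your-bucket backups/2023.tar.gz 7 Bulk
#   restore_status your-bucket backups/2023.tar.gz
```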

Amazon S3 Glacier: Vault Inventory Listing with SNS Topic Integration

Amazon S3 Glacier manages large data archives by organizing them into vaults. Glacier lets you create vaults to store data objects and provides tools to manage and retrieve information about your archived data. One such feature is the vault inventory: a listing of all the archives stored in a vault, which you can retrieve on demand or periodically.

What is Vault Inventory?

An Amazon S3 Glacier vault inventory is a JSON-formatted list of the archives (the stored files) contained in a given vault. The vault inventory isn't available in real time, and the list takes several hours to generate. You can use Amazon SNS to receive an email when it is ready.

1. Creating an SNS Topic

An SNS topic is needed so that Amazon S3 Glacier can send you a notification when the vault inventory is ready.

Create a New Topic: Example CLI command to create a topic

aws sns create-topic --name glacier-inventory-notifications

Create a Subscription: Once the topic is created, add a subscription to it. The subscriber can be an email address, an SMS number, or another AWS service (such as Lambda) that will receive the notifications.

aws sns subscribe --topic-arn arn:aws:sns:us-east-1:123456789012:glacier-inventory-notifications --protocol email --notification-endpoint <your-email-address>

Replace <your-email-address> with the address that should receive the notifications. Then check your email and confirm the subscription by clicking the link in the email from AWS SNS.
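
Instead of passing the topic to each job, a vault can also be configured to publish to the topic whenever a retrieval or inventory job completes. This is a sketch under the assumption that the vault already exists; configure_vault_notifications is a hypothetical helper name, while create-topic and set-vault-notifications are standard CLI commands.

```shell
# Create (or look up) the SNS topic and attach it to a vault's notifications.
# configure_vault_notifications is a hypothetical helper, not an AWS CLI command.
configure_vault_notifications() {
  vault="$1"
  topic_name="$2"
  # create-topic returns the ARN of the existing topic if the name is already in use.
  topic_arn=$(aws sns create-topic --name "$topic_name" \
    --query 'TopicArn' --output text) || return 1
  aws glacier set-vault-notifications --account-id - --vault-name "$vault" \
    --vault-notification-config \
    "{\"SNSTopic\": \"$topic_arn\", \"Events\": [\"ArchiveRetrievalCompleted\", \"InventoryRetrievalCompleted\"]}" \
    || return 1
  echo "$topic_arn"
}

# Usage (requires configured credentials):
#   configure_vault_notifications namvault glacier-inventory-notifications
```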

2. Creating a Glacier Vault

aws glacier create-vault --account-id - --vault-name namvault --region us-east-1

This command will return a JSON response with the location of the vault.

3. Retrieving Data from Glacier

Initiate Archive Retrieval

aws glacier initiate-job --account-id - --vault-name namvault \
  --job-parameters "{\"Type\": \"archive-retrieval\", \"ArchiveId\": \"4ID8-ydgHsbSz37hXknsUOmzWPX_7pRnk0tHT6gAuo_Bmb7zvbw0JhMX0oW-WmHdc4evcKxKJixOG7tbgxBcgB8bQ9FN0Hbe3xOYmENFMeg0eq4apyWq89X6DwnJbGvF-izq1i4KZw\", \"SNSTopic\": \"arn:aws:sns:us-east-1:490004638420:mytopic\"}"

Replace the ArchiveId value with the ID of the archive you want to retrieve, and the SNSTopic value with the ARN of your own topic.

Initiate Inventory Retrieval

aws glacier initiate-job --account-id - --vault-name namvault \
  --job-parameters "{\"Type\": \"inventory-retrieval\", \"SNSTopic\": \"arn:aws:sns:us-east-1:490004638420:mytopic\"}"

4. Handling Notifications

When the job is complete, a notification is sent to the subscribed endpoint (an email, in this example). The notification contains details about the job, including the job ID that you need to download the retrieved data or inventory.
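
Once the notification arrives, the job output can be downloaded with get-job-output. The sketch below uses hypothetical helper names; the job ID comes from the notification or from the original initiate-job response.

```shell
# List the IDs of jobs that have finished successfully for a vault.
# list_finished_jobs and download_job_output are hypothetical helper names.
list_finished_jobs() {
  aws glacier list-jobs --account-id - --vault-name "$1" \
    --statuscode Succeeded --query 'JobList[].JobId' --output text
}

# Save the output of a completed job (an archive or an inventory) to a local file.
download_job_output() {
  vault="$1"
  job_id="$2"
  out_file="$3"
  aws glacier get-job-output --account-id - --vault-name "$vault" \
    --job-id "$job_id" "$out_file"
}

# Usage (requires configured credentials):
#   download_job_output namvault "<job-id>" inventory.json
```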

Conclusion

Amazon S3 Glacier offers low-cost, long-term archival storage for infrequently accessed data. Glacier storage is straightforward to manage with the AWS CLI, which also opens the door to automation. The AWS CLI provides commands for creating vaults and for uploading, retrieving, and deleting archives. Whether you archive for business compliance or disaster recovery, Glacier delivers durable storage at minimal cost.

Glacier Management - FAQs

Can I store any file type in Glacier?

Yes, you can archive whatever type of file you want in Glacier. However, it's most often used for archiving large datasets, backups, or regulatory information that doesn't need to be accessed frequently.

What happens if I try to delete a vault that still has archives in it?

Glacier won't let you delete a vault if it holds archives. You must remove all archives before deleting the vault.


nikhithamcb76
