
Week 5 Cloud Storage Service

5.1 - AWS Storage Services


Cloud storage is a cloud computing model that enables storing data and files on the internet through a
cloud computing provider that you access either through the public internet or a dedicated private network
connection. The provider securely stores, manages, and maintains the storage servers, infrastructure, and
network to ensure you have access to the data when you need it at virtually unlimited scale, and with elastic
capacity. Cloud storage removes the need to buy and manage your own data storage infrastructure, giving
you agility, scalability, and durability, with any time, anywhere data access.

Q. Why is AWS cloud storage important?

AWS cloud storage is important for the following reasons:

Cost effectiveness
With cloud storage, there is no hardware to purchase, no storage to provision, and no extra capital being
used for business spikes. You can add or remove storage capacity on demand and only pay for storage that
you actually use. As data becomes infrequently accessed, you can even automatically move it to
lower-cost storage, creating further cost savings.

Increased agility
With cloud storage, resources are only a click away. You reduce the time to make those resources
available to your organization from weeks to just minutes. This results in a dramatic increase in agility for
your organization. Your staff is largely freed from the tasks of procurement, installation, administration, and
maintenance.

Faster deployment
Cloud storage services allow IT to quickly deliver the exact amount of storage needed, whenever and
wherever it's needed. Your developers can focus on solving complex application problems instead of having
to manage storage systems.

Efficient data management


You can also use cloud storage to create multi-region or global storage for your distributed teams by using
tools such as replication. You can organize and manage your data in ways that support specific use cases,
create cost efficiencies, enforce security, and meet compliance requirements.

Virtually unlimited scalability


Cloud storage delivers virtually unlimited storage capacity, allowing you to scale up as much and as
quickly as you need. This removes the constraints of on-premises storage capacity. You can efficiently scale
cloud storage up and down as required for analytics, data lakes, backups, or cloud native applications. Users
can access storage from anywhere, at any time.

Business continuity
Cloud storage providers store your data in highly secure data centers, protecting your data and ensuring
business continuity. You can further protect your data by using versioning and replication tools to more
easily recover from both unintended user actions and application failures.
What are the types of cloud storage?

There are three main cloud storage types: object storage, file storage, and block storage. Each offers its
own advantages and has its own use cases.

1.Object storage
Organizations have to store a massive and growing amount of unstructured data, such as photos, videos,
machine learning (ML), sensor data, audio files, and other types of web content. Object storage is a data
storage architecture for large stores of unstructured data. Object storage keeps data in the format it arrives in
and makes it possible to customize metadata in ways that make the data easier to access and analyze. Instead of
being organized in files or folder hierarchies, objects are kept in secure buckets that deliver virtually
unlimited scalability. It is also less costly to store large data volumes.

2.File storage
File-based storage or file storage is widely used among applications and stores data in a hierarchical folder
and file format. This type of storage is often served by a network-attached storage (NAS) server using common
file-level protocols: Server Message Block (SMB) on Windows instances and Network File System (NFS) on
Linux.

3.Block storage
Enterprise applications like databases or enterprise resource planning (ERP) systems often require
dedicated, low-latency storage for each host. This is analogous to direct-attached storage (DAS) or a storage
area network (SAN). In this case, you can use a cloud storage service that stores data in the form of blocks.
Each block has its own unique identifier for quick storage and retrieval.

Storage Offered By Amazon Web Services (AWS)

5.2 Amazon S3 - Section Introduction


Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading
scalability, data availability, security, and performance. Customers of all sizes and industries can use
Amazon S3 to store and protect any amount of data for a range of use cases, such as data lakes, websites,
mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics.
Amazon S3 provides management features so that you can optimize, organize, and configure access to your
data to meet your specific business, organizational, and compliance requirements. Data uploaded to S3 is
stored as objects, each identified by a key, and objects are grouped into buckets. The maximum size of a
single object is 5 terabytes (TB).

5.3 S3 Buckets and Objects


In Amazon S3, buckets and objects are the primary resources, and objects are stored in buckets. Amazon
S3 has a flat structure instead of a hierarchy like you would see in a file system. However, for the sake of
organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping
objects. The console does this by using a shared name prefix for the grouped objects. In other words, the
grouped objects have names that begin with a common string. This common string, or shared prefix, is the
folder name. Object names are also referred to as key names.
For example, you can create a folder in the console named photos and store an object
named myphoto.jpg in it. The object is then stored with the key name photos/myphoto.jpg, where photos/ is
the prefix.
Here are two more examples:
 If you have three objects in your bucket—logs/date1.txt, logs/date2.txt, and logs/date3.txt—the console
will show a folder named logs. If you open the folder in the console, you will see three
objects: date1.txt, date2.txt, and date3.txt.
 If you have an object named photos/2017/example.jpg, the console will show you a folder
named photos containing the folder 2017. The folder 2017 will contain the object example.jpg.

You can have folders within folders, but not buckets within buckets. You can upload and copy objects
directly into a folder. Folders can be created, deleted, and made public, but they cannot be renamed. Objects
can be copied from one folder to another.
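
Because a folder is just a shared key-name prefix, the same structure can be created programmatically. Below is a minimal Python (boto3) sketch, assuming a hypothetical bucket named example-bucket and a local file myphoto.jpg; it illustrates the prefix idea rather than prescribing a specific workflow.

import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"  # hypothetical bucket name

# A console "folder" is simply a zero-byte object whose key ends with "/".
s3.put_object(Bucket=bucket, Key="photos/")

# Uploading an object "into" the folder just means giving it the photos/ prefix.
s3.upload_file("myphoto.jpg", bucket, "photos/myphoto.jpg")

# Listing by prefix reproduces what the console shows inside the folder.
response = s3.list_objects_v2(Bucket=bucket, Prefix="photos/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])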

5.4 Lab - S3 Buckets and Objects

This section describes how to use the Amazon S3 console to create a folder.
To create a folder
1. Sign in to the AWS Management Console and open the Amazon S3 console
at https://console.aws.amazon.com/s3/.
2. In the left navigation pane, choose Buckets.
3. In the Buckets list, choose the name of the bucket that you want to create a folder
in.
4. If your bucket policy prevents uploading objects to this bucket without encryption,
you must choose Enable under Server-side encryption.
5. Choose Create folder.
6. Enter a name for the folder (for example, favorite-pics). Then choose Create folder.

Calculating folder size

This section describes how to use the Amazon S3 console to calculate a folder's size.
To calculate a folder's size
1. Sign in to the AWS Management Console and open the Amazon S3 console
at https://console.aws.amazon.com/s3/.
2. In the left navigation pane, choose Buckets.
3. In the Buckets list, choose the name of the bucket in which your folder is stored.
4. In the Objects list, select the check box next to the name of the folder.
5. Choose Actions, and then choose Calculate total size.

Deleting folders
To delete folders from an S3 bucket
1. Sign in to the AWS Management Console and open the Amazon S3 console
at https://console.aws.amazon.com/s3/.
2. In the Buckets list, choose the name of the bucket that you want to delete folders from.
3. In the Objects list, select the check box next to the folders and objects that you want to delete.
4. Choose Delete.
5. On the Delete objects page, verify that the names of the folders you selected for deletion are listed.
6. In the Delete objects box, enter delete, and choose Delete objects.
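
The same two operations can be done programmatically, since a folder's "size" is just the sum of the sizes of all objects that share its prefix, and "deleting a folder" means deleting those objects. A minimal boto3 sketch, assuming a hypothetical bucket example-bucket and folder favorite-pics/:

import boto3

s3 = boto3.client("s3")
bucket, prefix = "example-bucket", "favorite-pics/"  # hypothetical names

# Sum the sizes of every object under the prefix (paginating for large folders).
paginator = s3.get_paginator("list_objects_v2")
total_bytes, keys = 0, []
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        total_bytes += obj["Size"]
        keys.append({"Key": obj["Key"]})
print(f"{len(keys)} objects, {total_bytes} bytes under {prefix}")

# "Deleting the folder" means deleting every object that shares the prefix.
# delete_objects accepts at most 1,000 keys per call, so batch accordingly.
for i in range(0, len(keys), 1000):
    s3.delete_objects(Bucket=bucket, Delete={"Objects": keys[i:i + 1000]})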

5.4 S3 Versioning

Versioning in Amazon S3 is a means of keeping multiple variants of an object in the same bucket. You
can use the S3 Versioning feature to preserve, retrieve, and restore every version of every object stored in
your buckets. With versioning you can recover more easily from both unintended user actions and
application failures. After versioning is enabled for a bucket, if Amazon S3 receives multiple write requests
for the same object simultaneously, it stores all of those objects.
Versioning-enabled buckets can help you recover objects from accidental deletion or overwrite. For
example, if you delete an object, Amazon S3 inserts a delete marker instead of removing the object
permanently. The delete marker becomes the current object version. If you overwrite an object, it results in a
new object version in the bucket. You can always restore the previous version.
By default, S3 Versioning is disabled on buckets, and you must explicitly enable it. Buckets can be in one
of three states:
1. Unversioned (the default)
2. Versioning-enabled
3. Versioning-suspended
You enable and suspend versioning at the bucket level. After you version-enable a bucket, it can never
return to an unversioned state. But you can suspend versioning on that bucket.
The versioning state applies to all (never some) of the objects in that bucket. When you enable versioning
in a bucket, all new objects are versioned and given a unique version ID. Objects that already existed in the
bucket at the time versioning was enabled will thereafter always be versioned and given a unique version ID
when they are modified by future requests.
5.5 Lab - S3 Versioning
You can use S3 Versioning to keep multiple versions of an object in one bucket. This section provides
examples of how to enable versioning on a bucket using the console, REST API, AWS SDKs, and AWS
Command Line Interface (AWS CLI).
Note : If you enable versioning on a bucket for the first time, it might take a short amount of time for the
change to be fully propagated. We recommend that you wait for 15 minutes after enabling versioning before
issuing write operations (PUT or DELETE) on objects in the bucket.
To learn more about how to use S3 Versioning to protect data, see Tutorial: Protecting data on Amazon S3
against accidental deletion or application bugs using S3 Versioning, S3 Object Lock, and S3 Replication.
Each S3 bucket that you create has a versioning subresource associated with it. By default, your bucket is
unversioned, and the versioning subresource stores the empty versioning configuration, as follows.
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
</VersioningConfiguration>
To enable versioning, you can send a request to Amazon S3 with a versioning configuration that includes a
status.
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Status>Enabled</Status>
</VersioningConfiguration>
To suspend versioning, you set the status value to Suspended.
The bucket owner and all authorized users can enable versioning. The bucket owner is the AWS account
that created the bucket (the root account). For more information about permissions, see Identity and access
management in Amazon S3.
The following sections provide more detail about enabling S3 Versioning using the console & AWS CLI.
Using the S3 console
Follow these steps to use the AWS Management Console to enable versioning on an S3 bucket.
To enable or disable versioning on an S3 bucket
1. Sign in to the AWS Management Console and open the Amazon S3 console at
https://console.aws.amazon.com/s3/
2. In the Buckets list, choose the name of the bucket that you want to enable versioning for.
3. Choose Properties.
4. Under Bucket Versioning, choose Edit.
5. Choose Suspend or Enable, and then choose Save changes.
Using the AWS CLI
The following example enables versioning on an S3 bucket.
aws s3api put-bucket-versioning --bucket DOC-EXAMPLE-BUCKET1 --versioning-configuration Status=Enabled

The following example enables S3 Versioning and multi-factor authentication (MFA) delete on a bucket.
aws s3api put-bucket-versioning --bucket DOC-EXAMPLE-BUCKET1 --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "SERIAL 123456"
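
The same configuration can also be applied from Python. A minimal boto3 sketch of the basic (non-MFA) case, reusing the example bucket name above:

import boto3

s3 = boto3.client("s3")

# Enable versioning on the bucket (equivalent to the first CLI command above).
s3.put_bucket_versioning(
    Bucket="DOC-EXAMPLE-BUCKET1",
    VersioningConfiguration={"Status": "Enabled"},
)

# Confirm the bucket's versioning state.
print(s3.get_bucket_versioning(Bucket="DOC-EXAMPLE-BUCKET1").get("Status"))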

5.6 S3 Encryption
Data encryption is a process for securing data by encoding information. Data is encoded using a password
or an encryption (cipher) key and special encryption algorithms. The encrypted data can then be accessed
by using the correct password or encryption (decryption) key. Data encryption is used to protect digital data
confidentiality even if an unauthorized person gains logical or physical access to that data. If an
unauthorized person gets access to the encrypted data, the data is unreadable without the key or password.
Amazon recommends the use of S3 encryption when storing data in Amazon S3 buckets. The first reason
for this recommendation is security. Encryption increases the level of security and privacy. However, there
is another reason why data stored in the cloud should be encrypted. Amazon stores data of users from
different countries. Sometimes a country can request data be submitted for an investigation if a client or an
organization is suspected of violating the law. However, Amazon must respect the license agreement and
laws of other countries (countries whose citizens are Amazon customers) and a conflict can occur.
If a user’s data is encrypted and Amazon doesn’t have the encryption keys, the user’s data cannot be
provided to third party organizations or persons (even if the encrypted data is provided, it is a useless and
unreadable set of bits). Imagine a situation in which the USA requests data from a European Amazon
customer for an investigation. What should be done in this case? As you may already know, the personal data of
European citizens is protected by the General Data Protection Regulation (GDPR).
Amazon S3 Encryption Types
How does S3 encryption work? Amazon provides several encryption types for data stored in Amazon S3.
Is S3 encrypted? Historically, data stored in an S3 bucket was not encrypted by default (new buckets now
apply SSE-S3 by default, as described in section 5.16), but you can configure the AWS S3 encryption settings.
You should define which encryption method to use after answering the following questions:
 Who encrypts and decrypts the data?
 Who stores the secret key?
 Who manages the secret key?
Let’s look at the available AWS encryption methods for S3 objects stored in a bucket.
Server-side encryption
Server-Side Encryption (SSE) is the simplest data encryption option. All heavy encryption operations are
performed on the server side in the AWS cloud. You send raw (unencrypted) data to AWS and then data is
encrypted on the AWS side when recorded on the cloud storage. When you need to get your data back,
Amazon reads the encrypted data, decrypts the needed data on the Amazon server side, and then sends the
unencrypted data to you over the network. This process is transparent for end-users.
SSE-S3 is the simplest method – the keys are managed and handled by AWS to encrypt the data you have
selected.
SSE-KMS is a slightly different method from SSE-S3. AWS Key Management Service (KMS) is used to
encrypt S3 data on the Amazon server side. The data key is managed by AWS, but a user manages the
customer master key (CMK) in AWS KMS.
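
For illustration, SSE can also be requested on individual uploads. The sketch below is a minimal boto3 example, assuming a hypothetical bucket example-bucket and, for the SSE-KMS case, a hypothetical KMS key ARN:

import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"  # hypothetical bucket name

# SSE-S3: S3 encrypts the object with keys that it manages.
s3.put_object(
    Bucket=bucket,
    Key="reports/summary.txt",
    Body=b"hello",
    ServerSideEncryption="AES256",
)

# SSE-KMS: S3 encrypts the object with a KMS key that you manage.
s3.put_object(
    Bucket=bucket,
    Key="reports/summary-kms.txt",
    Body=b"hello",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="arn:aws:kms:us-east-1:111111111111:key/EXAMPLE-KEY-ID",  # hypothetical key ARN
)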

Client-side encryption
When using S3 client-side encryption, the client is responsible for all encryption operations. In this case,
data is not encrypted by AWS but rather it is encrypted on the user’s side. Data encrypted in the user’s
datacenter is uploaded directly to AWS. Two options are provided for S3 client-side encryption – a master
key can be stored on the client side or on the server side. If a master key is stored on the client side, the
client takes full responsibility for encryption. The advantage of this approach is that Amazon never knows
the encryption keys of the user and data is never stored on Amazon servers in an unencrypted state. A user
encrypts data before sending data to Amazon S3 and decrypts data after retrieving it from Amazon S3.

5.7 -Lab- S3 Encryption

How to Configure AWS S3 Encryption?


1. Log into the web interface of AWS. Your account must have enough permissions to edit S3 settings.
2. Go to the Amazon S3 page (the link may differ depending on your region and account):
https://s3.console.aws.amazon.com/s3/home

3. Select your bucket or create a new bucket for which you want to configure encryption settings.
4. On the page with the bucket settings, click the Properties tab and then click Default encryption.

5. The encryption settings are now open. By default, the S3 bucket encryption option is disabled.
6. Select the needed option, for example, AES-256. This is server-side encryption with Amazon S3-
managed keys (SSE-S3). You can view the bucket policy. Click Save to save the encryption settings
for the bucket. The settings will be used as the default S3 encryption settings for objects added to this
bucket in the future.

7. Click Save.
8. Now default encryption is set. All new objects stored in the S3 bucket will be encrypted according to
the set configuration. It is recommended that you enable encryption when creating a bucket. You can
also enable encryption later at the bucket level. However, if you configure encryption settings later,
these settings won't affect unencrypted files that have already been uploaded to the bucket.
9. If you want to select the AWS-KMS encryption, click the appropriate option. In this case, select a
key from the drop-down list.
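
The same default-encryption setting can be applied programmatically. A minimal boto3 sketch, assuming a hypothetical bucket example-bucket; it configures SSE-S3 (AES-256) as the bucket default, matching step 6 above:

import boto3

s3 = boto3.client("s3")

# Set SSE-S3 (AES-256) as the default encryption for all new objects in the bucket.
s3.put_bucket_encryption(
    Bucket="example-bucket",  # hypothetical bucket name
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)

# Read the configuration back to confirm it.
print(s3.get_bucket_encryption(Bucket="example-bucket"))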

5.8 S3 Security & Bucket Policies


An S3 bucket policy is an object that allows you to manage access to specific Amazon S3 storage
resources. You can specify permissions for each resource to allow or deny actions requested by a principal
(a user or role). When you create a new Amazon S3 bucket, you should set a policy granting the relevant
permissions to the data forwarder’s principal roles.
Bucket policies are an Identity and Access Management (IAM) mechanism for controlling access to
resources. They are a critical element in securing your S3 buckets against unauthorized access and attacks.

S3 Bucket Policy Elements


An Amazon S3 bucket policy contains the following basic elements:
 Statements—a statement is the main element in a policy. It consists of several elements, including
principals, resources, actions, and effects. Bucket policies typically contain an array of statements.
 Permitted principals—a principal is a user, entity, or account with access permissions to resources
and actions in a statement.
 Resources—Amazon S3 resources to which the policy applies include buckets, objects, jobs, and
access points. You can identify resources using ARNs(Amazon Resource Names).
 Actions—there are specific, permitted operations for each resource. You can use action keywords to
allow or deny operations.
 Effects—each request by a principal must generate an allow or deny effect. In the absence of an
explicit access permission to a resource, the policy will automatically deny the request.
 Conditions—these determine when the policy applies. You can specify conditions for access
policies using AWS-wide or S3-specific keys.
 Version—this determines the policy’s language version. This element is optional, allowing you to
specify a new language version instead of the old default version.
 ID—this optional element specifies a policy identifier. Policy IDs should be unique, with GUID
values.
 Statement ID (Sid)—this is an identifier that you can assign to policy statements. You may assign
Sid values to every statement in a policy. In AWS services like SNS and SQS, which allow you to
specify ID elements, the Sid values are sub-IDs of the policy’s ID. IAM requires the Sid values in a
JSON policy to be unique.
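
To make these elements concrete, below is a minimal example policy applied with boto3. The bucket name, account ID, and user are hypothetical; the single statement allows one IAM user (the principal) to read objects (the action) in the bucket (the resource):

import json
import boto3

s3 = boto3.client("s3")

# One Allow statement: a specific IAM user may GET any object in the bucket.
policy = {
    "Version": "2012-10-17",
    "Id": "ExamplePolicy01",
    "Statement": [
        {
            "Sid": "AllowUserReadObjects",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111111111111:user/example-user"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        }
    ],
}

s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(policy))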
5.9 -Lab - S3 Security & Bucket Policies. Creating and Editing a Bucket Policy
Here is a step-by-step guide to adding a bucket policy or modifying an existing policy via the Amazon S3
console. You can add a policy to an S3 bucket to provide IAM users and AWS accounts with access
permissions either to the entire bucket or to specific objects contained in the bucket. Object permissions are
limited to the specified objects.
To add or modify a bucket policy via the Amazon S3 console:
1. Go to the Amazon S3 console in the AWS management console
(https://console.aws.amazon.com/s3/).
2. Select the bucket to which you wish to add (or edit) a policy in the buckets list and select
permissions.
3. Enter your policy text (or edit the text) in the text box of the bucket policy editor. Bucket policies use
JSON files, so you must type or paste JSON text.
4. Once you’ve created your desired policy, select save changes.

To create a bucket policy with the AWS Policy Generator:


1. Open the policy generator and select S3 bucket policy under the select type of policy menu.
2. Populate the fields presented to add statements and then select generate policy. Copy the text of the
generated policy.
3. Go back to the edit bucket policy section in the Amazon S3 console and select edit under the policy
you wish to modify.
4. Once you’ve created your desired policy, select save changes.
Above the policy text field for each bucket in the Amazon S3 console, you will see an Amazon Resource
Name (ARN), which you can use in your policy. You can also preview the effect of your policy on cross-
account and public access to the relevant resource. You can check for findings in IAM Access Analyzer
before you save the policy.

5.10 - S3 Consistency Model


Data consistency ensures that all data in a database remains accurate and consistent throughout every
transaction. In distributed systems, such as Amazon S3, achieving consistency can be challenging due to the
system’s distributed nature.
Amazon S3's consistency model has evolved over time and now applies across all of its storage
classes. Prior to December 2020, Amazon S3 operated on an eventual consistency model for overwrite PUTs
and DELETEs. As of December 2020, AWS announced that S3 provides strong read-after-write
consistency automatically for all objects, including overwrite PUTs and DELETEs, without changes to
performance or availability, without sacrificing regional isolation for applications, and at no additional cost.
This means when you upload, update, or delete an object in your S3 bucket, the changes are instantly
propagated, and you can immediately read the object or confirm that it has been deleted.
Why is this Important?
Unpredictability in system behavior can lead to a lot of time spent debugging for software engineers and
data scientists. The new consistency model reduces this unpredictability, thus saving time.
Moreover, real-time data processing applications can immediately process new objects uploaded to Amazon
S3. Data replication solutions can also validate replication just after it’s completed.

How to Use Amazon S3 Data Consistency


The best part about S3’s strong consistency model is that it comes out-of-the-box. There are no specific
steps you need to take to enable or use it.
However, it’s essential to remember that while S3 ensures strong consistency, the application you’re
working on might require additional measures to ensure its consistency. For instance, if your application has
its own caching layer, you might have to invalidate the cache after a write operation to ensure the next read
operation fetches the latest data.

5.11 - S3 MFA Delete


AWS S3 is one of the most widely used and cost-effective storage services in AWS, and the objects it
stores are often among a company's most important data.
In AWS S3 we can optionally add another layer of protection by configuring buckets to enable MFA
Delete, which helps prevent accidental deletion of a bucket's contents.
What is MFA delete?
 Securing objects from accidental deletion is a major concern.
 To avoid such scenarios, AWS provides a feature that can be enabled on S3 buckets by applying
MFA.
 MFA (multi-factor authentication) adds an extra layer of security, as described below.

Authentication required for MFA delete


 When configuring MFA delete on a bucket, two levels of authentication are required:
- Your security credentials
- A six-digit code from an approved authentication device, such as Google Authenticator

Note: Currently this option is only available via use the AWS CLI, AWS SDK, or the Amazon S3 REST
API.

5.12 - Lab - S3 MFA Delete


Enable MFA on S3 Bucket
1- Create S3 Bucket with Versioning enabled.

Note: Versioning is covered in sections 5.4 and 5.5 above; the bucket must have versioning enabled
before MFA Delete can be configured.
Note: Enabling versioning on S3 buckets can be done by IAM users, but activating and deactivating
MFA Delete can only be done using the root account.

2- Install Google/MS Authenticator [I prefer these 2]


 Install Google/MS Authenticator on your mobile device and configure MFA for the root account, as
we are going to use this MFA code to enable and disable MFA Delete.
 Once MFA is configured, it is time to configure MFA Delete on the S3 bucket.

3- You need to pass the root account's MFA device serial number and current MFA token value. (A
separate CLI profile was created for the root account.)

$ aws s3api put-bucket-versioning --profile shashank-profile --bucket bucket-name --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "arn:aws:iam::111111111111:mfa/shashank 897699"

Once we execute the above command, the bucket will be in versioning mode with the MFA Delete layer
enabled for that bucket.

Verify MFA delete With CLI

aws s3api get-bucket-versioning --bucket bucket-name --profile shashank-profile

Note: MFA Delete applies to versioned objects. If you delete a file without MFA, a delete marker is
added but all of its versions are kept; permanently deleting a version requires MFA.

Disable MFA delete on S3 bucket

Let's say you want to keep versioning but want to remove MFA delete from the S3 bucket.
This can be done with the command listed below.

aws s3api put-bucket-versioning --profile shashank-profile --bucket bucket-name --versioning-configuration Status=Enabled,MFADelete=Disabled --mfa "arn:aws:iam::111111111111:mfa/shashank 897698"

aws s3api get-bucket-versioning --bucket bucket-name --profile shashank-profile


5.13 - S3 Websites

→ AWS provides us with the facility to host static websites in S3.

→ For that go to AWS console → Search S3 service → Go to your S3 bucket


NOTE: If you have not created an S3 bucket, click Create bucket in the S3 service.
→ We need to upload the index.html file. This file should contain the HTML code that will render on the
webpage.
→ Go to the Permissions tab → Uncheck the Block Public Access setting

→ Go to the Properties tab → Scroll to the bottom → Enable static website hosting → Save


→ Once the website is enabled, we get the URL of our static website. Hit it in the browser.

S3 provides a great way to easily host static websites. Many professionals want a portfolio or a basic
website for their business, and AWS S3 is an excellent fit for that.
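
The same setup can be scripted. A minimal boto3 sketch, assuming a hypothetical bucket example-bucket whose public access settings have already been relaxed as described above; it uploads index.html and enables static website hosting:

import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"  # hypothetical bucket name

# Upload the page and mark it as HTML so browsers render it.
s3.upload_file("index.html", bucket, "index.html",
               ExtraArgs={"ContentType": "text/html"})

# Enable static website hosting with index.html as the index document.
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# The website endpoint follows this pattern (the region part depends on your bucket's Region).
print(f"http://{bucket}.s3-website.us-east-1.amazonaws.com")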

5.14 S3 CORS
Cross-origin resource sharing (CORS) defines a way for client web applications that are loaded in one
domain to interact with resources in a different domain. With CORS support, you can build rich client-side
web applications with Amazon S3 and selectively allow cross-origin access to your Amazon S3 resources.
This section provides an overview of CORS. The subtopics describe how you can enable CORS using the
Amazon S3 console, or programmatically by using the Amazon S3 REST API and the AWS SDKs.
The following are example use case scenarios for using CORS.

Scenario 1
Suppose that you are hosting a website in an Amazon S3 bucket named website as described in Hosting a
static website using Amazon S3. Your users load the website endpoint:
http://website.s3-website.us-east-1.amazonaws.com
Now you want to use JavaScript on the webpages that are stored in this bucket to be able to make
authenticated GET and PUT requests against the same bucket by using the Amazon S3 API endpoint for the
bucket, website.s3.us-east-1.amazonaws.com. A browser would normally block JavaScript from allowing
those requests, but with CORS you can configure your bucket to explicitly enable cross-origin requests from
website.s3-website.us-east-1.amazonaws.com.

Scenario 2
Suppose that you want to host a web font from your S3 bucket. Again, browsers require a CORS check
(also called a preflight check) for loading web fonts. You would configure the bucket that is hosting the web
font to allow any origin to make these requests.

5.15 Lab - S3 CORS


Configuring cross-origin resource sharing (CORS)
Cross-origin resource sharing (CORS) defines a way for client web applications that are loaded in one
domain to interact with resources in a different domain. With CORS support, you can build rich client-side
web applications with Amazon S3 and selectively allow cross-origin access to your Amazon S3 resources.
To configure your bucket to allow cross-origin requests, you add a CORS configuration to the bucket. A
CORS configuration is a document that defines rules that identify the origins that you will allow to access
your bucket, the operations (HTTP methods) supported for each origin, and other operation-specific
information. In the S3 console, the CORS configuration must be a JSON document.

Using the S3 console


This section explains how to use the Amazon S3 console to add a cross-origin resource sharing (CORS)
configuration to an S3 bucket.
When you enable CORS on the bucket, the access control lists (ACLs) and other access permission
policies continue to apply.

Important
In the new S3 console, the CORS configuration must be JSON. For example CORS configurations in
JSON and XML, see CORS configuration.

To add a CORS configuration to an S3 bucket


1. Sign in to the AWS Management Console and open the Amazon S3 console at
https://console.aws.amazon.com/s3/
 In the Buckets list, choose the name of the bucket that you want to create a bucket policy for.
 Choose Permissions.
 In the Cross-origin resource sharing (CORS) section, choose Edit.
 In the CORS configuration editor text box, type or copy and paste a new CORS configuration, or
edit an existing configuration.
 The CORS configuration is a JSON file. The text that you type in the editor must be valid JSON. For
more information, see CORS configuration.
 Choose Save changes.
Note: Amazon S3 displays the Amazon Resource Name (ARN) for the bucket next to the CORS
configuration editor title.
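
For reference, the same kind of rule can be applied with boto3; the configuration body mirrors the JSON you would paste into the console editor. The bucket name and allowed origin below are hypothetical:

import boto3

s3 = boto3.client("s3")

# Allow GET and PUT requests from one website origin, with any request headers.
s3.put_bucket_cors(
    Bucket="example-bucket",  # hypothetical bucket name
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedOrigins": ["http://website.s3-website.us-east-1.amazonaws.com"],
                "AllowedMethods": ["GET", "PUT"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    },
)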

5.16 S3 Default Encryption


All Amazon S3 buckets have encryption configured by default, and objects are automatically encrypted by
using server-side encryption with Amazon S3 managed keys (SSE-S3). This encryption setting applies to all
objects in your Amazon S3 buckets.
If you need more control over your keys, such as managing key rotation and access policy grants, you can
choose to use server-side encryption with AWS Key Management Service (AWS KMS) keys (SSE-KMS),
or dual-layer server-side encryption with AWS KMS keys (DSSE-KMS). For more information about
editing KMS keys, see Editing keys in AWS Key Management Service Developer Guide.
When you configure your bucket to use default encryption with SSE-KMS, you can also enable S3 Bucket
Keys to decrease request traffic from Amazon S3 to AWS KMS and reduce the cost of encryption. To
identify buckets that have SSE-KMS enabled for default encryption, you can use Amazon S3 Storage Lens
metrics. S3 Storage Lens is a cloud-storage analytics feature that you can use to gain organization-wide
visibility into object-storage usage and activity.
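
As a sketch of that configuration, the boto3 call below sets SSE-KMS with an S3 Bucket Key as the bucket default. The bucket name and KMS key ARN are hypothetical placeholders:

import boto3

s3 = boto3.client("s3")

# Default-encrypt new objects with a customer-managed KMS key and enable
# S3 Bucket Keys to reduce the number of requests made to AWS KMS.
s3.put_bucket_encryption(
    Bucket="example-bucket",  # hypothetical bucket name
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:us-east-1:111111111111:key/EXAMPLE-KEY-ID",  # hypothetical
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)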

5.17 S3 Access Logs

AWS S3 provides log information regarding access to buckets and their objects. AWS S3 logs record the
following: bucket owner, bucket, time, remote IP, requester, operation, request ID, request URI, key, error
code, bytes sent, HTTP status, total time, object size, turnaround time, user agent, referrer, host ID, version
ID, cipher suite, signature version, authentication type, TLS version, and host header.
When access to an object is requested, users can use this information to identify the origin of the requester.
You can check if unauthorized agents have accessed any resources or identify a resource with an unusually
high number of downloads. You can also determine whether the turnaround time for receiving a file is
within the expectations of applications and users. In addition, this information can help you understand how
an application is being used by showing which resources and object versions are being requested.

5.18 Lab - S3 Access Logs

Enabling Logging for Bucket Objects


To use S3 logs, you first need to create one bucket to store files (objects) and another to store the logs. Both
should be created in the same Region. It is good practice not to save the logs in the same bucket as the
objects: the logs record the interactions that the source bucket receives, and if that bucket has a problem, the
log data describing what caused the error might not be saved.


After you’ve created the buckets, go to the Properties of the bucket that will store the files to associate it
with the bucket for logs. On the Properties page, click on the Edit button in the Server access logging box.
In this form, select Enable to allow the bucket to provide log data about stored objects, then click
on Browse S3 to select the log bucket.

In the modal, select the proper bucket and click on Choose path. Back in the form, click on Save
changes to apply the association between the buckets. Clicking that button is all you need to do to start
saving object usage logs.
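
The same association can be made programmatically once both buckets exist. A minimal boto3 sketch, assuming hypothetical bucket names example-bucket (source) and example-log-bucket (target), and assuming the target bucket already grants the S3 log delivery service permission to write to it:

import boto3

s3 = boto3.client("s3")

# Send access logs for example-bucket to example-log-bucket under the
# access-logs/ prefix. The target bucket must allow log delivery.
s3.put_bucket_logging(
    Bucket="example-bucket",  # hypothetical source bucket
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "example-log-bucket",  # hypothetical log bucket
            "TargetPrefix": "access-logs/",
        }
    },
)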

5.19 S3 Replication (Cross Region and Same Region)


Replication enables automatic, asynchronous copying of objects across Amazon S3 buckets. Buckets that are
configured for object replication can be owned by the same AWS account or by different accounts. You can replicate
objects to a single destination bucket or to multiple destination buckets. The destination buckets can be in different
AWS Regions or within the same Region as the source bucket.
When to use Cross-Region Replication
S3 Cross-Region Replication (CRR) is used to copy objects across Amazon S3 buckets in different AWS
Regions. CRR can help you do the following:
 Meet compliance requirements – Although Amazon S3 stores your data across multiple
geographically distant Availability Zones by default, compliance requirements might dictate that you
store data at even greater distances. To satisfy these requirements, use Cross-Region Replication to
replicate data between distant AWS Regions.
 Minimize latency – If your customers are in two geographic locations, you can minimize latency in
accessing objects by maintaining object copies in AWS Regions that are geographically closer to
your users.

 Increase operational efficiency – If you have compute clusters in two different AWS Regions that
analyze the same set of objects, you might choose to maintain object copies in those Regions.

When to use Same-Region Replication


Same-Region Replication (SRR) is used to copy objects across Amazon S3 buckets in the same AWS
Region. SRR can help you do the following:
 Aggregate logs into a single bucket – If you store logs in multiple buckets or across multiple
accounts, you can easily replicate logs into a single, in-Region bucket. Doing so allows for simpler
processing of logs in a single location.

 Configure live replication between production and test accounts – If you or your customers have
production and test accounts that use the same data, you can replicate objects between those multiple
accounts, while maintaining object metadata.
 Abide by data sovereignty laws – You might be required to store multiple copies of your data in
separate AWS accounts within a certain Region. Same-Region Replication can help you
automatically replicate critical data when compliance regulations don't allow the data to leave your
country.

5.20 Lab - S3 Replication

The AWS S3 Replication process can be easily carried out by using a Replication rule.

Setting up AWS S3 Replication to another S3 bucket is done by adding a Replication rule to the
source bucket. If you need to replicate to a bucket belonging to a different account, you will also need to
set up certain bucket policies on the destination bucket. Let us begin adding a Replication rule.

 Step 1: Sign in to the AWS S3 management console and choose the name of the bucket you want.
 Step 2: Select Replication in the Management section as shown below, and click Add rule.

 Step 3: We will replicate the whole bucket in this case. Choose the entire bucket as given below.

In case you choose to replicate buckets encrypted using AWS Key Management Service, you will need to
select the correct key at this stage.

 Step 4: The next step is to select the destination. Select buckets in this account using the radio button as
below.
If you need to replicate to another account, select the other option. In this case, AWS will warn
you about the bucket policies that should exist at the other end, since it cannot verify them. You will be
provided with a bucket policy that you need to apply at the destination.

 Step 5: If you need to change the storage class of the destination object, do it through the drop-down in
destination options as shown below.

You will also find a checkbox to enable Replication Time Control. This option ensures that 99.99% of all
objects will be replicated within a service level agreement of 15 minutes. Please note that this incurs
additional fees.

 Step 6: Create a new IAM role for this transfer as below.


If you already have a role with Replication permission, it can be used.

 Step 7: Set the status of the Replication rule and click next to create the rule.

As soon as you create the rule with Enabled status, replication will start working. You can go into your
destination bucket after a few minutes and confirm that replication is indeed working.
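
The same rule can be expressed in code. A minimal boto3 sketch, assuming hypothetical source and destination buckets (both with versioning enabled) and a hypothetical IAM role ARN with replication permissions:

import boto3

s3 = boto3.client("s3")

# Replicate every object in the source bucket to the destination bucket,
# changing the storage class to STANDARD_IA on the replica.
s3.put_bucket_replication(
    Bucket="example-source-bucket",  # hypothetical source bucket (versioning enabled)
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111111111111:role/example-replication-role",  # hypothetical role
        "Rules": [
            {
                "ID": "replicate-entire-bucket",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter = apply to the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::example-destination-bucket",  # hypothetical destination
                    "StorageClass": "STANDARD_IA",
                },
            }
        ],
    },
)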

5.21 S3 Pre-signed URLs


A Presigned URL is a time-limited URL that grants temporary access permissions to an S3 object. It is a
signed URL that contains authentication information and specifies the operations permitted on the object.
By generating a Presigned URL, we can delegate access to specific S3 resources to other users or make
things publicly accessible for a limited period.
Below is an example Presigned URL:

https://bucket.s3.region.amazonaws.com/Ninjafile.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=random-aws-credential-to-identify-the-signer&X-Amz-Date=timestamp-of-generation-of-url&X-Amz-Expires=validity-from-generation-timestamp&X-Amz-Signature=6zzca349-f6c2-38bd-98ce-4bs464fb45cc&X-Amz-SignedHeaders=host

Parameters
Looking carefully at the above URL, we can see the following parameters. AWS Software Development
Kit automatically generates these.
 X-AMZ-Algorithm: Specifies the encryption algorithm used for authentication in AWS requests.

 X-AMZ-Credential: Contains the AWS access key and security token used to authenticate the request.

 X-AMZ-Date: The date and time at which the request was made, formatted according to AWS
standards.

 X-AMZ-Expires: Specifies the expiration time for the request, after which it is no longer valid.

 X-AMZ-Signature: The cryptographic signature generated using the request data, credentials, and
specified algorithm, used for request authentication.

 X-AMZ-SignedHeaders: Lists the headers included in the request that is part of the signature,
ensuring their integrity and authenticity.

When a user attempts to access S3 files using a Presigned URL, S3 validates the signature by computing it
with the provided credentials, including any optional SignedHeaders parameter. It then verifies the
signature's validity and checks if the link has expired before granting access to the requested resource.

5.22 Lab - S3 Pre-signed URLs


Generating Presigned URL using Python for S3 Bucket
To generate a Presigned URL, we first need to install the boto3 package in Python. It is the official AWS
Software Development Kit (SDK) for Python. Type the below command to install boto3:

Command
pip install boto3

Now type the following command in Python IDE to generate a Presigned URL:
Code
import boto3

AWS_S3_REGION = 'ap-south-1'
AWS_S3_BUCKET_NAME = "Ninja_s3_bucket"
AWS_S3_FILE_NAME = "Ninjafile.jpg"
PRESIGNED_URL_EXPIRY = 3600  # in seconds

# Replace these placeholders with your own credentials, or omit them to let
# boto3 pick up credentials from the environment or ~/.aws/credentials.
AWS_ACCESS_KEY_ID = "your-access-key-id"
AWS_SECRET_ACCESS_KEY = "your-secret-access-key"

s3_client = boto3.client('s3',
                         region_name=AWS_S3_REGION,
                         aws_access_key_id=AWS_ACCESS_KEY_ID,
                         aws_secret_access_key=AWS_SECRET_ACCESS_KEY)

presigned_url = s3_client.generate_presigned_url(
    'get_object',
    Params={"Bucket": AWS_S3_BUCKET_NAME, "Key": AWS_S3_FILE_NAME},
    ExpiresIn=PRESIGNED_URL_EXPIRY)

if presigned_url:
    print("Presigned URL: ", presigned_url)

Explanation
Let’s see what is happening in the above code:
 The boto3 library is imported to interact with AWS services.

 Constants such as AWS_S3_REGION, AWS_S3_BUCKET_NAME, AWS_S3_FILE_NAME, and
PRESIGNED_URL_EXPIRY are defined. These values represent the AWS S3 region, bucket name,
file name/key, and the expiration time (in seconds) for the Presigned URL.

 An S3 client is created using boto3.client() by passing in the necessary parameters such as the service
name ('s3'), AWS access key ID, AWS secret access key, and region name.

 Then the generate_presigned_url method is invoked. The method is called with the operation name
('get_object') and a dictionary containing the parameters 'Bucket' (the S3 bucket name) and 'Key' (the
S3 object key/file name). The 'ExpiresIn' parameter specifies the duration for which the Presigned URL
will be valid.

 If the Presigned URL is successfully generated, it is printed to the console.

5.23 S3 Storage Classes + Glacier

Amazon S3 Storage Classes:


S3 storage classes differ in cost, availability, and intended access patterns. S3 contains four main types of
storage classes:
o S3 Standard
o S3 Standard IA
o S3 one zone-infrequent access
o S3 Glacier

1. S3 Standard
o Standard storage class stores the data redundantly across multiple devices in multiple facilities.
o It is designed to sustain the loss of 2 facilities concurrently.
o Standard is the default storage class if no storage class is specified during upload.
o It provides low latency and high throughput performance.
o It is designed for 99.99% availability and 99.999999999% durability.

2. S3 Standard IA
o IA stands for infrequently accessed.
o Standard IA storage class is used when data is accessed less frequently but requires rapid access
when needed.
o It has a lower storage fee than S3 Standard, but you will be charged a retrieval fee.
o It is designed to sustain the loss of 2 facilities concurrently.
o It is mainly used for larger objects greater than 128 KB kept for at least 30 days.
o It provides low latency and high throughput performance.
o It is designed for 99.9% availability and 99.999999999% durability.

3. S3 one zone-infrequent access


o S3 one zone-infrequent access storage class is used when data is accessed less frequently but requires
rapid access when needed.
o It stores the data in a single availability zone while other storage classes store the data in a minimum
of three availability zones. Due to this reason, its cost is 20% less than Standard IA storage class.
o It is an optimal choice for the less frequently accessed data but does not require the availability of
Standard or Standard IA storage class.
o It is a good choice for storing the backup data.
o It is a cost-effective option for storing secondary copies of data that are replicated from another AWS
Region using S3 Cross-Region Replication.
o It has the same durability, high performance, and low latency, with a low storage price and low
retrieval fee.
o It is designed for 99.5% availability and 99.999999999% durability of objects in a single availability
zone.
o It provides lifecycle management for the automatic migration of objects to other S3 storage classes.
o Data can be lost if an availability zone is destroyed, because the data is stored in a single
availability zone.
4. S3 Glacier
o S3 Glacier storage class is the cheapest storage class, but it can be used for archive only.
o You can store any amount of data at a lower cost than other storage classes.
o S3 Glacier provides three types of models:
o Expedited: In this model, data is retrieved within a few minutes, and it has a higher fee.
o Standard: The retrieval time of the standard model is 3 to 5 hours.
o Bulk: The retrieval time of the bulk model is 5 to 12 hours.
o You can upload objects directly to S3 Glacier.
o It is designed for 99.999999999% durability of objects across multiple availability zones.
Performance across the Storage classes

                                   | S3 Standard   | S3 Standard-IA   | S3 One Zone-IA   | S3 Glacier
Designed for durability            | 99.999999999% | 99.999999999%    | 99.999999999%    | 99.999999999%
Designed for availability          | 99.99%        | 99.9%            | 99.5%            | N/A
Availability SLA                   | 99.9%         | 99%              | 99%              | N/A
Availability zones                 | >=3           | >=3              | 1                | >=3
Minimum capacity charge per object | N/A           | 128 KB           | 128 KB           | 40 KB
Minimum storage duration           | N/A           | 30 days          | 30 days          | 90 days
Retrieval fee                      | N/A           | per GB retrieved | per GB retrieved | per GB retrieved
First byte latency                 | milliseconds  | milliseconds     | milliseconds     | select minutes or hours
Storage type                       | Object        | Object           | Object           | Object
Lifecycle transitions              | Yes           | Yes              | Yes              | Yes

5.24 S3 Lifecycle Rules

Lifecycle Management is used so that objects are stored cost-effectively throughout their lifecycle.
A lifecycle configuration is a set of rules that define the actions applied by S3 to a group of objects.
The lifecycle defines two types of actions:
o Transition actions: Define when objects transition to another storage class. For example, you might
choose to transition objects to the Standard IA storage class 30 days after you have created them, or
archive objects to the Glacier storage class 60 days after you have created them.
o Expiration actions: Define when objects expire; Amazon S3 then deletes the expired objects on your
behalf.

Suppose a business generates a lot of data in the form of text files, images, audio, or video, and the data is
relevant for 30 days only. After that, you might want to transition it from Standard to Standard IA, as the
storage cost is lower. After 60 days, you might want to transition it to the Glacier storage class for long-term
archival, or perhaps expire the objects completely. Amazon provides a service known as Lifecycle
Management for this, and this service exists within the S3 bucket.

Lifecycle policies:
o Use Lifecycle rules to manage your objects: You can manage the Lifecycle of an object by using a
Lifecycle rule that defines how Amazon S3 manages objects during their lifetime.
o Automate transition to tiered storage: Lifecycle allows you to transition objects to Standard IA
storage class automatically and then to the Glacier storage class.
o Expire your objects: Using Lifecycle rule, you can automatically expire your objects.
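
Putting the two action types together, the boto3 sketch below defines one rule that transitions objects to Standard IA after 30 days, to Glacier after 60 days, and expires them after 425 days. The bucket name is a hypothetical placeholder:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-then-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # transition action
                    {"Days": 60, "StorageClass": "GLACIER"},      # transition action
                ],
                "Expiration": {"Days": 425},                       # expiration action
            }
        ]
    },
)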

5.25 Lab - S3 Lifecycle Rules

Creation of Lifecycle rule


o Sign in to the AWS Management console.
o Click on the S3 service
o Create a new bucket in S3.
o Enter the bucket name and then click on the Next button.
o Now, you can configure the options, i.e., you can set the versioning, server access logging, etc. I
leave all the settings as default and then click on the Next button.

o Set the permissions. I leave all the permissions as default and then click on the Next button.
o Click on the Create bucket button.
o Finally, the new bucket is created whose name is "javatpointlifecycle".

o Click on the javatpointlifecycle bucket.


From the above screen, we observe that the bucket is empty. Before uploading the objects in a bucket, we
first create the policy.

o Move to the Management tab; we will use the Lifecycle section here.

o Add Lifecycle rule and then enter the rule name. Click on the Next.
o You can create the storage class transition in both the current version and the previous version.
Initially, I create the transition in the current version. Check the current version and then click on
the Add transition.

First transition: 30 days after the creation of an object, the object's storage class is converted to the Standard-
Infrequent Access storage class.

o Similarly, we can do the same for the previous version objects. Check the "previous version" and
then "Add transitions". Click on the Next.
o Now, we expire the objects after their creation. Suppose we expire the current and previous version
objects 425 days after their creation. Click on the Next.

o The Lifecycle rule is shown given below:


o Click on the Save.

The above screen shows that "Lifecyclerule" has been created.

5.26 S3 Analytics

By using Amazon S3 analytics -Storage Class Analysis you can analyze storage access patterns to help
you decide when to transition the right data to the right storage class. This new Amazon S3 analytics feature
observes data access patterns to help you determine when to transition less frequently accessed
STANDARD storage to the STANDARD_IA (IA, for infrequent access) storage class.
After storage class analysis observes the infrequent access patterns of a filtered set of data over a period of
time, you can use the analysis results to help you improve your lifecycle configurations. You can configure
storage class analysis to analyze all the objects in a bucket. Or, you can configure filters to group objects
together for analysis by common prefix (that is, objects that have names that begin with a common string),
by object tags, or by both prefix and tags. You'll most likely find that filtering by object groups is the best
way to benefit from storage class analysis.
Important : Storage class analysis only provides recommendations for Standard to Standard IA classes.
You can have multiple storage class analysis filters per bucket, up to 1,000, and you will receive a separate
analysis for each filter. Multiple filter configurations allow you to analyze specific groups of objects to
improve the lifecycle configurations that transition objects to STANDARD_IA.
Storage class analysis provides storage usage visualizations in the Amazon S3 console that are updated
daily. You can also export this daily usage data to an S3 bucket and view it in a spreadsheet application,
or with business intelligence tools, like Amazon QuickSight.
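
A storage class analysis filter can also be created programmatically. The boto3 sketch below is one possible configuration, assuming hypothetical bucket names and a logs/ prefix filter; it exports the daily analysis as CSV to a second bucket:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_analytics_configuration(
    Bucket="example-bucket",  # hypothetical bucket to analyze
    Id="logs-analysis",
    AnalyticsConfiguration={
        "Id": "logs-analysis",
        "Filter": {"Prefix": "logs/"},  # analyze only objects under the logs/ prefix
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::example-analysis-results",  # hypothetical results bucket
                        "Prefix": "storage-class-analysis/",
                    }
                },
            }
        },
    },
)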

5.27 S3 Performance

AWS S3 provides great performance. It automatically scales to high request rates, with a very low latency
of 100–200 milliseconds.
Your application can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests
per second per prefix in a bucket, and there is no limit on the number of prefixes. A prefix is simply an object
path; for example, for the object /folder1/sub1/file, the prefix is folder1/sub1.

Let’s talk about improving S3 performance more. Below are the ways available to achieve higher AWS S3
performance:

Multi-Part Upload:
For files greater than 5 GB in size, it is mandatory to use multi-part upload, and it is recommended for
files greater than 100 MB as well. What does multi-part upload do? It parallelizes the upload, thus speeding
up transfers.

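A minimal boto3 sketch of a multi-part upload, assuming a hypothetical local file backup.tar and bucket example-bucket; the TransferConfig below makes upload_file split the file into 100 MB parts and upload up to 10 parts in parallel:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Files above the threshold are uploaded as parallel multi-part uploads.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # use multi-part above 100 MB
    multipart_chunksize=100 * 1024 * 1024,  # 100 MB parts
    max_concurrency=10,                     # up to 10 parts in flight at once
)

s3.upload_file("backup.tar", "example-bucket", "backups/backup.tar", Config=config)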

S3 Transfer Acceleration:
S3 Transfer Acceleration works for both uploads and downloads. It increases transfer speed by transferring the
file to an AWS edge location, which forwards the data to the S3 bucket in the target region. It is also
compatible with multi-part upload. The transfer to the edge location uses the public network, and the transfer
from the edge location to the S3 bucket uses the fast, private AWS network. Hence, it minimizes the use of the
public network and maximizes the use of the AWS private network to improve S3 performance.

To implement this, open the S3 console and go to your bucket of choice. Click on the Properties tab and
find the Transfer acceleration section. All you have to do is choose Enabled and save.


S3 Byte-Range Fetches:
How about reading a file in the most efficient way? AWS has an option called S3 Byte-Range
Fetches for this. It parallelizes GETs by requesting specific byte ranges, and it provides better
resilience in case of failure. Hence, it can be used to speed up downloads.


The second use case for S3 Byte-Range Fetches is retrieving only partial data. For
example, when you know the first XX bytes of a file are its header, you can fetch only that range.
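
A minimal boto3 sketch of a byte-range fetch, assuming a hypothetical bucket and key; it reads only the first 1 KB of the object, for example to inspect a file header:

import boto3

s3 = boto3.client("s3")

# Fetch only bytes 0-1023 of the object instead of downloading the whole file.
response = s3.get_object(
    Bucket="example-bucket",    # hypothetical bucket name
    Key="logs/large-file.log",  # hypothetical key
    Range="bytes=0-1023",
)
header_bytes = response["Body"].read()
print(len(header_bytes), "bytes fetched")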
S3 Select:
Apart from this, we can also use S3 Select. It retrieves less data by using SQL to perform server-side
filtering. This results in less network transfer and, therefore, less client-side CPU cost.
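
A minimal boto3 sketch of S3 Select, assuming a hypothetical CSV object with a header row; the SQL expression is evaluated on the server, so only matching rows cross the network:

import boto3

s3 = boto3.client("s3")

response = s3.select_object_content(
    Bucket="example-bucket",  # hypothetical bucket name
    Key="data/records.csv",   # hypothetical CSV object with a header row
    ExpressionType="SQL",
    Expression="SELECT s.col1, s.col3 FROM S3Object s WHERE CAST(s.col3 AS FLOAT) > 10",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The result arrives as an event stream; Records events carry the filtered rows.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())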

5.28 S3 Event Notifications

You can use the Amazon S3 Event Notifications feature to receive notifications when certain
events happen in your S3 bucket. To enable notifications, add a notification configuration that
identifies the events that you want Amazon S3 to publish. Make sure that it also identifies the
destinations where you want Amazon S3 to send the notifications. You store this configuration
in the notification subresource that's associated with a bucket.

Overview of Amazon S3 Event Notifications


Currently, Amazon S3 can publish notifications for the following events:
 New object created events
 Object removal events
 Restore object events
 Reduced Redundancy Storage (RRS) object lost events
 Replication events
 S3 Lifecycle expiration events
 S3 Lifecycle transition events
 S3 Intelligent-Tiering automatic archival events
 Object tagging events
 Object ACL PUT events

Amazon S3 can send event notification messages to the following destinations. You specify the Amazon
Resource Name (ARN) value of these destinations in the notification configuration.
 Amazon Simple Notification Service (Amazon SNS) topics
 Amazon Simple Queue Service (Amazon SQS) queues
 AWS Lambda function
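
As a sketch, the boto3 call below subscribes an SQS queue to new-object events for .jpg uploads. The bucket name and queue ARN are hypothetical, and the queue's access policy must already allow S3 to send messages to it:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="example-bucket",  # hypothetical bucket name
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:111111111111:example-queue",  # hypothetical queue
                "Events": ["s3:ObjectCreated:*"],  # fire on every new object created
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}
                },
            }
        ]
    },
)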

5.29 Athena Overview

Amazon Athena is a serverless, interactive query service that allows you to analyze data in Amazon S3
using SQL. It enables you to analyze large amounts of data stored in S3 with a pay-per-query pricing model,
making it cost-effective for querying data sets that are infrequently accessed. Athena supports several data
formats including CSV(Comma Separated Value), JSON(JavaScript Object Notation), ORC(Optimized Row
Columnar), Avro, and Parquet. It also integrates with other AWS services such as Amazon QuickSight for
data visualization, and AWS Glue for data cataloging. With Athena, you can query data in S3 without the
need to move or load the data into a separate data store, making it easy to analyze large amounts of data
stored in S3.
Features of Athena
Some of its features include:
 Serverless: Athena is a fully managed service, so there is no infrastructure to provision or manage.
 Interactive querying: Athena allows you to run ad-hoc queries and get results in seconds.
 Standard SQL: Athena supports standard SQL, making it easy for users who are familiar with SQL to
get started.
 Scalable: Athena can handle large amounts of data and concurrent queries, and automatically scales up
and down based on query demand.
 Integrations: Athena integrates with other AWS services such as Amazon QuickSight, Amazon
Redshift Spectrum, and AWS Glue.
 Low cost: Athena charges only for the amount of data scanned per query, so you pay only for what
you use.
 Secure: Athena encrypts data at rest and in transit, and integrates with AWS Identity and Access
Management (IAM) for fine-grained access control.

5.30 Lab - Athena

AWS Athena is a serverless, interactive query service that allows you to analyze data in Amazon S3 using
standard SQL. Here is an example of how to use Athena to query data stored in S3:
1. Create an S3 bucket and upload your data files to it.
2. Create a new table in Athena using the CREATE TABLE statement. You will need to specify the S3
bucket location and the format of your data. For example:
CREATE EXTERNAL TABLE mydatabase.mytable
(col1 INT, col2 STRING, col3 DOUBLE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 's3://mybucket/data/'
3. Run a query against your table using the SELECT statement. For example:
SELECT col1, col2, col3 FROM mydatabase.mytable WHERE col1 > 10;
4. Athena will return the results of your query, which can then be displayed or saved to a new table.

You can also use AWS Glue crawlers to create, update, and delete tables in Athena automatically.
AWS Athena charges for the amount of data scanned by each query, so it is recommended to use query
optimization techniques, such as partitioning and compressing data, to minimize the amount of data scanned
and reduce costs.
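The same query can also be run programmatically. A minimal sketch, assuming Python with boto3; the database, table, and S3 output location reuse the placeholder names from the steps above (Athena requires an S3 location for query results):

import time
import boto3

athena = boto3.client("athena")

# Start the query; Athena writes the results to the given S3 output location.
start = athena.start_query_execution(
    QueryString="SELECT col1, col2, col3 FROM mydatabase.mytable WHERE col1 > 10",
    QueryExecutionContext={"Database": "mydatabase"},
    ResultConfiguration={"OutputLocation": "s3://mybucket/athena-results/"},
)
query_id = start["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
state = "RUNNING"
while state in ("QUEUED", "RUNNING"):
    time.sleep(1)
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([field.get("VarCharValue") for field in row["Data"]])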

5.31 AWS Snow Family Overview

AWS Snow Family is a group of devices that transport data into and out of AWS. AWS Snow Family
devices are physical devices. They can transfer up to exabytes of data.

One exabyte is 1 000 000 000 000 megabytes.


The AWS Snow Family includes three device types:
 AWS Snowcone
 AWS Snowball
 AWS Snowmobile

1. AWS Snowcone
AWS Snowcone is a small, secure device for transferring data. It provides 8 TB of storage space, 4
GB of memory, and 2 CPUs.

2. AWS Snowball
AWS Snowball offers two types of devices, described below.

Snowball Edge Storage Optimized devices:
 Great for large-scale data migrations.
 Have 80 TB of HDD storage space for object storage.
 Have 1 TB of SSD storage for block volumes.

Snowball Edge Compute Optimized devices:
 Great for services that require a large amount of computing resources.
 Have 42 TB of HDD storage for object storage, and 7.68 TB of NVMe SSD storage space for AWS EBS block volumes.
 Work with 208 GiB of memory and 52 vCPUs.

3. AWS Snowmobile
AWS Snowmobile moves very large amounts of data to AWS. It can transfer up to 100 petabytes of data.
One petabyte is 1 000 000 000 megabytes.

5.32 Amazon FSx

Amazon FSx makes it easy and cost effective to launch, run, and scale feature-rich, high-performance file
systems in the cloud. It supports a wide range of workloads with its reliability, security, scalability, and
broad set of capabilities. Amazon FSx is built on the latest AWS compute, networking, and disk
technologies to provide high performance and a lower TCO (Total Cost of Ownership). TCO is the
combination of a product's purchase price and its total cost of operation. As a fully managed service,
Amazon FSx handles hardware provisioning, patching, and backups, freeing you up to focus on your
applications, your end users, and your business.

Amazon FSx lets you choose between four widely-used file systems: NetApp ONTAP, OpenZFS,
Windows File Server, and Lustre. This choice is typically based on your familiarity with a given file system
or by matching the file system's feature sets, performance profiles, and data management capabilities to the
requirements of your workload.

5.33 Lab - Amazon FSx

Before you create your Amazon FSx file system, create an Amazon Elastic Compute Cloud (Amazon
EC2) instance and an AWS Directory Service directory, if you don't have them set up already.

To create your first file system


1. Open the Amazon FSx console at https://console.aws.amazon.com/fsx/.
2. On the dashboard, choose Create file system to start the file system creation wizard.
3. On the Select file system type page, choose FSx for Windows File Server, and then choose Next.
The Create file system page appears.
4. In the File system details section, provide a name for your file system. It's easier to find and manage
your file systems when you name them. You can use a maximum of 256 Unicode letters, white space, and
numbers, plus the special characters + - = . _ : /
The following image shows all of the configuration options available in the File system details section.

5. For Deployment type choose Multi-AZ or Single-AZ.


 Choose Multi-AZ to deploy a file system that is tolerant to Availability Zone unavailability. This
option supports SSD and HDD storage.
 Choose Single-AZ to deploy a file system that is deployed in a single Availability Zone. Single-AZ
2 is the latest generation of single Availability Zone file systems, and it supports SSD and HDD storage.
6. For Storage type, you can choose either SSD or HDD.
FSx for Windows File Server offers solid state drive (SSD) and hard disk drive (HDD) storage
types. SSD storage is designed for the highest-performance and most latency-sensitive workloads, including
databases, media processing workloads, and data analytics applications. HDD storage is designed for a
broad spectrum of workloads, including home directories, user and departmental file shares, and content
management systems.
7. For Provisioned SSD IOPS, you can choose either Automatic or User-provisioned mode.
If you choose Automatic mode, FSx for Windows File Server automatically scales your SSD IOPS to
maintain 3 SSD IOPS per GiB of storage capacity. If you choose User-provisioned mode, enter any whole
number in the range of 96–350,000. Scaling SSD IOPS above 80,000 is available in US East (N. Virginia),
US West (Oregon), US East (Ohio), Europe (Ireland), Asia Pacific (Tokyo), and Asia Pacific (Singapore).
8. For Storage capacity, enter the storage capacity of your file system, in GiB. If you're using SSD
storage, enter any whole number in the range of 32–65,536. If you're using HDD storage, enter any whole
number in the range of 2,000–65,536. You can increase the amount of storage capacity as needed at any
time after you create the file system.
9. Keep Throughput capacity at its default setting. Throughput capacity is the sustained speed at which
the file server that hosts your file system can serve data. The Recommended throughput capacity setting is
based on the amount of storage capacity you choose. If you need more than the recommended throughput
capacity, choose Specify throughput capacity, and then choose a value.
10. In the Network & security section, choose the Amazon VPC that you want to associate with your file
system. For this getting started exercise, choose the same Amazon VPC that you chose for your AWS
Directory Service directory and your Amazon EC2 instance.
11. For VPC Security Groups, the default security group for your default Amazon VPC is already added to
your file system in the console. If you're not using the default security group, make sure that the security
group you choose is in the same AWS Region as your file system. You will also need to add the following
rules to your chosen security group:
a. Add the following inbound and outbound rules to allow the following ports.

Protocol Ports
UDP 53, 88, 123, 389, 464
TCP 53, 88, 135, 389, 445, 464, 636, 3268, 3269, 5985, 9389, 49152-65535
b. Add from and to IP addresses or security group IDs associated with the client compute instances that you
want to access your file system from.
c. Add outbound rules to allow all traffic to the Active Directory that you're joining your file system to. To
do this, do one of the following:
 Allow outbound traffic to the security group ID associated with your AWS Managed AD directory.
 Allow outbound traffic to the IP addresses associated with your self-managed Active Directory
domain controllers.
If you have a Multi-AZ deployment (see step 5), choose a Preferred subnet value for the primary file
server and a Standby subnet value for the standby file server. A Multi-AZ deployment has a
primary and a standby file server, each in its own Availability Zone and subnet.
For Windows authentication, you have the following options:
If you want to join your file system to a Microsoft Active Directory domain that is managed by AWS,
choose AWS Managed Microsoft Active Directory, and then choose your AWS Directory Service
directory from the list.
If you want to join your file system to a self-managed Microsoft Active Directory domain, choose Self-
managed Microsoft Active Directory, and provide the following details for your Active Directory.
 The fully qualified domain name of your Active Directory.
Important : For Single-AZ 2 and all Multi-AZ file systems, the Active Directory domain name cannot
exceed 47 characters. This limitation applies to both AWS managed and self-managed Active Directory
domain names.
Amazon FSx requires a direct connection or internal traffic to your DNS IP Address. Connection via an
internet gateway is not supported. Instead, use a VPN, VPC peering, Direct Connect or a transit gateway
association.
 DNS server IP addresses—the IPv4 addresses of the DNS servers for your domain
 Service account username—the user name of the service account in your existing Active Directory.
Do not include a domain prefix or suffix.
 Service account password—the password for the service account.
 Confirm password—the password for the service account.
For Encryption, keep the default Encryption key setting of aws/fsx (default).
For Auditing - optional, file access auditing is disabled by default.
For Access - optional, enter any DNS aliases that you want to associate with the file system. Each alias
name must be formatted as a fully qualified domain name (FQDN).
For Backup and maintenance - optional, keep the default settings.
For Tags - optional, enter a key and value to add tags to your file system. A tag is a case-sensitive key-
value pair that helps you manage, filter, and search for your file system.
Choose Next.
Review the file system configuration shown on the Create file system page. For your reference, note
which file system settings you can modify after the file system is created. Choose Create file system.
After Amazon FSx creates the file system, choose the file system ID in the File Systems dashboard.
Choose Attach, and note the fully qualified domain name for your file system. You will need it in a
later step.
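The console wizard above can also be scripted. A hedged sketch, assuming Python with boto3; the subnet, security group, and directory IDs are placeholders, and the parameters mirror the wizard choices (Multi-AZ deployment, SSD storage, 32 GiB of capacity, AWS Managed Microsoft AD):

import boto3

fsx = boto3.client("fsx")

# Create a Multi-AZ FSx for Windows File Server file system joined to AWS Managed Microsoft AD.
response = fsx.create_file_system(
    FileSystemType="WINDOWS",
    StorageType="SSD",
    StorageCapacity=32,  # GiB
    SubnetIds=["subnet-1111aaaa", "subnet-2222bbbb"],  # preferred and standby subnets
    SecurityGroupIds=["sg-0123456789abcdef0"],
    WindowsConfiguration={
        "ActiveDirectoryId": "d-1234567890",
        "DeploymentType": "MULTI_AZ_1",
        "PreferredSubnetId": "subnet-1111aaaa",
        "ThroughputCapacity": 32,  # MB/s
    },
    Tags=[{"Key": "Name", "Value": "my-fsx-file-system"}],
)

print(response["FileSystem"]["FileSystemId"])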

5.34 Storage Gateway Overview

AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually
unlimited cloud storage. You can use Storage Gateway to simplify storage management and reduce costs for
key hybrid cloud storage use cases. These include moving backups to the cloud, using on-premises file
shares backed by cloud storage, and providing low-latency access to data in AWS for on-premises
applications.
To support these use cases, the service provides four different types of gateways that seamlessly connect
on-premises applications to cloud storage, caching data locally for low-latency access:
1. Tape Gateway
2. File Gateway
3. File Gateway - FSx
4. Volume Gateway

1. Tape Gateway enables you to replace using physical tapes on premises with virtual tapes in AWS
without changing existing backup workflows. Tape Gateway supports all leading backup
applications and caches virtual tapes on premises for low-latency data access.
2. File Gateway provides a file interface for storing and retrieving objects in Amazon S3. It combines
a service and a virtual software appliance, so objects can be stored in and retrieved from Amazon S3
using industry-standard file protocols such as Network File System (NFS) and Server Message Block
(SMB).
3. FSx File Gateway optimizes on-premises access to fully managed, highly reliable file shares in
Amazon FSx for Windows File Server. Customers with unstructured or file data, whether from
SMB-based group shares, or business applications, may require on-premises access to meet low-
latency requirements. Amazon FSx File Gateway helps accelerate your file-based storage
migration to the cloud to enable faster performance, improved data protection, and reduced cost.
4. Volume Gateway offers cloud-backed storage to your on-premises applications using industry
standard iSCSI(Internet Small Computer System Interface) connectivity. You don't need to rewrite
your on-premises applications to use cloud storage. You can deploy Volume Gateway as a virtual
machine or on the Storage Gateway Hardware Appliance at your premises.

5.35 Lab - Storage Gateway

In this section, you can find instructions on how to create, deploy, and activate a File Gateway in AWS
Storage Gateway.
Topics

 Set up an Amazon S3 File Gateway


 Connect your Amazon S3 File Gateway to AWS
 Review settings and activate your Amazon S3 File Gateway
 Configure your Amazon S3 File Gateway

Set up an Amazon S3 File Gateway


To set up a new S3 File Gateway
1. Open the AWS Management Console at https://console.aws.amazon.com/storagegateway/home/, and
choose the AWS Region where you want to create your gateway.
2. Choose Create gateway to open the Set up gateway page.
3. In the Gateway settings section, do the following:
a. For Gateway name, enter a name for your gateway. After your gateway is created, you can search for
this name to find your gateway on the list pages in the AWS Storage Gateway console.
b. For Gateway time zone, choose the local time zone for the part of the world where you want
to deploy your gateway.
4. In the Gateway options section, for Gateway type, choose Amazon S3 File Gateway.
5. In the Platform options section, do the following:
a. For Host platform, choose the platform on which you want to deploy your gateway. Then follow the
platform-specific instructions displayed on the Storage Gateway console page to set up your host platform.
You can choose from the following options:
 VMware ESXi – Download, deploy, and configure the gateway virtual machine using VMware
ESXi.
 Microsoft Hyper-V – Download, deploy, and configure the gateway virtual machine using Microsoft
Hyper-V.
 Linux KVM – Download, deploy, and configure the gateway virtual machine using Linux Kernel-
based Virtual Machine (KVM).
 Amazon EC2 – Configure and launch an Amazon EC2 instance to host your gateway.
 Hardware appliance – Order a dedicated physical hardware appliance from AWS to host your
gateway.
b. For Confirm set up gateway, select the check box to confirm that you performed the
deployment steps for the host platform that you chose. This step is not applicable for the Hardware
appliance host platform.
6. Now that your gateway is set up, you must choose how you want it to connect and communicate with
AWS. Choose Next to proceed.

Connect your Amazon S3 File Gateway to AWS


To connect a new S3 File Gateway to AWS
1. If you have not done so already, complete the procedure described in Set up an Amazon S3 File
Gateway. When finished, choose Next to open the Connect to AWS page in the AWS Storage Gateway
console.
2. In the Endpoint options section, for Service endpoint, choose the type of endpoint that your gateway
will use to communicate with AWS. You can choose from the following options:
 Publicly accessible – Your gateway communicates with AWS over the public internet. If you select this
option, use the FIPS enabled endpoint check box to specify whether the connection must comply with
Federal Information Processing Standards (FIPS).
 VPC hosted – Your gateway communicates with AWS through a private connection with your virtual
private cloud (VPC), allowing you to control your network settings. If you select this option, you must
specify an existing VPC endpoint by choosing its VPC endpoint ID from the dropdown list. You can also
provide its VPC endpoint Domain Name System (DNS) name or IP address.
3. In the Gateway connection options section, for Connection options, choose how to identify your
gateway to AWS. You can choose from the following options:
 IP address – Provide the IP address of your gateway in the corresponding field. This IP address must be
public or accessible from within your current network, and you must be able to connect to it from your
web browser. You can obtain the gateway IP address by logging into the gateway's local console from
your hypervisor client, or by copying it from your Amazon EC2 instance details page.
 Activation key – Provide the activation key for your gateway in the corresponding field. You can
generate an activation key using the gateway's local console. If your gateway's IP address is unavailable,
choose this option.
4. Now that you have chosen how you want your gateway to connect to AWS, you must activate the
gateway. Choose Next to proceed.

Review settings and activate your Amazon S3 File Gateway


To review settings and activate a new S3 File Gateway
1. If you have not done so already, complete the procedures described in the following topics:
 Set up an Amazon S3 File Gateway
 Connect your Amazon S3 File Gateway to AWS
When finished, choose Next to open the Review and activate page in the AWS Storage Gateway console.
2. Review the initial gateway details for each section on the page.
3. If a section contains errors, choose Edit to return to the corresponding settings page and make changes.
Important : You cannot modify the gateway options or connection settings after your gateway is activated.
4. Now that you have activated your gateway, you must perform the first-time configuration to allocate
local storage disks and configure logging. Choose Next to proceed.

Configure your Amazon S3 File Gateway


To perform the first-time configuration on a new S3 File Gateway
1. If you have not done so already, complete the procedures described in the following topics:
 Set up an Amazon S3 File Gateway
 Connect your Amazon S3 File Gateway to AWS
 Review settings and activate your Amazon S3 File Gateway
When finished, choose Next to open the Configure gateway page in the AWS Storage Gateway console.
2. In the Configure storage section, use the dropdown lists to allocate at least one local disk with at least
150 gibibytes (GiB) capacity to Cache. The local disks listed in this section correspond to the physical
storage that you provisioned on your host platform.
3. In the CloudWatch log group section, choose how to set up Amazon CloudWatch Logs to monitor the
health of your gateway. You can choose from the following options:
 Create a new log group – Set up a new log group to monitor your gateway.
 Use an existing log group – Choose an existing log group from the corresponding dropdown
list.
 Deactivate logging – Do not use Amazon CloudWatch Logs to monitor your gateway.
4. In the CloudWatch alarms section, choose how to set up Amazon CloudWatch alarms to notify you
when your gateway's metrics deviate from defined limits. You can choose from the following options:
 Create Storage Gateway's recommended alarms – Create all recommended CloudWatch
alarms automatically when the gateway is created.
 Create a custom alarm – Configure a new CloudWatch alarm to notify you about your gateway's
metrics. Choose Create alarm to define metrics and specify alarm actions in the Amazon CloudWatch
console.
 No alarm – Don't receive CloudWatch notifications about your gateway's metrics.
5. Choose Configure to finish creating your gateway.
To check the status of your new gateway, search for it on the Gateway overview page of the AWS
Storage Gateway console.
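Once the gateway is active, you create file shares against it. A hedged sketch, assuming Python with boto3; the gateway ARN, IAM role, and bucket ARN are placeholders, and the role must allow the gateway to access the bucket:

import uuid
import boto3

sgw = boto3.client("storagegateway")

# Create an NFS file share on an activated S3 File Gateway, backed by an S3 bucket.
response = sgw.create_nfs_file_share(
    ClientToken=str(uuid.uuid4()),  # idempotency token
    GatewayARN="arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-12345678",
    Role="arn:aws:iam::111122223333:role/StorageGatewayS3AccessRole",
    LocationARN="arn:aws:s3:::my-bucket",
    ClientList=["10.0.0.0/16"],  # clients allowed to mount the share
    DefaultStorageClass="S3_STANDARD",
)

print(response["FileShareARN"])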

5.36 - AWS Transfer Family

AWS Transfer Family is a secure transfer service that enables you to transfer files into and out of AWS
storage services. Transfer Family is part of the AWS Cloud platform.
AWS Transfer Family supports transferring data from or to the following AWS storage services.
 Amazon Simple Storage Service (Amazon S3) storage.
 Amazon Elastic File System (Amazon EFS) Network File System (NFS) file systems.
AWS Transfer Family supports transferring data over the following protocols:
 Secure Shell (SSH) File Transfer Protocol (SFTP): version 3
 File Transfer Protocol Secure (FTPS)

 File Transfer Protocol (FTP)

 Applicability Statement 2 (AS2)


File transfer protocols are used in data exchange workflows across different industries such as financial
services, healthcare, advertising, and retail, among others. Transfer Family simplifies the migration of file
transfer workflows to AWS.
The following are some common use cases for using Transfer Family with Amazon S3:
 Data lakes in AWS for uploads from third parties such as vendors and partners.
 Subscription-based data distribution with your customers.
 Internal transfers within your organization.
The following are some common use cases for using Transfer Family with Amazon EFS:
 Data distribution
 Supply chain
 Content management
 Web serving applications
The following are some common use cases for using Transfer Family with AS2:
Workflows with compliance requirements that rely on having data protection and security features built
into the protocol
 Supply chain logistics
 Payments workflows
 Business-to-business (B2B) transactions
 Integrations with enterprise resource planning (ERP) and customer relationship management (CRM)
systems
With Transfer Family, you get access to a file transfer protocol-enabled server in AWS without the need to
run any server infrastructure. You can use this service to migrate your file transfer-based workflows to AWS
while maintaining your end users' clients and configurations as is. You first associate your hostname with
the server endpoint, then add your users and provision them with the right level of access. After you do this,
your users' transfer requests are serviced directly out of your Transfer Family server endpoint.
Transfer Family provides the following benefits:
 A fully managed service that scales in real time to meet your needs.
 You don't need to modify your applications or run any file transfer protocol infrastructure.
 With your data in durable Amazon S3 storage, you can use native AWS services for processing,
analytics, reporting, auditing, and archival functions.
 With Amazon EFS as your data store, you get a fully managed elastic file system for use with AWS
Cloud services and on-premises resources. Amazon EFS is built to scale on demand to petabytes without
disrupting applications, growing and shrinking automatically as you add and remove files. This helps
eliminate the need to provision and manage capacity to accommodate growth.
 A fully managed, serverless File Transfer Workflow service makes it easy to set up, run, automate,
and monitor processing of files uploaded using AWS Transfer Family.
 There are no upfront costs, and you pay only for the use of the service.
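As a hedged sketch of creating such an endpoint programmatically (assuming Python with boto3; the protocol choice, endpoint type, and tag are placeholders, and users still have to be added to the server afterwards):

import boto3

transfer = boto3.client("transfer")

# Create a service-managed SFTP endpoint backed by Amazon S3.
response = transfer.create_server(
    Protocols=["SFTP"],
    Domain="S3",
    IdentityProviderType="SERVICE_MANAGED",
    EndpointType="PUBLIC",
    Tags=[{"Key": "Name", "Value": "my-sftp-server"}],
)

print(response["ServerId"])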

5.37 - Compare AWS Storage options


Amazon Elastic File System (EFS), Amazon Elastic Block Store (EBS), and Amazon Simple Storage Service (S3)
are three different AWS storage services that can be used for different types of workload needs.
