Week 5 Cloud Storage Service
Q. Why is AWS cloud storage important, or why is AWS cloud storage required?
Cost effectiveness
With cloud storage, there is no hardware to purchase, no storage to provision, and no extra capital being
used for business spikes. You can add or remove storage capacity on demand and only pay for the storage
that you actually use. As data becomes infrequently accessed, you can even automatically move it to
lower-cost storage, creating even more cost savings.
Increased agility
With cloud storage, resources are only a click away. You reduce the time to make those resources
available to your organization from weeks to just minutes. This results in a dramatic increase in agility for
your organization. Your staff is largely freed from the tasks of procurement, installation, administration, and
maintenance.
Faster deployment
Cloud storage services allow IT to quickly deliver the exact amount of storage needed, whenever and
wherever it's needed. Your developers can focus on solving complex application problems instead of having
to manage storage systems.
Business continuity
Cloud storage providers store your data in highly secure data centers, protecting your data and ensuring
business continuity. You can further protect your data by using versioning and replication tools to recover
more easily from both unintended user actions and application failures.
What are the types of cloud storage?
There are three main cloud storage types: object storage, file storage, and block storage. Each offers its
own advantages and has its own use cases.
1. Object storage
Organizations have to store a massive and growing amount of unstructured data, such as photos, videos,
machine learning (ML) data, sensor data, audio files, and other types of web content. Object storage is a data
storage architecture for large stores of unstructured data. Object storage keeps data in the format it arrives in
and makes it possible to customize metadata in ways that make the data easier to access and analyze. Instead
of being organized in files or folder hierarchies, objects are kept in secure buckets that deliver virtually
unlimited scalability. It is also less costly to store large data volumes.
2. File storage
File-based storage or file storage is widely used among applications and stores data in a hierarchical folder
and file format. This type of storage is often known as a network-attached storage (NAS) server with
common file level protocols of Server Message Block (SMB) used in Windows instances and Network File
System (NFS) found in Linux.
3. Block storage
Enterprise applications like databases or enterprise resource planning (ERP) systems often require
dedicated, low-latency storage for each host. This is analogous to direct-attached storage (DAS) or a storage
area network (SAN). In this case, you can use a cloud storage service that stores data in the form of blocks.
Each block has its own unique identifier for quick storage and retrieval.
You can have folders within folders, but not buckets within buckets. You can upload and copy objects
directly into a folder. Folders can be created, deleted, and made public, but they cannot be renamed. Objects
can be copied from one folder to another.
This section describes how to use the Amazon S3 console to create a folder.
To create a folder
1. Sign in to the AWS Management Console and open the Amazon S3 console
at https://console.aws.amazon.com/s3/.
2. In the left navigation pane, choose Buckets.
3. In the Buckets list, choose the name of the bucket that you want to create a folder
in.
4. If your bucket policy prevents uploading objects to this bucket without encryption,
you must choose Enable under Server-side encryption.
5. Choose Create folder.
6. Enter a name for the folder (for example, favorite-pics). Then choose Create folder.
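If you prefer the AWS CLI, the same result can be sketched with a single command: a console "folder" is simply a zero-byte object whose key ends with a slash. The bucket name my-bucket below is a placeholder.
aws s3api put-object --bucket my-bucket --key favorite-pics/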
This section describes how to use the Amazon S3 console to calculate a folder's size.
To calculate a folder's size
1. Sign in to the AWS Management Console and open the Amazon S3 console
at https://console.aws.amazon.com/s3/.
2. In the left navigation pane, choose Buckets.
3. In the Buckets list, choose the name of the bucket in which your folder is stored.
4. In the Objects list, select the check box next to the name of the folder.
5. Choose Actions, and then choose Calculate total size.
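A rough CLI equivalent, assuming a bucket named my-bucket and a folder (prefix) named favorite-pics/, is to list the prefix recursively with a summary of total objects and total size:
aws s3 ls s3://my-bucket/favorite-pics/ --recursive --summarize --human-readable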
Deleting folders
To delete folders from an S3 bucket
1. Sign in to the AWS Management Console and open the Amazon S3 console
at https://console.aws.amazon.com/s3/.
2. In the Buckets list, choose the name of the bucket that you want to delete folders from.
3. In the Objects list, select the check box next to the folders and objects that you want to delete.
4. Choose Delete.
5. On the Delete objects page, verify that the names of the folders you selected for deletion are listed.
6. In the Delete objects box, enter delete, and choose Delete objects.
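For comparison, a CLI sketch that removes a folder and everything under it (bucket and prefix names are placeholders) would be:
aws s3 rm s3://my-bucket/favorite-pics/ --recursive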
5.4 S3 Versioning
Versioning in Amazon S3 is a means of keeping multiple variants of an object in the same bucket. You
can use the S3 Versioning feature to preserve, retrieve, and restore every version of every object stored in
your buckets. With versioning you can recover more easily from both unintended user actions and
application failures. After versioning is enabled for a bucket, if Amazon S3 receives multiple write requests
for the same object simultaneously, it stores all of those objects.
Versioning-enabled buckets can help you recover objects from accidental deletion or overwrite. For
example, if you delete an object, Amazon S3 inserts a delete marker instead of removing the object
permanently. The delete marker becomes the current object version. If you overwrite an object, it results in a
new object version in the bucket. You can always restore the previous version.
By default, S3 Versioning is disabled on buckets, and you must explicitly enable it. Buckets can be in one
of three states:
1. Unversioned (the default)
2. Versioning-enabled
3. Versioning-suspended
You enable and suspend versioning at the bucket level. After you version-enable a bucket, it can never
return to an unversioned state. But you can suspend versioning on that bucket.
The versioning state applies to all (never some) of the objects in that bucket. When you enable versioning
in a bucket, all new objects are versioned and given a unique version ID. Objects that already existed in the
bucket at the time versioning was enabled will thereafter always be versioned and given a unique version ID
when they are modified by future requests.
5.5 Lab - S3 Versioning
You can use S3 Versioning to keep multiple versions of an object in one bucket. This section provides
examples of how to enable versioning on a bucket using the console, REST API, AWS SDKs, and AWS
Command Line Interface (AWS CLI).
Note : If you enable versioning on a bucket for the first time, it might take a short amount of time for the
change to be fully propagated. We recommend that you wait for 15 minutes after enabling versioning before
issuing write operations (PUT or DELETE) on objects in the bucket.
To learn more about how to use S3 Versioning to protect data, see Tutorial: Protecting data on Amazon S3
against accidental deletion or application bugs using S3 Versioning, S3 Object Lock, and S3 Replication
Each S3 bucket that you create has a versioning subresource associated with it. By default, your bucket is
unversioned, and the versioning subresource stores the empty versioning configuration, as follows.
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
</VersioningConfiguration>
To enable versioning, you can send a request to Amazon S3 with a versioning configuration that includes a
status.
<VersioningConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Status>Enabled</Status>
</VersioningConfiguration>
To suspend versioning, you set the status value to Suspended.
The bucket owner and all authorized users can enable versioning. The bucket owner is the AWS account
that created the bucket (the root account). For more information about permissions, see Identity and access
management in Amazon S3.
The following sections provide more detail about enabling S3 Versioning using the console and the AWS CLI.
Using the S3 console
Follow these steps to use the AWS Management Console to enable versioning on an S3 bucket.
To enable or disable versioning on an S3 bucket
1. Sign in to the AWS Management Console and open the Amazon S3 console at
https://console.aws.amazon.com/s3/
2. In the Buckets list, choose the name of the bucket that you want to enable versioning for.
3. Choose Properties.
4. Under Bucket Versioning, choose Edit.
5. Choose Suspend or Enable, and then choose Save changes.
Using the AWS CLI
The following example enables versioning on an S3 bucket.
aws s3api put-bucket-versioning --bucket DOC-EXAMPLE-BUCKET1 --versioning-configuration Status=Enabled
The following example enables S3 Versioning and multi-factor authentication (MFA) delete on a bucket.
aws s3api put-bucket-versioning --bucket DOC-EXAMPLE-BUCKET1 --versioning-configuration Status=Enabled,MFADelete=Enabled --mfa "SERIAL 123456"
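To confirm the bucket's versioning state afterwards, you can run the following check (shown here as a sketch, using the same example bucket):
aws s3api get-bucket-versioning --bucket DOC-EXAMPLE-BUCKET1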
5.6 S3 Encryption
Data encryption is a process for securing data by encoding information. Data is encoded using a password
or an encryption (cipher) key and special encryption algorithms. The encrypted data can then be accessed
by using the correct password or encryption (decryption) key. Data encryption is used to protect the
confidentiality of digital data even if an unauthorized person gains logical or physical access to that data. If
an unauthorized person gets access to the encrypted data, the data is unreadable without the key or password.
Amazon recommends the use of S3 encryption when storing data in Amazon S3 buckets. The first reason
for this recommendation is security: encryption increases the level of security and privacy. However, there
is another reason why data stored in the cloud should be encrypted. Amazon stores data belonging to users
from different countries. Sometimes a country can request that data be submitted for an investigation if a
client or an organization is suspected of violating the law. However, Amazon must also respect the license
agreements and laws of the other countries (countries whose citizens are Amazon customers), and a conflict
can occur.
If a user’s data is encrypted and Amazon doesn’t have the encryption keys, the user’s data cannot be
provided to third-party organizations or persons (even if the encrypted data is provided, it is a useless and
unreadable set of bits). Imagine a situation in which the USA requests data belonging to a European Amazon
customer for an investigation. What should be done in this case? As you may already know, the personal data
of European citizens is protected by the General Data Protection Regulation (GDPR).
Amazon S3 Encryption Types
How does S3 encryption work? Amazon provides several encryption types for data stored in Amazon S3.
Is S3 encrypted? By default, data stored in an S3 bucket is not encrypted, but you can configure the AWS S3
encryption settings.
You should define which encryption method to use after answering the following questions:
Who encrypts and decrypts the data?
Who stores the secret key?
Who manages the secret key?
Let’s look at the available AWS encryption methods for S3 objects stored in a bucket.
Server-side encryption
Server-Side Encryption (SSE) is the simplest data encryption option. All heavy encryption operations are
performed on the server side in the AWS cloud. You send raw (unencrypted) data to AWS and then data is
encrypted on the AWS side when recorded on the cloud storage. When you need to get your data back,
Amazon reads the encrypted data, decrypts the needed data on the Amazon server side, and then sends the
unencrypted data to you over the network. This process is transparent for end-users.
SSE-S3 is the simplest method – the keys are managed and handled by AWS to encrypt the data you have
selected.
SSE-KMS is a slightly different method from SSE-S3. AWS Key Management Service (KMS) is used to
encrypt S3 data on the Amazon server side. The data key is managed by AWS, but a user manages the
customer master key (CMK) in AWS KMS.
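As a quick illustration (the bucket, file name, and KMS key ID below are placeholders), the AWS CLI lets you request server-side encryption when uploading an object:
aws s3 cp report.pdf s3://my-bucket/report.pdf --sse AES256
aws s3 cp report.pdf s3://my-bucket/report.pdf --sse aws:kms --sse-kms-key-id <your-kms-key-id>
The first command requests SSE-S3; the second requests SSE-KMS with a key that you manage in AWS KMS.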
Client-side encryption
When using S3 client-side encryption, the client is responsible for all encryption operations. In this case,
data is not encrypted by AWS but rather it is encrypted on the user’s side. Data encrypted in the user’s
datacenter is uploaded directly to AWS. Two options are provided for S3 client-side encryption – a master
key can be stored on the client side or on the server side. If a master key is stored on the client side, the
client takes full responsibility for encryption. The advantage of this approach is that Amazon never knows
the encryption keys of the user and data is never stored on Amazon servers in an unencrypted state. A user
encrypts data before sending data to Amazon S3 and decrypts data after retrieving it from Amazon S3.
3. Select your bucket or create a new bucket for which you want to configure encryption settings.
4. On the page with the bucket settings, click the Properties tab and then click Default encryption.
5. The encryption settings are now open. By default, the S3 bucket encryption option is disabled.
6. Select the needed option, for example, AES-256. This is server-side encryption with Amazon S3-
managed keys (SSE-S3). You can view the bucket policy. The settings will be used as the default S3
encryption settings for objects added to this bucket in the future.
7. Click Save to save the encryption settings for the bucket.
8. Now default encryption is set. All new objects stored in the S3 bucket will be encrypted according to
the configured settings. It is recommended that you enable encryption when creating a bucket. You can
also enable encryption later at the bucket level. However, if you configure encryption settings later,
these settings won’t affect unencrypted files that have already been uploaded to the bucket.
9. If you want to select AWS-KMS encryption, click the appropriate option. In this case, select a
key from the drop-down list.
Note: Currently this option is only available by using the AWS CLI, AWS SDK, or the Amazon S3 REST
API.
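As a sketch of the CLI route (the bucket name is a placeholder), default bucket encryption with SSE-S3 can be configured like this:
aws s3api put-bucket-encryption --bucket my-bucket --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'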
Note:- Enabling versioning on S3 buckets can be done by IAM users, but activating and deactivating MFA
delete can only be done using the root account.
You need to pass the root account's MFA device serial number and the current MFA token value (for
example, by creating a separate CLI profile for the root account).
Once we execute the above command, it puts the bucket into versioning mode with the MFA Delete layer
enabled for that bucket.
Note:- MFA delete applies exclusively to versioned objects, which means that if you delete a file, it will
be deleted but all of its versions will be kept.
Let’s say you want to keep versioning but want to remove MFA delete on the S3 bucket.
This can be done with a command like the one listed below.
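A sketch of that command, using the same example bucket and a placeholder MFA serial number and token, would be:
aws s3api put-bucket-versioning --bucket DOC-EXAMPLE-BUCKET1 --versioning-configuration Status=Enabled,MFADelete=Disabled --mfa "SERIAL 123456"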
S3 provides a great way to easily host static websites. Many professionals want a portfolio or a basic
website for their business, and AWS S3 is an excellent fit for that.
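For illustration (the bucket name and document names are placeholders), static website hosting can be enabled on a bucket with the CLI:
aws s3 website s3://my-portfolio-bucket/ --index-document index.html --error-document error.html
The site is then served from the bucket's website endpoint, provided the objects are publicly readable.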
5.14 S3 CORS
Cross-origin resource sharing (CORS) defines a way for client web applications that are loaded in one
domain to interact with resources in a different domain. With CORS support, you can build rich client-side
web applications with Amazon S3 and selectively allow cross-origin access to your Amazon S3 resources.
This section provides an overview of CORS. The subtopics describe how you can enable CORS using the
Amazon S3 console, or programmatically by using the Amazon S3 REST API and the AWS SDKs.
The following are example use case scenarios for using CORS.
Scenario 1
Suppose that you are hosting a website in an Amazon S3 bucket named website as described in Hosting a
static website using Amazon S3. Your users load the website endpoint:
http://website.s3-website.us-east-1.amazonaws.com
Now you want to use JavaScript on the webpages that are stored in this bucket to be able to make
authenticated GET and PUT requests against the same bucket by using the Amazon S3 API endpoint for the
bucket, website.s3.us-east-1.amazonaws.com. A browser would normally block JavaScript from allowing
those requests, but with CORS you can configure your bucket to explicitly enable cross-origin requests from
website.s3-website.us-east-1.amazonaws.com.
Scenario 2
Suppose that you want to host a web font from your S3 bucket. Again, browsers require a CORS check
(also called a preflight check) for loading web fonts. You would configure the bucket that is hosting the web
font to allow any origin to make these requests.
Important
In the new S3 console, the CORS configuration must be JSON. For examples of CORS configurations in
JSON and XML, see CORS configuration.
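A minimal sketch of such a JSON CORS configuration, assuming the website origin used in Scenario 1, might look like this:
[
  {
    "AllowedOrigins": ["http://website.s3-website.us-east-1.amazonaws.com"],
    "AllowedMethods": ["GET", "PUT"],
    "AllowedHeaders": ["*"],
    "MaxAgeSeconds": 3000
  }
]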
AWS S3 provides log information regarding access to buckets and their objects. AWS S3 logs record the
following: bucket owner, bucket, time, remote IP, requester, operation, request ID, request URI, key, error
code, bytes sent, HTTP status, total time, object size, turnaround time, user agent, referrer, host ID, version
ID, cipher suite, signature version, authentication type, TLS version, and host header.
When access to an object is requested, users can use this information to identify the origin of the requester.
You can check if unauthorized agents have accessed any resources or identify a resource with an unusually
high number of downloads. You can also determine whether the turnaround time for receiving a file is
within the expectations of applications and users. In addition, this information can help you understand how
an application has been used by showing the resource and version that has a request pending.
After you’ve created the buckets, go to the Properties of the bucket that will store the files and associate it
with the bucket for logs. On the Properties page, click the Edit button in the Server access logging box.
In this form, select Enable to allow the bucket to provide log data about stored objects, then click
Browse S3 to select the log bucket.
In the modal, select the proper bucket and click Choose path. Back in the form, click Save changes to apply
the association between the buckets. Clicking that button is all you need to do to start saving object usage
logs.
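A rough CLI equivalent, assuming a source bucket named my-site-bucket and a log bucket named my-log-bucket, is:
aws s3api put-bucket-logging --bucket my-site-bucket --bucket-logging-status '{"LoggingEnabled":{"TargetBucket":"my-log-bucket","TargetPrefix":"logs/"}}'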
Increase operational efficiency – If you have compute clusters in two different AWS Regions that
analyze the same set of objects, you might choose to maintain object copies in those Regions.
Configure live replication between production and test accounts – If you or your customers have
production and test accounts that use the same data, you can replicate objects between those multiple
accounts, while maintaining object metadata.
Abide by data sovereignty laws – You might be required to store multiple copies of your data in
separate AWS accounts within a certain Region. Same-Region Replication can help you
automatically replicate critical data when compliance regulations don't allow the data to leave your
country.
The AWS S3 Replication process can be easily carried out by using a Replication rule.
Setting up AWS S3 Replication to another S3 bucket is performed by adding a Replication rule to the
source bucket. In case you need to replicate to a bucket belonging to a different account, you will also need
to set up certain bucket policies on the destination bucket. Let us begin adding a Replication rule.
Step 1: Sign in to the AWS S3 management console and choose the name of the bucket you want.
Step 2: Select Replication in the management section as below. And click Add rule.
Step 3: We will Replicate the whole bucket in this case. Choose the entire bucket as given below.
In case you choose to Replicate buckets encrypted using AWS Key management service, you will need to
select the correct key at this stage.
Step 4: The next step is to select the destination. Select buckets in this account using the radio button as
below.
In case you require Replicating to another account, select the other option. In this case, AWS will warn
you about the bucket policies that should exist at the other end, since it cannot verify them. You will be
provided with a bucket policy that you need to ensure at the destination.
Step 5: If you need to change the storage class of the destination object, do it through the drop-down in
destination options as below.
You will also find a checkbox to enable Replication Time Control. This option ensures that 99.99% of all
objects will be replicated under a service level agreement of 15 minutes. Please note that this incurs
additional fees.
Step 7: Set the status of the Replication rule and click next to create the rule.
As soon as you create the rule with enabled status, the Replication will start working. You can go into your
destination bucket after a few minutes and ensure that the Replication is indeed working.
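For reference, the same rule can be expressed as a sketch with the CLI; the replication IAM role ARN and bucket names below are placeholders:
aws s3api put-bucket-replication --bucket source-bucket --replication-configuration file://replication.json
where replication.json contains:
{
  "Role": "arn:aws:iam::111122223333:role/replication-role",
  "Rules": [
    {
      "ID": "replicate-entire-bucket",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": { "Bucket": "arn:aws:s3:::destination-bucket" }
    }
  ]
}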
https://bucket.s3.region.amazonaws.com/Ninjafile.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-
Credential=random-aws-credential-to-identify-the-signer&X-Amz-Date=timestamp-of-generation-of-
url&X-Amz-Expires=validity-from-generation-timestamp&X-Amz-Signature=6zzca349-f6c2-38bd-98ce-
4bs464fb45cc&X-Amz-SignedHeaders=host
Parameters
Looking carefully at the above URL, we can see the following parameters. The AWS Software Development
Kit (SDK) generates these automatically.
X-AMZ-Algorithm: Specifies the encryption algorithm used for authentication in AWS requests.
X-AMZ-Credential: Contains the AWS access key and security token used to authenticate the request.
X-AMZ-Date: The date and time at which the request was made, formatted according to AWS
standards.
X-AMZ-Expires: Specifies the expiration time for the request, after which it is no longer valid.
X-AMZ-Signature: The cryptographic signature generated using the request data, credentials, and
specified algorithm, used for request authentication.
X-AMZ-SignedHeaders: Lists the headers included in the request that are part of the signature,
ensuring their integrity and authenticity.
When a user attempts to access S3 files using a Presigned URL, S3 validates the signature by computing it
with the provided credentials, including any optional SignedHeaders parameter. It then verifies the
signature's validity and checks if the link has expired before granting access to the requested resource.
Command
pip install boto3
Now type the following command in Python IDE to generate a Presigned URL:
Code
import boto3

AWS_S3_REGION = 'ap-south-1'
AWS_S3_BUCKET_NAME = "Ninja_s3_bucket"
AWS_S3_FILE_NAME = "Ninjafile.jpg"
PRESIGNED_URL_EXPIRY = 3600  # in seconds

# Create an S3 client (credentials are taken from your AWS configuration)
s3_client = boto3.client("s3", region_name=AWS_S3_REGION)

# Generate a presigned GET URL for the object
presigned_url = s3_client.generate_presigned_url(
    "get_object",
    Params={"Bucket": AWS_S3_BUCKET_NAME, "Key": AWS_S3_FILE_NAME},
    ExpiresIn=PRESIGNED_URL_EXPIRY,
)

if presigned_url:
    print("Presigned URL:", presigned_url)
Explanation
Let’s see what is happening in the above code:
The boto3 library is imported to interact with AWS services.
An S3 client is created using boto3.client() by passing the service name ('s3') and the Region name; the
AWS access key ID and secret access key are resolved from your AWS credentials configuration.
Then the generate_presigned_url method is invoked. The method is called with the operation name
('get_object') and a dictionary containing the parameters 'Bucket' (the S3 bucket name) and 'Key' (the
S3 object key/file name). The 'ExpiresIn' parameter specifies the duration for which the Presigned URL
will be valid.
1. S3 Standard
o Standard storage class stores the data redundantly across multiple devices in multiple facilities.
o It is designed to sustain the loss of 2 facilities concurrently.
o Standard is the default storage class if no storage class is specified during upload.
o It provides low latency and high throughput performance.
o It is designed for 99.99% availability and 99.999999999% durability.
2. S3 Standard IA
o IA stands for infrequently accessed.
o Standard IA storage class is used when data is accessed less frequently but requires rapid access
when needed.
o It has a lower fee than S3, but you will be charged for a retrieval fee.
o It is designed to sustain the loss of 2 facilities concurrently.
o It is mainly used for larger objects greater than 128 KB kept for at least 30 days.
o It provides low latency and high throughput performance.
o It is designed for 99.99% availability and 99.999999999% durability.
Lifecycle Management is used so that objects are stored cost-effectively throughout their lifecycle.
A lifecycle configuration is a set of rules that define the actions applied by S3 to a group of objects.
The lifecycle defines two types of actions:
o Transition actions: Define when objects transition to another storage class. For example, you might
choose to transition objects to the Standard IA storage class 30 days after you have created them, or
archive objects to the Glacier storage class 60 days after you have created them.
o Expiration actions: Define when objects expire; Amazon S3 then deletes the expired objects on your
behalf.
Suppose a business generates a lot of data in the form of text files, images, audio, or video, and the data is
relevant for 30 days only. After that, you might want to transition from Standard to Standard IA because the
storage cost is lower. After 60 days, you might want to transition to the Glacier storage class for long-time
archival. Perhaps you want to expire the object completely after 60 days, so Amazon has a service known as
Lifecycle Management, and this service exists within the S3 bucket.
Lifecycle policies:
o Use Lifecycle rules to manage your object: You can manage the Lifecycle of an object by using a
Lifecycle rule that defines how Amazon S3 manages objects during their lifetime.
o Automate transition to tiered storage: Lifecycle allows you to transition objects to Standard IA
storage class automatically and then to the Glacier storage class.
o Expire your objects: Using Lifecycle rule, you can automatically expire your objects.
o Set the permissions. Leave all the permissions as default and then click on the Next button.
o Click on the Create bucket button.
o Finally, the new bucket is created, whose name is "javatpointlifecycle".
o Add Lifecycle rule and then enter the rule name. Click on the Next.
o You can create the storage class transition in both the current version and the previous version.
Initially, I create the transition in the current version. Check the current version and then click on
the Add transition.
First transition: 30 days after the creation of an object, the object's storage class is converted to the Standard
Infrequent Access (Standard IA) storage class.
o Similarly, we can do with the previous version objects. Check the "previous version" and
then "Add transitions". Click on the Next.
o Now, we expire the object after its creation. Suppose we expire the current and previous version
objects 425 days after their creation. Click on the Next. (A CLI sketch of the equivalent lifecycle
configuration follows below.)
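As a rough sketch, the same transitions and expiration could be expressed as a lifecycle configuration applied with the CLI; the bucket name matches the walkthrough above, and the rule ID is a placeholder:
aws s3api put-bucket-lifecycle-configuration --bucket javatpointlifecycle --lifecycle-configuration file://lifecycle.json
where lifecycle.json contains:
{
  "Rules": [
    {
      "ID": "tiering-and-expiry",
      "Status": "Enabled",
      "Filter": {},
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 60, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 425 }
    }
  ]
}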
5.26 S3 Analytics
By using Amazon S3 analytics – Storage Class Analysis, you can analyze storage access patterns to help
you decide when to transition the right data to the right storage class. This Amazon S3 analytics feature
observes data access patterns to help you determine when to transition less frequently accessed
STANDARD storage to the STANDARD_IA (IA, for infrequent access) storage class.
After storage class analysis observes the infrequent access patterns of a filtered set of data over a period of
time, you can use the analysis results to help you improve your lifecycle configurations. You can configure
storage class analysis to analyze all the objects in a bucket. Or, you can configure filters to group objects
together for analysis by common prefix (that is, objects that have names that begin with a common string),
by object tags, or by both prefix and tags. You'll most likely find that filtering by object groups is the best
way to benefit from storage class analysis.
Important : Storage class analysis only provides recommendations for Standard to Standard IA classes.
You can have multiple storage class analysis filters per bucket, up to 1,000, and you will receive a separate
analysis for each filter. Multiple filter configurations allow you to analyze specific groups of objects to
improve your lifecycle configurations that transition objects to STANDARD_IA.
Storage class analysis provides storage usage visualizations in the Amazon S3 console that are updated
daily. You can also export this daily usage data to an S3 bucket and view them in a spreadsheet application,
or with business intelligence tools, like Amazon QuickSight.
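A rough CLI sketch of creating such an analysis configuration (the bucket name, configuration ID, and prefix are placeholders) is:
aws s3api put-bucket-analytics-configuration --bucket my-bucket --id docs-analysis --analytics-configuration '{"Id":"docs-analysis","Filter":{"Prefix":"documents/"},"StorageClassAnalysis":{}}'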
5.27 S3 Performance
AWS S3 provides great performance. It automatically scales to high request rates, with a very low latency
of 100–200 milliseconds.
Your application can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests
per second per prefix in a bucket, and there is no limit on the number of prefixes. A prefix is nothing but an
object path. For example, for the object /folder1/sub1/file in a bucket, the prefix is folder1/sub1.
Let’s talk about improving S3 performance more. Below are the ways available to achieve higher AWS S3
performance:
Multi-Part Upload:
For files greater than 5 GB in size, it is mandatory to use multi-part upload. For files greater than
100 MB, it is recommended as well. What does multi-part upload do? It parallelizes the upload, thereby
speeding up transfers.
S3 Transfer Acceleration:
S3 Transfer Acceleration works for both uploads and downloads. It increases transfer speed by transferring
the file to an AWS edge location, which forwards the data to the S3 bucket in the target region. It is also
compatible with multi-part upload. For the transfer to the edge location it uses the public network, and then
from the edge location to the S3 bucket it uses the private AWS network, which is very fast. Hence, it reduces
the use of the public network and maximizes the use of the AWS private network to improve S3 performance.
To implement this, just open the S3 console and scroll to your bucket of choice. Click on the Properties tab,
then find the Transfer acceleration setting there. All you have to do is choose Enabled and save.
S3 Byte-Range Fetches:
How about reading a file in the most efficient way? AWS has an option called S3 Byte-Range Fetches to do
so. It parallelizes GETs by requesting specific byte ranges. In case of failure, it also gives better resilience.
Hence, it can be used to speed up downloads.
The second use case in which S3 Byte-Range Fetches can be used is to retrieve only partial data. For
example, when you know the first XX bytes are the header of a file, you can fetch just that range, as sketched
below.
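As an illustration (the bucket, key, and byte range are placeholders), a single byte-range GET with the CLI looks like this; it downloads only the first 1,024 bytes of the object into header.bin:
aws s3api get-object --bucket my-bucket --key videos/movie.mp4 --range "bytes=0-1023" header.bin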
S3 Select:
Apart from this, we can also use S3 Select. It retrieves less data by using SQL to perform server-side
filtering. This results in less network transfer and, therefore, less client-side CPU cost.
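A minimal sketch with the CLI, assuming a CSV object with a header row (the bucket, key, and output file names are placeholders), is:
aws s3api select-object-content --bucket my-bucket --key data/sales.csv --expression "SELECT * FROM S3Object s LIMIT 5" --expression-type SQL --input-serialization '{"CSV":{"FileHeaderInfo":"USE"}}' --output-serialization '{"CSV":{}}' results.csv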
You can use the Amazon S3 Event Notifications feature to receive notifications when certain
events happen in your S3 bucket. To enable notifications, add a notification configuration that
identifies the events that you want Amazon S3 to publish. Make sure that it also identifies the
destinations where you want Amazon S3 to send the notifications. You store this configuration
in the notification subresource that's associated with a bucket.
Amazon S3 can send event notification messages to the following destinations. You specify the Amazon
Resource Name (ARN) value of these destinations in the notification configuration.
Amazon Simple Notification Service (Amazon SNS) topics
Amazon Simple Queue Service (Amazon SQS) queues
AWS Lambda function
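As a sketch (the bucket name and SQS queue ARN are placeholders), a notification configuration that sends a message to a queue whenever an object is created could be applied like this:
aws s3api put-bucket-notification-configuration --bucket my-bucket --notification-configuration '{"QueueConfigurations":[{"QueueArn":"arn:aws:sqs:us-east-1:111122223333:my-queue","Events":["s3:ObjectCreated:*"]}]}'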
Amazon Athena is a serverless, interactive query service that allows you to analyze data in Amazon S3
using SQL. It enables you to analyze large amounts of data stored in S3 with a pay-per-query pricing model,
making it cost-effective for querying data sets that are infrequently accessed. Athena supports several data
formats including CSV(Comma Separated Value), JSON(JavaScript Object Notation), ORC(Optimized Row
Columnar), Avro, and Parquet. It also integrates with other AWS services such as Amazon QuickSight for
data visualization, and AWS Glue for data cataloging. With Athena, you can query data in S3 without the
need to move or load the data into a separate data store, making it easy to analyze large amounts of data
stored in S3.
Features of Athena
Some of its features include:
Serverless: Athena is a fully managed service, so there is no infrastructure to provision or manage.
Interactive querying: Athena allows you to run ad-hoc queries and get results in seconds.
Standard SQL: Athena supports standard SQL, making it easy for users who are familiar with SQL to
get started.
Scalable: Athena can handle large amounts of data and concurrent queries, and automatically scales up
and down based on query demand.
Integrations: Athena integrates with other AWS services such as Amazon QuickSight, Amazon
Redshift Spectrum, and AWS Glue.
Low cost: Athena charges only for the amount of data scanned per query, so you pay only for what
you use.
Secure: Athena encrypts data at rest and in transit, and integrates with AWS Identity and Access
Management (IAM) for fine-grained access control.
AWS Athena is a serverless, interactive query service that allows you to analyze data in Amazon S3 using
standard SQL. Here is an example of how to use Athena to query data stored in S3:
1. Create an S3 bucket and upload your data files to it.
2. Create a new table in Athena using the CREATE TABLE statement. You will need to specify the S3
bucket location and the format of your data. For example:
CREATE EXTERNAL TABLE mydatabase.mytable
(col1 INT, col2 STRING, col3 DOUBLE)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 's3://mybucket/data/'
3. Run a query against your table using the SELECT statement. For example:
SELECT col1, col2, col3 FROM mydatabase.mytable WHERE col1 > 10;
4. Athena will return the results of your query, which can then be displayed or saved to a new table.
You can also use AWS Glue crawlers to create, update, and delete tables in Athena automatically.
AWS Athena charges for the amount of data scanned by each query, so it is recommended to use query
optimization techniques, such as partitioning and compressing data, to minimize the amount of data scanned
and reduce costs.
AWS Snow Family is a group of devices that transport data in and out of AWS. AWS Snow Family
devices are physical devices. They can transfer up to exabytes of data.
1. AWS Snowcone
AWS Snowcone is a small, secure device that transfers data. It provides 8 TB of storage space, 4 GB of
memory, and 2 CPUs.
2. AWS Snowball
AWS Snowball has 2 types of devices, described below.
Snowball Edge Storage Optimized devices:
o Great for large-scale data migrations.
o Have 80 TB of HDD storage space for object storage.
o Have 1 TB of SSD storage for block volumes.
Snowball Edge Compute Optimized devices:
o Great for services that require a large amount of computing resources.
o Have 42 TB of HDD storage for object storage, and 7.68 TB of NVMe SSD storage space for AWS EBS
block volumes.
o Work on 208 GiB of memory and 52 vCPUs.
3. AWS Snowmobile
AWS Snowmobile moves large amounts of data to AWS. It can transfer up to 100 petabytes of data.
One petabyte is 1,000,000,000 megabytes.
Amazon FSx makes it easy and cost-effective to launch, run, and scale feature-rich, high-performance file
systems in the cloud. It supports a wide range of workloads with its reliability, security, scalability, and
broad set of capabilities. Amazon FSx is built on the latest AWS compute, networking, and disk
technologies to provide high performance and a lower TCO (Total Cost of Ownership; TCO is the
combination of the purchase price of a product and its total cost of operation). And as a fully managed
service, it handles hardware provisioning, patching, and backups, freeing you up to focus on your
applications, your end users, and your business.
Amazon FSx lets you choose between four widely-used file systems: NetApp ONTAP, OpenZFS,
Windows File Server, and Lustre. This choice is typically based on your familiarity with a given file system
or by matching the file system's feature sets, performance profiles, and data management capabilities to the
requirements of your workload.
To create your Amazon FSx file system, you must first create your Amazon Elastic Compute Cloud (Amazon
EC2) instance and the AWS Directory Service directory, if you don't have them set up already.
Rules Ports
UDP 53, 88, 123, 389, 464
TCP 53, 88, 135, 389, 445, 464, 636, 3268, 3269, 5985, 9389, 49152-65535
b. Add from and to IP addresses or security group IDs associated with the client compute instances that you
want to access your file system from.
c. Add outbound rules to allow all traffic to the Active Directory that you're joining your file system to. To
do this, do one of the following:
Allow outbound traffic to the security group ID associated with your AWS Managed AD directory.
Allow outbound traffic to the IP addresses associated with your self-managed Active Directory
domain controllers.
If you have a Multi-AZ deployment (see step 5), choose a Preferred subnet value for the primary file
server and a Standby subnet value for the standby file server. A Multi-AZ deployment has a
primary and a standby file server, each in its own Availability Zone and subnet.
For Windows authentication, you have the following options:
If you want to join your file system to a Microsoft Active Directory domain that is managed by AWS,
choose AWS Managed Microsoft Active Directory, and then choose your AWS Directory Service
directory from the list.
If you want to join your file system to a self-managed Microsoft Active Directory domain, choose Self-
managed Microsoft Active Directory, and provide the following details for your Active Directory.
The fully qualified domain name of your Active Directory.
Important : For Single-AZ 2 and all Multi-AZ file systems, the Active Directory domain name cannot
exceed 47 characters. This limitation applies to both AWS managed and self-managed Active Directory
domain names.
Amazon FSx requires a direct connection or internal traffic to your DNS IP Address. Connection via an
internet gateway is not supported. Instead, use a VPN, VPC peering, Direct Connect or a transit gateway
association.
DNS server IP addresses—the IPv4 addresses of the DNS servers for your domain
Service account username—the user name of the service account in your existing Active Directory.
Do not include a domain prefix or suffix.
Service account password—the password for the service account.
Confirm password—the password for the service account.
For Encryption, keep the default Encryption key setting of aws/fsx (default).
For Auditing - optional, file access auditing is disabled by default.
For Access - optional, enter any DNS aliases that you want to associate with the file system. Each alias
name must be formatted as a fully qualified domain name (FQDN).
For Backup and maintenance - optional, keep the default settings.
For Tags - optional, enter a key and value to add tags to your file system. A tag is a case-sensitive key-
value pair that helps you manage, filter, and search for your file system.
Choose Next.
Review the file system configuration shown on the Create file system page. For your reference, note
which file system settings you can modify after the file system is created. Choose Create file system.
After Amazon FSx creates the file system, choose the file system ID in the File Systems dashboard.
Choose Attach, and note the fully qualified domain name for your file system. You will need it in a
later step.
AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually
unlimited cloud storage. You can use Storage Gateway to simplify storage management and reduce costs for
key hybrid cloud storage use cases. These include moving backups to the cloud, using on-premises file
shares backed by cloud storage, and providing low-latency access to data in AWS for on-premises
applications.
To support these use cases, the service provides four different types of gateways that seamlessly connect
on-premises applications to cloud storage, caching data locally for low-latency access:
1. Tape Gateway
2. File Gateway
3. File Gateway - FSx
4. Volume Gateway
1. Tape Gateway enables you to replace using physical tapes on premises with virtual tapes in AWS
without changing existing backup workflows. Tape Gateway supports all leading backup
applications and caches virtual tapes on premises for low-latency data access.
2. File Gateway provides a file interface into Amazon S3. It combines a service and a virtual
software appliance. With its help, objects can be stored in and retrieved from Amazon S3 using
industry-standard file protocols such as Network File System (NFS) and Server Message Block
(SMB).
3. FSx File Gateway optimizes on-premises access to fully managed, highly reliable file shares in
Amazon FSx for Windows File Server. Customers with unstructured or file data, whether from
SMB-based group shares, or business applications, may require on-premises access to meet low-
latency requirements. Amazon FSx File Gateway helps accelerate your file-based storage
migration to the cloud to enable faster performance, improved data protection, and reduced cost.
4. Volume Gateway offers cloud-backed storage to your on-premises applications using industry
standard iSCSI(Internet Small Computer System Interface) connectivity. You don't need to rewrite
your on-premises applications to use cloud storage. You can deploy Volume Gateway as a virtual
machine or on the Storage Gateway Hardware Appliance at your premises.
In this section, you can find instructions on how to create, deploy, and activate a File Gateway in AWS
Storage Gateway.
AWS Transfer Family is a secure transfer service that enables you to transfer files into and out of AWS
storage services. Transfer Family is part of the AWS Cloud platform.
AWS Transfer Family supports transferring data from or to the following AWS storage services.
Amazon Simple Storage Service (Amazon S3) storage.
Amazon Elastic File System (Amazon EFS) Network File System (NFS) file systems.
AWS Transfer Family supports transferring data over the following protocols:
Secure Shell (SSH) File Transfer Protocol (SFTP): version 3
File Transfer Protocol Secure (FTPS)