CCS Module 4

Amazon Web Services (AWS) is a cloud platform that offers a variety of on-demand computing resources and services, including computing power, storage, and databases, on a pay-as-you-go basis. AWS provides scalable solutions for web hosting, application development, and data processing, supported by a range of core and additional services, such as EC2 for compute capacity and various instance types tailored for different workloads. The platform's infrastructure includes multiple availability zones within regions to ensure high availability and fault tolerance for applications.


Amazon Web Services

WHAT IS AWS?
AWS (Amazon Web Services) is a comprehensive and widely adopted cloud platform offered by Amazon.
It provides on-demand cloud computing resources and services such as computing power, storage, and
databases to individuals, companies, and governments on a pay-as-you-go basis.

AWS enables scalable and flexible solutions, ranging from web hosting and application development to
data processing and machine learning. It supports a vast ecosystem of tools and integrations, making it
a popular choice for various cloud computing needs.

Core AWS services


Compute: Services that run your code, such as Amazon EC2, for web servers and server-side applications
Storage: Services for storing data, such as Amazon S3
Networking: Services for handling traffic, such as Elastic Load Balancing (ELB)

Other AWS services
Developer tools
Services that help you build and deploy applications, such as AWS CodeDeploy
Container orchestration
Services that help you run, scale, and secure Docker applications, such as Amazon ECS
Serverless compute
Services that allow you to run workloads without managing servers, such as AWS Lambda and AWS
Fargate
Cost management
Tools for managing and tracking costs and usage

Other AWS service categories:


Analytics, Blockchain, Game technology, Robotics, Internet of things (IoT), Machine learning, and
Security.

AWS Region and Availability Zone:


What are AWS Regions and Availability Zones?
Availability zones are highly available data centers within each AWS region. A region represents a
separate geographic area. Each availability zone has independent power, cooling and networking. When
an entire availability zone goes down, AWS is able to failover workloads to one of the other zones in the
same region, a capability known as “Multi-AZ” redundancy.

Each AWS region is isolated and operates independently from other regions, but the availability zones
within each region are connected via low-latency links to provide replication and fault tolerance. If you
host all your data and instances in a single availability zone and that zone is affected by a failure, they
would not be available.
The purpose of this isolation is to serve workloads with high data sovereignty and compliance
requirements that do not permit user data to pass outside of a specific geographic region. These types of
workloads benefit from the structure of the AWS availability zones with low-latency and complete
separation from other regions.

Purpose of AZs:
The AZs within a Region are designed to be isolated from each other. This means if one AZ experiences
an outage, the others in the same Region can continue to operate, ensuring your application's
availability.
Benefits:
By using multiple AZs, you can achieve high availability, disaster recovery, and data redundancy

Amazon Compute Service


A Compute Service refers to a cloud-based service that provides computing power for running
applications, processing data, and executing workloads. These services allow users to deploy, manage,
and scale applications without needing to maintain physical hardware.
Benefits of Compute Services
Scalability – Can scale up or down based on demand.
Cost Efficiency – Pay for what you use (pay-as-you-go pricing).
Flexibility – Supports various operating systems, frameworks, and programming languages.
Managed Services – Reduces operational overhead by handling maintenance, security, and updates.

What is Amazon EC2 (Elastic Compute Cloud)?


Amazon Web Services offers EC2, short for Elastic Compute Cloud, a cloud computing service provided
by AWS. You can deploy your applications on EC2 servers without worrying about the underlying
infrastructure.

You configure an EC2 instance securely by using a VPC, subnets, and security groups. You can scale
the configured EC2 instances based on the demand of the application by attaching an Auto Scaling
group, scaling up and scaling down according to the application's incoming traffic.
Amazon EC2 (Elastic Compute Cloud) Instance
Amazon EC2 is a web service that provides resizable compute capacity in the cloud. It allows users
to launch virtual servers (EC2 instances) to run applications on-demand, without needing to invest in
physical hardware.

Amazon Machine Image


An Amazon Machine Image is a special type of virtual appliance that is used to instantiate (create) a
virtual machine within EC2.
It serves as the basic unit of deployment for services delivered using EC2. Whenever you want to
launch an instance, you need to specify an AMI.
You can launch instances from different AMIs, and if you need multiple instances with the same
configuration, you can launch them all from a single AMI.

Properties of AMI
An AMI contains:
1. A root volume template
Think of this as the base for your instance, which includes things like the operating system and
applications.
2. Launch permissions
These control who can use the AMI to create instances.
3. Block device mapping
This specifies the Amazon EBS and instance store volumes to attach to the instance when it is launched.

Why do we need AMI?


Suppose we want to launch 5 servers with the same configuration. One way of doing that would be to
launch a new EC2 instance every time and install the required packages every time. The other way
would be to configure one EC2 instance, create an image of that instance, and then use that image to
deploy four more EC2 servers.
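
As a rough illustration of this workflow, the following boto3 sketch creates an AMI from an already-configured instance and then launches four more instances from it. The instance ID, region, AMI name, and instance type are placeholder assumptions, not values from this module.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

# Create an AMI from an already-configured instance (placeholder instance ID)
image = ec2.create_image(InstanceId="i-0123456789abcdef0", Name="my-configured-server")

# Wait until the AMI is available before launching from it
ec2.get_waiter("image_available").wait(ImageIds=[image["ImageId"]])

# Launch four more instances with the same configuration from that AMI
ec2.run_instances(
    ImageId=image["ImageId"],
    InstanceType="t2.micro",
    MinCount=4,
    MaxCount=4,
)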

Amazon EC2 – Instance Types


Different Amazon EC2 instance types are designed for certain activities. Consider the unique
requirements of your workloads and applications when choosing an instance type. This might
include needs for computing, memory, or storage.
1. General-Purpose Instances
The computation, memory, and networking resources in general-purpose instances are balanced.
Scenarios, where you can use General Purpose Instances, are gaming servers, small databases,
personal projects, etc. Assume you have an application with a kind of equal computing, memory, and
networking resource requirements. Because the program does not require optimization in any
particular resource area, you can use a general-purpose instance to execute it.

Examples:
Applications that need a balance of computing, storage, networking, and server performance, or a bit
of everything, can utilize general-purpose instances.
If high-performance CPUs are not required for your applications, you can go for general-purpose
instances.

EC2 General-Purpose Instance Types


Here are several general-purpose examples from which we can pick:
t2.micro: The most well-known instance in AWS is t2.micro, which gives 1 vCPU and 1 GB of memory
with low to moderate network performance. It is free-tier eligible and highly helpful for individuals first
starting with AWS.

M6a instance: The third-generation AMD EPYC processors used in M6a instances are a good fit for
general-purpose tasks. The M6a family has different sizes, such as m6a.large, m6a.2xlarge, m6a.4xlarge,
and so on. m6a.large offers 2 vCPUs, 8 GiB of memory, and network performance of up to 12.5 Gigabit.

M5 instance: M5 general-purpose instances are powered by Intel Xeon Platinum 8175 processors. M5
sizes include m5.large, m5.12xlarge, and m5.24xlarge, and the M5 size we select will depend on the
memory, vCPUs, storage, and network speed required.

Features
Powered by specifically designed AWS Graviton3 processors.
Default optimized with EBS.
It consists of dedicated hardware and a lightweight hypervisor.
The bandwidth is higher when compared to other types.

Applications
Web Servers: The web servers can be hosted in General-purpose instances. EC2 instances provide a
flexible and scalable platform for web applications.
Development and Test Environment: The developers can use these General-purpose instances to
build, test and deploy the applications. It is a cost-effective solution for running this environment.
Content delivery: The hosting of content delivery networks (CDNs) that distribute content to users
all over the world is possible using general-purpose instances. EC2 instances can be set up to provide
content with low latency and great performance.

2. Compute-Optimized Instances
Compute-optimized instances are appropriate for applications that require a lot of computation and
help from high-performance CPUs. You may employ compute-optimized instances for workloads
including web, application, and gaming servers just like general-purpose instances. This instance type
is best suited for high-performance applications like web servers, Gaming servers.

Examples
Applications that require high server performance or that employ a machine-learning model will
benefit from compute-optimized instances.
Batch processing workloads and high-performance computing workloads are also a good fit.

Some Compute-Optimized Instance Types


c5d.24xlarge: The c5d instance, which has 96 vCPUs, 192 GiB of RAM, 3,600 GB of SSD storage, and 12
Gigabit of network performance, is selected primarily for its excellent web server performance.
There are other instance sizes, such as large and xlarge; depending on our needs, we choose among
the c5d instance sizes.

Features
Powered by specifically designed AWS Graviton3 processors.
Uses DDR5 memory, which provides 50% more bandwidth than DDR4.
EBS-optimized by default.

Applications
Machine learning: Machine learning operations can be performed on compute-optimized instances
because they can manage heavy workloads. Compute-optimized instances can provide the processing
capacity required to train massive machine learning models swiftly and effectively.
Gaming: Compute-optimized instances are well suited for heavy workloads, so they can easily manage
gaming operations. They decrease latency and can deliver a high-quality gaming experience.

3. Memory-Optimized Instances
Memory-optimized instances are geared for workloads that need huge datasets to be processed in
memory. Memory here refers to RAM, which allows multiple tasks to run at a time. Data is loaded from
storage into memory before the central processing unit (CPU) runs its tasks, and this preloading gives
the CPU direct access to the computer program. Assume you have a workload that necessitates the
preloading of significant volumes of data prior to executing an application.
A high-performance database or a task that requires real-time processing of a significant volume of
unstructured data might be involved in this scenario. In this case, consider using a
memory-optimized instance. It is used to run applications that require a lot of memory with high
performance.

Examples:
Helpful for databases that need to process data quickly.
Processes that may not involve huge quantities of data but require speedy, real-time processing.

Some Memory-Optimized Instance Types


The R and X categories belong to the memory-optimized family. Let's discuss one instance from each.
R7g.medium: Runs on AWS Graviton processors with ARM architecture, with 1 vCPU, 8 GiB of
memory, EBS storage, and up to 12.5 Gbps of network bandwidth.
x1: X1 is mainly suited for enterprise-edition, in-memory database applications and comes with 64
vCPUs, 976 GiB of memory, 1 x 1,920 GB of SSD storage, 7,000 Mbps of dedicated EBS bandwidth, and
10 Gbps of network performance.

Features
Elastic Fabric Adapter (EFA) is supported on the r7g.16xlarge and r7g.metal instances.
Includes the newest DDR5 memory, which provides 50% more bandwidth than DDR4.
Offers up to 20% higher networking bandwidth compared to R6g instances.

Applications
In-Memory Databases: Memory-optimized instances are well suited for databases that need high
memory capacity and high bandwidth.
Big Data Processing: For big data processing workloads like Apache Spark and Apache Hadoop that
demand high memory capacity and bandwidth, memory-optimized instances can be deployed.
Instances that have been optimized for memory can offer the memory space and bandwidth
required to process huge amounts of data fast and effectively.

4. Storage Optimized Instances


Storage-optimized instances are made for workloads that demand fast, sequential read and write
access to huge datasets. Distributed file systems, data warehousing applications, and high-frequency
online transaction processing (OLTP) systems are examples of workloads that are suited for
storage-optimized instances. Storage-optimized instances are built to provide applications with
the lowest latency while accessing the data.

Storage Optimized Instance Types


Im4gn: Because Im4gn is powered by AWS Graviton processors, it offers the best price performance
for workloads in Amazon EC2 that demand a lot of storage. Im4gn.large's base configuration has 2
vCPUs, 8 GiB of memory, and EBS storage with a network bandwidth of up to 25 Gbps. Other
storage-optimized families include Is4gen, I4i, D, and H.

Features
Using AWS Graviton2 processors, which provide the best price/performance for workloads in
Amazon EC2.
Geared at tasks that correspond to 4 GB of RAM per vCPU.
Elastic Network Adapter (ENA)-based enhanced networking with up to 100 Gbps of network
bandwidth.

Applications
Amazon EC2 C5d Instance: Suitable for applications with very intensive storage workloads. It can
deliver high input and output performance with low latency.
Amazon EC2 I3 instance: The storage-optimized instance is well-suited for applications with high
storage needs. It also provides local NVMe storage.

5. Accelerated Computing Instances


Coprocessors are used in accelerated computing instances to execute specific operations more
effectively than software running on CPUs. Floating-point numeric computations, graphics
processing, and data pattern matching are examples of these functions. A Hardware-Accelerator/
Co-processor is a component in computing that may speed up data processing. Graphics
applications, game streaming, and application streaming are all good candidates for accelerated
computing instances

Examples:
If the application utilizes floating-point calculations or graphics processing, accelerated computing
instances will be the best among all.
Also, data pattern matching can be done more efficiently with this instance type.

Accelerated Computing Instance Types


Accelerated computing consists mainly of the P4, Inf2, G5, G5g, G4dn, G4ad, G3, F1, and VT1 families.
P4: Offers 3.0 GHz 2nd Generation Intel Xeon processors, 8 GPUs, 96 vCPUs, and 1,152 GiB of memory,
with 400 Gbps of network bandwidth over ENA and EFA.

Features
2nd Generation Intel Xeon Scalable processors, 3.0 GHz (Cascade Lake P-8275CL).
8 NVIDIA A100 Tensor Core GPUs maximum.
400 Gbps instance networking with support for NVIDIA GPUDirect RDMA and Elastic Fabric Adapter
(EFA) (remote direct memory access).

Applications
Amazon EC2 P3 Instances: High-performance computing, rendering, and machine learning
workloads are all well-suited to these instances. Its NVIDIA V100 GPUs enable them to deliver up to
1 petaflop of mixed-precision performance per instance, which makes them perfect for simulations
of computational fluid dynamics, molecular dynamics, and complicated deep learning models

Amazon EC2 G4 Instances: These instances are designed for graphically demanding tasks like video
transcoding, virtual desktops, and gaming. They provide up to 65 teraflops of single-precision
performance per instance and are driven by NVIDIA T4 GPUs.

What is Amazon EC2 Instance Lifecycle?


The AWS EC2 instance lifecycle describes how an EC2 instance moves through different stages, from
launch to termination, and the transitions between instance states.
AWS EC2 Instance Transition States
The EC2 instance lifecycle consists of the following transition states.

Instance Launch
The moment you launch an instance, it goes to the pending phase. The type of instance specified by
you at launch decides the host computer’s hardware for your AWS EC2 instance.

To boot the EC2 instance, Amazon uses the Amazon Machine Image (AMI) specified at launch.
Once the EC2 instance is ready, it goes into the running state. You can then connect to the running
instance and use it just as you would use your own computer.

Instance Stop and Start


If your instance fails to run applications as expected or fails a status check, and its root volume is an
Amazon EBS volume, you can stop and start your instance to try to resolve the issue. Your instance
goes into the stopping state, and then into the stopped state once you stop it. The stopped state
enables you to modify specific attributes of your instance, such as the instance type.

Your instance goes into the pending phase again as soon as you start it, and it is usually shifted to
another host system (in a few cases, it may remain on the same host). Any data on the instance store
volumes of the former host is lost when you stop and start your instance.

Instance Hibernate
While hibernating your instance, your operating system performs hibernation, saving the contents of
the instance's memory to the Amazon Elastic Block Store (EBS) root volume. When you start your
instance again, the EBS root volume is restored to its former state and the memory contents are
reloaded.

At this stage, the process from the instance stop-and-start phase is repeated. Amazon does not charge
for usage of a hibernated instance while it is in the stopped state, but does charge while it is in the
stopping state.

Instance Reboot
To reboot your instance, you can use the Amazon EC2 console, API, or a command line tool. It is
recommended to reboot through Amazon EC2 rather than running the reboot command from within
your operating system. Rebooting an instance is similar to an OS reboot process.
The instance stays on the same host system. A reboot usually takes only a few minutes, but the exact
time depends on your instance configuration.

Instance Retirement
When Amazon encounters an irreparable fault in the underlying hardware that hosts the instance, it
schedules an instance retirement. Once the EC2 instance reaches its scheduled retirement date,
Amazon stops or terminates it. If your EC2 instance's root volume is an Amazon EBS volume, the
instance is stopped and you can start it again at any time. If the root volume is an instance store
volume, the instance is terminated and cannot be used again.

Instance Termination
The moment you don't need your instance anymore, you can terminate it. Once your instance's status
changes to terminated, billing for it stops. You can't terminate your instance using the CLI, console, or
API if you have enabled termination protection.
The instance remains visible in the console for a short period after you terminate it, and you can also
describe it using the API or CLI. Resources gradually disassociate from the terminated instance and
may no longer be visible once the instance is terminated.
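
To make the lifecycle concrete, here is a minimal boto3 sketch that walks an instance through stop, start, reboot, and terminate. The instance ID is a placeholder, and termination protection is assumed to be disabled.

import boto3

ec2 = boto3.client("ec2")
instance = ["i-0123456789abcdef0"]  # placeholder instance ID

ec2.stop_instances(InstanceIds=instance)                      # running -> stopping -> stopped
ec2.get_waiter("instance_stopped").wait(InstanceIds=instance)

ec2.start_instances(InstanceIds=instance)                     # stopped -> pending -> running (possibly on a new host)
ec2.get_waiter("instance_running").wait(InstanceIds=instance)

ec2.reboot_instances(InstanceIds=instance)                    # instance stays on the same host
ec2.terminate_instances(InstanceIds=instance)                 # -> shutting-down -> terminated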

Storage Service
What is Amazon S3?
Amazon S3 (Simple Storage Service) is an AWS service that stores files of different types, such as
photos, audio, and videos, as objects, providing scalability and security.
It allows the users to store and retrieve any amount of data at any point in time from anywhere on
the web. It facilitates features such as extremely high availability, security, and simple connection
to other AWS Services.

What is Amazon S3 Used for?


Amazon S3 is used for various purposes in the cloud because of its robust features for scaling and
securing data. It supports all kinds of use cases in fields such as mobile/web applications, big data,
machine learning, and many more. The following are a few common uses of the Amazon S3 service.
Data Storage: Amazon S3 is a strong option for scaling both small and large storage applications. It
helps store and retrieve data for data-intensive applications as needed and in a timely manner.
Backup and Recovery: Many Organizations are using Amazon S3 to backup their critical data and
maintain the data durability and availability for recovery needs.
Hosting Static Websites: Amazon S3 can store HTML, CSS, and other web content from
users/developers, allowing them to host static websites with low-latency access and
cost-effectiveness.

Data Archiving: Integration with the Amazon S3 Glacier service provides a cost-effective solution for
long-term storage of data that is accessed infrequently.
Big Data Analytics: Amazon S3 is often used as a data lake because of its capacity to store large
amounts of both structured and unstructured data, offering seamless integration with other AWS
analytics and machine learning services.

How Does Amazon S3 Work?


Amazon S3 organizes data into uniquely named S3 buckets, which can be customized with access
controls. Users store objects inside S3 buckets, with features such as versioning and lifecycle
management available to scale and manage the stored data.

1. Amazon S3 Buckets and Objects


Amazon S3 Bucket: Data in S3 is stored in containers called buckets. Each bucket has its own set of
policies and configurations, which gives users more control over their data. Bucket names must be
globally unique. A bucket can be thought of as a parent folder for data. There is a limit of 100 buckets
per AWS account, but it can be increased by requesting a quota increase from AWS Support.

Amazon S3 Objects: Objects are the fundamental entities stored in AWS S3. You can store as many
objects as you want; the maximum size of a single object is 5 TB. An object consists of the following:
Key
Version ID
Value
Metadata
Subresources
Access control information
Tags

2. Amazon S3 Versioning and Access Control


S3 Versioning: Versioning means keeping a record of all previously uploaded versions of a file in S3.
Versioning is not enabled by default; once enabled, it applies to all objects in a bucket. Versioning
keeps every copy of your file, so it adds cost for storing multiple copies of your data. For example, 10
copies of a 1 GB file will have you charged for 10 GB of S3 space. Versioning helps prevent unintended
overwrites and deletions. Objects with the same key can be stored in a bucket if versioning is enabled,
since each copy has a unique version ID.

Access control lists (ACLs): A document for verifying access to S3 buckets from outside your AWS
account. An ACL is specific to each bucket. You can utilize S3 Object Ownership, an Amazon S3
bucket-level feature, to manage who owns the objects you upload to your bucket and to enable or
disable ACLs.

Bucket policies and Life Cycles


Bucket Policies: A document for verifying the access to S3 buckets from within your AWS account,
controls which services and users have what kind of access to your S3 bucket. Each bucket has its
own Bucket Policies.

Lifecycle Rules: A cost-saving practice that can move your files to Amazon S3 Glacier (the AWS data
archive service) or to another S3 storage class for cheaper storage of old data, or delete the data
completely after a specified time.

Keys and Null Objects


Keys: The key, in S3, is a unique identifier for an object in a bucket. For example, in a bucket 'ABC', if
your GFG.java file is stored at javaPrograms/GFG.java, then 'javaPrograms/GFG.java' is the object key
for GFG.java.

Null Object: The version ID of objects uploaded to a bucket where versioning is suspended is null.
Such objects may be referred to as null objects.
Working with Amazon Bucket
Step 1: Log in to the Amazon account with your credentials, search for S3, and open the S3 console.
Now click on the "Create bucket" option and configure the settings shown during bucket creation.

Step 2: After creating the bucket, upload objects into it based on your requirements, either through
the AWS console or with the AWS CLI. The following command uploads an object into an S3 bucket:
aws s3 cp <local-file-path> s3://<bucket-name>/

Step 3: You can control the permissions on the objects uploaded into the S3 bucket, as well as who
can access the bucket. You can make the bucket public or private; by default, S3 buckets are private.

Step 4: You can manage the S3 bucket lifecycle with transitions. Based on the rules that you define,
objects are transitioned into different storage classes according to the age of the object uploaded into
the S3 bucket.

Step 5: Enable services to monitor and analyze S3. For example, enable S3 access logging to record
who requested the objects in your S3 buckets.
S3 Bucket Properties
1. Bucket Versioning:
Amazon S3 bucket versioning is a feature that allows you to store multiple versions of an object in
the same bucket. This can help protect your data from accidental deletion or overwrites.
How it works
S3 assigns a unique version ID to each object uploaded to the bucket
You can use the version ID to identify and manage different versions of the object
You can revert to older versions of an object

Benefits
Data recovery: Recover from unintended user actions, application bugs, and application failures
Data retention and archiving: Maintain a history of changes to objects
Compliance: Meet compliance requirements by maintaining a history of changes to objects
Security: Protect against accidental or malicious deletion
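
For example, versioning can be enabled on an existing bucket with a short boto3 call (the bucket name below is a placeholder):

import boto3

s3 = boto3.client("s3")

# Turn on versioning for the bucket; all subsequent uploads get a version ID
s3.put_bucket_versioning(
    Bucket="your-bucket-name",
    VersioningConfiguration={"Status": "Enabled"},
)

# Suspending versioning later uses the same call with {"Status": "Suspended"}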

S3 Server Access Logging


Amazon S3 Access Logging is a feature that enables you to capture detailed records of requests
made to your S3 bucket. These logs help in security analysis, auditing, and monitoring access
patterns.
How S3 Access Logging Works
Source Bucket: The bucket whose access requests you want to log.
Target Bucket: The bucket where logs will be stored. It must be in the same AWS region as the
source bucket.
Log File Format: Logs are delivered in a text file format, containing details like request type,
requester IP, time of request, and response status.

Steps to Enable S3 Access Logging


Create a Target Bucket (if not already created):
This bucket will store the logs and should not be the same as the source bucket.
AWS recommends enabling versioning and lifecycle policies to manage logs efficiently.
Grant Permissions to the Target Bucket:
The target bucket must have a bucket policy granting write access to the S3 logging service.

Enable Logging on the Source Bucket:


Open the S3 console.
Select the source bucket.
Go to the Properties tab.
Under Server access logging, click Edit.
Enable logging and specify the target bucket.
Save changes.
Verify Logging:
AWS writes logs with a delay of a few hours.
Logs appear as objects in the target bucket with a prefix like AWSLogs/your-source-bucket/.
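
The same configuration can be applied programmatically. This boto3 sketch assumes the target bucket already exists and already grants the S3 logging service permission to write (bucket names and prefix are placeholders):

import boto3

s3 = boto3.client("s3")

# Enable server access logging on the source bucket, writing logs to the target bucket
s3.put_bucket_logging(
    Bucket="your-source-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "your-log-bucket",
            "TargetPrefix": "access-logs/your-source-bucket/",
        }
    },
)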

Static Web hosting


Amazon S3 Static Website Hosting allows you to host a website using an S3 bucket as the origin. This
is useful for hosting simple static sites (HTML, CSS, JavaScript) without needing a web server.

Steps to Set Up S3 Static Website Hosting


1. Create an S3 Bucket
The bucket name must match your domain name (e.g., example.com).
Enable public access (since website hosting requires public read access).
2. Upload Website Files
Upload your index.html, error.html, CSS, and JavaScript files.

3. Enable Static Website Hosting


Go to the S3 console.
Select your bucket.
Click Properties → Static website hosting.
Select "Enable".
Set:
Index document → index.html
Error document (optional) → error.html
Copy the Bucket Website Endpoint (e.g., http://example-bucket.s3-website-us-east-1.amazonaws.com).
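
Steps 2 and 3 can also be done from code. The boto3 sketch below uploads the index page and enables website hosting; the bucket name and file names are placeholder assumptions:

import boto3

s3 = boto3.client("s3")
bucket = "example.com"  # placeholder bucket/domain name

# Upload the entry page with the right content type so browsers render it
s3.upload_file("index.html", bucket, "index.html", ExtraArgs={"ContentType": "text/html"})

# Enable static website hosting with index and error documents
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)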

Set Bucket Policy for Public Read Access


Go to the Permissions tab.
Add a bucket policy that allows public read access (a sample policy is shown after these steps).
Save changes.
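
A typical public-read policy for website hosting looks like the following; the bucket name is a placeholder and should be replaced with your own:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example.com/*"
    }
  ]
}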

(Optional) Configure a Custom Domain with Route 53


Use Amazon Route 53 (or another DNS provider) to point your custom domain to the S3 website.
Create an A record (alias) pointing to the S3 website endpoint.

(Optional) Add HTTPS with CloudFront


S3 does not support HTTPS directly, but you can use Amazon CloudFront (CDN) to enable HTTPS.
Create a CloudFront distribution with:
Origin: Your S3 Bucket URL (not the website endpoint).
Viewer Protocol Policy: Redirect HTTP to HTTPS.
Attach an SSL certificate via AWS Certificate Manager (ACM).
S3 Bucket Policy
An Amazon S3 bucket policy is a set of permissions that controls access to an S3 bucket and its
objects. It's an Identity and Access Management (IAM) policy that's resource-based.

What does an S3 bucket policy do?


It allows or denies actions requested by a user or role
It's a key part of securing S3 buckets from unauthorized access
It's used to grant access to other resources and users
It's used to create or modify an S3 bucket to receive log files

How to create an S3 bucket policy?


Only the bucket owner can associate a policy with a bucket
You can use the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDK for
Java to view your bucket policy

What are some examples of S3 bucket policies?


Allowing an object to be uploaded to a bucket that uses KMS as the S3 Server Side Encryption
method
Restricting access to a specific IAM role
Blocking access to the bucket and its objects for users that are not using the IAM role credentials

S3 Bucket Policy JSON file


{ "Version": "2012-1017”-"Statement": [ { "Effect": "Allow",
"Principal": { "AWS":"arn:aws:iam::123456789012:user/YourUser" },
"Action": "s3:*",
"Resource": [ "arn:aws:s3:::your-bucket-name","arn:aws:s3:::your-bucket-name/*" ] } ] }

S3 Public Access Overview


By default, Amazon S3 blocks public access to buckets and objects to prevent unauthorized
exposure. However, you can enable public access if needed, such as for static website
hosting or public file sharing.


1. Enable Public Access (S3 Console)


Steps:
Go to the AWS S3 Console: https://s3.console.aws.amazon.com
Select Your Bucket: Click on the bucket name.
Go to "Permissions": Scroll down to the "Block Public Access" settings.
Disable Block Public Access:
Click Edit.
Uncheck "Block all public access".
Confirm and save changes.
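
These console steps correspond to the bucket's PublicAccessBlock settings, which can also be changed with boto3 (the bucket name is a placeholder; disabling these settings exposes the bucket, so do it only when genuinely needed):

import boto3

s3 = boto3.client("s3")

# Turn off all four "Block Public Access" switches for this bucket
s3.put_public_access_block(
    Bucket="your-bucket-name",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": False,
        "IgnorePublicAcls": False,
        "BlockPublicPolicy": False,
        "RestrictPublicBuckets": False,
    },
)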

2. Add a Bucket Policy for Public Read Access


Once public access is enabled, you need to set a bucket policy to allow public users to read objects.
3. Make Individual Files Public (Optional)
If you want only specific files to be public:
Steps:
Go to the S3 Console.
Open your bucket and navigate to the object (file).
Click the file name → Permissions tab.
Under "Public access", click Edit.
Check "Grant public-read access", then Save.

4. Test Public Access


After making a file public, try opening it in a browser:
https://your-bucket-name.s3.amazonaws.com/your-file.jpg
If you see the file, public access is working.

Amazon S3 ACL (Access Control List) Guide


Amazon S3 Access Control Lists (ACLs) manage access permissions at the bucket and object levels.
ACLs can grant specific permissions to AWS accounts, IAM users, or public users.

Object ACL Permissions


Permission | Description
READ | Allows reading the object data and metadata
READ_ACP | Allows viewing the ACL of the object
WRITE_ACP | Allows modifying the ACL of the object
FULL_CONTROL | Grants all permissions on the object

2. Default ACLs
By default:
Buckets are private, only the owner has full control.
Objects inherit the ACL from the bucket unless explicitly set.

3. Enabling or Disabling ACLs


By default, ACLs are disabled on new buckets (Object Ownership is set to "Bucket owner enforced"),
and S3 Block Public Access blocks public ACLs.
To enable ACLs:
Go to AWS S3 Console.
Select your bucket → Permissions.
Under Object Ownership, choose a setting that allows ACLs (for example, "Bucket owner preferred"),
and adjust Block Public Access if public ACLs are required.

Disabling ACLs (Recommended)


AWS recommends using IAM Policies and Bucket Policies instead of ACLs.
To disable ACLs:
Go to AWS S3 Console → Select Bucket.
Permissions → Disable ACLs (Object Ownership).
Use Bucket Policies for access control.

S3 Encryption
Amazon S3 provides encryption at rest and encryption in transit to protect data from unauthorized
access.
Server-side encryption
SSE-S3
The default encryption method for S3 buckets, where Amazon manages the encryption keys
SSE-KMS
Similar to SSE-S3, but the encryption keys are managed through the AWS Key Management Service
(KMS), giving more control over key policies and auditing
DSSE-KMS
A dual-layer encryption option that applies two layers of encryption using keys stored in the AWS Key
Management Service
SSE-C
The user supplies and manages the encryption keys, and Amazon performs the encryption and
decryption

Client-side encryption
The user encrypts the data before uploading it to S3
The user provides the encryption keys, which Amazon does not store or transmit
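
As an illustration, default bucket encryption can be set to SSE-KMS with boto3; the bucket name and KMS key ARN below are placeholders:

import boto3

s3 = boto3.client("s3")

# Make SSE-KMS the default server-side encryption for new objects in the bucket
s3.put_bucket_encryption(
    Bucket="your-bucket-name",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/your-key-id",
                },
                "BucketKeyEnabled": True,  # reduces KMS request costs
            }
        ]
    },
)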

S3 Event Notification
The Amazon S3 notification feature enables you to receive notifications when certain events occur
inside your bucket. To get notifications, first add a notification configuration that identifies the events
you want Amazon S3 to publish and the destinations where Amazon S3 should send the notifications.
This configuration is stored in the notification subresource that is associated with a bucket.

Types of Event Notifications:


Currently, Amazon S3 can publish notifications for the following supported events:
New object created events — Amazon S3 sends a notification when an object is created. It supports
multiple APIs that create objects, such as Put, Post, Copy, and Multipart Upload. We can also use a
wildcard (s3:ObjectCreated:*) to match any object-created event.

Object removal events — Amazon S3 sends a notification upon deletion of an object. It supports two
delete options: Permanently Delete and Delete Marker Created. We can also use a wildcard
(s3:ObjectRemoved:*) to match any object-removal event.

Restore object events — Amazon S3 allows restoration of objects archived to the S3 Glacier storage
classes. You can request to be notified upon completion of object restoration. There are two event
types: Restore Initiated and Restore Completed. We can also use a wildcard (s3:ObjectRestore:*) to
match any object-restore event.

Reduced Redundancy Storage (RRS) object lost events — Amazon S3 delivers a notification message
when it detects that an object of the RRS storage class has been lost.

Replication events — Amazon S3 sends event notifications when an object fails replication, when an
object exceeds the 15-minute replication threshold, when an object is replicated after the 15-minute
threshold, when an object is no longer tracked by replication metrics, and when an object is replicated
to the destination Region. We can also use a wildcard (s3:Replication:*) to match any replication event.
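
A notification configuration is attached to the bucket itself. The boto3 sketch below sends all object-created events to an SQS queue; the bucket name and queue ARN are placeholders, and the queue policy is assumed to already allow S3 to send messages:

import boto3

s3 = boto3.client("s3")

# Publish a notification to SQS whenever any object is created in the bucket
s3.put_bucket_notification_configuration(
    Bucket="your-bucket-name",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:your-queue",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)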

Elastic Block Store (EBS)


Amazon Elastic Block Store (EBS) is a cloud-based storage service that provides block-level storage
volumes for data used by Amazon EC2 instances. It behaves much like a physical hard drive attached
to the instance.

Key Features of EBS:


Durable and Persistent storage.
Supports SSD (gp3, io2) and HDD (st1, sc1) volumes.
Supports snapshots for backups and recovery.
Can be encrypted for data security.
Scalable and high-performance storage.
Can be detached and re-attached to instances as needed.
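
A quick boto3 sketch of the create/attach workflow is shown below; the Availability Zone, instance ID, and device name are placeholder assumptions, and the volume must be in the same AZ as the instance:

import boto3

ec2 = boto3.client("ec2")

# Create a 20 GiB gp3 volume in the same Availability Zone as the target instance
volume = ec2.create_volume(AvailabilityZone="us-east-1a", Size=20, VolumeType="gp3")

# Wait until the volume is available, then attach it to the instance as /dev/xvdf
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",
    Device="/dev/xvdf",
)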

EBS Types
Solid state drive (SSD) volumes
SSD-backed volumes are optimized for transactional workloads involving frequent read/write
operations with small I/O size, where the dominant performance attribute is IOPS. SSD-backed
volume types include General Purpose SSD and Provisioned IOPS SSD .
HDD
Hard disk drive (HDD) volumes
HDD-backed volumes are optimized for large streaming workloads where the dominant performance
attribute is throughput. HDD volume types include Throughput Optimized HDD and Cold HDD.

Previous generation volumes


Magnetic (standard) volumes are previous generation volumes that are backed by magnetic drives.
They are suited for workloads with small datasets where data is accessed infrequently and
performance is not of primary importance. These volumes deliver approximately 100 IOPS on
average, with burst capability of up to hundreds of IOPS, and they can range in size from 1 GiB to 1
TiB.

Object storage vs. block storage: How are they different?


Object storage and block storage are two types of cloud storage — meaning, remote data storage
that can be accessed via an Internet connection. Object storage is highly scalable and customizable,
but not always fast.

Block storage is fast, but usually more expensive than object storage. Which one better fits an
organization's use case depends on a number of factors.
Overall, object storage is typically used for large volumes of unstructured data, while block storage
works best with transactional data and small files that need to be retrieved often.

Block storage divides files and data into equally sized blocks. Each block has a unique identifier,
stored in a data lookup table. When data needs to be retrieved, the data lookup table is used to find
the required blocks, which are then reassembled into their original form.

Object storage is a method for saving large volumes of unstructured data, including sensor data,
audio files, logs, video and photo content, webpages, and emails. Each file or segment of data is
saved as an "object," and each object includes metadata and a unique name or identifier for data
retrieval.

Archives vs Backups
Amazon Glacier
AWS provides a variety of storage solutions designed to meet different data access needs. Whether
you require high-performance storage for frequently accessed data or cost-efficient options for
long-term retention, AWS has a solution. Key storage options include Amazon S3 for scalable object
storage, Amazon EFS
for shared file storage, Amazon EBS for fast block storage, AWS Backup for automated backups,
and Amazon Glacier for low-cost archival storage.

Additionally, AWS offers data migration services like AWS Snowball and AWS DataSync to help
move large datasets efficiently.

Key Features and Benefits:


Cost-Effective Storage:
S3 Glacier is designed for data that doesn't need to be accessed frequently, making it a cost-effective
solution for archiving and long-term storage.
Durable and Secure:
Data stored in S3 Glacier is designed to be highly durable and secure, with multiple copies stored
across multiple facilities.
Scalable:
S3 Glacier scales to store any amount of data, offering options to store it in various AWS regions.

Flexible Retrieval Options:


S3 Glacier offers different storage classes optimized for various access patterns and retrieval needs:
S3 Glacier Instant Retrieval: Offers the fastest access to archive storage, with millisecond retrieval,
ideal for performance-sensitive use cases.
S3 Glacier Flexible Retrieval (formerly S3 Glacier): Provides retrieval in minutes or free bulk retrievals
in 5-12 hours, suitable for backup and disaster recovery use cases.
S3 Glacier Deep Archive: The lowest-cost storage in the cloud, with data retrieval in 12-48 hours, ideal
for long-lived archive storage like compliance archives and digital media preservation.
Backup Vs Archive

Feature | Backup | Archive
Purpose | Data recovery from loss or corruption | Long-term data retention and access
Data | Current, actively used data | Historic, infrequently accessed data
Frequency | Frequent, often daily or hourly | Less frequent, potentially monthly or yearly
Retention | Short-term, overwritten regularly | Long-term, often indefinite
Cost | Higher storage cost due to frequent access | Lower storage cost due to infrequent access
Example | Restoring a deleted file, recovering from a crash | Keeping old financial records for compliance

Object Storage vs Block Storage.

Feature | Block Storage | Object Storage
Data Organization | Data is divided into fixed-size blocks. | Data is stored as objects with metadata.
Use Cases | Databases, virtual machine file systems, applications needing low latency and high performance. | Large volumes of unstructured data (e.g., media, backups, archives).
Performance | High performance and low latency for block access. | High throughput and durability for large datasets.
Scalability | Limited scalability and can be costly to expand. | Highly scalable and cost-effective for large storage needs.
Metadata | Metadata management is often handled externally. | Metadata is included within each object.
Cost | Can be more expensive for large volumes of data. | Generally more cost-effective for storing large volumes of data.
Data Modification | Blocks can be modified in place. | Objects are typically immutable and require recreation upon modification.
Best for | Structured data, transactional data, applications requiring fast random access. | Unstructured data, large archives, media storage.

AWS VPC:

With Amazon Virtual Private Cloud (Amazon VPC), you can launch AWS resources in a logically
isolated virtual network that you've defined. This virtual network closely resembles a traditional
network that you'd operate in your own data center, with the benefits of using the scalable
infrastructure of AWS
VPC Components

1. Subnets:
Subnets are logical segments of a VPC's IP address range.

They allow you to organize and isolate resources within your VPC.

Each subnet must be in a single Availability Zone.

You can associate multiple subnets with the same route table.

2. Route Tables:

Route tables define the routing rules for network traffic within a VPC.

They determine where network traffic should be sent, based on its destination.

Each subnet in a VPC is associated with a route table.

Route tables contain local routes for communication within the VPC, which are added by default.

3. Internet Gateway:

An internet gateway is a horizontally scaled, redundant, and highly available component that enables
your VPC to communicate with the internet.

It allows instances in your VPC to access the internet.


An internet gateway must be attached to a VPC to enable external connectivity.

4. NAT Gateway:

NAT gateways allow instances in a private subnet to connect to the internet without a public IP address.

They also enable instances to communicate with other AWS services that don't support private IPs.

NAT gateways are a managed service by AWS, ensuring high availability and scalability.

5. Security Groups:

Security groups act as virtual firewalls for your instances, controlling inbound and outbound traffic.

They use a list of rules to allow or deny traffic based on IP address, protocol, and port.

6. Network ACLs:

Network ACLs are another layer of security for your VPC, providing an additional layer of firewall
functionality.

They are associated with subnets and control traffic in and out of those subnets.

Network ACLs complement security groups by applying rules at the subnet level, and unlike security
groups they support both allow and deny rules.

7. VPC Endpoints:

VPC endpoints enable private connectivity between your VPC and supported AWS services.

They provide a secure and private connection without using the public internet.

VPC endpoints are virtual devices that are horizontally scaled, redundant, and highly available.

The following features help you configure a VPC to provide the connectivity that your applications
need:
Virtual private clouds (VPC): A VPC is a virtual network that closely resembles a traditional network
that you'd operate in your own data center. After you create a VPC, you can add subnets.
Subnets: A subnet is a range of IP addresses in your VPC. A subnet must reside in a single Availability
Zone. After you add subnets, you can deploy AWS resources in your VPC.
IP addressing: You can assign IP addresses, both IPv4 and IPv6, to your VPCs and subnets. You can also
bring your public IPv4 addresses and IPv6 GUA addresses to AWS and allocate them to resources in
your VPC, such as EC2 instances, NAT gateways, and Network Load Balancers.

Routing: Use route tables to determine where network traffic from your subnet or gateway is
directed.
Gateways and endpoints: A gateway connects your VPC to another network. For example, use
an internet gateway to connect your VPC to the internet. Use a VPC endpoint to connect to AWS
services privately, without the use of an internet gateway or NAT device.
Peering connections: Use a VPC peering connection to route traffic between the resources in two
VPCs.

Traffic Mirroring: Copy network traffic from network interfaces and send it to security and monitoring
appliances for deep packet inspection.
Transit gateways: Use a transit gateway, which acts as a central hub, to route traffic between your
VPCs, VPN connections, and AWS Direct Connect connections.
VPC Flow Logs: A flow log captures information about the IP traffic going to and from network
interfaces in your VPC.
VPN connections: Connect your VPCs to your on-premises networks using AWS Virtual Private
Network (AWS VPN).

Subnet
A subnet is a subdivision of a VPC's IP address range (CIDR block). Subnets allow you to segment
your network logically, separating resources for security, routing, and management.

Why Subnetting is Important?


Let's consider a company that follows classful addressing and has a Class C network
(192.168.1.0/24) with 256 IP addresses. It has three departments:
Sales: 20 devices
HR: 10 devices
IT: 50 devices

Without subnetting, all departments share the same network, and all 256 IP addresses are available
to everyone, which leads to:
IP Waste: Only 80 devices are needed (20 + 10 + 50), but all 256 addresses are allocated,
wasting 176 addresses.
Performance Issues: Since all departments are on the same network, any data sent between devices
floods the entire network, slowing communication for everyone. For example, heavy data transfer in
IT can impact Sales and HR.

With Subnetting, we split the network into three subnets, allocating just enough IP addresses for
each department:
Sales: 192.168.1.0/27 → 32 IPs (for 20 devices, 12 spare)
HR: 192.168.1.32/28 → 16 IPs (for 10 devices, 6 spare)
IT: 192.168.1.64/26 → 64 IPs (for 50 devices, 14 spare)

By subnetting, we:
Save IP addresses (Efficiency): Only 112 addresses are used (80 + some spare), leaving 144 unused
for future growth.
Keep networks faster (Better Performance): Data within each department stays in its subnet. For
example, HR traffic stays in HR, reducing network congestion for Sales and IT.
Protect sensitive data (Improved Security): Each department is isolated. If someone in Sales tries to
access HR systems, subnet restrictions block them.
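
The address arithmetic above can be checked with Python's standard ipaddress module; the sketch below simply recomputes the sizes of the three example subnets:

import ipaddress

subnets = {
    "Sales": "192.168.1.0/27",
    "HR": "192.168.1.32/28",
    "IT": "192.168.1.64/26",
}

for dept, cidr in subnets.items():
    net = ipaddress.ip_network(cidr)
    # num_addresses counts every address in the block, including network/broadcast
    print(dept, cidr, "->", net.num_addresses, "addresses")

# Prints 32, 16, and 64 addresses respectively (32 + 16 + 64 = 112 of the 256 available)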

AWS Subnets in VPC


Key Characteristics:
Availability Zone: Each subnet must reside entirely within one Availability Zone and cannot span
zones.
IP Address Range: You define the IP address range (CIDR block) for a subnet when you create it.
Public vs. Private: You can have both public and private subnets.
Public subnets: have resources accessible to the public, while private subnets have resources
accessible only within the private network.
Routing: You use route tables to control where network traffic is directed to and from your subnets.
Security: You can use security groups to control traffic to and from resources within a subnet
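
As a small boto3 sketch (the CIDR blocks and Availability Zone are placeholder assumptions), creating a VPC with one subnet looks like this:

import boto3

ec2 = boto3.client("ec2")

# Create the VPC with a /16 address range
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]

# Carve a /24 subnet out of the VPC range in a single Availability Zone
subnet = ec2.create_subnet(
    VpcId=vpc["VpcId"],
    CidrBlock="10.0.1.0/24",
    AvailabilityZone="us-east-1a",
)["Subnet"]

print("VPC:", vpc["VpcId"], "Subnet:", subnet["SubnetId"])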

Elastic Network Interface


An elastic network interface is a logical networking component in a VPC that represents a virtual
network card. You can create and configure network interfaces and attach them to instances that
you launch in the same Availability Zone.
The attributes of a network interface follow it as it's attached or detached from an instance and
reattached to another instance. When you move a network interface from one instance to another,
network traffic is redirected from the original instance to the new instance.

Network interface attributes


A network interface can include the following attributes:
A primary private IPv4 address from the IPv4 address range of your subnet
A primary IPv6 address from the IPv6 address range of your subnet
Secondary private IPv4 addresses from the IPv4 address range of your subnet
One Elastic IP address (IPv4) for each private IPv4 address

One public IPv4 address


Secondary IPv6 addresses
Security groups
A MAC address
A source/destination check flag
A description

Why Use Elastic Network Interfaces?


High availability and failover: You can move an ENI from one instance to another in case of failure.
Multi-homed instances: Attach multiple ENIs to a single instance for complex network configurations
(e.g., segregating public and private traffic).
Network traffic segregation: Assign different security groups and subnets.
Elastic IP movement: Quickly remap an Elastic IP to a new instance via ENI.
Scaling and modularity: Easily attach/detach to adapt to workload changes.
Common Use Cases:
Building high-availability failover solutions.
Network appliances (firewalls, NAT instances, load balancers).
Creating management networks separated from production.
Running software with multiple network interfaces (e.g., databases, monitoring agents).

Internet Gateway
In AWS, an Internet Gateway (IGW) is a horizontally scaled, redundant, and highly available VPC
component that enables two-way communication between your VPC and the internet, allowing
resources in public subnets to connect to and be accessed from the internet

Key Functionality:
Two-way communication:
IGWs facilitate both outbound connections (from your VPC to the internet) and inbound connections
(from the internet to your VPC).
Public IP Addresses:
Resources in public subnets with public IP addresses (like EC2 instances with Elastic IPs) can use the
IGW to access the internet.

NAT (Network Address Translation):


For IPv4 traffic, the IGW performs one-to-one NAT for instances that have been assigned public IPv4
addresses, translating between each instance's private address and its public address so it can reach
the internet.
Routing:
You need to configure your route tables to direct internet-bound traffic to the IGW.
Security:
Remember to configure security groups and network ACLs to allow the desired internet traffic flow

Route Table
In AWS, a route table is a set of rules (called routes) that determines where network traffic from your
VPC is directed, specifying which network interface to send traffic to based on its destination. Each
subnet in your VPC must be associated with a route table, and a subnet can only be associated with
one route table at a time.

Purpose:
Route tables control the flow of network traffic within your VPC and to external networks.
Routes:
Each route in a table specifies a destination (IP address or CIDR block) and a target (e.g., an internet
gateway, NAT gateway, network interface, or VPC peering connection).
Local Routes:
These routes enable communication within the VPC, allowing traffic to flow between subnets.
Internet Gateway Routes:
These routes allow traffic to flow between the VPC and the internet

Internet Gateway Routes:


These routes allow traffic to flow between the VPC and the internet.
Subnet Association:
Each subnet in a VPC must be associated with a route table, which controls the traffic for that
subnet.
Route Table Association:
A route table can be associated with multiple subnets.
Example:
If a subnet's route table has a route pointing to an internet gateway, traffic from that subnet to the
internet is allowed

Custom Route Tables:


You can create custom route tables and associate them with subnets to explicitly control how each
subnet routes traffic.
Default Route Table:
By default, a route table contains a local route for communication within the VPC.
Longest Prefix Match:
AWS uses the most specific route in your route table that matches the traffic to determine how to
route the traffic.
Route Propagation:
You can enable route propagation so that a virtual private gateway automatically propagates routes
to your route table.

Transit Gateway:
In the context of Transit Gateways, route tables are used to control how traffic is routed to all the
connected networks.
VPC Peering:
You can use VPC peering connections to route traffic between VPCs.
Security Groups
In AWS, Security Groups act as virtual firewalls, controlling inbound and outbound traffic for EC2
instances, allowing granular access control and enhancing cloud security.
Virtual Firewalls:
Security Groups function as virtual firewalls, inspecting and controlling network traffic to and from
your EC2 instances.
Instance-Level Security:
They operate at the instance level, meaning you can assign different security rules to different
instances or groups of instances.
Inbound and Outbound Rules:
You define rules that specify which traffic is allowed in (inbound) and out (outbound) of your
instances, based on IP addresses, ports, and protocols. ​

Stateful:
AWS Security Groups are stateful, meaning they track connections: return traffic for a previously
allowed connection is automatically allowed, regardless of the rules in the other direction.
Associated with EC2 Instances:
You associate Security Groups with EC2 instances when you launch them.
Traffic Control:
Security Groups filter traffic based on the rules you define, allowing or denying connections based on
these rules.
Default Security Group:
When you create a VPC, AWS automatically creates a default security group that allows all outbound
traffic and allows inbound traffic only from resources associated with the same security group. You
can modify this default security group or create your own.
Multiple Security Groups:
You can assign multiple security groups to an instance, allowing for more complex traffic control
scenarios
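
For example, a security group that allows inbound HTTP from anywhere and SSH from one address range could be created with boto3 as follows; the VPC ID and CIDR range are placeholders:

import boto3

ec2 = boto3.client("ec2")

# Create the security group inside an existing VPC (placeholder VPC ID)
sg = ec2.create_security_group(
    GroupName="web-sg",
    Description="Allow HTTP from anywhere and SSH from one range",
    VpcId="vpc-0123456789abcdef0",
)

# Inbound rules; outbound traffic is allowed by default
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "203.0.113.0/24"}]},
    ],
)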

Benefits:
Enhanced Security:
Security Groups are a crucial part of AWS's security architecture, helping to protect your EC2
instances and the data they store.
Granular Control:
You can fine-tune your network traffic control, allowing only necessary traffic to reach your
instances.
Simplified Management:
Security Groups make it easier to manage network access policies for your EC2 instances

AWS Cloudwatch
Amazon CloudWatch monitors your Amazon Web Services (AWS) resources and the applications you
run on AWS in real time. You can use CloudWatch to collect and track metrics, which are variables
you can measure for your resources and applications.
The CloudWatch home page automatically displays metrics about every AWS service you use. You
can additionally create custom dashboards to display metrics about your custom applications, and
display custom collections of metrics that you choose.

You can create alarms that watch metrics and send notifications or automatically make changes to
the resources you are monitoring when a threshold is breached. For example, you can monitor the
CPU usage and disk reads and writes of your Amazon EC2 instances and then use that data to
determine whether you should launch additional instances to handle increased load. You can also
use this data to stop under-used instances to save money.
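For example, the boto3 sketch below creates an alarm on an EC2 instance's average CPU utilization;
the instance ID and SNS topic ARN are placeholders for resources that are assumed to already exist.

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-example",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    Statistic="Average",
    Period=300,               # evaluate 5-minute data points
    EvaluationPeriods=2,      # two consecutive periods = 10 minutes above the threshold
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],       # placeholder topic
)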

Here's a more detailed breakdown of the benefits:


1. Enhanced Visibility and Insights:
Real-time Monitoring:
CloudWatch provides real-time monitoring of metrics and logs, allowing you to track the
performance and health of your AWS resources and applications.
Custom Metrics:
You can monitor custom metrics generated by your applications and services, gaining deeper insights
into their performance and health (see the sketch at the end of this list).
Dashboards:
Create custom dashboards to visualize your metrics, alarms, and logs in a single place, providing a
unified view of your AWS environment.
Log Analysis:
CloudWatch Logs allows you to collect, monitor, and store log data from AWS services, applications,
and systems, enabling you to analyze logs for troubleshooting and auditing purposes.
Anomaly Detection:
CloudWatch can identify unusual patterns or anomalies in your metrics, helping you proactively
address potential issues before they impact your applications.
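As a sketch of the custom-metrics point above, the call below publishes one data point for a
hypothetical application metric (the namespace, metric name, and dimension are invented):

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_data(
    Namespace="MyApp",                       # hypothetical custom namespace
    MetricData=[{
        "MetricName": "OrdersProcessed",     # hypothetical metric
        "Dimensions": [{"Name": "Environment", "Value": "prod"}],
        "Value": 42,
        "Unit": "Count",
    }],
)

Once published, the metric can be graphed on dashboards and used in alarms like any AWS-provided
metric.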

2. Improved Reliability and Performance:


Alarms and Notifications:
Set up alarms to monitor metrics and receive notifications when thresholds are breached, enabling
quick responses to changes in your applications.
Automated Actions:
Configure alarms to trigger automated actions, such as scaling resources or restarting services, when
thresholds are exceeded.
Resource Optimization:
By monitoring metrics and logs, you can identify bottlenecks and optimize resource utilization,
leading to improved performance and reduced costs.

3. Cost Optimization and Efficiency:


Proactive Issue Detection:
Early detection of issues through monitoring and alarms can prevent costly downtime and resource
wastage.
Resource Optimization:
By monitoring resource utilization, you can identify opportunities to optimize your infrastructure and
reduce costs.
Compliance and Auditing:
CloudWatch Logs can help you maintain compliance by keeping a detailed record of activities in your
AWS environment, which can be archived and analyzed for auditing purposes.
Stream Metrics:
CloudWatch Metric Streams enables you to create continuous, near-real-time streams of metrics to a
destination of your choice, allowing you to power dashboards, alarms, and other tools that rely on
accurate and timely metric data.

Amazon RDS
Amazon Relational Database Service (RDS) is a managed relational database service that simplifies
setting up, operating, and scaling relational databases in the AWS cloud, offering cost-efficient,
resizable capacity and managing common database administration tasks.

RDS is a managed service, meaning AWS handles many of the complex tasks involved in running and
maintaining relational databases, such as provisioning, patching, backups, and recovery.

It supports various popular database engines, including MySQL, PostgreSQL, MariaDB, SQL Server,
Oracle, and Db2, as well as Amazon Aurora (a MySQL and PostgreSQL-compatible database).

RDS is designed to be used in the AWS cloud, allowing you to easily deploy and manage databases
without worrying about the underlying infrastructure.

Amazon RDS is a managed database service. It's responsible for most management tasks. By
eliminating tedious manual processes, Amazon RDS frees you to focus on your application and your
users.

Amazon RDS provides the following principal advantages over database deployments that aren't fully
managed:
You can use database engines that you are already familiar with: IBM Db2, MariaDB, Microsoft SQL
Server, MySQL, Oracle Database, and PostgreSQL.
Amazon RDS manages backups, software patching, automatic failure detection, and recovery.

You can turn on automated backups, or manually create your own backup snapshots. You can use
these backups to restore a database. The Amazon RDS restore process works reliably and efficiently.
You can get high availability with a primary DB instance and a synchronous secondary DB instance
that you can fail over to when problems occur. You can also use read replicas to increase read
scaling.
In addition to the security in your database package, you can control access by using AWS Identity
and Access Management (IAM) to define users and permissions. You can also help protect your
databases by putting them in a virtual private cloud (VPC).

Amazon RDS Features:


Amazon Relational Database Service (RDS) is a managed database service that makes it easy to set up,
operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity
while automating time-consuming administration tasks such as hardware provisioning, database setup,
patching, and backups. Here's a breakdown of its features:

1. Ease of Use and Management:

Easy Setup and Deployment: Launch a production-ready relational database in minutes using the AWS
Management Console, CLI, SDKs, or APIs.

Managed Administrative Tasks: AWS handles routine tasks like software patching, backups, provisioning,
and maintenance. You have optional control over patching schedules.

Database Parameter Groups: Provides granular control and fine-tuning of your database configurations.

Integration with AWS CloudFormation: Model, provision, and manage RDS resources alongside other
AWS infrastructure using CloudFormation templates.

Event Notifications: Receive email or SMS notifications for over 40 different database events via Amazon
SNS.
Configuration Governance: Integrates with AWS Config to record and audit configuration changes to your
DB instances and related resources for compliance and security.

2. Performance and Scalability:

Choice of Database Engines: Supports popular relational database engines: Amazon Aurora (MySQL and
PostgreSQL-compatible), PostgreSQL, MySQL, MariaDB, Oracle, and SQL Server.

Scalable Compute and Storage: Easily scale compute and storage resources up or down based on your
application's needs.

Storage Auto Scaling: Automatically increase storage capacity when needed, with no downtime.

Provisioned IOPS (IOPS): For high-performance OLTP applications requiring consistent I/O performance.

Read Replicas: Create read-only copies of your database to serve read traffic, improving performance
and availability. Cross-Region Read Replicas are also available for disaster recovery and global scaling.

Amazon Aurora: Offers significantly improved performance and availability compared to standard MySQL
and PostgreSQL. Features like Aurora Serverless provide on-demand, auto-scaling database capacity.

3. High Availability and Durability:

Multi-AZ Deployments: Enhance availability and durability for production database workloads by running
a synchronous standby instance in a different Availability Zone with automatic failover.

Automated Backups: Amazon RDS automatically backs up your database and retains the backups for a
user-specified period.

Point-in-Time Recovery: Restore your database to any point in time within your backup retention period.

Cross-Region Automated Backups: Replicate automated backups to another AWS Region for disaster
recovery.

4. Security and Compliance:

Network Isolation: Launch DB instances in an Amazon Virtual Private Cloud (VPC) for network isolation.

Encryption at Rest and in Transit: Encrypt data at rest using AWS Key Management Service (KMS) and
data in transit using SSL/TLS.

IAM Integration: Control access to your RDS resources using AWS Identity and Access Management (IAM)
roles and policies.

Security Groups: Control inbound and outbound traffic at the instance level.
Compliance: Meets numerous industry compliance standards.

IAM Database Authentication: Authenticate to your database instances using IAM users and roles instead
of password-based authentication.

AWS Secrets Manager Integration: Easily manage database credentials securely.

Amazon GuardDuty RDS Protection: Detect threats targeting your RDS databases.

5. Monitoring and Logging:

Amazon CloudWatch Integration: Monitor key operational metrics, including CPU utilization, memory
usage, storage capacity, I/O activity, and instance connections, at no additional charge.

Enhanced Monitoring: Provides access to over 50 CPU, memory, file system, and disk I/O metrics with
higher granularity.

Amazon RDS Performance Insights: An easy-to-use tool to quickly detect and troubleshoot database
performance problems.

Amazon CloudWatch Database Insights: Consolidates logs and metrics from your applications, databases,
and operating systems into a unified view for root-cause analysis and performance monitoring.

Database Activity Streams: Provides a near real-time stream of database activity for auditing and security
purposes.

Log File Access: View, download, and publish database log files to Amazon CloudWatch Logs.

6. Cost-Effectiveness:

Pay-as-you-go Pricing: Pay only for the resources you consume, with no long-term commitments for
On-Demand Instances.

Reserved Instances: Save on costs by committing to a consistent amount of usage over a 1 or 3-year
term.

Savings Plans: Offer even more flexibility and potential cost savings compared to Reserved Instances.

Stop/Start Instances: Easily stop and start database instances for development and test environments to
save costs.

Free Tier: Eligible new AWS customers can use certain RDS instance types for free for a limited period.

DB Engine
DB engine" stands for Database Engine, which is the underlying software component that
a database management system (DBMS) uses to create, read, update, and delete (CRUD) data in a
database. It's essentially the "core" that handles how data is stored, indexed, and retrieved.

Key Aspects of a DB Engine:


Storage Management: Manages how data is stored on disk or memory.
Query Processing: Interprets and executes SQL or other query languages.
Transaction Management: Ensures ACID properties (Atomicity, Consistency, Isolation, Durability).
Indexing: Optimizes data retrieval.
Concurrency Control: Manages access to data when multiple users are interacting with the
database.

DB Instance Class
An AWS DB Instance is a managed database instance running on Amazon RDS. Instead of managing
your own database servers, RDS handles much of the heavy lifting like backups, patching, scaling,
and replication — allowing you to focus on using the database, not running it.

The DB instance class determines the computation and memory capacity of an Amazon RDS DB
instance. The DB instance class that you need depends on your processing power and memory
requirements.
A DB instance class consists of both the DB instance class type and the size. For example, db.r6g is a
memory-optimized DB instance class type powered by AWS Graviton2 processors. Within the db.r6g
instance class type, db.r6g.2xlarge is a DB instance class. The size of this class is 2xlarge.
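A minimal boto3 sketch of launching a DB instance with this class is shown below; the identifier,
engine choice, and other values are illustrative only, not recommendations.

import boto3

rds = boto3.client("rds")

rds.create_db_instance(
    DBInstanceIdentifier="example-mysql",     # placeholder name
    Engine="mysql",
    DBInstanceClass="db.r6g.2xlarge",         # class type db.r6g, size 2xlarge
    MasterUsername="admin",
    MasterUserPassword="change-me-example",   # placeholder; manage real credentials securely
    AllocatedStorage=100,                     # GiB
    MultiAZ=True,                             # synchronous standby in another Availability Zone
    BackupRetentionPeriod=7,                  # days of automated backups
)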

Amazon RDS Backup and Restore


By default, Amazon RDS creates and saves automated backups of your DB instance securely in
Amazon S3 for a user-specified retention period. In addition, you can create snapshots, which are
user-initiated backups of your instance that are kept until you explicitly delete them.
You can create a new instance from a database snapshot whenever you desire. Although database
snapshots serve operationally as full backups, you are billed only for incremental storage use.

Automated Backups
Turned on by default, the automated backup feature of Amazon RDS backs up your databases and
transaction logs. Amazon RDS automatically creates a storage volume snapshot of your DB instance,
backing up the entire DB instance and not just individual databases.
This backup occurs during a daily, user-configurable 30-minute period known as the backup window.
Automated backups are kept for a configurable number of days (called the backup retention period).
Your automated backup retention period can be configured to up to thirty-five days.

Point-in-time Restores
You can restore your DB instance to any specific time during the backup retention period, creating a
new DB instance. To restore your database instance, you can use the AWS Console or Command Line
Interface.

To determine the latest restorable time for a DB instance, use the AWS Console or Command Line
Interface to look at the value returned in the LatestRestorableTime field for the DB instance. The
latest restorable time for a DB instance is typically within 5 minutes of the current time.
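For illustration, the boto3 sketch below restores a source instance to a specific time as a new
instance, and shows how to read the LatestRestorableTime value; identifiers and the timestamp are
placeholders.

import boto3
from datetime import datetime, timezone

rds = boto3.client("rds")

rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="example-mysql",
    TargetDBInstanceIdentifier="example-mysql-restored",
    RestoreTime=datetime(2025, 4, 19, 10, 30, tzinfo=timezone.utc),
    # alternatively: UseLatestRestorableTime=True
)

# The latest restorable time is reported on the DB instance description
desc = rds.describe_db_instances(DBInstanceIdentifier="example-mysql")
print(desc["DBInstances"][0]["LatestRestorableTime"])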

Database Snapshots
Database snapshots are user-initiated backups of your instance stored in Amazon S3 that are kept
until you explicitly delete them. You can create a new instance from a database snapshot whenever
you desire. Although database snapshots serve operationally as full backups, you are billed only for
incremental storage use.

Snapshot Copies
With Amazon RDS, you can copy DB snapshots and DB cluster snapshots. You can copy automated or
manual snapshots. After you copy a snapshot, the copy is a manual snapshot. You can copy a
snapshot within the same AWS Region, you can copy a snapshot across AWS Regions, and you can
copy a snapshot across AWS accounts.
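A minimal sketch of a cross-Region snapshot copy with boto3 is shown below; the Regions, account ID,
and snapshot names are placeholders. For a cross-Region copy, the client is created in the destination
Region and the source snapshot is referenced by its ARN.

import boto3

rds = boto3.client("rds", region_name="us-west-2")   # destination Region

rds.copy_db_snapshot(
    SourceDBSnapshotIdentifier="arn:aws:rds:us-east-1:123456789012:snapshot:example-snap",
    TargetDBSnapshotIdentifier="example-snap-copy",
    SourceRegion="us-east-1",   # lets boto3 pre-sign the cross-Region request
)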

Snapshot Sharing
Using Amazon RDS, you can share a manual DB snapshot or DB cluster snapshot with other AWS
accounts. Sharing a manual DB snapshot or DB cluster snapshot, whether encrypted or unencrypted,
enables authorized AWS accounts to copy the snapshot.
Sharing an unencrypted manual DB snapshot enables authorized AWS accounts to directly restore a
DB instance from the snapshot instead of taking a copy of it and restoring from that. This isn't
supported for encrypted manual DB snapshots.
Sharing a manual DB cluster snapshot, whether encrypted or unencrypted, enables authorized AWS
accounts to directly restore a DB cluster from the snapshot instead of taking a copy of it and
restoring from that.
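As a sketch, sharing a manual snapshot with another account amounts to adding that account to the
snapshot's restore attribute; the snapshot name and account ID below are placeholders.

import boto3

rds = boto3.client("rds")

rds.modify_db_snapshot_attribute(
    DBSnapshotIdentifier="example-snap",
    AttributeName="restore",         # controls which accounts may restore or copy the snapshot
    ValuesToAdd=["111122223333"],    # AWS account ID being granted access
)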

Non-Relational databases
A non-relational database is a database that does not use the tabular schema of rows and columns
found in most traditional database systems. Instead, non-relational databases use a storage model
that is optimized for the specific requirements of the type of data being stored.

What all of these data stores have in common is that they don't use a relational model. Also, they
tend to be more specific in the type of data they support and how data can be queried. For example,
time series data stores are optimized for queries over time-based sequences of data. However, graph
data stores are optimized for exploring weighted relationships between entities. Neither format
would generalize well to the task of managing transactional data.

The term NoSQL refers to data stores that do not use SQL for queries. Instead, the data stores use
other programming languages and constructs to query the data. In practice, "NoSQL" means
"non-relational database," even though many of these databases do support SQL-compatible
queries. However, the underlying query execution strategy is usually very different from the way a
traditional relational database management system (RDBMS) would execute the same SQL query.
Types of Non-Relational Databases
Document Store

A document data store manages a set of named string fields and object data values in an entity
that's referred to as a document. These data stores typically store data in the form of JSON
documents. Each field value could be a scalar item, such as a number, or a compound element, such
as a list or a parent-child collection. The data in the fields of a document can be encoded in various
ways, including XML, YAML, JSON, binary JSON (BSON), or even stored as plain text.

Typically, a document contains the entire data for an entity. What items constitute an entity are
application-specific. For example, an entity could contain the details of a customer, an order, or a
combination of both. A single document might contain information that would be spread across
several relational tables in a relational database management system (RDBMS). A document store
does not require that all documents have the same structure. This free-form approach provides a
great deal of flexibility.

The application can retrieve documents by using the document key. The key is a unique identifier for
the document, which is often hashed, to help distribute data evenly. Some document databases
create the document key automatically. Others enable you to specify an attribute of the document to
use as the key. The application can also query documents based on the value of one or more fields.
Some document databases support indexing to facilitate fast lookup of documents based on one or
more indexed fields.
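For illustration only, a hypothetical customer-order entity stored as a single document might look like
the following (field names are invented; a real schema is application-specific):

import json

order_document = {
    "orderId": "1001",
    "customer": {"name": "A. Shah", "email": "a.shah@example.com"},
    "items": [
        {"sku": "PEN-01", "qty": 3, "price": 1.50},
        {"sku": "BOOK-07", "qty": 1, "price": 12.00},
    ],
    "status": "shipped",
}

# Encoded as JSON, this is roughly the form a document store would persist
print(json.dumps(order_document, indent=2))

Note how the customer, the order lines, and the order status, which would typically be spread across
several relational tables, live together in one document.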
Columnar data stores
A columnar or column-family data store organizes data into columns and rows. In its simplest form, a
column-family data store can appear very similar to a relational database, at least conceptually. The
real power of a column-family database lies in its denormalized approach to structuring sparse data,
which stems from the column-oriented approach to storing data.

You can think of a column-family data store as holding tabular data with rows and columns, but the
columns are divided into groups known as column families. Each column family holds a set of
columns that are logically related and are typically retrieved or manipulated as a unit. Other data
that is accessed separately can be stored in separate column families. Within a column family, new
columns can be added dynamically, and rows can be sparse (that is, a row doesn't need to have a
value for every column).
Unlike a key/value store or a document database, most column-family databases physically store
data in key order, rather than by computing a hash. The row key is considered the primary index and
enables key-based access via a specific key or a range of keys. Some implementations allow you to
create secondary indexes over specific columns in a column family. Secondary indexes let you
retrieve data by column value, rather than by row key.

Key value store


A key/value store is essentially a large hash table. You associate each data value with a unique key,
and the key/value store uses this key to store the data by using an appropriate hashing function. The
hashing function is selected to provide an even distribution of hashed keys across the data storage.

Most key/value stores only support simple query, insert, and delete operations. To modify a value
(either partially or completely), an application must overwrite the existing data for the entire value.
In most implementations, reading or writing a single value is an atomic operation. If the value is
large, writing may take some time.

An application can store arbitrary data as a set of values, although some key/value stores impose
limits on the maximum size of values. The stored values are opaque to the storage system software.
Any schema information must be provided and interpreted by the application. Essentially, values are
blobs, and the key/value store simply retrieves or stores the value by key.
Graph data stores
A graph data store manages two types of information, nodes and edges. Nodes represent entities,
and edges specify the relationships between these entities. Both nodes and edges can have
properties that provide information about that node or edge, similar to columns in a table. Edges can
also have a direction indicating the nature of the relationship.

The purpose of a graph data store is to allow an application to efficiently perform queries that
traverse the network of nodes and edges, and to analyze the relationships between entities. For
example, an organization's personnel data can be structured as a graph: the entities are
employees and departments, and the edges indicate reporting relationships and the department in
which employees work, with the direction of each edge indicating the direction of the
relationship.

This structure makes it straightforward to perform queries such as "Find all employees who report
directly or indirectly to Sarah" or "Who works in the same department as John?" For large graphs
with lots of entities and relationships, you can perform complex analyses quickly. Many graph
databases provide a query language that you can use to traverse a network of relationships
efficiently.
AWS DynamoDB
Amazon DynamoDB is a cloud-native NoSQL primarily key-value database. Let’s define each of those
terms.
DynamoDB is cloud-native in that it does not run on-premises or even in a hybrid cloud; it only runs
on Amazon Web Services (AWS). This enables it to scale as needed without requiring a customer’s
capital investment in hardware. It also has attributes common to other cloud-native applications,
such as elastic infrastructure deployment (meaning that AWS will provision more servers in the
background as you request additional capacity).

DynamoDB is NoSQL in that it does not support ANSI Structured Query Language (SQL). Instead, it
uses a proprietary API based on JavaScript Object Notation (JSON). This API is generally not called
directly by application developers, but invoked through AWS Software Development Kits (SDKs) for
DynamoDB written in various programming languages (C++, Go, Java, JavaScript, Microsoft .NET,
Node.js, PHP, Python and Ruby).

DynamoDB is primarily a key-value store in the sense that its data model consists of key-value pairs
in a schemaless, very large, non-relational table of rows (records). It does not support relational
database management systems (RDBMS) methods to join tables through foreign keys. It can also
support a document store data model using JavaScript Object Notation (JSON).
Features of DynamoDB

1. Fully Managed Service:


Amazon DynamoDB is a fully managed serverless database.
DynamoDB handles setup, infrastructure, maintenance, and scaling automatically.
2. NoSQL Data Model:
In DynamoDB, data can be stored and retrieved as key-value pairs, documents, or wide column
stores.
It is a fast and flexible NoSQL database service (a minimal boto3 sketch follows this feature list).

3. Scalability:
DynamoDB is a fully managed serverless database that automatically scales to fit developer’s needs.
DynamoDB scales to zero.
It has no cold starts, no version upgrades, no maintenance, no downtime.
DynamoDB global tables provide a multi-Region, multi-active database with 99.999% availability.
4. Consistency Models:
DynamoDB offers two consistency models:
Strongly Consistent Reads.
Eventually Consistent Reads.
It consistently handles more than 10 trillion requests per day.
DynamoDB allows developers to choose between strong consistency and higher performance,
depending on their requirements.

5. Flexible Querying:
DynamoDB supports various querying methods, including key-value lookups, range queries, and
optional secondary indexes, enabling efficient data retrieval based on different access
patterns.
6. Backups:
Point-in-time recovery (PITR) provides continuous backups and protects DynamoDB table data from
accidental write or delete operations.

On-demand backup and restore provides full backups of DynamoDB table data.
Developers can back up tables ranging from a few megabytes to hundreds of terabytes of data.
Backup and restore operations do not affect DynamoDB's performance.

7. Integrated with AWS Ecosystem:


DynamoDB is a part of the AWS ecosystem.
The platform allows developers to build serverless applications, use infrastructure as code, and
build real-time, data-driven applications through integrations with other services such as AWS
Lambda, AWS CloudFormation, and AWS AppSync.
8. On-Demand and Provisioned Throughput:
Amazon DynamoDB has two pricing models, pricing for on-demand capacity mode and pricing for
provisioned capacity mode.
On-demand capacity mode: Simplified billing, pay only for the read and writes your application
performs.
Provisioned capacity mode: Manage and optimize pricing by allocating read and write capacity
in advance.

9. Highly Available:
In a Region, DynamoDB replicates data across multiple Availability Zones, ensuring high availability.
DynamoDB automatically handles replication and recovery, reducing the possibility of data loss or
service disruption.
The availability SLA of DynamoDB is up to 99.999%.
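To make the key-value model above concrete, here is a minimal boto3 sketch that creates a table with
a simple primary key and then writes and reads one item; the table and attribute names are invented.

import boto3

dynamodb = boto3.resource("dynamodb")

# Create a table with a simple primary key (partition key only), in on-demand capacity mode
table = dynamodb.create_table(
    TableName="Products",                                            # hypothetical table
    KeySchema=[{"AttributeName": "ProductId", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "ProductId", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()        # wait for the status to change from CREATING to ACTIVE

# Store and retrieve an item; the item is a schemaless collection of attributes
table.put_item(Item={"ProductId": "p-100", "Title": "Notebook", "Price": 3})
item = table.get_item(Key={"ProductId": "p-100"})["Item"]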

Partition and Hash Keys


Amazon DynamoDB stores data in partitions. A partition is an allocation of storage for a table,
backed by solid state drives (SSDs) and automatically replicated across multiple Availability Zones
within an AWS Region. Partition management is handled entirely by DynamoDB—you never have to
manage partitions yourself.

When you create a table, the initial status of the table is CREATING. During this phase, DynamoDB
allocates sufficient partitions to the table so that it can handle your provisioned throughput
requirements. You can begin writing and reading table data after the table status changes to ACTIVE.

DynamoDB allocates additional partitions to a table in the following situations:


If you increase the table's provisioned throughput settings beyond what the existing partitions can
support.
If an existing partition fills to capacity and more storage space is required.
Partition management occurs automatically in the background and is transparent to your
applications. Your table remains available throughout and fully supports your provisioned
throughput requirements.

Global secondary indexes in DynamoDB are also composed of partitions. The data in a global
secondary index is stored separately from the data in its base table, but index partitions behave in
much the same way as table partitions.

Data distribution: Partition key


If your table has a simple primary key (partition key only), DynamoDB stores and retrieves each item
based on its partition key value.
To write an item to the table, DynamoDB uses the value of the partition key as input to an internal
hash function. The output value from the hash function determines the partition in which the item
will be stored.

To read an item from the table, you must specify the partition key value for the item. DynamoDB
uses this value as input to its hash function, yielding the partition in which the item can be found.

For example, consider a table named Pets, which spans multiple partitions and whose
primary key is AnimalType. DynamoDB uses its hash function to
determine where to store a new item, in this case based on the hash value of the string Dog. Note
that the items are not stored in sorted order. Each item's location is determined by the hash value of
its partition key.

If the table has a composite primary key (partition key and sort key), DynamoDB calculates the hash
value of the partition key in the same way as described in Data distribution: Partition key.

However, it tends to keep items which have the same value of partition key close together and in
sorted order by the sort key attribute's value. The set of items which have the same value of
partition key is called an item collection. Item collections are optimized for efficient retrieval of
ranges of the items within the collection.

If your table doesn't have local secondary indexes, DynamoDB will automatically split your item
collection over as many partitions as required to store the data and to serve read and write
throughput.

To write an item to the table, DynamoDB calculates the hash value of the partition key to determine
which partition should contain the item. In that partition, several items could have the same
partition key value. So DynamoDB stores the item among the others with the same partition key, in
ascending order by sort key.

To read an item from the table, you must specify its partition key value and sort key value.
DynamoDB calculates the partition key's hash value, yielding the partition in which the item can be
found.

You can read multiple items from the table in a single operation (Query) if the items you want have
the same partition key value. DynamoDB returns all of the items with that partition key value.
Optionally, you can apply a condition to the sort key so that it returns only the items within a certain
range of values.

Suppose that the Pets table has a composite primary key consisting of AnimalType (partition key)
and Name (sort key), and that DynamoDB writes an item with a partition key
value of Dog and a sort key value of Fido.

To read that same item from the Pets table, DynamoDB calculates the hash value of Dog, yielding the
partition in which these items are stored. DynamoDB then scans the sort key attribute values until it
finds Fido.
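A minimal boto3 sketch of that write and read, assuming the Pets table already exists with AnimalType
as the partition key and Name as the sort key:

import boto3

dynamodb = boto3.resource("dynamodb")
pets = dynamodb.Table("Pets")

# Write: DynamoDB hashes "Dog" to choose the partition, then places the item
# among the other Dog items in ascending order of the sort key (Name)
pets.put_item(Item={"AnimalType": "Dog", "Name": "Fido", "Breed": "Beagle"})

# Read: both the partition key value and the sort key value must be supplied
fido = pets.get_item(Key={"AnimalType": "Dog", "Name": "Fido"})["Item"]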

To read all of the items with an AnimalType of Dog, you can issue a Query operation without
specifying a sort key condition. By default, the items are returned in the order that they are stored
(that is, in ascending order by sort key). Optionally, you can request descending order instead.
To query only some of the Dog items, you can apply a condition to the sort key (for example, only
the Dog items where Name begins with a letter that is within the range A through K).
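The Query patterns just described might look like this in boto3 (a sketch, assuming the Pets table
exists):

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
pets = dynamodb.Table("Pets")

# All Dog items, returned in ascending order by the sort key (Name) by default
all_dogs = pets.query(KeyConditionExpression=Key("AnimalType").eq("Dog"))

# The same items in descending sort-key order
dogs_desc = pets.query(
    KeyConditionExpression=Key("AnimalType").eq("Dog"),
    ScanIndexForward=False,
)

# Only the Dog items whose Name falls in a range (roughly, names starting with A through K)
some_dogs = pets.query(
    KeyConditionExpression=Key("AnimalType").eq("Dog") & Key("Name").between("A", "L"),
)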
