AWS1-1
Relational Database
Relational database management systems (RDBMS) provide methods to join tables through foreign keys.
The two tables are related based on the shared customer ID, which
means you can query both tables to create formal reports or use the
data for other applications.
For instance, a retail branch manager could generate a report about all
customers who made a purchase on a specific date or figure out which
customers had orders that had a delayed delivery date in the last
month.
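A report like the first one can be sketched with Python's built-in sqlite3 module. The table names, columns, and sample rows below are hypothetical and exist only to show two tables joined on a shared customer ID.

```python
import sqlite3

# In-memory database; schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        order_date  TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES
        (10, 1, '2024-05-01'),
        (11, 2, '2024-05-02'),
        (12, 1, '2024-05-02');
""")

def customers_who_bought_on(date):
    """Return names of customers with at least one order on the given date."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.name
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        WHERE o.order_date = ?
        ORDER BY c.name
        """,
        (date,),
    ).fetchall()
    return [name for (name,) in rows]

print(customers_who_bought_on("2024-05-02"))  # ['Ana', 'Ben']
```

The foreign key orders.customer_id is what lets the JOIN match each order to its customer.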
SQL is the most common language for extracting and organizing data that is stored
in a relational database.
It makes it easier to retrieve specific information from a database so that the
information can then be analyzed.
Even when the analysis is done on another platform such as Python or R, SQL
is usually still needed to extract the required data from a company’s database.
Uses of SQL
•Execute queries against a database
•Retrieve data from a database
•Insert records into a database
•Update records in a database
•Delete records from a database
•Create new databases, or new tables in a database
•Create stored procedures & views in a database
•Set permissions on tables, procedures, and views
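Several of these uses can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the products table and its rows are made up, and SQLite has no stored procedures or permission statements, so those two uses are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a new table.
cur.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Insert records.
cur.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("pen", 1.50), ("notebook", 4.00), ("stapler", 9.25)],
)

# Update a record.
cur.execute("UPDATE products SET price = ? WHERE name = ?", (2.00, "pen"))

# Delete a record.
cur.execute("DELETE FROM products WHERE name = ?", ("stapler",))

# Create a view, then retrieve data through it.
cur.execute(
    "CREATE VIEW cheap_products AS SELECT name, price FROM products WHERE price < 3"
)
cheap = cur.execute("SELECT name, price FROM cheap_products").fetchall()
remaining = cur.execute("SELECT COUNT(*) FROM products").fetchone()[0]
conn.commit()

print(cheap, remaining)  # [('pen', 2.0)] 2
```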
Non-Relational database (NoSQL)
•They offer scalability when dealing with large volumes of data and high load
factors. They were designed when data was expected to be partitioned across
multiple machines to scale, in contrast to relational databases, which assumed
the data would stay on a single machine.
The benefits of a non-relational database
•Scalability: Non-relational databases are designed to horizontally scale across clusters of cheap commodity hardware, offering
seamless scalability as data volumes and user loads increase.
•Flexibility in Data Models: Unlike rigid table-based structures in relational databases, non-relational databases support flexible
data models like document stores (e.g., JSON in MongoDB), key-value pairs (e.g., Redis), and wide-column stores (e.g.,
Cassandra), making it easier to store and manage unstructured or semi-structured data.
•Performance: Non-relational databases are optimized for specific use cases such as real-time data ingestion, high-speed
transactions, and rapid access to large volumes of data. They often outperform relational databases in these scenarios due to
their distributed architecture and optimized data storage formats.
•Schemaless Design: Non-relational databases typically do not enforce a rigid schema, allowing developers to evolve the data
structure over time without downtime or complex migrations. This advantage is particularly beneficial in agile development
environments and for handling diverse and unpredictable data types.
•High Availability and Fault Tolerance: Many non-relational databases are designed with built-in replication and automatic
failover capabilities, ensuring high availability and data redundancy. This makes them suitable for mission-critical applications
where continuous uptime is essential.
•Cost-Effectiveness: By using commodity hardware and open-source software, non-relational databases often provide a more
cost-effective solution compared to traditional relational databases, especially at scale.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html
The following are the basic DynamoDB components:
•Tables – Similar to other database systems, DynamoDB stores data in tables. A table is a collection of
data. For example, you could use a table called People to store personal contact information about
friends, family, or anyone else of interest.
•Items – Each table contains zero or more items. An item is a group of attributes that is uniquely
identifiable among all of the other items. In a People table, each item represents a person. For a Cars
table, each item represents one vehicle. Items in DynamoDB are similar in many ways to rows, records, or
tuples in other database systems. In DynamoDB, there is no limit to the number of items you can store in
a table.
•Attributes – Each item is composed of one or more attributes. An attribute is a fundamental data
element, something that does not need to be broken down any further. For example, an item in a People
table contains attributes called PersonID, LastName, FirstName, and so on. For a Department table, an
item might have attributes such as DepartmentID, Name, Manager, and so on. Attributes in DynamoDB
are similar in many ways to fields or columns in other database systems.
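The table / item / attribute hierarchy can be pictured with plain Python dictionaries. This is only a mental model, not the AWS SDK: put_item and get_item here are hypothetical helpers, the sample values are invented, and using PersonID as the identifying attribute is an assumption for the example.

```python
# A table is a collection of items; each item is a set of attributes.
# Plain dictionaries stand in for DynamoDB here; this is not the AWS SDK.
people = {}  # table "People", keyed by the PersonID attribute

def put_item(table, key_name, item):
    """Store an item, identified by the value of its key attribute."""
    table[item[key_name]] = dict(item)

def get_item(table, key_value):
    """Return a copy of the item with that key, or None if absent."""
    item = table.get(key_value)
    return dict(item) if item is not None else None

put_item(people, "PersonID", {"PersonID": 101, "LastName": "Smith", "FirstName": "Fred"})
# Items in the same table need not share the same attributes.
put_item(people, "PersonID", {"PersonID": 102, "LastName": "Jones", "Phone": "555-0100"})

print(get_item(people, 102)["Phone"])  # 555-0100
print(len(people))                     # 2
```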
The BASE Model
The rise of NoSQL databases provided a flexible and fluid way to manipulate data. As a result, a new
database model was designed, reflecting these properties.
The acronym BASE is slightly more confusing than ACID. However, the words behind it (Basically Available,
Soft state, Eventually consistent) suggest ways in which the BASE model is different.
Marketing and customer service companies that deal with sentiment analysis will prefer the elasticity of
BASE when conducting their social network research. Social network feeds are not well structured but
contain huge amounts of data, which a BASE-modeled database can easily store.
Just as SQL databases are almost uniformly ACID compliant, NoSQL databases tend to conform to BASE
principles. MongoDB, Amazon DynamoDB, Cassandra, and Redis are among the most popular NoSQL
solutions.
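The "eventually consistent" idea behind BASE can be illustrated with a toy store in which a write is acknowledged by one replica and reaches the others later. Everything here (class names, the explicit sync step) is a simplification invented for this sketch; real databases replicate in the background with far more machinery.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy store: writes hit one replica and reach the others later."""

    def __init__(self, replica_count=2):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = deque()  # updates not yet applied to the other replicas

    def write(self, key, value):
        self.replicas[0].data[key] = value  # acknowledged immediately
        self.pending.append((key, value))

    def read(self, key, replica=0):
        return self.replicas[replica].data.get(key)

    def sync(self):
        """Apply pending updates everywhere (stands in for background replication)."""
        while self.pending:
            key, value = self.pending.popleft()
            for r in self.replicas[1:]:
                r.data[key] = value

store = EventuallyConsistentStore()
store.write("likes", 42)
stale = store.read("likes", replica=1)  # replica 1 has not caught up yet
store.sync()
fresh = store.read("likes", replica=1)
print(stale, fresh)  # None 42
```

The store stays available for reads the whole time; the price is that a reader may briefly see old data.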
Other forms of NoSQL Databases: Document-oriented databases
Wide-column stores keep data in tables, rows, and dynamic columns. Unlike traditional SQL databases,
however, wide-column stores are flexible: different rows can have different sets of columns.
These databases can employ column compression techniques to reduce storage space and enhance
performance.
The wide rows and columns enable efficient retrieval of sparse and wide data. Some examples of
wide-column stores are Apache Cassandra and HBase. A typical example of how data is stored in a
wide-column store is as follows:
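A rough sketch of that layout using nested Python dictionaries. The row keys, column names, and values are invented for illustration, and a real wide-column store adds column families, timestamps, and on-disk ordering that are not modeled here.

```python
# Each row key maps to its own set of columns; rows need not share columns.
# Row keys and column names are invented for illustration.
users = {
    "user:1": {"name": "Ana", "email": "ana@example.com"},
    "user:2": {"name": "Ben", "city": "Lima", "last_login": "2024-05-02"},
}

def get_column(table, row_key, column, default=None):
    """Read one column of one row; a missing column is simply absent."""
    return table.get(row_key, {}).get(column, default)

def columns_in_use(table):
    """Return every column name that appears in at least one row."""
    return sorted({col for row in table.values() for col in row})

print(get_column(users, "user:1", "city"))  # None
print(columns_in_use(users))  # ['city', 'email', 'last_login', 'name']
```

No space is reserved for the columns a row does not have, which is why sparse data fits this model well.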
Image Storage Strategies in NoSQL Databases:
DynamoDB has a 400 KB limit on item size. Larger images can be stored in Amazon S3 buckets.
•GridFS efficiently handles files that exceed MongoDB’s 16MB limit for BSON documents. It splits files into
chunks (default 255KB) and stores them in fs.chunks. Metadata is stored in fs.files, enabling easy
reconstruction.
•Performance Considerations: GridFS is ideal for scenarios where reading and writing large files is sporadic. It
offers the benefit of MongoDB’s scalability and data distribution features, making it suitable for applications that
need to store large media files or backups.
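The chunking idea can be sketched in plain Python without a MongoDB connection. The helper names split_file and reassemble are hypothetical, and the dictionaries are only loosely modeled on the fs.files and fs.chunks documents; this is not the GridFS driver API.

```python
CHUNK_SIZE = 255 * 1024  # 255 KB, the default chunk size mentioned above

def split_file(file_id, filename, data, chunk_size=CHUNK_SIZE):
    """Return (file_doc, chunk_docs) loosely modeled on fs.files / fs.chunks."""
    chunks = [
        {"files_id": file_id, "n": i, "data": data[off:off + chunk_size]}
        for i, off in enumerate(range(0, len(data), chunk_size))
    ]
    file_doc = {"_id": file_id, "filename": filename,
                "length": len(data), "chunkSize": chunk_size}
    return file_doc, chunks

def reassemble(file_doc, chunk_docs):
    """Rebuild the file by ordering its chunks on their sequence number n."""
    mine = [c for c in chunk_docs if c["files_id"] == file_doc["_id"]]
    data = b"".join(c["data"] for c in sorted(mine, key=lambda c: c["n"]))
    if len(data) != file_doc["length"]:
        raise ValueError("missing or corrupt chunks")
    return data

image = bytes(range(256)) * 4096  # 1 MiB of sample bytes
file_doc, chunks = split_file(1, "photo.jpg", image)
print(len(chunks))  # 5
```

Because each chunk carries the file ID and its position, the chunks can be stored and fetched in any order and still be put back together.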
Amazon Bedrock
Use case of GenAI powered by AWS Bedrock
https://www.youtube.com/watch?v=ecLCFxDlVNI
What is Amazon VPC?
With Amazon Virtual Private Cloud (Amazon VPC), you can launch AWS resources in a logically isolated virtual
network that you've defined. This virtual network closely resembles a traditional network that you'd operate in
your own data center, with the benefits of using the scalable infrastructure of AWS.
The following diagram shows an example VPC. The VPC has one subnet in each of the Availability Zones in the
Region, EC2 instances in each subnet, and an internet gateway to allow communication between the resources in
your VPC and the internet.
Features of Virtual private clouds (VPC)
A VPC is a virtual network that closely resembles a traditional network that you'd operate in your own data center.
Subnets
A subnet is a range of IP addresses in your VPC. A subnet must reside in a single Availability Zone. After you
add subnets, you can deploy AWS resources in your VPC.
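Since a subnet is just a range of IP addresses inside the VPC's range, the arithmetic can be explored with Python's standard ipaddress module. The 10.0.0.0/16 block, the /24 subnet size, and the zone names are assumptions chosen for the example; nothing here calls AWS.

```python
import ipaddress

vpc_cidr = ipaddress.ip_network("10.0.0.0/16")      # hypothetical VPC range
zones = ["us-west-2a", "us-west-2b", "us-west-2c"]  # example zone names

# Carve /24 blocks out of the VPC range and give one to each zone.
subnets = dict(zip(zones, vpc_cidr.subnets(new_prefix=24)))

for zone, net in subnets.items():
    print(zone, net, net.num_addresses)

def subnet_for(ip):
    """Return the zone whose subnet contains the address, or None."""
    addr = ipaddress.ip_address(ip)
    for zone, net in subnets.items():
        if addr in net:
            return zone
    return None

print(subnet_for("10.0.2.17"))  # us-west-2c
```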
IP addressing
You can assign IP addresses, both IPv4 and IPv6, to your VPCs and subnets.
Routing
Use route tables to determine where network traffic from your subnet or gateway is directed.
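A route table can be thought of as a list of destination ranges and targets, where the most specific matching range wins. The sketch below uses the standard ipaddress module; the CIDR blocks and target names are placeholders, not real AWS identifiers.

```python
import ipaddress

# Destination CIDR -> target. Targets are placeholder names, not real IDs.
route_table = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "internet-gateway",
    "192.168.0.0/24": "vpn-gateway",
}

def route(destination_ip, table=route_table):
    """Pick the most specific (longest-prefix) route that matches."""
    addr = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in table.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(route("10.0.1.5"))     # local
print(route("8.8.8.8"))      # internet-gateway
print(route("192.168.0.9"))  # vpn-gateway
```

The 0.0.0.0/0 entry matches every IPv4 address, so it acts as the default route whenever nothing more specific applies.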
Gateways and endpoints
A gateway connects your VPC to another network. For example, use an internet gateway to connect your
VPC to the internet. Use a VPC endpoint to connect to AWS services privately, without the use of an internet
gateway.
Peering connections
Use a VPC peering connection to route traffic between the resources in two VPCs.
Transit gateways
Use a transit gateway, which acts as a central hub, to route traffic between your VPCs, VPN connections, and AWS Direct Connect connections.
VPN connections
Connect your VPCs to your on-premises networks using AWS Virtual Private Network (AWS VPN).
Amazon S3
Amazon S3 is an object storage service that stores data as objects within buckets. An
object is a file and any metadata that describes the file. A bucket is a container for objects.
To store your data in Amazon S3, you first create a bucket and specify a bucket name and
AWS Region. Then, you upload your data to that bucket as objects in Amazon S3.
Each object has a key (or key name), which is the unique identifier for the object within the
bucket.
S3 provides features that you can configure to support your specific use case. For example,
you can use S3 Versioning to keep multiple versions of an object in the same bucket, which
allows you to restore objects that are accidentally deleted or overwritten.
Buckets and the objects in them are private and can be accessed only if you explicitly grant
access permissions. You can use bucket policies, AWS Identity and Access Management
(IAM) policies, access control lists (ACLs), and S3 Access Points to manage access.
Buckets
A bucket is a container for objects stored in Amazon S3. You can store any number of objects in a
bucket.
Every object is contained in a bucket. For example, if the object named photos/puppy.jpg is stored in
the amzn-s3-demo-bucket bucket in the US West (Oregon) Region, then it is addressable by using the
URL https://amzn-s3-demo-bucket.s3.us-west-2.amazonaws.com/photos/puppy.jpg.
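The address is assembled from the bucket name, the Region code, and the object key. The helper below is a hypothetical convenience function, not part of any AWS library, and it only builds the string; whether the object is reachable still depends on its permissions.

```python
from urllib.parse import quote

def s3_object_url(bucket, region, key):
    """Build a URL from the bucket name, Region code, and object key."""
    # Percent-encode the key but keep "/" so prefixes stay readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"

print(s3_object_url("amzn-s3-demo-bucket", "us-west-2", "photos/puppy.jpg"))
# https://amzn-s3-demo-bucket.s3.us-west-2.amazonaws.com/photos/puppy.jpg
```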
Objects
Objects are the fundamental entities stored in Amazon S3. Objects consist of object data and metadata. The
metadata is a set of name-value pairs that describe the object.
These pairs include some default metadata, such as the date last modified and Content-Type.
An object is uniquely identified within a bucket by a key (name) and a version ID (if S3 Versioning is enabled on
the bucket).
Key
An object key (or key name) is the unique identifier for an object within a bucket. Every
object in a bucket has exactly one key.
The combination of a bucket, object key, and optionally, version ID (if S3 Versioning is
enabled for the bucket) uniquely identifies each object.
Every object in Amazon S3 can be uniquely addressed through the combination of the web
service endpoint, bucket name, key, and optionally, a version.
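How a key and a version ID together pick out one object can be shown with a small in-memory stand-in for a versioned bucket. The class and its "v1", "v2" identifiers are invented for this sketch; real version IDs are opaque strings generated by the service, and this is not the S3 API.

```python
import itertools

class VersionedBucket:
    """In-memory stand-in for a bucket with versioning enabled (not the S3 API)."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, data), oldest first
        self._ids = itertools.count(1)

    def put(self, key, data):
        version_id = f"v{next(self._ids)}"  # real version IDs are opaque strings
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions[key]
        if version_id is None:
            return versions[-1][1]  # latest version
        for vid, data in versions:
            if vid == version_id:
                return data
        raise KeyError((key, version_id))

bucket = VersionedBucket()
first = bucket.put("photos/puppy.jpg", b"original")
bucket.put("photos/puppy.jpg", b"overwritten")
print(bucket.get("photos/puppy.jpg"))         # b'overwritten'
print(bucket.get("photos/puppy.jpg", first))  # b'original'
```

Asking for the key alone returns the latest version, while adding the version ID retrieves an earlier copy, which is what makes recovery from an accidental overwrite possible.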
• Using Amazon EC2 reduces hardware costs so you can develop and deploy applications faster. You
can use Amazon EC2 to launch as many or as few virtual servers as you need, configure security
and networking, and manage storage.
• You can add capacity (scale up) to handle compute-heavy tasks, such as monthly or yearly
processes, or spikes in website traffic. When usage decreases, you can reduce capacity (scale
down) again.
• An EC2 instance is a virtual server in the AWS Cloud. When you launch an EC2 instance, the
instance type that you specify determines the hardware available to your instance.