Managing and Visualisation of RDS Database Cloud Computing Project PDF
Consistency: Containerized databases are built to run consistently across different environments, which means that they
can be deployed more easily and with greater reliability than traditional databases.
Scalability: Containerization supports horizontal scaling, meaning that you can add more instances of your
database to handle increased demand.
Portability: Containerized databases can be moved from one environment to another with little effort, which simplifies testing and
deploying new versions of your application.
Isolation: By running the database in a container, you can ensure that it is isolated from the rest of the host system. This
makes it easier to manage dependencies and avoid conflicts.
Efficiency: Containers are lightweight and consume fewer resources than traditional virtual machines, making them more
efficient in terms of both memory and CPU usage.
Security: Containerization provides an additional layer of security by isolating the database from the rest of the system,
reducing the risk of security breaches.
PROBLEM DEFINITION
The primary aim of this project is to demonstrate the process of managing and visualizing data through the use of containers. The project
involves setting up a database to store information relevant to a computer store, such as customer lists, product lists, and order histories.
Two containers will be deployed to manage the database and visualize its contents.
To manage the database, we will use a database administration tool. In this project, we have opted
for phpMyAdmin, a web-based interface for MySQL.
Data visualization tools are utilized to create visual representations of the stored data, such as charts and graphs. In this project,
Metabase, an open-source business intelligence platform, will be used for this purpose. Metabase can be embedded into any
application, enabling customers to explore their data independently.
The database will be deployed on Amazon RDS, a fully-managed database service provided by Amazon Web Services (AWS). Amazon
EC2 will be used to host the containers responsible for managing and visualizing the data.
To regulate access to these resources, a Virtual Private Cloud (VPC) will be set up. A VPC defines a virtual network within AWS that
isolates resources and controls network traffic. Within the VPC, we will establish a public subnet and a private subnet. Resources that
require public access, such as the EC2 instance and the Elastic File System (EFS), will be placed in the public subnet. Resources that
only need to be accessible internally, such as the RDS instance, will be placed in the private subnet.
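The subnet layout described above can be sketched with Python's standard `ipaddress` module. This is a minimal illustration only: the VPC address range `10.0.0.0/16` and the /24 subnet size are assumptions, since the project does not state the CIDR blocks it used.

```python
import ipaddress

# Assumed VPC address range; the project does not state its actual CIDR.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Split the VPC into /24 blocks and take the first two as subnets.
blocks = vpc.subnets(new_prefix=24)
public_subnet = next(blocks)   # would host the EC2 container instance
private_subnet = next(blocks)  # would host the RDS instance


def subnet_for(address: str) -> str:
    """Return which of the two subnets an IP address falls in."""
    ip = ipaddress.ip_address(address)
    if ip in public_subnet:
        return "public"
    if ip in private_subnet:
        return "private"
    return "outside"


print(public_subnet, private_subnet)
```

Calling `subnet_for` with an address from either block shows which side of the network a resource would sit on. The two ranges never overlap, which is what lets routing rules treat them differently.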
LITERATURE SURVEY
The paper titled "Impact and Implications of Big Data Analytics" [1] discusses the potential of big data analytics in various industries and its
impact on society. It emphasizes the importance of leveraging big data to drive innovation and improve decision-making. The paper also
examines the challenges associated with big data analytics, such as data privacy concerns and the need for skilled professionals. Finally, the
paper recommends strategies for organizations to overcome these challenges and effectively utilize big data analytics to achieve their goals.
The paper "Performance Analysis of Various Server Hosting Techniques" [2] investigates and compares the performance of different server
hosting techniques, including AWS EC2, AWS ECS with Fargate, and AWS Lambda.
The paper "Performance of Containerized Database Management Systems" [4] evaluates the performance of containerized database
management systems (DBMS) and compares it with the traditional non-containerized approach. The study uses three different container
orchestration tools, Docker Swarm, Kubernetes, and Apache Mesos, to deploy the DBMS containers. The results show that containerized DBMS
performs comparably with non-containerized DBMS and can provide better scalability and fault tolerance. The paper also provides insights
into the trade-offs between performance and resource utilization for different container orchestration tools.
The article "Deploying Relational Databases in AWS" [3] provides an overview of how to deploy relational databases in Amazon Web Services
(AWS). It discusses different AWS database services, such as Amazon RDS and Amazon Aurora, and their features. The article also covers the
basics of setting up and configuring a database instance in AWS, including choosing the right database engine and selecting the appropriate
database instance type. Overall, the article serves as a useful guide for those looking to deploy and manage relational databases in AWS.
ARCHITECTURE
METHODOLOGY
Creation of VPC and Subnets
A virtual private cloud (VPC) is a secure, isolated private cloud hosted within a public cloud. VPC
customers can run code, store data, host websites, and do anything else they could do in an
ordinary private cloud, but the private cloud is hosted remotely by a public cloud provider.
Within the VPC, we will create two public subnets. These subnets will be used to interact with the
tools installed in the containers, such as the web-based GUIs for managing the database.
To ensure high availability and fault tolerance, we will choose two availability zones (AZs) for the
RDS database. This means that the database will be replicated across two different physical
locations, providing redundancy in case of a failure in one of the AZs.
Creation of RDS MySQL Database
Amazon Relational Database Service (Amazon RDS) is a collection of managed services that
makes it simple to set up, operate, and scale databases in the cloud.
Select the newly created VPC and choose "Deny public access" for connectivity to ensure the
database is in a private subnet and not publicly accessible.
Choose the default VPC Security Group and validate the creation.
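As a rough sketch, the settings chosen above can be written as the keyword arguments that boto3's `create_db_instance` call accepts. Every value below (identifier, instance class, storage size, credentials, subnet group name) is a placeholder assumption and not the project's actual configuration.

```python
# Keyword arguments in the shape expected by boto3's rds.create_db_instance.
# All values are placeholder assumptions, not the project's real settings.
rds_params = {
    "DBInstanceIdentifier": "computer-store-db",  # hypothetical name
    "Engine": "mysql",
    "DBInstanceClass": "db.t3.micro",             # assumed instance size
    "AllocatedStorage": 20,                       # GiB, assumed
    "MasterUsername": "admin",                    # hypothetical
    "MasterUserPassword": "change-me",            # never hard-code real secrets
    "DBSubnetGroupName": "project-subnet-group",  # hypothetical group in the project VPC
    "PubliclyAccessible": False,                  # corresponds to denying public access
}


def is_private(params: dict) -> bool:
    """True when the instance is explicitly configured without public access."""
    return params.get("PubliclyAccessible") is False


print(is_private(rds_params))
```

With boto3 installed and AWS credentials configured, these arguments could be passed as `boto3.client("rds").create_db_instance(**rds_params)`. The console steps described above achieve the same result.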
Deploy ECS Cluster in EC2
An Amazon ECS cluster is a logical grouping of tasks or services. Your tasks and services are
run on infrastructure that is registered to a cluster.
A "Linux + Networking" cluster is created to host the phpMyAdmin and Metabase
containers.
For networking, we have linked the instance to the project VPC and selected the public
subnet 1a in the same AZ as the RDS database.
Task Definition of phpMyAdmin
A Task Definition describes everything ECS needs to know about a container (the
image used, the size of the container, the environment variables, etc.). Each container
therefore needs its own Task Definition, from which it can then be deployed.
A task definition for phpMyAdmin on Amazon ECS specifies how a container running
phpMyAdmin should be run, including which Docker image to use, CPU and memory
requirements, network and storage settings, and any container-specific configurations. It
provides a blueprint for how the container should be deployed and managed within the
ECS environment.
With this definition, ECS can manage the container and ensure that it is running as
expected, making it easy to deploy and manage phpMyAdmin in the cloud.
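A task definition of this kind might look like the following sketch, built as a Python dictionary and printed as JSON. The field names follow the ECS task definition format. The RDS endpoint, memory size, and host port are illustrative assumptions, while `PMA_HOST` and `PMA_PORT` are the environment variables the phpMyAdmin Docker image reads to locate the database server.

```python
import json

# Hypothetical RDS endpoint; replace with the one shown in the RDS console.
RDS_ENDPOINT = "computer-store-db.example.us-east-1.rds.amazonaws.com"

phpmyadmin_task = {
    "family": "phpmyadmin",                    # assumed task family name
    "containerDefinitions": [
        {
            "name": "phpmyadmin",
            "image": "phpmyadmin/phpmyadmin",  # public Docker Hub image
            "memory": 256,                     # MiB, assumed
            "essential": True,
            "portMappings": [
                # Host port 8080 is an assumption; the container listens on 80.
                {"containerPort": 80, "hostPort": 8080, "protocol": "tcp"}
            ],
            "environment": [
                {"name": "PMA_HOST", "value": RDS_ENDPOINT},
                {"name": "PMA_PORT", "value": "3306"},
            ],
        }
    ],
}

print(json.dumps(phpmyadmin_task, indent=2))
```

The printed JSON has the shape that can be pasted into the JSON editor of the ECS console when registering a task definition.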
Task Definition of Metabase
A task definition for Metabase in Amazon ECS is a blueprint that defines how to run the
Metabase container in the ECS environment. It specifies the Docker image to be used, the
resources required by the container, how to handle networking and storage, and any
additional settings, such as environment variables or command overrides. Once the task
definition is registered, it can be used to launch one or more containers as tasks in an ECS
service.
Addition of EFS to Metabase Container
Amazon Elastic File System (Amazon EFS) is a simple, serverless, set-and-forget, elastic file
system. There is no minimum fee or setup charge. You pay only for the storage you use, for
read and write access to data stored in Infrequent Access storage classes, and for any
provisioned throughput.
By adding EFS to a Metabase container, you can store and access Metabase's data files and
configuration files in a centralized and highly available location that can be shared across
multiple containers and instances.
This can help to simplify deployment, improve scalability, and increase resilience of your
Metabase deployment by ensuring that your data is always accessible, even if an individual
container or instance fails. With EFS, you can also easily scale your storage capacity up or
down as your data needs evolve, without having to manage the underlying infrastructure.
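One way to express the EFS attachment is in the Metabase task definition itself, sketched below as a Python dictionary. The file system ID, memory size, and container path are placeholder assumptions. `MB_DB_FILE` is the Metabase environment variable that sets where its embedded application database file is stored, so pointing it at the mounted path keeps that file on EFS.

```python
metabase_task = {
    "family": "metabase",                        # assumed task family name
    "volumes": [
        {
            "name": "metabase-data",
            "efsVolumeConfiguration": {
                "fileSystemId": "fs-00000000",   # placeholder, not a real ID
                "rootDirectory": "/",
            },
        }
    ],
    "containerDefinitions": [
        {
            "name": "metabase",
            "image": "metabase/metabase",        # public Docker Hub image
            "memory": 1024,                      # MiB, assumed
            "essential": True,
            "portMappings": [
                {"containerPort": 3000, "hostPort": 3000, "protocol": "tcp"}
            ],
            "mountPoints": [
                {"sourceVolume": "metabase-data", "containerPath": "/metabase-data"}
            ],
            "environment": [
                {"name": "MB_DB_FILE", "value": "/metabase-data/metabase.db"}
            ],
        }
    ],
}


def unresolved_mounts(task: dict) -> list:
    """List mount points that reference a volume the task does not declare."""
    declared = {v["name"] for v in task.get("volumes", [])}
    return [
        m["sourceVolume"]
        for c in task["containerDefinitions"]
        for m in c.get("mountPoints", [])
        if m["sourceVolume"] not in declared
    ]


print(unresolved_mounts(metabase_task))
```

The small `unresolved_mounts` helper is a hypothetical sanity check, not part of any AWS tooling. It catches the common mistake of mounting a volume name that the task never declares.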
Database Structure in phpMyAdmin
[Figures: Customers Table, Products Table, Orders Table]
Creating dashboards with Metabase
BEST CUSTOMERS
SQL Query
SELECT CONCAT(customers.first_name, ' ', customers.last_name) AS customer,
       COUNT(*) AS purchases
FROM customers
JOIN orders ON customers.id = orders.customer_id
GROUP BY customer
ORDER BY COUNT(*) DESC;
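To see what the best-customers query returns, it can be tried against a small in-memory SQLite database, as in the sketch below. The sample rows are invented for illustration, and the table columns are limited to those the query references. Because the project's database is MySQL, two adaptations are made here: SQLite's `||` operator stands in for `CONCAT`, and the join is written explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id));
-- Invented sample data, for illustration only.
INSERT INTO customers VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
INSERT INTO orders (customer_id) VALUES (1), (2), (2);
""")

# Same logic as the MySQL query: count orders per customer, most active first.
query = """
SELECT customers.first_name || ' ' || customers.last_name AS customer,
       COUNT(*) AS purchases
FROM customers
JOIN orders ON customers.id = orders.customer_id
GROUP BY customer
ORDER BY purchases DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Alan Turing', 2), ('Ada Lovelace', 1)]
```

Each row pairs a customer's full name with their number of orders, which is the shape Metabase needs to draw a ranked bar chart.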
SALES REVENUE
SQL Query