
INTERNSHIP REPORT

ON
AI-ML Virtual Internship

Submitted by

NAME: D.Vara Lakshmi

REG. NO.: 21K61A0618

Submitted to

DEPARTMENT OF COMPUTER SCIENCE & TECHNOLOGY

SASI INSTITUTE OF TECHNOLOGY & ENGINEERING


(Approved by AICTE, New Delhi, Permanently Affiliated to JNTUK, Kakinada and SBTET-
Hyderabad, Accredited by NAAC with ‘A’ Grade, Ranked as "A" Grade by Govt. of A.P.,
Recognized by UGC 2(f) & 12(B)) Kadakatla, TADEPALLIGUDEM – 534101.

Academic Year 2023-24


DECLARATION

I, D. Vara Lakshmi (21K61A0618), a student of Computer Science and Technology at Sasi Institute
of Technology & Engineering, Tadepalligudem, hereby declare that the Summer Training Report
entitled “AI-ML Virtual Internship” is an authentic record of my own work, carried out as a
requirement of Industrial Training during May 2023 to July 2023. I obtained the knowledge of
AI-ML through the selfless efforts of the employees arranged for me by the administration. A
training report was prepared on the same, and the suggestions given by the faculty were duly
incorporated.

D.VARA LAKSHMI

21K61A0618

INCHARGE                                HEAD OF DEPARTMENT

INTERNAL EXAMINER                       EXTERNAL EXAMINER

ABSTRACT
Advances in artificial intelligence and machine learning (AI-ML) algorithms are not only among
the fastest growing areas but also provide endless possibilities in many different science and
engineering disciplines, including computer communication networks. These technologies are
used by billions of people.

Machine learning is an application of AI. It is the process of using mathematical models of data
to help a computer learn without direct instruction. This enables a computer system to continue
learning and improving on its own, based on experience. Machine learning is a vast area of
research that is primarily concerned with finding patterns in empirical data. We therefore restrict
our attention to a limited number of core concepts that are most relevant for quantum learning
algorithms.



List of Contents

Topic                                                            Pg No

Chapter 1: Company Profile
1.1: Profile 1
1.1.1: Vision 1
1.1.2: Mission 2
1.1.3: Objectives 2
1.2: Eduskills Offered 2
1.2.1: Courses Offered 2
1.2.2: How to Apply? 3

Chapter 2: Cloud Concepts Overview
2.1: Introduction to Cloud Computing 4
2.1.1: Advantages of the Cloud 5
2.1.2: Introduction to AWS 6
2.1.3: Moving to the AWS Cloud 7

Chapter 3: Cloud Economics and Billing
3.1: Introduction 8
3.2: Key Aspects of Cloud Economics 8
3.2.1: Total Cost of Ownership (TCO) 8
3.3: Case Study 9
3.4: AWS Organizations 9
3.5: AWS Billing and Cost Management 10
3.6: Billing Dashboard 11

Chapter 4: AWS Global Infrastructure
4.1: Introduction 13
4.2: AWS Services and Service Categories 14

Chapter 5: AWS Cloud Security
5.1: AWS Shared Responsibility Model 16
5.2: Networking and Content Delivery and Networking Basics 18

Chapter 6: Compute
6.1: Cloud Compute Services 23
6.2: Amazon EC2 23
6.3: Cost Optimization 25
6.4: AWS Lambda 26

Chapter 7: Storage
7.1: AWS EBS 28
7.2: Amazon S3 Glacier 31

Chapter 8: Databases
8.1: Amazon RDS 33
8.2: Amazon DynamoDB 34
8.3: Amazon Redshift 35
8.4: Amazon Aurora 36

Chapter 9: Cloud Architecture
9.1: Introduction to AWS Architecture 38
9.2: AWS Well-Architected Framework Design Principles 38
9.3: Operational Excellence 38
9.4: Security 40
9.5: Reliability 40
9.6: Cost Optimization 42

Chapter 10: Auto Scaling and Monitoring
10.1: Introduction 44
10.2: Elastic Load Balancing 44
10.3: AWS EC2 Auto Scaling 44

Chapter 11: Introduction to Machine Learning
11.1: What is Machine Learning? 46
11.2: Business Problems Solved with Machine Learning 46
11.3: Machine Learning Process 47
11.4: Machine Learning Tools Overview 47
11.5: Machine Learning Challenges 47

Chapter 12: Implementation of Machine Learning
12.1: Formulating Machine Learning Problems 49
12.2: Collecting and Securing Data 50
12.2.1: Extracting, Transforming, and Loading Data 50
12.2.2: Securing Data 50
12.3: Evaluating Data 51
12.3.1: Describing Your Data 51
12.3.2: Finding Correlations 52
12.4: Feature Engineering 52
12.4.1: Cleaning Your Data 52
12.4.2: Dealing with Outliers and Selecting Features 52
12.5: Evaluating the Accuracy of the Model 53
12.5.1: Calculating Classification Metrics 53
12.5.2: Selecting Classification Thresholds 54

Chapter 13: Introducing Forecasting
13.1: Forecasting Overview 54
13.2: Processing Time Series Data 54
13.2.1: Special Considerations for Time Series Data 54

Chapter 14: Introduction to Computer Vision
14.1: Introduction to Computer Vision 57
14.2: Image and Video Analysis 56
14.2.1: Facial Recognition 56
14.3: Preparing Custom Datasets for Computer Vision 56
14.3.1: Creating the Training Dataset 57
14.3.2: Creating the Test Dataset 57
14.3.3: Evaluate and Improve Your Model 58

Chapter 15: Introducing NLP
15.1: Overview of Natural Language Processing 60
15.2: Natural Language Processing Managed Services 60

Conclusion 62
Reference 63
List of Figures

Sl. No   Fig. No   Name of the Figure                              Page No

1.       1.1       Logo                                            1
2.       2.1.1     Application Infrastructure Storage              4
3.       2.1.1     Cloud Computing                                 5
4.       2.1.4     AWS Cloud Computing                             7
5.       3.4       AWS Organisation                                10
6.       4.1       AWS Infrastructure                              13
7.       5.1       AWS IAM                                         17
8.       5.2.1     AWS Networking                                  19
9.       5.2.1     VPC Networking                                  19
10.      6.3       Amazon EC2                                      25
11.      6.4       AWS Lambda                                      26
12.      7.1       AWS EBS                                         28
13.      8.1       Amazon RDS                                      33
14.      8.2       Amazon DynamoDB                                 34
15.      8.3       Amazon Red Shift                                35
16.      8.4       Amazon Aurora                                   36
17.      9.2       AWS pillars                                     39
18.      9.6       Cloud Architecture                              43
19.      10.2      AWS EC2 Auto Scaling                            45
20.      11.1      Machine Learning                                46
21.      11.5      Types of Machine Learning                       48
22.      14.33     Natural language processing managed services    61
CHAPTER 1 Eduskills Profile

Eduskills, or education skills, are becoming increasingly important in the job market as
employers seek individuals who possess not only technical skills but also critical thinking,
problem-solving, and communication skills.
In today's rapidly changing world, it is essential that we develop our eduskills to stay
competitive and adapt to new challenges. Whether you are a student, a working professional, or
someone looking to enhance your skill set, eduskills can help you achieve your goals and
unlock new opportunities. So let's explore what eduskills are, why they are important, and how
Eduskills AICTE can help you develop them.

1.1 :Profile

Fig 1: Logo
Eduskills AICTE is a national-level initiative aimed at promoting eduskills or education
skills in India. The program was launched by the All India Council for Technical Education
(AICTE) with the goal of bridging the gap between academic knowledge and practical skills
required by the industry.
The program offers a wide range of skill development courses, internships, and
certification programs that are designed to equip students with the necessary skills to succeed
in today's job market. Eduskills AICTE has been successful in promoting skill-based education
across the country, with over 1 lakh students enrolled in its various programs.

1.1.1 :Vision
At Eduskills AICTE, they are committed to shaping the future of education in India and
beyond. Their vision is to create a world where every individual has access to quality education
that empowers them to reach their full potential.
They believe that education is the key to unlocking human potential and driving social
and economic progress. By leveraging technology and innovation, they aim to transform the way
people learn and teach, making education more accessible, affordable, and effective than ever
before.

1.1.2 :Mission
Their mission at Edu skills AICTE is to provide accessible and affordable education to
everyone, regardless of their background or location. They believe that education is a
fundamental right and that everyone should have the opportunity to learn and grow.
They do this by partnering with top universities and educators around the world to offer
high quality courses and programs online. Their platform allows students to learn at their own
pace and on their own schedule, making education more flexible and convenient than ever
before.

1.1.3 :Objectives
Our objectives at Edu skills AICTE are centered around providing quality education
and training to individuals, businesses, and organizations. We aim to equip our students with
the skills and knowledge necessary for success in their chosen fields.
To achieve this, we have set specific and measurable targets such as increasing enrollment
numbers by 20% within the next year, expanding our course offerings to include emerging
technologies, and partnering with industry leaders to provide hands-on learning experiences.

1.2 : Eduskills Offered

1.2.1 : Courses Offered


AICTE Edu Skills offers a wide range of courses that cater to the needs of students and
professionals alike. Our courses are designed to provide world-class skill training and
development, and are taught by expert faculty members with years of experience
in their respective fields.
Our courses cover a variety of subjects, including engineering, management, finance,
technology, and more. Whether you're looking to enhance your skills in a specific area or
broaden your knowledge base, we have a course that's right for you.
At AICTE Edu Skills, we pride ourselves on having a team of expert faculty members
who bring a wealth of knowledge and experience to the classroom. Our faculty members are
not only highly qualified in their respective fields, but also have years of practical experience
working in industry.
This combination of academic rigor and real-world experience ensures that our students receive
a well-rounded education that prepares them for success in their chosen careers. Our faculty
members are passionate about teaching and are dedicated to helping our students achieve their
full potential.

AICTE EduSkills is proud to have formed strong partnerships with some of the most
reputable companies in various industries. These partnerships allow us to provide our students
with access to cutting-edge technology and real-world experience, giving them an edge in their
future careers.
Through these partnerships, our students also have the opportunity to engage with industry
leaders and gain valuable insights into the latest trends and best practices. This helps them stay
ahead of the curve and be better prepared for the challenges of the ever-evolving job market.
AICTE EduSkills offers a wide range of courses that are designed to prepare students
for successful careers in various industries. Our courses are taught by expert faculty members
who have years of experience in their respective fields, and are updated regularly to reflect the
latest industry trends and best practices. By enrolling in one of our courses, students can gain
the skills and knowledge they need to succeed in their chosen profession.
Our courses are designed to be flexible and adaptable, so that students can tailor their
education to their specific needs and interests. Whether you're looking to start a new career or
advance in your current one, AICTE EduSkills has the training programs you need to achieve
your professional goals.

1.2.2 : How To Apply?


To be eligible for the AICTE Eduskills Internship program, applicants must meet
certain criteria. First and foremost, applicants must be enrolled in a recognized educational
institution. This means that students who have completed their studies are not eligible to apply.
Additionally, applicants must have a minimum cumulative grade point average of 7.5 on a 10-
point scale.
In addition to these academic requirements, applicants must also demonstrate
proficiency in English language skills. This includes both written and verbal communication
skills, as well as the ability to understand and follow instructions. Finally, applicants must
show a strong interest in their chosen field of study and a desire to learn and grow as
professionals.

To apply for the AICTE Eduskills Internship program, follow these simple steps:
Step 1: Visit the official website and create an account by providing your personal details,
educational qualifications, and contact information.
Step 2: Once you have created an account, log in and fill out the application form. Make sure to
provide accurate and complete information, as incomplete or incorrect applications will not be
considered.
Step 3: Upload all the required documents, including your resume, academic transcripts, and any
other relevant certificates or awards.
Step 4: Submit your application before the deadline. Late applications will not be accepted. It
is important to follow these steps carefully to ensure that your application is considered for the
AICTE Eduskills Internship program.
CHAPTER 2 CLOUD CONCEPTS OVERVIEW

2.1 :Introduction to cloud computing :

Cloud Computing is the delivery of computing services such as servers, storage,


databases, networking, software, analytics, intelligence, and more, over the Cloud (Internet).

Fig 2:Application Infrastructure Storage

Cloud Computing provides an alternative to the on-premises datacentre. With an on-premises


datacentre, we have to manage everything, such as purchasing and installing hardware,
virtualization, installing the operating system, and any other required applications, setting up the
network, configuring the firewall, and setting up storage for data. After doing all the setup, we
become responsible for maintaining it through its entire lifecycle.

But if we choose Cloud Computing, a cloud vendor is responsible for the hardware purchase and
maintenance. They also provide a wide variety of software and platform as a service. We can
take any required services on rent. The cloud computing services will be charged based on usage.

Fig 3: Cloud Computing

The cloud environment provides an easily accessible online portal that makes it handy for the user
to manage compute, storage, network, and application resources. Some cloud service providers
are shown in the following figure.

2.1.1 :Advantages of cloud computing

Cost: It reduces the huge capital costs of buying hardware and software.

Speed: Resources can be accessed in minutes, typically within a few clicks.

Scalability: We can increase or decrease the requirement of resources according to the business
requirements.

Productivity: While using cloud computing, we put less operational effort. We do not need to apply
patching, as well as no need to maintain hardware and software. So, in this way, the IT team can be more
productive and focus on achieving business goals.

Reliability: Backup and recovery of data are less expensive and very fast for business continuity.

Security: Many cloud vendors offer a broad set of policies, technologies, and controls that strengthen our
data security.

2.1.2-Introduction to AWS :
Amazon Web Services (AWS), a subsidiary of Amazon.com, has invested billions of dollars in
IT resources distributed across the globe. These resources are shared among all AWS account
holders across the globe, yet the accounts themselves are entirely isolated from each other. AWS
provides on-demand IT resources to its account holders on a pay-as-you-go pricing model with
no upfront cost. AWS offers flexibility because you pay only for the services you use or need.
Enterprises use AWS to reduce the capital expenditure of building their own private IT
infrastructure (which can be expensive depending upon the enterprise's size and nature). AWS
has its own physical fiber network that connects its Availability Zones, Regions, and edge
locations. All maintenance costs are also borne by AWS, which saves a fortune for enterprises.
Security of the cloud is the responsibility of AWS, but security in the cloud is the customer's
responsibility. Performance efficiency in the cloud has four main areas:

• Selection
• Review
• Monitoring
• Tradeoff

Advantages of AWS :
• AWS allows you to easily scale your resources up or down as your needs change, helping
you to save money and ensure that your application always has the resources it needs.
• AWS provides a highly reliable and secure infrastructure, with multiple data centers
and a commitment to 99.99% availability for many of its services.
• AWS offers a wide range of services and tools that can be easily combined to build
and deploy a variety of applications, making it highly flexible.
• AWS offers a pay-as-you-go pricing model, allowing you to only pay for the resources
you actually use and avoid upfront costs and long-term commitments.

AWS Cloud Computing Models :


There are three cloud computing models available on AWS.
1. Infrastructure as a Service (IaaS): It is the basic building block of cloud IT. It generally
provides access to data storage space, networking features, and computer hardware (virtual or
dedicated hardware). It is highly flexible and gives management controls over the IT
resources to the developer. For example, VPC, EC2, EBS.
2. Platform as a Service (PaaS): This is a type of service where AWS manages the underlying
infrastructure (usually operating system and hardware). This helps the developer to be more
efficient as they do not have to worry about undifferentiated heavy lifting required for running
the applications such as capacity planning, software maintenance, resource procurement,
patching, etc., and focus more on deployment and management of the applications. For
example, RDS, EMR, ElasticSearch.
3. Software as a Service(SaaS): It is a complete product that usually runs on a browser. It
primarily refers to end-user applications. It is run and managed by the service provider.
The end-user only has to worry about the application of the software suitable to its needs.
For example, Salesforce.com, web-based email, Office 365.

As you may already know, AWS stands for Amazon Web Services, and it's a cloud
computing platform that offers a wide range of services to help businesses run more efficiently
and effectively. In this presentation, we'll be exploring the many benefits of AWS, including its
scalability, flexibility, and cost-effectiveness. But first, let's take a closer look at what cloud
computing is and how AWS fits into this model. Cloud computing is a revolutionary technology
that has changed the way we store and access data. Instead of relying on physical hardware, cloud
computing allows us to use remote servers to store and process information. This means that we
can access our data from anywhere in the world, as long as we have an internet connection.

AWS is a leading provider of cloud computing services, offering a wide range of tools
and solutions for businesses of all sizes. With AWS, you can easily scale your infrastructure up
or down as needed, without having to worry about the costs and complexities of managing
physical hardware. Whether you're a small startup or a large enterprise, AWS has the tools you
need to succeed in the cloud.

AWS offers a wide range of services that can be used to build and deploy applications in
the cloud. These services include compute, storage, database, and networking. Compute services
like Amazon EC2 allow you to run virtual servers in the cloud, while storage services like
Amazon S3 provide scalable and durable object storage. Database services like Amazon RDS
offer managed relational databases, and networking services like Amazon VPC let you create
isolated virtual networks.

AWS also provides a number of specialized services for specific use cases, such as
machine learning, analytics, and IoT. For example, Amazon SageMaker is a fully-managed
service that makes it easy to build, train, and deploy machine learning models at scale. Amazon
Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to
analyze all your data using standard SQL and existing BI tools. And AWS IoT provides a platform
for connecting devices to the cloud and building IoT applications.

2.1.3:Moving to the AWS Cloud

Migrating to AWS can be a complex process that requires careful planning and execution.
It is important to assess your current infrastructure and identify which applications and services
are suitable for migration. Proper planning can help minimize downtime and ensure a smooth
transition to the cloud.
During the execution phase, it is important to closely monitor the migration process and
make any necessary adjustments. Post-migration tasks include testing and optimizing your new
infrastructure to ensure it meets your business needs. A successful migration to AWS can result
in increased scalability, flexibility, and cost-effectiveness for your organization.

Fig4:AWS Cloud Computing

CHAPTER-3 Cloud economics and billing

3.1-Introduction :
Cloud computing provides organizations with numerous benefits. These include
additional security of resources, scalable infrastructure, agility, and more. However, these
benefits come at a cost.

Cloud economics establishes the cost-benefit situation of an organization upon building


resources on the cloud. You pay for storage, backup, networking, load balancing, security, and
more with the cloud. In addition, you need the IT capability to architect the cloud properly. By
analyzing these facets, IT leaders can know whether the organization stands to leverage the
advantages of cloud computing.

Since cloud economics helps businesses determine if cloud computing is right for them, it is an
essential step to take before getting on with migration.

3.2-Key aspects of cloud economics


Cloud economics deals with financial-related aspects such as returns and costs. Some of the
critical aspects of cloud economics include:

3.2.1 Total Cost of Ownership (TCO)


The total cost of ownership (TCO) is the cost incurred in cloud planning, migrating, architecting,
and operating the cloud infrastructure. It helps you understand how much your business will
incur after adopting a cloud model.

TCO defines all the direct and indirect costs involved. These include data centers, maintenance
and support, development, business continuity and disaster recovery, network, and more. This
analysis compares the cost of on-premises infrastructure with the cost of cloud computing,
enabling a business to make an informed decision.

Businesses also learn about opportunity costs through TCO. The main aim is to attain a lower
TCO than when operating on-premise. A business can either pause migration efforts, pay the
extra costs if it wants to achieve other goals, or migrate in phases.
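
As a simple illustration of the kind of comparison a TCO analysis involves, the back-of-the-envelope
Python sketch below uses purely hypothetical figures; real numbers would come from the
organization's own hardware quotes, staffing costs, and cloud usage estimates.

    # A minimal TCO comparison sketch. All figures are hypothetical placeholders.
    YEARS = 3

    # Assumed on-premises costs
    on_prem_hardware = 120_000   # servers, storage, networking (one-time purchase)
    on_prem_yearly = 45_000      # power, cooling, maintenance, support staff (per year)

    # Assumed cloud costs
    cloud_migration = 15_000     # one-time migration and re-architecting effort
    cloud_monthly = 4_500        # pay-as-you-go compute, storage, and data transfer

    on_prem_tco = on_prem_hardware + on_prem_yearly * YEARS
    cloud_tco = cloud_migration + cloud_monthly * 12 * YEARS

    print(f"3-year on-premises TCO: ${on_prem_tco:,}")
    print(f"3-year cloud TCO:       ${cloud_tco:,}")
    print(f"Estimated difference:   ${on_prem_tco - cloud_tco:,}")

With these example numbers the cloud option comes out lower, but the direction of the result
depends entirely on the inputs, which is exactly what the TCO exercise is meant to establish.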

On-demand :
On-demand pricing is a major factor to consider when considering cloud migration. With
on-premise computing, you buy a fixed capacity that you own.

The fixed capacity charges, however, change when you migrate to the cloud and choose
on-demand pricing. Costs become elastic and can quickly spiral out of control if you don't
regularly monitor and control them.

Cost fluctuations resulting from the pay-as-you-go model can cost you lots of money.
Therefore, you need a cost management tool to help you detect any anomalies.

3.3-Case study :
Amazon.com is the world’s largest online retailer. In 2011, Amazon.com switched from
tape backup to using Amazon Simple Storage Service (Amazon S3) for backing up the majority
of its Oracle databases. This strategy reduces complexity and capital expenditures, provides
faster backup and restore performance, eliminates tape capacity planning for backup and archive,
and frees up administrative staff for higher value operations. The company was able to replace
its backup tape infrastructure with cloud-based Amazon S3 storage, eliminate backup software,
and achieve a 12x performance improvement, reducing restore time from around 15 hours
to 2.5 hours in select scenarios.

3.4-AWS Organizations :
This service is designed to simplify your life by providing a centralized management
system for multiple AWS accounts. With AWS Organizations, you can easily manage and govern
your AWS resources across all your accounts from a single location.

But that's not all - AWS Organizations also enables consolidated billing, fine-grained
access control, and service control policies. These features provide cost-saving advantages,
enhanced security, and centralized policy management. Let's dive deeper into each of these
benefits and see how they can help your organization achieve its goals. AWS Organizations
enables centralized management of multiple AWS accounts, providing a single pane of glass to
manage all of your resources.

This makes it easy to apply policies and monitor usage across your entire organization,
ensuring compliance and reducing the risk of security breaches. With AWS Organizations, you
can also create hierarchies of accounts, allowing for more granular control over access and
resources. This means that you can delegate management responsibilities to specific teams or
individuals, while still maintaining overall control and visibility. AWS Organizations enables
consolidated billing for multiple AWS accounts, allowing organizations to combine usage and
create a single, detailed bill that covers all their accounts.

This can help streamline the billing process, simplify cost allocation, and save time and
resources. In addition to simplifying billing, consolidated billing can also result in significant
cost savings. By combining usage across multiple accounts, organizations can benefit from
volume discounts, reduced data transfer fees, and lower per-unit pricing for certain services.
AWS Organizations provides a centralized and streamlined approach to managing multiple AWS
accounts.

Fig5:AWS Organisations

3.5-AWS Billing and Cost Management :


The AWS Billing console contains features to pay your AWS bills and report your AWS cost and
usage. You can also use the AWS Billing console to manage your consolidated billing if you're a
part of AWS Organizations. Amazon Web Services automatically charges the credit card that you
provided when you sign up for an AWS account. You can view or update your credit card
information at any time, including designating a different credit card for AWS to charge. You can
do this on the Payment Methods page in the Billing console. For more details on the Billing
features available, see Features of AWS Billing console.

The AWS Cost Management console has features that you can use for budgeting and forecasting
costs and methods for you to optimize your pricing to reduce your overall AWS bill.

The AWS Cost Management console is integrated closely with the Billing console. Using both
together, you can manage your costs in a holistic manner. You can use Billing console resources
to manage your ongoing payments, and AWS Cost Management console resources to optimize
your future costs. For information about AWS resources to understand, pay, or organize your
AWS bills, see the AWS Billing User Guide.
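
The data behind these consoles can also be retrieved programmatically. A minimal Python (boto3)
sketch using the Cost Explorer API is shown below; it assumes AWS credentials are configured
and that Cost Explorer has been enabled for the account.

    import boto3
    from datetime import date, timedelta

    ce = boto3.client("ce")  # Cost Explorer

    end = date.today().replace(day=1)                  # first day of the current month
    start = (end - timedelta(days=1)).replace(day=1)   # first day of the previous month

    # Unblended cost for the previous month, grouped by service.
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )

    for group in resp["ResultsByTime"][0]["Groups"]:
        service = group["Keys"][0]
        amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
        if amount > 0:
            print(f"{service}: ${amount:,.2f}")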

3.6-Billing dashboard :
You can use the dashboard page of the AWS Billing console to gain a general view of your
AWS spending. You can also use it to identify your highest cost service or Region and view
trends in your spending over the past few months. You can use the dashboard page to see various
breakdowns of your AWS usage. This is especially useful if you're a Free Tier user. To view more
details about your AWS costs and invoices, choose Billing details in the left navigation pane.
You can customize your dashboard layout at any time by choosing the gear icon at the top of the
page to match your use case.

Understanding your dashboard page

Your AWS Billing console dashboard contains the following sections. To create your preferred
layout, drag and drop sections of the Dashboard page. To customize the visible sections and
layout, choose the gear icon at the top of the page. These preferences are stored for ongoing visits
to the Dashboard page. To temporarily remove sections from your view, choose the x icon for
each section. To make all sections visible, choose refresh at the top of the page.

AWS summary
This section is an overview of your AWS costs across all accounts, AWS Regions,
service providers, and services, and other KPIs. Total compared to prior period
displays your total AWS costs for the most recent closed month. It also provides a
comparison to your total forecasted costs for the current month. Choose the gear icon
on the card to decide which KPIs you want to display.

Highest cost and usage details


This section shows your top service, account, or AWS Region by estimated month-to-date
(MTD) spend. To choose which to view, choose the gear icon on the top right.

Cost trend by top five services
In this section, you can see the cost trend for your top five services for the most recent
three to six closed billing periods.

You can choose between chart types and time periods on the top of the section. You can
adjust additional preferences using the gear icon.

The columns provide the following information:

• Average: The average cost over the trailing three months.


• Total: The total for the most recent closed month.
• Trend: Compares the Total column with the Average column.

Account cost trend

This section shows the cost trend for your account for the most recent three to six closed
billing periods. If you're a management account of AWS Organizations, the cost trend
by top five section shows your top five AWS accounts for the most recent three to six
closed billing periods. If invoices weren't already issued, the data isn't visible in this
section.

You can choose between chart types and time periods on the top of the section. Adjust
additional preferences using the gear icon.

The columns provide the following information:

• Average: The average cost over the trailing three months.


• Total: The total for the most recent closed month.
• Trend: Compares the Total column with the Average column.

CHAPTER-4 AWS Global Infrastructure

4.1-Introduction :

AWS is a cloud computing platform which is globally available.

o Global infrastructure refers to the regions around the world in which AWS is based. Global
infrastructure is a set of high-level IT services, which is shown below:
o AWS was available in 19 regions and 57 availability zones as of December 2018, with 5 more
regions and 15 more availability zones announced for 2019.

The following are the components that make up the AWS infrastructure:

o Availability Zones
o Regions
o Edge Locations
o Regional Edge Caches

Fig6-AWS Infrastructure

Availability zone as a Data Center

o An availability zone is a facility that can be somewhere in a country or in a city. Inside this
facility, i.e., the data center, we can have multiple servers, switches, load balancers, and
firewalls. The things which interact with the cloud sit inside the data centers.
o An availability zone can consist of several data centers, but if they are close together, they are
counted as 1 availability zone.

Region

o A region is a geographical area. Each region consists of 2 or more availability zones.
o A region is a collection of data centers which are completely isolated from other regions.
o A region consists of availability zones connected to each other through links.
o Availability zones are connected through redundant and isolated metro fibers.

Edge Locations

o Edge locations are the endpoints for AWS used for caching content.
o Edge locations consist of CloudFront, Amazon's Content Delivery Network (CDN).
o Edge locations are more numerous than regions. Currently, there are over 150 edge locations.
o An edge location is not a region but a small location that AWS has. It is used for caching content.
o Edge locations are mainly located in most of the major cities to distribute the content to end
  users with reduced latency.
o For example, if a user accesses your website from Singapore, the request is redirected to the
  edge location closest to Singapore, where cached data can be read.

Regional Edge Cache

o AWS announced a new type of edge location in November 2016, known as a Regional Edge Cache.
o A Regional Edge Cache lies between the CloudFront origin servers and the edge locations.
o A Regional Edge Cache has a larger cache than an individual edge location.
o Data is removed from the cache at the edge location while the data is retained at the Regional
  Edge Caches.
o When the user requests data that is no longer available at the edge location, the edge location
  retrieves the cached data from the Regional Edge Cache instead of the origin servers, which
  have high latency.

4.2:AWS Services and Service Categories

Amazon Web Services (AWS) is a cloud computing platform that provides a wide range
of services to help businesses scale and grow. These services are grouped into various categories,
including compute, storage, database, networking, security, and more. Each category contains
multiple services that can be used individually or together to create a powerful cloud
infrastructure.

In this presentation, we will explore each of these categories in detail, highlighting the
key features and benefits of each service. By the end of this presentation, you will have a solid
understanding of AWS services and categories, and how they can be used to drive business
success.

AWS offers a range of compute services that can help businesses meet their computing
requirements in a flexible and cost-effective manner. One such service is Amazon Elastic
Compute Cloud (EC2), which provides resizable compute capacity in the cloud. With EC2, you
can quickly spin up new instances to handle increased demand or scale down when demand
decreases. This makes it an ideal choice for applications with unpredictable workloads or for
businesses that want to reduce their upfront infrastructure costs.

Another popular compute service offered by AWS is AWS Lambda, which allows you to
run code without provisioning or managing servers. With Lambda, you only pay for the compute
time that you consume, making it a cost-effective option for applications with sporadic usage
patterns. It also supports a wide range of programming languages, so you can choose the
language that best suits your needs.

CHAPTER 5 AWS Cloud Security

5.1 AWS Shared Responsibility Model :


Security and Compliance is a shared responsibility between AWS and the customer. This
shared model can help relieve the customer’s operational burden as AWS operates, manages and
controls the components from the host operating system and virtualization layer down to the
physical security of the facilities in which the service operates. The customer assumes
responsibility and management of the guest operating system (including updates and security
patches), other associated application software as well as the configuration of the AWS provided
security group firewall. Customers should carefully consider the services they choose as their
responsibilities vary depending on the services used, the integration of those services into their
IT environment, and applicable laws and regulations. The nature of this shared responsibility also
provides the flexibility and customer control that permits the deployment. As shown in the chart
below, this differentiation of responsibility is commonly referred to as Security “of” the Cloud
versus Security “in” the Cloud.
AWS responsibility “Security of the Cloud” - AWS is responsible for protecting the infrastructure
that runs all of the services offered in the AWS Cloud. This infrastructure is composed of the
hardware, software, networking, and facilities that run AWS Cloud services.

Customer responsibility “Security in the Cloud” – Customer responsibility will be determined


by the AWS Cloud services that a customer selects. This determines the amount of configuration
work the customer must perform as part of their security responsibilities. For example, a service
such as Amazon Elastic Compute Cloud (Amazon EC2) is categorized as Infrastructure as a
Service (IaaS) and, as such, requires the customer to perform all of the necessary security
configuration and management tasks. Customers that deploy an Amazon EC2 instance are
responsible for management of the guest operating system (including updates and security
patches), any application software or utilities installed by the customer on the instances, and the
configuration of the AWS-provided firewall (called a security group) on each instance. For
abstracted services, such as Amazon S3 and Amazon DynamoDB, AWS operates the
infrastructure layer, the operating system, and platforms, and customers access the endpoints to
store and retrieve data. Customers are responsible for managing their data (including encryption
options), classifying their assets, and using IAM tools to apply the appropriate permission.

AWS IAM :
AWS Identity and Access Management (IAM) is a web service that helps you securely control
access to AWS resources. With IAM, you can centrally manage permissions that control which
AWS resources users can access. You use IAM to control who is authenticated (signed in) and
authorized (has permissions) to use resources.

Fig6:AWS IAM
When you create an AWS account, you begin with one sign-in identity that has complete
access to all AWS services and resources in the account. This identity is called the AWS account
root user and is accessed by signing in with the email address and password that you used to
create the account. We strongly recommend that you don't use the root user for your everyday
tasks. Safeguard your root user credentials and use them to perform the tasks that only the root
user can perform. For the complete list of tasks that require you to sign in as the root user, see
Tasks that require root user credentials in the AWS Account Management Reference Guide.

IAM gives you the following features:

Shared access to your AWS account
You can grant other people permission to administer and use resources in your AWS
account without having to share your password or access key.

Granular permissions
You can grant different permissions to different people for different resources. For
example, you might allow some users complete access to Amazon Elastic Compute Cloud
(Amazon EC2), Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB,
Amazon Redshift, and other AWS services. For other users, you can allow read-only
access to just some S3 buckets, or permission to administer just some EC2 instances, or
to access your billing information but nothing else.

Secure access to AWS resources for applications that run on Amazon EC2
You can use IAM features to securely provide credentials for applications that run on
EC2 instances. These credentials provide permissions for your application to access
other AWS resources. Examples include S3 buckets and DynamoDB tables.
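
As a rough sketch of granular permissions in practice, the boto3 snippet below creates an IAM
user and attaches the AWS managed read-only policy for Amazon S3. The user name is only an
example, and credentials with IAM administration permissions are assumed.

    import boto3

    iam = boto3.client("iam")

    # Create a user and grant read-only access to Amazon S3 via an AWS managed policy.
    iam.create_user(UserName="report-reader")  # example user name
    iam.attach_user_policy(
        UserName="report-reader",
        PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    )

    # Confirm which policies are now attached to the user.
    attached = iam.list_attached_user_policies(UserName="report-reader")
    for policy in attached["AttachedPolicies"]:
        print(policy["PolicyName"])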

Security in AWS Account Management :


Cloud security at AWS is the highest priority. As an AWS customer, you benefit from a data
center and network architecture that is built to meet the requirements of the most
security-sensitive organizations.

Security is a shared responsibility between AWS and you. The shared responsibility model
describes this as security of the cloud and security in the cloud:

• Security of the cloud – AWS is responsible for protecting the infrastructure that runs AWS
services in the AWS Cloud. AWS also provides you with services that you can use securely.

Third-party auditors regularly test and verify the effectiveness of our security as part of the AWS
Compliance Programs. To learn about the compliance programs that apply to Account
Management, see AWS services in scope by compliance program.
• Security in the cloud – Your responsibility is determined by the AWS service that you use. You
are also responsible for other factors including the sensitivity of your data, your company’s
requirements, and applicable laws and regulations.

This documentation helps you understand how to apply the shared responsibility model when
using AWS Account Management. It shows you how to configure Account Management to meet
your security and compliance objectives. You also learn how to use other AWS services that help
you to monitor and secure your Account Management resources.

5.2 :Networking and Content Delivery and Networking Basics


Starting your cloud networking journey can seem overwhelming, especially if you are
accustomed to the traditional on-premises way of provisioning hardware and managing and
configuring networks. Having a good understanding of core networking concepts like IP
addressing, TCP communication, IP routing, security, and virtualization will help you as you
begin gaining familiarity with cloud networking on AWS. In the following sections, we answer
common questions about cloud networking and explore best practices for building infrastructure
on AWS.

Cloud networking :
Similar to traditional on-premises networking, cloud networking provides the ability to build,
manage, operate, and securely connect your networks across all your cloud environments and
distributed cloud and edge locations. Cloud networking allows you to architect infrastructure
that is resilient and highly available, helping you to deploy your applications faster, at scale, and
closer to your end users when you need it.

Fig 7: AWS Networking

Amazon VPC:
With Amazon Virtual Private Cloud (Amazon VPC), you can launch AWS resources in a
logically isolated virtual network that you've defined. This virtual network closely resembles a
traditional network that you'd operate in your own data center, with the benefits of using the
scalable infrastructure of AWS.

The following diagram shows an example VPC. The VPC has one subnet in each of the
Availability Zones in the Region, EC2 instances in each subnet, and an internet gateway to allow
communication between the resources in your VPC and the internet.

Fig 8: VPC Networking

VPC Networking:
VPC (Virtual Private Cloud) is a fundamental networking service provided by Amazon
Web Services (AWS). It allows you to create a logically isolated section of the AWS cloud
where you can launch AWS resources, such as EC2 instances, databases, and load balancers.
With VPC, you have complete control over your virtual network environment, including
selecting your IP address range, creating subnets, and configuring route tables and network
gateways.

Here are some key aspects of VPC networking in AWS:

1. Virtual Private Cloud (VPC): When you create a VPC, it represents your private virtual
network in the AWS cloud. You can think of it as your own data center in the cloud.

2. Subnets: Within a VPC, you can create one or more subnets, each associated with a specific
Availability Zone in a region. Subnets help you logically segment your resources and provide
high availability and fault tolerance.

3. IP Addressing: You can define the IP address range for your VPC using CIDR (Classless
Inter-Domain Routing) notation. For example, you can choose a range like
10.0.0.0/16, which allows for up to 65,536 IP addresses.

4. Internet Gateway (IGW): An Internet Gateway is a horizontally scalable, redundant, and


highly available component that allows resources within your VPC to communicate with the
internet and vice versa.

5. Route Tables: Each subnet in a VPC is associated with a route table, which defines the rules
for routing traffic in and out of the subnet. By default, the main route table allows communication
within the VPC, but you can create custom route tables to control specific traffic patterns.
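
The pieces described in items 1-5 can be created with the AWS SDK for Python (boto3); a
minimal sketch is shown below. The CIDR ranges and Availability Zone are examples only, and
configured AWS credentials are assumed.

    import boto3

    ec2 = boto3.client("ec2")

    # 1. The VPC itself, with a /16 address range.
    vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]

    # 2-3. One subnet carved out of the VPC range in a single Availability Zone.
    subnet_id = ec2.create_subnet(
        VpcId=vpc_id,
        CidrBlock="10.0.1.0/24",
        AvailabilityZone="us-east-1a",  # example AZ
    )["Subnet"]["SubnetId"]

    # 4. Internet gateway attached to the VPC.
    igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
    ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

    # 5. Custom route table with a default route to the internet gateway,
    #    associated with the subnet to make it a "public" subnet.
    rtb_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
    ec2.create_route(RouteTableId=rtb_id, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id)
    ec2.associate_route_table(RouteTableId=rtb_id, SubnetId=subnet_id)

    print(f"VPC {vpc_id} with public subnet {subnet_id} is ready.")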

VPC Security :
VPC security is a critical aspect of AWS's Virtual Private Cloud (VPC) service. When setting
up and managing a VPC, it's essential to implement various security measures to protect your
cloud resources from unauthorized access and potential threats. Here are some key components
of VPC security in AWS:
Security Groups: Security groups act as virtual firewalls for your EC2 instances within a VPC.
You can specify inbound and outbound traffic rules for each security group, allowing you to
control what traffic is allowed to reach your instances. They operate at the instance level and can
be associated with one or more instances.
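
For example, a security group that admits only inbound HTTPS could be sketched with boto3 as
follows; the group name and VPC ID are placeholders, and all other inbound traffic stays blocked
by default.

    import boto3

    ec2 = boto3.client("ec2")

    sg_id = ec2.create_security_group(
        GroupName="web-https-only",          # example name
        Description="Allow inbound HTTPS only",
        VpcId="vpc-0123456789abcdef0",       # placeholder VPC ID
    )["GroupId"]

    # Allow HTTPS (TCP 443) from anywhere; outbound traffic is open by default.
    ec2.authorize_security_group_ingress(
        GroupId=sg_id,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTPS from anywhere"}],
        }],
    )
    print(f"Created security group {sg_id}")
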
Network ACLs (Access Control Lists): Network ACLs are another layer of security that
operate at the subnet level. They control inbound and outbound traffic at the subnet level and
provide additional control over traffic flow between subnets. Unlike security groups, network
ACLs are stateless, meaning that you must define rules for both inbound and outbound traffic.
Public and Private Subnets: By carefully designing your VPC with public and private subnets,
you can control which resources are exposed to the internet and which remain private. Public
subnets typically have a route to an Internet Gateway, allowing instances within them to
communicate with the internet, while private subnets do not have direct internet access.
Internet Gateway: The Internet Gateway is a horizontally scalable, redundant, and highly
available component that enables resources within your VPC to access the internet and allows
the internet to reach your resources. Properly configuring access to the Internet Gateway ensures
secure internet connectivity for your public resources.
NAT Gateway/NAT Instance: For instances in private subnets to access the internet (e.g., for
software updates), you can use either a NAT Gateway (a managed service) or a NAT instance (a
self-managed EC2 instance acting as a NAT device). These resources allow outbound traffic
while restricting incoming traffic initiated from the internet.
VPN and Direct Connect: For secure communication between your on-premises data centers
and your VPC, AWS supports VPN (Virtual Private Network) and AWS Direct Connect
connections. These options can be used to create secure and encrypted communication channels.
IAM (Identity and Access Management): While not specific to VPC, IAM plays a crucial role
in controlling access to AWS resources. By properly configuring IAM roles and policies, you
can grant appropriate permissions to users, groups, and roles for managing your VPC and its
resources.

Route 53 :
Amazon Route 53 is a highly scalable and reliable Domain Name System (DNS) web service
provided by Amazon Web Services (AWS). It helps you manage the domain names (e.g.,
example.com) and route incoming requests to the appropriate AWS resources, such as EC2
instances, load balancers, or S3 buckets. Here's an overview of Amazon Route 53:
Domain Registration: Route 53 allows you to register new domain names or transfer existing
ones. When you register a domain with Route 53, it becomes available for use, and you can start
configuring its DNS settings.
DNS Management: Route 53 provides a fully featured DNS management service. You can
create various types of DNS records like A records (IPv4 addresses), AAAA records (IPv6
addresses), CNAME records (aliases), MX records (mail exchange servers), and more. These
records allow you to associate your domain names with specific IP addresses or other resources.
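
As a rough illustration, an A record can be created or updated through the Route 53 API as
sketched below; the hosted zone ID and IP address are placeholders, and the hosted zone is
assumed to exist already.

    import boto3

    route53 = boto3.client("route53")

    route53.change_resource_record_sets(
        HostedZoneId="Z0123456789EXAMPLE",   # placeholder hosted zone ID
        ChangeBatch={
            "Comment": "Point www at a web server",
            "Changes": [{
                "Action": "UPSERT",          # create the record, or update it if it exists
                "ResourceRecordSet": {
                    "Name": "www.example.com",
                    "Type": "A",
                    "TTL": 300,
                    "ResourceRecords": [{"Value": "203.0.113.10"}],  # example IPv4 address
                },
            }],
        },
    )
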
Routing Policies: Route 53 offers several routing policies that allow you to control how
incoming traffic is distributed among multiple resources. Some of the routing policies include:

Simple Routing: Directs traffic to a single resource.

Weighted Routing: Distributes traffic based on assigned weights to resources.

Latency-Based Routing: Routes traffic to the resource with the lowest latency for the user.

Geolocation Routing: Directs traffic based on the user's geographic location.


Health Checks: Route 53 enables you to set up health checks for your resources, such as EC2
instances or load balancers. Health checks monitor the health and availability of resources, and
Route 53 can automatically reroute traffic away from unhealthy resources.

Cloudfront :
Amazon CloudFront is a web service that speeds up distribution of your static and dynamic web
content, such as .html, .css, .js, and image files, to your users. CloudFront delivers your content
through a worldwide network of data centers called edge locations. When a user requests content
that you're serving with CloudFront, the request is routed to the edge location that provides the
lowest latency (time delay), so that content is delivered with the best possible performance.

• If the content is already in the edge location with the lowest latency, CloudFront delivers it
immediately.
• If the content is not in that edge location, CloudFront retrieves it from an origin that you've
defined—such as an Amazon S3 bucket, a MediaPackage channel, or an HTTP server (for
example, a web server) that you have identified as the source for the definitive version of your
content.

As an example, suppose that you're serving an image from a traditional web server, not from
CloudFront. For example, you might serve an image, sunsetphoto.png, using the URL
https://example.com/sunsetphoto.png.

Your users can easily navigate to this URL and see the image. But they probably don't know that
their request is routed from one network to another—through the complex collection of
interconnected networks that comprise the internet—until the image is found.

CloudFront speeds up the distribution of your content by routing each user request through the
AWS backbone network to the edge location that can best serve your content. Typically, this is a
CloudFront edge server that provides the fastest delivery to the viewer. Using the AWS network
dramatically reduces the number of networks that your users' requests must pass through, which
improves performance. Users get lower latency—the time it takes to load the first byte of the
file—and higher data transfer rates.

You also get increased reliability and availability because copies of your files (also known as
objects) are now held (or cached) in multiple edge locations around the world.
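
When an object changes at the origin, the copies cached at edge locations can be refreshed by
issuing an invalidation. A minimal boto3 sketch is shown below; the distribution ID is a
placeholder, and the path matches the example image above.

    import time
    import boto3

    cloudfront = boto3.client("cloudfront")

    cloudfront.create_invalidation(
        DistributionId="E1234567890ABC",  # placeholder distribution ID
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/sunsetphoto.png"]},
            "CallerReference": str(time.time()),  # must be unique per request
        },
    )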

CHAPTER 6 COMPUTE

6.1-Cloud compute services :


In Amazon Web Services (AWS), compute services provide the infrastructure and
resources to run your applications and workloads in the cloud. AWS offers a variety of compute
services to suit different use cases and application requirements. Here are some of the key
compute services provided by AWS:

Amazon EC2 (Elastic Compute Cloud): Amazon EC2 is a web service that provides resizable
compute capacity in the cloud. It allows you to launch virtual machines, known as instances,
with various operating systems and configurations. EC2 offers flexibility in terms of instance
types, storage options, and networking capabilities.
Amazon ECS (Elastic Container Service): Amazon ECS is a fully managed container
orchestration service. It allows you to easily run and scale Docker containers on instances or
AWS Fargate (serverless compute for containers) without managing the underlying
infrastructure.
AWS Lambda: AWS Lambda is a serverless compute service that lets you run code without
provisioning or managing servers. You can upload your code and specify the triggering events,
and Lambda automatically scales and executes the code in response to those events.
Amazon EKS (Elastic Kubernetes Service): Amazon EKS is a fully managed Kubernetes
service that allows you to deploy, manage, and scale containerized applications using
Kubernetes. EKS takes care of the underlying Kubernetes infrastructure.
AWS Batch: AWS Batch enables you to run batch computing workloads at scale. It dynamically
provisions the optimal amount of compute resources based on the job's requirements.

6.2-Amazon EC2 :

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable
compute capacity in the cloud. It is designed to make web-scale computing easier for developers.

The simple web interface of Amazon EC2 allows you to obtain and configure capacity with
minimal friction. It provides you with complete control of your computing resources and lets
you run on Amazon’s proven computing environment. Amazon EC2 reduces the time required
to obtain and boot new server instances (called Amazon EC2 instances) to minutes, allowing you
to quickly scale capacity, both up and down, as your computing requirements change. Amazon
EC2 changes the economics of computing by allowing you to pay only for capacity that you
actually use. Amazon EC2 provides developers and system administrators the tools to build
failure resilient applications and isolate themselves from common failure scenarios.
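
As a small illustration of this pay-for-what-you-use model, the boto3 sketch below launches a
single instance and then terminates it. The AMI ID is a placeholder (AMI IDs are Region-specific),
and t3.micro is just an example instance type.

    import boto3

    ec2 = boto3.client("ec2")

    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
        InstanceType="t3.micro",           # example instance type
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": "internship-demo"}],
        }],
    )
    instance_id = resp["Instances"][0]["InstanceId"]
    print(f"Launched {instance_id}")

    # Stop paying for the capacity as soon as it is no longer needed.
    ec2.terminate_instances(InstanceIds=[instance_id])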

Instance types

Amazon EC2 passes on to you the financial benefits of Amazon scale. You pay a very low rate
for the compute capacity you actually consume. Refer to Amazon EC2 Instance Purchasing
Options for a more detailed description.

• On-Demand Instances — With On-Demand Instances, you pay for compute capacity by the hour
or the second, depending on which instances you run. No longer-term commitments or upfront
payments are needed. You can increase or decrease your compute capacity depending on the
demands of your application and only pay the specified hourly rates for the instances you use.
On-Demand Instances are recommended for:
  o Users that prefer the low cost and flexibility of Amazon EC2 without any up-front payment
    or long-term commitment
  o Applications with short-term, spiky, or unpredictable workloads that cannot be interrupted
  o Applications being developed or tested on Amazon EC2 for the first time
• Spot Instances — Spot Instances are available at up to a 90% discount compared to On-Demand
prices and let you take advantage of unused Amazon EC2 capacity in the AWS Cloud. You can
significantly reduce the cost of running your applications, grow your application's compute
capacity and throughput for the same budget, and enable new types of cloud computing
applications. Spot Instances are recommended for:
  o Applications that have flexible start and end times
  o Applications that are only feasible at very low compute prices
  o Users with urgent computing needs for large amounts of additional capacity
• Reserved Instances — Reserved Instances provide you with a significant discount (up to 72%)
compared to On-Demand Instance pricing. You have the flexibility to change families, operating
system types, and tenancies while benefitting from Reserved Instance pricing when you use
Convertible Reserved Instances.
• C7g Instances — C7g Instances, powered by the latest generation AWS Graviton3 processors,
provide the best price performance in Amazon EC2 for compute-intensive workloads. C7g
instances are ideal for high performance computing, batch processing, electronic design
automation (EDA), gaming, video encoding, scientific modeling, distributed analytics,
CPU-based ML inference, and ad serving.
• Inf2 Instances — Inf2 Instances are purpose-built for deep learning inference. They deliver
high performance at the lowest cost in Amazon EC2 for generative AI models, including large
language models (LLMs) and vision transformers. Inf2 instances are powered by AWS
Inferentia2, the second-generation AWS Inferentia accelerator.
• M7g Instances — M7g instances, powered by the latest generation AWS Graviton3 processors,
provide the best price performance in Amazon EC2 for general purpose workloads. M7g
instances are ideal for applications built on open-source software such as application servers,
microservices, gaming servers, mid-size data stores, and caching fleets.
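
For interruption-tolerant workloads, Spot capacity can be requested from the same run_instances
call by adding market options, as in the sketch below; the AMI ID is again a placeholder, and the
workload must be able to tolerate interruption.

    import boto3

    ec2 = boto3.client("ec2")

    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
        InstanceType="c5.large",           # example instance type
        MinCount=1,
        MaxCount=1,
        InstanceMarketOptions={
            "MarketType": "spot",
            "SpotOptions": {"SpotInstanceType": "one-time"},
        },
    )
    print("Spot instance:", resp["Instances"][0]["InstanceId"])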

6.3-Cost optimization :

Cost optimization in Amazon EC2 is crucial to ensure that you are getting the most value out of
your cloud infrastructure while keeping your expenses under control. Here are some strategies
and best practices to optimize costs with Amazon EC2:
Right-Sizing Instances: Choose the instance type that best matches your workload
requirements. If your workload is not resource-intensive, consider using smaller or lower-cost
instance types to avoid overprovisioning.
Reserved Instances (RIs): Utilize Reserved Instances for stable workloads with predictable
usage. RIs offer significant cost savings compared to On-Demand Instances when you commit
to a one- or three-year term.
Spot Instances: Take advantage of Spot Instances for fault-tolerant or flexible workloads. Spot Instances can be much cheaper than On-Demand Instances, but they depend on spare capacity and can be interrupted with a two-minute warning when EC2 needs the capacity back or the Spot price rises above your maximum price.
Scheduled Instances: Use Scheduled Instances for workloads that have predictable schedules.
Scheduled Instances allow you to reserve capacity for specific time periods, ensuring you have
the necessary resources when needed.
EC2 Instance Limits: Keep an eye on your EC2 instance limits. By default, new AWS accounts
have instance limits that may impact your ability to scale. Request a limit increase if needed.
Auto Scaling: Implement Auto Scaling to automatically adjust the number of instances based on
demand. This ensures you have enough capacity during peak periods and can reduce costs during
low-demand periods.
Amazon EC2 Instance Families: Choose instance families optimized for your specific use case.
AWS offers instance families tailored for general-purpose computing, memory-intensive tasks,
compute-optimized workloads, and more.
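As a companion to the right-sizing strategy above, the following is a hedged boto3 sketch (the instance ID is a placeholder) that pulls average CPU utilization from CloudWatch to flag instances that may be over-provisioned.

# Sketch: check two weeks of hourly average CPU utilization for one instance.
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=datetime.utcnow() - timedelta(days=14),
    EndTime=datetime.utcnow(),
    Period=3600,               # one data point per hour
    Statistics=["Average"],
)

averages = [point["Average"] for point in stats["Datapoints"]]
if averages and max(averages) < 20:
    print("Instance looks over-provisioned; consider a smaller type.")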

Fig9:AWS EC2

6.4-AWS Lambda :
AWS Lambda is an event-driven, serverless computing platform provided by Amazon as a part of Amazon Web Services. You don't need to worry about which AWS resources to launch or how you will manage them; instead, you put the code on Lambda, and it runs. In AWS Lambda, the code is executed in response to events in AWS services, such as adding or deleting files in an S3 bucket or an HTTP request from Amazon API Gateway. Lambda is best suited for short-lived, event-driven background tasks rather than long-running processes.

AWS Lambda helps you focus on your core product and business logic instead of managing operating system (OS) access control, OS patching, right-sizing, provisioning, scaling, and similar tasks.
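The handler below is a minimal, hypothetical example (not code from this internship) of the event-driven model described above: it is invoked by an S3 "object created" event and logs each new object.

import json

def lambda_handler(event, context):
    # Lambda passes the triggering event; for S3 it contains a Records list.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"New object uploaded: s3://{bucket}/{key}")
    return {"statusCode": 200, "body": json.dumps("processed")}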

How it works


The following AWS Lambda example with block diagram explains the working of AWS Lambda
in a few easy steps:

Fig10:AWS Lambda
6.5-AWS Elastic Beanstalk:
AWS Elastic Beanstalk is an AWS-managed service for web applications. Elastic Beanstalk is
a pre-configured EC2 server that can directly take up your application code and environment
configurations and use it to automatically provision and deploy the required resources within
AWS to run the web application. Unlike EC2, which is Infrastructure as a Service (IaaS), Elastic Beanstalk is a Platform as a Service (PaaS): it allows users to directly use a pre-configured server for their application. Of course, you can deploy applications without ever using Elastic Beanstalk, but that would mean having to choose the appropriate service from the vast
array of services offered by AWS, manually provisioning these AWS resources, and stitching
them up together to form a complete web application.
Elastic Beanstalk abstracts the underlying configuration work and allows you as a user to focus
on more pressing matters.
This raises a concern: if Elastic Beanstalk configures most of the resources itself and abstracts the underlying details, can developers still change the configuration if needed? The answer is yes. Elastic Beanstalk is designed to make application deployment simpler, but at no point does it restrict developers from changing any configuration.

How Elastic Beanstalk Works
Elastic Beanstalk is a fully managed service provided by AWS that makes it easy to deploy and
manage applications in the cloud without worrying about the underlying infrastructure. First,
create an application and select an environment, configure the environment, and deploy the
application.

CHAPTER-7 STORAGE
7.1: AWS EBS

Fig11:AWS EBS
AWS EBS (Amazon Elastic Block Store) is a scalable block storage service provided by Amazon Web Services (AWS). It offers persistent block-level storage volumes that can be attached to Amazon EC2 instances, providing durable storage for your applications and data. Here are the key features and characteristics of AWS Elastic Block Store:

Persistent Storage: EBS volumes provide durable and persistent block storage that persists
independently from the lifecycle of the EC2 instance. This means that data stored in EBS
volumes remains intact even if the associated EC2 instance is stopped, terminated, or fails.
Multiple Volume Types: AWS offers different EBS volume types to cater to various use cases
and performance requirements:
General Purpose SSD (gp2): Provides a balance of price and performance for most workloads.
Provisioned IOPS SSD (io1): Offers high-performance for I/O-intensive workloads with
consistent and low-latency performance.
Cold HDD (sc1): Optimized for low-cost, infrequently accessed workloads with throughput-oriented performance.
Throughput Optimized HDD (st1): Designed for large, frequently accessed workloads that
require sustained throughput.
Magnetic (standard): Legacy storage option with cost-effective magnetic disks suitable for
workloads with light I/O requirements.
EBS Snapshots: You can create point-in-time snapshots of EBS volumes, which are stored in
Amazon S3. These snapshots serve as backups and can be used to restore volumes or create new
volumes with the same data.
EBS Encryption: EBS volumes support encryption using AWS Key Management Service
(KMS) keys. Encryption provides an additional layer of data security, especially for sensitive
workloads.

EBS Volume Resizing: You can dynamically resize EBS volumes without disrupting the
associated EC2 instance. This allows you to adjust storage capacity as per your evolving
application needs.
EBS Multi-Attach: Some EBS volume types, like io1 and io2, support multi-attach. This
enables attaching a single EBS volume to multiple EC2 instances in the same Availability Zone,
allowing for shared storage for clustered or high-availability applications.
EBS-Backed EC2 Instances: When launching EC2 instances, you have the option to choose
between EBS-backed instances or instance-store-backed instances. EBS-backed instances use
EBS volumes as the root device, providing persistent storage for your instance.
Snapshots and Data Migration: EBS snapshots can be used for data migration, data backup,
and disaster recovery scenarios. You can copy snapshots across regions and create new volumes
from them.
EBS Performance Metrics: You can monitor the performance of your EBS volumes using
Amazon CloudWatch, which provides metrics such as read/write throughput, latency, and IOPS.
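The following is a small illustrative boto3 sketch (the volume ID is a placeholder) of the snapshot feature described above: create a point-in-time snapshot and wait for it to complete.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",          # placeholder volume ID
    Description="Nightly backup before patching",
)
print("Snapshot started:", snapshot["SnapshotId"])

# Wait until the snapshot is complete before relying on it.
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])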

AWS S3 :
Amazon S3 (Simple Storage Service) is a scalable object storage service provided by Amazon Web Services (AWS). Instead of block volumes, S3 stores data as objects inside buckets and is designed for very high durability, making it suitable for backups, data lakes, static website content, and application assets. Here are the key features and characteristics of Amazon S3:
Object Storage: Data is stored as objects (the data itself plus metadata) within buckets, and each object is addressed by a unique key inside its bucket.
Virtually Unlimited Capacity: You do not provision storage in advance; S3 scales automatically as you add objects, and individual objects can be up to 5 TB in size.
Storage Classes: S3 offers multiple storage classes, such as S3 Standard, S3 Standard-IA (Infrequent Access), S3 One Zone-IA, S3 Intelligent-Tiering, and the S3 Glacier classes for archival data, so you can balance cost against access requirements.
Durability and Availability: Objects are stored redundantly across multiple facilities within a region, protecting data against hardware failures.
Security: S3 supports encryption at rest and in transit, bucket policies, access control lists, and integration with AWS IAM for fine-grained access control.
Versioning and Lifecycle Management: Versioning protects against accidental deletion or overwrites, and lifecycle rules can automatically transition objects to lower-cost storage classes or expire them.
Integration with Other Services: S3 integrates with many AWS services, including EC2, Lambda, CloudFront, and EBS (which stores its snapshots in S3), and it can also host static websites.
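A small sketch of basic S3 operations follows, assuming a bucket you own named "my-example-bucket" (the bucket name and file names are placeholders).

import boto3

s3 = boto3.client("s3")

# Upload: the object key acts like a path inside the bucket.
s3.upload_file("report.pdf", "my-example-bucket", "reports/2023/report.pdf")

# Download the same object to a new local file.
s3.download_file("my-example-bucket", "reports/2023/report.pdf", "copy.pdf")

# List the keys under a prefix.
listing = s3.list_objects_v2(Bucket="my-example-bucket", Prefix="reports/")
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"])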

AWS EFS :
AWS EFS (Amazon Elastic File System) is a fully managed, scalable file storage service
provided by Amazon Web Services (AWS). It is designed to provide shared file storage across
multiple EC2 instances, making it ideal for applications that require shared access to files and
data. Here are the key features and characteristics of Amazon EFS:
Shared File System: Amazon EFS allows you to create a scalable and shared file system that
can be mounted simultaneously by multiple EC2 instances. This enables multiple instances to
read and write data to the file system concurrently, making it suitable for applications with shared
workloads.
Elastic and Scalable: EFS automatically scales its file systems as data storage needs grow or
shrink. It can accommodate an almost unlimited number of files and data, and there is no need
to pre-provision storage capacity.
Data Durability and Availability: EFS is designed for high durability and availability. It
automatically replicates data across multiple Availability Zones (AZs) within a region, ensuring
that data is protected against hardware failures and provides 99.99% availability.

Performance Modes: EFS offers two performance modes to cater to different application
requirements:
General Purpose Mode (default): Suitable for most workloads, providing a balance of low
latency and high throughput.

Max I/O Mode: Designed for applications with higher levels of aggregate throughput and higher
performance at the cost of slightly higher latency.
EFS Throughput Modes: EFS supports two throughput modes to optimize performance for
different types of workloads:
Bursting Throughput Mode: Suitable for workloads that occasionally require higher
throughput, using a credit system.
Provisioned Throughput Mode: Designed for applications that require consistent, high levels
of throughput.
Integration with EC2 Instances: EFS is natively integrated with Amazon EC2 instances. You
can mount EFS file systems on EC2 instances using standard file system interfaces such as NFS
(Network File System).
Encryption: EFS provides encryption at rest for data stored within the file system. It uses AWS
Key Management Service (KMS) to manage the encryption keys.
Data Backup and Restore: EFS supports automated backups through EFS backups and AWS
Backup. This enables you to create point-in-time backups and restore data from previous
snapshots.
Data Management Features: EFS allows you to set lifecycle policies to automatically move
files to lower-cost storage classes like Amazon S3 for infrequently accessed data.
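The following is a hedged boto3 sketch (subnet and security group IDs are placeholders) of creating an encrypted EFS file system and one mount target; EC2 instances would then mount it over NFS.

import boto3

efs = boto3.client("efs", region_name="us-east-1")

fs = efs.create_file_system(
    CreationToken="shared-app-data",      # idempotency token
    PerformanceMode="generalPurpose",
    Encrypted=True,
)

efs.create_mount_target(
    FileSystemId=fs["FileSystemId"],
    SubnetId="subnet-0123456789abcdef0",          # placeholder subnet
    SecurityGroups=["sg-0123456789abcdef0"],      # placeholder security group
)
print("File system:", fs["FileSystemId"])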

7.2-Amazon S3 Glacier
Amazon S3 Glacier is a low-cost storage service provided by Amazon Web Services (AWS) for
data archiving and long-term backup. It is designed to store data that is infrequently accessed
and doesn't require real-time retrieval. Glacier provides a secure, durable, and scalable solution
for long-term storage of data, making it ideal for compliance, regulatory, and archival
requirements. Here are the key features and characteristics of Amazon S3 Glacier:
Archival Storage: Glacier is primarily used for data archiving rather than frequently accessed
data storage. It is an excellent option for data that needs to be retained for long periods without
the need for real-time retrieval.
Durability and Availability: Similar to Amazon S3, Glacier ensures data durability by
replicating data across multiple facilities within a region. It provides high availability to protect
your data against hardware failures.
Vaults and Archives: In Glacier, data is organized into "vaults." A vault is a container for storing
archives, which are individual objects stored in Glacier. Each archive can be up to 40 terabytes
in size.
Data Retrieval Options: Glacier offers three retrieval options, each with different costs and
retrieval times:
Expedited Retrieval: Provides real-time access to your data but comes with higher costs.

Standard Retrieval: The default option, which provides data retrieval within a few hours.

Bulk Retrieval: Designed for large data retrieval, typically taking 5-12 hours.
Data Lifecycle Policies: You can create data lifecycle policies to automatically transition data
from S3 to Glacier based on specific criteria, such as data age or access frequency. This helps
optimize storage costs by moving infrequently accessed data to Glacier.
Security and Encryption: Glacier provides data security through SSL (Secure Sockets Layer)
for data in transit and server-side encryption at rest. You can also use AWS Key Management
Service (KMS) to manage encryption keys for added security.
Cost-Effective Storage: Glacier offers a lower cost per gigabyte compared to Amazon S3,
making it an economical solution for long-term data retention.
Data Retrieval Costs: While storing data in Glacier is cost-effective, it's essential to consider
retrieval costs, especially for Expedited Retrieval and Standard Retrieval, which may have higher
associated costs compared to Bulk Retrieval.
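As an illustration of the lifecycle policies mentioned above, here is a sketch (bucket name and prefix are placeholders) that moves objects under a "logs/" prefix to the Glacier storage class after 90 days.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
print("Lifecycle rule applied.")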

Fig12: Amazon S3 Glacier

CHAPTER-8 DATABASES
8.1-Amazon RDS:
Amazon RDS (Relational Database Service) is a managed database service provided by Amazon
Web Services (AWS). It simplifies the process of setting up, operating, and scaling relational
databases in the cloud.

Fig13:Amazon RDS
RDS supports various database engines and takes care of routine database tasks, allowing you
to focus on your applications and data. Here are the key features and characteristics of Amazon
RDS:
Managed Database Service: With RDS, AWS takes care of database management tasks such as
database setup, patching, backups, and automatic failure detection. This allows you to offload
administrative burdens and focus on your application development.

Multiple Database Engines: RDS supports several popular database engines, including:
Amazon Aurora (MySQL and PostgreSQL-compatible): A high-performance, fully managed
database engine designed for the cloud.

MySQL: A widely used open-source relational database.

PostgreSQL: An open-source, object-relational database.

Oracle: A commercial database provided by Oracle Corporation.

Microsoft SQL Server: A commercial database provided by Microsoft.


Easy Scalability: RDS allows you to scale your database instance vertically (by increasing its
compute and memory resources) or horizontally (by creating Read Replicas for read-heavy
workloads).
Automated Backups and Point-in-Time Recovery: RDS automatically creates backups of
your database, allowing you to restore to any specific point in time within the retention period.
High Availability: Amazon RDS offers high availability through Multi-AZ (Availability Zone)
deployments. In a Multi-AZ configuration, a synchronous standby replica is created in a different
Availability Zone, providing automatic failover in case of a primary instance failure.

Security Features: RDS provides security features such as encryption at rest and in transit, IAM
database authentication, and network isolation within a VPC (Virtual Private Cloud).
Monitoring and Metrics: Amazon RDS integrates with Amazon CloudWatch, allowing you to
monitor database performance metrics and set up alarms to get notified about critical events.
Read Replicas: For read-intensive workloads, you can create Read Replicas of your primary
database to offload read traffic and improve performance.
Database Engine Upgrades: RDS makes it easy to upgrade your database engine to the latest
version with minimal downtime.
Parameter Groups: RDS allows you to customize database engine settings using parameter
groups, enabling you to optimize your database's behavior and performance.
Amazon RDS is an essential service for deploying and managing relational databases in the AWS
cloud. Whether you need a fully managed database solution, high availability, automatic
backups, or scalability, RDS provides a reliable and convenient way to set up and maintain your
databases, supporting a wide range of applications and use cases.
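The snippet below is an illustrative boto3 sketch (identifier, credentials and sizes are placeholders, not recommendations) of provisioning a small Multi-AZ MySQL instance on Amazon RDS.

import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="demo-mysql",
    Engine="mysql",
    DBInstanceClass="db.t3.micro",
    AllocatedStorage=20,                        # GiB
    MasterUsername="admin",
    MasterUserPassword="change-me-please",      # store secrets securely in practice
    MultiAZ=True,                               # standby replica in another AZ
    BackupRetentionPeriod=7,                    # keep automated backups for 7 days
)
print("RDS instance creation started.")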

8.2-Amazon DynamoDB:
Amazon DynamoDB is a fully managed, NoSQL database service provided by Amazon Web
Services (AWS). It is designed to provide fast and scalable performance for both read and write
operations while maintaining low-latency responses.

Fig14:Amazon DynamoDB
DynamoDB is suitable for a wide range of applications, from small-scale web
applications to large-scale enterprise solutions.

Here are the key features and characteristics of Amazon DynamoDB:


Fully Managed: With DynamoDB, AWS takes care of the database management tasks, such as
hardware provisioning, setup, configuration, scaling, backups, and maintenance. This allows
developers to focus on building applications without worrying about database administration.
NoSQL Database: DynamoDB is a NoSQL database, which means it provides flexible schema
design and can handle unstructured or semi-structured data. It does not require a fixed schema
like traditional relational databases.

Scalable Performance: DynamoDB is designed for high scalability. It automatically scales both
read and write capacity to handle varying workloads. This makes it suitable for applications with
unpredictable or rapidly changing traffic patterns.
Low-Latency Response Times: DynamoDB offers single-digit millisecond latency for both
read and write operations, making it well-suited for applications that require real-time access to
data.
Data Replication and Availability: DynamoDB replicates data across multiple Availability Zones (AZs) within a region to ensure high availability and fault tolerance. It offers a choice between eventually consistent and strongly consistent reads.
Data Encryption: DynamoDB provides encryption at rest using AWS Key Management Service
(KMS) to enhance data security.
Flexible Data Model: DynamoDB allows you to define primary keys, which can be simple
primary keys or composite primary keys to support various access patterns. You can also create
secondary indexes for efficient querying on non-key attributes.
Provisioned Throughput: In DynamoDB, you provision the read and write capacity units based
on your application's requirements. You can choose to auto-scale these capacity units to handle
traffic spikes automatically.
DynamoDB Streams: DynamoDB Streams captures changes to the data in real-time and allows
you to process these changes using AWS Lambda or other services. This feature is useful for
building event-driven applications or performing data analysis.
Integration with AWS Ecosystem: DynamoDB integrates seamlessly with other AWS services,
such as AWS Lambda, Amazon API Gateway, Amazon S3, Amazon EMR, and more, enabling
you to build sophisticated serverless architectures.
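A minimal sketch follows, assuming a table named "Users" with a partition key "user_id" already exists (the table name and attributes are hypothetical).

import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("Users")

# Write an item; DynamoDB is schemaless beyond the key attributes.
table.put_item(Item={"user_id": "u-100", "name": "Lakshmi", "plan": "free"})

# Read it back with a strongly consistent read.
response = table.get_item(Key={"user_id": "u-100"}, ConsistentRead=True)
print(response.get("Item"))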

8.3-Amazon Redshift :
Amazon Redshift is a fully managed, petabyte-scale data warehousing service provided by
Amazon Web Services (AWS). It is designed for analyzing large volumes of data with high
performance and cost-efficiency.

Fig15:Amazon Redshift
Redshift is based on a columnar storage architecture and is optimized for online analytical
processing (OLAP) workloads. Here are the key features and characteristics of Amazon Redshift:

Columnar Storage: Redshift stores data in columns rather than rows, which allows for high
compression rates and improved query performance for analytical workloads. This columnar
storage reduces I/O and improves query execution times.
Scalability: Amazon Redshift is highly scalable and can easily scale up or down based on your
data volume and performance requirements. You can add or remove nodes to handle changing
workloads.
Fully Managed: Redshift is a fully managed service, meaning AWS takes care of the underlying
infrastructure, backups, patching, and other administrative tasks. This allows you to focus on
analyzing your data without worrying about managing the database.
Massive Parallel Processing (MPP): Redshift distributes data and query execution across
multiple nodes, enabling parallel processing for faster query performance. This allows Redshift
to handle large datasets and complex queries efficiently.
Column Compression and Encoding: Redshift uses various compression and encoding
techniques to reduce storage costs and improve query performance.
Integration with Other AWS Services: Redshift seamlessly integrates with other AWS services,
such as Amazon S3 for data loading, AWS Data Pipeline for data ETL (Extract, Transform,
Load), and AWS Glue for data cataloging and transformation.
Security and Encryption: Redshift supports various security features, including encryption at
rest and in transit. It also integrates with AWS Identity and Access Management (IAM) for access
control.

8.4-Amazon Aurora :
Amazon Aurora is a fully managed relational database service provided by Amazon Web
Services (AWS). It is designed to be compatible with MySQL and PostgreSQL, offering the
performance and availability of commercial-grade databases with the cost-effectiveness and ease
of management of open-source databases.

Fig16:Amazon Aurora

Amazon Aurora is a popular choice for applications that require high performance,
scalability, and durability. Here are the key features and characteristics of Amazon Aurora:
Compatibility: Amazon Aurora is compatible with MySQL and PostgreSQL, which means you
can use existing MySQL or PostgreSQL applications, drivers, and tools with Aurora without any
code changes.
Performance: Aurora is designed for high performance and can deliver up to five times the
throughput of standard MySQL and up to three times the throughput of standard PostgreSQL.
Scalability: Aurora can automatically scale both compute and storage resources to handle
increasing workloads. It can also create up to 15 read replicas, providing high read scalability
for read-heavy applications.
High Availability: Aurora offers high availability through Multi-AZ deployments. In a Multi-AZ configuration, Aurora automatically replicates data to a standby instance in a different Availability Zone, providing automatic failover in case of a primary instance failure.
Storage Replication: Aurora replicates data across multiple Availability Zones within a region to ensure data durability and fault tolerance. It uses a distributed, fault-tolerant storage system.
Backup and Restore: Aurora provides automated backups with a 1-day retention period, and you can extend the retention period to up to 35 days. Point-in-time recovery allows you to restore your database to any specific second within the retention period.
Performance Insights: Amazon Aurora Performance Insights allows you to monitor the
performance of your Aurora instances and helps you analyze database load and query
performance.
Serverless Aurora: Aurora Serverless is a mode that automatically adjusts the database's
compute capacity based on actual usage. It can be a cost-effective option for applications with
variable workloads.
Data Security: Aurora supports encryption at rest and in transit. You can use AWS Key
Management Service (KMS) to manage encryption keys.

Chapter-9

Cloud Architecture

9.1 :Introduction to Cloud Architecture:

Cloud architecture is the way technology components combine to build a cloud, in which
resources are pooled through virtualization technology and shared across a network. The
components of a cloud architecture include:

• A front-end platform (the client or device used to access the cloud)


• A back-end platform (servers and storage)
• A cloud-based delivery model
• A network

9.2-AWS Well-Architected Framework Design Principles:

These design principles direct where and how the Well-Architected program should be
implemented.

• Only use as much capacity as your workload necessitates


• Before deploying workloads and apps to production, test them in a large-scale test
environment
• Create a flexible architecture
• Make use of automation to make testing easier
• Make a data-driven structure
• Conduct live event simulations to aid with infrastructure improvement

Five Pillars of AWS Well-Architected Framework

The five AWS pillars are operational excellence, security, reliability, performance efficiency, and cost
optimization. Now, let us take a deeper look at them.

Fig 17-AWS pillars

9.3 :Operational Excellence

Operational excellence brings business value to the process by supporting the development and
efficient execution of workloads.

Design Principles:

• Perform operations as code: Define and update the entire workload as code, execute operational procedures as code, and automate their execution by triggering them in response to events. Treating operations as code reduces human error, provides consistent responses to events, and allows procedures to be reused for similar architectures.

• Make frequent, small, reversible changes: Keep components updated by designing workloads appropriately. Apply changes gradually and in a reversible way so that they can be rolled back if they fail.

• Refine operational procedures frequently: Improve operations continuously and update the processes along with the workload. Periodically review and validate the ongoing procedures and keep the teams updated.

• Anticipate failure: Develop strategies to anticipate possible breakdowns and address them accordingly. Evaluate the procedures repeatedly so that you understand their impact and can prevent future losses.

• Learn from operational failures: When procedures fail, recover from the failure and improve the existing procedures so the same problem does not recur.

9.4 :Security
Security helps you to secure your data and assets by using cloud technologies for protection.

Design Principles:

Strengthen the identity foundation: Apply the principle of least privilege when interacting with your AWS resources, and enforce separation of duties with appropriate authorization. The goal is to eliminate long-term static credentials through centralized identity management.

Enable traceability: Monitor, alert on, and audit actions and changes to your environment in real time. Evaluate logs and automate metric collection.

Apply security at all layers: Use multiple security controls to implement a defense-in-depth strategy. Apply this approach to all layers, such as the Amazon VPC, the edge of the network, load balancers, operating systems, and compute services.

Automate security best practices: Automated, software-based security mechanisms increase your ability to scale securely and efficiently. Create secure architectures whose controls are defined and managed as code in version-controlled templates.

Protect data in transit and at rest: Classify your data into sensitivity levels and apply mechanisms such as encryption, tokenization, and access control where appropriate.

Keep people away from data: Use tools and procedures to restrict direct or manual access to data. This reduces the risk of misuse, alteration, or human error when handling sensitive data.

Prepare for security events: Be ready for incidents by establishing incident management and investigation policies and procedures aligned with your organization's needs. Run incident response simulations and use automated tooling to improve your detection, investigation, and recovery times.

9.5 :Reliability
Reliability refers to a workload's capacity to perform its intended function correctly and consistently when it is expected to. This includes the ability to operate and test the workload through its entire lifecycle. This pillar provides in-depth, best-practice recommendations for deploying dependable workloads on AWS.

Design Principles:

• Automatically recover from failure: Trigger automated recovery of the workload once a key performance indicator (KPI) crosses a threshold.
o KPIs should measure business value rather than technical aspects of the operation of the service. This enables automatic failure notification and tracking, and automated recovery processes that work around or repair the failure.
o With more advanced automation, it is possible to anticipate and remediate failures before they occur.

• Test recovery procedures: In an on-premises environment, testing is usually done to prove that the workload works in a specific situation. In the cloud, you can test how your workload fails and validate your recovery procedures. You can use automation to simulate different failures or to recreate scenarios that previously led to failures. This approach exposes failure paths that can be tested and fixed before a real failure occurs, lowering risk.

• Scale horizontally to increase aggregate workload availability: To decrease the impact of a single failure on the total workload, replace one large resource with several small ones and distribute requests among them so that they do not share a common point of failure.

• Stop guessing capacity: Resource saturation occurs when the demands placed on a workload exceed its capacity; it is a common cause of failure in on-premises workloads and should be avoided. In the cloud, you can monitor demand and workload utilization and automate the addition and removal of resources to keep supply at the appropriate level without over- or under-provisioning.

• Manage change through automation: Make changes to your infrastructure using automation, so that the changes themselves can be tracked, monitored, and assessed.

Performance Efficiency
Performance efficiency involves the capacity to employ computer resources efficiently to fulfill
system needs and to maintain the efficiency as demand changes and technology advances.
Design Principles:
• Employ the latest technologies: Delegate complicated tasks to your cloud vendor to make technically advanced deployments easier for your team. Consume technology as a service rather than requiring your IT personnel to learn how to host and administer it. NoSQL databases, media transcoding, and machine learning are services that your team can consume in the cloud, allowing them to concentrate on product development rather than resource allocation and administration.

• Go global in minutes: By deploying your workload in multiple AWS Regions around the world, you can offer your clients lower latency and a better experience at a lower cost.

• Use serverless architectures: For typical computational operations, serverless architectures remove the need to run and manage physical servers. For instance, serverless storage services can serve static web pages without web servers, while event services can host code. Because managed services operate at cloud scale, this relieves the operational strain of managing physical servers and can also lower transaction costs.

• Experiment more often: With virtualized and automatable resources, you can easily compare alternative types of instances, storage, or settings.

The best practices of performance efficiency are selection, review, monitoring, and tradeoffs.

9.6 :Cost Optimization

The capacity to manage systems that offer business value at the lowest cost is part of cost

optimization.

Design Principles:

Deploy Cloud Financial Management: Investing in Cloud Financial Management and cost optimization will help you achieve financial success and increase the company's value in the cloud.
Your company must devote time and resources to building capability in this new domain of technology and usage management.
To become a cost-effective company, you must create opportunities through knowledge
building, programmes, resources, and workflows, similar to your security or operational
excellence capabilities.

Employ a consumption model: Pay only for the computing resources that you need, and increase or decrease usage based on business requirements rather than relying on extensive forecasts. For example, development and test environments are typically used for only eight hours a day during the working week; you can save up to 75 percent on these resources by turning them off when they are not in use.

Measure total efficiency: Calculate the workload business output and the expenses
involved with completing it. Use this metric to see how much money you save by boosting
results and cutting costs.

Verify expenses: The cloud makes it simpler to precisely identify system use and
expenditures, allowing for clear IT cost attribution to individual task owners. This enables the
professionals to track their return on investment (ROI) and manage their resources while
lowering expenses.

Fig 18: Cloud Architecture
Chapter 10 Auto Scaling and Monitoring

10.1 :Introduction:

Autoscaling is a cloud computing feature that enables organizations to scale cloud services such as server capacities or virtual machines up or down automatically, based on defined conditions such as traffic or utilization levels. Cloud computing providers, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), offer autoscaling tools.
autoscaling tools.
Core autoscaling features also allow lower cost, reliable performance by seamlessly increasing
and decreasing new instances as demand spikes and drops. As such, autoscaling provides
consistency despite the dynamic and, at times, unpredictable demand for applications.
The overall benefit of autoscaling is that it eliminates the need to respond manually, in real time, to traffic spikes that merit new resources and instances, by automatically changing the active number of servers. Each of these servers requires configuration, monitoring, and decommissioning, which is the core of what autoscaling automates.
For instance, when such a spike is driven by a distributed denial of service (DDoS) attack, it
can be difficult to recognize. More efficient monitoring of autoscaling metrics and better
autoscaling policies can sometimes help a system respond quickly to this issue. Similarly, an
auto scaling database automatically scales capacity up or down, starts up, or shuts down based
on the needs of an application.

10.2-Elastic Load Balancing:

• This service distributes application traffic across resources.
• The load balancer is a single point of contact for incoming web traffic.
• A single point of contact means that traffic hits the load balancer first, which spreads the load between the resources.
• The balancer accepts requests and directs them to the appropriate instances.
• It ensures that no single resource gets overloaded and that the traffic is spread out.
• AWS EC2 and Elastic Load Balancing are two different services that work well together.
• AWS ELB is built to support increased traffic without increasing the hourly cost.
• AWS ELB scales automatically.

10.3-AWS EC2 Auto Scaling:


• Servers can get more requests than they can handle.
• Too many requests can cause timeouts and outages.
• AWS EC2 Auto Scaling allows you to add or remove EC2 instances automatically.
• It matches capacity to demand. There are two approaches:
o Dynamic scaling: responds to changing demand.
o Predictive scaling: schedules the number of instances based on predicted demand.
• Dynamic and predictive scaling can be combined to scale faster.
• EC2 Auto Scaling can be added as a buffer on top of your instances.
• It can add new instances to the application when necessary and terminate them when they are no longer needed.
• You can set up a group of instances.
• Auto Scaling groups allow you to have a dynamic environment.
A minimal scaling-policy sketch in code follows this list.
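The sketch below is a hedged example (the group name is a placeholder) of attaching a target tracking policy to an existing Auto Scaling group so instances are added or removed to hold average CPU utilization near 50%.

import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-tier-asg",          # placeholder group name
    PolicyName="keep-cpu-near-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
print("Target tracking policy attached.")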

Fig19-AWS EC2 Auto Scaling

Chapter-11 Introduction to Machine learning

11.1 :What is Machine Learning?


Machine Learning is a branch of the broader field of artificial intelligence that makes use of statistical models to develop predictions. It is often described as a form of predictive modelling or predictive analytics and, traditionally, has been defined as the ability of a computer to learn without explicitly being programmed to do so.

Artificial Intelligence: Artificial intelligence is the simulation of human intelligence processes by machines, especially computer systems. AI is important because it can give enterprises insights into their operations that they may not have been aware of previously and because, in some cases, AI can perform tasks better than humans, particularly repetitive, detail-oriented tasks such as analyzing large numbers of legal documents to ensure relevant fields are filled in properly.

Fig 20-Machine Learning

11.2 : Business problems solved with machine learning


There are different ways to train machine learning algorithms, each with its own advantages and disadvantages. There are also some types of machine learning algorithms that are used in very specific use cases, but three main methods are used today.

Supervised Learning: Supervised learning is one of the most basic types of machine learning. In this type, the machine learning algorithm is trained on labelled data. Even though the data needs to be labelled accurately for this method to work, supervised learning is extremely powerful when used in the right circumstances. In supervised learning, the ML algorithm is given a small training dataset to work with. This training dataset is a smaller part of the bigger dataset and serves to give the algorithm a basic idea of the problem, solution, and data points to be dealt with. The training dataset is also very similar to the final dataset in its characteristics and provides the algorithm with the labelled parameters required for the problem. There are two primary applications of supervised machine learning: classification problems and regression problems. Classification is the process of assigning an input to one of a set of discrete categories, so classification tasks produce classes or categories as output.

Unsupervised Learning: Unsupervised machine learning holds the advantage of being able to work with unlabelled data. This means that human labour is not required to make the dataset machine-readable, allowing much larger datasets to be worked on by the program. In supervised learning, the labels allow the algorithm to find the exact nature of the relationship between any two data points; in unsupervised learning, instead of a defined and fixed problem statement, the algorithm can adapt to the data by dynamically changing hidden structures. This offers more post-deployment development than supervised learning algorithms. Some of the applications are product segmentation, customer segmentation, similarity detection, recommendation systems, labelling unlabelled datasets, etc.

Reinforcement Learning: Reinforcement learning takes direct inspiration from how human beings learn from data in their lives. It features an algorithm that improves upon itself and learns from new situations using a trial-and-error method. Favourable outputs are encouraged or 'reinforced', and non-favourable outputs are discouraged or 'punished'. In typical reinforcement learning use cases, such as finding the shortest route between two points on a map, the solution is not an absolute value. Instead, it takes on a score of effectiveness, expressed as a percentage value. The higher this percentage value, the more reward is given to the algorithm. Thus, the program is trained to give the best possible solution for the best possible reward. Some of the applications are autonomous cars, image processing, robotics, NLP, marketing, gaming, etc.
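As a concrete, purely illustrative example of supervised learning, the sketch below trains a classifier on a labelled scikit-learn dataset and evaluates it on held-out data.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)   # a simple classification model
model.fit(X_train, y_train)                 # learn from labelled training data

predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))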

11.3 :Machine learning process


The purpose of a machine learning pipeline is to outline the machine learning model
process, a series of steps which take a model from initial development to deployment and
beyond. The benefits of a machine learning pipeline include:
=>Mapping a complex process which includes input from different specialisms, providing a
holistic look at the whole sequence of steps.
=>Focusing on specific steps in the sequence in isolation, allowing the optimisation or
automation of individual stages.
=>The first step in transforming a manual process of machine learning development to an
automated sequence.
Common sections of the machine learning pipeline include the following (a minimal pipeline sketch in code follows the list):

* Data collection and cleaning


*Data validation
* Training of the model
* Evaluation and validation of the model
* Optimisation and retraining
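The sketch below illustrates the pipeline idea with scikit-learn: a data preparation step and a training step chained into one reusable object.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),                   # data preparation step
    ("model", LogisticRegression(max_iter=5000)),  # model training step
])

pipeline.fit(X_train, y_train)                     # train the whole sequence
print("Held-out accuracy:", pipeline.score(X_test, y_test))  # evaluation step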

11.4 Machine Learning tools overview

Python tools and libraries:


1) Jupyter Notebook
2) Jupyter lab
3) Pandas
4) Matplotlib
5) Seaborn
6) NumPy

7) Scikit-learn
Machine Learning frameworks provide tools and code libraries:
• Customized scripting
• Integration with AWS services
• Community of developers

11.5 Machine Learning challenges

Challenges:
Data:
• Poor quality
• Non-representative
• Insufficient
• Overfitting and underfitting
Business:
• Complexity in formulating questions
• Explaining models to the business
• Cost of building systems
Users:
• Lack of data science expertise
• Cost of staffing with data scientists
• Lack of management support
Technology:
• Data privacy issues
• Tool selection can be complicated
• Integration with other systems

Using existing models and services:
-> Amazon ML managed services
-> No ML experience needed
-> Use existing trained and tuned models
-> Enhance with domain-specific instances
-> Over 250 ML model packages and algorithms
-> Over 14 industry segments

Fig21:Types of Machine Learning

Chapter 12 Implementation of Machine Learning

12.1 Formulating Machine Learning Problems


The process of machine learning consists of several steps. First, data is collected; then a model is selected or created; finally, the model is trained on the collected data and then applied to new data. This process is often referred to as the machine learning pipeline. Problem formulation is
the second step in this pipeline and it consists of selecting or creating a suitable model for the
task and determining how to represent the collected data so that it can be used by the selected
model. Some common examples of problem formulations in machine learning are:
>> Classification: Given an input data point, predict its category label.
>> Regression: Given an input data point, predict a continuous output value.
>> Prediction: Given an input sequence, predict the next value in the sequence.
>> Anomaly detection: Given an input data point, decide whether it is normal.
>> Recommendation: Given information about users and items, recommend items to users.
>> Optimization: Given a set of constraints and objectives, find the best solution.
Implications of AI and Machine Learning
The Workforce: The rise of AI and machine learning will change the skill set and work
opportunities for people across industries.
Data Handling : The ability to extract and analyze large amounts of data will enable
unprecedented insights and capabilities.
The Human-Machine Interface: As AI and machines become more like humans, we will need
to consider how to integrate this technology in a way that is intuitive and seamless.
Hard Facts about AI implementation in business
• More than 9 out of 10 (91%) top businesses report having an ongoing investment in AI, as noted in NewVantage 2022 research.
• According to a McKinsey survey, high-performing companies attribute most of their profits to their integration and use of AI.
• A Forbes report suggests that 2022 could see the "collective shift away from point solutions toward holistic platforms that offer a suite of business solutions."

How Can AI Adoption Help Businesses?


• Automating business processes (administrative and financial activities that would otherwise need to be done manually)
• Generating insights with data analysis (using algorithms to find patterns in vast volumes of data
for further predictions)
• Engagement with customers and employees (chatbots, intelligent agents)

AI Implementation In Business: Challenges
The main stumbling block in adopting AI for business is that organizations trying to adopt AI solutions are often complex, making integration and implementation challenging. An approach recommended by McKinsey consultants Tim Fontaine, Brian McCarthy, and Tamim Saleh is to first consider using AI to reimagine just one crucial business process or function. They advocate carefully rethinking how that one key business function can benefit from AI rather than attempting to implement AI solutions across the company.
Businesses need to rethink their business models to benefit fully from AI. You can't just plug AI into an existing process and expect positive results or valuable insights. Cybersecurity is still one of the most challenging areas of AI implementation: organizations do not have one cyber standard covering everything under one umbrella. Another point worthy of note is that AI systems often become targets for hackers; the more complex your AI systems are, the more potential threats to the system.

12.2 Collecting and Securing data

Data collection:
Collecting data for training the ML model is the basic step in the machine learning
pipeline. The predictions made by ML systems can only be as good as the data on which they
have been trained. Following are some of the problems that can arise in data collection:
• Inaccurate data. The collected data could be unrelated to the problem statement.
• Missing data. Sub-data could be missing; that could take the form of empty values in columns or missing images for some class of prediction.
• Data imbalance. Some classes or categories in the data may have a disproportionately
high or low number of corresponding samples. As a result, they risk being
underrepresented in the model.
• Data bias. Depending on how the data, subjects and labels themselves are chosen, the
model could propagate inherent biases on gender, politics, age or region, for example.
Data bias is difficult to detect and remove.

12.2.1 : Extracting, transforming, and loading data

ETL stands for Extract, Transform and Load. It is a generic process in which data is first acquired, then changed or processed, and finally loaded into a data warehouse, database, or other files such as PDF or Excel. You can extract data from any data source, such as files, an RDBMS/NoSQL database, websites, or real-time user activity, transform the acquired data, and then load the transformed data into a data warehouse for business uses such as reporting or analytics.
ETL is a three-step process:
>> Extracting data from single or multiple data sources.
>> Transforming data as per business logic. Transformation is itself a two-step process: data cleansing and data manipulation.
>> Loading the transformed data into the target data store or data warehouse.

Key reasons for using ETL are:
>> Visualizing your entire data flow pipeline, which helps the business take critical business decisions.
>> Transactional databases cannot answer complex business questions that can be answered by ETL.
>> ETL provides a method of moving the data from various sources into a data warehouse.
A minimal pandas sketch of this flow follows.
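The sketch below is a toy ETL flow with pandas and SQLite; the file names and column names are hypothetical.

import pandas as pd
import sqlite3

# Extract: read raw data from a source file.
orders = pd.read_csv("raw_orders.csv")

# Transform: cleanse and manipulate as per business logic.
orders = orders.drop_duplicates()
orders["order_date"] = pd.to_datetime(orders["order_date"], errors="coerce")
orders = orders.dropna(subset=["order_date"])
orders["total"] = orders["quantity"] * orders["unit_price"]

# Load: write the transformed data into a warehouse-style table.
with sqlite3.connect("warehouse.db") as conn:
    orders.to_sql("fact_orders", conn, if_exists="replace", index=False)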

12.2.2 : Securing Data


Machine learning in security constantly learns by analyzing data to find patterns, allowing us to better detect malware in encrypted traffic, identify insider threats, predict where "bad neighbourhoods" are online to keep people safe while browsing, and protect data in the cloud by uncovering suspicious user behaviour. We can secure the data by:
>> Finding network threats
>> Protecting cloud data
>> Encrypting data
>> Evading hacker attacks
>> Facilitating endpoint security

12.3 : Evaluating data


12.3.1 : Describing your data
Descriptive comes from the word 'describe' and so it typically means to describe
something. Descriptive statistics is essentially describing the data through methods such as
graphical representations, measures of central tendency and measures of variability. It
summarizes the data in a meaningful way which enables us to generate insights from it. The
data can be both quantitative and qualitative in nature. Quantitative data is in numeric form, which can be discrete (taking only finite numerical values) or continuous (also taking fractional values). To describe and analyse the data, we need to know its nature, as the type of data influences the type of statistical analysis that can be performed on it.
Data can be described by using the following methods (a small plotting sketch follows the list):
>>Line Plot: A line plot is a graphical representation for understanding the shape or
trend of a distribution. It is constructed by joining the mid points of the bins of a histogram and
connecting them with lines
>>Scatter Plots: Till now we saw how we can describe a variable using the number of
times they occur in the data. But what if we want to describe two variables so as to check if
there is a relationship between them. Scatter plot is a very basic and essential way of doing
that.
>>Histogram: Sometimes, it is better to represent a data by taking range of values
rather than every individual value. We can get an idea of the shape and spread of the
continuous data through a histogram.
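The snippet below is a purely illustrative plotting sketch on synthetic data, covering the three methods above: a histogram, a line plot over the bin midpoints, and a scatter plot of two related variables.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=500)      # a continuous variable
related = values * 0.8 + rng.normal(0, 5, size=500)  # a correlated variable

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

counts, bins, _ = axes[0].hist(values, bins=20)      # histogram
axes[0].set_title("Histogram")

midpoints = (bins[:-1] + bins[1:]) / 2               # line plot over bin midpoints
axes[1].plot(midpoints, counts)
axes[1].set_title("Line plot")

axes[2].scatter(values, related, s=5)                # scatter plot
axes[2].set_title("Scatter plot")

plt.tight_layout()
plt.show()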

12.3.2 : Finding Correlations


Correlation explains how one or more variables are related to each other. These variables can be input data features that are used to forecast the target variable. Correlation gives us an idea of the degree of the relationship between two variables. It is a bivariate analysis measure that describes the association between different variables. For example: the number of tests versus the number of positive cases during the coronavirus pandemic.
1. If two variables are closely correlated, then we can predict one variable from the
other.
2. Correlation plays a vital role in locating the important variables on which other
variables depend.
3. It's used as the foundation for various modelling techniques.
4. Proper correlation analysis leads to better understanding of data.
5. Correlation contributes towards the understanding of causal relationships.
Generally, there are three types of correlation:
>> Positive correlation: Two features (variables) are positively correlated when an increase in the value of one variable is accompanied by an increase in the value of the other variable(s).
>> Negative correlation: Two features (variables) are negatively correlated when an increase in the value of one variable is accompanied by a decrease in the value of the other variable(s).
>> No correlation: Two features (variables) are not correlated when a change in the value of one variable has no effect on the value of the other variable(s).
A small pandas sketch of computing correlations follows.
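The sketch below uses synthetic data to compute pairwise correlations with pandas; values near +1, -1, and 0 match the three cases above.

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
tests = rng.integers(100, 1000, size=200)

df = pd.DataFrame({
    "tests": tests,
    "positives": tests * 0.1 + rng.normal(0, 5, size=200),   # positive correlation
    "backlog": 1000 - tests + rng.normal(0, 20, size=200),   # negative correlation
    "noise": rng.normal(0, 1, size=200),                     # roughly no correlation
})

print(df.corr(method="pearson").round(2))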
12.4 : Feature engineering
Feature engineering refers to the manipulation (addition, deletion, combination, or mutation) of your data set to improve machine learning model training, leading to better performance and greater accuracy. Effective feature engineering is based on sound knowledge
of the business problem and the available data sources.

12.4.1 : Cleaning your data


Data cleaning is one of the important parts of machine learning. It plays a significant
part in building a model. If we have a well-cleaned dataset, there are chances that we can get
achieve good results with simple algorithms also, which can prove very beneficial at times
especially in terms of computation when the dataset size is large.
Steps involved in data cleaning (a small pandas sketch follows the list):
>> Removal of unwanted observations: This includes deleting duplicate, redundant, or irrelevant values from your dataset. Duplicate observations most frequently arise during data collection, and irrelevant observations are those that do not actually fit the specific problem you are trying to solve.
>> Fixing Structural errors: The errors that arise during measurement, transfer of data, or
other similar situations are called structural errors.
>>Managing Unwanted outliers: Outliers can cause problems with certain types of
models. For example, linear regression models are less robust to outliers than decision tree
models.
>> Handling missing data : Missing data is a deceptively tricky issue in machine
learning. We cannot just ignore or remove the missing observation. They must be handled
carefully as they can be an indication of something important.
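The sketch below applies the cleaning steps above with pandas; the file and column names are hypothetical.

import pandas as pd

df = pd.read_csv("customers.csv")

# 1. Remove unwanted (duplicate) observations.
df = df.drop_duplicates()

# 2. Fix structural errors, e.g. inconsistent category labels.
df["country"] = df["country"].str.strip().str.title()

# 3. Manage unwanted outliers by capping at the 1st/99th percentiles.
low, high = df["income"].quantile([0.01, 0.99])
df["income"] = df["income"].clip(lower=low, upper=high)

# 4. Handle missing data explicitly rather than silently ignoring it.
df["age"] = df["age"].fillna(df["age"].median())
df = df.dropna(subset=["customer_id"])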

12.4.2 : Dealing with outliers and selecting features
In machine learning, however, there's one way to tackle outliers: it's called "one-class
classification" (OCC). This involves fitting a model on the "normal" data, and then predicting
whether the new data collected is normal or an anomaly. There are 3 different categories of
outliers in machine learning:
>> Global Outliers: Type 1
The Data point is measured as a global outlier if its value is far outside the entirety of the
data in which it is contained.
>> Contextual or Conditional Outliers: Type 2
Contextual or conditional outliers are data points whose values considerably diverge from other data points within a similar context. The "context" is almost always temporal in time-series data sets, such as records of a measured quantity over time.
>> Collective Outliers: Type 3
A subset of data points in a data set is considered anomalous if those values as a group deviate significantly from the whole data set, even though the individual data points are not themselves anomalous in either a contextual or a global sense. In time-series data sets, this can show up as unusual peaks and valleys occurring outside the time frame in which that seasonal pattern is normal, or as a group of time series that is in an outlier state as a collection.
Selecting features: Feature selection is the method of reducing the input variables to your model by using only relevant data and getting rid of noise in the data. It is the process of automatically choosing relevant features for your machine learning model based on the type of problem you are trying to solve. A minimal one-class classification sketch follows.
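The snippet below is a minimal one-class classification sketch with scikit-learn on synthetic data: fit on "normal" data only, then flag new points as inliers (+1) or outliers (-1).

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_data = rng.normal(loc=0.0, scale=1.0, size=(500, 2))   # training: normal data only

detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
detector.fit(normal_data)

new_points = np.array([[0.1, -0.2],    # looks normal
                       [6.0, 6.0]])    # clearly anomalous
print(detector.predict(new_points))    # e.g. [ 1 -1 ]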

12.5 : Evaluating the accuracy of the model


Accuracy score in machine learning is an evaluation metric that measures the number of
correct predictions made by a model in relation to the total number of predictions made. We
calculate it by dividing the number of correct predictions by the total number of predictions.

12.5.1 : Calculating Classification metrics


Classification is a supervised learning task in which we try to predict the class or label of a data point based on some feature values. Depending on the number of classes the target variable includes, it can be a binary or multi-class classification. Evaluating a machine learning model is just as important as building it. Common metrics for evaluating the performance of a classification model include the following (a short scikit-learn sketch follows the list):
* Classification accuracy
* Confusion matrix
* Precision
* Recall
* F1 score
* Log loss
* Sensitivity
* Specificity
* ROC curve
* AUC
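The sketch below computes several of these metrics for a toy set of true labels and predicted probabilities (illustrative only).

import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score,
                             log_loss, roc_auc_score)

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6])
y_pred = (y_prob >= 0.5).astype(int)          # default 0.5 threshold

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Confusion:\n", confusion_matrix(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 score :", f1_score(y_true, y_pred))
print("Log loss :", log_loss(y_true, y_prob))
print("ROC AUC  :", roc_auc_score(y_true, y_prob))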

12.5.2 : Selecting Classification thresholds
A logistic regression model that returns 0.9995 for a particular email message is
predicting that it is very likely to be spam. Conversely, another email message with a prediction
score of 0.0003 on that same logistic regression model is very likely not spam. However, what
about an email message with a prediction score of 0.6? In order to map a logistic regression
value to a binary category, you must define a classification threshold (also called the decision
threshold). A value above that threshold indicates "spam"; a value below indicates "not spam."
It is tempting to assume that the classification threshold should always be 0.5, but thresholds
are problem-dependent, and are therefore values that you must tune.
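The short sketch below illustrates this idea: the same predicted spam probabilities map to different labels under two different thresholds.

import numpy as np

spam_probabilities = np.array([0.9995, 0.0003, 0.6, 0.45])

for threshold in (0.5, 0.7):                       # thresholds are tunable
    labels = np.where(spam_probabilities >= threshold, "spam", "not spam")
    print(f"threshold={threshold}: {labels.tolist()}")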

Chapter 13 Introducing Forecasting

13.1 : Forecasting overview

ML forecasting algorithms often use techniques that involve more complex features and
predictive methods, but the objective of ML forecasting methods is the same as that of
traditional methods — to improve the accuracy of forecasts while minimizing a loss function.
The loss function is usually taken as the sum of squares due to errors in prediction/ forecasting.

Some examples of ML forecasting models used in business applications are:

• Artificial neural network


• Long short-term-memory-based neural network
• Random forest
• Generalized regression neural networks
• K-nearest neighbours regression
• Classification and regression trees (CART)

13.2 : Processing time series data

13.2.1 : Special considerations for time series data

Time series analysis is an important part of decision making, and many decisions are built
on estimates of forthcoming events. Because future events are inherently probabilistic, time
series predictions are rarely perfect. The objectives of time series forecasting are to reduce the
prediction error and to produce predictions that are seldom wrong and that have small
prediction errors. In selecting a suitable time series model, the researcher should be aware that
several different models may have comparable properties. A good model will fit the data well.
The goodness of fit improves as additional parameters are added to the model. This creates a
problem with ARMA(p, q) models, which is why p and q are usually kept at low values.
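A hedged sketch of this trade-off using statsmodels is shown below; the series is synthetic, the
candidate orders are arbitrary, and an information criterion such as AIC is used here because it
penalizes the extra parameters that always improve the raw fit.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic AR(1)-like series, purely for illustration.
rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + rng.normal()

# Compare a few low-order ARMA(p, q) candidates; a lower AIC is preferred.
for order in [(1, 0, 0), (1, 0, 1), (2, 0, 2)]:
    fit = ARIMA(y, order=order).fit()
    print(order, round(fit.aic, 1))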

13.3 : Using Amazon Forecasting


Amazon Forecast is considered a serverless service. You don't have to manage any compute
instances to use it. Since it is serverless, you can create multiple scenarios simultaneously —
up to three at once. There is no reason to do this in series; you can come up with three
scenarios and fire them off all at once.

General workflow:
1. Create a Dataset Group. This is just a logical container for all the datasets you're going to use
to create your predictor.
2. Import your source datasets. A nice thing here is that Amazon Forecast facilitates the use of
different "versions" of your datasets. As you go about feature engineering, you are bound to
create different models that will be based on different underlying datasets. This is absolutely
crucial for the process of experimentation and iteration.

3. Create a predictor. This is another way of saying "create a trained model on your source
data."

4. Create a forecast using the predictor. This is where you actually generate a forecast looking
into the future. A hedged boto3 sketch of these four steps follows.
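The sketch below is a hedged outline of these four steps using boto3. The client methods shown
do exist in the AWS SDK for Python, but the resource names, domain, schema, S3 path, and IAM
role ARN are placeholders, and every call is asynchronous, so in practice you would wait for each
resource to reach ACTIVE status before starting the next step.

import boto3

forecast = boto3.client("forecast")

# 1. Create a dataset group: a logical container for the related datasets.
dsg = forecast.create_dataset_group(DatasetGroupName="demand_dsg", Domain="RETAIL")

# 2. Create a dataset and import the source data from S3.
ds = forecast.create_dataset(
    DatasetName="demand_ts",
    Domain="RETAIL",
    DatasetType="TARGET_TIME_SERIES",
    DataFrequency="D",
    Schema={"Attributes": [
        {"AttributeName": "timestamp", "AttributeType": "timestamp"},
        {"AttributeName": "item_id", "AttributeType": "string"},
        {"AttributeName": "demand", "AttributeType": "float"},
    ]},
)
forecast.update_dataset_group(DatasetGroupArn=dsg["DatasetGroupArn"],
                              DatasetArns=[ds["DatasetArn"]])
forecast.create_dataset_import_job(
    DatasetImportJobName="demand_import_v1",
    DatasetArn=ds["DatasetArn"],
    DataSource={"S3Config": {"Path": "s3://example-bucket/demand.csv",
                             "RoleArn": "arn:aws:iam::123456789012:role/ForecastRole"}},
    TimestampFormat="yyyy-MM-dd",
)

# 3. Create a predictor (train a model) on the dataset group, here with AutoML.
pred = forecast.create_predictor(
    PredictorName="demand_predictor",
    ForecastHorizon=14,
    PerformAutoML=True,
    InputDataConfig={"DatasetGroupArn": dsg["DatasetGroupArn"]},
    FeaturizationConfig={"ForecastFrequency": "D"},
)

# 4. Generate a forecast from the trained predictor.
fc = forecast.create_forecast(ForecastName="demand_forecast",
                              PredictorArn=pred["PredictorArn"])
print(fc["ForecastArn"])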

Chapter 14 Introducing Computer Vision

14.1 : Introduction to Computer Vision


Computer vision is simply the process of perceiving the images and videos available in
digital formats. In Machine Learning (ML) and AI, computer vision is used to train a model to
recognize certain patterns and store that data in its artificial memory, so that the same patterns
can be used to predict results in real-life use.
The main purpose of using computer vision technology in ML and AI is to create a model that
can work by itself without human intervention. The whole process involves methods of acquiring
the data, processing, analyzing, and understanding the digital images in order to utilize them in
real-world scenarios.
The model learns to recognize patterns in visual data by being fed thousands or millions of
labelled images used to train supervised machine learning algorithms.

14.2 : Image and video analysis


Image classification is a supervised learning problem: define a set of target classes (objects to
identify in images), and train a model to recognize them using labelled example photos. Early
computer vision models relied on raw pixel data as the input to the model. Image Processing
(IP) is a computer technology applied to images that helps us process, analyze and extract
useful information from them. Common ML image processing applications include the following
(a minimal image classification sketch appears after this list):
Medical Imaging / Visualization: Help medical professionals interpret medical imaging and
diagnose anomalies faster.
Law Enforcement & Security: Aid in surveillance and biometric authentication.
Self-Driving Technology: Assist in detecting objects and mimicking human visual cues and
interactions.
Gaming: Improve augmented reality and virtual reality gaming experiences.
Image Restoration & Sharpening: Improve the quality of images or add popular filters, etc.
Pattern Recognition: Classify and recognize objects/patterns in images and understand
contextual information.
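To make the classification idea concrete, here is a minimal convolutional classifier sketch in
Keras; the input size, the number of classes, and the (omitted) training data are assumptions,
and a real project would fit the model on a labelled image dataset.

import tensorflow as tf

num_classes = 3   # assumed number of target classes

# A small CNN: convolution layers learn visual patterns, dense layers classify them.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),           # 64x64 RGB images (assumed)
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, epochs=10)   # with a labelled dataset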
Video analysis is a field within computer vision that involves the automatic interpretation of
digital video using computer algorithms. Although humans are readily able to interpret digital
video, developing algorithms for the computer to perform the same task has proven highly
elusive and remains an active research field. Applications include tracking people who are
walking; interpreting the actions of moving objects and people; and using the technology to
replace the array of screens used in monitoring high-risk environments, such as airport security.
Fundamental problems in video analysis include denoising, searching for events in video, and
object extraction.

14.2.1 : Facial Recognition


The most common type of machine learning algorithm used for facial recognition is a deep
learning Convolutional Neural Network (CNN). CNNs are a type of artificial neural network
that are well-suited for image classification tasks. Once a CNN has been trained on a dataset of
facial images, it can be used to identify faces in new images.

This process is called facial recognition, and it is typically divided into three steps (a hedged OpenCV detection sketch follows the list):
>>Face Alignment and Detection — The first step is to detect faces in the input image. This
can be done using a Haar Cascade classifier, which is a type of machine learning algorithm that
is trained on positive and negative images.
>>Feature Measurement and Extraction — Once faces have been aligned and detected, the next
step is to extract features from them. This is where the Convolutional Neural Network (CNN)
comes in.
>>Face Recognition — The last step is to match the extracted features with faces in a database.
This is usually done using a Euclidean distance metric, which measures the similarity between
two vectors.
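A hedged sketch of the detection step using OpenCV's bundled Haar cascade is shown below; the
image path is a placeholder, and a real pipeline would follow detection with feature extraction
and matching as described above.

import cv2

# Load OpenCV's pre-trained frontal-face Haar cascade classifier.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("group_photo.jpg")             # placeholder image path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # Haar cascades work on grayscale

# Detect faces; each result is an (x, y, width, height) bounding box.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces_detected.jpg", image)
print("Detected", len(faces), "face(s)")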

14.3 : Preparing custom datasets for computer vision

14.3.1 : Creating the training dataset


Steps for Preparing Good Training Datasets:
>> Identify your goal: The initial step is to pinpoint the set of objectives that you want to
achieve through a machine learning application. This can help you identify salient points like
the most suitable model architecture, training techniques, data collection, dataset annotation,
and performance optimization methods to use for resolving problems relevant to your main
challenge.
>> Select Suitable Algorithms: There are several different algorithms that are suitable for
training artificial neural networks, so you need to pinpoint the best architecture for your model.
We will only cover general categories of these algorithms and also list down specific methods.
This is to ensure that we remain focused on the topic we are covering right now, which is
creating datasets for your machine learning applications.
>> Determine Cost-Effective Data Collection Strategies: There are many public and private
groups like universities, social organizations, companies and independent developers that offer
paid or free access to their datasets.
>> Identify the Right Dataset Annotation Methods: At this point, you likely have a dataset that
contains a combination of open source, premium and custom content. This means your next
step is to ensure that your entire dataset is annotated properly.
>> Optimize Your Dataset Annotation & Augmentation Workflow: A dataset annotation tool can
enable you to reduce the time and expense needed to prepare your dataset. Kili
Technology provides a platform where you can outsource these requirements to both your
in-house staff and remote workforces. This multipurpose annotation tool can also enable your
management team and quality assurance specialists to monitor the output of your virtual agents
and in-house employees (a small augmentation sketch follows this list).
>> Clean Up Your Dataset: Consistency is important in enhancing the effectiveness of your
dataset for training your machine learning model. This means you need to make sure that the
content of your datasets is in a consistent format, has minimal unnecessary noise (yes, noise is
sometimes helpful for certain use cases), and only contains annotated features that are
significant in helping your model learn during the training process.
>> Closely Monitor Model Training: By carefully studying the training logs of your machine
learning model, you will be able to properly choose among the many ways you can clean up
your dataset. You will also be able to re-configure the hyperparameters of your model
architecture and training algorithms. This can also allow you to optimize and clean up your
dataset through your annotation tool in more straightforward ways.
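As a small augmentation and loading sketch with torchvision, assuming the images are organized
in one folder per class; the paths, image size, and transform parameters are assumptions made
purely for illustration.

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Light augmentation adds useful variety while keeping the dataset consistent.
train_transforms = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Assumes a layout like data/train/<class_name>/<image files>.
train_dataset = datasets.ImageFolder("data/train", transform=train_transforms)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

images, labels = next(iter(train_loader))
print(images.shape, labels[:8])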

14.3.2 : Creating the test dataset

Once we train the model with the training dataset, it's time to test the model with the test
dataset. This dataset evaluates the performance of the model and ensures that the model can
generalize well with the new or unseen dataset. The test dataset is another subset of original
data, which is independent of the training dataset. However, it has some similar types of
features and class probability distribution and uses it as a benchmark for model evaluation once
the model training is completed. Test data is a well-organized dataset that contains data for
each type of scenario for a given problem that the model would be facing when used in the real
world. Usually, the test dataset is approximately 20-25% of the total original data for an ML
project.
At this stage, we can also check and compare the testing accuracy with the training accuracy,
which means how accurate our model is with the test dataset against the training dataset. If the

accuracy of the model on training data is greater than that on testing data, then the model is
said to have overfitting.

The testing data should:
• Be representative of the original data set.
• Be large enough to give meaningful predictions.
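A small sketch of holding out roughly 20% of the data as a test set with scikit-learn; the feature
matrix and labels here are synthetic placeholders.

import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and labels, purely for illustration.
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, size=1000)

# Hold out 20% as the test set; stratify keeps the class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print(X_train.shape, X_test.shape)   # (800, 5) (200, 5)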

14.3.3 : Evaluate and improve your model


Evaluation metrics are used to measure the quality of the statistical or machine learning model.
The idea of building machine learning models works on a constructive feedback principle.
Evaluation metrics explain the performance of a model.
To make sure your model learns, it is important to use multiple evaluation metrics to evaluate
the model, because a model may perform well on one evaluation metric while its performance
drops on a different evaluation metric.
Overfitting and underfitting are the two biggest causes of poor performance of machine learning
algorithms.
>> Overfitting: Occurs when the model performs well for a particular set of data (known data),
and may therefore fail to fit additional data (unknown data) or predict future observations
reliably.
>> Underfitting: Occurs when the model cannot adequately capture the underlying structure of
the data.
>> Generalization: Generalization refers to how well the concepts learned by a machine learning
model apply to specific examples not seen by the model when it was learning.

Improving results with data and thresholds (a short threshold-tuning sketch follows this list):

>> Reducing false positives (better precision)

• Raise the confidence threshold to improve precision

• Add additional classes as labels for training

>> Reducing false negatives (better recall)

• Lower the confidence threshold to improve recall

• Use better or more precise classes (labels) for training
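A hedged sketch of how moving the confidence threshold trades precision against recall; the
labels and probabilities below are invented for illustration.

from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_prob = [0.95, 0.40, 0.55, 0.80, 0.30, 0.65, 0.75, 0.20, 0.45, 0.10]

for threshold in (0.4, 0.5, 0.7):
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    p = precision_score(y_true, y_pred, zero_division=0)
    r = recall_score(y_true, y_pred)
    print("threshold", threshold, "precision", round(p, 2), "recall", round(r, 2))
# Raising the threshold tends to cut false positives (better precision),
# while lowering it tends to cut false negatives (better recall).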

CHAPTER 15 Introducing NLP

15.1 : Overview of natural language processing

Natural language processing (NLP) is a field of artificial intelligence in which


computers analyze, understand, and derive meaning from human language in a smart and
useful way. By utilizing NLP, developers can organize and structure knowledge to perform
tasks such as automatic summarization, translation, named entity recognition, relationship
extraction, sentiment analysis, speech recognition,and topic segmentation.
NLP is used to analyze text, allowing machines to understand how humans speak. This
human-computer interaction enables real-world applications like automatic text summarization,
sentiment analysis, topic extraction, named entity recognition, parts-of-speech tagging,
relationship extraction, stemming, and more. NLP is commonly used for text mining, machine
translation, and automated question answering.

Examples of natural language processing:


NLP algorithms are typically based on machine learning algorithms. Instead of hand-coding
large sets of rules, NLP can rely on machine learning to automatically learn these rules by
analyzing a set of examples (i.e. a large corpus, like a book, down to a collection of sentences),
and making a statistical inference. In general, the more data analyzed, the more accurate the
model will be.

Example NLP algorithms:


Get a feel for the wide range of NLP use cases with these example algorithms:
Summarize blocks of text using Summarizer to extract the most important and central ideas while
ignoring irrelevant information.
Create a chatbot using Parsey McParseface, a language parsing deep learning model made by
Google that uses part-of-speech tagging.
Identify the type of entity extracted, such as it being a person, place, or organization using Named
Entity Recognition.
Sentiment Analysis, based on Stanford NLP, can be used to identify the feeling, opinion, or
belief of a statement, from very negative, to neutral, to very positive. Often, developers will
use an algorithm to identify the sentiment of a term in a sentence, or use sentiment analysis to
analyze social media.
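A minimal sketch of two of these tasks, named entity recognition and sentiment analysis, using
spaCy and NLTK; it assumes the en_core_web_sm model and the VADER lexicon have already
been downloaded, and the sample sentence is invented.

import spacy
from nltk.sentiment import SentimentIntensityAnalyzer

text = "Amazon opened a new office in Hyderabad, and early reviews are very positive."

# Named entity recognition: label spans as PERSON, ORG, GPE, and so on.
nlp = spacy.load("en_core_web_sm")   # assumes the model has been downloaded
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)

# Sentiment analysis with the VADER lexicon (compound score from -1 to +1).
sia = SentimentIntensityAnalyzer()   # assumes nltk's vader_lexicon is downloaded
print(sia.polarity_scores(text))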

15.2 : Natural language processing managed services


Our data analysts combine your business context with their understanding of language,
syntax, and sentence structure to accurately parse and tag text according to your specifications.
We can extract meaning from raw audio and text data to advance your NLP project.
• Content Enrichment
• Data cleansing
• Taxonomy Creation
• Aspect Mining
• Categorization
• Intent Recognition
• Syntax analysis
• Topic analysis
• Entity Recognition

Fig22-Natural language processing managed services

Conclusion
AI is at the centre of a new enterprise to build computational models of intelligence.
The main assumption is that intelligence (human or otherwise) can be represented in terms of
symbol structures and symbolic operations which can be programmed in a digital computer.
There is much debate as to whether such an appropriately programmed computer would be a
mind, or would merely simulate one, but AI researchers need not wait for the conclusion to that
debate, nor for the hypothetical computer that could model all of human intelligence.

Aspects of intelligent behaviour, such as solving problems, making inferences, learning,


and understanding language, have already been coded as computer programs, and within very
limited domains, such as identifying diseases of soybean plants, AI programs can outperform
human experts. Now the great challenge of AI is to find ways of representing the commonsense
knowledge and experience that enable people to carry out everyday activities such as holding a
wide-ranging conversation, or finding their way along a busy street.

Conventional digital computers may be capable of running such programs, or we may


need to develop new machines that can support the complexity of human thought.

AI and machine learning have the power to revolutionize the world as we


know it, but they come with significant implications and challenges. As we move
forward, we need to ensure that we are using them ethically and for the benefit of
society as a whole.

Appendix A
INDUSTRIAL INTERNSHIP EVALUATION FORM
For the Students of B.Tech. (IT), Sasi Institute of
Technology & Engineering, Tadepalligudem,
West Godavari District, Andhra Pradesh

Date:
Name of the Intern : D.VARA LAKSHMI

Reg. No. : 21K61A0618

Branch : COMPUTER SCIENCE AND TECHNOLOGY

Internship Offered : From May to July 2023
Evaluate this student intern on the following parameters by checking the
appropriate attributes.

Evaluation Parameters | Attributes (Give Your Feedback with a Tick Mark √):
Excellent | Very Good | Good | Satisfactory | Poor

Attendance
(Punctuality)
Productivity
(Volume, Promptness)
Quality of Work
(Accuracy, Completeness,
Neatness)
Initiative
(Self-Starter, Resourceful)
Attitude
(Enthusiasm, Desire to Learn)

Interpersonal Relations
(Cooperative, Courteous,
Friendly)

Ability to Learn
(Comprehension of New
Concepts)
Use of Academic
Training (Applies
Education to Practical
Usage)

Communications Skills
(Written and Oral Expression)
Judgement
(Decision Making)

Please summarize. Your comments are especially helpful.

Areas where student excels:

Areas where student needs to improve:

Areas where student gained new skills, insights, values, confidence, etc.

Was student’s academic preparation sufficient for this internship?

Additional comments or suggestions for the student:

Overall Evaluation of the Intern's Performance (Evaluation Scale shown below)

Points Awarded:

Evaluation Scale:
Attributes: Excellent | Very Good | Good | Satisfactory | Poor
Points:
Name of Officer In-charge (Guide/Supervisor) :

Designation :

Signature of Officer In-charge (Guide/Supervisor) :

Appendix B

PO's and PSO's relevance with Internship Work

Program Outcomes and their Relevance:

PO1 Engineering Knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals and an engineering specialization to the solution of complex engineering problems.
Relevance: Applied basic knowledge of engineering to understand about an entrepreneurship.

PO2 Problem Analysis: Identify, formulate, research literature and analyze complex engineering
problems, reaching substantiated conclusions using first principles of mathematics, natural
sciences and engineering sciences.
Relevance: Performed research in various ways to analyze problems and find a solution.

PO3 Design/Development of Solutions: Design solutions for complex engineering problems and
design system components or processes that meet specified needs with appropriate consideration
for public health and safety, and cultural, societal and environmental considerations.
Relevance: Able to understand the market strategies and problems in the society.

PO4 Conduct Investigations of Complex Problems: Use research-based knowledge and research
methods, including design of experiments, analysis and interpretation of data, and synthesis of
information, to provide valid conclusions.
Relevance: Investigation of various problems of farmers.

PO5 Modern Tool Usage: Create, select and apply appropriate techniques, resources and modern
engineering and IT tools, including prediction and modelling, to complex engineering activities
with an understanding of the limitations.
Relevance: Used many of the tremendous tools for the development process.

PO6 The Engineer and Society: Apply reasoning informed by contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to
professional engineering practice.
Relevance: It can be implemented in various real-world problems.

PO7 Environment and Sustainability: Understand the impact of professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of and need for
sustainable development.
Relevance: -----------

PO8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
Relevance: Able to identify standard norms.

PO9 Individual and Team Work: Function effectively as an individual, and as a member or leader
in diverse teams, and in multidisciplinary settings.
Relevance: It is an individual/team work that solves problems through technology.

PO10 Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive
clear instructions.
Relevance: Prepared and documented the summer internship report on the Technology
Entrepreneurship Program.

PO11 Project Management and Finance: Demonstrate knowledge and understanding of
engineering and management principles and apply these to one's own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
Relevance: It is a one-year training process conducted by the Indian School of Business with
heavy costing.

PO12 Life-long Learning: Recognize the need for, and have the preparation and ability to engage
in, independent and life-long learning in the broadest context of technological change.
Relevance: It is an endless learning procedure because an entrepreneur should learn every day
from everything.

PSO1 Application Development
Relevance: An application that helps farmers.

PSO2 Successful Career and Entrepreneurship
Relevance:
