
AI-ML Virtual Internship

An Internship report submitted to

Jawaharlal Nehru Technological University Anantapur, Anantapuramu


In partial fulfilment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Submitted by
Komma Manjunath Reddy
20121A3226
IV B.Tech II Semester
Under the esteemed supervision of

Dr. G. Sunitha
Professor

Department of Computer Science and Engineering

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SREE VIDYANIKETHAN ENGINEERING COLLEGE


(AUTONOMOUS)

(Affiliated to JNTUA, Anantapuramu and approved by AICTE, New Delhi)


Accredited by NAAC with A Grade
Sree Sainath Nagar, Tirupati, Chittoor Dist. - 517 102, A.P, INDIA.

2023 - 2024.
SREE VIDYANIKETHAN ENGINEERING COLLEGE
(AUTONOMOUS)
Sree Sainath Nagar, A. Rangampet

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Certificate

This is to certify that the internship report entitled “AI-ML Virtual Internship” is the bonafide work done by Komma Manjunath Reddy (Roll No: 20121A3226) in the Department of Computer Science and Engineering, and submitted to Jawaharlal Nehru Technological University Anantapur, Anantapuramu in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering during the academic year 2023-2024.

Head:

Dr. B. Narendra Kumar Rao


Professor & Head
Dept. of CSE

INTERNAL EXAMINER EXTERNAL EXAMINER


COMPLETION CERTIFICATE FROM COMPANY

(The completion certificate issued by the internship provider was attached here.)
ABSTRACT

This internship comprises two courses: AWS Cloud Foundations and Machine Learning Foundations.

AWS Academy Cloud Foundations is intended for students who seek an overall understanding of cloud computing concepts, independent of specific technical roles. It provides a detailed overview of cloud concepts, AWS core services, security, architecture, pricing, and support. Machine learning is the use and development of computer systems that can learn and adapt without following explicit instructions, by using algorithms and statistical models to analyse and draw inferences from patterns in data.

In this course, we learn how to describe machine learning (ML), including how to recognize that machine learning and deep learning are part of artificial intelligence, and we cover artificial intelligence and machine learning terminology. From this we can identify how machine learning can be used to solve a business problem, describe the machine learning process in detail, list the tools available to data scientists, and identify when to use machine learning instead of traditional software development methods. Implementing a machine learning pipeline includes learning how to formulate a problem from a business request, obtain and secure data for machine learning, use Amazon SageMaker to build a Jupyter notebook, outline the process for evaluating data, and explain why data must be pre-processed. Open-source tools are used to examine and pre-process data, and Amazon SageMaker is used to train and host a machine learning model.

The course also covers the use of cross-validation to test the performance of a machine learning model, the use of a hosted model for inference, and the creation of an Amazon SageMaker hyperparameter tuning job to optimize a model's effectiveness. Finally, we learn how to use managed Amazon ML services to solve specific machine learning problems in forecasting, computer vision, and natural language processing.

ACKNOWLEDGEMENT

We are extremely thankful to our beloved Chairman and founder Dr. M. Mohan Babu, who took keen interest in providing us the opportunity to carry out this work.

We are highly indebted to Dr. B. M. Satish, Principal of Sree Vidyanikethan Engineering College, for his valuable support in all academic matters.

We are very much obliged to Dr. B. Narendra Kumar Rao, Professor & Head, Department of CSE, for providing us guidance and encouragement in the completion of this work.

I would like to express my special gratitude to the EduSkills Foundation and AICTE, who gave me the golden opportunity to do this wonderful internship. It helped me do a great deal of research, and I came to know about many new things. I am really thankful to them.

Komma Manjunath Reddy


20121A3226

TABLE OF CONTENTS

Abstract
Acknowledgement
Table of Contents
Abbreviations
Course 1: AWS Cloud Foundations
Course 2: Machine Learning Foundations
Summary of Experience
Reflection on Learning
Conclusion
References

CONTENTS

COURSE: AWS CLOUD FOUNDATIONS

Chapter 1: Cloud Concepts Overview
• Introduction to cloud computing
• Advantages of cloud computing
• Introduction to Amazon Web Services (AWS)
• AWS Cloud Adoption Framework

Chapter 2: Cloud Economics and Billing
• Fundamentals of pricing
• Total Cost of Ownership
• AWS Organizations
• AWS Billing and Cost Management
• Technical support demo

Chapter 3: AWS Global Infrastructure Overview
• AWS Global Infrastructure
• AWS services overview

Chapter 4: AWS Cloud Security
• AWS shared responsibility model
• AWS Identity and Access Management (IAM)
• Securing a new AWS account
• Securing accounts
• Securing data on AWS
• Working to ensure compliance

Chapter 5: Networking and Content Delivery
• Networking basics
• Amazon Virtual Private Cloud (Amazon VPC)
• VPC networking
• VPC security
• Amazon Route 53
• Amazon CloudFront

Chapter 6: Compute
• Compute services overview
• Amazon EC2
• Amazon EC2 cost optimization
• Container services
• Introduction to AWS Lambda
• Introduction to AWS Elastic Beanstalk

Chapter 7: Storage
• Amazon Elastic Block Store (Amazon EBS)
• Amazon Simple Storage Service (Amazon S3)
• Amazon Elastic File System (Amazon EFS)
• Amazon S3 Glacier

Chapter 8: Databases
• Amazon Relational Database Service (Amazon RDS)
• Amazon DynamoDB
• Amazon Redshift
• Amazon Aurora

Chapter 9: Cloud Architecture
• AWS Well-Architected Framework
• Reliability and high availability
• AWS Trusted Advisor

Chapter 10: Auto Scaling and Monitoring
• Elastic Load Balancing
• Amazon CloudWatch
• Amazon EC2 Auto Scaling
COURSE: MACHINE LEARNING FOUNDATIONS

Chapter 1: Introducing Machine Learning
• What is machine learning?
• Business problems solved with machine learning
• Machine learning process
• Machine learning tools overview
• Machine learning challenges

Chapter 2: Implementing a Machine Learning Pipeline with Amazon SageMaker
• Formulating machine learning problems
• Collecting and securing data
• Evaluating your data
• Feature engineering
• Training
• Hosting and using the model
• Evaluating the accuracy of the model
• Hyperparameter and model tuning

Chapter 3: Introducing Forecasting
• Forecasting overview
• Processing time series data
• Using Amazon Forecast

Chapter 4: Introducing Computer Vision (CV)
• Introduction to computer vision
• Image and video analysis
• Preparing custom datasets for computer vision

Chapter 5: Introducing Natural Language Processing
• Overview of natural language processing
• Natural language processing managed services

Conclusion and References
ABBREVIATIONS
Abbreviation Full Form
EC2 Elastic Compute Cloud
S3 Simple Storage Service
VPC Virtual Private Cloud
RDS Relational Database Service
IAM Identity and Access Management
ELB Elastic Load Balancer
SQS Simple Queue Service
SNS Simple Notification Service
SES Simple Email Service
Lambda AWS Lambda (Serverless Compute Service)
KMS Key Management Service
CloudFront Amazon CloudFront (Content Delivery Network)
Route 53 Amazon Route 53 (Domain Name System Service)
EBS Elastic Block Store
AMI Amazon Machine Image
CLI Command Line Interface
SDK Software Development Kit
SaaS Software as a Service
PaaS Platform as a Service
IaaS Infrastructure as a Service
CORS Cross-Origin Resource Sharing
CDN Content Delivery Network
API Application Programming Interface
CFN AWS CloudFormation
CICD or CI/CD Continuous Integration / Continuous Deployment
AZ Availability Zone
API Gateway Amazon API Gateway
EKS Amazon Elastic Kubernetes Service
SFTP Secure File Transfer Protocol
VPN Virtual Private Network

COURSE: AWS CLOUD FOUNDATIONS
CHAPTER 1: CLOUD CONCEPTS OVERVIEW

1. Introduction to Cloud Computing:

Cloud computing is the on-demand delivery of compute power, database, storage, applications, and other IT resources via the internet with pay-as-you-go pricing. These resources run on server computers that are located in large data centres in different locations around the world.

CLOUD SERVICE MODELS: There are three main cloud service models.

Infrastructure as a Service (IaaS): IaaS is also known as Hardware as a Service (HaaS). It provides computing infrastructure managed over the internet. The main advantage of using IaaS is that it helps users avoid the cost and complexity of purchasing and managing physical servers.

Platform as a Service (PaaS): A PaaS cloud computing platform is created for the programmer to develop, test, run, and manage applications.

Software as a Service (SaaS): SaaS is also known as "on-demand software". It is software in which the applications are hosted by a cloud service provider. Users can access these applications with the help of an internet connection and a web browser.

2. Advantages of Cloud Computing:

1) Back-up and restore data
2) Improved collaboration
3) Excellent accessibility
4) Services in the pay-per-use model

3. Introduction to Amazon Web Services:

Amazon Web Services (AWS) is a secure cloud platform that offers a broad set of global cloud-based products. Because these products are delivered over the internet, you have on-demand access to the compute, storage, network, database, and other IT resources that you might need for your projects, along with the tools to manage them.

4. AWS Cloud Adoption Framework:

The AWS Cloud Adoption Framework (AWS CAF) provides guidance and best practices to help organizations identify gaps in skills and processes. It also helps organizations build a comprehensive approach to cloud computing, both across the organization and throughout the IT lifecycle, to accelerate successful cloud adoption.

At the highest level, the AWS CAF organizes guidance into six areas of focus, called perspectives. Perspectives span people, processes, and technology. Each perspective consists of a set of capabilities, which cover distinct responsibilities that are owned or managed by functionally related stakeholders.
CHAPTER 2: CLOUD ECONOMICS AND BILLING

1. Fundamentals of Pricing:

There are three fundamental drivers of cost with AWS: compute, storage, and outbound data transfer. These characteristics vary somewhat, depending on the AWS product and pricing model you choose.

There is no charge (with some exceptions) for:

• Inbound data transfer.
• Data transfer between services within the same AWS Region.

Other pricing principles include:

• Pay for what you use.
• Start and stop anytime.
• No long-term contracts are required.
• Some services are free, but the other AWS services that they provision might not be free.
2. Total Cost of Ownership:

Total Cost of Ownership (TCO) is a financial estimate that helps identify the direct and indirect costs of a system. It is used:

• To compare the costs of running an entire infrastructure environment or a specific workload on-premises versus on AWS.
• To budget and build the business case for moving to the cloud.

3. AWS Organizations:

AWS Organizations is a free account management service that enables you to consolidate multiple AWS accounts into an organization that you create and centrally manage. AWS Organizations includes consolidated billing and account management capabilities that help you better meet the budgetary, security, and compliance needs of your business. The main benefits of AWS Organizations are:

• Centrally managed access policies across multiple AWS accounts.
• Controlled access to AWS services.
• Automated AWS account creation and management.
• Consolidated billing across multiple AWS accounts.
4. AWS Billing and Cost Management:

AWS Billing and Cost Management is the service that you use to pay your AWS bill, monitor your usage, and budget your costs. Billing and Cost Management enables you to forecast and obtain a better idea of what your costs and usage might be in the future so that you can plan. You can set a custom time period and determine whether you would like to view your data at a monthly or daily level of granularity.
CHAPTER 3: GLOBAL INFRASTRUCTURE OVERVIEW

1. AWS Global Infrastructure:

The AWS Global Infrastructure is designed and built to deliver a flexible, reliable, scalable, and secure cloud computing environment with high-quality global network performance.

AWS Global Infrastructure Map: https://aws.amazon.com/about-aws/global-infrastructure/#AWS_Global_Infrastructure_Map
Choose a circle on the map to view summary information about the Region represented by the circle.

Regions and Availability Zones: https://aws.amazon.com/about-aws/global-infrastructure/regions_az/
Choose a tab to view a map of the selected geography and a list of Regions, Edge locations, Local Zones, and Regional Caches.

2. AWS Service Overview:

(An overview figure of the AWS service categories was presented here.)
CHAPTER 4: CLOUD SECURITY

1. AWS Shared Responsibility Model:

AWS responsibility: security of the cloud. AWS is responsible for:

• Physical security of data centres (controlled, need-based access).
• Hardware and software infrastructure (storage decommissioning, host operating system (OS) access logging, and auditing).
• Network infrastructure (intrusion detection).
• Virtualization infrastructure (instance isolation).

Customer responsibility: security in the cloud. Customers are responsible for:

• The Amazon Elastic Compute Cloud (Amazon EC2) instance operating system, including patching and maintenance.
• Applications: passwords, role-based access, and so on.
• Security group configuration: OS or host-based firewalls, including intrusion detection or prevention systems.

2. AWS Identity and Access Management (IAM):

IAM is a no-cost AWS account feature. Use IAM to manage access to AWS resources:

• A resource is an entity in an AWS account that you can work with. Example resources: an Amazon EC2 instance or an Amazon S3 bucket.
• Example: control who can terminate Amazon EC2 instances.

IAM lets you define fine-grained access rights (a short code sketch follows):

• Who can access the resource.
• Which resources can be accessed, and what the user can do to the resource.
• How resources can be accessed.
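As a small illustration, the sketch below uses the AWS SDK for Python (boto3) to attach an inline, fine-grained policy to an IAM user. The user name, policy name, account ID, and instance ID are all hypothetical; real code would also need credentials and error handling.

import json
import boto3

iam = boto3.client("iam")

# Hypothetical fine-grained policy: allow stopping/terminating only one EC2 instance.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:StopInstances", "ec2:TerminateInstances"],
            "Resource": "arn:aws:ec2:us-east-1:123456789012:instance/i-0abcd1234example",
        }
    ],
}

# Attach the policy inline to a (hypothetical) IAM user named "intern".
iam.put_user_policy(
    UserName="intern",
    PolicyName="AllowStopOneInstance",
    PolicyDocument=json.dumps(policy_document),
)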
3. Securing a New AWS Account:
AWS account root user access versus IAM access

Best practice: Do not use the AWS account root user except when necessary.

• Access to the account root user requires logging in with the email address (and password) that you used to create the account.

Example actions that can be done only with the account root user:

• Update the account root user password.
• Change the AWS Support plan.
• Restore an IAM user's permissions.
• Change account settings (for example, contact information or allowed Regions).

4. Securing Accounts:

Security features of AWS Organizations:

• Group AWS accounts into organizational units (OUs) and attach different access policies to each OU.
• Integration and support for IAM: permissions for a user are the intersection of what is allowed by AWS Organizations and what is granted by IAM in that account.
• Use service control policies to establish control over the AWS services and API actions that each AWS account can access.

5. Securing Data on AWS:

Encryption encodes data with a secret key, which makes it unreadable:

• Only those who have the secret key can decode the data.
• AWS Key Management Service (AWS KMS) can manage your secret keys.

AWS supports encryption of data at rest:

• Data at rest is data stored physically (on disk or on tape).
• You can encrypt data stored in any service that is supported by AWS KMS, including Amazon S3, Amazon EBS, Amazon Elastic File System (Amazon EFS), and Amazon RDS managed databases.

CHAPTER 5: NETWORKING AND CONTENT DELIVERY

1. Networking Basics:

Computer network: an interconnection of multiple devices, also known as hosts, that are connected using multiple paths for the purpose of sending and receiving data or media. Computer networks can also include devices that help communication between two different hosts; these are known as network devices and include routers, switches, hubs, and bridges.

2. Amazon Virtual Private Cloud (Amazon VPC):

Amazon VPC enables you to provision a logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define. It gives you control over your virtual networking resources, including:

• Selection of the IP address range.
• Creation of subnets.
• Configuration of route tables and network gateways.

It also enables you to customize the network configuration for your VPC and to use multiple layers of security.

3. VPC Networking:

There are several VPC networking options (a short provisioning sketch follows this list):

• Internet gateway: connects your VPC to the internet.
• NAT gateway: enables instances in a private subnet to connect to the internet.
• VPC endpoint: connects your VPC to supported AWS services.
• VPC peering: connects your VPC to other VPCs.
• VPC sharing: allows multiple AWS accounts to create their application resources in shared, centrally managed Amazon VPCs.
• AWS Site-to-Site VPN: connects your VPC to remote networks.
• AWS Direct Connect: connects your VPC to a remote network by using a dedicated network connection.
• AWS Transit Gateway: a hub-and-spoke connection alternative to VPC peering.
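As a minimal provisioning sketch, the following boto3 calls create a VPC with one subnet and attach an internet gateway. The CIDR ranges are arbitrary examples.

import boto3

ec2 = boto3.client("ec2")

# Create the VPC and a subnet inside it (example CIDR blocks).
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]
subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")

# Attach an internet gateway so the VPC can reach the internet.
igw = ec2.create_internet_gateway()
ec2.attach_internet_gateway(
    InternetGatewayId=igw["InternetGateway"]["InternetGatewayId"],
    VpcId=vpc_id,
)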
4. VPC Security:

Build security into your VPC architecture:

• Isolate subnets if possible.
• Choose the appropriate gateway device or VPN connection for your needs.
• Use firewalls: security groups and network ACLs are firewall options that you can use to secure your VPC.

5. Amazon Route 53:

• A highly available and scalable Domain Name System (DNS) web service.
• Used to route end users to internet applications by translating names (like www.example.com) into the numeric IP addresses (like 192.0.2.1) that computers use to connect to each other.
• Fully compliant with IPv4 and IPv6.

6. Amazon CloudFront:

• Fast, global, and secure CDN service.
• Global network of edge locations and regional edge caches.
• Self-service model.
• Pay-as-you-go pricing.
CHAPTER 6: COMPUTE

1. Compute Services Overview:

(A comparison figure of the AWS compute services was presented here.)

2. Amazon EC2:

Amazon Elastic Compute Cloud (Amazon EC2):

• Provides virtual machines, referred to as EC2 instances, in the cloud.
• Gives you full control over the guest operating system (Windows or Linux) on each instance.
• You can launch instances of any size into an Availability Zone anywhere in the world.
• Launch instances from Amazon Machine Images (AMIs).
• Launch instances with a few clicks or a line of code, and they are ready in minutes (see the sketch below).
• You can control traffic to and from instances.
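For instance, launching a single instance "with a line of code" looks roughly like the following boto3 sketch; the AMI ID shown is a placeholder.

import boto3

ec2 = boto3.client("ec2")

# Launch one t2.micro instance from a placeholder AMI ID.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical AMI
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])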

3. Amazon EC2 Cost Optimization:

(A figure summarizing EC2 cost-optimization strategies was presented here.)

4. Container Services:

Containers are a method of operating system virtualization. Their benefits are:

• Repeatable, self-contained environments.
• Software runs the same in different environments (a developer's laptop, test, and production).
• Faster to launch and stop or terminate than virtual machines.

5. Introduction to AWS Lambda:

AWS Lambda is a serverless computing service:

• It supports multiple programming languages.
• Administration is completely automated.
• It has built-in fault tolerance and supports the orchestration of multiple functions.
• Pay-per-use pricing.
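A Lambda function in Python is just a handler; the sketch below shows the standard handler signature (the event field used is hypothetical).

import json

def lambda_handler(event, context):
    # Lambda passes the triggering event as a dict and runtime info in context.
    name = event.get("name", "world")  # hypothetical event field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }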

CHAPTER 7: STORAGE

1. Amazon Elastic Block Store (Amazon EBS):

Amazon EBS enables you to create individual storage volumes and attach them to an Amazon EC2 instance:

• Amazon EBS offers block-level storage.
• Volumes are automatically replicated within their Availability Zone.
• Volumes can be backed up automatically to Amazon S3 through snapshots.

Uses include:

• Boot volumes and storage for Amazon Elastic Compute Cloud (Amazon EC2) instances.
• Data storage with a file system.
• Database hosts.
• Enterprise applications.
2. Amazon Simple Storage Service (Amazon S3):

• Backup and storage: provide data backup and storage services for others.
• Application hosting: provide services that deploy, install, and manage web applications.
• Media hosting: build a redundant, scalable, and highly available infrastructure that hosts video, photo, or music uploads and downloads.
• Software delivery: host your software applications that customers can download.
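As a small usage sketch, uploading and retrieving an object with boto3 looks like this; the bucket name is hypothetical and must already exist and be globally unique.

import boto3

s3 = boto3.client("s3")

BUCKET = "my-example-report-bucket"  # hypothetical bucket name

# Upload a local file as an object, then download it back.
s3.upload_file("report.docx", BUCKET, "reports/report.docx")
s3.download_file(BUCKET, "reports/report.docx", "report_copy.docx")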


3. Amazon Elastic File System (Amazon EFS):

Amazon EFS provides file storage in the AWS Cloud:

• Works well for big data and analytics, media processing workflows, content management, web serving, and home directories.
• Petabyte-scale, low-latency file system.
• Shared storage.
• Elastic capacity.
• Supports Network File System (NFS) versions 4.0 and 4.1 (NFSv4).
• Compatible with all Linux-based AMIs for Amazon EC2.
4. Amazon S3 Glacier:

• Amazon S3 Glacier is a data archiving service that is designed for security, durability, and extremely low cost.
• Amazon S3 Glacier is designed to provide 11 9s of durability for objects.
• It supports the encryption of data in transit and at rest through Secure Sockets Layer (SSL) or Transport Layer Security (TLS).
• The Vault Lock feature enforces compliance through a policy.
• The extremely low-cost design works well for long-term archiving.
• It provides three options for access to archives (expedited, standard, and bulk); retrieval times range from a few minutes to several hours.

CHAPTER 8: DATABASES

1. Amazon Relational Database Service (Amazon RDS):

Amazon RDS is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks so that you can focus on your applications and your business. Amazon RDS is scalable for compute and storage, and automated redundancy and backup are available. Supported database engines include Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL Server.

2. Amazon DynamoDB:

DynamoDB is a fast and flexible NoSQL database service for any scale:

• Virtually unlimited storage.
• Items can have differing attributes.
• Low-latency queries.
• Scalable read/write throughput.

The core DynamoDB components are tables, items, and attributes (see the sketch below):

• A table is a collection of data.
• An item is a group of attributes that is uniquely identifiable among all the other items.
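A minimal boto3 sketch of these components follows; the table name and item attributes are hypothetical, and the table is assumed to already exist with "book_id" as its partition key.

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("BookCatalog")  # hypothetical table

# Items in the same table can carry different attributes.
table.put_item(Item={"book_id": "101", "title": "Cloud Basics", "pages": 250})
table.put_item(Item={"book_id": "102", "title": "ML Primer", "author": "A. Writer"})

# Fetch one item by its key.
response = table.get_item(Key={"book_id": "101"})
print(response.get("Item"))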
3. Amazon Redshift:

Use case 1: Enterprise data warehouse (EDW)

• Migrate at a pace that customers are comfortable with.
• Experiment without large upfront cost or commitment.
• Respond faster to business needs.

Use case 2: Big data

• Low price point for small customers.
• Managed service for ease of deployment and maintenance.
• Focus more on data and less on database management.

Use case 3: Software as a service (SaaS)

• Scale the data warehouse capacity as demand grows.
• Add analytic functionality to applications.
4. Amazon Aurora:

• Enterprise-class relational database.
• Compatible with MySQL or PostgreSQL.
• Automates time-consuming tasks (such as provisioning, patching, backup, recovery, failure detection, and repair).
CHAPTER 9: CLOUD ARCHITECTURE

1. AWS Well-Architected Framework:

A guide for designing infrastructures that are:

✓ Secure
✓ High performing
✓ Resilient
✓ Efficient

It provides:

• A consistent approach to evaluating and implementing cloud architectures.
• A way to provide best practices that were developed through lessons learned by reviewing customer architectures.

2. Reliability and High Availability:

(A figure comparing reliability and high-availability concepts was presented here.)

3. AWS Trusted Advisor:

• Cost optimization: AWS Trusted Advisor looks at your resource use and makes recommendations to help you optimize cost by eliminating unused and idle resources, or by making commitments to reserved capacity.
• Performance: improve the performance of your service by checking your service limits, ensuring that you take advantage of provisioned throughput, and monitoring for overutilized instances.
• Security: improve the security of your application by closing gaps, enabling various AWS security features, and examining your permissions.
• Fault tolerance: increase the availability and redundancy of your AWS application by taking advantage of automatic scaling, health checks, multi-AZ deployments, and backup capabilities.
• Service limits: AWS Trusted Advisor checks for service usage that is more than 80 percent of the service limit. Values are based on a snapshot, so your current usage might differ. Limit and usage data can take up to 24 hours to reflect any changes.

CHAPTER 10: AUTO SCALING AND MONITORING

1. Elastic Load Balancing:

(A figure showing the Elastic Load Balancing architecture was presented here.)

2. Amazon CloudWatch:

Amazon CloudWatch helps you monitor your AWS resources, and the applications that you run on AWS, in real time. CloudWatch enables you to (a small sketch follows this list):

• Collect and track standard and custom metrics.
• Set alarms to automatically send notifications to SNS topics, or to perform Amazon EC2 Auto Scaling or Amazon EC2 actions.
• Define rules that match changes in your AWS environment and route these events to targets for processing.
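Publishing a custom metric is a one-call operation in boto3, as in the sketch below; the namespace and metric name are hypothetical.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish one data point for a hypothetical custom metric.
cloudwatch.put_metric_data(
    Namespace="InternshipDemo",
    MetricData=[
        {
            "MetricName": "PageViews",
            "Value": 42.0,
            "Unit": "Count",
        }
    ],
)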

3. Amazon EC2 Auto Scaling:

• Helps you maintain application availability.
• Enables you to automatically add or remove EC2 instances according to conditions that you define.
• Detects impaired EC2 instances and unhealthy applications, and replaces the instances without your intervention.
• Provides several scaling options: manual, scheduled, dynamic or on-demand, and predictive.

COURSE: MACHINE LEARNING FOUNDATIONS

CHAPTER 1: INTRODUCING MACHINE LEARNING

1. What is Machine Learning?

Machine learning is the scientific study of algorithms and statistical models to perform a task by using inference instead of instructions.

• Artificial intelligence is the broad field of building machines to perform human tasks.
• Machine learning is a subset of AI. It focuses on using data to train ML models so that the models can make predictions.
• Deep learning is a technique that was inspired by human biology. It uses layers of neurons to build networks that solve problems.
• Advancements in technology, cloud computing, and algorithm development have led to a rise in machine learning capabilities and applications.

2. Business Problems Solved with Machine Learning

Machine learning is used throughout a person's digital life. Here are some examples:

• Spam: your spam filter is the result of an ML program that was trained with examples of spam and regular email messages.
• Recommendations: based on books that you read or products that you buy, ML programs predict other books or products that you might want. Again, the ML program was trained with data from other readers' habits and purchases.
• Credit card fraud: similarly, the ML program was trained on examples of transactions that turned out to be fraudulent, along with transactions that were legitimate.

Machine learning problems can be grouped into:

• Supervised learning: you have training data for which you know the answer.
• Unsupervised learning: you have data, but you are looking for insights within the data.
• Reinforcement learning: the model learns in a way that is based on experience and feedback.

Most business problems are supervised learning.
3. Machine Learning Process

The machine learning pipeline process can guide you through training and evaluating a model. The iterative process can be broken into three broad steps:

• Data processing
• Model training
• Model evaluation

ML pipeline:

(A figure of the end-to-end ML pipeline was presented here.)
4. Machine Learning Tools Overview

• Jupyter Notebook is an open-source web application that enables you to create and share documents that contain live code, equations, visualizations, and narrative text.
• JupyterLab is a flexible, web-based interactive development environment for Jupyter notebooks, code, and data.
• pandas is an open-source Python library. It is used for data handling and analysis. It represents data in a table that is similar to a spreadsheet; this table is known as a pandas DataFrame.
• Matplotlib is a library for creating static, animated, and interactive scientific visualizations in Python. You use it to generate plots of your data later in this course.
• Seaborn is another data visualization library for Python. It is built on Matplotlib, and it provides a high-level interface for drawing informative statistical graphics.
• NumPy is one of the fundamental scientific computing packages in Python. It contains functions for N-dimensional array objects and useful math functions such as linear algebra, Fourier transform, and random number capabilities.
• scikit-learn is an open-source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data pre-processing, model selection and evaluation, and many other utilities.
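A few lines are enough to see how these tools fit together; the sketch below builds a small pandas DataFrame from NumPy-generated data and plots it with Matplotlib (the column names are made up for illustration).

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Build a toy DataFrame: 100 samples of a feature and a noisy target.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"feature": rng.normal(size=100)})
df["target"] = 2 * df["feature"] + rng.normal(scale=0.5, size=100)

print(df.describe())  # overall and attribute statistics

df.plot.scatter(x="feature", y="target")
plt.show()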

5. Machine Learning Challenges

(A figure summarizing common machine learning challenges was presented here.)
CHAPTER 2: IMPLEMENTING A MACHINE LEARNING PIPELINE WITH AMAZON SAGEMAKER

1. Formulating Machine Learning Problems

Business problems must be converted into an ML problem. Questions to ask include:

• Have we asked "why" enough times to get a solid business problem statement and to know why it is important?
• Can you measure the outcome or impact if your solution is implemented?

Most business problems fall into one of two categories:

• Classification (binary or multiclass): does the target belong to a class?
• Regression: can you predict a numerical value?

2. Collecting and Securing Data

• Private data is data that you (or your customers) have in various existing systems. Everything from log files to customer invoice databases can be useful, depending on the problem that you want to solve. In some cases, data is found in many different systems.
• Commercial data is data that a commercial entity collected and made available. Companies such as Reuters, Change Healthcare, Dun & Bradstreet, and Foursquare maintain databases that you can subscribe to. These databases include curated news stories, anonymized healthcare transactions, global business records, and location data. Supplementing your own data with commercial data can provide useful insights that you would not have otherwise.
• Open-source data comprises many different open-source datasets that range from scientific information to movie reviews. These datasets are usually available for use in research or for teaching purposes. You can find open-source datasets hosted by AWS, Kaggle, and the UC Irvine Machine Learning Repository.

Securing data:

(A figure on securing data for machine learning was presented here.)
3. Evaluating Data

• Descriptive statistics can be organized into different categories. Overall statistics include the number of rows (instances) and the number of columns (features or attributes) in your dataset. This information, which relates to the dimensions of your data, is important. For example, it can indicate that you have too many features, which can lead to high dimensionality and poor model performance.
• Attribute statistics are another type of descriptive statistic, specifically for numeric attributes. They give a better sense of the shape of your attributes, including properties like the mean, standard deviation, variance, minimum value, and maximum value.
• Multivariate statistics look at relationships between more than one variable, such as correlations and relationships between your attributes.

4. Feature Engineering

Feature selection is about selecting the features that are most relevant and discarding the rest. Feature selection is applied to prevent either redundancy or irrelevance in the existing features, or to get a limited number of features to prevent overfitting.

Feature extraction is about building up valuable information from raw data by reformatting, combining, and transforming primary features into new ones. This transformation continues until it yields a new set of data that can be consumed by the model to achieve the goals.

Outliers

You can handle outliers during feature engineering with several different approaches. They include, but are not limited to (a short sketch follows this list):

• Deleting the outlier: this approach might be a good choice if your outlier is based on an artificial error. Artificial error means that the outlier isn't natural and was introduced because of some failure, perhaps incorrectly entered data.
• Transforming the outlier: you can transform the outlier by taking the natural log of a value, which in turn reduces the variation that the extreme outlier value causes. Therefore, it reduces the outlier's influence on the overall dataset.
• Imputing a new value for the outlier: you can use the mean of the feature, for instance, and impute that value to replace the outlier value. Again, this would be a good approach if an artificial error caused the outlier.
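The sketch below illustrates the last two approaches on a toy pandas Series: a log transform to compress an extreme value, and mean imputation of points flagged by the 1.5 * IQR rule (the data and the flagging rule are assumptions for illustration).

import numpy as np
import pandas as pd

values = pd.Series([12.0, 15.0, 14.0, 13.0, 400.0])  # 400.0 is the outlier

# Transforming: the natural log compresses the extreme value.
log_values = np.log(values)
print(log_values.round(2).tolist())

# Imputing: flag outliers with the 1.5 * IQR rule, then replace them
# with the mean of the remaining (inlier) values.
q1, q3 = values.quantile(0.25), values.quantile(0.75)
fence_low, fence_high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
inliers = values.between(fence_low, fence_high)
imputed = values.where(inliers, values[inliers].mean())
print(imputed.tolist())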

Feature Selection: Filter Methods

Filter methods use a proxy measure instead of the actual model's performance. Filter methods are fast to compute, and they still capture the usefulness of the feature set. Common measures include:

• Pearson's correlation coefficient: measures the statistical relationship or association between two continuous variables.
• Linear discriminant analysis (LDA): used to find a linear combination of features that separates two or more classes.
• Analysis of variance (ANOVA): used to analyse the differences among group means in a sample.
• Chi-square: a single number that tells how much difference exists between your observed counts and the counts that would be expected if no relationship existed in the population.
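In scikit-learn, a chi-square filter is a one-liner with SelectKBest; the sketch below keeps the two highest-scoring features of a toy dataset (chi-square requires non-negative feature values).

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)  # iris features are non-negative

# Score each feature against the labels and keep the top 2.
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.scores_.round(1))   # chi-square score per feature
print(X_selected.shape)            # (150, 2)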

Feature Selection: Wrapper Methods

• Forward selection starts with no features and adds them until the best model is found.
• Backward selection starts with all features, drops them one at a time, and selects the best model.
Feature Selection: Embedded Methods

Embedded methods combine the qualities of filter and wrapper methods. They are implemented with algorithms that have their own built-in feature selection methods. Some of the most popular examples of these methods are LASSO and ridge regression, which have built-in penalization functions to reduce overfitting.

(A figure comparing filter, wrapper, and embedded methods was presented here.)

5. Training

The holdout technique and k-fold cross-validation are the methods most used when the data must be split into a training set and a test set (a short sketch follows).

(Figures illustrating the holdout method and k-fold cross-validation were presented here.)
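Both splits are one call in scikit-learn; the sketch below holds out 20% of a toy dataset as a test set, then runs 5-fold cross-validation with a logistic regression model.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Holdout: reserve 20% of the data as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# k-fold cross-validation: 5 train/validate rotations on the training set.
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X_train, y_train, cv=5)
print(scores.mean().round(3))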

XGBOOST ALGORITHM

XGBoost is a popular and efficient open-source implementation of the gradient boosted trees algorithm. Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable. It attains its prediction by combining an ensemble of estimates from a set of simpler, weaker models.

XGBoost has done well in machine learning competitions. It robustly handles various data types, relationships, and distributions, and it exposes many hyperparameters that can be tweaked and tuned for improved fit. This flexibility makes XGBoost a solid choice for problems in regression, classification (binary and multiclass), and ranking.
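Outside SageMaker, the same algorithm is available through the open-source xgboost package; a minimal binary-classification sketch (assuming xgboost is installed) looks like this.

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A small gradient boosted trees model; depth and rounds are hyperparameters.
model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)

print(round(accuracy_score(y_test, model.predict(X_test)), 3))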

LINEAR LEARNER

The Amazon SageMaker linear learner algorithm provides a solution for both classification and regression problems.

With the Amazon SageMaker algorithm, you can simultaneously explore different training objectives and choose the best solution from your validation set. You can also explore many models and choose the best one for your needs. The Amazon SageMaker linear learner algorithm compares favourably with methods that provide a solution for only continuous objectives, and it provides a significant increase in speed over naive hyperparameter optimization techniques.

6. Hosting and Using the Model

• You can deploy your trained model by using Amazon SageMaker to handle API calls from applications, or to perform predictions by using a batch transformation.
• The goal of your model is to generate predictions to answer the business problem. Be sure that your model can generate good results before you deploy it to production.
• Use single-model endpoints for simple use cases, and use multi-model endpoint support to save resources when you have multiple models to deploy.

7. Evaluating the Accuracy of the Model

(Figures covering confusion matrix terminology, comparison of models, sensitivity, specificity, and other classification metrics were presented here.)

A confusion matrix tabulates a classifier's true positives, true negatives, false positives, and false negatives. Sensitivity (recall) is the fraction of actual positives that the model identifies, and specificity is the fraction of actual negatives that it identifies. Comparing models on these metrics shows which one better fits the business problem, since accuracy alone can mislead on imbalanced data.
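These quantities are easy to compute with scikit-learn; the sketch below derives sensitivity and specificity from a confusion matrix on made-up binary labels.

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up ground truth
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # made-up model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)  # true positive rate (recall)
specificity = tn / (tn + fp)  # true negative rate

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")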

8. Hyperparameter and Model Tuning

(Figures introducing hyperparameters and hyperparameter tuning were presented here.)

• Tuning hyperparameters can be labour-intensive. Traditionally, this kind of tuning was done manually.
• Someone with domain experience related to that hyperparameter and use case would manually select the hyperparameters, according to their intuition and experience.
• They would then train the model and score it on the validation data. This process would be repeated until satisfactory results were achieved.
• This process is not always the most thorough or efficient way of tuning your hyperparameters.
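Automated search replaces that manual loop. SageMaker offers hyperparameter tuning jobs for this; the same idea in plain scikit-learn is a grid search, sketched below over two hypothetical hyperparameter ranges.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Try every combination of these (hypothetical) hyperparameter values,
# scoring each with 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))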

CHAPTER 3: INTRODUCING FORECASTING

1. Overview of Forecasting

Forecasting is an important area of machine learning because so many opportunities for predicting future outcomes are based on historical data. It is based on time series data.

Time series data falls into two broad categories. The first type is univariate, which means that it has only one variable. The second type is multivariate.

In addition to these two categories, most time series datasets also follow one of the following patterns:

• Trend: a pattern that shows the values as they increase, decrease, or stay the same over time.
• Seasonal: a repeating pattern that is based on the seasons in a year.
• Cyclical: some other form of a repeating pattern.
• Irregular: changes in the data over time that appear to be random or that have no discernible pattern.

2. Processing Time Series Data

Smoothing your data can help you deal with outliers and other anomalies. You might consider smoothing for the following reasons:

• Data preparation: removing error values and outliers.
• Visualization: reducing noise in a plot.

Some time series data functions using Python:
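A minimal pandas sketch of such functions: resampling a synthetic daily series to monthly totals and smoothing it with a rolling mean.

import numpy as np
import pandas as pd

# Synthetic daily series for one year.
days = pd.date_range("2023-01-01", periods=365, freq="D")
series = pd.Series(np.random.default_rng(0).normal(100, 10, 365), index=days)

monthly = series.resample("M").sum()          # aggregate to monthly totals
smoothed = series.rolling(window=7).mean()    # 7-day rolling mean smooths noise

print(monthly.head(3))
print(smoothed.dropna().head(3))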

Time series data algorithms:

• Autoregressive Integrated Moving Average (ARIMA): this algorithm removes autocorrelations, which might influence the pattern of observations.
• DeepAR+: a supervised learning algorithm for forecasting one-dimensional time series. It uses a recurrent neural network to train a model over multiple time series.
• Exponential Smoothing (ETS): this algorithm is useful for datasets with seasonality. It uses a weighted average for all observations, with weights that decrease over time.
• Non-Parametric Time Series (NPTS): predictions are based on sampling from past observations. Specialized versions are available for seasonal and climatological datasets.
• Prophet: a Bayesian time series model. It is useful for datasets that span a long time period, have missing data, or have large outliers.

3. Using Amazon Forecast

The following steps describe the forecasting workflow (a flowchart was presented here):

Import your data: you must import as much data as you have, both historical data and related data. You should do some basic evaluation and feature engineering before you use the data to train a model.

Train a predictor: to train a predictor, you must choose an algorithm. If you are not sure which algorithm is best for your data, you can let Amazon Forecast choose by selecting AutoML as your algorithm. You also must select a domain for your data; if you are not sure which domain fits best, you can select a custom domain. Domains have specific types of data that they require. For more information, see Predefined Dataset Domains and Dataset Types in the Amazon Forecast documentation.

Generate forecasts: as soon as you have a trained model, you can use the model to make a forecast by using an input dataset group. After you generate a forecast, you can query the forecast, or you can export it to an Amazon Simple Storage Service (Amazon S3) bucket. You also have the option to encrypt the data in the forecast before you export it.

CHAPTER 4: INTRODUCING COMPUTER VISION

1. Introduction to Computer Vision

Computer vision enables machines to identify people, places, and things in images with accuracy at or above human levels, and with greater speed and efficiency. Often built with deep learning models, computer vision automates the extraction, analysis, classification, and understanding of useful information from a single image or a sequence of images. The image data can take many forms, such as single images, video sequences, views from multiple cameras, or three-dimensional data.

Applications of computer vision:

Public safety and home security: computer vision with image and facial recognition can help to quickly identify unlawful entries or persons of interest. This process can result in safer communities and a more effective way of deterring crimes.

Authentication and enhanced computer-human interaction: enhanced human-computer interaction can improve customer satisfaction. Examples include products that are based on customer sentiment analysis in retail outlets, or faster banking services with quick authentication that is based on customer identity and preferences.

Content management and analysis: millions of images are added every day to media and social channels. The use of computer vision technologies, such as metadata extraction and image classification, can improve efficiency and revenue opportunities.

Autonomous driving: by using computer vision technologies, auto manufacturers can provide improved and safer self-driving car navigation, which can help make autonomous driving a reliable transportation option.

Medical imaging: medical image analysis with computer vision can improve the accuracy and speed of a patient's medical diagnosis, which can result in better treatment outcomes and life expectancy.

Manufacturing process control: well-trained computer vision that is incorporated into robotics can improve quality assurance and operational efficiencies in manufacturing applications. This process can result in more reliable and cost-effective products.

Computer vision problems:

Problem 01: Recognizing food and stating whether it is breakfast, lunch, or dinner.

(An example image was presented here.) Because the CV model classified the objects as milk, peaches, ice cream, salad, nuggets, and a bread roll, the meal is breakfast.

Problem 02: Video analysis.

(An example video-analysis figure was presented here.)
2. Image and Video Analysis

Amazon Rekognition is a computer vision service based on deep learning. You can use it to add image and video analysis to your applications. Amazon Rekognition enables you to perform the following types of analysis:

• Searchable image and video libraries: Amazon Rekognition makes images and stored videos searchable, so that you can discover the objects and scenes that appear in them.
• Face-based user verification: Amazon Rekognition enables your applications to confirm user identities by comparing a live image with a reference image.
• Sentiment and demographic analysis: Amazon Rekognition interprets emotional expressions, such as happy, sad, or surprised. It can also interpret demographic information from facial images, such as gender.
• Unsafe content detection: Amazon Rekognition can detect inappropriate content in images and in stored videos.
• Text detection: Amazon Rekognition Text in Image enables you to recognize and extract text content from images.

Case 01: Searchable image library. Case 02: Image moderation. Case 03: Sentiment analysis. (Illustrative figures for these cases were presented here.)
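For example, label detection on an image stored in S3 is one boto3 call; the bucket and object key below are hypothetical.

import boto3

rekognition = boto3.client("rekognition")

# Detect up to 10 labels in an image that sits in a (hypothetical) S3 bucket.
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-example-bucket", "Name": "photos/meal.jpg"}},
    MaxLabels=10,
)
for label in response["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))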

3. Preparing Custom Datasets for Computer Vision

There are six steps involved in preparing custom data (each step was illustrated with a figure in the original slides):

Step 01: Collect images.
Step 02: Create the training dataset.
Step 03: Create the test dataset.
Step 04: Train the model.
Step 05: Evaluate.
Step 06: Use the model.
CHAPTER 5: INTRODUCING NATURAL LANGUAGE PROCESSING

1. Overview of Natural Language Processing

NLP develops computational algorithms to automatically analyse and represent human language. By evaluating the structure of language, machine learning systems can process large sets of words, phrases, and sentences.

Some challenges of NLP:

• Discovering the structure of the text: one of the first tasks of any NLP application is to break the text into meaningful units, such as words, phrases, and sentences.
• Labelling data: after the system converts the text to data, the next challenge is to apply labels that represent the various parts of speech. Every language requires a different labelling scheme to match the language's grammar.
• Representing context: because word meaning depends on context, any NLP system needs a way to represent context. This is a big challenge because of the large number of contexts.
• Applying grammar: dealing with the variation in how humans use language is a major challenge for NLP systems.

NLP flow chart:

(A flowchart of the NLP workflow was presented here.)

2. Natural Language Processing Managed Services

The course introduced the AWS managed NLP services and their typical uses (service figures were presented in the original slides; a short code sketch follows this list):

Amazon Transcribe (speech to text):
• Medical transcription
• Subtitles in streaming content and in offline content

Amazon Polly (text to speech):
• Navigation systems
• Animation productions

Amazon Translate:
• International websites
• Software localization

Amazon Comprehend (text analysis):
• Document analysis
• Fraud detection

Amazon Lex (conversational interfaces):
• Interactive assistants
• Database queries
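As a taste of these services, the sketch below calls two of them through boto3: Amazon Translate for a sentence and Amazon Comprehend for sentiment (the input strings are arbitrary examples).

import boto3

translate = boto3.client("translate")
comprehend = boto3.client("comprehend")

# Translate a sentence from English to Spanish.
result = translate.translate_text(
    Text="The internship covered cloud and machine learning basics.",
    SourceLanguageCode="en",
    TargetLanguageCode="es",
)
print(result["TranslatedText"])

# Detect the sentiment of a short review.
sentiment = comprehend.detect_sentiment(
    Text="The course was clear and very useful.",
    LanguageCode="en",
)
print(sentiment["Sentiment"])  # e.g. POSITIVE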

SUMMARY OF EXPERIENCE

Fig. 2.1: Major AWS Certifications

During my internship with Amazon Web Services (AWS), I embarked on a comprehensive journey into
the realm of cloud computing, exploring the vast capabilities of the AWS platform. This internship,
conducted in partial fulfillment of the requirements for my Bachelor of Technology degree in Computer
Science and Engineering, provided invaluable insights and hands-on experience in deploying and
managing cloud-based solutions.

The internship began with an introduction to AWS Cloud, where I gained a profound understanding of
its key components and services. I delved into compute services like Amazon EC2 and serverless
computing with AWS Lambda. Storage services such as Amazon S3 and database management through
Amazon RDS and DynamoDB were explored. The networking capabilities, analytics and big data
services, AI and machine learning tools, as well as security and identity features were thoroughly
examined.

Practical implementation played a pivotal role in enhancing my skills. I created a virtual private cloud
(VPC) and configured subnets, launched EC2 instances, and set up web servers with Linux scripts. The
experience of hosting a website on Amazon S3 and creating a book catalog with dynamic content added
a real-world dimension to my learning.

The internship extended into areas like setting up a DynamoDB table and performing operations,
configuring MariaDB on server-oriented instances for DDL and DML testing, and automating EC2
start/stop processes using Lambda functions. The implementation of Elastic Load Balancer (ELB) for
traffic load balancing showcased the importance of high availability in cloud environments.

One of the highlights of my internship was the exploration of Auto Scaling for web servers. Creating launch templates, defining scaling policies based on CPU utilization, and configuring the automatic initialization of resources in line with Auto Scaling policies provided a hands-on understanding of dynamic resource allocation.

The journey concluded with reflections on my learning experiences and a comprehensive summary of the internship. Throughout this internship, I not only acquired technical skills but also developed a deeper appreciation for the scalability, flexibility, and efficiency that cloud computing offers.

In conclusion, the AWS Cloud Virtual Internship has been a transformative experience, equipping me
with the knowledge and skills essential for navigating the dynamic landscape of cloud computing. The
exposure to real-world scenarios and practical implementations has prepared me for the challenges and
opportunities that lie ahead in the field of technology.

REFLECTION ON LEARNING

Fig. 3.1: Amazon Web Services Logo

Undertaking the AWS Cloud Virtual Internship has been a transformative experience that has
significantly broadened my understanding of cloud computing and its practical applications. This
internship provided an immersive environment to explore the extensive capabilities of Amazon Web
Services (AWS), a leading cloud computing platform.

Through hands-on tasks and projects, I delved into various AWS services, including EC2, S3,
DynamoDB, Lambda, CloudWatch, and Elastic Load Balancer (ELB). The step-by-step procedures for
setting up and configuring these services enhanced my technical skills and deepened my comprehension
of cloud architecture.

One of the notable aspects of this internship was the practical exposure to real-world scenarios. From
setting up a simple web server to implementing complex solutions like auto-scaling and load balancing,
each task contributed to a holistic understanding of cloud infrastructure management. The emphasis on
creating a comprehensive report also improved my documentation and reporting skills.

The experience of automating processes using Lambda functions, integrating services for seamless
workflows, and setting up monitoring and alerting systems in CloudWatch has been particularly
insightful. These skills are not only valuable in the context of AWS but are transferable to other cloud
platforms, reinforcing the versatility of the knowledge gained.

Furthermore, the internship exposed me to database management with DynamoDB and MariaDB,
offering a practical perspective on handling data in cloud environments. Creating DynamoDB tables,
executing queries, and managing data in MariaDB provided a solid foundation in database operations.

Implementing Elastic Load Balancer (ELB) for traffic load balancing and auto-scaling for web servers
deepened my understanding of ensuring high availability and scalability in cloud applications. The
experience of creating Launch Templates, Auto Scaling Groups, and defining policies for scaling based
on CPU utilization was instrumental in mastering these advanced concepts.

In conclusion, the AWS Cloud Virtual Internship has been an enriching journey that transcended
theoretical knowledge to practical proficiency. The skills acquired are not only relevant in the context of
cloud computing but also align with industry demands for professionals well-versed in cloud
technologies. This internship has ignited a passion for continuous learning and exploration in the
dynamic field of cloud computing.

CONCLUSION

These chapters described how model explainability relates to AI/ML solutions, giving customers insight into explainability requirements when initiating AI/ML use cases. Using AWS, four pillars were presented for assessing model explainability options, to bridge knowledge gaps and requirements for simple to complex algorithms. To help convey how these model explainability options relate to real-world scenarios, examples from a range of industries were demonstrated. It is recommended that AI/ML owners or business leaders follow these steps when initiating a new AI/ML solution:

• Collect business requirements to identify the level of explainability required for your business to accept the solution.
• Based on business requirements, implement an assessment of model explainability.
• Work with an AI/ML technician to communicate the model explainability assessment and find the optimal AI/ML solution to meet your business objectives.
• After the solution is completed, revisit the model explainability assessment to verify that business requirements are continuously met.

By taking these steps, we will mitigate regulatory risks and ensure trust in our model. With this trust, when the time comes to push our AI/ML solution into an AWS production environment, we will be ready to create business value for our use case.

REFERENCES

1. AWS Documentation: https://docs.aws.amazon.com/
2. AWS Training and Certification: https://aws.amazon.com/training/
3. AWS Well-Architected Framework: https://aws.amazon.com/architecture/well-architected/
4. AWS Whitepapers: https://aws.amazon.com/whitepapers
5. AWS Blogs: https://aws.amazon.com/blogs/
6. AWS YouTube Channel: https://www.youtube.com/user/AmazonWebServices
7. GitHub - AWS Samples: https://github.com/aws-samples
8. AWS Architecture Center: https://aws.amazon.com/architecture/
9. A Cloud Guru: https://acloudguru.com/
10. CloudFormation Templates: https://aws.amazon.com/cloudformation/aws-cloudformation-templates/
11. Serverless Architectures with AWS Lambda: https://aws.amazon.com/serverless/
12. AWS Solutions: https://aws.amazon.com/solutions/