FAREHA Summer Internship Report AI ML
FAREHA TABASSUM
ROLL.NO:
160519733031
fulfilment of the requirement for the award of the degree of BACHELOR OF ENGINEERING in
This is further certified that the work done under our guidance, and the results of this work have not
Acknowledgment
It is a proud privilege and duty to acknowledge the kind help and guidance received from several
people during this course. It would not have been possible without their valuable help, cooperation
and guidance. I wish to record my sincere gratitude to Dr. MOHAMMED ABDUL BARI, and Dr.
PATHAN AHMED KHAN for their constant support. The course on “AWS AI-ML VIRTUAL
INTERNSHIP” was very helpful in giving the necessary background information and inspiration.
FAREHA TABASSUM
160519733031
CERTIFICATE
TABLE OF CONTENTS
01 Certificate
02 Acknowledgment
1 Introduction
1.1 What is “machine learning”?
1.2 Business Problems Solved with Machine Learning
2 Implement Machine Learning Pipeline
2.1 Introduction to ML Pipeline
2.2 Implementation Using SageMaker
3 Forecasting
3.1 Introduction to Forecasting
3.2 Processing time-series data
3.3 Using Amazon Forecast
Cloud computing can be defined as the practice of using a network of remote servers hosted on the
Internet to store, manage, and process data, rather than a local server or a personal computer.
Companies offering such cloud computing services are called cloud providers and typically charge
for them based on usage. Grids and clusters are the foundations for cloud computing.
The three service models described below (SaaS, PaaS, and IaaS) are sometimes called the cloud
computing stack because they are built on top of one another. Knowing what they are and how they
differ makes it easier to accomplish your goals. These abstraction layers can also be viewed as a
layered architecture in which the services of a higher layer are composed of services of the
underlying layer; for example, a SaaS application can be built on a PaaS platform, which in turn
runs on IaaS infrastructure.
• Software as a Service (SaaS)
Software as a Service (SaaS) is a way of delivering services and applications over the
Internet. Instead of installing and maintaining software, we simply access it via the Internet,
freeing ourselves from complex software and hardware management. It removes the need to
install and run applications on our own computers or in our own data centers, eliminating the
expense of hardware and software maintenance.
SaaS provides a complete software solution that you purchase on a pay-as-you-go basis
from a cloud service provider. Most SaaS applications can be run directly from a web browser
without any downloads or installations required. SaaS applications are sometimes called
web-based software, on-demand software, or hosted software.
Advantages of SaaS
• Cost-effective: Pay only for what you use.
• Reduced time: Users can run most SaaS apps directly from their web browser without
needing to download and install any software. This reduces the time spent on installation and
configuration and can reduce the issues that get in the way of software deployment.
• Accessibility: App data can be accessed from anywhere.
• Automatic updates: Rather than purchasing new software, customers rely on the SaaS
provider to perform updates automatically.
• Scalability: Users can access services and features on demand.
Companies providing Software as a Service include Cloud9 Analytics, Salesforce.com,
CloudSwitch, Microsoft Office 365, BigCommerce, Eloqua, Dropbox, and CloudTran.
• Platform as a Service (PaaS)
PaaS is a category of cloud computing that provides a platform and environment that
allow developers to build applications and services over the Internet. PaaS services are hosted
in the cloud and accessed by users simply via their web browser.
A PaaS provider hosts the hardware and software on its own infrastructure. As a
result, PaaS frees users from having to install in-house hardware and software to develop or
run a new application. Thus, the development and deployment of the application take place
independently of the hardware.
The consumer does not manage or control the underlying cloud infrastructure,
including network, servers, operating systems, or storage, but has control over the deployed
applications and possibly configuration settings for the application-hosting environment. To
put it simply, take the example of an annual day function: you can either build a venue or
rent one, but the function is the same.
Companies providing Platform as a Service include Amazon Web Services (Elastic
Beanstalk), Salesforce, Windows Azure, Google App Engine, CloudBees, and IBM
SmartCloud.
• Infrastructure as a Service (IaaS)
Infrastructure as a Service (IaaS) is a service model that delivers computer
infrastructure on an outsourced basis to support various operations. Typically, IaaS provides
enterprises with outsourced infrastructure such as networking equipment, devices, databases,
and web servers.
It is also known as Hardware as a Service (HaaS). IaaS customers pay on a per-use
basis, typically by the hour, week, or month. Some providers also charge customers based on
the amount of virtual machine space they use.
It provides the underlying operating systems, security, networking, and servers
needed to develop applications and services and to deploy development tools, databases,
and so on.
1. INTRODUCTION
Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to
become more accurate at predicting outcomes without being explicitly programmed to do so.
Machine learning algorithms use historical data as input to predict new output values. Machine
learning algorithms allow computers to train on data inputs and use statistical analysis to output
values that fall within a specific range. Because of this, machine learning enables computers to
build models from sample data in order to automate decision-making processes based on data
inputs.
Machine learning is an application of artificial intelligence (AI) that provides systems the ability to
automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on the development of computer programs that can access
data and use it to learn for themselves.
1. Supervised Learning:
• Supervised learning, as the name indicates, involves a supervisor acting as a teacher.
• We teach or train the machine using data that is well labeled (that
is, some data is already tagged with the correct answer).
• Once the training is done, the machine is fed new test data, which is
used to test the accuracy of the learning algorithm.
2. Unsupervised Learning:
3. Reinforcement Learning:
• The machine learns through a system of rewards and punishments.
• The goal of the machine is to maximize the total reward and avoid punishment
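The supervised learning steps listed above (train on labeled data, then measure accuracy on new test data) can be illustrated with a small sketch. This is only an illustration: the 1-nearest-neighbour rule, the function names, and the toy data are assumptions chosen for brevity and are not taken from the course material.

```python
# Minimal supervised-learning sketch: 1-nearest-neighbour on made-up labelled data.

def predict(train, point):
    """Return the label of the training example closest to `point`."""
    nearest = min(
        train,
        key=lambda example: sum((a - b) ** 2 for a, b in zip(example[0], point)),
    )
    return nearest[1]

def accuracy(train, test):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(1 for features, label in test if predict(train, features) == label)
    return correct / len(test)

# Hypothetical labelled data: (features, label). The labels are the "correct answers".
train_data = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"), ((9.0, 8.5), "large"),
]
# New data the model has not seen, kept aside to test the algorithm.
test_data = [((1.2, 1.1), "small"), ((8.5, 9.2), "large"), ((2.0, 1.0), "small")]

test_accuracy = accuracy(train_data, test_data)
print(f"accuracy on unseen test data: {test_accuracy:.2f}")
```

The key point is that the test examples are never used for training, so the accuracy reflects how the model behaves on data it has not seen.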
2. IMPLEMENT MACHINE LEARNING PIPELINE
Section 1:
● In this first section, you examine how to think about turning a business
requirement into a machine learning problem.
● Business problems must be converted into an ML problem.
● Questions to ask while doing so include
○ Have we asked why enough times to get a solid business problem
statement and know why it is important?
○ Can you measure the outcome or impact if your solution is implemented?
● Most business problems fall into one of two categories:
○ Classification (binary or multi-class): Does the target belong to a class?
○ Regression: Can you predict a numerical value?
Section 2:
● In this section, you will explore some of the techniques and challenges
involved with collecting and securing the data that you need for machine
learning.
● The first step in solving machine learning problems is to obtain data that
is required to train your machine learning model.
● Extract, transform, and load (ETL) processes can be used to obtain data from multiple sources.
● Services such as AWS Glue can make it easy to obtain data from multiple data
stores.
● Security requirements should be understood, and they should be based on
both business needs and any regulatory requirements. You want to ensure
that your data is secure, only authorized users can access it, and it is
encrypted where possible.
Section 3:
● In this section, you learn how to evaluate data for machine learning. You
look at different data formats and types, and how you can visualize and
analyze the data before feature engineering
● The first step is to get your data into a format that can be used easily.
● pandas is a popular and useful Python library for working with data.
● Descriptive statistics help you gain insights into the data.
● Use visualizations to examine the dataset in more detail.
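As a rough illustration of the descriptive statistics mentioned above, the following sketch summarizes one numeric column using only the Python standard library. The column values and the `describe` helper are hypothetical; in practice, the pandas method `DataFrame.describe()` reports a similar summary for every numeric column.

```python
import statistics

# Hypothetical numeric column, for example house sizes in square metres.
values = [50.0, 62.0, 58.0, 71.0, 64.0, 300.0]

def describe(column):
    """Return basic descriptive statistics for a list of numbers."""
    ordered = sorted(column)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "std": statistics.stdev(ordered),  # sample standard deviation
        "min": ordered[0],
        "max": ordered[-1],
    }

summary = describe(values)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Here the mean is far above the median, which is a hint that the column contains an outlier worth inspecting before feature engineering.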
Section 4:
● In this section, you look at the key task of feature engineering, which
selects the columns of data that make the most impact in the model.
● Feature engineering involves selecting or extracting the best features for machine
learning.
● Preprocessing gives you better data to work with. Better data typically provides
better results.
● Preprocessing has two categories:
○ Converting data (to numerical values)
○ Cleaning up dirty data (removing missing data, cleaning outliers)
● How you deal with dirty data affects your model, so pay attention to missing data and
outliers.
● Develop a strategy for dirty data:
○ Replace or delete rows with missing data.
○ Delete, transform, or impute new values for outliers.
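The two preprocessing categories above (converting data to numerical values and cleaning dirty data) can be sketched as follows. This is a minimal sketch under stated assumptions: the rows, the column names, the median imputation strategy, and the clipping range of 0 to 100 are all hypothetical choices made for illustration.

```python
import statistics

# Hypothetical raw rows; None marks a missing value.
rows = [
    {"age": 25, "city": "Hyderabad"},
    {"age": None, "city": "Pune"},
    {"age": 31, "city": "Hyderabad"},
    {"age": 240, "city": "Delhi"},  # implausible value, treated as an outlier
]

def impute_median(rows, key):
    """Replace missing values in column `key` with the median of the known values."""
    known = [r[key] for r in rows if r[key] is not None]
    median = statistics.median(known)
    return [{**r, key: median if r[key] is None else r[key]} for r in rows]

def clip(rows, key, low, high):
    """Transform outliers by clipping column `key` to the range [low, high]."""
    return [{**r, key: min(max(r[key], low), high)} for r in rows]

def encode_categories(rows, key):
    """Convert a text column to integer codes (one code per distinct value)."""
    codes = {value: i for i, value in enumerate(sorted({r[key] for r in rows}))}
    return [{**r, key: codes[r[key]]} for r in rows], codes

clean = clip(impute_median(rows, "age"), "age", low=0, high=100)
clean, city_codes = encode_categories(clean, "city")
print(clean)
print(city_codes)
```

Deleting the affected rows instead of imputing or clipping is equally valid; which strategy works best depends on the dataset and the model.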
Section 5:
● In this section, you will learn how to select a model and train it
with the data that you preprocessed.
● Split data into training, validation, and testing sets to help you validate the model's
accuracy.
● K-fold cross-validation can help with smaller datasets.
● Two key algorithms for supervised learning are XGBoost and linear learner.
● Use k-means for unsupervised learning.
● Use Amazon SageMaker training jobs to train models.
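The data-splitting ideas above can be sketched in plain Python. The function names, the 70/10/20 split, and the fixed random seed are assumptions made for illustration; libraries such as scikit-learn provide ready-made equivalents.

```python
import random

def train_val_test_split(items, val_frac=0.1, test_frac=0.2, seed=0):
    """Shuffle a copy of `items`, then cut it into train, validation, and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def k_fold(items, k):
    """Yield (train, validation) pairs; each item is used for validation exactly once."""
    for i in range(k):
        validation = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, validation

data = list(range(20))  # hypothetical record ids
train, val, test = train_val_test_split(data)
folds = list(k_fold(data, k=5))
print(len(train), len(val), len(test))  # sizes of the three parts
print(len(folds))                       # number of folds
```

With k-fold cross-validation every record contributes to both training and validation across the folds, which is why it helps when the dataset is too small to spare a large hold-out set.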
Section 6:
● In this section, you will learn how you can deploy your trained model
for consumption by applications.
● You can deploy your trained model by using Amazon SageMaker to
handle API calls from applications, or to perform predictions by using a
batch transformation.
● The goal of your model is to generate predictions to answer the business
problem. Be sure that your model can generate good results before you
deploy it to production.
● Use single-model endpoints for simple use cases, and use multi-model
endpoint support to save resources when you have multiple models to deploy.
Section 7:
● In this section, you will learn how you can evaluate your model’s success in
predicting results.
● To evaluate the model, you must have data that the model hasn’t seen,
through either a hold-out set or by using k-fold cross validation.
● Different machine learning models use different metrics.
○ Classification can use a confusion matrix and the AUC-ROC, which can
be generated from confusion matrices taken at different thresholds.
○ Regression can use mean squared error (MSE).
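The two kinds of metric named above can be computed by hand on a small example. This is a sketch under assumptions: the labels, predictions, and function names are made up, and the confusion matrix covers only the binary case with 1 as the positive class.

```python
def confusion_matrix(actual, predicted):
    """Count outcomes for a binary problem where the positive class is 1."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),  # true positives
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),  # true negatives
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),  # false positives
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),  # false negatives
    }

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical hold-out labels and model predictions for a classifier.
labels = [1, 0, 1, 1, 0, 0]
preds = [1, 0, 0, 1, 1, 0]
cm = confusion_matrix(labels, preds)
accuracy = (cm["tp"] + cm["tn"]) / len(labels)

# Hypothetical actual and predicted values for a regression model.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])

print(cm, accuracy, mse)
```

Recomputing the confusion matrix while sweeping the decision threshold gives the true-positive and false-positive rates that trace out the ROC curve.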
Section 8:
● In this section, you will learn how to tune the model’s hyperparameters
to improve model performance
● Model tuning is important to find the best solution to your business problem.
● Hyperparameters can be tuned for the model, optimizer, and data.
● Amazon SageMaker can perform automatic hyperparameter tuning.
● Overall model development can be accelerated by using Autopilot.
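To make the idea of hyperparameter tuning concrete, here is a minimal exhaustive grid search. It is not how Amazon SageMaker's automatic tuning is implemented; the `validation_error` function is a hypothetical stand-in for training a model and measuring its validation error, and the hyperparameter names and values are assumptions.

```python
from itertools import product

def validation_error(learning_rate, depth):
    """Stand-in for 'train a model and measure its validation error'.

    Hypothetical formula whose minimum is at learning_rate=0.1 and depth=4.
    """
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 4) ** 2

# Hypothetical search space: every combination below is tried once.
search_space = {"learning_rate": [0.01, 0.1, 0.3], "depth": [2, 4, 6]}

def grid_search(space, objective):
    """Evaluate every combination and return the one with the lowest score."""
    names = list(space)
    best_params, best_score = None, float("inf")
    for combo in product(*(space[name] for name in names)):
        params = dict(zip(names, combo))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = grid_search(search_space, validation_error)
print(best_params, best_score)
```

Grid search becomes expensive as the number of hyperparameters grows, which is one reason managed tuning services offer more efficient search strategies.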
2.3 Summary:
3. FORECASTING
You can use forecasting for a range of domains. Some of the more common applications
include:
● Marketing applications, such as sales forecasting or demand projections.
● Inventory management systems to anticipate required inventory levels.
Often, this type of forecast includes information about delivery times.
● Energy consumption to determine when and where energy is needed.
● Weather forecasting systems for governments, and commercial applications such as
agriculture.
3.3 Using Amazon Forecast:
● You can use Amazon Forecast to train and use a model for time series data.
● There are specific schemas defined for domains such as retail and EC2
capacity planning, or you can use a custom schema.
● You need to supply at least the time series data, but can also provide
metadata and related data to add more information to the model.
● As with most supervised machine learning problems, your data is split into
training and testing data, but this split takes into account the time element.
● Use RMSE and wQuantileLoss metrics to evaluate the accuracy of the model.
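The time-aware split and the two metrics named above can be sketched in plain Python. The series, the forecast values, and the function names are hypothetical, and the weighted quantile loss follows the commonly documented definition (twice the summed quantile loss divided by the sum of absolute actual values), which is stated here as an assumption rather than taken from this report.

```python
import math

def time_split(series, holdout):
    """Keep chronological order: the last `holdout` points form the test set.

    `holdout` must be at least 1.
    """
    return series[:-holdout], series[-holdout:]

def rmse(actual, predicted):
    """Root mean squared error of a point forecast."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def weighted_quantile_loss(actual, quantile_forecast, tau):
    """Twice the summed quantile (pinball) loss at `tau`, divided by sum(|actual|)."""
    loss = sum(
        tau * max(a - q, 0.0) + (1.0 - tau) * max(q - a, 0.0)
        for a, q in zip(actual, quantile_forecast)
    )
    return 2.0 * loss / sum(abs(a) for a in actual)

demand = [10, 12, 13, 15, 14, 16, 18, 20]  # hypothetical daily demand, oldest first
train, test = time_split(demand, holdout=2)
forecast = [17.0, 19.0]                    # hypothetical forecast for the held-out days

error = rmse(test, forecast)
wql = weighted_quantile_loss(test, forecast, tau=0.5)
print(train, test, error, wql)
```

Unlike a random split, the test set here always lies after the training data in time, so the model is never evaluated on points that precede what it was trained on.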
Summary:
In summary, in this module you learned how to:
● Describe the business problems solved by using Amazon Forecast
● Describe the challenges of working with time series data
● List the steps that are required to create a forecast by using Amazon Forecast
● Use Amazon Forecast to make a prediction
4. COMPUTER VISION
Some of the primary use cases for computer vision include these examples.
● Public safety and home security: Computer vision with image and facial
recognition can help to quickly identify unlawful entries or persons of
interest. This process can result in safer communities and a more effective
way of deterring crimes.
● Authentication and enhanced computer-human interaction: Enhanced
human-computer interaction can improve customer satisfaction. Examples include
products that are based on customer sentiment analysis in retail outlets or faster
banking services with quick authentication that is based on customer identity
and preferences.
● Content management and analysis: Millions of images are added every
day to media and social channels. The use of computer vision
technologies, such as metadata extraction and image classification, can
improve efficiency and revenue opportunities.
● Autonomous driving: By using computer-vision technologies, auto
manufacturers can provide improved and safer self-driving car navigation,
which can help realize autonomous driving and make it a reliable
transportation option.
● Medical imaging: Medical image analysis with computer vision can
improve the accuracy and speed of a patient's medical diagnosis, which can
result in better treatment outcomes and life expectancy.
● Manufacturing process control: Well-trained computer vision that is
incorporated into robotics can improve quality assurance and
operational efficiencies in manufacturing applications. This process can
result in more reliable and cost-effective products.
● Image analysis includes object classification, detection, and segmentation
● Video analysis includes instance tracking, action recognition, and motion
estimation
Amazon SageMaker Ground Truth Plus
● With SageMaker Ground Truth Plus, you can easily create high-quality
training datasets without having to build labeling applications or manage
labeling workforces on your own.
● Amazon SageMaker Ground Truth Plus helps reduce data labeling costs by up to
40%.
● Amazon SageMaker Ground Truth Plus provides an expert workforce
that is trained on ML tasks and can help meet your data security, privacy,
and compliance requirements.
● You simply upload your data, and Amazon SageMaker Ground Truth Plus then
creates data labeling workflows and manages them on your behalf.
● The image below depicts the workflow of Amazon SageMaker Ground Truth Plus:
Amazon SageMaker Ground Truth
● If you want the flexibility to build and manage your data labeling workflows
and manage your data labeling workforce, you can use Amazon SageMaker
Ground Truth.
● SageMaker Ground Truth is a data labeling service that makes it easy to
label data and gives you the option to use human annotators through
Amazon Mechanical Turk, third-party vendors, or your private workforce.
● The image below depicts the workflow of Amazon SageMaker Ground Truth:
5. NATURAL LANGUAGE PROCESSING
NLP is a broad term for a general set of business or computational problems that you
can solve with machine learning (ML). NLP systems predate ML. Two examples are
speech-to-text on your old cell phone and screen readers. Many NLP systems now
use some form of machine learning. NLP considers the hierarchical structure of
language. Words are at the lowest layer of the hierarchy. A group of words makes a
phrase. The next level up consists of phrases, which make a sentence, and ultimately,
sentences convey ideas.
You can apply NLP to a wide range of problems. Some of the more common
applications include:
● Search applications (such as Google and Bing)
● Human-machine interfaces (such as Alexa)
● Sentiment analysis for marketing or political campaigns
● Social research that is based on media analysis
● Chatbots to mimic human speech in applications
Some of the more common use cases for Amazon Transcribe include:
● Medical transcription – Medical professionals can record their notes, and
Amazon Transcribe can capture their spoken notes as text.
● Video subtitles – Video production organizations can generate subtitles
automatically from video. It can also be done in real time for a live feed
to add closed captioning (CC).
● Streaming content labeling – Media companies can capture and label
content, and then feed the content into Amazon Comprehend for further
analysis.
● Customer call center monitoring – Companies can record customer service
or sales calls, and then analyze the results for training or strategic
opportunities.
Amazon Polly:
Some of the more common use cases for Amazon Polly include:
● News service production – Major news companies use Amazon Polly
to generate vocal content directly from their written stories.
● Language training systems – Language training companies use
Amazon Polly to create systems for learning a new language.
● Navigation systems – Amazon Polly is embedded in mapping
application programming interfaces (APIs) so that developers can add
voice to their geo-based applications.
● Animation production – Animators use Amazon Polly to add voices to their
characters.
Amazon Translate
Amazon Translate can create real-time translations between languages. You can create systems
that read documents in one language and then render or store them in another
language. You can also use Amazon Translate as part of a document analysis system.
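As a rough sketch of how an application might call Amazon Translate, the helper below wraps the `translate_text` operation of the AWS SDK for Python (boto3). The helper name, the default language codes, and the `FakeTranslateClient` class are assumptions added so that the example runs offline without AWS credentials; the fake client does not translate anything.

```python
def translate(client, text, source="en", target="hi"):
    """Translate `text` with a Translate-style client and return the translated string."""
    response = client.translate_text(
        Text=text,
        SourceLanguageCode=source,
        TargetLanguageCode=target,
    )
    return response["TranslatedText"]

class FakeTranslateClient:
    """Hypothetical offline stand-in; it only tags the text with the target language."""

    def translate_text(self, Text, SourceLanguageCode, TargetLanguageCode):
        return {
            "TranslatedText": f"[{TargetLanguageCode}] {Text}",
            "SourceLanguageCode": SourceLanguageCode,
            "TargetLanguageCode": TargetLanguageCode,
        }

result = translate(FakeTranslateClient(), "Welcome to our website", target="fr")
print(result)

# With AWS credentials configured, the same helper would take a real client:
#   import boto3
#   translate(boto3.client("translate"), "Welcome to our website", target="fr")
```

Passing the client in as a parameter keeps the helper easy to test and lets the same code serve the website, chatbot, and document use cases described in this section.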
Some of the more common use cases for Amazon Translate include:
● International websites – You can use Amazon Translate to quickly globalize your
websites.
● Software localization – Localization is a major cost for all software that is
aimed at a global audience. Amazon Translate can decrease software
development time and significantly reduce costs for localizing software.
● Multilingual chatbots – Chatbots are used to create a more human-like interface
to applications. With Amazon Translate, you can create a chatbot that speaks multiple
languages.
● International media management – Companies that manage media for a
global audience use Amazon Translate to reduce their costs for localization.
Amazon Comprehend
Amazon Comprehend automates many of the NLP use cases that are reviewed in this module
by implementing many common NLP techniques. You can extract key entities, perform
sentiment analysis, and tag words with parts of speech.
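The sentiment analysis capability described above could be called from Python roughly as follows. The helper wraps the `detect_sentiment` operation of the AWS SDK for Python (boto3); the helper name, the sample sentence, and the `FakeComprehendClient` class with its canned response are assumptions so that the sketch runs offline without AWS credentials.

```python
def detect_sentiment(client, text, language_code="en"):
    """Return the overall sentiment label and the per-class confidence scores."""
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    return response["Sentiment"], response["SentimentScore"]

class FakeComprehendClient:
    """Hypothetical offline stand-in that returns a canned, made-up response."""

    def detect_sentiment(self, Text, LanguageCode):
        return {
            "Sentiment": "POSITIVE",
            "SentimentScore": {
                "Positive": 0.98, "Negative": 0.01, "Neutral": 0.01, "Mixed": 0.0,
            },
        }

sentiment, scores = detect_sentiment(
    FakeComprehendClient(), "The internship was very helpful."
)
print(sentiment, scores)

# With AWS credentials configured, the same helper would take a real client:
#   import boto3
#   detect_sentiment(boto3.client("comprehend"), "The internship was very helpful.")
```

The scores in the fake response are illustrative only; a real call returns confidence values computed by the service.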
Some of the more common use cases for Amazon Comprehend include:
● Analysis of legal and medical documents – Legal, insurance, and medical
organizations have used Amazon Comprehend to perform many of the NLP
functions that you learned about in this module.
● Financial fraud detection – Banking, financial, and other institutions have used
Amazon Comprehend to examine very large datasets of financial transactions to
uncover fraud and look for patterns of illegal transactions.
● Large-scale mobile app analysis – Developers of mobile apps can use Amazon
Comprehend to look for patterns in how their apps are used so they can design
improvements.
● Content management – Media and other content companies can use
Amazon Comprehend to tag content for analysis and management purposes.
Amazon Lex
Amazon Lex can create a human-like interface for your application. It enables you to use
the same conversational engine that powers Amazon Alexa. You can automatically increase
capacity by creating AWS Lambda functions to scale on demand. In addition, you
can store log files of the conversations for further analysis.
Some of the more common use cases for Amazon Lex include:
● Building frontend interfaces for inventory management and sales –
Voice interfaces are becoming more common. Companies use Amazon
Lex to add chatbots to their inventory and sales applications.
● Developing interactive assistants – By combining Amazon Lex with
other ML services, customers create more sophisticated assistants for
many different industries.
● Creating customer service interfaces – Human-like voice applications are
quickly becoming the norm for customer service applications. Amazon Lex
can reduce the development time and increase the quality of these chatbots.
● Querying databases with a human-like language – Amazon Lex is combined
with other AWS database services to create sophisticated data analysis
applications with a human-like language interface.
OUTPUT SCREENS
Fig 8: Creating an auto scaling group
CONCLUSION:
I’ve also learned how to implement a machine learning pipeline. This included how to:
effectiveness
What I have learned from this internship is how to use managed Amazon ML
services for forecasting, computer vision, and natural language processing, and I am
now able to:
● Describe the business problems that Amazon Forecast solves
● Describe the challenges of working with time series data
● List the steps that are required to forecast by using Amazon Forecast
● Use Amazon Forecast to make a prediction
● Describe the computer vision use cases
● Describe the managed Amazon ML services for image and video analysis
● List the steps that are required to prepare a custom dataset for object detection
● Describe how Amazon SageMaker Ground Truth can be used to prepare a custom
dataset
● Use Amazon Rekognition to perform facial detection
● Describe the natural language processing (NLP) use cases that are solved
by using managed Amazon ML services
● Describe the managed Amazon ML services available for NLP