
MACHINE LEARNING WITH PYTHON

UNIT-I

INTRODUCTION TO MACHINE LEARNING WITH PYTHON

[Figure: Artificial Intelligence, Machine Learning, and Deep Learning shown as nested fields]
ML - Basic Terminology
Machine Learning Relationships
• Machine learning systems use relationships between inputs to
produce predictions.

• In algebra, a relationship is often written as y = ax + b:

• y is the label we want to predict
• a is the slope of the line
• x is the input value
• b is the intercept

• In ML, the same relationship is written as y = b + wx:

• y is the label we want to predict
• w is the weight (the slope)
• x is the feature (the input value)
• b is the bias
ML - Basic Terminology
Machine Learning Labels
• In Machine Learning terminology, the label is the thing we want
to predict.
• It is like the y in a linear graph:

  Algebra:          y = ax + b
  Machine Learning: y = b + wx
ML - Basic Terminology
Machine Learning Features
• In Machine Learning terminology, the features are the inputs.
• They are like the x values in a linear graph:

  Algebra:          y = ax + b
  Machine Learning: y = b + wx

• Sometimes there are many features (input values), each with its own
weight:
• y = b + w1x1 + w2x2 + w3x3 + w4x4 (a short Python sketch of this
weighted sum follows below)
• ML models:
A machine learning (ML) model is a type of artificial intelligence (AI)
model that uses a mathematical formula to make predictions about
future events. ML models are trained on a set of data and then used to
make predictions about new data. They can recognize patterns in data
or make decisions from an unseen dataset.
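Example (a minimal sketch only; the weights, bias, and feature values below are made up for illustration):

import numpy as np

# hypothetical weights w1..w4 and bias, chosen only for illustration
w = np.array([0.5, -1.2, 0.8, 2.0])
b = 0.1
# one sample with four feature values (also made up)
x = np.array([1.0, 2.0, 3.0, 4.0])
# prediction = bias + weighted sum of the features
y = b + np.dot(w, x)
print(y)  # 0.1 + 0.5*1 - 1.2*2 + 0.8*3 + 2.0*4 = 8.6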
ML - Basic Terminology
Evaluation:
• Evaluation is the process of measuring the performance or accuracy
of a machine learning model on a given data set. Evaluation can be
done using different metrics, such as error rate, precision, recall, F1-score, etc.
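Example (a minimal sketch of computing such metrics with scikit-learn; the labels below are invented for illustration):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# hypothetical true and predicted labels for a binary problem
y_true = [0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]
print("Accuracy: {}".format(accuracy_score(y_true, y_pred)))
print("Precision: {}".format(precision_score(y_true, y_pred)))
print("Recall: {}".format(recall_score(y_true, y_pred)))
print("F1-score: {}".format(f1_score(y_true, y_pred)))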
Optimization:
• Optimization is the process of finding the best or optimal parameters
or settings for a machine learning model that minimize the error or
maximize the accuracy on the training data. Optimization can be
done using different methods, such as gradient descent, stochastic
gradient descent, genetic algorithms, etc.
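Example (a sketch of gradient descent, one of the methods named above; it fits y = b + wx to a few made-up points by repeatedly stepping the parameters against the gradient of the mean squared error):

import numpy as np

# made-up data that roughly follows y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

w, b = 0.0, 0.0
learning_rate = 0.05
for _ in range(1000):
    error = (b + w * x) - y
    # gradients of the mean squared error with respect to w and b
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("w = {:.2f}, b = {:.2f}".format(w, b))  # close to 2 and 1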
ML - Types
1. Supervised Learning
• Supervised learning applies when a machine has sample data,
i.e., inputs together with correctly labelled outputs. The correct
labels are used to check the model's predictions during training.
• The supervised learning technique helps us predict future events
with the help of past experience and labelled examples.
• Initially, it analyses the known training dataset, and then it
derives an inferred function that makes predictions about output
values.
• It can also detect errors during the learning process and
correct them through its algorithms.
--Example: Let's assume we have a set of images tagged as ''dog''. A
machine learning algorithm is trained with these dog images so it can
easily distinguish whether an image is a dog or not.
--Determining whether a tumor is benign based on a medical image
--Detecting fraudulent activity in credit card transactions
--Identifying the zip code from handwritten digits on an envelope

Supervised learning problems fall into two categories:
• Regression: A regression problem is when the output variable is a
real value, such as "dollars" or "weight".
• Classification: A classification problem is when the output variable is
a category, such as "red" or "blue", "disease" or "no disease".
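Example (a minimal sketch contrasting the two problem types with scikit-learn estimators; the tiny datasets below are invented for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1], [2], [3], [4]])

# regression: the output is a real value (e.g., dollars)
y_real = np.array([10.0, 20.5, 29.8, 40.2])
reg = LinearRegression().fit(X, y_real)
print("Regression prediction:", reg.predict([[5]]))

# classification: the output is a category (e.g., disease / no disease)
y_cat = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y_cat)
print("Classification prediction:", clf.predict([[5]]))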
ML - Types
2. Unsupervised Learning
• In unsupervised learning, a machine is trained with input
samples only; the outputs are not known.
• The training data is neither classified nor labelled; hence, the
machine may not always produce output as correct as in
supervised learning.
• Although unsupervised learning is less common in practical business
settings, it helps in exploring the data and can draw inferences from
datasets to describe hidden structures in unlabelled data.
Example: Let's assume a machine is trained on a set of documents
belonging to different categories (Type A, B, and C), and we have to organize
them into appropriate groups. Because the machine is provided only with
input samples and no outputs, it can organize these documents into type A,
type B, and type C groups, but there is no guarantee that the grouping
is correct.
--Identifying topics in a set of blog posts
--Segmenting customers into groups with similar preferences
--Detecting abnormal access patterns to a website

• Unsupervised learning is classified into two categories of algorithms:

• Clustering: A clustering problem is where you want to discover the
inherent groupings in the data, such as grouping customers by purchasing
behavior.
• Association: An association rule learning problem is where you want to
discover rules that describe large portions of your data, such as the rule
that people who buy X also tend to buy Y.
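Example (a minimal clustering sketch using scikit-learn's KMeans, grouping made-up 2D points into two clusters without any labels):

import numpy as np
from sklearn.cluster import KMeans

# unlabeled 2D points forming two rough groups (invented for illustration)
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", kmeans.labels_)
print("Cluster centers:\n", kmeans.cluster_centers_)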
ML - Types
3. Reinforcement Learning

• Reinforcement Learning is a feedback-based machine learning
technique. In this type of learning, agents (computer programs)
explore the environment, perform actions, and, on the basis of
those actions, receive rewards as feedback.

• For each good action, they get a positive reward, and for each bad
action, they get a negative reward. The goal of a reinforcement
learning agent is to maximize the positive rewards. Since there is no
labeled data, the agent is bound to learn from its experience alone.
• --An example of reinforcement learning is teaching a computer
program to play a video game. The program learns by trying different
actions, receiving points for good moves and losing points for
mistakes.
• --RL can help cars navigate complex environments, making self-driving
technology safer and more reliable.
• --Traffic signal control: RL can be used to control traffic signals in
complex urban networks.
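Example (a sketch of the reward-driven idea using tabular Q-learning on a toy one-dimensional world; the states, rewards, and hyperparameters here are invented purely for illustration):

import numpy as np

n_states, n_actions = 5, 2            # states 0..4; actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))   # table of learned action values
alpha, gamma, epsilon = 0.1, 0.9, 0.2

rng = np.random.default_rng(0)
for episode in range(200):
    state = 0
    while state != n_states - 1:      # reaching the last state ends the episode
        # explore randomly sometimes, otherwise take the best-known action
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state = max(0, state - 1) if action == 0 else state + 1
        # positive reward at the goal, small negative reward everywhere else
        reward = 1.0 if next_state == n_states - 1 else -0.1
        # Q-learning update: move Q toward reward plus discounted future value
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(Q)  # the "right" action should dominate in every state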
ML - Types
4. Semi-supervised Learning

• Semi-supervised learning is an intermediate technique between
supervised and unsupervised learning.
• It operates on datasets that contain a few labelled examples along
with a larger amount of unlabeled data.
• Because labels are costly to obtain, this reduces the cost of building
the machine learning model while still giving it some supervision.
• Further, it can also increase the accuracy and performance of the
machine learning model.
--Semi-supervised learning helps data scientists overcome the
drawbacks of purely supervised or unsupervised learning. Speech analysis,
web content classification, protein sequence classification, and text
document classification are some important applications of semi-
supervised learning.
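Example (a minimal sketch using scikit-learn's SelfTrainingClassifier, where unlabeled samples are marked with -1; the data is invented for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X = np.array([[1.0], [1.2], [0.9], [7.8], [8.1], [8.0]])
# only two samples are labeled; -1 marks the unlabeled ones
y = np.array([0, -1, -1, 1, -1, -1])
model = SelfTrainingClassifier(LogisticRegression()).fit(X, y)
print(model.predict([[1.1], [7.9]]))  # expected: [0 1]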
Applications of Machine Learning
• Emotion analysis

• Sentiment analysis

• Error detection and prevention

• Weather forecasting and prediction

• Stock market analysis and forecasting

• Speech synthesis
Applications of Machine Learning

• Speech recognition

• Customer segmentation

• Object recognition

• Fraud detection

• Fraud prevention

• Recommendation of products to customers in online shopping.


Applications of Machine Learning
1. Emotion Analysis
Objective: Determine the emotional tone or sentiment behind a piece of text or speech.
Techniques: Natural Language Processing (NLP), sentiment analysis models, emotion detection
algorithms.
Applications: Customer feedback analysis, social media monitoring, mental health assessments.
2. Sentiment Analysis
Objective: Assess the sentiment expressed in text, such as positive, negative, or neutral.
Techniques: Text classification, machine learning algorithms (e.g., Naive Bayes, SVM), deep learning
(e.g., LSTM, BERT).
Applications: Market research, brand monitoring, user reviews analysis.
3. Error Detection and Prevention
Objective: Identify and prevent errors in data, systems, or processes.
Techniques: Anomaly detection, error-correcting codes, predictive maintenance.
Applications: Software quality assurance, fraud detection, system monitoring.
Applications of Machine Learning
4. Weather Forecasting and Prediction
Objective: Predict weather conditions based on historical and real-time data.
Techniques: Numerical weather prediction models, machine learning algorithms, statistical
methods.
Applications: Daily weather forecasts, climate research, disaster preparedness.
5. Stock Market Analysis and Forecasting
Objective: Analyze stock market trends and predict future stock prices or trends.
Techniques: Time series analysis, machine learning models (e.g., ARIMA, LSTM), technical analysis
indicators.
Applications: Investment strategies, financial planning, risk management.
6. Speech Synthesis
Objective: Convert written text into spoken words.
Techniques: Text-to-Speech (TTS) systems, neural networks (e.g., WaveNet), prosody modeling.
Applications: Virtual assistants, accessibility tools, automated customer service.
Applications of Machine Learning
7. Speech Recognition
Objective: Convert spoken language into text.
Techniques: Automatic Speech Recognition (ASR) systems, deep learning models (e.g., CNNs, RNNs).
Applications: Voice commands, transcription services, hands-free control.
8. Customer Segmentation
Objective: Divide customers into groups based on similarities in behavior or characteristics.
Techniques: Clustering algorithms (e.g., K-means, DBSCAN), dimensionality reduction (e.g., PCA).
Applications: Targeted marketing, personalized recommendations, market research.
9. Object Recognition
Objective: Identify and classify objects within images or video frames.
Techniques: Convolutional Neural Networks (CNNs), object detection frameworks (e.g., YOLO, Faster
R-CNN).
Applications: Image search engines, autonomous vehicles, security systems.
Applications of Machine Learning
10. Fraud Detection
Objective: Identify fraudulent activities or anomalies.
Techniques: Anomaly detection, machine learning models (e.g., decision trees, random forests),
rule-based systems.
Applications: Financial transactions, insurance claims, identity theft prevention.
11. Fraud Prevention
Objective: Implement measures to prevent fraudulent activities.
Techniques: Risk assessment models, real-time monitoring systems, security protocols.
Applications: Transaction monitoring, access control, compliance enforcement.
12. Recommendation of Products to Customers in Online Shopping
Objective: Suggest relevant products to users based on their preferences and behavior.
Techniques: Collaborative filtering, content-based filtering, hybrid recommendation systems.
Applications: E-commerce websites, personalized marketing, user experience enhancement.
Why Python?
• Python has become the lingua franca for many data science
applications.
• It combines the power of general-purpose programming languages
with the ease of use of domain-specific scripting languages like
MATLAB or R.
• Python has libraries for data loading, visualization, statistics, natural
language processing, image processing, and more.
• One of the main advantages of using Python is the ability to interact
directly with the code, using a terminal or other tools like the
Jupyter Notebook.
Why Python?
• --It is an interpreted language: the source code of a Python
program is first converted into bytecode and then executed by the
Python virtual machine.
• --Machine learning and data analysis are fundamentally iterative
processes, in which the data drives the analysis. It is essential for
these processes to have tools that allow quick iteration and easy
interaction.
• --As a general-purpose programming language, Python also allows for
the creation of complex graphical user interfaces (GUIs) and web
services, and for integration into existing systems.
Installing Python
• To work in Python, we must first install it. You can perform the
installation of Python in either of the following two ways:
• Installing Python individually
• Using a pre-packaged Python distribution: Anaconda

• Installing Python Individually

If you want to install Python on your computer, then you need to
download only the binary code applicable for your platform. Python
distributions are available for the Windows, Linux and Mac platforms.
Installing Python
On Windows platform
• With the help of following steps, we can install Python on Windows
platform:
• First, go to https://www.python.org/downloads/.
• Next, click on the link for Windows installer python-XYZ.msi file. Here
XYZ is the version we wish to install.
• Now, we must run the file that is downloaded. It will take us to the
Python install wizard, which is easy to use. Now, accept the default
settings and wait until the install is finished.
Pre-packaged Python Distribution:
Anaconda
• Anaconda is a packaged distribution of Python which includes all the libraries
widely used in data science.
• We can follow these steps to set up a Python environment using Anaconda:
• Step 1: First, download the required installation package from the
Anaconda distribution. The link for the same is
https://www.anaconda.com/distribution/. You can choose from Windows, Mac
and Linux OS as per your requirement.
• Step 2: Next, select the Python version you want to install on your machine. The
latest Python version is 3.12.4. There you will get options for both the 64-bit and
32-bit graphical installers.
• Step 3: After selecting the OS and Python version, it will download the Anaconda
installer to your computer. Now, double-click the file and the installer will install
the Anaconda package.
Essential Libraries and Tools
Jupyter Notebook

• The Jupyter Notebook is an interactive environment for running code in
the browser.
• It is a great tool for exploratory data analysis and is widely used by data
scientists.
• While the Jupyter Notebook supports many programming languages,
we only need the Python support.
• The Jupyter Notebook makes it easy to incorporate code, text, and
images.
scikit-learn
• scikit-learn is an open source project, meaning that it is free to use
and distribute, and anyone can easily obtain the source code to see
what is going on behind the scenes.
• The scikit-learn project is constantly being developed and improved,
and it has a very active user community.
• It contains a number of state-of-the-art machine learning algorithms,
as well as comprehensive documentation about each algorithm.
• scikit-learn is a very popular tool, and the most prominent Python
library for machine learning.
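A quick way to check that scikit-learn is installed, and which version:

import sklearn
print("scikit-learn version: {}".format(sklearn.__version__))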
NumPy

• NumPy is one of the fundamental packages for scientific
computing in Python.
• It contains functionality for multidimensional arrays,
high-level mathematical functions such as linear algebra
operations and the Fourier transform, and
pseudorandom number generators.
• In scikit-learn, the NumPy array is the fundamental data
structure. scikit-learn takes in data in the form of NumPy
arrays. Any data you're using will have to be converted to
a NumPy array.
• The core functionality of NumPy is the ndarray class, a
multidimensional (n-dimensional) array. All elements of
the array must be of the same type.

Example:
import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6]])
print("x:\n{}".format(x))
SciPy

• SciPy is a collection of functions for scientific
computing in Python. It provides, among other
functionality, advanced linear algebra routines,
mathematical function optimization, signal
processing, special mathematical functions, and
statistical distributions.
• scikit-learn draws from SciPy's collection of functions
for implementing its algorithms.
• The most important part of SciPy for us is
scipy.sparse: this provides sparse matrices, which are
another representation that is used for data in
scikit-learn.
• Sparse matrices are used whenever we want to store
a 2D array that contains mostly zeros.

Example:
import numpy as np
from scipy import sparse
# Create a 2D NumPy array with a diagonal of ones, and zeros everywhere else
eye = np.eye(4)
print("NumPy array:\n{}".format(eye))
# Convert the NumPy array to a SciPy sparse matrix in CSR format
sparse_matrix = sparse.csr_matrix(eye)
print("\nSciPy sparse CSR matrix:\n{}".format(sparse_matrix))
matplotlib

• matplotlib is the primary scientific plotting
library in Python.
• It provides functions for making publication-
quality visualizations such as line charts,
histograms, scatter plots, and so on.
• Visualizing your data and different aspects of
your analysis can give you important insights, and
we will be using matplotlib for all our
visualizations.
• When working inside the Jupyter Notebook, you
can show figures directly in the browser by using
the %matplotlib notebook and %matplotlib inline
commands.

Example:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# Generate a sequence of numbers from -10 to 10 with 100 steps in between
x = np.linspace(-10, 10, 100)
# Create a second array using sine
y = np.sin(x)
# The plot function makes a line chart of one array against another
plt.plot(x, y, marker="x")
A First Application: Classifying Iris Species
• Our goal is to build a machine learning model that can learn from the
measurements of these irises whose species is known, so that we can
predict the species for a new iris.
A First Application: Classifying Iris Species
• Because we have measurements for which we know the correct
species of iris, this is a supervised learning problem.
• In this problem, we want to predict one of several options (the
species of iris). This is an example of a classification problem.
• The possible outputs (different species of irises) are called classes.
Every iris in the dataset belongs to one of three classes, so this
problem is a three-class classification problem.
• The desired output for a single data point (an iris) is the species of
this flower. For a particular data point, the species it belongs to is
called its label.
Meet the Data
• The data we will use for this example is the Iris dataset, a classical
dataset in machine learning and statistics. It is included in scikit-learn in
the datasets module.
• We can load it by calling the load_iris function:
In[10]:
from sklearn.datasets import load_iris
iris_dataset = load_iris()
• The iris_dataset object that is returned by load_iris is a Bunch object, which is
very similar to a dictionary. It contains keys and values:
In[11]:
print("Keys of iris_dataset: \n{}".format(iris_dataset.keys()))
• The value of the key target_names is an array of strings, containing
the species of flower that we want to predict:

In[13]:
print("Target names: {}".format(iris_dataset['target_names']))
• The value of feature_names is a list of strings, giving the description
of each feature:

In[14]:
print("Feature names:
\n{}".format(iris_dataset['feature_names']))
• The data itself is contained in the target and data fields. data contains
the numeric measurements of sepal length, sepal width, petal length,
and petal width in a NumPy array:
In[15]:
print("Type of data: {}".format(type(iris_dataset['data'])))

• The rows in the data array correspond to flowers, while the columns
represent the four measurements that were taken for each flower:
In[16]:
print("Shape of data: {}".format(iris_dataset['data'].shape))
• We see that the array contains measurements for 150 different flowers.
Remember that the individual items are called samples in machine
learning, and their properties are called features. Here are the feature
values for the first five samples:

In[17]:
print("First five columns of
data:\n{}".format(iris_dataset['data'][:5]))
• The target array contains the species of each of the flowers that were
measured, also as a NumPy array:
In[18]:
print("Type of target: {}".format(type(iris_dataset['target'])))

target is a one-dimensional array, with one entry per flower:

In[19]:
print("Shape of target: {}".format(iris_dataset['target'].shape))
• The species are encoded as integers from 0 to 2:
In[20]:
print("Target:\n{}".format(iris_dataset['target']))

The meanings of the numbers are given by the iris_dataset['target_names'] array:
0 means setosa, 1 means versicolor, and 2 means virginica.
Measuring Success: Training and Testing Data
• To assess the model’s performance, we show it new data (data that it
hasn’t seen before) for which we have labels.
• This is usually done by splitting the labeled data we have collected
(here, our 150 flower measurements) into two parts.
• One part of the data is used to build our machine learning model, and
is called the training data or training set.
• The rest of the data will be used to assess how well the model works;
this is called the test data, test set, or hold-out set.
• scikit-learn contains a function that shuffles the dataset and splits it
for you: the train_test_split function. This function extracts 75% of
the rows in the data as the training set, together with the
corresponding labels for this data. The remaining 25% of the data,
together with the remaining labels, is declared as the test set.

• In[21]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
iris_dataset['data'], iris_dataset['target'], random_state=0)
• The output of the train_test_split function is X_train, X_test, y_train, and
y_test, which are all NumPy arrays. X_train contains 75% of the rows of the
dataset, and X_test contains the remaining 25%:
In[22]:
print("X_train shape: {}".format(X_train.shape))
print("y_train shape: {}".format(y_train.shape))

In[23]:
print("X_test shape: {}".format(X_test.shape))
print("y_test shape: {}".format(y_test.shape))
Look at Your Data
In[24]:
import pandas as pd
import mglearn
# create dataframe from data in X_train
# label the columns using the strings in iris_dataset.feature_names
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)
# create a scatter matrix from the dataframe, color by y_train
# (mglearn.cm3 is a three-color colormap for the three classes)
grr = pd.plotting.scatter_matrix(iris_dataframe, c=y_train, figsize=(15, 15),
                                 marker='o', hist_kwds={'bins': 20}, s=60,
                                 alpha=.8, cmap=mglearn.cm3)
Building the Model: k-Nearest Neighbors
In[25]:
from sklearn.neighbors import KNeighborsClassifier
# instantiate the model, considering only the single nearest neighbor
knn = KNeighborsClassifier(n_neighbors=1)

In[26]:
# build the model on the training set
knn.fit(X_train, y_train)
Making Predictions
In[27]:
X_new = np.array([[5, 2.9, 1, 0.2]])
print("X_new.shape: {}".format(X_new.shape))
Out[27]:
X_new.shape: (1, 4)

In[28]:
prediction = knn.predict(X_new)
print("Prediction: {}".format(prediction))
print("Predicted target name: {}".format(
iris_dataset['target_names'][prediction]))
Out[28]:
Prediction: [0]
Predicted target name: ['setosa']
Evaluating the Model
In[29]:
y_pred = knn.predict(X_test)
print("Test set predictions:\n {}".format(y_pred))
Out[29]:
Test set predictions:
[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0 2]

In[30]:
print("Test set score: {:.2f}".format(np.mean(y_pred == y_test)))
Out[30]:
Test set score: 0.97
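
Equivalently, we can use the score method of the knn object, which computes the test set accuracy for us:

In[31]:
print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))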
