
Machine Learning
Arthur Samuel, a pioneer in the fields of artificial intelligence and computer gaming, coined the term "Machine Learning", defining it as the "field of study that gives computers the capability to learn without being explicitly programmed".

How is it different from traditional programming?

 In traditional programming, we feed the input and the program logic, and run the program to get the output.

 In machine learning, we feed the input and the output, and run it on the machine during training; the machine creates its own logic, which is then evaluated during testing (see the sketch below).
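As a minimal illustration (a hypothetical sketch, not part of the original slides), the same Celsius-to-Fahrenheit conversion can either be hand-coded as a rule or learned from input/output examples:

# Traditional programming vs. machine learning for F = 1.8 * C + 32.
# The data and example are made up purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Traditional programming: the logic is written by the programmer.
def celsius_to_fahrenheit(c):
    return 1.8 * c + 32

# Machine learning: the logic is inferred from (input, output) pairs.
celsius = np.array([[0], [10], [20], [30], [40]])
fahrenheit = np.array([32, 50, 68, 86, 104])

model = LinearRegression()
model.fit(celsius, fahrenheit)   # training: the machine creates its own logic
print(model.predict([[25]]))     # testing: evaluate the learned logic (~77)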
Terminologies that one should know before starting Machine Learning:

 Model: A model is a specific representation learned from data by applying some machine learning algorithm. A model is also called a hypothesis.

 Feature: A feature is an individual measurable property of our data. A set of numeric features can be conveniently described by a feature vector. Feature vectors are fed as input to the model. For example, in order to predict a fruit, there may be features like color, smell, taste, etc.

 Target (Label): A target variable or label is the value to be predicted by our model. For the fruit example discussed above, the label for each set of inputs would be the name of the fruit, like apple, orange, banana, etc.

 Training: The idea is to give a set of inputs (features) and their expected outputs (labels), so that after training we have a model (hypothesis) that maps new data to one of the categories it was trained on.

 Prediction: Once our model is ready, it can be fed a set of inputs to which it will provide a predicted output (label). A small code sketch of these terms follows below.
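A minimal sketch tying the terms together; the numeric encoding of the fruit features is hypothetical, purely for illustration:

# Feature vectors, labels, model, and prediction in a few lines.
from sklearn.tree import DecisionTreeClassifier

# Feature vectors: each row is [color, smell, taste] encoded as numbers.
X = [[0, 1, 1],   # e.g. red, sweet smell, sweet taste
     [1, 0, 0],   # e.g. orange color, citrus smell, sour taste
     [2, 1, 1]]   # e.g. yellow, sweet smell, sweet taste

# Targets (labels): the values the model should predict.
y = ['apple', 'orange', 'banana']

model = DecisionTreeClassifier().fit(X, y)  # training produces the model (hypothesis)
print(model.predict([[0, 1, 1]]))           # prediction for a new feature vector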
Types of Learning

• Supervised Learning
• Unsupervised Learning
• Semi-Supervised Learning

1. Supervised Learning: Supervised learning is when the model is trained on a labelled dataset. A labelled dataset is one that has both input and output parameters. In this type of learning, both the training and validation datasets are labelled, as shown in the figures below.

[Figure A: classification example (Purchased: 0 or 1); Figure B: regression example (Wind Speed)]
Types of Supervised Learning:
• Classification
• Regression

Classification: A supervised learning task where the output has defined labels (discrete values). For example, in Figure A above, the output Purchased has defined labels, i.e. 0 or 1; 1 means the customer will purchase and 0 means the customer won't purchase. Classification can be either binary or multi-class. In binary classification, the model predicts either 0 or 1 (yes or no), while in multi-class classification, the model predicts one of more than two classes.
Example: Gmail classifies mails into more than one class, like social, promotions, updates and offers. A minimal classification sketch follows below.
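A minimal binary-classification sketch in the spirit of Figure A; the tiny [age, salary] dataset is hypothetical, purely for illustration:

# Binary classification: predict Purchased (1 = yes, 0 = no).
from sklearn.linear_model import LogisticRegression

X = [[25, 30], [35, 60], [45, 80], [20, 20]]  # inputs: [age, salary in thousands]
y = [0, 1, 1, 0]                              # labels: Purchased

clf = LogisticRegression().fit(X, y)
print(clf.predict([[40, 70]]))                # predicted label: 0 or 1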

Regression: A supervised learning task where the output has a continuous value.
For example, in the regression figure above, the output Wind Speed does not take discrete values but is continuous within a particular range. The goal here is to predict a value as close to the actual output value as our model can, and evaluation is then done by calculating an error value. The smaller the error, the greater the accuracy of our regression model. A short sketch of this evaluation follows below.
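A minimal sketch of regression evaluation with two common error metrics; the actual and predicted values are made up, purely for illustration:

# Measuring regression error: mean absolute error and mean squared error.
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_actual = [10.0, 12.5, 9.0, 14.2]     # true outputs
y_predicted = [9.5, 12.0, 9.8, 13.9]   # model outputs

print('MAE:', mean_absolute_error(y_actual, y_predicted))
print('MSE:', mean_squared_error(y_actual, y_predicted))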
Example of Supervised Learning
Algorithms:
 Linear Regression

 Nearest Neighbor

 Gaussian Naive Bayes

 Decision Trees

 Support Vector Machine (SVM)

 Random Forest
Unsupervised Learning:
Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns and differences without any prior training on the data. Unsupervised machine learning is more challenging than supervised learning due to the absence of labels.

Types of Unsupervised
Learning:

 Clustering

 Association
Clustering: A clustering problem is where you want to discover the
inherent groupings in the data, such as grouping customers by
purchasing behavior.

Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as "people that buy X also tend to buy Y".
Examples of unsupervised learning algorithms are:

 k-means for clustering problems.


 Apriori algorithm for association rule learning
problems
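As a minimal sketch of the k-means idea (the 2-D points are made up, purely for illustration), the algorithm groups unlabeled points into k clusters:

# k-means clustering: group unlabeled data into k = 2 clusters.
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0],
     [10, 2], [10, 4], [10, 0]]  # unlabeled data: no targets given

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster assignment discovered for each point
print(kmeans.cluster_centers_)   # the two cluster centers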
The most basic disadvantage of any supervised learning algorithm is that the dataset has to be hand-labeled, either by a machine learning engineer or a data scientist. This is a very costly process, especially when dealing with large volumes of data. The most basic disadvantage of any unsupervised learning is that its application spectrum is limited.
Semi-supervised machine learning:
To counter these disadvantages, the concept of Semi-Supervised
Learning was introduced. In this type of learning, the algorithm is trained
upon a combination of labeled and unlabeled data. Typically, this combination
will contain a very small amount of labeled data and a very large amount of
unlabeled data.

• In semi-supervised learning, labelled data is used to learn a model; using that model, the unlabeled data is labelled (this is called pseudo-labelling), and the model is then trained on the whole dataset for further use. A sketch of this loop follows below.
[Figure: model trained with labelled data vs. model trained with both labelled and unlabelled data]
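A minimal pseudo-labelling sketch under hypothetical data: train on the small labelled set, label the unlabeled set with that model, then retrain on the combined data:

# Pseudo-labelling loop: label -> model -> pseudo-labels -> retrain.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_labeled = np.array([[0.0], [1.0], [9.0], [10.0]])
y_labeled = np.array([0, 0, 1, 1])                 # small labelled set
X_unlabeled = np.array([[0.5], [8.5], [9.5]])      # larger unlabeled set

model = LogisticRegression().fit(X_labeled, y_labeled)
pseudo_labels = model.predict(X_unlabeled)          # pseudo-labelling step

X_all = np.vstack([X_labeled, X_unlabeled])
y_all = np.concatenate([y_labeled, pseudo_labels])
final_model = LogisticRegression().fit(X_all, y_all)  # retrain on the whole data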

Intuitively, one may imagine the three types of learning algorithms as follows: supervised learning is where a student is under the supervision of a teacher at both home and school; unsupervised learning is where a student has to figure out a concept himself; and semi-supervised learning is where a teacher teaches a few concepts in class and gives questions as homework which are based on similar concepts.

REGRESSION

Regression is a statistical measurement used in finance, investing, and other disciplines that attempts to determine the strength of the relationship between one dependent variable and a series of other changing variables, or independent variables.
Types of regression:
 Linear regression
   Simple linear regression
   Multiple linear regression
 Polynomial regression
 Decision tree regression
 Random forest regression
Simple linear regression

 Simple linear regression models are used to show or predict the relationship between two variables or factors.

 The factor that is being predicted is called the dependent variable, and the factors that are used to predict the dependent variable are called independent variables.

[Figure: Simple linear regression]
Predicting CO2 emission from the engine size feature using simple linear regression:

import numpy as np
from sklearn import linear_model

# 'train' is assumed to be a pandas DataFrame with ENGINESIZE and
# CO2EMISSIONS columns (e.g. a training split of a fuel consumption dataset).
regr = linear_model.LinearRegression()

train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])

regr.fit(train_x, train_y)

# The coefficients
print('Coefficients: ', regr.coef_)
print('Intercept: ', regr.intercept_)
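Once fitted, the learned line can be used directly. A small follow-up sketch (assuming the fit above) predicts the emission for a single engine size:

# Using the fitted simple linear model: y = intercept + coef * x.
engine_size = 3.0
predicted_co2 = regr.intercept_[0] + regr.coef_[0][0] * engine_size
print(predicted_co2)  # equivalent to regr.predict([[3.0]])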
Multiple linear regression

 Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the values of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes the outcome, target or criterion variable).
Simple linear regression
• Predict CO2 emission vs. engine size of all cars
 - Independent variable (x): engine size
 - Dependent variable (y): CO2 emission

Multiple linear regression
• Predict CO2 emission vs. engine size and cylinders of all cars
 - Independent variables (x): engine size, cylinders
 - Dependent variable (y): CO2 emission
import numpy as np
from sklearn import linear_model

# 'train' is assumed to be the same DataFrame as in the previous example.
regr = linear_model.LinearRegression()

train_x = np.asanyarray(train[['ENGINESIZE', 'CYLINDERS']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])

regr.fit(train_x, train_y)

# The coefficients
print('Coefficients: ', regr.coef_)
print('Intercept: ', regr.intercept_)
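As a hedged follow-up (assuming a held-out 'test' DataFrame with the same columns), the fitted model can be scored on unseen data:

# Evaluating the multiple linear model on a held-out test split.
from sklearn.metrics import r2_score

test_x = np.asanyarray(test[['ENGINESIZE', 'CYLINDERS']])
test_y = np.asanyarray(test[['CO2EMISSIONS']])

test_y_hat = regr.predict(test_x)
print('R2 score:', r2_score(test_y, test_y_hat))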
Polynomial regression

 Polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model

# 'train' and 'test' are assumed to be splits of the same DataFrame as above.
# A single feature is used here so that the train and test inputs match.
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])

test_x = np.asanyarray(test[['ENGINESIZE']])
test_y = np.asanyarray(test[['CO2EMISSIONS']])

poly = PolynomialFeatures(degree=2)
train_x_poly = poly.fit_transform(train_x)
print(train_x_poly.shape)
fit_transform takes our x values and outputs our data raised from the power of 0 to the power of 2 (since we set the degree of our polynomial to 2).

In our example, each engine size x becomes the feature vector [1, x, x²].
Now we can treat it as a linear regression problem. Therefore, this polynomial regression is considered to be a special case of traditional multiple linear regression, so you can use the same mechanism as linear regression to solve such problems.
We can therefore use the LinearRegression() estimator to solve it:

clf = linear_model.LinearRegression()
clf.fit(train_x_poly, train_y)  # fit on the expanded [1, x, x^2] features

# The coefficients
print('Coefficients: ', clf.coef_)
print('Intercept: ', clf.intercept_)
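The test split defined earlier can now be used; a short sketch (assuming the objects above) evaluates the polynomial model:

# Predicting on the test split: transform with the SAME poly object,
# then measure the error.
from sklearn.metrics import mean_squared_error

test_x_poly = poly.transform(test_x)
test_y_hat = clf.predict(test_x_poly)
print('MSE:', mean_squared_error(test_y, test_y_hat))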
Decision tree regression

 A decision tree builds regression models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each representing values for the attribute tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
Decision tree regression observes the features of an object and trains a model in the structure of a tree to predict data in the future and produce meaningful continuous output. Continuous output means that the output/result is not discrete, i.e., it is not represented just by a discrete, known set of numbers or values.
Discrete output example: a weather prediction model that predicts whether or not there will be rain on a particular day.
Continuous output example: a profit prediction model that predicts the probable profit generated from the sale of a product.
Code:

# import the regressor
from sklearn.tree import DecisionTreeRegressor

# create a regressor object
regressor = DecisionTreeRegressor(random_state=0)

# fit the regressor with X and y data
# (X is the feature matrix and y the continuous target)
regressor.fit(X, y)
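As a small hedged follow-up (the toy X and y below are hypothetical), fitting and predicting end to end looks like this:

# End-to-end toy example: a single made-up feature vs. a continuous target.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # feature matrix
y = np.array([3.0, 6.0, 9.0, 12.0, 15.0])          # continuous target

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)
print(regressor.predict([[3.5]]))  # continuous prediction for a new input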
Random forest regression

The random forest is one of the most effective machine learning models for predictive analytics, making it an industrial workhorse for machine learning.
The random forest model is a type of ensemble model that makes predictions by combining the decisions of a collection of base models. Here, each base model is a simple decision tree. This broad technique of using multiple models to obtain better predictive performance is called model ensembling. In random forests, all the base models are constructed independently, each using a different subsample of the data.
Approach:
 Pick K data points at random from the training set.
 Build the decision tree associated with those K data points.
 Choose the number Ntree of trees you want to build and repeat steps 1 & 2.
 For a new data point, make each one of your Ntree trees predict the value of Y for the data point, and assign the new data point the average across all of the predicted Y values.
Code

# import the regressor
from sklearn.ensemble import RandomForestRegressor

# create a regressor object (n_estimators is the number of trees, Ntree)
regressor = RandomForestRegressor(n_estimators=10, random_state=0)

# fit the regressor with X and y data
regressor.fit(X, y)
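A short hedged usage note (reusing the toy X and y from the decision tree sketch above): the forest's prediction is the average of its trees' predictions.

# Predicting with the fitted forest: the output is averaged over all trees.
print(regressor.predict([[3.5]]))

# The individual fitted trees are available for inspection:
print(len(regressor.estimators_))  # == n_estimators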
Pros and cons

Linear regression
 Pros: works on any size of dataset; gives information about the relevance of features.
 Cons: the linear regression assumptions must hold.

Polynomial regression
 Pros: works on any size of dataset; works very well on non-linear problems.
 Cons: need to choose the right polynomial degree for a good bias/variance trade-off.

SVR
 Pros: easily adaptable; works very well on non-linear problems; not biased by outliers.
 Cons: feature scaling is compulsory; less well known; more difficult to understand.

Decision tree regression
 Pros: interpretability; no need for feature scaling; works on both linear and non-linear problems.
 Cons: poor results on small datasets; overfitting can easily occur.

Random forest regression
 Pros: powerful and accurate; good performance on many problems, including non-linear ones.
 Cons: no interpretability; overfitting can easily occur; need to choose the number of trees.