Machine Learning Fundamentals

This document provides an introduction to machine learning. It discusses machine learning basics like classification, clustering, and regression. Classification involves predicting a class from observations. Clustering groups observations into meaningful categories. Regression predicts a value from observations. The document also discusses supervised, unsupervised, and reinforcement learning. Popular machine learning techniques include K-means clustering, hierarchical clustering, logistic regression, naive Bayes classification, and support vector machines. Finally, the document lists several real-world applications of machine learning and popular machine learning frameworks.

Uploaded by

Nitesh Shinde
Copyright
© All Rights Reserved

Introduction to Machine Learning
Team Members
• Sudarshan Poojari (37)
• Abhijit Pradhan (38)
• Nilesh Shinde (47)
Agenda
• Introduction
• Basics
• Classification
• Clustering
• Regression
• Use-Cases
Quick Questionnaire

How many people have heard about Machine Learning?

How many people know about Machine Learning?

How many people are using Machine Learning?


About
• subfield of Artificial Intelligence (AI)
• name is derived from the concept that it deals with the “construction and study of systems that can learn from data”
• can be seen as building blocks to make computers learn to behave more intelligently
• It is a theoretical concept. There are various techniques with various implementations.
• http://en.wikipedia.org/wiki/Machine_learning
In other words…

“A computer program is said to learn from experience (E) with respect to some class of tasks (T) and a performance measure (P) if its performance at tasks in T, as measured by P, improves with E.”
Terminology
• Features
– The distinct traits that can be used to describe each item in a quantitative manner.
• Samples
– A sample is an item to process (e.g. classify). It can be a document, a picture, a sound, a video, a row in a database or CSV file, or whatever you can describe with a fixed set of quantitative traits.
• Feature vector
– is an n-dimensional vector of numerical features that represent some object.
• Feature extraction
– Preparation of feature vector
– transforms the data in the high-dimensional space to a space of
fewer dimensions.
• Training/Evaluation set
– Set of data used to discover potentially predictive relationships.
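To make the terminology concrete, here is a minimal sketch of feature extraction, assuming a tiny hand-picked vocabulary (the `VOCABULARY` list and the sample strings are made up for illustration). Each sample becomes a feature vector with one count per vocabulary word.

```python
from collections import Counter

# Hypothetical vocabulary: each word is one feature (one vector dimension).
VOCABULARY = ["red", "fruit", "logo", "round"]

def extract_features(text):
    """Turn a raw text sample into a fixed-length numeric feature vector."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in VOCABULARY]

# Three illustrative samples (not from any real data set).
samples = ["Red round fruit", "Sky blue logo", "Round yellow fruit"]
vectors = [extract_features(s) for s in samples]
print(vectors)
```

Every sample ends up as a 4-dimensional vector regardless of its length, which is what lets a learning algorithm compare samples with each other.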
Let’s dig deep into it…

What do you mean by

Apple
Learning (Training)

Features:               Features:      Features:
1. Color: Reddish/Red   1. Sky blue    1. Yellow
2. Type: Fruit          2. Logo        2. Fruit
3. Shape                3. Shape       3. Shape
etc…                    etc…           etc…
Workflow
Categories

• Supervised Learning

• Unsupervised Learning

• Semi-Supervised Learning

• Reinforcement Learning
Supervised Learning
• the correct classes of the training data are known

Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Unsupervised Learning
• the correct classes of the training data are not known

Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Semi-Supervised Learning
• A Mix of Supervised and Unsupervised learning

Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
Reinforcement Learning
• allows the machine or software agent to learn its behavior based on feedback from the environment.
• This behavior can be learnt once and for all, or keep on adapting as time goes by.

Credit: http://us.hudson.com/legal/blog/postid/513/predictive-analytics-artificial-intelligence-science-fiction-e-discovery-truth
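As an illustration of learning from feedback, the sketch below uses an epsilon-greedy agent on a toy two-action environment. The epsilon-greedy strategy, the function name `run_agent`, and the fixed reward table are assumptions chosen for this example. The slides do not name a specific algorithm.

```python
import random

def run_agent(rewards, steps=500, epsilon=0.1, seed=0):
    """Learn the value of each action purely from reward feedback."""
    rng = random.Random(seed)
    actions = list(rewards)
    estimates = {a: 0.0 for a in actions}   # current guess of each action's value
    counts = {a: 0 for a in actions}        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)               # explore a random action
        else:
            action = max(actions, key=estimates.get)   # exploit the best guess
        reward = rewards[action]                       # feedback from the environment
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical environment: "right" always pays 1.0, "left" pays nothing.
estimates, counts = run_agent({"left": 0.0, "right": 1.0})
print(estimates, counts)
```

Because the agent keeps exploring with probability `epsilon`, it would also keep adapting if the rewards changed over time, which matches the second bullet above.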
Machine Learning Techniques
Techniques
• classification: predict class from observations
• clustering: group observations into “meaningful” groups
• regression (prediction): predict value from observations
Classification
• classify a document into a predefined category.
• documents can be text, images
• A popular one is the Naive Bayes classifier.
• Steps:
– Step 1: Train the program (build a model) using a training set in which each document has a category, e.g. sports, cricket, news.
– The classifier computes, for each word, the probability that it makes a document belong to each of the considered categories.
– Step 2: Test with a test data set against this model.
• http://en.wikipedia.org/wiki/Naive_Bayes_classifier
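The two steps above can be sketched in plain Python. This is a minimal multinomial Naive Bayes with add-one smoothing, not a production classifier. The function names and the four training sentences are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Step 1: build the model from (text, category) training pairs."""
    word_counts = defaultdict(Counter)   # category -> word -> count
    doc_counts = Counter()               # category -> number of documents
    vocabulary = set()
    for text, category in labeled_docs:
        words = text.lower().split()
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocabulary.update(words)
    return word_counts, doc_counts, vocabulary

def classify(model, text):
    """Step 2: pick the category with the highest log-probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)   # prior
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero the score.
            p = (word_counts[category][word] + 1) / (total_words + len(vocabulary))
            score += math.log(p)
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical training set with two categories.
training_set = [
    ("the batsman scored a century", "cricket"),
    ("the bowler took five wickets", "cricket"),
    ("parliament passed the new budget", "news"),
    ("the minister announced an election", "news"),
]
model = train(training_set)
print(classify(model, "the bowler scored"))
```

Working in log-probabilities avoids numeric underflow when many small per-word probabilities are multiplied together.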
Clustering
• clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups
• the categories are not predefined
• For e.g. these keywords
– “man’s shoe”
– “women’s shoe”
– “women’s t-shirt”
– “man’s t-shirt”
– can be clustered into 2 categories, “shoe” and “t-shirt” or “man” and “women”
• Popular ones are K-means clustering and hierarchical clustering
K-means Clustering
• partition n observations into k clusters in which each observation belongs
to the cluster with the nearest mean, serving as a prototype of the cluster.
• http://en.wikipedia.org/wiki/K-means_clustering

http://pypr.sourceforge.net/kmeans.html
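A minimal sketch of the idea, assuming the common Lloyd-style iteration (assign each point to the nearest mean, then recompute the means). The starting centroids and the sample points are hand-picked for illustration. Real implementations usually choose the starting centroids more carefully.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate an assignment step and a mean-update step."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 2-D observations forming two obvious groups, with k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```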
Hierarchical clustering
• method of cluster analysis which seeks to build a hierarchy of clusters.
• There can be two strategies
– Agglomerative:
• This is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
• Time complexity is O(n^3) in the general case
– Divisive:
• This is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
• Time complexity is O(2^n) when splits are found by exhaustive search

• http://en.wikipedia.org/wiki/Hierarchical_clustering
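The agglomerative strategy can be sketched as follows. Single linkage (distance between the closest pair of members) is an assumption made here, since the slides do not say which linkage to use. The function name and the sample points are illustrative.

```python
import math

def agglomerative(points, target_clusters):
    """Bottom-up clustering with single linkage (closest-pair distance)."""
    clusters = [[p] for p in points]          # each observation starts alone
    while len(clusters) > target_clusters:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))   # merge the closest pair
    return clusters

# Hypothetical 1-D observations written as 1-tuples.
result = agglomerative([(1,), (2,), (10,), (11,), (12,)], target_clusters=2)
print(result)
```

Each pass over the cluster pairs compares up to n^2 point pairs, and up to n merges are needed, which is roughly where the cubic cost of this naive version comes from.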
Regression
• is a measure of the relation between the mean value of one variable (e.g. output) and corresponding values of other variables (e.g. time and cost).
• regression analysis is a statistical process for estimating the relationships among variables.
• Regression means predicting the output value using training data.
• A popular one is logistic regression (binary regression), which models the probability of a two-class outcome and is therefore often used for classification.
• http://en.wikipedia.org/wiki/Logistic_regression
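As a small worked example of predicting a value from training data, the sketch below fits a straight line by ordinary least squares. Simple linear regression is used here instead of logistic regression because it is the shortest way to show a continuous prediction. The house areas and prices are made-up numbers.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: house area (m^2) -> price (in thousands).
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
slope, intercept = fit_line(areas, prices)

# Predict the price of an unseen 90 m^2 house from the fitted line.
predicted = slope * 90 + intercept
print(slope, intercept, predicted)
```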
Classification vs Regression
• Classification
– Classification means to group the output into a class.
– e.g. classification to predict the type of tumor, i.e. harmful or not harmful, using training data
– if the output is a discrete/categorical variable, then it is a classification problem
• Regression
– Regression means to predict the output value using training data.
– e.g. regression to predict the house price from training data
– if the output is a real number/continuous, then it is a regression problem.
Let’s see the usage in real life
Use-Cases
• Spam Email Detection
• Machine Translation (Language Translation)
• Image Search (Similarity)
• Clustering (KMeans) : Amazon Recommendations
• Classification : Google News

continued…
Use-Cases (contd.)
• Text Summarization - Google News
• Rating a Review/Comment: Yelp
• Fraud detection : Credit card Providers
• Decision Making : e.g. Bank/Insurance sector
• Sentiment Analysis
• Speech Understanding – iPhone with Siri
• Face Detection – Facebook’s Photo tagging
Popular Frameworks/Tools
• Weka
• Carrot2
• GATE
• OpenNLP
• LingPipe
• Stanford NLP
• Mallet – Topic Modelling
• Gensim – Topic Modelling (Python)
• Apache Mahout
• MLlib – Apache Spark
• scikit-learn - Python
• LIBSVM : Support Vector Machines
• and many more…
Questions ?
Thank You!