Introduction To Machine Learning
What is learning?
The ability to improve one’s behaviour with experience.
How is machine learning different from traditional programming?
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
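The contrast can be illustrated with a small sketch. In the first half a person writes the rule by hand; in the second half the computer is given only inputs and outputs and recovers the rule itself. The temperature-conversion task, the function names and the choice of least squares are assumptions made for this illustration and are not part of the original slides.

```python
# Traditional programming: a human writes the program (the rule).
def fahrenheit_rule(celsius):
    return celsius * 9 / 5 + 32

# Machine learning: data and outputs go in, a program (here, a line) comes out.
def learn_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Example data: inputs paired with the outputs we want reproduced.
celsius = [0, 10, 20, 30, 40]
fahrenheit = [fahrenheit_rule(c) for c in celsius]

# The learner only sees (celsius, fahrenheit) pairs, never the rule itself.
a, b = learn_line(celsius, fahrenheit)
print(f"learned rule: F = {a:.2f} * C + {b:.2f}")
```

On this noise-free data the learned slope and intercept match the hand-written rule; with real, noisy data they would only approximate it.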
Definitions
Arthur Samuel (1959):
Machine Learning is a field of study that gives computers the ability to learn
without being explicitly programmed.
- Making predictions
- Classifications
- Clustering
- Decision Making
- Solving tasks/problems
Machine Learning Process
1. Choose the training experience/data.
2. Choose the target function (that is to be learnt).
3. Choose the class of functions (the representation) used for the target function.
4. Choose a learning algorithm to infer the target function.
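As a minimal sketch of these four steps, consider a toy pass/fail problem. The data, the threshold-rule function class and the exhaustive search are all assumptions chosen for illustration; real problems would use richer choices at every step.

```python
# 1. Training experience: (hours studied, passed?) pairs, invented for this example.
data = [(1, 0), (2, 0), (3, 0), (5, 1), (6, 1), (8, 1)]

# 2. Target function: maps hours studied to pass (1) or fail (0).

# 3. Class of functions: threshold rules h_t(x) = 1 if x >= t, else 0.
def make_hypothesis(t):
    return lambda x: 1 if x >= t else 0

# 4. Learning algorithm: try each candidate threshold and keep the one
#    that makes the fewest mistakes on the training data.
def training_errors(h, examples):
    return sum(1 for x, y in examples if h(x) != y)

def learn_threshold(examples):
    candidates = sorted({x for x, _ in examples})
    return min(candidates,
               key=lambda t: training_errors(make_hypothesis(t), examples))

best_t = learn_threshold(data)
h = make_hypothesis(best_t)
print("learned threshold:", best_t)
print("prediction for 4 hours:", h(4))
print("prediction for 7 hours:", h(7))
```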
Types of Machine Learning
● Supervised Learning
● Unsupervised Learning
● Semi-supervised Learning
● Reinforcement Learning
Supervised Learning
● Supervised learning involves learning from a training set of labelled data.
● Every point in the training set is an input-output pair, where the input maps to an
output.
● The learning problem consists of inferring the function that maps between the
input and the output, such that the learned function can be used to predict the
output from future input.
● It is called “supervised” because of the presence of the outcome variable to
guide the learning process.
● Supervised learning problems are further categorised into Regression and
Classification problems.
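The difference between the two categories is the type of output: a class label for classification, a number for regression. The sketch below uses the same k-nearest-neighbour idea for both, with a majority vote for labels and an average for numbers. The datasets and function names are made up for illustration.

```python
from collections import Counter

def k_nearest(train, x, k):
    """Return the k labelled pairs whose input is closest to x."""
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]

def knn_classify(train, x, k=3):
    """Classification: predict the most common label among the neighbours."""
    labels = [label for _, label in k_nearest(train, x, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, x, k=2):
    """Regression: predict the average output of the neighbours."""
    values = [value for _, value in k_nearest(train, x, k)]
    return sum(values) / len(values)

# Classification data: (number of suspicious words, label). Invented values.
mail = [(0, "ham"), (1, "ham"), (2, "ham"), (7, "spam"), (8, "spam"), (9, "spam")]

# Regression data: (house size in square metres, price in thousands). Invented values.
houses = [(50, 100), (60, 120), (70, 140), (80, 160), (90, 180)]

label = knn_classify(mail, 6)
price = knn_regress(houses, 65)
print(label, price)
```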
Applications of Supervised Learning
● Prediction
○ Stock prices
○ House prices
○ Weather Forecasting
○ Sales Volume Forecasting
● Classification
○ Disease identification
○ Sentiment Analysis
○ Spam Mail Detection
○ Handwritten Digit Recognition
Supervised Learning Algorithms
● Linear Regression
● Logistic Regression
● Naive Bayes Classifier
● Decision Trees
● Neural Networks
● Support Vector Machines
● K-nearest Neighbour
Unsupervised Learning
● It is a type of machine learning in which the algorithm is not provided with any
pre-assigned labels or scores for the training data.
● Unsupervised learning algorithms must first self-discover any naturally
occurring patterns in that training data set.
● Common examples include clustering and principal component analysis.
● We observe only the features and have no measurements of the outcome.
● The task of the learner is to describe how the data are organized or clustered.
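A minimal sketch of clustering without labels is k-means on one-dimensional data: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. The data, the starting centroids and the fixed iteration count are assumptions for this example.

```python
def k_means_1d(points, centroids, iterations=10):
    """Plain k-means on numbers; returns final centroids and clusters."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# No labels are given, only the feature values themselves.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

The algorithm is told only how many groups to look for; which points belong together is discovered from the data.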
Advantages and Disadvantages of Unsupervised Learning
● Advantages
○ Minimal workload to prepare and audit the training set.
○ Greater freedom to identify and exploit previously undetected patterns that
may not have been noticed by the "experts".
● Disadvantages
○ Unsupervised techniques typically require more training data and converge
more slowly to acceptable performance.
○ Increased computational and storage requirements during the exploratory
process.
○ Potentially greater susceptibility to artifacts or anomalies in the training data
that a human would recognize as irrelevant or erroneous, but that the
unsupervised learning algorithm assigns undue importance.
Applications of Unsupervised Learning
● Clustering
○ Customer Segmentation
○ Grouping products in a supermarket
● Visualization
● Dimensionality reduction
● Finding association rules
○ Customers who buy item X also tend to buy item Y.
● Anomaly detection
○ Fraudulent card transaction
○ Malware detection
○ Identification of human errors during data entry
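As one illustration from this list, the strength of an association rule "X → Y" is commonly measured by its support (the fraction of baskets containing both items) and its confidence (among baskets containing X, the fraction that also contain Y). The baskets and helper names below are invented for this sketch.

```python
def count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)

def support(baskets, x, y):
    """Fraction of all baskets that contain both x and y."""
    return count(baskets, {x, y}) / len(baskets)

def confidence(baskets, x, y):
    """Among baskets containing x, the fraction that also contain y."""
    return count(baskets, {x, y}) / count(baskets, {x})

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "butter"},
]

rule_support = support(baskets, "bread", "butter")
rule_confidence = confidence(baskets, "bread", "butter")
print(rule_support, rule_confidence)
```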
Unsupervised Learning Algorithms
● K-Means Clustering
● Expectation Maximization
● Principal Component Analysis
● Hierarchical Clustering
Basic Terminology
● The inputs are often called the predictors or more classically the
independent variables.
Holding out validation and test sets makes it possible to tune the model and to
estimate how well it generalizes to new, unseen data.
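A minimal sketch of such a split is shown below. The 60/20/20 proportions, the fixed seed and the function name are assumptions for illustration; the appropriate fractions depend on how much data is available.

```python
import random

def split_dataset(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and split them into train, validation and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]                 # used once, for the final estimate
    val = shuffled[n_test:n_test + n_val]    # used to tune and compare models
    train = shuffled[n_test + n_val:]        # used to fit the model
    return train, val, test

train, val, test = split_dataset(range(10))
print(len(train), len(val), len(test))
```

The three sets do not overlap, so performance on the test set is measured on examples the model never saw during training or tuning.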
Hypothesis Space and Inductive Bias
Hypothesis
● A hypothesis is a candidate function that approximates the target function in
supervised machine learning.
● Represented by h1, h2, h3 etc.
Machine learning involves finding a model (hypothesis) that best explains the training data.
Hypothesis Space
Hypothesis space is the set of all valid hypotheses, i.e. all the functions the learner can choose from.
Represented by symbol H.
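To get a feel for the size of H, the sketch below enumerates every boolean function over n binary features, of which there are 2^(2^n), and then keeps only the hypotheses that agree with a few labelled examples. The two-feature setting and the example labels are assumptions chosen to keep the enumeration small.

```python
from itertools import product

def all_inputs(n_features):
    """Every possible assignment of 0/1 to the features."""
    return list(product([0, 1], repeat=n_features))

def hypothesis_space(n_features):
    """Yield every boolean function as a dict from input tuple to 0 or 1."""
    inputs = all_inputs(n_features)
    for outputs in product([0, 1], repeat=len(inputs)):
        yield dict(zip(inputs, outputs))

# With 2 features there are 4 inputs and 2**4 = 16 possible hypotheses.
H = list(hypothesis_space(2))

# Hypothetical training examples: (input, label).
examples = [((0, 0), 0), ((1, 1), 1)]

# Keep only the hypotheses that label every training example correctly.
consistent = [h for h in H if all(h[x] == y for x, y in examples)]
print(len(H), len(consistent))
```

Several hypotheses remain consistent with the data, and they disagree on the unseen inputs. Choosing among them requires some preference beyond the data itself, which is the role of inductive bias.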
Hypothesis Language
X1   X2   X3   X4   Output Class
1    0    1    0    POSITIVE
0    0    0    1    NEGATIVE
1    1    1    1    POSITIVE