
Machine learning

Introduction

Mohamed FARAH

Academic year: 2024-2025

Machine Learning

 Machine Learning is the field of study that gives the computer
the ability to learn without being explicitly programmed
(Arthur Samuel, 1959)
Machine Learning

 Machine Learning:

 is a subfield of artificial intelligence based on mathematical and
statistical approaches that empower computers to learn from data
 resolves decision problems automatically, without explicit
programming
 relates to the design, optimisation and implementation of methods that
learn from past data in order to predict new observations

Machine Learning – new programming paradigm

Traditional Programming:   Data + Program → Computer → Output

Machine Learning:          Data + Output → Computer → Program

 Data-driven
 Automating automation: getting computers to programme themselves
Machine Learning

 A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance
at tasks in T, as measured by P, improves with experience E
(Tom Mitchell, 1998)

 Example:
• Experience (data): games played by the program (with itself)
• Performance measure: winning rate

 Learning is the acquisition of the ability to perform the task

 How to learn is another type of problem, and there are many methods

The Experience E / Data

 Most algorithms experience an entire dataset

 Dataset: a collection of examples, aka data points
 An example is a collection of features (data) that have been
quantitatively measured for some object/event that we
want the ML system to process
Data – Example
 Anderson's Iris data (the oldest dataset in statistics/ML, 1936)

• Measurements of 150 iris flowers

- 4 attributes: sepal length, sepal width, petal length, petal width,
so each flower is a vector in ℝ⁴ (the full dataset is a 150 × 4 matrix)

• 3 species: Setosa, Versicolor, Virginica

https://en.wikipedia.org/wiki/Iris_flower_data_set

Data as vectors, matrices, tensors

 Tensors: generalisation of matrices to n dimensions
(n is also called the rank, order or degree)
• 1D tensor: vector
• 2D tensor: matrix
• 3D, 4D, 5D tensors, etc.
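For illustration, here is a minimal sketch using NumPy (a library assumed here, not introduced in the slides) showing the same idea at different ranks:

```python
import numpy as np

vector = np.array([5.1, 3.5, 1.4, 0.2])   # 1D tensor: a single example
matrix = np.zeros((150, 4))               # 2D tensor: a whole dataset
frames = np.zeros((10, 64, 64))           # 3D tensor: 10 grayscale frames
batch_rgb = np.zeros((32, 64, 64, 3))     # 4D tensor: a batch of RGB images

print(vector.ndim, matrix.ndim, frames.ndim, batch_rgb.ndim)  # 1 2 3 4
```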
Data
 Dataset decomposition
• Training set: data to train with
• Validation set: determines when to stop training
• Test or generalisation set: data to test on

 These three sets are all disjoint
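One common way to realise this decomposition, sketched with scikit-learn (the library and the 60/20/20 proportions are illustrative assumptions, not prescribed by the course):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 4)             # toy features
y = np.random.randint(0, 2, size=1000)  # toy labels

# Carve out a disjoint test set first, then split the rest into train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```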

Dataset Assumptions

 Data are assumed to be generated by a probability distribution

 Typically we make i.i.d. assumptions
• Samples are independent from each other
• Training and test sets are identically distributed
(drawn from the same distribution)
The Task T

 ML enables tackling tasks too difficult to solve with fixed
programs written and designed manually
 T is usually described in terms of how the machine learning
system should process an example
 NB. The process of learning itself is not the task

The Performance Measures, P

 P is specific to the task T

 Well-known measures are based on the confusion matrix:
• Accuracy
• Precision
• Recall
• F-score
• etc.

! P is applied on data not seen before:
the test set ... not the training set
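These measures all derive from the confusion-matrix counts. A minimal sketch in plain Python for the binary case (the TP/FP/FN/TN naming is the usual convention, not notation from the slides):

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard performance measures from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}

# Evaluated on the test set, never on the training set.
print(classification_metrics(tp=40, fp=10, fn=5, tn=45))
```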
After the task is learned

 Processing of new data is called inference

 Computational costs are high during training and lower during inference

Related Domains

 Statistics: learning theory, data mining, inference
 Computing: AI, computer vision, information retrieval (IR)
 Engineering: signal processing, robotics, control
 Cognitive science, psychology, epistemology, neuroscience
 Economics: decision theory, game theory
Applications of Machine Learning


Computer Vision

 Image recognition, segmentation, classification, etc.

(Figure: input image → Model → "Cat" or "Dog")

 Example: recognition of handwritten characters
Computer Vision

 Example: face detection

Computer Vision

 Example: detection of pedestrians

(Figure: example training images)
Natural Language Processing (NLP)

 Example: classification of textual documents

Natural Language Processing (NLP)

 Example: detection of spam in emails

Hint: count the frequency and co-occurrence of certain keywords, e.g.
congratulations, lottery, win, prize, etc.
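A toy version of this keyword-counting idea (a hedged sketch in plain Python; the keyword list and threshold are invented for illustration, not taken from the course):

```python
SPAM_KEYWORDS = {"congratulations", "lottery", "win", "prize", "winner"}

def looks_like_spam(email: str, threshold: int = 2) -> bool:
    """Flag an email when it contains several spam-associated keywords."""
    words = email.lower().split()
    hits = sum(1 for w in words if w.strip(".,!?") in SPAM_KEYWORDS)
    return hits >= threshold

print(looks_like_spam("Congratulations! You win a lottery prize today"))  # True
print(looks_like_spam("Meeting moved to 3pm, see you there"))             # False
```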
Natural Language Processing (NLP)

 Example: automatic translation

(Figure: "How are you?" → Model → "Wie geht's dir?" — a translating machine)

Natural Language Processing (NLP)

 Example: recommendation systems
Natural Language Processing (NLP)

 Example: chatbots

(Figure: "How are you?" → Model → "I am fine, thank you" — a conversational agent / chatbot)

Bio-Informatics

 Sequence alignment, analysis of genetic data, etc.

 Example: prediction of Caesarean emergency conditions
Signal processing
 Speech recognition, person identification, speech to text, text to
speech, etc.

(Figure: audio signal → Model → "Hello" — speech recognition)

Other areas of application

 Robotics: estimation of positions, states, etc.
 Financial analysis: portfolio allocation, credit, grants, etc.
 Medicine: diagnosis, treatment, design of therapies, etc.
 Graphic design: realistic designs and simulations, etc.
 Social networks
 Content generation
 etc.
Learning Types

(based on tasks)


Learning Types

 Supervised Learning
 Unsupervised Learning
 Reinforcement Learning
Supervised learning


Supervised learning
 Given: a dataset that contains n samples
(x(1), y(1)), …, (x(n), y(n))
 Task: if a residence has x square feet, predict its price y

 e.g., the 15th sample (x(15), y(15)) with x(15) = 800: y = ?

(Figure: housing price prediction — price vs living area)
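For this regression task, a least-squares fit is the classic starting point (a hedged sketch with scikit-learn; the numbers are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy housing data: living area (sq ft) -> price (k$)
X = np.array([[500], [750], [1000], [1250], [1500]])
y = np.array([150, 200, 260, 310, 370])

model = LinearRegression().fit(X, y)
print(model.predict(np.array([[800]])))  # predicted price for an 800 sq ft residence
```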

Regression vs Classification
 Regression: the label y ∈ ℝ is a continuous variable
 e.g., price prediction

 Classification: the label y is a discrete variable
 e.g., the task of predicting the type of residence

x = (size, lot size) → y = house or townhouse?
Supervised Learning – Model Types

2 types of models:

• Discriminative model:
• estimates P(y | x) directly
• we learn the decision boundary
• Generative model:
• estimates P(x | y) and uses it to deduce P(y | x)
• we learn the probability distributions of the data
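To make the contrast concrete (a hedged sketch with scikit-learn; pairing logistic regression as discriminative with Gaussian naive Bayes as generative is a standard textbook example, not one named in the slides):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

discriminative = LogisticRegression(max_iter=1000).fit(X, y)  # models P(y | x) directly
generative = GaussianNB().fit(X, y)  # models P(x | y) per class, deduces P(y | x) via Bayes' rule

print(discriminative.predict_proba(X[:1]))
print(generative.predict_proba(X[:1]))
```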

Supervised learning in Computer Vision

 Image Classification
 x = raw pixels of the image, y = the main object

ImageNet Large Scale Visual Recognition Challenge. Russakovsky et al.’2015


Supervised learning in Computer Vision

 Object localization and detection
 x = raw pixels of the image, y = the bounding boxes

ImageNet Large Scale Visual Recognition Challenge. Russakovsky et al.’2015

Supervised learning in Computer Vision

 Recognition of handwritten characters (OCR)
x: values of pixel intensities of the image
y: identity of the character (class)
Supervised learning in NLP
 Machine translation

Unsupervised learning

Also called Knowledge discovery

Unsupervised Learning

 Dataset contains no labels: x(1), …, x(n)
 The target is not explicitly known
 Goal (vaguely posed): find interesting structures /
patterns in the data

(Figure: the same point cloud, labelled in the supervised case, unlabelled in the unsupervised case)

Clustering

 k-means clustering, mixture of Gaussians, etc.
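A minimal k-means sketch (using scikit-learn on synthetic blob data; nothing here is prescribed by the slides):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # unlabelled points

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # the 3 learned centroids
print(kmeans.labels_[:10])       # cluster assignment of the first 10 points
```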

Density Estimation

 Learning the probability distribution that generated the data:
• to generate new realistic data
• to distinguish "realistic" data from "false" data (e.g. spam
filtering)
• to compress data
• etc.
Density Estimation

 Given a sample X = {x_i}, i = 1..n, drawn from a distribution,
 obtain an estimate of the density function at any point
 Parametric:
• assume a parametric family of densities p_θ (e.g., the Gaussian
N(μ, σ²)) and obtain the best estimate θ̂ of θ
 Nonparametric:
• obtain a good estimate of the entire density directly from
the sample (e.g. a histogram)
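Both flavours in a few lines (a hedged sketch with NumPy and SciPy; the Gaussian sample is invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.5, size=1000)  # data from N(2, 1.5^2)

# Parametric: assume a Gaussian family and estimate (mu, sigma) from the sample.
mu_hat, sigma_hat = sample.mean(), sample.std()

# Nonparametric: estimate the density directly (kernel density estimate).
kde = stats.gaussian_kde(sample)

x = 2.0
print(stats.norm.pdf(x, mu_hat, sigma_hat))  # parametric density estimate at x
print(kde(x))                                # nonparametric estimate at x
```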

Representation learning

 Automatically extracting useful and significant characteristics
from raw data without labels

 The aim is to transform the data into a more compact and
informative representation (embeddings) that facilitates
subsequent tasks such as classification or clustering
Word Embedding

 Represent words by vectors
• each word is encoded as a vector
• semantic relations appear as consistent directions

(Figure: Paris→France, Rome→Italy, Berlin→Germany share the same
"capital-of" direction in the vector space)

 Word2vec [Mikolov et al.'13]
 GloVe [Pennington et al.'14]
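The vector-arithmetic property in the figure can be probed like this (a hedged sketch with the gensim library, which the slides do not mention; the pretrained vectors file "vectors.bin" is a hypothetical path):

```python
from gensim.models import KeyedVectors

# Hypothetical path to pretrained word2vec vectors.
vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# "Paris" - "France" + "Italy" should land near "Rome" if the
# capital-of direction is consistent in the embedding space.
print(vectors.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=3))
```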

Clustering Words with Similar Meaning (Hierarchically)

[Arora-Ge-Liang-M.-Risteski, TACL'17,18]
Dimensionality reduction

 Reduce the number of variables or dimensions of the data,
while preserving the essential information

(Figure: the "swiss roll" dataset, a 2D manifold embedded in 3D)

https://link.springer.com/article/10.1007/s00477-016-1246-2/figures/1

Latent Semantic Analysis (LSA)

(Figure: topic detection in a document-word matrix)

 Principal Component Analysis (PCA) is used in LSA

https://commons.wikimedia.org/wiki/File:Topic_detection_in_a_document-word_matrix.gif
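A PCA-based dimensionality reduction in miniature (a hedged sketch with scikit-learn on random data; applying it to a document-word matrix as in LSA would follow the same pattern):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))  # 100 samples with 50 features

pca = PCA(n_components=2)       # keep the 2 directions of largest variance
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (100, 2)
print(pca.explained_variance_ratio_.sum())  # fraction of variance preserved
```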
Large Language Models (LLM)
 machine learning models for language, learnt on large-scale
language datasets
 can be used for many purposes

Language Models are Few-Shot Learners [Brown et al.'20]

https://openai.com/blog/better-language-models/

Reinforcement learning

Reinforcement learning

 Learning to make sequential decisions


 Chess
• 1997: Deep Blue (IBM) defeated world chess champion Garry Kasparov
in a six-game match.
• 2017: AlphaZero (DeepMind) defeated Stockfish (chess engine)

 Go
• 2016: AlphaGo (DeepMind) defeated 18-time world champion Lee
Sedol 4-1 in a five-game match.
• 2017: AlphaGo Master defeated world champion Ke Jie
• 2017: AlphaGo Zero (a more advanced version) surpassed all previous
versions

Reinforcement learning

 The algorithm can collect data interactively

(Figure: a loop — try the strategy and collect feedback → data
collection → train on the feedback → improve the strategy → repeat)
Reinforcement learning

 Problem data
 A state describes a situation
 An action allows switching between states
 A policy chooses the action to take given the current state
 At the end of each action, a positive or negative reward is observed

 Objectives (see the sketch below)
 Guide an agent to define a policy: improve the choice of action
at time t+1
 Avoid failure situations
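A tiny tabular Q-learning sketch on an invented 1D corridor world (the environment, rewards, and hyperparameters are all made up to illustrate states, actions, policy, and reward; it is not an algorithm given in the course):

```python
import random

N_STATES = 5            # corridor cells 0..4; reaching cell 4 is the goal
ACTIONS = [-1, +1]      # move left or right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

alpha, gamma, epsilon = 0.1, 0.9, 0.2
random.seed(0)

for _ in range(500):                       # episodes
    state = 0
    while state != N_STATES - 1:
        # epsilon-greedy policy: mostly exploit, sometimes explore
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if nxt == N_STATES - 1 else -0.01  # + at goal, small - otherwise
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt

# The learned policy should always move right, towards the goal.
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)])
```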

Challenges

 The ability to generalise a model
 Overfitting
 Underfitting

 Curse of dimensionality (lots of features vs dataset size)

 Vanishing and exploding gradients (in Neural-Network-based
models)
 Data not available
 Data augmentation (for reduced datasets)
 Imbalanced datasets
 etc.
Generalisation
 a major challenge of ML
• ability to perform well on previously unseen inputs

 training error vs test / generalisation error

• training error: error on the training inputs

• test / generalisation error: expected error on a new input

 An ML training algorithm reduces the training error, which is a
task of optimisation

 What differentiates ML from pure optimisation is that the test /
generalisation error needs to be low as well

Typical learning curve

(Figure: training loss and validation loss vs number of training steps)


Overfitting
• A major problem for learning techniques!

• One can find a hypothesis that makes good predictions on the
training data but does not generalise well to the rest of the data.

• In the rest of the course, we will see methods to mitigate the
overfitting problem.

Vanishing and exploding gradients problem

 For Neural-Network-based models
• Vanishing gradients: occur when the gradients of the loss
function with respect to the parameters become very small
during backpropagation. This prevents the weights from
updating effectively, slowing or halting learning, especially
in the early layers of deep networks.
• Exploding gradients: occur when the gradients become
very large, causing unstable updates to the weights and
making training diverge.
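One common mitigation for the exploding case is gradient clipping (a hedged PyTorch sketch; neither the technique nor the library is named in the slides, and the tiny model is invented for illustration):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x, y = torch.randn(64, 10), torch.randn(64, 1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
# Rescale gradients so their global norm never exceeds 1.0,
# preventing a single huge gradient from destabilising training.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```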

Data augmentation

What?
 increase the size and diversity of a training dataset
 apply various transformations to the original data
 used when the original dataset is small or lacks diversity
Why?
 prevents overfitting by exposing the model to more varied data
 improves the model's ability to generalise to unseen data
 enhances performance in tasks like image classification,
object detection, natural language processing, etc.
Data augmentation

Common techniques:
1. Image data:
• rotation, flipping, cropping, scaling, and translation
• color jittering (adjusting brightness, contrast, saturation)
• adding noise or blurring
• random erasing or cutout
2. Text data:
• synonym replacement, random insertion or deletion of words
• back-translation (translating text to another language and back)
• shuffling sentences or phrases
3. Audio data:
• time stretching, pitch shifting, or adding background noise
4. Tabular data:
• adding noise to numerical features
• synthetic minority oversampling techniques (e.g., SMOTE)

Data augmentation

 Examples for image data (figure: an original image and several
randomly augmented variants)
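A typical image-augmentation pipeline (a hedged sketch with torchvision; the specific transforms, parameters, and the input file "cat.jpg" are illustrative choices, not ones given in the course):

```python
from torchvision import transforms
from PIL import Image

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),   # flipping
    transforms.RandomRotation(degrees=15),    # rotation
    transforms.ColorJitter(brightness=0.2,    # color jittering
                           contrast=0.2,
                           saturation=0.2),
    transforms.RandomResizedCrop(size=224),   # cropping + scaling
])

image = Image.open("cat.jpg")     # hypothetical input image
augmented = augment(image)        # a new, randomly transformed variant
augmented.save("cat_augmented.jpg")
```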
Imbalanced datasets

 Skewed class distributions can lead to biased models that favor
the majority class
 Common techniques: resampling (see the sketch below)
• Oversampling:
- increase the number of instances in the minority class
- examples: random oversampling, SMOTE (Synthetic Minority
Oversampling Technique), ADASYN
• Undersampling:
- reduce the number of instances in the majority class
- examples: random undersampling, Tomek links, cluster centroids
• Hybrid approaches:
- combine oversampling and undersampling for balanced results

Prerequisites

• Knowledge of numerical analysis: differentiation, partial
derivatives, gradients, integrals, etc.

• Knowledge of linear algebra: matrices, vectors, norms, scalar
products, etc.

• Knowledge of probability & statistics

• Knowledge of programming
References
• A. Géron. Hands-On Machine Learning with Scikit-Learn, Keras,
and TensorFlow: Concepts, Tools, and Techniques to Build
Intelligent Systems. O'Reilly Media Inc., 2019.
• C. Bishop. Pattern Recognition and Machine Learning.
Springer, 2006.
• R. Duda, P. Hart and D. Stork. Pattern Classification.
Prentice Hall, 2002.
• ...

