Machine Learning and Web Scraping Lecture 01
Introduction to Machine Learning
Definition
Learning
Data Mining
The process of sorting through large data sets to identify patterns and
relationships that can help solve business problems through data
analysis.
Data mining is a key part of data analytics and one of the core disciplines
in data science, which uses advanced analytics techniques to find useful
information in data sets.
• Retail: Market basket analysis, Customer relationship management
(CRM)
• Finance: Credit scoring, fraud detection
• Manufacturing: Optimization, troubleshooting
• Medicine: Medical diagnosis
• Telecommunications: Quality of service optimization
• Bioinformatics: Motifs, alignment
• Web mining: Search engines
• ...
Machine Learning…
Applications
• Association
• Supervised Learning
  • Classification
  • Regression
• Unsupervised Learning
• Reinforcement Learning
Learning Associations
• Basket analysis:
P(Y | X) is the probability that somebody who buys X also buys Y, where X
and Y are products/services.
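As a minimal sketch of how P(Y | X) could be estimated from purchase records, the snippet below counts the baskets that contain X and the share of those that also contain Y. The function name `conditional_probability` and the toy `transactions` list are made up for illustration and are not from the lecture.

```python
def conditional_probability(transactions, x, y):
    """Estimate P(y | x) as count(baskets with x and y) / count(baskets with x)."""
    with_x = [basket for basket in transactions if x in basket]
    if not with_x:
        return None  # x was never bought, so the estimate is undefined
    with_both = [basket for basket in with_x if y in basket]
    return len(with_both) / len(with_x)


# Hypothetical toy data: each set is one customer's basket.
transactions = [
    {"chips", "soda"},
    {"chips", "soda", "bread"},
    {"chips", "bread"},
    {"soda"},
    {"chips", "soda", "milk"},
]

# Four baskets contain chips, and three of those also contain soda.
p_soda_given_chips = conditional_probability(transactions, "chips", "soda")
print(p_soda_given_chips)  # 0.75
```

A high estimate suggests that customers who buy X are good candidates for a recommendation of Y, though with few baskets the estimate is noisy.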
Classification
Classification: Applications
Face Recognition
Test images
Regression Applications
Unsupervised Learning
Unsupervised machine learning is mainly used to:
• Cluster datasets by similarities between features, or segment data
• Understand relationships between different data points, as in automated
music recommendations
• Perform initial data analysis
• Learning “what normally happens”
• No output
• Clustering: Grouping similar instances
• Example applications
• Customer segmentation in CRM
• Image compression: Color quantization
• Bioinformatics: Learning motifs
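To make "grouping similar instances" concrete, here is a minimal sketch of k-means clustering applied to a one-dimensional version of color quantization. The lecture names clustering and color quantization but no specific algorithm, so k-means is an assumption here. The names `kmeans_1d` and `quantize`, the gray-level values, and the starting centers are also invented for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on scalars: assign each value to its nearest center,
    then move each center to the mean of its group."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ended up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def quantize(values, centers):
    """Replace every value with its nearest center (the compressed palette)."""
    return [min(centers, key=lambda c: abs(v - c)) for v in values]


# Hypothetical gray levels of six pixels: three dark, three bright.
gray_levels = [10, 12, 14, 200, 202, 204]
centers = kmeans_1d(gray_levels, centers=[0, 255])
compressed = quantize(gray_levels, centers)

print(centers)     # [12.0, 202.0]
print(compressed)  # [12.0, 12.0, 12.0, 202.0, 202.0, 202.0]
```

No labels are supplied at any point: the two groups emerge from the data alone, and storing one of two palette values per pixel instead of a full gray level is what makes this a form of compression.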
Reinforcement Learning
Reinforcement learning is a feedback-based machine learning technique in
which an agent learns to behave in an environment by performing actions and
observing their results. For each good action, the agent receives positive
feedback; for each bad action, it receives negative feedback, or a penalty.
Unlike in supervised learning, the agent learns automatically from this
feedback, without any labeled data.
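The feedback loop described above can be sketched with a very small agent that keeps a running value estimate for each action and mostly picks the action it currently rates highest. This is a single-state, bandit-style simplification with epsilon-greedy exploration, not a full reinforcement learning algorithm. The action names, reward values, and parameters `epsilon` and `alpha` are assumptions chosen for illustration.

```python
import random


def run_agent(rewards, steps=200, epsilon=0.1, alpha=0.5, seed=0):
    """Learn a value estimate per action purely from reward feedback."""
    rng = random.Random(seed)
    actions = list(rewards)
    values = {a: 0.0 for a in actions}  # the agent starts with no knowledge
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)           # explore: try something at random
        else:
            action = max(actions, key=values.get)  # exploit: best estimate so far
        reward = rewards[action]                   # feedback from the environment
        # Move the estimate a fraction alpha toward the observed reward.
        values[action] += alpha * (reward - values[action])
    return values


# Hypothetical environment: "right" is rewarded, "left" is penalized.
rewards = {"left": -1.0, "right": 1.0}
values = run_agent(rewards)
best_action = max(values, key=values.get)
print(best_action)  # right
```

The agent is never told which action is correct. It only sees a reward or a penalty after acting, which is the key difference from supervised learning.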
Resources: Journals
Resources: Conferences