2 - Business Problems and Data Science Solutions

Uploaded by

Quang Bùi Nhật

Business Problems and Data Science Solutions
Each data-driven business decision-making problem is unique,
comprising its own combination of goals, desires, and constraints.

In collaboration with business stakeholders, data scientists decompose a
business problem into subtasks.

The solutions to the subtasks can then be composed to solve the overall
problem.
Some of these subtasks are unique to the particular business problem, but
others are common data mining tasks.

For example, our telecommunications churn problem is unique to
MegaTelCo, which we saw in the previous class.

However, a subtask that will likely be part of the solution to any churn
problem is to estimate from historical data the probability of a customer
terminating her contract.

Once solved, this subtask can be applied to churn problems at
different companies in the same business or even across business domains.
A critical skill in data science is the ability to decompose a data-analytics
problem into pieces such that each piece matches a known task for
which tools are available.

Recognizing familiar problems and their solutions avoids wasting time
and resources reinventing the wheel.
There are a large number of data mining and machine learning algorithms.

These algorithms, though, perform only a handful of tasks.

The two most common of these tasks are:

1) Classification

2) Regression
Tasks performed by data mining and machine learning algorithms:

1) Classification
The goal here is to classify a sample (data point) into the most probable
class.
E.g., for the churn problem we have been studying:

“Among all the customers of MegaTelCo, which are likely to respond to a
given offer?” In this example the two classes could be called “will respond”
and “will not respond”.
A closely related task to classification is scoring or class probability
estimation.

A scoring model applied to a sample outputs the probability that the
sample belongs to each class.

In these models, the class which has the highest probability becomes the
predicted class.
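The step from scores to a predicted class can be sketched in a few lines of Python. This is only an illustration: the probabilities are made-up numbers, and the helper name `predict_class` is hypothetical rather than part of any particular library.

```python
# Hypothetical output of a scoring model for one customer: the estimated
# probability of each class (made-up numbers, for illustration only).
scores = {"will respond": 0.27, "will not respond": 0.73}


def predict_class(class_probabilities):
    """Return the class with the highest estimated probability."""
    return max(class_probabilities, key=class_probabilities.get)


predicted = predict_class(scores)
print(predicted)
```

Keeping the scores, rather than only the predicted class, lets a business rank customers by how likely they are to respond.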
2) Regression

Regression attempts to estimate or predict, for each individual, the
numerical value of some variable for that individual.

E.g., what is the value of this house given its age, number of bedrooms,
and number of bathrooms?
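A minimal sketch of regression, assuming a single input feature (number of bedrooms) to keep the arithmetic visible. The training data are made up, and `fit_line` is a hypothetical helper implementing ordinary least squares for one feature. A realistic house-price model would use several features, such as age and number of bathrooms.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: number of bedrooms -> sale price (in thousands).
bedrooms = [1, 2, 3, 4]
prices = [100.0, 150.0, 200.0, 250.0]

slope, intercept = fit_line(bedrooms, prices)

# Predict a numeric value for a house that was not in the training data.
predicted_price = slope * 5 + intercept
print(predicted_price)
```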
3) Similarity matching

Similarity matching attempts to identify similar individuals based on
data known about them. Similarity matching can be used directly to find
similar entities.

E.g., IBM is interested in finding companies similar to their best business
customers, in order to focus their sales force on the best opportunities.

Netflix and Amazon also use similarity matching to make recommendations.
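One simple way to make "similar" concrete is to describe each entity by a few numbers and rank candidates by distance. The sketch below assumes Euclidean distance and made-up company profiles. The names `euclidean_distance` and `most_similar` are hypothetical, and in practice the features would usually be rescaled first so that no single feature dominates the distance.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(target, candidates, k=2):
    """Return the names of the k candidates closest to the target."""
    ranked = sorted(
        candidates,
        key=lambda name: euclidean_distance(target, candidates[name]),
    )
    return ranked[:k]


# Made-up profiles: (employees in thousands, annual revenue in $ billions).
best_customer = (10.0, 2.0)
prospects = {
    "Company A": (9.0, 2.5),
    "Company B": (50.0, 20.0),
    "Company C": (12.0, 2.0),
    "Company D": (1.0, 0.1),
}

print(most_similar(best_customer, prospects))
```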

4) Clustering

Clustering attempts to group individuals in a population together by
their similarity, but not driven by any specific purpose.

E.g. “Do our customers form natural groups or segments?”

Clustering is useful in preliminary domain exploration to see which
natural groups exist because these groups in turn may suggest other
data mining tasks or approaches.
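As an illustration, the sketch below runs a tiny k-means procedure on one-dimensional data. k-means is one common clustering technique and is used here as an assumption, since the slides do not name a specific algorithm. The spending figures and starting centers are made up, and `k_means_1d` is a hypothetical helper.

```python
def k_means_1d(values, centers, iterations=10):
    """Tiny k-means for one-dimensional data with given starting centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Made-up monthly spend per customer; note that no labels are supplied.
monthly_spend = [10, 12, 11, 95, 100, 105]
centers, clusters = k_means_1d(monthly_spend, centers=[0, 50])
print(centers, clusters)
```

The two groups that emerge (low spenders and high spenders) were not specified in advance, which is what makes this an exploratory step.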
5) Frequent itemset mining

This is one of the earliest data mining tasks. The Apriori algorithm is
one of the earliest algorithms developed in this space.

E.g., Walmart trying to figure out which items sell together.

The key here is to uncover patterns that we would not anticipate through
human intuition alone.

For example, analyzing purchase records from a supermarket may uncover
that ground meat is purchased together with hot sauce much more
frequently than we might expect.
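"More frequently than we might expect" can be made concrete by comparing how often a pair is bought together with how often it would appear if the two items were bought independently (a ratio often called lift). The sketch below uses brute-force pair counting on made-up baskets. It is not the Apriori algorithm itself, and `pair_statistics` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def pair_statistics(baskets):
    """Return {pair: (support, lift)} for every item pair bought together."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(set(basket)), 2)
    )
    stats = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        expected = (item_counts[a] / n) * (item_counts[b] / n)
        stats[(a, b)] = (support, support / expected)  # lift > 1: more than expected
    return stats


# Made-up purchase records, one set of items per basket.
baskets = [
    {"ground meat", "hot sauce", "bread"},
    {"ground meat", "hot sauce"},
    {"bread", "milk"},
    {"milk"},
]

stats = pair_statistics(baskets)
print(stats[("ground meat", "hot sauce")])
```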
6) Profiling

Profiling is often used to establish behavioral norms for anomaly
detection applications such as fraud detection and monitoring for
intrusions to computer systems.

E.g. if we know what kind of purchases a person typically makes on a
credit card, we can determine whether a new charge on the card fits
that profile or not. We can use the degree of mismatch as a suspicion
score and issue an alarm if it is too high.
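The suspicion score can be sketched as follows, assuming the profile is just the mean and standard deviation of past charge amounts and the score is the distance of the new charge from that mean, measured in standard deviations. The charge history, the threshold, and the function names are made-up assumptions. A real profile would use richer features such as merchant type, location, and time of day.

```python
import statistics

ALARM_THRESHOLD = 3.0  # assumed cut-off, in standard deviations


def suspicion_score(history, new_amount):
    """How many standard deviations the new charge lies from the norm."""
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    return abs(new_amount - mean) / spread


def is_suspicious(history, new_amount, threshold=ALARM_THRESHOLD):
    """Raise the alarm when the degree of mismatch is too high."""
    return suspicion_score(history, new_amount) > threshold


# Made-up history of charge amounts on one credit card.
past_charges = [20.0, 25.0, 30.0, 22.0, 28.0]

print(is_suspicious(past_charges, 27.0))
print(is_suspicious(past_charges, 500.0))
```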
7) Link Prediction

Link prediction attempts to predict connections between data items, usually by
suggesting that a link should exist, and possibly also estimating the strength of the
link.

E.g., for recommending movies to customers one can think of a graph between
customers and the movies they’ve watched or rated. Within the graph, we search
for links that do not exist between customers and movies, but that we predict
should exist and should be strong. These links form the basis for recommendations.
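A minimal sketch of this idea, assuming a very simple scoring rule: a missing customer-movie link is scored by how much the customer's viewing history overlaps with that of other customers who watched the movie. The viewing data, the scoring rule, and the name `predict_links` are illustrative assumptions, not the method of any particular recommender system.

```python
def predict_links(watched, customer):
    """Score movies the customer has not seen, strongest predicted link first."""
    seen = watched[customer]
    scores = {}
    for other, movies in watched.items():
        if other == customer:
            continue
        overlap = len(seen & movies)  # shared movies = strength of evidence
        if overlap == 0:
            continue
        for movie in movies - seen:  # links that do not exist yet
            scores[movie] = scores.get(movie, 0) + overlap
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up graph: each customer is linked to the movies they have watched.
watched = {
    "Ann": {"Movie A", "Movie B"},
    "Bob": {"Movie A", "Movie B", "Movie C"},
    "Cara": {"Movie A", "Movie D"},
    "Dan": {"Movie E"},
}

print(predict_links(watched, "Ann"))
```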
8) Data Reduction

Data reduction attempts to take a large set of data and replace it with a
smaller set of data that contains much of the important information in
the larger set.
A popular technique for data reduction is called “Principal Component
Analysis” or PCA.
Data reduction usually involves loss of information. It is a trade-off
between retaining information and reducing the dimensions so that the
model trains faster.
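For two-dimensional data the first principal component can be computed in closed form, which gives a compact illustration of the trade-off. The sketch below works only under that assumption: the data points are made up, `first_principal_component` is a hypothetical helper, and real projects would normally rely on a library routine instead.

```python
import math


def first_principal_component(points):
    """Reduce 2-D points to 1-D scores along their first principal component."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum(dx * dx for dx, _ in centered) / (n - 1)
    c = sum(dy * dy for _, dy in centered) / (n - 1)
    b = sum(dx * dy for dx, dy in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix (closed form).
    largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)

    # Matching eigenvector, normalised to unit length.
    if b != 0:
        vx, vy = b, largest - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    reduced = [dx * vx + dy * vy for dx, dy in centered]
    explained = largest / (a + c)  # share of total variance retained
    return reduced, explained


# Made-up points lying close to a straight line.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
reduced, explained = first_principal_component(points)
print(reduced, explained)
```

Each point is now described by one number instead of two. The retained share of variance is slightly below 1, which is the information lost in exchange for the smaller representation.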
9) Causal Modeling

Causal modeling attempts to help us understand what events or actions
actually influence others.

E.g. consider that we use predictive modeling to target advertisements
to consumers, and we observe that indeed the targeted consumers
purchase at a higher rate subsequent to having been targeted. Was this
because the advertisements influenced the consumers to purchase? Or
did the predictive models simply do a good job of identifying those
consumers who would have purchased anyway?
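One common way to separate the two explanations is a randomized experiment: among the consumers selected by the model, a random subset is held back and not shown the advertisement, and the two purchase rates are compared. The sketch below assumes such a holdout exists. The outcomes are made up, and `purchase_rate` is a hypothetical helper.

```python
def purchase_rate(outcomes):
    """Fraction of consumers who purchased (outcomes are True/False)."""
    return sum(outcomes) / len(outcomes)


# Made-up outcomes. Both groups were selected by the same predictive model;
# only the "treated" group was actually shown the advertisement.
treated = [True, True, False, True, False, True, False, True, False, False]
control = [True, False, False, True, False, False, True, False, False, False]

# The difference estimates the effect of the advertisement itself,
# over and above the model's skill at picking likely buyers.
uplift = purchase_rate(treated) - purchase_rate(control)
print(uplift)
```

If the uplift were close to zero, the higher purchase rate among targeted consumers would mostly reflect the model's selection rather than the advertisement's influence.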
Supervised Versus
Unsupervised Methods
Supervised learning is when your data set comes with labels.

E.g.
Let's say you have a file containing information about customers.

Each row in the file corresponds to one customer.

If in this file each customer is labeled as a good customer or a bad
customer and our objective is to build a model that predicts whether a
new customer is good or bad, then this is a case of supervised learning.
Unsupervised learning is when your data set does not have any labels.

E.g.
“Do our customers naturally fall into different groups?” Here no specific
purpose or target has been specified for the grouping. When there is no
such target, the data mining problem is referred to as unsupervised.

Clustering, an unsupervised task, produces groupings based on
similarities, but there is no guarantee that these similarities are
meaningful or will be useful for any particular purpose.
Supervised tasks require different techniques than unsupervised tasks
do, and the results often are much more useful.

Supervised learning is more widely adopted than unsupervised learning
at the moment.

For supervised learning, acquiring data on the target often is a key data
science investment. The value for the target variable for an individual is
often called the individual’s label.

Getting labeled data for supervised learning will often incur an expense.
Supervised tasks
Classification, regression, and causal modeling generally are solved with
supervised methods.

Unsupervised tasks
Clustering, co-occurrence grouping (frequent itemset mining), and profiling
generally are unsupervised.

Similarity matching, link prediction, and data reduction could be either.

Two main subclasses of supervised data mining, classification and
regression, are distinguished by the type of target. Regression involves a
numeric target while classification involves a categorical (often binary)
target.

“Will this customer purchase service S1 if given incentive I?”

Type = Classification
“Which service package (S1, S2, or none) will a customer likely purchase
if given incentive I?”

Type = classification into three classes: S1, S2, and none

“How much will this customer spend in a month?”

Type = regression, because we are going to predict a numeric value

Data Mining / Machine Learning Project
SOFTWARE SKILLS VERSUS ANALYTIC SKILLS
Coding is a core skill for software engineers.

Is being a good programmer important for data scientists?

Absolutely. A data scientist needs to be comfortable writing code to
build prototype models.
In addition to building models using code, a data scientist also needs to
research and try new models, try new approaches to solve problems, and
make assumptions to structure data.
Common analytic techniques
DATABASE QUERYING

A query is a specific request for a subset of data or for statistics about
data, formulated in a technical language and posed to a database system.

For example, if an analyst suspects that middle-aged men living in the
Northeast have some particularly interesting churning behavior, she
could compose a SQL query:

SELECT * FROM CUSTOMERS WHERE AGE > 45 AND SEX = 'M' AND DOMICILE = 'NE'
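To try the query without a database server, it can be run against an in-memory SQLite database from Python. This is a sketch under stated assumptions: the column layout of `CUSTOMERS` (including the `NAME` column) and all rows are made up for illustration, and only the query itself comes from the example above.

```python
import sqlite3

# In-memory database with a hypothetical CUSTOMERS table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMERS (NAME TEXT, AGE INTEGER, SEX TEXT, DOMICILE TEXT)"
)
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)",
    [
        ("Customer 1", 52, "M", "NE"),
        ("Customer 2", 38, "M", "NE"),
        ("Customer 3", 61, "F", "NE"),
        ("Customer 4", 47, "M", "SW"),
    ],
)

# The analyst's query: men over 45 living in the Northeast.
rows = conn.execute(
    "SELECT * FROM CUSTOMERS WHERE AGE > 45 AND SEX = 'M' AND DOMICILE = 'NE'"
).fetchall()
conn.close()

print(rows)
```

Only the first customer satisfies all three conditions, so the query returns a single row.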
DATA WAREHOUSING

Data warehouses collect data from across the organization in a format that
enables quick access to historical information and also allows building of
analytical metrics from that data.

Building a data warehouse is a process which requires significant time and
investment.

A data warehouse generally feeds data into machine learning / data mining
algorithms.

We have a separate class for Data Warehousing in Sem 3.

MACHINE LEARNING AND DATA MINING

The collection of methods for extracting (predictive) models from data,
now known as machine learning methods, was developed in several
fields contemporaneously, most notably Machine Learning, Applied
Statistics, and Pattern Recognition.

In machine learning, the algorithm learns from the data it observes and
improves its performance.

We study machine learning in detail in Sem 4.


The field of data mining started with finding patterns within large data
sets (e.g., the Apriori algorithm).

The algorithms used for data mining and machine learning are
sometimes the same.

Some people use these terms interchangeably.
