2 - Business Problems and Data Science Solutions
Each data-driven business decision-making problem is unique,
comprising its own combination of goals, desires, and constraints.
Such a problem is typically broken down into smaller subtasks.
The solutions to the subtasks can then be composed to solve the overall
problem.
Some of these subtasks are unique to the particular business problem, but
others are common data mining tasks.
However, a subtask that will likely be part of the solution to any churn
problem is to estimate from historical data the probability of a customer
terminating her contract.
Once solved, this subtask can be reused for churn problems at
different companies in the same business or even across business domains.
A critical skill in data science is the ability to decompose a data-analytics
problem into pieces such that each piece matches a known task for
which tools are available.
1) Classification
2) Regression
Tasks performed by data mining and machine learning algorithms:
1) Classification:
The goal here is to classify a sample (data point) into the most probable
class.
E.g., for the churn problem we have been studying, the classes are
whether or not a customer will churn.
In models that estimate class probabilities, the class with the highest
probability becomes the predicted class.
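The rule of picking the most probable class can be sketched in a few lines of Python. This is only an illustration: the function name `predict_class` and the probability values are made up for the example and do not come from a real model.

```python
def predict_class(class_probabilities):
    """Return the class label with the highest estimated probability."""
    return max(class_probabilities, key=class_probabilities.get)


# Hypothetical model output for one customer (made-up numbers).
customer_probabilities = {"churn": 0.72, "stay": 0.28}

predicted = predict_class(customer_probabilities)
print(predicted)
```

Here the customer would be classified as "churn" because that class has the larger estimated probability.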
2) Regression
E.g., what is the value of this house given its age, number of bedrooms,
and number of bathrooms?
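As a rough sketch of what a regression task looks like, the code below fits a straight line by ordinary least squares. To keep it short it uses only one input (the age of the house) rather than all three mentioned above, and the ages and prices are invented for illustration. The helper name `fit_line` is also an assumption of this example.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: house age in years and sale price.
ages = [5, 10, 20, 40]
prices = [290_000, 280_000, 260_000, 220_000]

slope, intercept = fit_line(ages, prices)

# Estimate the value of a 15-year-old house with the fitted line.
estimate = intercept + slope * 15
print(round(estimate))
```

Unlike classification, the output is a number (a price) rather than a class label.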
3) Similarity matching
Co-occurrence grouping (association rule mining) is one of the very first
data mining tasks. The Apriori algorithm is one of the earliest algorithms
developed in this space.
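For readers unfamiliar with the Apriori algorithm mentioned above, here is a minimal sketch of its level-wise idea: count single items first, then build larger candidate itemsets only from smaller ones that were already frequent. The shopping baskets and the `min_support` threshold are made up, and this is a simplified teaching version, not an optimized implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Made-up shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

frequent_itemsets = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

The pruning step is what keeps the search manageable: an itemset cannot be frequent unless every smaller subset of it is frequent too.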
E.g., for recommending movies to customers, one can think of a graph
between customers and the movies they’ve watched or rated. Within the graph,
we search for links that do not yet exist between customers and movies, but
that we predict should exist and should be strong. These links form the
basis for recommendations.
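One simple way to score missing customer-movie links is to count how much a customer overlaps with other customers who watched a given movie. The sketch below does that. The scoring rule, the function name `recommend`, and the viewing data are all assumptions chosen for illustration; real recommenders use more refined link-strength measures.

```python
from collections import defaultdict


def recommend(watched, customer, top_n=3):
    """Rank movies the customer has not seen by overlap with similar customers."""
    seen = watched[customer]
    scores = defaultdict(int)
    for other, movies in watched.items():
        if other == customer:
            continue
        overlap = len(seen & movies)  # movies both customers have watched
        if overlap == 0:
            continue
        # Each unseen movie gets credit proportional to the shared history.
        for movie in movies - seen:
            scores[movie] += overlap
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Made-up viewing history: customer -> set of movies watched.
watched = {
    "ana": {"Movie A", "Movie B", "Movie C"},
    "ben": {"Movie A", "Movie B", "Movie D"},
    "cy": {"Movie C", "Movie D", "Movie E"},
    "dee": {"Movie F"},
}

recommendations = recommend(watched, "ana")
print(recommendations)
```

The highest-scoring missing links are the ones predicted to be strong, so they are recommended first.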
8) Data Reduction
Data reduction attempts to take a large set of data and replace it with a
smaller set of data that contains much of the important information in
the larger set.
A popular technique for data reduction is Principal Component Analysis
(PCA).
Data reduction usually involves loss of information. It is a tradeoff
between the information lost and the benefit of fewer dimensions, such as
a model that trains faster.
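To make the tradeoff concrete, the sketch below reduces two-dimensional points to one dimension with PCA and reports what fraction of the variance survives. It uses the closed-form principal direction for a 2x2 covariance matrix so that no external library is needed. The data points and the function name `pca_2d_to_1d` are made up for this example.

```python
import math


def pca_2d_to_1d(points):
    """Project 2-D points onto their first principal component.

    Returns (direction, scores, fraction_of_variance_kept).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)

    # Angle of the direction with the largest variance.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    direction = (math.cos(theta), math.sin(theta))

    # One number per point replaces the original two.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    kept = sum(s * s for s in scores) / (n - 1)
    return direction, scores, kept / (cxx + cyy)


# Made-up, roughly linear data.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, scores, kept_fraction = pca_2d_to_1d(points)
print(f"variance kept: {kept_fraction:.3f}")
```

The fraction of variance kept is below 1 whenever the points do not lie exactly on a line, which is the information lost in exchange for halving the number of dimensions.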
9) Causal Modeling
E.g., let's say you have a file containing information about customers.
E.g., “Do our customers naturally fall into different groups?” Here no specific
purpose or target has been specified for the grouping. When there is no
such target, the data mining problem is referred to as unsupervised.
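As an illustration of an unsupervised problem, the sketch below groups customers using only one observed value per customer, with no label saying which group anyone belongs to. It uses k-means with two clusters, which is one common clustering technique and is an assumption of this example rather than something the text prescribes. The monthly spend figures are invented.

```python
def two_means(values, iterations=10):
    """Split 1-D values into two groups with a basic k-means loop (k = 2)."""
    # Deterministic start: one centroid at each extreme of the data.
    centroids = [min(values), max(values)]
    labels = []
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        # Move each centroid to the mean of its assigned values.
        for group in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == group]
            if members:
                centroids[group] = sum(members) / len(members)
    return labels, centroids


# Made-up monthly spend per customer; note there is no target column.
monthly_spend = [10, 12, 11, 95, 100, 98]

labels, centroids = two_means(monthly_spend)
print(labels)
print(centroids)
```

The groups emerge from the data alone. A supervised version of the problem would instead need a known target value for each customer.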
For supervised learning, acquiring data on the target often is a key data
science investment. The value for the target variable for an individual is
often called the individual’s label.
Getting labeled data for supervised learning will often incur an expense.
Supervised Versus Unsupervised Methods
Supervised tasks
Classification, regression, and causal modeling generally are solved with
supervised methods.
Unsupervised tasks
Clustering, co-occurrence grouping, and profiling generally are
unsupervised.
Data warehouses collect data from across the organization in a format that
enables quick access to historical information and also allows building of
analytical metrics from that data.
A data warehouse generally feeds data into machine learning and data
mining algorithms.
In machine learning, the algorithm learns from the data it observes and
improves its performance.
The field of data mining started with finding patterns within large data
sets (e.g., the Apriori algorithm).
The algorithms used for data mining and machine learning are
sometimes the same.