
Data Science Methodologies (Coursera)

The document discusses two data science methodologies: 1) the Foundational Methodology, which focuses on business understanding as the first step to define the problem and determine what data is needed; and 2) CRISP-DM, a six-step methodology used in data mining: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. CRISP-DM is a flexible, cyclical process in which steps can be revisited.

Uploaded by Muhammad Jefry
Copyright © All Rights Reserved

Data Science Methodologies

FROM PROBLEM TO APPROACH

This course focuses on the Foundational Methodology for Data Science by John Rollins, which
was introduced in the previous video. However, it is not the only methodology you will
encounter in data science. For example, in data mining, the Cross-Industry Standard Process
for Data Mining (CRISP-DM) methodology is widely used.
What is CRISP-DM?
The CRISP-DM methodology is a process aimed at increasing the use of data mining across a wide
variety of business applications and industries. The intent is to take case-specific scenarios and
general behaviors and make them domain-neutral. CRISP-DM consists of six steps that an
entity has to implement in order to have a reasonable chance of success. The six steps are
shown in the following diagram:

Fig. 1 CRISP-DM model, IBM Knowledge Center, CRISP-DM Help Overview


1. Business Understanding: This stage is the most important because this is where the
intention of the project is outlined. The Foundational Methodology and CRISP-DM are
aligned here. It requires communication and clarity. The difficulty here is that
stakeholders have different objectives, biases, and modalities of relating information.
They don’t all see the same things, or see them in the same manner. Without a clear, concise,
and complete perspective on the project goals, resources will be needlessly
expended.
2. Data Understanding: Data understanding relies on business understanding. Data is
collected at this stage of the process. The understanding of what the business wants and
needs will determine what data is collected, from what sources, and by what methods.
CRISP-DM combines the Data Requirements, Data Collection, and Data
Understanding stages of the Foundational Methodology.
3. Data Preparation: Once the data has been collected, it must be transformed into a
usable subset, unless it is determined that more data is needed. Once a dataset is
chosen, it must then be checked for questionable, missing, or ambiguous cases. Data
Preparation is common to CRISP-DM and the Foundational Methodology.
4. Modeling: Once prepared for use, the data must be expressed through appropriate
models that give meaningful insights and, ideally, new knowledge. This is the
purpose of data mining: to create information and knowledge that have meaning and utility.
The use of models reveals patterns and structures within the data that provide insight
into the features of interest. Models are trained on a portion of the data, and
adjustments are made if necessary. Model selection is both an art and a science. Both the
Foundational Methodology and CRISP-DM include this stage, and the subsequent stage depends on it.
5. Evaluation: The selected model must be tested. This is usually done by running the
trained model on a pre-selected test set. This lets you see how effective the model is
on data it has not seen before. The results are used to determine the model's efficacy
and foreshadow its role in the next and final stage.
6. Deployment: In the deployment step, the model is used on new data outside the
scope of the original dataset and by new stakeholders. The new interactions at this phase might
reveal new variables and needs for the dataset and model. These new challenges
could prompt a revision of the business needs and actions, of the model and data, or of
both.
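The Data Preparation, Modeling, and Evaluation steps above can be illustrated with a short Python sketch. It is not part of CRISP-DM or the course material. The record fields (`age`, `readmitted`), the helper names, and the majority-class baseline model are hypothetical stand-ins, chosen only to show the shape of the workflow: drop records with missing values, hold out a test set before modeling, fit a model on the remaining records, and measure it on the held-out set.

```python
import random
from collections import Counter

def prepare(records, required_fields):
    """Data Preparation: drop records with a missing (None) value in any required field."""
    return [r for r in records if all(r.get(f) is not None for f in required_fields)]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Hold out a test set before modeling so evaluation uses unseen records."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_majority(train, label):
    """Modeling: a deliberately simple baseline that always predicts the most common label."""
    return Counter(r[label] for r in train).most_common(1)[0][0]

def accuracy(prediction, test, label):
    """Evaluation: share of held-out records whose label equals the prediction."""
    return sum(r[label] == prediction for r in test) / len(test)

if __name__ == "__main__":
    # Hypothetical toy records; field names are illustrative only.
    records = [
        {"age": 71, "readmitted": True},
        {"age": 64, "readmitted": False},
        {"age": None, "readmitted": False},  # missing value: removed during preparation
        {"age": 80, "readmitted": True},
        {"age": 55, "readmitted": False},
        {"age": 68, "readmitted": False},
    ]
    clean = prepare(records, ["age", "readmitted"])
    train, test = train_test_split(clean, test_fraction=0.4)
    model = fit_majority(train, "readmitted")
    print(len(clean), len(train), len(test))  # 5 3 2
    print(accuracy(model, test, "readmitted"))
```

A real project would substitute a proper model for the baseline. The point of the sketch is that the test set is separated before any modeling, so the evaluation reflects data the model has not seen.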
CRISP-DM is a highly flexible and cyclical model. Flexibility is required at each step, along with
communication to keep the project on track. At any of the six stages, it may be necessary to
revisit an earlier stage and make changes. The key point of this process is that it’s cyclical;
therefore, even at the finish, you have another business understanding discussion to
assess the model's viability after deployment. The journey continues.
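The cyclical flow described above can be sketched as a small state function in Python. This is an illustrative assumption, not an official CRISP-DM artifact. The phase list mirrors the six steps, `next_phase` is a hypothetical helper, and the optional `revisit` argument stands in for the decision to go back to an earlier stage.

```python
PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Deployment",
]

def next_phase(current, revisit=None):
    """Return the phase that follows `current` in a CRISP-DM-style cycle.

    If `revisit` names the current or an earlier phase, the project goes back to it.
    After Deployment, the cycle wraps around to Business Understanding.
    """
    i = PHASES.index(current)
    if revisit is not None:
        if PHASES.index(revisit) > i:
            raise ValueError("can only revisit the current or an earlier phase")
        return revisit
    return PHASES[(i + 1) % len(PHASES)]

if __name__ == "__main__":
    print(next_phase("Evaluation"))                            # Deployment
    print(next_phase("Deployment"))                            # Business Understanding
    print(next_phase("Modeling", revisit="Data Preparation"))  # Data Preparation
```

The modulo step is what makes the process a cycle rather than a pipeline. Deployment does not end the project. It hands control back to Business Understanding.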
Data Science Methodology (Business Understanding)
Welcome to Data Science Methodology 101, From Problem to Approach: Business
Understanding!
Has this ever happened to you?
You've been called into a meeting by your boss, who makes you aware of an important
task, one with a very tight deadline that absolutely has to be met.
You both go back and forth to ensure that all aspects of the task have been considered
and the meeting ends with both of you confident that things are on track.
Later that afternoon, however, after you've spent some time examining the various issues
at play, you realize that you need to ask several additional questions in order to truly
accomplish the task.
Unfortunately, the boss won't be available again until tomorrow morning.
Now, with the tight deadline still ringing in your ears, you start feeling a sense of
uneasiness.
So, what do you do?
Do you risk moving forward, or do you stop and seek clarification?
Data science methodology begins with spending the time to seek clarification, to attain
what can be referred to as a business understanding.
This understanding is placed at the beginning of the methodology because getting
clarity around the problem to be solved allows you to determine which data will be used to
answer the core question.
Rollins suggests that having a clearly defined question is vital because it ultimately directs
the analytic approach that will be needed to address the question.
All too often, much effort is put into answering what people THINK is the question, and while
the methods used to address that question might be sound, they don't help to solve
the actual problem.
Establishing a clearly defined question starts with understanding the GOAL of the person
who is asking the question.
For example, if a business owner asks: "How can we reduce the costs of performing an
activity?"
We need to understand: is the goal to improve the efficiency of the activity?
Or is it to increase the business's profitability?
Once the goal is clarified, the next piece of the puzzle is to figure out the objectives
that are in support of the goal.
Breaking down the objectives allows structured discussions to take place, in which priorities
can be identified in a way that leads to organizing and planning how to tackle the
problem.
Depending on the problem, different stakeholders will need to be engaged in the discussion
to help determine requirements and clarify questions.
So now, let's look at the case study related to applying "Business Understanding".
In the case study, the question being asked is: What is the best way to allocate the limited
healthcare budget to maximize its use in providing quality care?
This question is one that became a hot topic for an American healthcare insurance provider.
As public funding for readmissions was decreasing, this insurance company was at risk of having
to make up the cost difference, which could potentially increase rates for its customers.
Knowing that raising insurance rates was not going to be a popular move, the insurance
company sat down with the health care authorities in its region and brought in IBM data
scientists to see how data science could be applied to the question at hand.
Before even starting to collect data, the goals and objectives needed to be defined.
After spending time to determine the goals and objectives, the team prioritized "patient
readmissions" as an effective area for review.
With the goals and objectives in mind, it was found that approximately 30% of individuals
who finish rehab treatment would be readmitted to a rehab center within one year, and that
50% would be readmitted within five years.
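The percentages above are readmission rates measured over a time window. The following minimal Python sketch shows one way such a rate could be computed. The function name, the record layout (end-of-treatment date plus either a readmission date or `None`), and the four patients in the usage example are hypothetical. Years are approximated as 365 days.

```python
from datetime import date

def readmission_rate(patients, window_days):
    """Fraction of patients readmitted within `window_days` of finishing treatment.

    Each patient is a (finished_on, readmitted_on) pair; readmitted_on is None
    if the patient was never readmitted.
    """
    if not patients:
        raise ValueError("need at least one patient")
    count = sum(
        1
        for finished_on, readmitted_on in patients
        if readmitted_on is not None
        and (readmitted_on - finished_on).days <= window_days
    )
    return count / len(patients)

if __name__ == "__main__":
    # Hypothetical records, not data from the case study.
    patients = [
        (date(2020, 1, 1), date(2020, 4, 10)),  # readmitted after 100 days
        (date(2020, 1, 1), date(2022, 3, 11)),  # readmitted after 800 days
        (date(2020, 1, 1), None),
        (date(2020, 1, 1), None),
    ]
    print(readmission_rate(patients, 365))      # 0.25
    print(readmission_rate(patients, 5 * 365))  # 0.5
```

The same records give different rates depending on the window. This is why the case study reports one-year and five-year figures separately.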
After reviewing some records, it was discovered that the patients with congestive heart failure
were at the top of the readmission list.
It was further determined that a decision-tree model could be applied to this scenario
to determine why this was occurring.
To gain the business understanding that would guide the analytics team in formulating and
performing their first project, the IBM data scientists proposed and delivered an on-site
workshop to kick things off.
The key business sponsor's involvement throughout the project was critical, in that the sponsor:
• Set the overall direction.
• Remained engaged and provided guidance.
• Ensured necessary support, where needed.
Finally, four business requirements were identified for whatever model would be built, namely:
• Predicting readmission outcomes for patients with congestive heart failure.
• Predicting readmission risk.
• Understanding the combination of events that led to the predicted outcome.
• Applying an easy-to-understand process to new patients regarding their readmission risk.
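A decision tree suits the requirement of understanding the combination of events behind a prediction, because every prediction can be traced to the sequence of conditions that produced it. The Python sketch below shows how a tree can return both a prediction and that path. The features (`prior_admissions`, `age`), the thresholds, and the risk labels are invented for illustration and do not come from the case study. In practice, the tree would be learned from the prepared data rather than built by hand.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    label: str  # the predicted outcome at this leaf

@dataclass
class Node:
    feature: str      # patient attribute tested at this node
    threshold: float  # go left if value <= threshold, otherwise right
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def predict_with_path(tree: Tree, patient: dict) -> tuple:
    """Walk from the root to a leaf, recording each condition that was satisfied."""
    path = []
    while isinstance(tree, Node):
        value = patient[tree.feature]
        if value <= tree.threshold:
            path.append(f"{tree.feature} <= {tree.threshold}")
            tree = tree.left
        else:
            path.append(f"{tree.feature} > {tree.threshold}")
            tree = tree.right
    return tree.label, path

if __name__ == "__main__":
    # Hypothetical, hand-built tree for illustration only.
    tree = Node(
        "prior_admissions", 1,
        Leaf("low risk"),
        Node("age", 75, Leaf("medium risk"), Leaf("high risk")),
    )
    label, path = predict_with_path(tree, {"prior_admissions": 3, "age": 80})
    print(label, path)  # high risk ['prior_admissions > 1', 'age > 75']
```

The returned path is the "combination of events" in readable form. This also makes the model's output easy to explain when it is applied to new patients.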
This ends the Business Understanding section of this course.
Thanks for watching!
Lesson Summary

In this lesson, you have learned:

• The need to understand and prioritize the business goal.
• The way stakeholder support influences a project.
• The importance of selecting the right model.
• When to use a predictive, descriptive, or classification model.
