Week 5 CRISP-DM Process and Its Applications (PDF)

The document provides an introduction to data science workflows, focusing on KDD and CRISP-DM processes, which are essential for data-driven decision-making. It outlines the steps involved in these workflows and their applications across various industries such as marketing, healthcare, and finance. Additionally, it emphasizes the importance of knowledge management in preserving and utilizing insights derived from data.

Introduction to Data Science

CRISP-DM PROCESS AND ITS APPLICATION

Professor O.O Obe


Doctor of Philosophy, Space Physics &
Telecommunications
American University of Nigeria

Objectives
At the end of this lesson you should be able to:
● Identify various data science workflows.
● Describe different data science workflows.
● Identify the different applications of each data science
workflow.

Data Science Workflows: Overview


Data Science Workflows form the backbone of data-driven decision-making. They
encompass a series of structured approaches designed to tackle intricate problems
through the utilisation of data. By guiding professionals through the steps of data
collection, cleaning, analysis, and interpretation, these workflows are instrumental
in extracting meaningful insights from complex datasets. In this discussion, we will
explore three key pillars of data science workflows: KDD (Knowledge Discovery in
Databases), CRISP-DM (Cross-Industry Standard Process for Data Mining), and the
pivotal role of Knowledge Management in driving successful data science
endeavors.

Overview of KDD Process


The KDD process is a systematic and comprehensive strategy tailored to unearth valuable and
comprehensible information from voluminous datasets. It involves a sequence of meticulously
planned steps aimed at converting raw data into actionable knowledge.

KDD Process Steps


It involves five steps:
1. Selection: This initial phase selects pertinent data from
various sources, including databases and other repositories.

2. Preprocessing: Data is cleaned and prepared to ensure
accuracy and relevance before analysis begins.

3. Transformation: Data is converted into a format
conducive to mining, enhancing its suitability for
subsequent analytical processes.

4. Data Mining: A range of algorithms is applied to uncover patterns,
trends, and anomalies, providing critical insights into the dataset.

5. Interpretation/Evaluation: This phase focuses on comprehending
and evaluating the results generated from the data mining process,
determining their significance and applicability.
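
The five KDD steps can be sketched as a chain of small functions. This is only an illustrative sketch, not a standard implementation: the records, the field names (`customer`, `region`, `spend`), the z-score technique, and the threshold of 1.5 are all assumptions chosen for the example.

```python
from statistics import mean, stdev

# Hypothetical raw records; None marks a missing value.
raw = [
    {"customer": "a", "region": "north", "spend": 120.0},
    {"customer": "b", "region": "north", "spend": 135.0},
    {"customer": "c", "region": "north", "spend": None},
    {"customer": "d", "region": "north", "spend": 110.0},
    {"customer": "e", "region": "north", "spend": 900.0},
    {"customer": "f", "region": "north", "spend": 125.0},
    {"customer": "g", "region": "south", "spend": 300.0},
]


def select(records, region):
    """1. Selection: keep only the records relevant to the question."""
    return [r for r in records if r["region"] == region]


def preprocess(records):
    """2. Preprocessing: drop records with missing values."""
    return [r for r in records if r["spend"] is not None]


def transform(records):
    """3. Transformation: express spend as a z-score, a form suited to mining."""
    values = [r["spend"] for r in records]
    mu, sigma = mean(values), stdev(values)
    return [{**r, "z": (r["spend"] - mu) / sigma} for r in records]


def mine(records, threshold=1.5):
    """4. Data mining: flag anomalies whose |z-score| exceeds the threshold."""
    return [r for r in records if abs(r["z"]) > threshold]


def interpret(anomalies):
    """5. Interpretation/Evaluation: summarise findings for a human reviewer."""
    return [
        f"{r['customer']}: spend {r['spend']} is unusual (z={r['z']:.2f})"
        for r in anomalies
    ]


def run_kdd(records, region="north"):
    """Run the five steps in order on the given records."""
    return interpret(mine(transform(preprocess(select(records, region)))))


if __name__ == "__main__":
    for line in run_kdd(raw):
        print(line)
```

Each function takes the output of the previous one, which mirrors how every KDD step depends on the step before it. In practice the process is iterative, and a surprising result at the interpretation step often sends the analyst back to selection or preprocessing.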

KDD Process: Applications


● Marketing
Analyzing customer behavior to refine segmentation and
targeting strategies.

● Healthcare
Predicting disease outbreaks, aiding in patient diagnosis,
and formulating effective treatment plans.

● Finance
Detecting fraud, assessing risks, and optimizing
investment strategies.

CRISP-DM Process: Cross-Industry Standard Process for Data Mining

Overview of CRISP-DM Process


CRISP-DM is a widely accepted industry-standard methodology
for executing data mining projects. It provides a structured
framework encompassing all stages of a data mining endeavour.

Phases of CRISP-DM Process


It involves six key phases:
● Business Understanding
Gaining a comprehensive understanding of project objectives and
requirements sets the foundation for successful execution.

● Data Understanding
Thoroughly exploring and gaining insights into the structure and quality of
the data is paramount for meaningful analysis.

● Data Preparation
Cleaning, transforming, and selecting data for modelling ensures that the
dataset is refined and optimised for subsequent processes.
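
As a rough illustration of the Data Understanding and Data Preparation phases, the sketch below profiles missing values and then builds a cleaned dataset. The records, the column names, and the choice of median imputation are assumptions made for the example, not something CRISP-DM prescribes.

```python
from statistics import median

# Hypothetical customer records; None marks a missing value.
rows = [
    {"age": 34, "monthly_spend": 50.0, "notes": "called twice"},
    {"age": None, "monthly_spend": 20.0, "notes": None},
    {"age": 45, "monthly_spend": None, "notes": None},
    {"age": 29, "monthly_spend": 80.0, "notes": "new"},
]


def profile(rows):
    """Data Understanding: count missing values per column."""
    columns = rows[0].keys()
    return {c: sum(1 for r in rows if r[c] is None) for c in columns}


def prepare(rows, features):
    """Data Preparation: select features and fill gaps with the column median."""
    medians = {
        f: median(r[f] for r in rows if r[f] is not None) for f in features
    }
    return [
        {f: r[f] if r[f] is not None else medians[f] for f in features}
        for r in rows
    ]


if __name__ == "__main__":
    print(profile(rows))
    print(prepare(rows, ["age", "monthly_spend"]))
```

Profiling first is what justifies the preparation choices: here the `notes` column is missing in half the records, so the example leaves it out of the selected features rather than trying to fill it.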

Mid-lesson questions
1. What forms the backbone of data-driven decision-making and involves a series of
structured approaches for tackling intricate problems through the utilization of data?

A Data Integration

B Data Exploration

C Data Science workflows

D Data Visualisation

Mid-lesson questions
2. What is CRISP-DM?

A A programming language

B A methodology for data mining projects

C A software tool

D An encryption algorithm

Phases of CRISP-DM Process


● Modelling
Building and evaluating predictive or descriptive models
forms the core of this phase.

● Evaluation
Assessing the model’s performance and determining its
impact on the business helps in refining and optimising
the model further.

● Deployment
Implementing the model into production environments
ensures its practical application in real-world scenarios.
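
The Modelling, Evaluation, and Deployment phases can be sketched with a deliberately simple model. Everything here is an assumption for illustration: the toy churn data, the single-feature threshold rule, and the function names. A real project would use a proper modelling library and judge the model against business criteria, not accuracy alone.

```python
# Hypothetical prepared data: (months_inactive, churned) pairs.
train = [(0, False), (1, False), (2, False), (5, True), (6, True), (8, True)]
holdout = [(1, False), (3, False), (7, True), (9, True)]


def accuracy(threshold, data):
    """Share of records where 'months_inactive >= threshold' matches the label."""
    hits = sum((x >= threshold) == y for x, y in data)
    return hits / len(data)


def fit_threshold(data):
    """Modelling: pick the cut-off with the best accuracy on the training data."""
    candidates = sorted({x for x, _ in data})
    return max(candidates, key=lambda t: accuracy(t, data))


def deploy(threshold):
    """Deployment: wrap the fitted model as a function other systems can call."""
    def predict_churn(months_inactive):
        return months_inactive >= threshold
    return predict_churn


threshold = fit_threshold(train)                 # Modelling
holdout_accuracy = accuracy(threshold, holdout)  # Evaluation on unseen data
predict_churn = deploy(threshold)                # Deployment

if __name__ == "__main__":
    print(threshold, holdout_accuracy, predict_churn(6))
```

Evaluating on records the model has not seen is the point of the separate `holdout` list: accuracy measured on the training data alone would overstate how the model behaves once deployed.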

CRISP-DM Process: Applications


● E-commerce
Implementing recommender systems and
predicting customer churn rates.

● Manufacturing
Enabling predictive maintenance and ensuring
quality control in production processes.

● Telecommunications
Optimizing network operations and effectively
segmenting customer demographics.

Knowledge Management

Knowledge Management: Overview


Knowledge Management involves the systematic
capture, organisation, and application of collective
knowledge within an organisation. In the context of
data science workflows, it plays an indispensable role
in preserving and disseminating insights derived from
data.

Components of Knowledge Management


It comprises five key components:

1. Data Repositories
Centralized storage for both structured and unstructured
data ensures easy access and retrieval.

2. Knowledge Bases
An organized collection of insights, models, and best
practices forms the foundation for informed
decision-making.

3. Collaboration Platforms
Tools that facilitate the sharing of ideas, experiences, and
knowledge promote a collaborative environment.

4. Documentation
Recording processes, methodologies, and
findings ensures that valuable insights are
preserved for future reference.

5. Training and Development
Continuous learning and upskilling of team
members ensure they are equipped to handle
evolving challenges and opportunities.
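
To make the Knowledge Bases and Documentation components concrete, here is a minimal sketch of a store for documented findings. The `Finding` and `KnowledgeBase` classes, their fields, and the sample entries are hypothetical, and an in-memory list stands in for what would normally be a persistent, shared repository.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One documented insight, with enough context to be reused later."""
    title: str
    summary: str
    method: str
    tags: set = field(default_factory=set)


class KnowledgeBase:
    """A minimal in-memory store; a real one would persist to a repository."""

    def __init__(self):
        self._findings = []

    def record(self, finding):
        """Documentation: preserve a finding for future reference."""
        self._findings.append(finding)

    def search(self, tag):
        """Retrieval: return every finding labelled with the given tag."""
        return [f for f in self._findings if tag in f.tags]


kb = KnowledgeBase()
kb.record(Finding("Churn drivers", "Inactivity predicts churn.",
                  "threshold model", {"churn", "crisp-dm"}))
kb.record(Finding("Spend outliers", "One account far exceeds typical spend.",
                  "z-score", {"anomaly", "kdd"}))

if __name__ == "__main__":
    for finding in kb.search("churn"):
        print(finding.title, "-", finding.summary)
```

Recording the method alongside the summary is a small design choice with a large payoff: a colleague who finds the entry later can judge how far to trust the insight and how to reproduce it.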

Benefits of Knowledge Management


Knowledge Management enables efficient decision-making, reduces
redundant work, and fosters an environment for continuous
improvement.

Conclusion
In conclusion, Data Science Workflows, including KDD,
CRISP-DM, and Knowledge Management, are essential
frameworks for extracting maximum value from data. By
implementing these methodologies effectively, we’re not
only discovering insights but also preserving and
leveraging them for future endeavours. This drives
innovation and ensures informed decision-making.

SUMMARY
● Data Science Workflows form the backbone of data-driven
decision-making
● The KDD process steps are: selection, preprocessing,
transformation, data mining, and interpretation/evaluation
● CRISP-DM is a widely accepted industry-standard
methodology for executing data mining projects
● Knowledge Management involves the systematic capture,
organisation, and application of collective knowledge
within an organisation

REFERENCES
● Provost, F., & Fawcett, T. (2013). Data science for
business: What you need to know about data mining
and data-analytic thinking. O'Reilly Media.
● Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996).
From data mining to knowledge discovery in
databases. AI Magazine, 17(3), 37-54.
● Dalkir, K. (2011). Knowledge management in theory
and practice. MIT Press.

Thank You