
DATA ANALYTICS LIFECYCLE
Author: FU
Date: Mar-2022
Objectives

After studying this chapter, the student should be able to:
 Understand the Data Analytics Lifecycle
 Understand the key roles for a successful analytics project
 Understand what the analytics team should learn and what is needed for data discovery
Content

1. Data Analytics Lifecycle Overview
2. Phase 1: Discovery
3. Phase 2: Data Preparation
4. Phase 3: Model Planning
5. Phase 4: Model Building
6. Phase 5: Communicate Results
7. Phase 6: Operationalize
8. Case Study: Global Innovation Network and Analysis (GINA)
1.1 Key Roles for a Successful Analytics Project

 Business User
 Project Sponsor
 Project Manager
 Business Intelligence Analyst
 Database Administrator
 Data Engineer
 Data Scientist
FIGURE 2-1 Key roles for a successful analytics project
1.2 Process overview (1)

 The Data Analytics Lifecycle designed for Big Data problems and data science projects has six phases
 Project work can occur in several phases at once. For most phases in the lifecycle, the movement can be either forward or backward
FIGURE 2-2 Overview of Data Analytics Lifecycle
1.2 Process overview (2)

 Phase 1—Discovery
o Learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past
o Assesses the resources available to support the project: people, technology, time, and data
o Important activities: framing the business problem as an analytics challenge and formulating initial hypotheses
 Phase 2—Data preparation
o Requires the presence of an analytic sandbox
o Execute extract, load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox. Data is transformed in the ETLT process so the team can work with it and analyze it (a minimal sketch follows below).
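As a hedged illustration of this ELT step, the sketch below extracts a file, loads it unchanged into a sandbox table, and then builds a transformed working copy. The file path, table names, and columns are illustrative assumptions, not anything prescribed by the chapter.

```python
# Minimal ELT sketch: extract a CSV, load it raw into a sandbox table,
# then transform a working copy for analysis. Names are illustrative only.
import sqlite3
import pandas as pd

sandbox = sqlite3.connect("analytic_sandbox.db")   # stand-in for the team's sandbox

# Extract: pull the raw file as-is (assumed path and schema)
raw = pd.read_csv("exports/sales_extract.csv")

# Load: preserve the untouched raw data in the sandbox first (the "EL" of ELT)
raw.to_sql("raw_sales", sandbox, if_exists="replace", index=False)

# Transform: build an analysis-ready copy without altering the raw table
clean = raw.copy()
clean.columns = [c.strip().lower() for c in clean.columns]   # normalize headers
clean = clean.drop_duplicates()
clean.to_sql("sales_clean", sandbox, if_exists="replace", index=False)
```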
1.2 Process overview (3)

 Phase 3—Model planning
o Determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase
o Explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models
 Phase 4—Model building
o Develops datasets for testing, training, and production purposes
o Builds and executes models based on the work done in the model planning phase
o Considers whether its existing tools will suffice for running the models
1.2 Process overview (4)

 Phase 5—Communicate results
o In collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1
o Identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.
 Phase 6—Operationalize
o Delivers final reports, briefings, code, and technical documents
o Runs a pilot project to implement the models in a production environment
2. Phase 1: Discovery

 Learning the Business Domain


 Resources (technology, tools, systems, data, and
people)
 Framing the Problem
 Identifying Key Stakeholders
 Interviewing the Analytics Sponsor
 Developing Initial Hypotheses
 Identifying Potential Data Sources
2.1 Learning the Business Domain

 Learn and investigate the problem, develop context and understanding, and learn about the data sources needed and available for the project
 Formulate initial hypotheses that can later be tested with data
 Data scientists generally have deep computational and quantitative knowledge broadly applied across many disciplines
o Deep knowledge of the methods, techniques, and ways for applying heuristics to a variety of business and conceptual problems
o Deep knowledge of a domain area, coupled with quantitative expertise
2.2 Resources

 Assess the resources available to support the project: technology, tools, systems, data, and people
 Consider the available tools and technology the team will be using and the types of systems needed for later phases to operationalize the models
 What types of skills and roles will be needed for the recipients of the model being developed? The answer will influence the techniques the team selects and the kind of implementation the team chooses to pursue in subsequent phases of the Data Analytics Lifecycle
 Computing resources: consider the types of data available and whether the team needs to collect additional data, purchase it from outside sources, or transform existing data
2.3 Framing the Problem

 Framing is the process of stating the analytics problem to be solved.


 A best practice is to write down the problem statement and share it with
the key stakeholders
 Identify main objectives of the project, identify what needs to be achieved
in business terms, and identify what needs to be done to meet the needs.
Need to consider the objectives and the success criteria for the project
 What is the team attempting to achieve by doing the project, and what will
be considered “good enough” as an outcome of the project?
 Need to document and share with the project team and key stakeholders
 The best practice is to share the statement of goals and success criteria
with the team and confirm alignment with the project sponsor’s
expectations
2.4 Identifying Key Stakeholders

 An important step is to identify the key stakeholders and their interests in the project
 Identify the success criteria, key risks, and stakeholders,
which should include anyone who will benefit from the
project or will be significantly impacted by the project
 When interviewing stakeholders, learn about the domain
area and any relevant history from similar analytics
projects.
 Critical to articulate the pain points as clearly as possible
to address them and be aware of areas to pursue or avoid
as the team gets further into the analytical process
2.5 Interviewing the Analytics Sponsor (1)

 When interviewing the main stakeholders, the team needs to take time to thoroughly interview the project sponsor, who funds the project or provides the high-level requirements.
 It is critical to thoroughly understand the sponsor’s
perspective to guide the team in getting started on
the project.
2.5 Interviewing the Analytics Sponsor (2)

 Some tips for interviewing project sponsors:


o Prepare for the interview; draft questions, and review with colleagues.
o Use open-ended questions; avoid asking leading questions.
o Probe for details and pose follow-up questions.
o Avoid filling every silence in the conversation; give the other person time to
think.
o Let the sponsors express their ideas and ask clarifying questions, such as “Why?
Is that correct? Is this idea on target? Is there anything else?”
o Use active listening techniques; repeat back what was heard to make sure the
team heard it correctly, or reframe what was said.
o Try to avoid expressing the team’s opinions, which can introduce bias; instead,
focus on listening.
o Be mindful of the body language of the interviewers and stakeholders; use eye
contact where appropriate, and be attentive.
o Minimize distractions.
o Document what the team heard, and review it with the sponsors.
2.5 Interviewing the Analytics Sponsor (3)

 Common questions that are helpful to ask during the discovery phase when interviewing the project sponsor:
o What business problem is the team trying to solve?
o What is the desired outcome of the project?
o What data sources are available?
o What industry issues may impact the analysis?
o What timelines need to be considered?
o Who could provide insight into the project?
o Who has final decision-making authority on the project?
o How will the focus and scope of the problem change if the following dimensions change: time, people, risk, resources, and the size and attributes of data?
2.6 Developing Initial Hypotheses

 Developing a set of Initial Hypotheses (IHs) is a key facet of the discovery phase; it involves forming ideas that the team can test with data
 It is best to come up with a few primary hypotheses to test and then be creative about developing several more
 These IHs form the basis of the analytical tests the team will use in later
phases and serve as the foundation for the findings in Phase 5
 The team can compare its answers with the outcome of an experiment or test to
generate additional possible solutions to problems. As a result, the team
will have a much richer set of observations to choose from and more
choices for agreeing upon the most impactful conclusions from a project
 Another part of this process involves gathering and assessing hypotheses
from stakeholders and domain experts who may have their own
perspective on what the problem is, what the solution should be, and how
to arrive at a solution
2.7 Identifying Potential Data Sources

 Five main activities during this step of the discovery phase:
o Identify data sources
o Capture aggregate data sources
o Review the raw data
o Evaluate the data structures and tools needed
o Scope the sort of data infrastructure needed for this type of
problem
3. Phase 2: Data Preparation

 Preparing the Analytic Sandbox


 Performing ETLT
 Learning About the Data
 Data Conditioning
 Survey and Visualize
 Common Tools for the Data Preparation Phase
4. Phase 3: Model Planning

 Data Exploration and Variable Selection


 Model Selection
 Common Tools for the Model Planning Phase
4.1 Data Exploration and Variable Selection

 To understand the relationships among the variables to inform selection of the variables and methods, and
 To understand the problem domain
 A common way to do this is to use tools to perform data visualizations (a minimal sketch follows below)
 Stakeholders and subject matter experts have instincts and hunches about what the data science team should consider and analyze; they have a good grasp of the problem and domain but may not be aware of the subtleties within the data or the model needed to accept or reject a hypothesis
 Approach problems with an unbiased mind-set and be ready to question all assumptions
 Question the incoming assumptions and test the initial ideas of the project sponsors and stakeholders
 Depending on the objectives, consider an alternate method, reduce the number of data inputs, or transform the inputs to allow the team to use the best method for a given business problem
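The minimal sketch promised above: one quick, hedged way to explore variable relationships is to rank candidate predictors by their correlation with a target. The DataFrame, its columns, and the target name "outcome" are assumptions for illustration, not data from the chapter.

```python
# Quick variable-relationship scan: rank numeric columns by absolute
# correlation with an assumed target column named "outcome".
import pandas as pd

def rank_candidate_variables(df: pd.DataFrame, target: str = "outcome") -> pd.Series:
    """Return numeric columns ordered by |correlation| with the target."""
    numeric = df.select_dtypes("number")
    corr = numeric.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False)

# Example usage with a toy frame standing in for sandbox data
df = pd.DataFrame({
    "outcome":   [1, 0, 1, 1, 0, 1],
    "tenure":    [5, 1, 6, 4, 2, 7],
    "purchases": [3, 1, 4, 3, 0, 5],
    "region_id": [2, 2, 1, 3, 1, 2],
})
print(rank_candidate_variables(df))
```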
4.2 Model Selection

 The main goal is to choose an analytical technique, or a short list of candidate techniques, based on the end goal of the project (a comparison sketch follows below)
 A model simply refers to an abstraction from reality
 The team observes events happening in real data and constructs models that emulate this behavior with a set of rules and conditions
 In data mining and machine learning, rules and conditions are grouped into several general sets of techniques, such as classification, association rules, and clustering
 The team identifies and documents the modeling assumptions it makes as it chooses and constructs preliminary models
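The comparison sketch referenced above: a hypothetical shortlist of two classification candidates scored with cross-validation. The synthetic dataset, the candidate models, and the accuracy criterion are all assumptions made for illustration.

```python
# Shortlisting candidate techniques: score two classifiers with cross-validation
# and keep the stronger one. The data is synthetic and purely illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

# Mean cross-validated accuracy per candidate technique
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> selected:", best)
```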
4.3 Common Tools for the Model Planning Phase

 R has a complete set of modeling capabilities and provides a good environment for building interpretive models with high-quality code
 SQL Analysis services can perform in-database analytics of common data mining functions, involved aggregations, and basic predictive models (a stand-in sketch follows below)
 SAS/ACCESS provides integration between SAS and the analytics sandbox via multiple data connectors such as ODBC, JDBC, and OLE DB
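The stand-in sketch referenced above illustrates the general in-database idea only; it assumes nothing about SQL Server Analysis Services itself. SQLite is used purely as a placeholder engine, and the table and columns are invented.

```python
# Stand-in for in-database analytics: push an aggregation to the database
# instead of pulling raw rows out. SQLite here is only a placeholder engine.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ideas (region TEXT, year INTEGER, funded INTEGER);
    INSERT INTO ideas VALUES
        ('EMEA', 2011, 1), ('EMEA', 2011, 0), ('APJ', 2011, 1), ('AMER', 2011, 0);
""")

# The aggregation runs inside the database; only summary rows come back.
for row in conn.execute("""
        SELECT region, COUNT(*) AS submissions, SUM(funded) AS funded_ideas
        FROM ideas
        GROUP BY region
        ORDER BY submissions DESC"""):
    print(row)
```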
5. Phase 4: Model Building (1)

 Develop datasets for training, testing, and production purposes
 The analytical model is developed and fit on the training data and evaluated (scored) against the test data (see the sketch below)
 Model planning and model building can overlap quite a bit, and in practice one can iterate back and forth between the two phases for a while before settling on a final model.
 Execute the models defined in Phase 3.
FIGURE 2-6 Model building phase
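The sketch referenced above shows the basic fit-on-training, score-on-test pattern. The synthetic dataset, the logistic regression model, and the 70/30 split are illustrative assumptions rather than the chapter's prescription.

```python
# Fit on training data, evaluate (score) on held-out test data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out a test set; a production scoring set would be kept separate again.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```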
5. Phase 4: Model Building (2)

 Questions to consider include these:


o Does the model appear valid and accurate on the test data?
o Does the model output/behavior make sense to the domain
experts? That is, does it appear as if the model is giving answers
that make sense in this context?
o Do the parameter values of the fitted model make sense in the
context of the domain?
o Is the model sufficiently accurate to meet the goal?
o Does the model avoid intolerable mistakes?
o Are more data or more inputs needed? Do any of the inputs need
to be transformed or eliminated?
o Will the kind of model chosen support the runtime requirements?
o Is a different form of the model required to address the business
problem? If so, go back to the model planning phase and revise
the modeling approach.
5.1 Common Tools for the Model Building Phase

 Commercial Tools
o SAS Enterprise Miner
o SPSS Modeler
o MATLAB
o Alpine Miner
o STATISTICA and Mathematica
 Free or Open Source tools
o R and PL/R
o Octave
o WEKA
o Python
o SQL in-database implementations, such as MADlib
6. Phase 5: Communicate Results

 Consider how best to articulate the findings and outcomes to the various team members and stakeholders, taking into account caveats, assumptions, and any limitations of the results
 Determine if the team succeeded or failed in its objectives
 Make sure the analysis is robust rather than searching for ways to show results when the results may not be there
 By this phase, the team will have determined which model or models address the analytical challenge in the most appropriate way
 The team will also have ideas of some of the findings as a result of the project
7. Phase 6: Operationalize (1)

 Communicate the benefits of the project more broadly and set up a pilot project to deploy the work in a controlled way before broadening the work to a full enterprise or ecosystem of users
 Approach deploying the new analytical methods or models in a production environment
 Learn by undertaking a small-scope pilot deployment before a wide-scale rollout, to understand the performance and related constraints of the model in a production environment on a small scale and make adjustments before a full deployment.
7. Phase 6: Operationalize (2)

FIGURE 2-9 Key outputs from a successful analytics project
8. Case Study: Global Innovation Network
and Analysis (GINA)

 EMC’s Global Innovation Network and Analytics (GINA) team is a group of senior technologists located in centers of excellence (COEs) around the world.
 Team’s charter
o To engage employees across global COEs to drive innovation, research, and university
partnerships. In 2012, a newly hired director wanted to improve these activities and
provide a mechanism to track and analyze the related information.
o To create more robust mechanisms for capturing the results of its informal
conversations with other thought leaders within EMC, in academia, or in other
organizations, which could later be mined for insights
 Provide a means to share ideas globally and increase knowledge sharing
among GINA members who may be separated geographically. It planned to
create a data repository containing both structured and unstructured data to
accomplish three main goals.
o Store formal and informal data.
o Track research from global technologists.
o Mine the data for patterns and insights to improve the team’s operations and strategy.
8.1. Phase 1: Discovery (1)

 Identifying data sources


o Although GINA was a group of technologists skilled in many different aspects
of engineering, it had some data and ideas about what it wanted to explore
but lacked a formal team that could perform these analytics.
o After consulting with various experts including Tom Davenport, a noted expert
in analytics at Babson College, and Peter Gloor, an expert in collective
intelligence and creator of CoIN (Collaborative Innovation Networks) at MIT,
the team decided to crowdsource the work by seeking volunteers within EMC.
 Various roles on the working team were fulfilled.
o Business User, Project Sponsor, Project Manager: Vice President from Office of
the CTO
o Business Intelligence Analyst: Representatives from IT
o Data Engineer and Database Administrator (DBA): Representatives from IT
o Data Scientist: Distinguished Engineer, who also developed the social graphs
shown in the GINA case study
8.1. Phase 1: Discovery (2)

 Two main categories of data


o First category represented five years of idea submissions from EMC’s
internal innovation contests, known as the Innovation Roadmap
(formerly called the Innovation Showcase). Data is a mix of structured
data, such as idea counts, submission dates, inventor names, and
unstructured content, such as the textual descriptions of the ideas
themselves.
o Second category of data encompassed minutes and notes representing
innovation and research activity from around the world. This also
represented a mix of structured and unstructured data.
 Structured data included attributes such as dates, names, and geographic
locations.
 Unstructured documents contained the “who, what, when, and where”
information that represents rich data about knowledge growth and transfer within
the company. This type of information is often stored in business silos that have
little to no visibility across disparate research teams.
8.1. Phase 1: Discovery (3)

 10 main IHs that the GINA team developed were as follows:


o IH1: Innovation activity in different geographic regions can be
mapped to corporate strategic directions
o IH2: The length of time it takes to deliver ideas decreases when
global knowledge transfer occurs as part of the idea delivery
process.
o IH3: Innovators who participate in global knowledge transfer
deliver ideas more quickly than those who do not.
o IH4: An idea submission can be analyzed and evaluated for the
likelihood of receiving funding.
o IH5: Knowledge discovery and growth for a particular topic can be
measured and compared across geographic regions.
o IH6: Knowledge transfer activity can identify research-specific
boundary spanners in disparate regions.
8.1. Phase 1: Discovery (4)

 10 main IHs that the GINA team developed were as follows:


o IH7: Strategic corporate themes can be mapped to geographic regions.
o IH8: Frequent knowledge expansion and transfer events reduce the
time it takes to generate a corporate asset from an idea.
o IH9: Lineage maps can reveal when knowledge expansion and transfer did not (or has not) result in a corporate asset.
o IH10: Emerging research topics can be classified and mapped to
specific ideators, innovators, boundary spanners, and assets.
 The GINA (IHs) can be grouped into two categories:
o Descriptive analytics of what is currently happening to spark further
creativity, collaboration, and asset generation
o Predictive analytics to advise executive management of where it
should be investing in the future
8.2 Phase 2: Data Preparation

 The team partnered with its IT department to set up a new analytics sandbox to store and experiment on the data. During the data exploration exercise, the data scientists and data engineers began to notice that certain data needed conditioning and normalization. In addition, the team realized that several missing datasets were critical to testing some of the analytic hypotheses.
 As the team explored the data, it quickly realized that if it did not have data of sufficient quality or could not get good quality data, it would not be able to perform the subsequent steps in the lifecycle process. As a result, it was important to determine what level of data quality and cleanliness was sufficient for the project being undertaken. In the case of GINA, the team discovered that many of the names of the researchers and people interacting with the universities were misspelled or had leading and trailing spaces in the datastore (a conditioning sketch follows below). Seemingly small problems such as these in the data had to be addressed in this phase to enable better analysis and data aggregation in subsequent phases.
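The conditioning sketch referenced above: a small, assumed example of trimming stray whitespace and normalizing casing so that records for the same researcher aggregate together. The names are invented, not GINA data.

```python
# Name conditioning: trim stray whitespace, collapse internal spaces, and
# normalize casing so records for the same researcher aggregate together.
import re
import pandas as pd

def normalize_name(name: str) -> str:
    name = re.sub(r"\s+", " ", name.strip())   # trim and collapse whitespace
    return name.title()                        # consistent casing

raw_names = pd.Series(["  alice  o'brien", "Alice O'Brien ", "BOB  SMITH"])
clean_names = raw_names.map(normalize_name)
print(clean_names.value_counts())   # the two Alice variants now count as one
```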
8.3 Phase 3: Model Planning (1)

 For much of the dataset, it seemed feasible to use social network analysis techniques to look at the networks of innovators within EMC.
 In other cases, it was difficult to come up with appropriate ways to test hypotheses due to the lack of data. In one case (IH9), the team made a decision to initiate a longitudinal study to begin tracking data points over time regarding people developing new intellectual property. This data collection would enable the team to test the following two ideas in the future:
o IH8: Frequent knowledge expansion and transfer events reduce the amount of time it takes to generate a corporate asset from an idea.
o IH9: Lineage maps can reveal when knowledge expansion and transfer did not (or has not) result in a corporate asset.
 The team needed to establish goal criteria for the study, such as the end goal of a successful idea that had traversed the entire journey
8.3 Phase 3: Model Planning (2)

 Parameters related to the scope of the study included the following considerations:
o Identify the right milestones to achieve this goal.
o Trace how people move ideas from each milestone toward the goal.
o Once this is done, trace ideas that die, and trace others that reach the goal. Compare the journeys of ideas that make it and those that do not.
o Compare the times and the outcomes using a few different methods (depending on how the data is collected and assembled). These could be as simple as t-tests or perhaps involve different types of classification algorithms (a t-test sketch follows below).
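The t-test sketch referenced above: a two-sample comparison of idea delivery times (in weeks) for groups with and without global knowledge transfer. The sample values are invented placeholders, not GINA data.

```python
# Simple two-sample t-test comparing time-to-delivery (in weeks) for ideas
# with and without global knowledge transfer. The samples are invented.
from scipy import stats

with_transfer    = [12, 14, 11, 13, 10, 15, 12]
without_transfer = [18, 16, 20, 17, 19, 21, 18]

# Welch's t-test (does not assume equal variances between the two groups)
t_stat, p_value = stats.ttest_ind(with_transfer, without_transfer, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```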
8.4 Phase 4: Model Building (1)

 In Phase 4, the GINA team employed several analytical methods:
o Natural Language Processing (NLP) techniques on the textual descriptions of the Innovation Roadmap ideas
o Social network analysis using R and RStudio; the data scientist then developed social graphs and visualizations of the network of communications related to innovation using R’s ggplot2 package (a stand-in sketch follows below). Examples of this work are shown in Figures 2-10 and 2-11.
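The GINA team did this work in R with ggplot2; as a hedged stand-in only, the sketch below expresses the same idea in Python with networkx, building a small invented communication graph and ranking people by degree centrality to surface potential influencers.

```python
# Social network sketch (stand-in for the team's R/ggplot2 work): build a
# communication graph and rank people by degree centrality. Edges are invented.
import networkx as nx

edges = [
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "carol"), ("erin", "alice"), ("erin", "frank"),
]
G = nx.Graph(edges)

# Degree centrality: a simple proxy for who communicates with the most people
centrality = nx.degree_centrality(G)
for person, score in sorted(centrality.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{person}: {score:.2f}")
```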
8.4 Phase 4: Model Building (2)

FIGURE 2-10 Social graph [27] visualization of idea submitters and finalists
FIGURE 2-11 Social graph visualization of top innovation influencers


8.5 Phase 5: Communicate Results (1)

 The team found several ways to cull the results of the analysis and identify
the most impactful and relevant findings. This project was considered
successful in identifying boundary spanners and hidden innovators.
 As a result, the CTO office launched longitudinal studies to begin data
collection efforts and track innovation results over longer periods of
time. The GINA project promoted knowledge sharing related to
innovation and researchers spanning multiple areas within the
company and outside of it.
 GINA also enabled EMC to cultivate additional intellectual property
that led to additional research topics and provided opportunities to
forge relationships with universities for joint academic research in the
fields of Data Science and Big Data. In addition, the project was
accomplished with a limited budget, leveraging a volunteer force of
highly skilled and distinguished engineers and data scientists.
8.5 Phase 5: Communicate Results (2)

 One of the key findings from the project is that there was a disproportionately
high density of innovators in Cork, Ireland. Each year, EMC hosts an innovation
contest, open to employees to submit innovation ideas that would drive new
value for the company. When looking at the data in 2011, 15% of the finalists and
15% of the winners were from Ireland.
 These are unusually high numbers, given the relative size of the Cork COE
compared to other larger centers in other parts of the world. After further
research, it was learned that the COE in Cork, Ireland had received focused
training in innovation from an external consultant, which was proving effective.
The Cork COE came up with more innovation ideas, and better ones, than it had
in the past, and it was making larger contributions to innovation at EMC. It would
have been difficult, if not impossible, to identify this cluster of innovators through
traditional methods or even anecdotal, word-of-mouth feedback.
 Applying social network analysis enabled the team to find a pocket of people
within EMC who were making disproportionately strong contributions. These
findings were shared internally through presentations and conferences and
promoted through social media and blogs.
8.6 Phase 6: Operationalize (1)

 Key findings from the project include these:


o The CTO office and GINA need more data in the future, including a
marketing initiative to convince people to inform the global
community on their innovation/research activities.
o Some of the data is sensitive, and the team needs to consider
security and privacy related to the data, such as who can run the
models and see the results.
o In addition to running models, a parallel initiative needs to be
created to improve basic Business Intelligence activities, such as
dashboards, reporting, and queries on research activities
worldwide.
o A mechanism is needed to continually reevaluate the model after deployment. Assessing the benefits is one of the main goals of this stage, as is defining a process to retrain the model as needed (a minimal sketch follows below).
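The minimal sketch promised above: one simple way to frame continual reevaluation is a scheduled check that flags retraining when live accuracy drifts too far below the baseline agreed in Phase 1. The threshold, metric, labels, and function name are all hypothetical assumptions.

```python
# Hypothetical post-deployment check: compare the model's recent accuracy to a
# baseline agreed in Phase 1 and flag retraining when it degrades too far.
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.80   # assumed success criterion from Phase 1
TOLERATED_DROP = 0.05      # assumed drift tolerance

def needs_retraining(y_true, y_pred) -> bool:
    """Return True when live accuracy falls too far below the baseline."""
    live_accuracy = accuracy_score(y_true, y_pred)
    return live_accuracy < BASELINE_ACCURACY - TOLERATED_DROP

# Example with placeholder labels and predictions from the last scoring run
print(needs_retraining([1, 0, 1, 1, 0, 1, 0, 1], [1, 0, 0, 1, 0, 0, 0, 0]))
```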
8.6 Phase 6: Operationalize (2)

 Table 2-3 outlines an analytics plan for the GINA case study example.
Summary

 The Data Analytics Lifecycle is an approach to managing and executing analytical projects. This approach describes the process in six phases.
o Discovery
o Data preparation
o Model planning
o Model building
o Communicate results
o Operationalize
 Through these steps, data science teams can identify problems and perform rigorous investigation of the datasets needed for in-depth analysis
