Chapter 02: Data Analytics Lifecycle
Author : FU
Date : Mar-2022
Objectives
Business User
Project Sponsor
Project Manager
Business Intelligence Analyst
Database Administrator
Data Engineer
Data Scientist
FIGURE 2-1 Key roles for a successful analytics project
1.2 Process overview (1)
Phase 1—Discovery
o The team learns the business domain, including relevant history
such as whether the organization or business unit has attempted
similar projects in the past
o The team assesses the resources available to support the project:
people, technology, time, and data
o Important activities: framing the business problem as an
analytics challenge and formulating initial hypotheses
Phase 2—Data preparation
o Requires the presence of an analytic sandbox
o Execute extract, load, and transform (ELT) or extract, transform,
and load (ETL) to get data into the sandbox. Together these are
sometimes abbreviated as ETLT. Data should be transformed in the
ETLT process so the team can work with it and analyze it.
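The difference between ETL and ELT can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory SQLite database as a stand-in for the analytic sandbox; the table names, the sample rows, and the helper functions (clean, etl, elt, run_demo) are all hypothetical and not part of the original material.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ROWS = [
    ("  Alice ", "2022-03-01", "120.50"),
    ("BOB", "2022-03-02", "80"),
    ("carol", "2022-03-02", "n/a"),  # invalid amount, filtered out by both paths
]


def clean(row):
    """Transform step: normalize the name and parse the amount (None if invalid)."""
    name, day, amount = row
    try:
        return (name.strip().title(), day, float(amount))
    except ValueError:
        return None


def etl(conn):
    """ETL: transform in application code first, then load only the clean rows."""
    conn.execute("CREATE TABLE sales_etl (name TEXT, day TEXT, amount REAL)")
    cleaned = [r for r in map(clean, RAW_ROWS) if r is not None]
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", cleaned)


def elt(conn):
    """ELT: load the raw data unchanged, then transform inside the sandbox with SQL."""
    conn.execute("CREATE TABLE sales_raw (name TEXT, day TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT upper(substr(trim(name), 1, 1)) || lower(substr(trim(name), 2)) AS name,
               day,
               CAST(amount AS REAL) AS amount
        FROM sales_raw
        WHERE amount GLOB '[0-9]*'
        """
    )


def run_demo():
    """Run both paths against one sandbox and return their results."""
    conn = sqlite3.connect(":memory:")  # stand-in for the analytic sandbox
    etl(conn)
    elt(conn)
    etl_rows = conn.execute("SELECT * FROM sales_etl ORDER BY name").fetchall()
    elt_rows = conn.execute("SELECT * FROM sales_elt ORDER BY name").fetchall()
    raw_count = conn.execute("SELECT count(*) FROM sales_raw").fetchone()[0]
    conn.close()
    return etl_rows, elt_rows, raw_count


if __name__ == "__main__":
    etl_rows, elt_rows, raw_count = run_demo()
    print("ETL result:", etl_rows)
    print("ELT result:", elt_rows)
    print("Raw rows kept in sandbox by ELT:", raw_count)
```

Both paths end with the same cleaned table. The practical difference in this sketch is that the ELT path also keeps the untransformed rows in the sandbox, so the team can revisit the raw data later.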
1.2 Process overview (3)
Commercial Tools
o SAS Enterprise Miner
o SPSS Modeler
o MATLAB
o Alpine Miner
o STATISTICA and Mathematica
Free or Open Source tools
o R and PL/R
o Octave
o WEKA
o Python
o SQL in-database implementations, such as MADlib
8.5 Phase 5: Communicate Results (1)
The team found several ways to cull the results of the analysis and
identify the most impactful and relevant findings. The project was
considered successful in identifying boundary spanners and hidden
innovators. As a result, the CTO office launched longitudinal studies
to begin data collection efforts and track innovation results over
longer periods of time. The GINA project promoted knowledge sharing
related to innovation and researchers spanning multiple areas within
the company and outside of it.
GINA also enabled EMC to cultivate additional intellectual property
that led to additional research topics and provided opportunities to
forge relationships with universities for joint academic research in the
fields of Data Science and Big Data. In addition, the project was
accomplished with a limited budget, leveraging a volunteer force of
highly skilled and distinguished engineers and data scientists.
8.5 Phase 5: Communicate Results (2)
One of the key findings from the project is that there was a disproportionately
high density of innovators in Cork, Ireland. Each year, EMC hosts an innovation
contest in which employees can submit innovation ideas that would drive new
value for the company. In the 2011 data, 15% of the finalists and 15% of the
winners were from Ireland.
These are unusually high numbers, given the relative size of the Cork Center of
Excellence (COE) compared to larger centers in other parts of the world. After further
research, it was learned that the COE in Cork, Ireland had received focused
training in innovation from an external consultant, which was proving effective.
The Cork COE came up with more innovation ideas, and better ones, than it had
in the past, and it was making larger contributions to innovation at EMC. It would
have been difficult, if not impossible, to identify this cluster of innovators through
traditional methods or even anecdotal, word-of-mouth feedback.
Applying social network analysis enabled the team to find a pocket of people
within EMC who were making disproportionately strong contributions. These
findings were shared internally through presentations and conferences and
promoted through social media and blogs.
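As a rough illustration of how social network analysis can surface people who connect otherwise separate groups, the sketch below computes betweenness centrality on a toy graph. The slides do not say which metric the GINA team used, so the choice of betweenness centrality is an assumption here, and the graph, the names, and the betweenness function are hypothetical.

```python
from collections import deque


def betweenness(graph):
    """Betweenness centrality (Brandes' algorithm) for an unweighted,
    undirected graph given as a dict mapping node -> list of neighbors."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical toy network: two tightly knit groups joined only through "sam".
TOY_GRAPH = {
    "a1": ["a2", "a3", "sam"],
    "a2": ["a1", "a3"],
    "a3": ["a1", "a2"],
    "sam": ["a1", "b1"],
    "b1": ["b2", "b3", "sam"],
    "b2": ["b1", "b3"],
    "b3": ["b1", "b2"],
}

if __name__ == "__main__":
    scores = betweenness(TOY_GRAPH)
    for name, value in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(name, value)
```

In this toy network, "sam" has the highest score because every shortest path between the two groups passes through that node. This is the kind of bridging position the term boundary spanner describes.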
8.6 Phase 6: Operationalize (1)