Big Data Analytics
EMC PROVEN PROFESSIONAL
Copyright © 2012 EMC Corporation. All Rights Reserved. Module 1: Introduction to BDA 1
Introduction to Big Data Analytics
Your Thoughts?
Big Data Defined
• “Big Data” is data whose scale, distribution, diversity,
and/or timeliness require the use of new technical
architectures and analytics to enable insights that unlock
new sources of business value.
  - Requires new data architectures, analytic sandboxes
  - New tools
  - New analytical methods
  - Integrating multiple skills into new role of data scientist
Key Characteristics of Big Data
1. Data Volume
  - 44x increase from 2010 to 2020 (1.2 zettabytes to 35.2 ZB)
2. Processing Complexity
  - Changing data structures
  - Use cases warranting additional transformations and analytical techniques
3. Data Structure
  - Greater variety of data structures to mine and analyze
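As a quick sanity check on the volume figures in item 1, the following is a minimal sketch that computes the growth factor and the implied compound annual growth rate from the two quoted endpoints. The function names are illustrative, not from the course material. Taken at face value, 1.2 ZB to 35.2 ZB is a ratio of roughly 29x, so the 44x headline presumably rests on a different baseline; the slide does not say which, so that reading is an assumption.

```python
def growth_factor(start, end):
    """Ratio of the ending volume to the starting volume."""
    if start <= 0:
        raise ValueError("start must be positive")
    return end / start


def annual_growth_rate(start, end, years):
    """Compound annual growth rate implied by growing from start to end."""
    if years <= 0:
        raise ValueError("years must be positive")
    return growth_factor(start, end) ** (1.0 / years) - 1.0


# Endpoints as quoted on the slide, in zettabytes.
ZB_2010 = 1.2
ZB_2020 = 35.2

factor = growth_factor(ZB_2010, ZB_2020)
rate = annual_growth_rate(ZB_2010, ZB_2020, years=2020 - 2010)

print(f"growth factor: {factor:.1f}x")
print(f"implied annual growth: {rate:.1%}")
```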
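Big Data Characteristics: Data Structures
Data Growth is Increasingly Unstructured
[Figure: spectrum of data structure types, running from more structured to less structured; recoverable side labels include Semi-Structured and “Quasi” Structured]
Structured
• Data containing a defined data type, format, structure
• Example: Transaction data and OLAP
Semi-Structured
• Textual data files with a discernible pattern, enabling parsing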
Semi-Structured Data
View Source
http://www.google.com/#hl=en&sugexp=kjrmc&cp=8&gs_id=2m&xhr=t&q=data+scientist&pq=big+data&pf=p&sclient=psyb&source=hp&pbx=1&oq=data+sci&aq=0&aqi=g4&aql=f&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=d566e0fbd09c8604&biw=1382&bih=651
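The search URL above is textual data with a discernible pattern: key=value pairs joined by `&` after the `#`. A minimal sketch of parsing such a string with Python's standard library follows. The URL in the code is a shortened copy with a simplified host, used only for illustration, and `parse_search_fragment` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs


def parse_search_fragment(url):
    """Return the key=value pairs carried in a URL's fragment (or query) as a dict."""
    parts = urlsplit(url)
    raw = parts.fragment or parts.query
    # keep_blank_values retains keys such as gs_sm that have no value.
    parsed = parse_qs(raw, keep_blank_values=True)
    # Each key maps to a list of values; keep the first one for simplicity.
    return {key: values[0] for key, values in parsed.items()}


# Shortened, illustrative version of the clickstream URL shown on the slide.
url = ("http://www.google.com/#hl=en&q=data+scientist&pq=big+data"
       "&gs_sm=&biw=1382&bih=651")

params = parse_search_fragment(url)
print(params["q"])   # current search terms
print(params["pq"])  # previous search terms
```

Once parsed, fields such as the current and previous query can be loaded into a table and analyzed like any other structured data.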
Unstructured Data
The Red Wheelbarrow, by
William Carlos Williams
Data Repositories, An Analyst Perspective
Data Islands “Spreadmarts”
Isolated data marts
• Spreadsheets and low-volume DBs for recordkeeping
• Analyst dependent on data extracts

Data Warehouses
Centralized data containers in a purpose-built space
• Supports BI and reporting, but restricts robust analyses
• Analyst dependent on IT & DBAs for data access and schema changes
• Analysts must spend significant time to get extracts from multiple sources

Analytic Sandbox
Data assets gathered from multiple sources and technologies for analysis
• Enables high performance analytics using in-db processing
• Reduces costs associated with data replication into "shadow" file systems
• “Analyst-owned” rather than “DBA owned”
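To make the contrast between working from extracts and in-db processing concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for an analytic database. The `transactions` table, its columns, and the sample rows are hypothetical. The first approach pulls every row out and aggregates on the client; the second pushes the aggregation into the database so only the summary leaves it.

```python
import sqlite3

# In-memory database standing in for the analytic sandbox (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 20.0), (1, 30.0), (2, 5.0)],
)

# Extract-style analysis: copy all rows out, then aggregate in client code.
totals_client = {}
for customer_id, amount in conn.execute(
    "SELECT customer_id, amount FROM transactions"
):
    totals_client[customer_id] = totals_client.get(customer_id, 0.0) + amount

# In-database analysis: the database aggregates; only the result is returned.
totals_in_db = dict(
    conn.execute(
        "SELECT customer_id, SUM(amount) FROM transactions GROUP BY customer_id"
    )
)

print(totals_client)
print(totals_in_db)
conn.close()
```

Both approaches give the same totals here; the difference that matters at scale is how much data has to be moved and copied before the analysis can run.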
Introduction to Big Data Analytics: Mini-Case Study
Yoyodyne Bank Scenario
• Evolving from small community bank to a global bank
• Needs to move away from its legacy mainframes to an environment that
supports more robust analytics
• Growing through mergers and acquisitions
• Subject to many new regulatory requirements
• Increasing customer base and increased product offerings
Your Thoughts?
Discussion Questions
1. Discuss how the bank’s data would change under these circumstances.
2. How are the bank’s needs changing as the business changes?
3. What do you need to consider from an analyst point of view? What are
some things to consider implementing as the bank grows?
Business Drivers for Analytics
Current Business Problems Provide Opportunities for Organizations to
Become More Analytical & Data Driven
Driver | Examples
1. Desire to optimize business operations | Sales, pricing, profitability, efficiency
2. Desire to identify business risk | Customer churn, fraud, default
3. Predict new business opportunities | Upsell, cross-sell, best new customer prospects
4. Comply with laws or regulatory requirements | Anti-Money Laundering, Fair Lending, Basel II
Analytical Approaches for Meeting Business Drivers
Business Intelligence vs. Data Science
Predictive Analytics & Data Mining (Data Science)
Typical Techniques & Data Types
• Optimization, predictive modeling, forecasting, statistical analysis
• Structured/unstructured data, many types of sources, very large data sets
A Typical Analytical Architecture
[Figure: typical analytical architecture with numbered callouts 1 to 4. Recoverable component labels: Data Sources, Departmental Warehouse, “Spread Marts”, Enterprise Applications, Reporting. Recoverable annotations: Non-Agile Models; Prioritized Operational Processes; Static schemas accrete over time; Siloed Analytics.]
Implications of Typical Architecture for Data Science
Opportunities for a New Approach to Analytics
New Applications Driving Data Volume
[Figure: data ecosystem diagram. Recoverable labels: Data Users/Buyers (callout 4), Catalog Co-Ops, Phone/TV, Retail, Media, Private Investigators/Lawyers, Media Archives, Credit Bureaus, List Brokers, Financial, Delivery Service, Banks, Government.]
Considerations for Big Data Analytics
Criteria for Big Data Projects
1. Speed of decision making
2. Throughput
3. Analysis flexibility

New Analytic Architecture: Analytic Sandbox
Data assets gathered from multiple sources and technologies for analysis
• Enables high performance analytics using in-db processing
• Reduces costs associated with data replication into "shadow" file systems
• “Analyst-owned” rather than “DBA owned”
State of the Practice in Analytics: Mini-Case Study
Big Data Enabled Loan Processing at Yoyodyne
[Figure: loan-processing workflow. The rotated step labels are only partly recoverable; they appear to include income verification, employment history, credit history, and appraisal.]
Three Key Roles of the New Data Ecosystem
Note: Figures above reflect a projected talent gap in the US in 2018, as shown in the McKinsey May 2011 article Big Data: The next frontier for innovation, competition, and productivity
Roles Needed for Analytical Projects
Data Scientist Key Activities
• Reframe business challenges as analytics challenges
[Figure: roles working on the Analytic Productivity Platform: Data Scientists, Data Engineers, Data Analyst, BI Analyst, LOB User]
Profile of a Data Scientist
• Quantitative
• Technical
• Skeptical
• Curious & Creative
• Communicative & Collaborative
Big Data Analytics: Industry Examples
1 Health Care
• Reducing Cost of Care
2 Public Services
• Preventing Pandemics
3 Life Sciences
• Genomic Mapping
4 IT Infrastructure
• Unstructured Data Analysis
5 Online Services
• Social Media for Professionals
1
Big Data Analytics: Healthcare
Use of Big Data
• Dr. Jeffrey Brenner generated his own crime maps from medical billing records of 3 hospitals
Key Outcomes
• City hospitals & ERs provided expensive, low quality care
• Reduced hospital costs by 56% by realizing that 80% of city’s medical costs came from 13% of its residents, mainly low-income or elderly
• Now offers preventative care over the phone or through home visits
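The key finding above is a cost-concentration measure: what share of total cost comes from the most expensive fraction of residents. A minimal sketch of that calculation follows. The billing records are synthetic and the function names are illustrative; nothing here reproduces the actual hospital data or the method used in the case.

```python
from collections import defaultdict


def cost_per_patient(billing_records):
    """Sum billed amounts per patient from (patient_id, amount) records."""
    totals = defaultdict(float)
    for patient_id, amount in billing_records:
        totals[patient_id] += amount
    return dict(totals)


def top_share(costs, top_fraction):
    """Share of total cost attributable to the highest-cost fraction of patients."""
    if not costs:
        raise ValueError("costs must not be empty")
    ranked = sorted(costs, reverse=True)
    # Always include at least one patient in the top group.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Synthetic billing records for ten hypothetical patients (not real data).
records = [
    ("p01", 5000.0), ("p01", 3000.0),
    ("p02", 500.0), ("p03", 400.0), ("p04", 300.0), ("p05", 250.0),
    ("p06", 200.0), ("p07", 150.0), ("p08", 100.0), ("p09", 60.0),
    ("p10", 40.0),
]

totals = cost_per_patient(records)
share = top_share(list(totals.values()), top_fraction=0.10)
print(f"top 10% of patients account for {share:.0%} of billed cost")
```

Running the same calculation at several cut-offs (for example 5%, 13%, 20%) shows how steeply costs are concentrated, which is what points toward targeted preventative care.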
2
Big Data Analytics: Public Services
3
Big Data Analytics: Life Sciences
Situation
• Broad Institute (MIT & Harvard) mapping the Human Genome
4
Big Data Analytics: IT Infrastructure
5
Big Data Analytics: Online Services