MAKING BIG DATA COME ALIVE
Adding Hadoop to Your Analytics Mix:
Challenges and Strategies
Madina Kassengaliyeva
July 23, 2015
Presenters
Madina Kassengaliyeva
Director, Client Services, Think Big
Madina Kassengaliyeva is responsible for ensuring successful
delivery of Think Big’s service engagements. Madina has led
strategy, engineering and data science engagements in a variety
of areas, including recommendation engines, customer
interactions optimization, marketing analytics and compliance.
Madina holds an MBA from the University of Chicago and a BA in
International Studies from American University.
© 2015 Think Big, a Teradata Company 8/3/2015
Paul Barsch
Director, Services Marketing, Think Big
Paul Barsch directs marketing programs for Think Big, a Teradata
Company. Paul has been in IT for 15+ years in a variety of roles for
Teradata, HP Enterprise Services and KPMG Consulting.
Housekeeping
Use the widget bar below to…
Get valuable resources & complete exit survey
Ask Questions to the Presenters
Request online technical help
Go social….
…and follow the conversation
Agenda
• Hadoop Adoption Path
• Key Challenges – Data, Organization, Capabilities
• Ideas for Solutions
Common Hadoop Adoption Path
1. Address Immediate Needs
• Hadoop used to relieve a technology pain point
• Reduce data warehouse costs
• Speed up ETL
• The only users are in technology teams
2. Establish a Data Repository
• More and more data gets added to Hadoop as a result of Phase 1
• Greater data variety, more raw data, deeper history
• Initial data transfer, security, and governance practices are established
• Still perceived as largely a technology platform
3. Initial Analytics Exploration
• Limited number of people or teams conduct POCs using Hadoop
• Analytics techniques not available on traditional platforms are applied
• Early wins indicate promising business impact and excitement builds
4. Integrate Hadoop into the Analytics Capabilities
• Multiple teams use Hadoop as part of the analytics infrastructure
• Techniques, methods, best practices and access patterns get codified
• Business begins to capture consistent value
Transition from Phase 3 to Phase 4 is when key challenges emerge
Hadoop Adoption – Critical Point
Key Challenges
Data
• Impact of schema on read
• Consistent taxonomies and reference data
• Architecture - access patterns and flows
Organization
• Skills, roles and responsibilities
• Lack of common vocabulary
• Knowledge capture and sharing
Capabilities
• Foundational capabilities at the whim of changing business priorities
• Future that’s hard to envision is hard to build
Organization – Key Challenges
• Skills, roles and responsibilities
o Significant skills gaps between what’s currently available and what is
needed
o Both business and technology do analytics and often engineering, blurring
lines of responsibility or ownership
o “Throw over the wall” doesn’t work
• Lack of common vocabulary
o Every BU (and every leader) has its own understanding of the same
words
o This is rarely discussed
• Knowledge capture and sharing
o Multiple teams work with the same data and similar techniques
o Organizational silos do not naturally support broad knowledge transfer
Organization – Ideas for Solutions
• Cross-BU committee to guide organizational change, define common vocabulary, defend the effort to executive leadership and share success
• Thorough, honest skills assessments to identify gaps, training needs, augmentation needs, map to roles and responsibilities
• Documented tools requirements based on current and projected skills
• Collaboration architecture
• Plug into existing knowledge transfer practices and tools and allow for informal information exchange based on data access privileges
Organization – Key Functions
[Diagram: organizational functions. Labels recovered from the slide, in extraction order:]
Strategy; Data Management & Governance; Architecture; Tools; Market Research; Roadmap Planning; Value Realization; Future Data Sources
Services; Support; Visualization & Reporting; Data SMEs; Core Platform Development; Testing; Operations; Core Platform Management; Metrics Tracking & Reporting; Platform Integration
Program Management; Roadmap Execution; Cross Group Coordination; Financial Management; Small Project Prioritization; Communication & Change Management
Application Development; Analytic Sandbox; Data Science; Integration, Interfaces & Ingestion; Training
Incident Management; Config, Change, Release Management; Problem Management; Help Desk; Knowledge Management
Technology Governance; Data Quality & Metrics; Access Controls; Data Governance; Metadata Management
Capabilities – Key Challenges
• Foundational capabilities at the whim of changing business priorities
o Lack of consensus on what the foundational capabilities are
o Let’s be honest, the “Top Project” changes often and the resources go with it
o Foundational capabilities do not immediately impact the bottom line
• Future that’s hard to envision is hard to build
o Lack of shared vision
o Clarity needed at multiple levels – strategy, operational details, day to day
Capabilities – Ideas for Solutions
• Consolidate ownership in a team that has organizational influence and includes representatives from the business, infrastructure, architecture, data, and analytics
• Back to vocabulary – agree on what capabilities mean for your business unit and your technology partners
• Roadmaps are useful – visual representations of high-level goals against a timeline that should define your projects
• Dedicate resources to capabilities and protect them
• Check in with your roadmap – does it still reflect your vision?
Photo courtesy of Flickr. Creative Commons. By E.Bass.
Capabilities Pyramid
Capabilities: Roadmap Example
[Roadmap diagram. Labels recovered from the slide, in extraction order:]
Analytics standardized methods, code, tools, team roles
Operations standardized processes, tools, team roles
Skills and roles matrix
Data Ingestion, Transfer, Structuring, and Governance approach
Unified Model Management
Integrated Data Science
Variables based on single source structured data
Variable selection in Hadoop
Integration with existing scoring engine
Batch data processing in Hadoop
Integration
Cross-channel and intraday variables generation
Batch scoring in Hadoop
Natural language processing to analyze text and voice
Initial real-time scoring
Execution Methodology and project management
Data and Models
Organization and Management
Analytics Knowledge Management
Scoring Architectural and Analytical design
Data Lifecycle Management
Real-time scoring design
Statistical and machine-learning-based modeling
Data Exploration of unstructured data components (e.g. URL, chat text)
Data Exploration of structured data components (e.g. page views,
Cross-channel variables, variables from unstructured data + intraday variables
Data – Key Challenges
• Impact of schema on read
o Hadoop supports a variety of data structures, which simplifies data ingestion and allows data users to define preferred schemas
o This shifts the burden of defining the schema to the data users
• Consistent taxonomies and reference data
o Meaningful data analysis requires known and consistent taxonomy
o New taxonomies can get created by individual teams
o Reference data changes
• Architecture - access patterns and flows
o Data flows across platforms, regular updates, physical and virtual constraints
o Decisions on what should be done where
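To make the schema-on-read point concrete, here is a minimal sketch in plain Python rather than on Hadoop itself. The event records, field names, and the `read_with_schema` helper are all hypothetical, chosen only to illustrate how two teams might read the same raw data through different schemas.

```python
import json

# Hypothetical raw events, stored exactly as they arrived
# (no schema is enforced at write time).
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ts": "2015-07-23T10:00:00", "amount": "12.50"}',
    '{"user": "u2", "page": "/cart", "ts": "2015-07-23T10:05:00"}',
    '{"user": "u1", "page": "/checkout", "ts": "2015-07-23T10:07:00", "amount": "30"}',
]


def read_with_schema(lines, schema):
    """Apply a reader-defined schema: field name -> (converter, default)."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else default
            for field, (convert, default) in schema.items()
        }


# Two teams read the same raw data, each with its own schema (assumed names).
marketing_schema = {"user": (str, None), "page": (str, None)}
finance_schema = {"user": (str, None), "amount": (float, 0.0)}

marketing_view = list(read_with_schema(RAW_EVENTS, marketing_schema))
finance_view = list(read_with_schema(RAW_EVENTS, finance_schema))
```

Each team owns its own schema here, which is the shift of burden the slide describes. The finance schema's choice to treat a missing amount as 0.0 is the kind of local decision another team could easily make differently, which is how inconsistent taxonomies tend to appear.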
Data – Ideas for Solutions
• Big issue with lots of opinions – see Data Lake et al.
• Test and define common data manipulation patterns for different use cases – aggregations, reductions, basic statistical derivations
• Centralize the responsibility for data governance, data architecture, taxonomy, and maintenance
• Establish knowledge sharing for data post-analytics
Photo courtesy of Flickr. Creative Commons. By Renzo Ferrante
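One way to picture "common data manipulation patterns" is a small shared module that teams test once and then reuse. The sketch below is an illustration in plain Python under that assumption; the function names and the sample page-view records are hypothetical, not taken from the presentation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate(records, key, value):
    """Aggregation: total of `value` per distinct `key`."""
    totals = defaultdict(float)
    for r in records:
        totals[r[key]] += r[value]
    return dict(totals)


def reduce_to_latest(records, key, order_by):
    """Reduction: keep only the most recent record per `key`."""
    latest = {}
    for r in records:
        if r[key] not in latest or r[order_by] > latest[r[key]][order_by]:
            latest[r[key]] = r
    return latest


def basic_stats(values):
    """Basic statistical derivation: mean and population standard deviation."""
    values = list(values)
    return {"mean": mean(values), "stdev": pstdev(values)}


# Hypothetical sample data used only to exercise the patterns.
PAGE_VIEWS = [
    {"user": "u1", "day": 1, "views": 4},
    {"user": "u1", "day": 2, "views": 6},
    {"user": "u2", "day": 1, "views": 10},
]

views_per_user = aggregate(PAGE_VIEWS, "user", "views")
latest_per_user = reduce_to_latest(PAGE_VIEWS, "user", "day")
view_stats = basic_stats(r["views"] for r in PAGE_VIEWS)
```

Agreeing on details such as population versus sample standard deviation in one place means every team derives the same numbers from the same data.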
Client Example – Centralized Data Group
• Data management, knowledge, architecture, and processing assurance
• Investment justification, research, knowledge sharing
• Data aggregation and enhancement
Data Source 1
Data Source 2
Data Source 3
Data Source 3
Business
Group
Product
Group
Central Tech
Group
Conclusions
Data
• Centralize data management
• Knowledge of data = knowledge of business
Organization
• Technology is not enough – need the right people and processes
• Executive commitment is key
Capabilities
• Tough conversations can yield much better alignment
• Dedicate and protect resources to build capabilities
Who is Think Big?
• 100% Big Data Focus
• Founded in 2010 with 100+ engagements across 70 clients
• Unlock value of big data with data science and data engineering services
• Proven vendor-neutral open source integration expertise
• Agile team-based development methodology
• Think Big Academy for skills and organizational development
• Global delivery model
Questions
and Answers
Thank You!
JessaMaeEvangelista2
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
Molecular methods diagnostic and monitoring of infection  -  Repaired.pptxMolecular methods diagnostic and monitoring of infection  -  Repaired.pptx
Molecular methods diagnostic and monitoring of infection - Repaired.pptx
7tzn7x5kky
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
Calories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptxCalories_Prediction_using_Linear_Regression.pptx
Calories_Prediction_using_Linear_Regression.pptx
TijiLMAHESHWARI
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 

Adding Hadoop to Your Analytics Mix?

  • 1. MAKING BIG DATA COME ALIVE Adding Hadoop to Your Analytics Mix: Challenges and Strategies Madina Kassengaliyeva July 23, 2015
  • 2. Presenters
      • Madina Kassengaliyeva, Director, Client Services, Think Big. Madina Kassengaliyeva is responsible for ensuring successful delivery of Think Big’s service engagements. Madina has led strategy, engineering and data science engagements in a variety of areas, including recommendation engines, customer interactions optimization, marketing analytics and compliance. Madina holds an MBA from the University of Chicago and a BA in International Studies from American University.
      • Paul Barsch, Director, Services Marketing, Think Big. Paul Barsch directs marketing programs for Think Big, a Teradata Company. Paul has been in IT for 15+ years in a variety of roles for Teradata, HP Enterprise Services and KPMG Consulting.
    © 2015 Think Big, a Teradata Company 8/3/2015
  • 3. Housekeeping: use the widget bar below to…
      • Get valuable resources & complete exit survey
      • Ask questions to the presenters
      • Request online technical help
      • Go social… and follow the conversation
  • 4. Agenda
      • Hadoop Adoption Path
      • Key Challenges – Data, Organization, Capabilities
      • Ideas for Solutions
  • 5. Common Hadoop Adoption Path
      • 1. Address Immediate Needs
          o Hadoop used to relieve a technology pain point
          o Reduce data warehouse costs
          o Speed up ETL
          o The only users are in technology teams
      • 2. Establish a Data Repository
          o More and more data gets added to Hadoop as a result of Phase 1
          o Greater data variety, more raw data, deeper history
          o Initial data transfer, security, and governance practices are established
          o Still perceived as largely a technology platform
      • 3. Initial Analytics Exploration
          o Limited number of people or teams conduct POCs using Hadoop
          o Analytics techniques not available on traditional platforms are applied
          o Early wins indicate promising business impact and excitement builds
      • 4. Integrate Hadoop into the Analytics Capabilities
          o Multiple teams use Hadoop as part of the analytics infrastructure
          o Techniques, methods, best practices and access patterns get codified
          o Business begins to capture consistent value
      • Transition from Phase 3 to Phase 4 is when key challenges emerge
  • 6. Hadoop Adoption – Critical Point
  • 7. Key Challenges
      • Data
          o Impact of schema on read
          o Consistent taxonomies and reference data
          o Architecture - access patterns and flows
      • Organization
          o Skills, roles and responsibilities
          o Lack of common vocabulary
          o Knowledge capture and sharing
      • Capabilities
          o Foundational capabilities at the whim of changing business priorities
          o Future that’s hard to envision is hard to build
  • 8. Organization – Key Challenges
      • Skills, roles and responsibilities
          o Significant skills gaps between what’s currently available and what is needed
          o Both business and technology do analytics and often engineering, blurring lines of responsibility or ownership
          o “Throw over the wall” doesn’t work
      • Lack of common vocabulary
          o Every BU (and every leader) has their own understanding of the same words
          o This is rarely discussed
      • Knowledge capture and sharing
          o Multiple teams work with the same data and similar techniques
          o Organization silos do not naturally support broad knowledge transfer
  • 9. Organization – Ideas for Solutions
      • Cross-BU committee to guide organizational change, define common vocabulary, defend the effort to executive leadership and share success
      • Thorough, honest skills assessments to identify gaps, training needs, augmentation needs, map to roles and responsibilities
      • Documented tools requirements based on current and projected skills
      • Collaboration architecture
      • Plug into existing knowledge transfer practices and tools and allow for informal information exchange based on data access privileges
  • 10. Organization – Key Functions (diagram labels): Strategy Data Management & Governance Architecture Tools Market Research Roadmap Planning Value Realization Future Data Sources Services Support Visualization & Reporting Data SMEs Core Platform Development Testing Operations Core Platform Management Metrics Tracking & Reporting Platform Integration Program Management Roadmap Execution Cross Group Coordination Financial Management Small Project Prioritization Communication & Change Management Application Development Analytic Sandbox Data Science Integration, Interfaces & Ingestion Training Incident Management Config, Change, Release Management Problem Management Help Desk Knowledge Management Technology Governance Data Quality & Metrics Access Controls Data Governance Metadata Management
  • 11. Capabilities – Key Challenges
      • Foundational capabilities at the whim of changing business priorities
          o Lack of consensus on what are foundational capabilities
          o Let’s be honest, the “Top Project” changes often and the resources go with it
          o Foundational capabilities do not immediately impact the bottom line
      • Future that’s hard to envision is hard to build
          o Lack of shared vision
          o Clarity needed at multiple levels – strategy, operational details, day to day
  • 12. Capabilities – Ideas for Solutions
      • Consolidate ownership in a team that has organizational influence and includes representatives from the business, the infrastructure, architecture, data, and analytics
      • Back to vocabulary – agree on what capabilities mean for your business unit and your technology partners
      • Roadmaps are useful – visual representations of high-level goals against a timeline that should define your projects
      • Dedicate resources to capabilities and protect them
      • Check in with your roadmap – does it still reflect your vision?
    Photo courtesy of Flickr. Creative Commons. By E.Bass.
  • 13. Capabilities Pyramid
  • 14. Capabilities: Roadmap Example (diagram labels): Analytics standardized methods, code, tools, team roles Operations standardized processes, tools, team roles Skills and roles matrix Data Ingestion, Transfer, Structuring, and Governance approach Unified Model Management Integrated Data Science Variables based on single source structured data Variable selection in Hadoop Integration with existing scoring engine Batch data processing in Hadoop Integration Cross-channel and intraday variables generation Batch scoring in Hadoop Natural language processing to analyze text and voice Initial real-time scoring Execution Methodology and project management Data and Models Organization and Management Analytics Knowledge Management Scoring Architectural and Analytical design Data Lifecycle Management Real-time scoring design Statistical and machine-learning-based modeling Data Exploration of unstructured data components (e.g. URL, chat text) Data Exploration of structured data components (e.g. page views, Cross-channel variables, variables from unstructured data + intraday variables
  • 15. Data – Key Challenges
      • Impact of schema on read
          o Hadoop supports a variety of data structures, which simplifies data ingestion and allows data users to define preferred schemas
          o This shifts the burden of defining the schema to the data users
      • Consistent taxonomies and reference data
          o Meaningful data analysis requires known and consistent taxonomy
          o New taxonomies can get created by individual teams
          o Reference data changes
      • Architecture - access patterns and flows
          o Data flows across platforms, regular updates, physical and virtual constraints
          o Decisions on what should be done where
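The schema-on-read point in the slide above can be illustrated with a minimal sketch in plain Python. It is not taken from the presentation: the sample records, the `read_with_schema` helper, and the two team schemas are all hypothetical. The idea it shows is that raw records are stored untouched, and each consuming team applies its own schema when reading, which is where the burden (and the risk of diverging definitions) moves.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no schema enforced on write).
RAW_EVENTS = [
    '{"user": "u1", "url": "/home", "ts": "2015-07-23T10:00:00", "amount": "12.50"}',
    '{"user": "u2", "url": "/cart", "ts": "2015-07-23T10:05:00"}',
]


def read_with_schema(lines, schema):
    """Parse raw JSON lines, applying a caller-defined schema at read time.

    `schema` maps an output field name to (source key, converter, default).
    This helper is an illustrative assumption, not a Hadoop API.
    """
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (key, convert, default) in schema.items():
            value = record.get(key)
            # Missing source keys fall back to the schema's default.
            row[field] = convert(value) if value is not None else default
        yield row


# Two teams read the same raw data through different schemas.
MARKETING_SCHEMA = {
    "user_id": ("user", str, None),
    "page": ("url", str, None),
}
FINANCE_SCHEMA = {
    "user_id": ("user", str, None),
    "amount_usd": ("amount", float, 0.0),
}

marketing_rows = list(read_with_schema(RAW_EVENTS, MARKETING_SCHEMA))
finance_rows = list(read_with_schema(RAW_EVENTS, FINANCE_SCHEMA))
```

Because each team owns its schema, nothing stops two teams from naming or typing the same field differently, which is one reason the later slides argue for shared taxonomies and centralized data governance.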
  • 16. Data – Ideas for Solutions
      • Big issue with lots of opinions – see Data Lake et al.
      • Test and define common data manipulation patterns for different use cases – aggregations, reductions, basic statistical derivations
      • Centralize the responsibility for data governance, data architecture, taxonomy, and maintenance
      • Establish knowledge sharing for data post-analytics
    Photo courtesy of Flickr. Creative Commons. By Renzo Ferrante
  • 17. Client Example – Centralized Data Group
      • Data management, knowledge, architecture, and processing assurance
      • Investment justification, research, knowledge sharing
      • Data aggregation and enhancement
      Diagram labels: Data Source 1, Data Source 2, Data Source 3, Data Source 3, Business Group, Product Group, Central Tech Group
  • 18. Conclusions (Data, Organization, Capabilities)
      • Centralize data management
      • Knowledge of data = knowledge of business
      • Technology is not enough – need the right people and processes
      • Executive commitment is key
      • Tough conversations can yield much better alignment
      • Dedicate and protect resources to build capabilities
  • 19. Who is Think Big?
      • 100% Big Data Focus
      • Founded in 2010 with 100+ engagements across 70 clients
      • Unlock value of big data with data science and data engineering services
      • Proven vendor-neutral open source integration expertise
      • Agile team-based development methodology
      • Think Big Academy for skills and organizational development
      • Global delivery model