Customer Intelligence –
Harnessing Elephants at Transamerica
Stephen Lloyd
BI Architect, Transamerica Life & Protection
#strataconf #hadoopworld
Vishal Bamba
Chief Architect, Transamerica Life & Protection
Dave Beaudoin
AVP, Transamerica Investments & Retirement
Agenda
• About Transamerica
• Current State and Data Architecture
• Use Cases / Learnings
• POC Highlights
• Solution Design
• Lessons Learned
• Questions / Discussion
Transamerica Org Structure
• Investments & Retirement
– Retirement and Benefit Plan services to employers/employees
– Mutual funds and variable annuities
– Mission: help people save and invest wisely to secure their retirement dreams
– Serves more than 3 million retirement plan participants across the entire spectrum of defined benefit and defined contribution plans
• Life & Protection
– Term and Perm insurance products
– Medicare supplement, long term care, accidental death, final expense
– Mission to protect what you’ve built, secure what’s next.
Data Architecture
• Rich data environment across organizational
business units, comprised of many source
systems across various platforms
• A consistent enterprise view of data across
business units is required
Data Architecture
Data Architecture
As in many industries,
we are focused on
leveraging technology
to build data-driven
customer
relationships.
How can the current data architecture
support this strategic direction?
Data Architecture
Enterprise Data Management
Crossroads
Another traditional
warehouse project?
or
Enterprise data
hub/lake/ocean with new
technologies?
Data Architecture – New Direction?
Enterprise Marketing and Analytics Platform
The Vision
• 360-degree view of consumers for marketing, planning, and analytics
• Discover and mine relationships
• Create highly targeted and individualized marketing programs
The How
• Co-location, master data management, custom data quality and cleansing rules, and more
• Together these allow integration of data from across and outside the company to create a 360-degree view of consumers
Use Cases
Complete a Data Append Process
Append Data Elements from Enrichment sources to
Customer/Prospect records
Tests: Data Integration, Processing Power
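The append step can be pictured as a left join from customer/prospect records to an enrichment source. The sketch below is only an illustration under stated assumptions: the deck does not show the actual Informatica mappings, and the key and field names (`customer_id`, `income_band`, `home_owner`) are hypothetical.

```python
def append_enrichment(records, enrichment, key="customer_id"):
    """Left-join enrichment attributes onto customer/prospect records."""
    # Index the enrichment source once so each lookup is a dictionary access.
    by_key = {row[key]: row for row in enrichment}
    appended = []
    for rec in records:
        extra = by_key.get(rec[key], {})
        # Keep every original record; original fields win over enrichment fields.
        appended.append({**extra, **rec})
    return appended


# Hypothetical sample data, not taken from the POC.
customers = [
    {"customer_id": 1, "name": "A. Smith"},
    {"customer_id": 2, "name": "B. Jones"},
]
enrichment = [
    {"customer_id": 1, "income_band": "100k+", "home_owner": True},
]
result = append_enrichment(customers, enrichment)
```

Records without a match in the enrichment source pass through unchanged, which is the behavior an append process usually needs.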
Use Cases
Score Prospects with a Predictive Model
Score prospects with the current model to predict likelihood
to respond to a direct marketing offer
Tests: Processing Power, Predictive Modeling
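Scoring with an existing model amounts to applying fixed coefficients to each prospect record. The sketch below assumes a logistic model purely for illustration; the deck does not say what kind of model was used, and the feature names and coefficients are hypothetical.

```python
import math

# Hypothetical coefficients for illustration only; not the model used in the POC.
INTERCEPT = -2.0
WEIGHTS = {"age_over_50": 0.8, "prior_response": 1.5, "income_over_100k": 0.4}


def response_score(prospect):
    """Return a logistic probability of responding to a direct marketing offer."""
    z = INTERCEPT + sum(w * prospect.get(f, 0) for f, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_prospects(prospects):
    """Attach a score to each prospect and rank from most to least likely."""
    scored = [{**p, "score": response_score(p)} for p in prospects]
    return sorted(scored, key=lambda p: p["score"], reverse=True)


prospects = [
    {"id": "p1", "age_over_50": 1, "prior_response": 0, "income_over_100k": 1},
    {"id": "p2", "age_over_50": 1, "prior_response": 1, "income_over_100k": 0},
    {"id": "p3"},
]
ranked = score_prospects(prospects)
```

Because each record is scored independently, this kind of step parallelizes naturally across a cluster, which fits the "Processing Power" test listed for this use case.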
Use Cases
Measure online contracts/sales, create visitor
personas, create simple attribution
Match weblog visitor data to Salesforce leads and to policy
holders to generate a sales pipeline. Join to enrichment sources
to create personas. Join to Direct Mail solicitation history and
test for correlation.
Tests: Data Integration, Processing Power, Analytics
POC Success
Sources: Web Log, Salesforce, Epsilon, Direct Mail History
Join: weblogs > CRM > Demographics > mailings
Enabled:
• Connected online activity to offline activity
• Created time series of events
• Created demographic profile
TOTAL ANALYSIS TIME = 2 hours
Results:
• 120k visits
• 4,252 people
• 14% in niche
• 35% with income > $100k
• 83 sent DM after visit
**Use case figures are for illustration purposes only**
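The join chain on this slide (weblogs > CRM > demographics > mailings) can be sketched as successive keyed lookups. Everything below is a hypothetical illustration: the records are made up, matching on email address is an assumption, and the POC itself ran these joins on Hadoop rather than in plain Python.

```python
from datetime import date

# All records are made up for illustration; identifiers and fields are hypothetical.
weblog = [
    {"visitor_id": "v1", "email": "a@example.com", "visit_date": date(2014, 3, 1)},
    {"visitor_id": "v2", "email": "b@example.com", "visit_date": date(2014, 3, 5)},
    {"visitor_id": "v3", "email": "c@example.com", "visit_date": date(2014, 3, 9)},
]
crm_leads = {"a@example.com": "lead-1", "b@example.com": "lead-2"}  # email -> lead id
demographics = {
    "lead-1": {"income_over_100k": True},
    "lead-2": {"income_over_100k": False},
}
mailings = {"lead-1": [date(2014, 3, 10)], "lead-2": [date(2014, 2, 1)]}


def build_timeline(weblog, crm_leads, demographics, mailings):
    """Join weblogs > CRM > demographics > mailings into one row per matched visit."""
    rows = []
    for visit in weblog:
        lead_id = crm_leads.get(visit["email"])
        if lead_id is None:
            continue  # visitor could not be matched to a CRM lead
        mailed = sorted(mailings.get(lead_id, []))
        rows.append({
            "lead_id": lead_id,
            "visit_date": visit["visit_date"],
            "demographics": demographics.get(lead_id, {}),
            # Direct mail pieces sent after the online visit.
            "dm_after_visit": [d for d in mailed if d > visit["visit_date"]],
        })
    return rows


timeline = build_timeline(weblog, crm_leads, demographics, mailings)
sent_dm_after_visit = sum(1 for r in timeline if r["dm_after_visit"])
```

The final count corresponds to the kind of "sent DM after visit" figure reported on the slide, here computed over toy data.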
Summary of Key Benefits
• Provides a single platform to house key customer and prospect data sources
• Establishes persistent keys across previously disparate data sources
• Provides for rapid intake of new data sources (structured and unstructured)
• Eliminates today’s data intake and append bottleneck
• Empowers analysts to explore all data elements
• Increases processing power for statistical analysis
• Improves recruiting and retention of Data Engineers and Data Scientists
Solution Architecture
• Inputs: Customer Data, Partner Files, Prospects, Enrichment, CRM, Solicitation History, Weblogs
• Informatica Big Data Edition
– PowerCenter Big Data Edition: Extract, Load & Transform
– Data Quality Big Data Edition: Cleaning, Identity Resolution
– Identity Resolution
• Cloudera Big Data Platform
– HDFS, HBase, Hive, MapReduce
– Cleansed Files
– Individual, Household
• Consumption
– Datameer: Big Data Analytics, Visualizations
– Predictive Analytics
– Visualizations
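The architecture shows identity resolution producing Individual and Household outputs. The sketch below illustrates the general idea of a persistent match key with a simple deterministic stand-in: normalize a few fields and hash them. It is not how Informatica Identity Resolution works (that product performs fuzzy matching), and the field names and helper functions are hypothetical.

```python
import hashlib
import re


def _normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", "", text.lower())).strip()


def _key(*parts):
    """Derive a stable short key from normalized parts."""
    joined = "|".join(_normalize(p) for p in parts)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]


def assign_keys(record):
    """Attach hypothetical individual and household keys to one record."""
    household = _key(record["address"], record["zip"])
    individual = _key(record["name"], record["address"], record["zip"])
    return {**record, "household_key": household, "individual_key": individual}


# Made-up records from three hypothetical sources.
records = [
    {"source": "crm", "name": "Jane Doe", "address": "12 Main St.", "zip": "52401"},
    {"source": "weblog", "name": "JANE  DOE", "address": "12 main st", "zip": "52401"},
    {"source": "partner", "name": "John Doe", "address": "12 Main St", "zip": "52401"},
]
keyed = [assign_keys(r) for r in records]
```

In this toy data the first two records resolve to the same individual, and all three share one household key, which is what lets previously disparate sources be joined on a persistent key.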
Product Value Proposition
Cloudera Enterprise Data Hub
– Strong commitment to a community-driven, open source platform
– Security & Governance: authentication, authorization, auditing, data lineage, and data discovery
– Strong presence in the financial services industry
Informatica
– Leverage existing skills; visual development; increased productivity
– Prebuilt connectors: RDBMS, VSAM, OLAP, Salesforce, social media (Facebook, LinkedIn, Twitter)
– Data profiling and data quality on Hadoop: identify data issues earlier; scorecarding; cleanse, match, and standardize on Hadoop; prebuilt data quality rules; data masking
– Natural Language Processing (NLP): mine semi-structured and unstructured data (emails, Twitter feeds, Facebook posts)
Product Value Proposition
Datameer
– Big Data Analytics: analyze structured and semi-structured data; data mining; built for Hadoop
– Easy, Excel-like interface eliminates the need to learn new programming languages; would appeal to a broader analytic user community across the enterprise
– Create, execute, and test data pipelines natively on Hadoop
– Rapidly combine and enrich existing data sets; what-if scenarios
– Pre-stage data for campaigns and reporting
POC Highlights
• 10-node cluster running CDH 5.0
• 718 million rows of data from 1,200+ input files
• Seven use cases with increasing complexity
• 30 TB of data
POC Timeline
Task | Participants | Duration
Cloudera install, setup, and cluster certification for Enterprise Data Hub; security review | Cloudera / Core Team | 2 weeks
Installation, data preparation, and ingestion using PowerCenter Big Data Edition | Informatica / TA Core Team | 2 weeks
Profiling, data cleansing and standardization, aggregation, de-duping, and customer identification using Data Quality and Identity Resolution | Informatica / TA Core Team | 4 weeks
Visualizations, models, PMML, and campaign files using Big Data Analytics with Datameer | Datameer / TA Core Team | 2 weeks
SAS integration | TA Core Team | 1 week
Wrap up | | 1 week
POC Infrastructure
Infrastructure
• Cloudera
– Enterprise Data Hub (5.0.1)
– Hive (0.12), Hue (3.5), Impala, HBase (0.96), Pig (0.12), Spark (0.9)
– 10-node cluster, 6 data nodes
– RHEL 6.5, 20 cores, 128 GB RAM
– 80 TB usable space
• Informatica Big Data Governance Edition
– BDE 9.6.1
– Identity Match (MapIR)
• Datameer
Team
• Business (Sponsor, Data, Analytics, Campaign)
• PM
• BA
• IT (6)
• Operations (3 part time)
• Support staff
• Legal
• Procurement
• Security
Why HBase?
• Faster update processing
– Leverage the index to seek directly to a row for upsert processing instead of running a full scan against HDFS; huge gains in update processing times
• Faster analytics
– Can leverage the HBase index for improved query performance
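The difference between a keyed seek and a full scan can be illustrated with a toy model. The sketch below does not use HBase or HDFS at all: `KeyedStore` is a hypothetical stand-in for a table sorted by row key, and `upsert_flat_file` is a hypothetical stand-in for rewriting a flat data set to change one record.

```python
import bisect


class KeyedStore:
    """Toy store with sorted row keys, standing in for an indexed table."""

    def __init__(self):
        self._keys = []  # row keys kept in sorted order
        self._rows = {}  # row key -> row

    def upsert(self, key, row):
        """Locate the key position by binary search; update or insert the row."""
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)
        self._rows[key] = row

    def get(self, key):
        return self._rows.get(key)


def upsert_flat_file(rows, key, new_row):
    """Flat-file stand-in: scan every row and rewrite the whole data set."""
    scanned = 0
    out, found = [], False
    for row in rows:
        scanned += 1
        if row["id"] == key:
            out.append(new_row)
            found = True
        else:
            out.append(row)
    if not found:
        out.append(new_row)
    return out, scanned


store = KeyedStore()
for i in range(1000):
    store.upsert(f"cust-{i:04d}", {"id": f"cust-{i:04d}", "status": "prospect"})
store.upsert("cust-0500", {"id": "cust-0500", "status": "customer"})

flat = [{"id": f"cust-{i:04d}", "status": "prospect"} for i in range(1000)]
flat, rows_scanned = upsert_flat_file(
    flat, "cust-0500", {"id": "cust-0500", "status": "customer"}
)
```

Updating one record in the flat data set touches all 1,000 rows, while the keyed store only has to locate a single key. That is the intuition behind the update-time gains the slide attributes to HBase.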
Informatica Developer (Big Data Edition)
Data Profiling on Hadoop
Score carding
Datameer – Workbooks
Datameer – Visualizations
Lessons Learned
• Invest in a PoC
– Tie to business use cases to demonstrate value
• Partner with key vendors
– Technology is changing rapidly
• Small team with the right skillset
– Naturally innovative individuals
– Can wear multiple hats
• Evangelize & sell your idea
– Partner with business
– Socialize the platform & the vision
• Big Data requires Data Governance
– Establish tools & processes to support data governance
– Data Stewards: Profile, validate, catalog, metadata creation, lineage
– Managed & Curated Data
• Align with larger enterprise strategy
Questions
Stephen Lloyd
stephen.lloyd@transamerica.com
http://about.me/stephenlloyd
David Beaudoin
david.beaudoin@transamerica.com
Vishal Bamba
vishal.bamba@transamerica.com
Twitter: @vishalbamba


Editor's Notes

  • #11: Get rid of append