How Big Data ISVs get marketing data into lakes
Sumit Sarkar
Chief Data Evangelist
Progress DataDirect
Gary Angel
Advisory Digital Analytics
Center of Excellence
Principal
EY
© 2015 Progress Software Corporation. All rights reserved.
Audio Bridge Options & Question Submission
Agenda
• What is a Marketing Data Lake?
• Industry trends around accessing marketing data in SaaS applications
• How to ingest data with Apache Sqoop and Apache Falcon directly from SaaS applications
• How big data vendors can embed SaaS connectivity
What is a Marketing Data Lake?
A data lake is a large-scale storage repository and processing engine. A data lake provides “massive storage for any kind of data, enormous processing power and the ability to handle virtually limitless concurrent tasks or jobs”
- SAS Institute
Benefits of a Marketing Data Lake
Some of the benefits of a data lake include:
• Store data in all shapes and sizes
• Flexible analytics with “schema on read”
• Query data using SQL or big data programming frameworks
• Eliminate data silos
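As a rough illustration of “schema on read” from the list above, the sketch below keeps raw records untouched and applies a schema only when a question is asked. The record layout, field names, and the `read_with_schema` helper are hypothetical and chosen for illustration only.

```python
import json

# Raw events land in the lake untouched; fields vary from record to record.
RAW_LINES = [
    '{"source": "crm", "email": "a@example.com", "stage": "lead"}',
    '{"source": "web", "visitor": "v1", "page": "/pricing", "ms": "5400"}',
    '{"source": "web", "visitor": "v2", "page": "/docs"}',
]

def read_with_schema(lines, schema):
    """Project each raw JSON record onto a schema chosen at read time.

    `schema` maps a field name to a converter; missing fields become None.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append({
            name: (convert(record[name]) if name in record else None)
            for name, convert in schema.items()
        })
    return rows

# A web-behavior question applies a web-shaped schema to the web records only.
web_schema = {"visitor": str, "page": str, "ms": int}
web_lines = [line for line in RAW_LINES if json.loads(line)["source"] == "web"]
web_rows = read_with_schema(web_lines, web_schema)
```

A different question (for example, lead stages from the CRM record) would reuse the same raw lines with a different schema, which is the flexibility the bullet refers to.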
Why Marketing Data?
• CMOs will outspend CIOs on technology by 2017 (Gartner)
• Oracle spent $3B on a martech acquisition spree to gain CMO mindshare.
• Expect more collaboration between CMO and CIO (CIO.com)
• Modern Marketing Data Warehouse Webinar ~500 registrations (Progress)
Industry trends around accessing marketing data in SaaS applications
It’s easy to forget that it’s still about solving real business problems.
[Diagram: Manage Data → Perform Analytics → Drive Decisions, linked by a continuous feedback loop. Labels: relevant data; transaction / behavior history; appropriate data sources; insights; answers to business questions.]
Strategy (Thinking) Moves Right to Left
Implementation Moves Left to Right
Before you think data, think decisions!
Our marketing data is almost all in the cloud
• CRM
• Web Behavior
• Mobile Behavior
• Search Buys
• Display Buys
• Owned Social
• Public Social
• Meta-Data
And it’s almost all complex, stream data – which means APIs that only give aggregations aren’t too useful
Detail is important because this digital data is true big data
The relationship between events is critical
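A small sketch of why the relationship between events matters: two visitors can have identical aggregate counts while the order of their events tells a different story. The event names and sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical detail events: (visitor, timestamp, event name).
EVENTS = [
    ("v1", 1, "search_ad_click"),
    ("v1", 2, "view_product"),
    ("v1", 3, "purchase"),
    ("v2", 1, "view_product"),
    ("v2", 2, "purchase"),
    ("v2", 3, "search_ad_click"),
]

def event_counts(events):
    """Aggregate view: how many times each visitor fired each event."""
    counts = defaultdict(Counter)
    for visitor, _, name in events:
        counts[visitor][name] += 1
    return counts

def event_paths(events):
    """Detail view: each visitor's events in time order."""
    paths = defaultdict(list)
    for visitor, _, name in sorted(events, key=lambda e: (e[0], e[1])):
        paths[visitor].append(name)
    return dict(paths)

def ad_preceded_purchase(path):
    """True when an ad click occurs before the first purchase in the path."""
    if "purchase" not in path or "search_ad_click" not in path:
        return False
    return path.index("search_ad_click") < path.index("purchase")

counts = event_counts(EVENTS)
paths = event_paths(EVENTS)
```

Here the per-visitor counts are the same, but only the first visitor clicked the ad before purchasing. An API that returns only the counts cannot answer that question.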
We’re almost never solving for one problem with a big data system
Data levels: Summarized Data, Segmented Data, Detail Data
Uses, spanning Reporting to Analytics: Dashboarding; Campaign Optimization; Customer Drill-down; Attribution, CLTV, Experience, Personalization; Targeting; Forecasting
We can’t just aggregate / We can’t not aggregate
Segmentation is one important technique to aggregate and join
Segmentation allows for effective aggregation of the meaning and outcome of streamed event data.
Measurement foundation:
• Customer segmentation: customers v. prospects, owned products, persona
• Visit type identification: product focused, shopping focused, social focused, customer service
• RFM models: recency and frequency for every segment and visit type
• KPDs and metrics: measurement of success specific to each segment and visit; additional metrics that help identify drivers of success
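The roll-up described above can be sketched as follows: detail visits are grouped by customer segment and visit type, and each group gets a frequency and a recency figure. The visit records, segment labels, and the `recency_frequency` helper are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical visit records: (visitor, segment, visit type, visit date).
VISITS = [
    ("v1", "customer", "product", date(2015, 6, 1)),
    ("v1", "customer", "product", date(2015, 6, 20)),
    ("v2", "prospect", "shopping", date(2015, 6, 10)),
    ("v3", "customer", "service", date(2015, 5, 30)),
]

def recency_frequency(visits, as_of):
    """Roll visits up to (segment, visit type): visit count and days since last visit."""
    grouped = defaultdict(list)
    for _, segment, visit_type, day in visits:
        grouped[(segment, visit_type)].append(day)
    return {
        key: {"frequency": len(days), "recency_days": (as_of - max(days)).days}
        for key, days in grouped.items()
    }

summary = recency_frequency(VISITS, as_of=date(2015, 6, 30))
```

The output is far smaller than the event stream but still keyed by the segments that matter to the business question, which is the sense in which segmentation is a way to aggregate without losing meaning.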
End-to-End Strategies
1. Full Detail → Park → Cube → Report
2. Full Detail → Park → Report
3. Full Detail → Park → Semi-Detail → Report
• Most organizations do some combination of at least 1 & 2
• Direct to Detail (2) has many advantages if it can be made performant (more flexible reporting and much less maintenance)
• Semi-Detail (3) is designed to capture most of the advantages of (2) when (2) isn’t performant
How to ingest data with Apache Sqoop and Apache Falcon directly from SaaS applications
What is Apache Sqoop?
Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Sqoop successfully graduated from the Incubator in March of 2012 and is now a Top-Level Apache project.
http://sqoop.apache.org/
What is Apache Falcon?
Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on hadoop clusters.
https://falcon.apache.org/
Note: Falcon uses Sqoop for import/export operation. Sqoop will require appropriate database driver to connect to the relational database. Please refer to the Sqoop documentation for any Sqoop related question. Please make sure the database driver jar is copied into oozie share lib for Sqoop.
Data in SaaS Applications is Siloed, Protected by Proprietary APIs Designed for Process Integration, not Data Integration
How to ingest data directly from SaaS applications
JDBC access to SaaS data
[Diagram: Apache Sqoop, Progress DataDirect JDBC Connector, Schema Manager, and Salesforce.com; schemas shown: Salesforce.com Schema and User Defined Schema.]
Driver uses
• SOAP API
• Bulk API
• Metadata API
Geek Speak
$ sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]
Common arguments:
--connect <jdbc-uri> Specify JDBC connect string
--connect-manager <jdbc-uri> Specify connection manager class to use
--driver <class-name> Manually specify JDBC driver class to use
--hadoop-mapred-home <dir>+ Override $HADOOP_MAPRED_HOME
--help Print usage instructions
-P Read password from console
--password <password> Set authentication password
--username <username> Set authentication username
--verbose Print more information while working
--hadoop-home <dir>+ Deprecated. Override $HADOOP_HOME
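The common arguments above can be combined into a free-form query import. The sketch below only assembles the argument list and does not run Sqoop. The connect URI, driver class, user name, and target directory are placeholders, and `build_sqoop_import` is a hypothetical helper. Besides the flags listed on the slide, it uses `--query`, `--target-dir`, `-m`, and `--split-by`, which are standard Sqoop 1 import arguments. It also enforces two documented rules: a free-form query must contain the `$CONDITIONS` token, and a parallel free-form import needs a split column.

```python
import shlex

def build_sqoop_import(connect_uri, driver_class, query, target_dir,
                       username, num_mappers=1, split_by=None):
    """Assemble the argv list for a Sqoop 1 free-form query import."""
    if "$CONDITIONS" not in query:
        raise ValueError("free-form queries must contain the $CONDITIONS token")
    if num_mappers > 1 and not split_by:
        raise ValueError("parallel free-form imports need a --split-by column")
    argv = [
        "sqoop", "import",
        "--connect", connect_uri,
        "--driver", driver_class,
        "--username", username,
        "-P",  # prompt for the password instead of putting it on the command line
        "--query", query,
        "--target-dir", target_dir,
        "-m", str(num_mappers),
    ]
    if split_by:
        argv += ["--split-by", split_by]
    return argv

argv = build_sqoop_import(
    connect_uri="jdbc:example://host.invalid;DatabaseName=sandbox",  # placeholder URI
    driver_class="com.example.jdbc.ExampleDriver",                   # placeholder class
    query="SELECT t.* FROM Case AS t WHERE $CONDITIONS",
    target_dir="/sample/table/cases",
    username="analyst",
)
# shlex.quote keeps $CONDITIONS literal when the line is pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in argv))
```

Using `-P` rather than `--password` keeps credentials out of shell history and process listings, which matters when connection details for a SaaS account are involved.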
Why are ISVs turning to a single interface for SaaS?
• Get JDBC interface on top of any API
Data Source: API
• Eloqua: Web Services API (REST/SOAP); Bulk and non-Bulk APIs; no query language
• Oracle Service Cloud: Web Services APIs (REST/SOAP); ROQL
• Google Analytics: Hypercube (query limits of 10 metrics grouped by max of 7 dimensions)
• Veeva CRM: SOAP, BULK, Metadata APIs; SOQL
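The Google Analytics limits quoted above show the kind of work a connectivity layer has to hide. A minimal sketch, assuming a request for more metrics than one query allows: it splits the metrics into batches that each respect the limit and repeats the dimensions so the results can be stitched back together. The metric names and the `plan_queries` helper are hypothetical; no Google API is called.

```python
MAX_METRICS = 10     # per-query metric limit quoted on the slide
MAX_DIMENSIONS = 7   # per-query dimension limit quoted on the slide

def plan_queries(metrics, dimensions):
    """Split a wide request into several queries that each respect the limits.

    Every query repeats the same dimensions so the results can be stitched
    back together on those dimension values.
    """
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError("too many dimensions for a single query")
    return [
        {"dimensions": list(dimensions), "metrics": metrics[i:i + MAX_METRICS]}
        for i in range(0, len(metrics), MAX_METRICS)
    ]

metrics = [f"metric_{n}" for n in range(23)]   # hypothetical metric names
plan = plan_queries(metrics, ["date", "campaign"])
```

A SQL caller would ideally never see this batching: a single `SELECT` with 23 columns would be planned into three API calls behind the JDBC interface.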
As the Market Switches from ETL to ELT, Data Access is critical
ETL
• Steps: Extract, Transform, Load, View
• Systems: Operational Systems, Staging Area, Data Warehouse, Analytics Apps
ELT
• Steps: Extract & Load, Transform & View
• Systems: Operational Systems, Big Data Warehouse; Analytics, Data Prep, and even traditional DW
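The ELT side of the comparison can be sketched with an in-memory SQLite database standing in for the big data warehouse. This is an illustration only: the table, view, and column names are hypothetical, and a real lake would use a Hadoop-scale engine rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract & Load: land the source rows as-is, with everything kept as text.
conn.execute("CREATE TABLE raw_leads (email TEXT, score TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO raw_leads VALUES (?, ?, ?)",
    [
        ("A@Example.com ", "42", "2015-06-01"),
        ("b@example.com", "", "2015-06-02"),
        ("c@example.com", "87", "2015-06-03"),
    ],
)

# Transform & View: cleaning happens inside the store, at query time.
conn.execute("""
    CREATE VIEW leads AS
    SELECT lower(trim(email)) AS email,
           CAST(NULLIF(score, '') AS INTEGER) AS score,
           created
    FROM raw_leads
""")

hot_leads = conn.execute(
    "SELECT email, score FROM leads WHERE score >= 40 ORDER BY score DESC"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_leads").fetchone()[0]
```

Because the raw table is never modified, a later change to the cleaning rules only means redefining the view. With ETL the same change would require re-extracting from the source, which is why data access becomes the critical step.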
How big data vendors are embedding SaaS connectivity
Progress DataDirect
Embed Sales & Marketing Connectors into the Data Access Layer
Ingest data across 200+ data sources (beyond marketing data sources)
Big Data/NoSQL
• Apache Hadoop Hive
• Cloudera
• Hortonworks
• Pivotal HD
• MapR
• EMR
• Pivotal HAWQ
• Cloudera Impala
• MongoDB
• Spark SQL
• Cassandra
• SAP HANA
Data Warehouses
• Amazon Redshift
• SAP Sybase IQ
• Teradata
• Pivotal Greenplum
Relational
• Oracle DB
• Microsoft SQL Server
• IBM DB2
• MySQL
• PostgreSQL
• IBM Informix
• SAP Sybase
• Pervasive SQL
• Progress OpenEdge
• Progress Rollbase
SaaS/Cloud
• Salesforce.com
• Database.com
• FinancialForce
• Veeva CRM
• ServiceMAX
• Any Force.com App
• Hubspot
• Marketo
• Microsoft Dynamics CRM
• Microsoft SQL Azure
• Oracle Eloqua
• Oracle Service Cloud
• Google Analytics
EDI/XML/Text
• EDIFACT
• EDIG@S
• EANCOM
• X12
• IATA
• Healthcare EDI: X12, HIPAA, ICD-10, HL7
• Custom EDI
• Flat files: CSV, TSV, dBase, Clipper, Foxpro, Paradox
• Text Files
Any
• SDK
• SequeLink Socket Server
• Customer Engineering
Single API for data lake ingestion from SaaS sources
• Ingest data against a single API (JDBC)
• Get a single dedicated partner
• Connect to unlimited data with a single API
• Get unlimited support
Building a marketing data lake


Editor's Notes

  • #2: How Big Data ISVs get marketing data into lakes. Marketing data is driving significant new Big Data investments from CIO and CMO offices. The latest Big Data trend is storing that data in lakes for analytics, providing massive storage for any type of data to be used for 360 customer views, predictive lead scoring, personalization, or sentiment analysis. However, marketing data is increasingly stored in the cloud, creating a connectivity challenge. Big Data vendors provide facilities to transfer core business data between relational database systems and Data Lakes, such as Apache Sqoop. But what about cloud data sources where existing Apache Sqoop connection managers do not work well with cloud SaaS APIs, each with a proprietary REST or SOAP API? The key to accelerating adoption of big data technology is providing easy access to disparate cloud data sources such as Salesforce, Oracle CX, Marketo, Eloqua, Google Analytics or Adobe Omniture. Competitive advantage then results from having embedded connectivity within your technology for data ingestion to an organization’s most important data, customer data. Join this informative and entertaining webinar as we explore: What is a Marketing Data Lake? Industry trends around accessing marketing data in SaaS applications. How to ingest data with Apache Sqoop and Apache Falcon directly from SaaS applications. How big data vendors can embed SaaS connectivity. Speaker(s): Sumit Sarkar, Data Connectivity Evangelist, Progress Software; Gary Angel, Advisory Digital Analytics Center of Excellence Principal, Ernst & Young. Asset(s): Follow-up asset sent in email. Mike Johnson’s blog: https://www.progress.com/blogs/are-you-ready-to-go-fishing-in-a-data-lake
  • #3: Give attendees a closer look at the control panel and how they can participate. 1) Join audio: there are two ways to do so. To use VoIP, click on “Mic & Speakers”; to use your telephone, click on “Telephone” and dial in using the numbers and information provided. 2) All lines are muted for today’s webinar. We do plan to have a live Q&A session at the end of the presentations. However, if you have a question at any time during this webinar, simply submit it via the “Question” section of the webinar interface located to the right of your screen; we will collect all questions through this “Question Window”. Final note: we are recording today’s webinar and will post it to PartnerLink.
  • #4: Why ISVs? Strata: big data vendors, data prep, data pipelines, data management, etc Data Lakes are part of the solution.
  • #7: Last webinar was around building a Marketing Data Warehouse. A Data Warehouse is a “Schema on Write” architecture and is typically loaded with ETL tools. Data Lakes are loaded with raw data (no “T”) and create the “Schema on Read” on business demand.
  • #8: The kinds of data from which you can derive value are unlimited. You can store all types of structured and unstructured data in a data lake, from CRM data, to social media posts. You don’t have to have all the answers upfront. Simply store raw data; you can refine it as your understanding and insight improves. You have no limits on how you can query the data. You can use a variety of tools to gain insight into what the data means. You don’t create any more silos. You gain a democratized access with a single, unified view of data across the organization. http://info.zaloni.com/hubfs/Architecting_Data_Lakes_Zaloni.pdf By Ben Sharma and Alice LaPlante
  • #9: Source: http://www.cio.com/article/2825086/cio-role/is-the-cio-cmo-transition-of-power-becoming-a-reality.html
  • #11: Source: http://info.zaloni.com/hubfs/Architecting_Data_Lakes_Zaloni.pdf by Ben Sharma and Alice LaPlante
  • #18: Sqoop is traditionally positioned for RDBMS sources via JDBC. There are specialized connectors for sources such as MySQL or Postgres, and a generic JDBC connector for any third-party driver.
  • #19: Note: Falcon uses Sqoop for import/export operations. Sqoop requires the appropriate database driver to connect to the relational database. Please refer to the Sqoop documentation for any Sqoop-related questions. Please make sure the database driver jar is copied into the Oozie share lib for Sqoop. Commercial data lake management solutions are available from many Hadoop vendors (for example, Cloudera Navigator), as well as standalone from companies such as Zaloni and Podium Data.
  • #23: bash-4.1$ sqoop import --connect "jdbc:datadirect:sforce:SecurityToken=3jZ0x4NcgClYDhxJqMa3c744://test.salesforce.com;[email protected];Password=informatica@123;DatabaseName=sandbox" --query 'SELECT TOP 10 t.* FROM Case as t WHERE $CONDITIONS' -m 1 --target-dir /sample/table/q50 --driver com.ddtek.jdbc.sforce.SForceDriver --verbose
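The shape of the command in note #23 can be sketched as a small Python helper that assembles the argument list. This is a sketch under stated assumptions: `build_sqoop_import` is a hypothetical helper, and the JDBC URL and target directory are placeholders to be replaced with real values. The driver class and the query are taken from the note itself, and only standard Sqoop options (`--connect`, `--driver`, `--query`, `--target-dir`, `-m`) are used.

```python
import shlex


def build_sqoop_import(jdbc_url, driver_class, query, target_dir, mappers=1):
    """Return the argv list for a free-form-query `sqoop import` (hypothetical helper)."""
    if "$CONDITIONS" not in query:
        # Sqoop requires this token in free-form queries.
        raise ValueError("free-form query must contain $CONDITIONS")
    if mappers != 1:
        # With more than one mapper, a free-form query also needs --split-by.
        raise ValueError("this sketch only covers -m 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--driver", driver_class,   # generic JDBC path: driver class named explicitly
        "--query", query,
        "--target-dir", target_dir,
        "-m", str(mappers),
    ]


argv = build_sqoop_import(
    jdbc_url="<jdbc-url>",  # placeholder, supply the real connection URL
    driver_class="com.ddtek.jdbc.sforce.SForceDriver",
    query="SELECT TOP 10 t.* FROM Case as t WHERE $CONDITIONS",
    target_dir="<target-dir>",  # placeholder HDFS directory
)

# Print a shell-quoted command line for review; nothing is executed here.
print(shlex.join(argv))
```

Building the arguments as a list keeps the quoting of the query (including the literal `$CONDITIONS` token) out of the shell's hands until the command is actually run.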
  • #24: R&D challenges building SQL connectivity across cloud sources such as Marketo. (1) Not all SaaS APIs expose a standard query language. In those cases, the engineering team looks at each object individually. Each object may be exposed through a different API with unique rules for invoking, searching, filtering, etc. It required a significant effort to provide a standard experience querying across the entire data model. (2) Handling full join capabilities. Where the SaaS APIs do not support a query language with JOIN capability, the engineering team has to perform that operation in the data access layer. This requires translating SQL into efficient Marketo API calls that return the minimal amount of data before the join is performed. When joining two very large objects, the data access layer may use considerable resources on the application server or desktop. Therefore, deploying the data access layer to an elastic cloud service such as DataDirect Cloud makes a lot of sense for two reasons: it gives faster performance while using fewer memory/CPU resources on the client application server or desktop, and it leverages the superior bandwidth between DataDirect Cloud and Marketo, where pre-joined datasets get exchanged. (3) How to handle data models? Is the model static or dynamic? How are changes detected and communicated to the client? Each SaaS data source is different; in the case of Marketo, certain objects are better queried through views and others through tables. Handling this matrix of data models and objects across all SaaS sources was certainly a challenge.
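The join handling described in note #24 can be sketched in Python as filter pushdown followed by a hash join in the data access layer. This is an illustration under stated assumptions, not DataDirect's implementation: the `LEADS` and `ACTIVITIES` lists, the `fetch` stand-in for an API call, and the `hash_join` helper are all hypothetical. The sketch evaluates the equivalent of `SELECT l.email, a.type FROM leads l JOIN activities a ON l.lead_id = a.lead_id WHERE l.country = 'US'`.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two SaaS objects without a JOIN-capable API.
LEADS = [
    {"lead_id": 1, "email": "a@example.com", "country": "US"},
    {"lead_id": 2, "email": "b@example.com", "country": "DE"},
    {"lead_id": 3, "email": "c@example.com", "country": "US"},
]
ACTIVITIES = [
    {"lead_id": 1, "type": "open"},
    {"lead_id": 1, "type": "click"},
    {"lead_id": 2, "type": "open"},
]


def fetch(rows, predicate, fields):
    """Stand-in for an API call that filters and projects at the source."""
    return [{f: r[f] for f in fields} for r in rows if predicate(r)]


def hash_join(left, right, key):
    """Inner join: index the left rows by key, then probe with the right rows."""
    index = defaultdict(list)
    for left_row in left:
        index[left_row[key]].append(left_row)
    return [
        {**left_row, **right_row}
        for right_row in right
        for left_row in index.get(right_row[key], [])
    ]


# Step 1: push the WHERE clause down so only matching leads are transferred.
us_leads = fetch(LEADS, lambda r: r["country"] == "US", ["lead_id", "email"])

# Step 2: use the surviving keys to restrict the second request as well.
wanted = {lead["lead_id"] for lead in us_leads}
activities = fetch(ACTIVITIES, lambda r: r["lead_id"] in wanted, ["lead_id", "type"])

# Step 3: perform the join locally, since the API cannot.
joined = hash_join(us_leads, activities, "lead_id")
for row in joined:
    print(row["email"], row["type"])
```

The hash index is the part that consumes memory on whichever machine runs the join, which is the resource cost the note refers to when two very large objects are joined.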
  • #28: 350+ ISVs; 10,000 DEUs. We’re excited to get MongoDB data into the hands of more people through open data standards.
  • #29: Develop against open standards: Avoid vendor lock-in by adopting open industry standards. DataDirect is the leader in data connectivity standards, having co-founded the ODBC specification and serving on the JDBC Expert Group, OData Technical Committee and ANSI SQL Committee. Connect to unlimited data with a single API: Access the full breadth of data sources using a single, decoupled code base and API for the data access layer, protecting you from changes in metadata, error handling, and API or protocol revisions. Get a single dedicated partner: Deliver full support for the breadth of data sources in all shapes and sizes, with constant vigilance for the next security vulnerability (POODLE, FREAK, LOGJAM) in your data access layer. Focus your engineering resources on your core business. Get unlimited support: We live for your next big customer. Make sure your POC is a success with 24/7 partner support and access to expertise from our engineering teams, partnerships and leading technology companies such as Microsoft, Oracle, and IBM through our TSANet multi-vendor support channel.