Pentaho Big Data Analytics with Vertica and Hadoop - Mark Kromer
Overview of the Pentaho Big Data Analytics Suite from the Pentaho + Vertica presentation at Big Data Techcon 2014 in Boston for the session called "The Ultimate Selfie | Picture Yourself with the Fastest Analytics on Hadoop with HP Vertica and Pentaho"
Big Data Analytics Projects - Real World with Pentaho - Mark Kromer
This document discusses big data analytics projects and technologies. It provides an overview of Hadoop, MapReduce, YARN, Spark, SQL Server, and Pentaho tools for big data analytics. Specific scenarios discussed include digital marketing analytics using Hadoop, sentiment analysis using MongoDB and SQL Server, and data refinery using Hadoop, MPP databases, and Pentaho. The document also addresses myths and challenges around big data and provides code examples of MapReduce jobs.
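The MapReduce examples from that deck are not reproduced here, but the pattern they follow is easy to sketch. The snippet below is a minimal, self-contained word-count illustration in Python with invented sample input: map_phase emits (word, 1) pairs, reduce_phase sums them, and a tiny driver stands in for Hadoop's shuffle-and-sort.

```python
# Minimal word-count MapReduce sketch; illustrative only, not code from the deck.
from itertools import groupby

def map_phase(lines):
    # Mapper role: emit a (key, value) pair per word.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Reducer role: pairs must be grouped by key (Hadoop guarantees this after
    # the shuffle); here we sort locally to simulate that guarantee.
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    docs = ["big data analytics", "big data on hadoop", "analytics at scale"]
    for word, total in reduce_phase(map_phase(docs)):
        print(word, total)
```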
Here I talk about examples and use cases for Big Data & Big Data Analytics, and how we accomplished massive-scale sentiment, campaign, and marketing analytics for Razorfish using a collection of database, Big Data, and analytics technologies.
Big Data in the Cloud with Azure Marketplace Images - Mark Kromer
The document discusses strategies for modern data warehousing and analytics on Azure including using Hadoop for ETL/ELT, integrating streaming data engines, and using lambda and hybrid architectures. It also describes using data lakes on Azure to collect and analyze large amounts of data from various sources. Additionally, it covers performing real-time stream analytics, machine learning, and statistical analysis on the data and discusses how Azure provides scalability, speed of deployment, and support for polyglot environments that incorporate many data processing and storage options.
Big Data Analytics with Hadoop, MongoDB and SQL Server - Mark Kromer
This document discusses SQL Server and big data analytics projects in the real world. It covers the big data technology landscape, big data analytics, and three big data analytics scenarios using different technologies like Hadoop, MongoDB, and SQL Server. It also discusses SQL Server's role in the big data world and how to get data into Hadoop for analysis.
This document discusses Big Data solutions in Microsoft Azure. It introduces Azure cloud services and provides an overview of Big Data and how it differs from traditional databases. It then outlines Microsoft's Big Data solutions built on Hortonworks Data Platform, including HDInsight which allows running Hadoop on Azure. HDInsight supports various data storage options and processing tools like Hive, Pig, and Storm. The document also covers designing HDInsight clusters and Azure Data Lake for unlimited storage of structured and unstructured data.
The document discusses data architecture solutions for solving real-time, high-volume data problems with low latency response times. It recommends a data platform capable of capturing, ingesting, streaming, and optionally storing data for batch analytics. The solution should provide fast data ingestion, real-time analytics, fast action, and quick time to value. Multiple data sources like logs, social media, and internal systems would be ingested using Apache Flume and Kafka and analyzed with Spark/Storm streaming. The processed data would be stored in HDFS, Cassandra, S3, or Hive. Kafka, Spark, and Cassandra are identified as key technologies for real-time data pipelines, stream analytics, and high availability persistent storage.
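As a rough sketch of the streaming leg of that pipeline, the PySpark Structured Streaming job below reads a Kafka topic and produces per-minute counts. The broker address and topic name are assumptions for illustration, the spark-sql-kafka package must be on the Spark classpath, and a production job would write to HDFS, Cassandra, or S3 rather than the console.

```python
# Sketch: Kafka ingest + streaming aggregation, the core of the pipeline above.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window, count

spark = SparkSession.builder.appName("realtime-pipeline").getOrCreate()

# Read the raw event stream from Kafka (hypothetical broker and topic).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(value AS STRING) AS body", "timestamp"))

# Real-time analytics: event counts per one-minute window.
counts = (events
          .groupBy(window(col("timestamp"), "1 minute"))
          .agg(count("*").alias("events")))

# Console sink for illustration; a real pipeline would use a file or Cassandra
# sink with a checkpoint location and a watermark on the event time.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```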
The document discusses modernizing a data warehouse using the Microsoft Analytics Platform System (APS). APS is described as a turnkey appliance that allows organizations to integrate relational and non-relational data in a single system for enterprise-ready querying and business intelligence. It provides a scalable solution for growing data volumes and types that removes limitations of traditional data warehousing approaches.
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B... - Mark Rittman
Mark Rittman from Rittman Mead presented on Oracle Big Data Discovery. He discussed how many organizations are running big data initiatives involving loading large amounts of raw data into data lakes for analysis. Oracle Big Data Discovery provides a visual interface for exploring, analyzing, and transforming this raw data. It allows users to understand relationships in the data, perform enrichments, and prepare the data for use in tools like Oracle Business Intelligence.
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data... - Dipti Borkar
Born at Facebook, Presto is an open source, high-performance, distributed SQL query engine. With the disaggregation of storage and compute, Presto was created to simplify querying of all data lakes - cloud data lakes like S3 and on-premises data lakes like HDFS. Presto's high performance and flexibility have made it a very popular choice for interactive query workloads on large Hadoop-based clusters as well as AWS S3, Google Cloud Storage, and Azure Blob storage. Today it has grown to support many users and use cases including ad hoc query, data lakehouse analytics, and federated querying. In this session, we will give an overview of Presto including its architecture and how it works, the problems it solves, and the most common use cases. We'll also share the latest innovations in the project as well as the future roadmap.
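For readers who want to see what an interactive query looks like in practice, here is a minimal sketch using the presto-python-client (prestodb); the coordinator host, catalog, and table are assumptions, not details from the talk.

```python
# Minimal Presto query sketch using the presto-python-client (prestodb).
import prestodb

conn = prestodb.dbapi.connect(
    host="presto-coordinator.example.com",  # hypothetical coordinator
    port=8080,
    user="analyst",
    catalog="hive",                          # could equally be an S3/ADLS-backed catalog
    schema="default",
)
cur = conn.cursor()
cur.execute("""
    SELECT region, count(*) AS orders
    FROM orders
    GROUP BY region
    ORDER BY orders DESC
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```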
This document discusses how to build a successful data lake by focusing on the right data, platform, and interface. It emphasizes the importance of saving raw data to analyze later, organizing the data lake into zones with different governance levels, and providing self-service tools to find, understand, provision, prepare, and analyze data. It promotes the use of a smart data catalog like Waterline Data to automate metadata tagging, enable data discovery and collaboration, and maximize business value from the data lake.
What is an Open Data Lake? - Data Sheets | Whitepaper - Vasu S
A data lake, where data is stored in an open format and accessed through open standards-based interfaces, is defined as an Open Data Lake.
https://www.qubole.com/resources/data-sheets/what-is-an-open-data-lake
Everyone is awash in the new buzzword, Big Data, and it seems as if you can’t escape it wherever you go. But there are real companies with real use cases creating real value for their businesses by using big data. This talk will discuss some of the more compelling current or recent projects, their architecture & systems used, and successful outcomes.
Introduction to Kudu - StampedeCon 2016 - StampedeCon
Over the past several years, the Hadoop ecosystem has made great strides in its real-time access capabilities, narrowing the gap compared to traditional database technologies. With systems such as Impala and Spark, analysts can now run complex queries or jobs over large datasets within a matter of seconds. With systems such as Apache HBase and Apache Phoenix, applications can achieve millisecond-scale random access to arbitrarily-sized datasets.
Despite these advances, some important gaps remain that prevent many applications from transitioning to Hadoop-based architectures. Users are often caught between a rock and a hard place: columnar formats such as Apache Parquet offer extremely fast scan rates for analytics, but little to no ability for real-time modification or row-by-row indexed access. Online systems such as HBase offer very fast random access, but scan rates that are too slow for large scale data warehousing workloads.
This talk will investigate the trade-offs between real-time transactional access and fast analytic performance from the perspective of storage engine internals. It will also describe Kudu, the new addition to the open source Hadoop ecosystem that fills the gap described above, complementing HDFS and HBase to provide a new option to achieve fast scans and fast random access from a single API.
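A rough illustration of that "fast scans and fast random access from a single API" point, seen from Spark through the kudu-spark connector (the connector jar must be on the Spark classpath; the master address, table, and column names are assumptions):

```python
# Reading a Kudu table from Spark: the same table serves both analytic scans
# and key-based lookups, which is the gap Kudu fills between Parquet and HBase.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kudu-scan").getOrCreate()

metrics = (spark.read
           .format("org.apache.kudu.spark.kudu")
           .option("kudu.master", "kudu-master.example.com:7051")
           .option("kudu.table", "impala::default.metrics")
           .load())

# Analytic scan: columnar storage keeps large aggregations fast.
metrics.groupBy("host").avg("cpu_pct").show()

# Key-based lookup: the same table also supports indexed, row-level access,
# which append-only columnar files on HDFS cannot offer.
metrics.filter("host = 'web-01' AND ts = 1718000000").show()
```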
The document introduces the Teradata Aster Discovery Platform for scalable analytics using analytic algorithms on commodity hardware. It discusses use cases like credit risk analysis, fraud detection, and sentiment analysis. It provides an overview of the discovery process model and instructions for downloading, installing, and using Aster including setting up the Aster Management Console and AsterLens for visualization. It then provides examples of using various Aster analytic functions like k-means clustering, market basket analysis, data unpacking, and nPath analysis for applications in marketing, pricing, and web analytics. It concludes that Aster provides more powerful analytics capabilities than SQL alone for exploring big data.
Cortana Analytics Workshop: Operationalizing Your End-to-End Analytics Solution - MSAdvAnalytics
Wee Hyong Tok. With Azure Data Factory (ADF), existing data movement and analytics processing services can be composed into data pipelines that are highly available and managed in the cloud. In this demo-driven session, you learn by example how to build, operationalize, and manage scalable analytics pipelines. Go to https://channel9.msdn.com/ to find the recording of this session.
The ecosystem for big data and analytics has become too large and complex, with too many vendors, distributions, engines, projects, and iterations. This leads to problems like analysis paralysis during platform decisions, solutions becoming obsolete too quickly, and constant stress over choosing the right engine for each job. The document suggests that industries, vendors, investors, analysts, technologists, and customers all need to take steps to reduce complexity and focus on standards and merit-based evaluations of options.
Webinar: Proofpoint, a pioneer in security-as-a-service protects people, info... - DataStax
Proofpoint is a $3 billion public cloud security company that acquired Nexgate, an early DataStax customer, in 2014. Proofpoint uses Cassandra all over the organization, with a current production deployment of 3 TB of data across 23 nodes in 9 data centers and 4 clusters. Over time, Proofpoint has evolved its Cassandra usage from a single data center with 3 nodes in 2012 to multiple data centers with a Solr deployment today. Proofpoint discussed several use cases for Cassandra including detecting phishing, analyzing spam patterns, trending topics analysis, archive searching, threat event correlation, and more. The presentation provided advice on Cassandra best practices and contacts for further information.
These slides provide highlights of my book HDInsight Essentials. Book link is here: http://www.packtpub.com/establish-a-big-data-solution-using-hdinsight/book
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign... - Michael Rys
Presentation by James Baker and myself on running cost-effective big data workloads with Azure Synapse and Azure Data Lake Storage (ADLS) at Microsoft Ignite 2020. It covers the modern data warehouse architecture supported by Azure Synapse, integration benefits with ADLS, and cost-reducing features such as Query Acceleration, integrated Spark and SQL processing with shared metadata, and .NET for Apache Spark support.
Big Data is the reality of modern business: from big companies to small ones, everybody is trying to find their own benefit. Big Data technologies are not meant to replace traditional ones, but to be complementary to them. In this presentation you will hear what is Big Data and Data Lake and what are the most popular technologies used in Big Data world. We will also speak about Hadoop and Spark, and how they integrate with traditional systems and their benefits.
Bloor Research & DataStax: How graph databases solve previously unsolvable bu... - DataStax
This webinar covered graph databases and how they can solve problems that were previously difficult for traditional databases. It included presentations on why graph databases are useful, common use cases like recommendations and network analysis, different types of graph databases, and a demonstration of the DataStax Enterprise graph database. There was also a question and answer session where attendees could ask about graph databases and DataStax Enterprise graph.
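To make the use cases concrete, here is a small recommendation-style traversal written with gremlinpython, which works against Gremlin-enabled graph stores such as DataStax Enterprise Graph; the endpoint, vertex labels, and edge names are invented for the sketch.

```python
# "Customers who bought what Alice bought also bought..." as a graph traversal.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection("ws://localhost:8182/gremlin", "g")  # hypothetical endpoint
g = traversal().withRemote(conn)

recommendations = (g.V().has("person", "name", "alice")
                    .out("bought")    # products Alice bought
                    .in_("bought")    # other people who bought those products
                    .out("bought")    # what else those people bought
                    .dedup()
                    .values("name")
                    .toList())
print(recommendations)
conn.close()
```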
The document discusses how managing data is key to unlocking value from the Internet of Things. It emphasizes that variety, not size, is most important with big data. Example use cases mentioned include predictive maintenance, search and root cause analysis. The technology landscape is changing with new architectures like data lakes and new patterns such as event histories and timelines. Managing data is also changing with schema on read, loosely coupled schemas, and increased importance of metadata. The document concludes that data management patterns and practices are foundational to effective analytics with IoT data.
Dev Lakhani, Data Scientist at Batch Insights "Real Time Big Data Applicatio... - Dataconomy Media
Dev Lakhani, Data Scientist at Batch Insights talks on "Real Time Big Data Applications for Investment Banks and Financial Institutions" at the first Big Data Frankfurt event that took place at Die Zentrale, organised by Dataconomy Media
Big Data: Architecture and Performance Considerations in Logical Data Lakes - Denodo
This presentation explains in detail what a Data Lake Architecture looks like, how data virtualization fits into the Logical Data Lake, and goes over some performance tips. It also includes an example demonstrating this model's performance.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/9Jwfu6.
Short introduction to different options for ETL & ELT in the Cloud with Microsoft Azure. This is a small accompanying set of slides for my presentations and blogs on this topic
Big Data Analytics in the Cloud with Microsoft Azure - Mark Kromer
The document discusses Big Data Analytics in the Cloud using Microsoft Azure services. Key points include:
1) Azure provides tools for collecting, processing, analyzing and visualizing big data including Azure Data Lake, HDInsight, Data Factory, Machine Learning, and Power BI. These services can be used to build solutions for common big data use cases and architectures.
2) U-SQL is a language for preparing, transforming and analyzing data that allows users to focus on the what rather than the how of problems. It uses SQL and C# and can operate on structured and unstructured data.
3) Visual Studio provides an integrated environment for authoring, debugging, and monitoring U-SQL scripts and jobs.
This is a 200-level run-through of the Microsoft Azure Big Data Analytics for the Cloud data platform, based on the Cortana Intelligence Suite offerings.
Microsoft Enterprise Cube is a business performance management solution that helps telecommunications service providers integrate their disparate subscriber data sources to gain insights. It provides a single view of subscriber usage across systems to identify high-value subscribers, underutilized services, and opportunities to improve loyalty. The solution uses familiar Microsoft technologies like SQL Server, SharePoint and Office to deliver customizable reports and analytics at a low total cost of ownership. It supports compliance needs and scales to accommodate growing data storage requirements of service providers.
Microsoft Cloud BI Update 2012 for SQL Saturday Philly - Mark Kromer
This document provides an overview and update of Microsoft's Cloud Business Intelligence (BI) solutions in version 3.0 from June 2012. It discusses the objectives of Cloud BI including providing data access and answers to business questions anytime from mobile devices. An overview of the session covers Windows Azure, SQL Azure, SQL Azure Reporting Services, mobile BI delivery, cloud data integration, data mining in the cloud, and hybrid scenarios. Key features of SQL Azure like import/export, data-tier applications, data sync, and federations for database scale-out are also summarized.
What's new in SQL Server 2012 for philly code camp 2012.1 - Mark Kromer
A high-level run through the SQL Server roadmap focused on the new technologies and features of SQL Server 2012. Mark Kromer presented this deck to the Philly .NET Code Camp at Penn State Abington on May 12, 2012.
Microsoft Event Registration System Hosted on Windows Azure - Mark Kromer
This document describes a Windows Azure event registration app built in 2 weeks by 1 developer. It allows interactive check-in for live events on Windows 8 slates and mobile devices. It uses SQL Azure databases to store registration data and Windows Azure storage for photo sharing. The app provides check-in, photo viewing, and social media integration across Windows 8, Windows 7, and Windows Phone 7 platforms.
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server - Mark Kromer
These are my slides from May 2013 Philly Code Camp at Penn State Abington. I will post the samples, code and scripts on my blog here following the event this Saturday: http://www.kromerbigdata.com
This document discusses big data and SQL Server. It covers what big data is, the Hadoop environment, big data analytics, and how SQL Server fits into the big data world. It describes using Sqoop to load data between Hadoop and SQL Server, and SQL Server features for big data analytics like columnstore and PolyBase. The document concludes that a big data analytics approach is needed for massive, variable data, and that SQL Server 2012 supports this with features like columnstore and tabular SSAS.
This document discusses big data solutions and analytics. It defines big data in terms of volume, velocity, and variety of data. It contrasts big data analytics with traditional business intelligence, noting that big data looks for untapped insights rather than dashboards. It also provides examples of scalable big data platform architectures and advanced analytics capabilities. Finally, it outlines Anexinet's big data offerings including strategy, starter solutions, projects, and partnerships.
This file is intended to help with understanding and using MDX (Multi Dimensional eXpressions), the query language used for BI (Business Intelligence) modeling and querying. Although the material was written for SQL Server 2000, it remains a good resource for understanding MDX.
Microsoft SQL Server Data Warehouses for SQL Server DBAs - Mark Kromer
The document discusses Microsoft SQL Server data warehousing solutions. It provides an agenda for a presentation that includes an overview of Microsoft's data warehousing offerings, how to establish baseline metrics for Fast Track reference configurations, and how to design balanced server and storage configurations for data warehousing workloads. It also discusses software and hardware best practices, such as data striping and storage configuration recommendations. Overall, the document outlines topics and solutions to help customers accelerate their data warehouse deployments using Microsoft SQL Server.
This document compares cloud platforms Amazon Web Services (AWS) and Microsoft Azure. It finds that AWS is more oriented toward infrastructure as a service (IaaS) while Azure is more platform as a service (PaaS) oriented, though both platforms offer services across IaaS and PaaS. The document also compares specific cloud storage, databases, networking, deployment, middleware, tools and high availability/disaster recovery features between AWS and Azure.
A brief comparison of two cloud platforms, AWS and Azure. Compare Microsoft Azure services, pricing, customers, and more with Amazon AWS through these slides.
Advanced Reporting and ETL for MongoDB: Easily Build a 360-Degree View of You... - MongoDB
The document discusses Pentaho's analytics and ETL solutions for MongoDB. It provides an overview of Pentaho Company and its platform for unified business analytics and data integration. It then outlines how Pentaho can be used to build a 360-degree view of customers by extracting, transforming and loading data from source systems into MongoDB and performing analytics and reporting on the MongoDB data. It demonstrates these capabilities with examples and screenshots.
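The reporting side of that 360-degree view is easy to picture with a plain MongoDB aggregation; the sketch below uses pymongo with an invented connection string and order schema, and is not Pentaho's actual implementation.

```python
# Per-customer rollup over an orders collection: the kind of summary a
# 360-degree customer report is built on.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # hypothetical connection
orders = client["crm"]["orders"]

pipeline = [
    {"$group": {
        "_id": "$customer_id",
        "order_count": {"$sum": 1},
        "lifetime_value": {"$sum": "$order_total"},
        "last_order": {"$max": "$order_date"},
    }},
    {"$sort": {"lifetime_value": -1}},
    {"$limit": 20},
]
for customer in orders.aggregate(pipeline):
    print(customer)
```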
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data - Pentaho
This document discusses a project between Pentaho and Verizon to leverage big data analytics. Verizon generates vast amounts of call detail record (CDR) data from mobile networks that is currently stored in a data warehouse for 2 years and then archived to tape. Pentaho's platform will help optimize the data warehouse by using Hadoop to store all CDR data history. This will free up data warehouse capacity for high value data and allow analysis of the full 10 years of CDR data. Pentaho tools will ingest raw CDR data into Hadoop, execute MapReduce jobs to enrich the data, load results into Hive, and enable analyzing the data to understand calling patterns by geography over time.
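Once the enriched CDRs sit in Hive, the "calling patterns by geography over time" analysis boils down to a grouped query. Here is a hedged sketch in PySpark with Hive support enabled; the table and column names are assumptions, not details from the webinar.

```python
# Monthly call volume and minutes by region over the enriched CDR history.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("cdr-analysis")
         .enableHiveSupport()
         .getOrCreate())

patterns = spark.sql("""
    SELECT region,
           date_format(call_start, 'yyyy-MM') AS month,
           COUNT(*)                           AS calls,
           SUM(duration_seconds) / 60.0       AS total_minutes
    FROM   cdr_enriched
    GROUP  BY region, date_format(call_start, 'yyyy-MM')
    ORDER  BY region, month
""")
patterns.show()
```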
The document discusses Pentaho's business intelligence (BI) platform for big data analytics. It describes Pentaho as providing a modern, unified platform for data integration and analytics that allows for native integration into the big data ecosystem. It highlights Pentaho's open source development model and that it has over 1,000 commercial customers and 10,000 production deployments. Several use cases are presented that demonstrate how Pentaho helps customers unlock value from big data stores.
MongoDB IoT City Tour EINDHOVEN: Analysing the Internet of Things: Davy Nys, ... - MongoDB
Drawing on Pentaho's wide experience in solving customers' big data issues, Davy Nys will position the importance of analytics in the IoT:
- Understanding the challenges behind data integration & analytics for IoT
- Future proofing your information architecture for IoT
- Delivering IoT analytics, now and tomorrow
- Real customer examples of where Pentaho can help
MongoDB IoT City Tour STUTTGART: Analysing the Internet of Things. By Pentaho - MongoDB
Dominik Claßen, Sales Engineering Team Lead at Pentaho
Drawing on Pentaho's wide experience in solving customers' big data issues, Dominik positions the importance of analytics in the IoT.
- Understanding the challenges behind data integration & analytics for IoT
- Future proofing your information architecture for IoT
- Delivering IoT analytics, now and tomorrow
- Real customer examples of where Pentaho can help
MongoDB IoT City Tour LONDON: Analysing the Internet of Things: Davy Nys, Pen... - MongoDB
1) The document discusses Pentaho's beliefs around Internet of Things (IoT) analytics, including applying the right data source and processing for different analytics needs, gaining insights by blending multiple data sources on demand, and planning for agility, flexibility and near real-time analytics.
2) It describes how emerging big data use cases demand blending different data sources and provides examples like improving operations and customer experience.
3) The document advocates an Extract-Transform-Report approach for IoT analytics that provides flexibility to integrate diverse data sources and enables real-time insights.
Big Data has been a "buzz word" for a few years now, and it's generated a fair amount of hype. But, while the technology landscape is still evolving, product companies in the software, web, and hardware areas have actually led the way in delivering real value from data sources like weblogs, sensors, and social media as well as systems like Hadoop, NoSQL, and Analytical Databases. These organizations have built "Big Data Apps" that leverage fast, flexible data frameworks to solve a wide array of user problems, scale to massive audiences, and deliver superior predictive intelligence.
Join this webinar to learn why product managers should understand Big Data and hear about real-life products that have been elevated with these innovative technologies. You will hear from:
- Ben Hopkins, Product Marketing Manager at Pentaho, who will discuss what Big Data means for product strategy and why it represents a new toolset for product teams to meet user needs and build competitive advantage
- Jim Stascavage, VP of Engineering at ESRG, who will discuss how his company has innovated with Big Data and predictive analytics to deliver technology products that optimize fuel consumption and maintenance cycles in the maritime and heavy industry sectors, leveraging trillions of sensor data points a year.
Who Should Attend
Product Managers, Product Marketing Managers, Project Managers, Development Managers, Product Executives, and anyone responsible for addressing customer needs & influencing product strategy.
How advanced analytics is impacting the banking sector - Michael Haddad
The document discusses how advanced analytics is impacting the banking sector. It covers topics like regulatory changes forcing banks to invest in compliance; new digital technologies changing how customers interact with banks; and data analytics helping banks reduce risk, deliver personalized services, and retain skills. It also discusses Hitachi Data Systems' acquisition of Pentaho and how their combined platform can provide unified data integration and business analytics across structured, unstructured, and streaming data sources.
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI - Denodo
Watch full webinar here: https://bit.ly/3zVJRRf
According to Dresner Advisory’s 2020 Self-Service Business Intelligence Market Study, 62% of the responding organizations say self-service BI is critical for their business. If we look deeper into the need for today’s self-service BI, it’s beyond some Executives and Business Users being enabled by IT for self-service dashboarding or report generation. Predictive analytics, self-service data preparation, collaborative data exploration are all different facets of new generation self-service BI. While democratization of data for self-service BI holds many benefits, strict data governance becomes increasingly important alongside.
In this session we will discuss:
- The latest trends and scopes of self-service BI
- The role of logical data fabric in self-service BI
- How Denodo enables self-service BI for a wide range of users
- Customer case study on self-service BI
Open Analytics 2014 - Pedro Alves - Innovation through Open Source - OpenAnalytics Spain
Delivering the Future of Analytics: Innovation through Open Source. Pentaho was born out of the desire to achieve positive, disruptive change in the business analytics market, dominated by bureaucratic megavendors offering expensive heavy-weight products built on outdated technology platforms. Pentaho's open, embeddable data integration and analytics platform was developed with a strong open source heritage. This gave Pentaho a first-mover advantage to engage early with adopters of big data technologies and solve the difficult challenges of integrating both established and emerging data types to drive analytics. Continued technology innovations to support the big data ecosystem have kept customers ahead of the big data curve. With the ability to drastically reduce the time to design, develop and deploy big data solutions, Pentaho counts numerous big data customers, both large and small, across the financial services, retail, travel, healthcare and government industries around the world.
The document outlines Pentaho's roadmap and focus areas for business analytics products. It discusses enhancements planned for Pentaho Business Analytics 5.1, including new features for analyzing MongoDB data and improved visualizations. It also summarizes R&D activities like integrating real-time data processing with Storm and Spark. The roadmap focuses on hardening the Pentaho platform for large enterprises, extending capabilities for big data engineering and analytics, and improving embedded analytics.
Check out this presentation from Pentaho and ESRG to learn why product managers should understand Big Data and hear about real-life products that have been elevated with these innovative technologies.
Learn more in the brief that inspired the presentation, Product Innovation with Big Data: http://www.pentaho.com/resources/whitepaper/product-innovation-big-data
BAR360 open data platform presentation at DAMA, Sydney - Sai Paravastu
Sai Paravastu discusses the benefits of using an open data platform (ODP) for enterprises. The ODP would provide a standardized core of open source Hadoop technologies like HDFS, YARN, and MapReduce. This would allow big data solution providers to build compatible solutions on a common platform, reducing costs and improving interoperability. The ODP would also simplify integration for customers and reduce fragmentation in the industry by coordinating development efforts.
Customer Intelligence_ Harnessing Elephants at Transamerica Presentation (1) - Vishal Bamba
Transamerica is an insurance company with two business units focused on investments/retirement and life/protection. They have a rich data environment across systems but lack an enterprise view. Their POC used Cloudera, Informatica, and Datameer on a 10 node Hadoop cluster to integrate data from various sources, perform data quality/cleansing, create customer profiles, and run analytics/visualizations to power use cases like prospect scoring. Key lessons included investing in a POC, partnering with vendors, establishing governance over managed/curated data, and aligning with larger strategies.
This document discusses Saxo Bank's plans to implement a data governance solution called the Data Workbench. The Data Workbench will consist of a Data Catalogue and Data Quality Solution to provide transparency into Saxo's data ecosystem and improve data quality. The Data Catalogue will be built using LinkedIn's open source DataHub tool, which provides a metadata search and UI. The Data Quality Solution will use Great Expectations to define and monitor data quality rules. The document discusses why a decentralized, domain-driven approach is needed rather than a centralized solution, and how the Data Workbench aims to establish governance while staying lean and iterative.
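For a flavour of what defining data quality rules with Great Expectations looks like, here is a minimal sketch using the library's older pandas-dataset style; the API has changed considerably across releases, and the column names are invented, so treat this as illustrative only.

```python
# Two simple expectations over a toy trades dataset (older Great Expectations API style).
import pandas as pd
import great_expectations as ge

trades = ge.from_pandas(pd.DataFrame({
    "trade_id": [101, 102, None],
    "price": [10.5, -1.0, 3.2],
}))

trades.expect_column_values_to_not_be_null("trade_id")
trades.expect_column_values_to_be_between("price", min_value=0)

result = trades.validate()
print(result.success)   # False here: one missing id and one negative price
```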
Enabling Next Gen Analytics with Azure Data Lake and StreamSets - Streamsets Inc.
This document discusses enabling next generation analytics with Azure Data Lake. It provides definitions of big data and discusses how big data is a cornerstone of Cortana Intelligence. It also discusses challenges with big data like obtaining skills and determining value. The document then discusses Azure HDInsight and how it provides a cloud Spark and Hadoop service. It also discusses StreamSets and how it can be used for data movement, deployed on an Azure VM or a local machine. Finally, it discusses a use case of StreamSets at a major bank to move data from on-premises systems to Azure Data Lake and consolidate migration tools.
When and How Data Lakes Fit into a Modern Data Architecture - DATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Data & Analytics with CIS & Microsoft Platforms - Sonata Software
Sonata Software provides data and analytics services using Microsoft platforms and technologies. They help customers leverage data to drive intelligent actions and personalization at scale. Sonata has expertise in data warehousing, business analytics, AI, machine learning, and developing industry-specific analytics solutions and AI accelerators on the Microsoft stack. They assist customers with data strategy, analytics, visualization, and migrating to Azure-based platforms.
Pentaho 7.0 aims to bridge the gap between data preparation and analytics by allowing analytics from anywhere in the data pipeline. It brings analytics into data prep workflows, enables sharing analytics during prep, and improves reporting. It also provides enhanced support for big data technologies like Spark, Hadoop security, and metadata injection to automate data onboarding. A demo shows the ability to visually inspect data during prep to identify issues. Analysts say this allows more collaboration between business and IT and accelerates insights.
Fabric Data Factory Pipeline Copy Perf Tips.pptx - Mark Kromer
This document provides performance tips for pipelines and copy activities in Azure Data Factory (ADF). It discusses:
- Using pipelines for data orchestration with conditional execution and parallel activities.
- The Copy activity provides massive-scale data movement within pipelines. Using Copy for ELT can land data quickly into a data lake.
- Gaining more throughput by using multiple parallel Copy activities but this can overload the source.
- Optimizing copy performance by using binary format, file lists/folders instead of individual files, and SQL source partitioning.
- Metrics showing copying Parquet files to a lakehouse at 5.1 GB/s while CSV and SQL loads were slower due to transformation.
Build data quality rules and data cleansing into your data pipelines - Mark Kromer
This document provides guidance on building data quality rules and data cleansing into data pipelines. It discusses considerations for data quality in data warehouse and data science scenarios, including verifying data types and lengths, handling null values, domain value constraints, and reference data lookups. It also provides examples of techniques for replacing values, splitting data based on values, data profiling, pattern matching, enumerations/lookups, de-duplicating data, fuzzy joins, validating metadata rules, and using assertions.
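The checks listed above translate naturally into Spark DataFrame operations, which is the kind of engine these pipelines typically run on. A minimal PySpark sketch follows, with the file paths, column names, and reference table assumed for illustration.

```python
# Data-quality rules expressed as DataFrame transformations.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-rules").getOrCreate()
orders = spark.read.parquet("/lake/raw/orders")           # hypothetical input
countries = spark.read.parquet("/lake/ref/countries")     # reference data for lookups

cleaned = (orders
    # Null handling: default missing quantities, drop rows with no business key.
    .fillna({"quantity": 0})
    .dropna(subset=["order_id"])
    # Domain value constraint: status must come from a known set.
    .filter(F.col("status").isin("NEW", "SHIPPED", "CANCELLED"))
    # Type and length checks: cast to the target schema, trim to two characters.
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .withColumn("country_code", F.upper(F.substring("country_code", 1, 2)))
    # De-duplicate on the business key.
    .dropDuplicates(["order_id"]))

# Reference-data lookup: split valid rows from rejects instead of dropping them.
joined = cleaned.join(countries, "country_code", "left")
valid = joined.filter(F.col("country_name").isNotNull())
rejects = joined.filter(F.col("country_name").isNull())
```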
Mapping Data Flows Training deck Q1 CY22 - Mark Kromer
Mapping data flows allow for code-free data transformation at scale using an Apache Spark engine within Azure Data Factory. Key points:
- Mapping data flows can handle structured and unstructured data using an intuitive visual interface without needing to know Spark, Scala, Python, etc.
- The data flow designer builds a transformation script that is executed on a JIT Spark cluster within ADF. This allows for scaled-out, serverless data transformation.
- Common uses of mapping data flows include ETL scenarios like slowly changing dimensions, analytics tasks like data profiling, cleansing, and aggregations.
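Under the covers, a simple data flow (source, derived column, aggregate, sink) compiles to ordinary Spark work. The PySpark below is only an illustration of that underlying work, with file paths and column names invented; in ADF the equivalent is designed visually and the Spark execution is generated for you.

```python
# What a basic mapping data flow amounts to on the Spark engine.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dataflow-equivalent").getOrCreate()

sales = spark.read.csv("/lake/raw/sales.csv", header=True, inferSchema=True)    # source
enriched = sales.withColumn("net_amount", F.col("amount") - F.col("discount"))  # derived column
summary = (enriched.groupBy("region")                                            # aggregate
           .agg(F.sum("net_amount").alias("net_sales"),
                F.countDistinct("customer_id").alias("customers")))
summary.write.mode("overwrite").parquet("/lake/curated/sales_by_region")         # sink
```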
Data cleansing and prep with synapse data flows - Mark Kromer
This document provides resources for data cleansing and preparation using Azure Synapse Analytics Data Flows. It includes links to videos, documentation, and a slide deck that explain how to use Data Flows for tasks like deduplicating null values, saving data profiler summary statistics, and using metadata functions. A GitHub link shares a tutorial document for a hands-on learning experience with Synapse Data Flows.
Data cleansing and data prep with synapse data flows - Mark Kromer
This document contains links to resources about using Azure Synapse Analytics for data cleansing and preparation with Data Flows. It includes links to videos and documentation about removing null values, saving data profiler summary statistics, and using metadata functions in Azure Data Factory data flows.
Mapping Data Flows Perf Tuning April 2021 - Mark Kromer
This document discusses optimizing performance for data flows in Azure Data Factory. It provides sample timing results for various scenarios and recommends settings to improve performance. Some best practices include using memory optimized Azure integration runtimes, maintaining current partitioning, scaling virtual cores, and optimizing transformations and sources/sinks. The document also covers monitoring flows to identify bottlenecks and global settings that affect performance.
This document discusses using Azure Data Factory (ADF) for data lake ETL processes in the cloud. It describes how ADF can ingest data from on-premises, cloud, and SaaS sources into a data lake for preparation, transformation, enrichment, and serving to downstream analytics or machine learning processes. The document also provides several links to YouTube videos and articles about using ADF for these tasks.
Azure Data Factory Data Wrangling with Power Query - Mark Kromer
Azure Data Factory now allows users to perform data wrangling tasks through Power Query activities, translating M scripts into ADF data flow scripts executed on Apache Spark. This enables code-free data exploration, preparation, and operationalization of Power Query workflows within ADF pipelines. Examples of use cases include data engineers building ETL processes or analysts operationalizing existing queries to prepare data for modeling, with the goal of providing a data-first approach to building data flows and pipelines in ADF.
Azure Data Factory Data Flow Performance Tuning 101 - Mark Kromer
The document provides performance timing results and recommendations for optimizing Azure Data Factory data flows. Sample 1 processed a 421MB file with 887k rows in 4 minutes using default partitioning on an 80-core Azure IR. Sample 2 processed a table with the same size and transforms in 3 minutes using source and derived column partitioning. Sample 3 processed the same size file in 2 minutes with default partitioning. The document recommends partitioning strategies, using memory optimized clusters, and scaling cores to improve performance.
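Partitioning is the main lever behind numbers like these. A rough PySpark analogue of the "repartition on a key versus keep the current partitioning" trade-off follows; the paths and partition count are illustrative assumptions only.

```python
# Repartition only when a downstream wide operation benefits; an unnecessary
# shuffle usually costs more than it saves.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioning-demo").getOrCreate()
events = spark.read.parquet("/lake/raw/events")

print(events.rdd.getNumPartitions())    # partitioning inherited from the source

by_customer = events.repartition(80, "customer_id")
(by_customer.groupBy("customer_id").count()
 .write.mode("overwrite").parquet("/lake/out/counts_by_customer"))
```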
Azure Data Factory Data Flows Training (Sept 2020 Update) - Mark Kromer
Mapping data flows allow for code-free data transformation using an intuitive visual interface. They provide resilient data flows that can handle structured and unstructured data using an Apache Spark engine. Mapping data flows can be used for common tasks like data cleansing, validation, aggregation, and fact loading into a data warehouse. They allow transforming data at scale through an expressive language without needing to know Spark, Scala, Python, or manage clusters.
Data quality patterns in the cloud with ADF - Mark Kromer
Azure Data Factory can be used to build modern data warehouse patterns with Azure SQL Data Warehouse. It allows extracting and transforming relational data from databases and loading it into Azure SQL Data Warehouse tables optimized for analytics. Data flows in Azure Data Factory can also clean and join disparate data from Azure Storage, Data Lake Store, and other data sources for loading into the data warehouse. This provides simple and productive ETL capabilities in the cloud at any scale.
Azure Data Factory Data Flows Training v005 - Mark Kromer
Mapping Data Flow is a new feature of Azure Data Factory that allows building data transformations in a visual interface without code. It provides a serverless, scale-out transformation engine for processing big data with unstructured requirements. Mapping Data Flows can be authored and designed visually, with transformations, expressions, and results previews, and then operationalized with Data Factory scheduling, monitoring, and control flow.
Data Quality Patterns in the Cloud with Azure Data Factory - Mark Kromer
This document discusses data quality patterns when using Azure Data Factory (ADF). It presents two modern data warehouse patterns that use ADF for orchestration: one using traditional ADF activities and another leveraging ADF mapping data flows. It also provides links to additional resources on ADF data flows, data quality patterns, expressions, performance, and connectors.
Azure Data Factory can now use Mapping Data Flows to orchestrate ETL workloads. Mapping Data Flows allow users to visually design transformations on data from disparate sources and load the results into Azure SQL Data Warehouse for analytics. The key benefits of Mapping Data Flows are that they provide a visual interface for building expressions to cleanse and join data with auto-complete assistance and live previews of expression results.
Mapping Data Flow is a new feature of Azure Data Factory that allows users to build data transformations in a visual interface without code. It provides a serverless, scale-out transformation engine for processing big data with unstructured requirements. Mapping Data Flows can be operationalized with Data Factory's scheduling, control flow, and monitoring capabilities.
ADF Mapping Data Flows Training Slides V1 - Mark Kromer
Mapping Data Flow is a new feature of Azure Data Factory that allows users to build data transformations in a visual interface without code. It provides a serverless, scale-out transformation engine to transform data at scale in the cloud in a resilient manner for big data scenarios involving unstructured data. Mapping Data Flows can be operationalized with Azure Data Factory's scheduling, control flow, and monitoring capabilities.
Azure Data Factory ETL Patterns in the Cloud - Mark Kromer
This document discusses ETL patterns in the cloud using Azure Data Factory. It covers topics like ETL vs ELT, the importance of scale and flexible schemas in cloud ETL, and how Azure Data Factory supports workflows, templates, and integration with on-premises and cloud data. It also provides examples of nightly ETL data flows, handling schema drift, loading dimensional models, and data science scenarios using Azure data services.
This is the keynote of the Into the Box conference, highlighting the release of the BoxLang JVM language, its key enhancements, and its vision for the future.
Big Data Analytics Quick Research Guide by Arthur Morgan - Arthur Morgan
This is a Quick Research Guide (QRG).
QRGs include the following:
- A brief, high-level overview of the QRG topic.
- A milestone timeline for the QRG topic.
- Links to various free online resource materials to provide a deeper dive into the QRG topic.
- Conclusion and a recommendation for at least two books available in the SJPL system on the QRG topic.
QRGs planned for the series:
- Artificial Intelligence QRG
- Quantum Computing QRG
- Big Data Analytics QRG
- Spacecraft Guidance, Navigation & Control QRG (coming 2026)
- UK Home Computing & The Birth of ARM QRG (coming 2027)
Any questions or comments?
- Please contact Arthur Morgan at [email protected].
100% human made.
AI and Data Privacy in 2025: Global Trends - InData Labs
In this infographic, we explore how businesses can implement effective governance frameworks to address AI data privacy. Understanding it is crucial for developing effective strategies that ensure compliance, safeguard customer trust, and leverage AI responsibly. Equip yourself with insights that can drive informed decision-making and position your organization for success in the future of data privacy.
This infographic contains:
- AI and data privacy: Key findings
- Statistics on AI data privacy in today's world
- Tips on how to overcome data privacy challenges
- Benefits of AI data security investments.
Keep up-to-date on how AI is reshaping privacy standards and what this entails for both individuals and organizations.
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker... - TrustArc
Most consumers believe they’re making informed decisions about their personal data—adjusting privacy settings, blocking trackers, and opting out where they can. However, our new research reveals that while awareness is high, taking meaningful action is still lacking. On the corporate side, many organizations report strong policies for managing third-party data and consumer consent yet fall short when it comes to consistency, accountability and transparency.
This session will explore the research findings from TrustArc’s Privacy Pulse Survey, examining consumer attitudes toward personal data collection and practical suggestions for corporate practices around purchasing third-party data.
Attendees will learn:
- Consumer awareness around data brokers and what consumers are doing to limit data collection
- How businesses assess third-party vendors and their consent management operations
- Where business preparedness needs improvement
- What these trends mean for the future of privacy governance and public trust
This discussion is essential for privacy, risk, and compliance professionals who want to ground their strategies in current data and prepare for what’s next in the privacy landscape.
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I... - Impelsys Inc.
Impelsys provided a robust testing solution, leveraging a risk-based and requirement-mapped approach to validate ICU Connect and CritiXpert. A well-defined test suite was developed to assess data communication, clinical data collection, transformation, and visualization across integrated devices.
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On... - Aqusag Technologies
In late April 2025, a significant portion of Europe, particularly Spain, Portugal, and parts of southern France, experienced widespread, rolling power outages that continue to affect millions of residents, businesses, and infrastructure systems.
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ... - SOFTTECHHUB
I started my online journey with several hosting services before stumbling upon Ai EngineHost. At first, the idea of paying one fee and getting lifetime access seemed too good to pass up. The platform is built on reliable US-based servers, ensuring your projects run at high speeds and remain safe. Let me take you step by step through its benefits and features as I explain why this hosting solution is a perfect fit for digital entrepreneurs.
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...Alan Dix
Talk at the final event of Data Fusion Dynamics: A Collaborative UK-Saudi Initiative in Cybersecurity and Artificial Intelligence funded by the British Council UK-Saudi Challenge Fund 2024, Cardiff Metropolitan University, 29th April 2025
https://alandix.com/academic/talks/CMet2025-AI-Changes-Everything/
Is AI just another technology, or does it fundamentally change the way we live and think?
Every technology has a direct impact with micro-ethical consequences, some good, some bad. However, more profound are the ways in which some technologies reshape the very fabric of society with macro-ethical impacts. The invention of the stirrup revolutionised mounted combat, but as a side effect gave rise to the feudal system, which still shapes politics today. The internal combustion engine offers personal freedom and creates pollution, but has also transformed the nature of urban planning and international trade. When we look at AI, the micro-ethical issues, such as bias, are most obvious, but the macro-ethical challenges may be greater.
At a micro-ethical level, AI has the potential to deepen social, ethnic and gender bias, issues I have warned about since the early 1990s! It is also being used increasingly on the battlefield. However, it also offers amazing opportunities in health and education, as the recent Nobel prizes for the developers of AlphaFold illustrate. More radically, the need to encode ethics acts as a mirror to surface essential ethical problems and conflicts.
At the macro-ethical level, by the early 2000s digital technology had already begun to undermine sovereignty (e.g. gambling), market economics (through network effects and emergent monopolies), and the very meaning of money. Modern AI is the child of big data, big computation and ultimately big business, intensifying the inherent tendency of digital technology to concentrate power. AI is already unravelling the fundamentals of the social, political and economic world around us, but this is a world that needs radical reimagining to overcome the global environmental and human challenges that confront us. Our challenge is whether to let the threads fall as they may, or to use them to weave a better future.
Procurement Insights Cost To Value Guide.pptxJon Hansen
Procurement Insights has integrated its Historic Procurement Industry Archives and serves as a powerful complement, not a competitor, to other procurement industry firms. It fills critical gaps in depth, agility, and contextual insight that most traditional analyst and association models overlook.
Learn more about this value-driven proprietary service offering here.
Semantic Cultivators: The Critical Future Role to Enable AIartmondano
By 2026, AI agents will consume 10x more enterprise data than humans, but with none of the contextual understanding that prevents catastrophic misinterpretations.
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025BookNet Canada
Book industry standards are evolving rapidly. In the first part of this session, we’ll share an overview of key developments from 2024 and the early months of 2025. Then, BookNet’s resident standards expert, Tom Richardson, and CEO, Lauren Stewart, have a forward-looking conversation about what’s next.
Link to recording, transcript, and accompanying resource: https://bnctechforum.ca/sessions/standardsgoals-for-2025-standards-certification-roundup/
Presented by BookNet Canada on May 6, 2025 with support from the Department of Canadian Heritage.
How Can I use the AI Hype in my Business Context?Daniel Lehner
Is AI just hype? Or is it the game changer your business needs?
Everyone’s talking about AI, but is anyone really using it to create real value?
Most companies want to leverage AI. Few know how.
✅ What exactly should you ask to find real AI opportunities?
✅ Which AI techniques actually fit your business?
✅ Is your data even ready for AI?
If you’re not sure, you’re not alone. This is a condensed version of the slides I presented at a LinkedIn webinar for Tecnovy on 28.04.2025.
Generative Artificial Intelligence (GenAI) in BusinessDr. Tathagat Varma
My talk for the Indian School of Business (ISB) Emerging Leaders Program Cohort 9. In this talk, I discussed key issues around the adoption of GenAI in business: benefits, opportunities, and limitations. I also discussed how my research on the Theory of Cognitive Chasms helps address some of these issues.
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPathCommunity
Join this UiPath Community Berlin meetup to explore the Orchestrator API, Swagger interface, and the Test Manager API. Learn how to leverage these tools to streamline automation, enhance testing, and integrate more efficiently with UiPath. Perfect for developers, testers, and automation enthusiasts!
📕 Agenda
Welcome & Introductions
Orchestrator API Overview
Exploring the Swagger Interface
Test Manager API Highlights
Streamlining Automation & Testing with APIs (Demo)
Q&A and Open Discussion
👉 Join our UiPath Community Berlin chapter: https://community.uipath.com/berlin/
This session streamed live on April 29, 2025, 18:00 CET.
Check out all our upcoming UiPath Community sessions at https://community.uipath.com/events/.
DevOpsDays Atlanta 2025 - Building 10x Development Organizations.pptxJustin Reock
Building 10x Organizations with Modern Productivity Metrics
10x developers may be a myth, but 10x organizations are very real, as proven by the influential study performed in the 1980s, ‘The Coding War Games.’
Right now, here in early 2025, we seem to be experiencing YAPP (Yet Another Productivity Philosophy), and that philosophy is converging on developer experience. It seems that with every new method we invent for the delivery of products, whether physical or virtual, we reinvent productivity philosophies to go alongside them.
But which of these approaches actually work? DORA? SPACE? DevEx? What should we invest in and create urgency behind today, so that we don’t find ourselves having the same discussion again in a decade?
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveScyllaDB
Want to learn practical tips for designing systems that can scale efficiently without compromising speed?
Join us for a workshop where we’ll address these challenges head-on and explore how to architect low-latency systems using Rust. During this free interactive workshop for developers, engineers, and architects, we’ll cover how Rust’s unique language features and the Tokio async runtime enable high-performance application development.
As you explore key principles of designing low-latency systems with Rust, you will learn how to:
- Create and compile a real-world app with Rust
- Connect the application to ScyllaDB (NoSQL data store)
- Negotiate tradeoffs related to data modeling and querying
- Manage and monitor the database for consistently low latencies
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersToradex
Toradex brings robust Linux support to SMARC (Smart Mobility Architecture), ensuring high performance and long-term reliability for embedded applications. Here’s how:
• Optimized Torizon OS & Yocto Support – Toradex provides Torizon OS, a Debian-based easy-to-use platform, and Yocto BSPs for customized Linux images on SMARC modules.
• Seamless Integration with i.MX 8M Plus and i.MX 95 – Toradex SMARC solutions leverage NXP’s i.MX 8M Plus and i.MX 95 SoCs, delivering power efficiency and AI-ready performance.
• Secure and Reliable – With Secure Boot, over-the-air (OTA) updates, and LTS kernel support, Toradex ensures industrial-grade security and longevity.
• Containerized Workflows for AI & IoT – Support for Docker, ROS, and real-time Linux enables scalable AI, ML, and IoT applications.
• Strong Ecosystem & Developer Support – Toradex offers comprehensive documentation, developer tools, and dedicated support, accelerating time-to-market.
With Toradex’s Linux support for SMARC, developers get a scalable, secure, and high-performance solution for industrial, medical, and AI-driven applications.
Do you have a specific project or application in mind where you're considering SMARC? We can help with a free compatibility check and support a quick time-to-market.
For more information: https://www.toradex.com/computer-on-modules/smarc-arm-family
#3: Pentaho 5.0 reinforces Pentaho’s mission of delivering the future of analytics. Pentaho has continued to invest in BI and DI together, with over 100 new features in PDI and over 250 in the platform overall.
Continued investments in big data include new integrations, specifically with MongoDB and Cassandra, and continue to shield customers from changes in the market.
An open core and pluggable platform allows Pentaho to innovate quickly.
Pentaho is battle-tested, with over 1,200 commercial customers.
#4: Icons are nice and the build-order is great!
My suggestion for the top 3 icons on the left-hand side:
Customer
Provisioning
Billing
Suggestion for the bottom 3 icons:
Web
Network
Social Media
(note: Location seems to be important to AT&T but we can just mention this)
I need to come up with an explanation for why the arrow below “Just in Time Integration” is bi-directional instead of just flowing to Analytics.
#14: Reference Architecture Notes
Financial services company: ingest data from various sources into a single Big Data store, then process and summarize the data at the customer unique ID level
Information is available in the call center application for service, accessible by research analysts, and leveraged in predictive applications as well
Pentaho Data Integration can ingest into NoSQL, pull out of NoSQL, and connect to Pentaho Business Analytics for end-user needs (a sketch of the per-customer summarization step follows below)
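In the architecture described above, the per-customer roll-up would normally be built as a PDI transformation, but to make the idea concrete, here is a minimal sketch in Python of the same summarization against a MongoDB store. The collection names (raw_interactions, customer_summary) and fields (customer_id, channel, amount) are hypothetical examples, not taken from the reference architecture.

# Hedged sketch: per-customer summarization into a NoSQL store.
# Collection and field names below are hypothetical examples.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["bigdata_store"]

pipeline = [
    # Roll raw events up to one summary document per customer unique ID.
    {"$group": {
        "_id": "$customer_id",
        "interactions": {"$sum": 1},
        "total_spend": {"$sum": "$amount"},
        "channels": {"$addToSet": "$channel"},
    }},
    # Write the summaries where the call center app and analysts can read them.
    {"$merge": {"into": "customer_summary", "whenMatched": "replace"}},
]
db["raw_interactions"].aggregate(pipeline)

In the actual deployment this step would live inside a PDI transformation, with Pentaho Business Analytics reporting on the resulting customer_summary collection.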
#17: Sharding is a method for storing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high-throughput operations.
Sharding, or horizontal scaling, divides the data set and distributes the data over multiple servers, or shards; by contrast, vertical scaling adds more capacity to a single server. Each shard is an independent database, and collectively, the shards make up a single logical database.
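As a minimal sketch of how this looks in practice, assuming a cluster fronted by a mongos router on localhost and a hypothetical analyticsdb.events collection keyed on customer_id, the admin commands are roughly:

# Hedged sketch: enabling sharding for a database and collection.
# Database, collection, and key names are hypothetical examples.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # connect to the mongos router, not an individual shard

# Enable sharding on the database, then shard the collection on a hashed key
# so documents are spread evenly across the shards.
client.admin.command("enableSharding", "analyticsdb")
client.admin.command("shardCollection", "analyticsdb.events",
                     key={"customer_id": "hashed"})

# Reads and writes still go through mongos, which routes each operation
# to the shard(s) owning the relevant chunks of the key range.
events = client["analyticsdb"]["events"]
events.insert_one({"customer_id": 12345, "action": "page_view"})
print(events.count_documents({"customer_id": 12345}))

Each shard behind the router holds its own slice of the data, which is what lets the shards collectively behave as the single logical database described above.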