Oracle OpenWorld Presentation with Paul Kent (SAS) on Big Data Appliance and ... (jdijcks)
Learn about the benefits of Oracle Big Data Appliance and how it can drive business value underneath applications and tools. The deck includes a section by Paul Kent, VP of Big Data at SAS, describing how SAS runs well on Oracle Engineered Systems and on Oracle Big Data Appliance specifically.
Presentation: bigdataappliance-overview_oow_v3 (xKinAnx)
The document outlines Oracle's Big Data Appliance product. It discusses how businesses can use big data to gain insights and make better decisions. It then provides an overview of big data technologies like Hadoop and NoSQL databases. The rest of the document details the hardware, software, and applications that come pre-installed on Oracle's Big Data Appliance - including Hadoop, Oracle NoSQL Database, Oracle Data Integrator, and tools for loading and analyzing data. The summary states that the Big Data Appliance provides a complete, optimized solution for storing and analyzing less structured data, and integrates with Oracle Exadata for combined analysis of all data sources.
The Oracle Big Data Appliance X4-2 is a comprehensive big data platform that includes Cloudera's Distribution of Apache Hadoop, Apache Spark, Oracle NoSQL Database, and other big data tools. It provides a massively scalable, secure infrastructure for storing and processing large volumes of data. The appliance offers optimized performance for both batch and interactive queries, and integrates with Oracle databases. It offers a lower total cost of ownership compared to do-it-yourself Hadoop systems through bundled hardware, software, and support.
The document summarizes the Cask Data Application Platform (CDAP), which provides an integrated framework for building and running data applications on Hadoop and Spark. It consolidates the big data application lifecycle by providing dataset abstractions, self-service data, metrics and log collection, lineage, audit, and access control. CDAP has an application container architecture with reusable programming abstractions and global user and machine metadata. It aims to simplify deploying and operating big data applications in enterprises by integrating technologies like YARN, HBase, Kafka and Spark.
The document discusses the past, present, and future of Apache Hadoop YARN. It describes how YARN was created to address limitations in MapReduce and provide a more flexible resource management framework. The presentation outlines major releases of YARN from 2010 to 2015, focusing on new features like rolling upgrades, long-running services, node labels, and improved usability tools. It envisions future enhancements such as per-queue scheduling policies, reservations, containerized applications, and improved network and disk isolation.
Oracle Big Data Appliance and Big Data SQL for advanced analytics (jdijcks)
Overview presentation showing Oracle Big Data Appliance and Oracle Big Data SQL in combination, and why this really matters. Big Data SQL brings the unique ability to analyze data across the entire spectrum of systems: NoSQL, Hadoop, and Oracle Database.
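Under the covers, Big Data SQL surfaces Hadoop-resident data as external tables inside the Oracle database, so a client simply issues SQL. A minimal sketch of what that looks like from Python, assuming a DBA has already defined an external table web_logs over a Hive table; the DSN, credentials, and table names here are hypothetical.

```python
# Minimal sketch: querying Hadoop-resident data through Oracle Big Data SQL.
# Assumes an external table (here, WEB_LOGS) has already been defined over a
# Hive table by a DBA; the DSN, credentials, and table name are hypothetical.
import cx_Oracle  # pip install cx_Oracle

conn = cx_Oracle.connect("analyst", "secret", "dbhost:1521/orclpdb")
cur = conn.cursor()

# The join spans storage tiers: WEB_LOGS lives in Hadoop, CUSTOMERS in Oracle,
# but both are addressed with ordinary SQL.
cur.execute("""
    SELECT c.customer_id, COUNT(*) AS visits
    FROM   web_logs w
    JOIN   customers c ON c.customer_id = w.customer_id
    GROUP  BY c.customer_id
""")
for customer_id, visits in cur:
    print(customer_id, visits)

cur.close()
conn.close()
```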
Expand a Data Warehouse with Hadoop and Big Data (jdijcks)
After investing years in the data warehouse, are you now supposed to start over? Nope. This session discusses how to leverage Hadoop and big data technologies to augment the data warehouse with new data, new capabilities and new business models.
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle) (Rittman Analytics)
Oracle Data Integration Platform is a cornerstone for big data solutions that provides five core capabilities: business continuity, data movement, data transformation, data governance, and streaming data handling. It includes eight core products that can operate in the cloud or on-premise, and is considered the most innovative in areas like real-time/streaming integration and extract-load-transform capabilities with big data technologies. The platform offers a comprehensive architecture covering key areas like data ingestion, preparation, streaming integration, parallel connectivity, and governance.
Hadoop in the cloud – The what, why and how from the experts (DataWorks Summit)
The document discusses Hadoop in the cloud and its benefits. It notes that Hadoop in the cloud provides distributed storage, automated failover, hyper-scaling, distributed computing, and extensibility. It also discusses deploying Hadoop clusters on Azure HDInsight and options for customizing and integrating them.
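Since the deck covers deployment on Azure HDInsight, here is a sketch of scripting a cluster rollout with the Azure CLI from Python. The resource names are invented and some flags may differ by CLI version; `az hdinsight create --help` has the authoritative option list.

```python
# Minimal sketch of scripting an Azure HDInsight cluster deployment via the
# Azure CLI. All resource names are illustrative; consult
# `az hdinsight create --help` for the authoritative flag list.
import subprocess

subprocess.run([
    "az", "hdinsight", "create",
    "--name", "demo-hadoop",             # hypothetical cluster name
    "--resource-group", "bigdata-rg",    # hypothetical resource group
    "--type", "hadoop",
    "--workernode-count", "4",
    "--http-user", "admin",
    "--http-password", "S0mePassw0rd!",
    "--storage-account", "bigdatastore", # hypothetical storage account
], check=True)
```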
The document discusses how organizations can leverage big data. It notes that the amount of data being produced is rapidly increasing and will continue to do so with more smart devices. The document outlines how organizations can use big data to improve existing processes, create new opportunities, run their business more effectively by organizing data for specific uses, and change their business by exploring raw data to discover new applications. It provides examples of companies in various industries that have been able to gain competitive advantages by leveraging big data in these ways.
Oracle Cloud: Big Data Use Cases and Architecture (Riccardo Romani)
Oracle Italy Systems Presales Team presents: big data in any flavor, on-premises, public cloud, and Cloud at Customer.
Presentation delivered at a Digital Transformation event, February 2017.
The document discusses running Hadoop on the cloud using Cloudera Director. It begins with an introduction of the speaker and Cloudera Director. Several common architectural patterns for running Hadoop in the cloud are presented, including using object storage and running short-term ETL/modeling clusters versus long-term analytics clusters. The presentation envisions a future with a more portable, self-service, self-healing, and granularly secure experience for managing Hadoop in the cloud.
Replacing Oracle CDC with Oracle GoldenGate (Stewart Bryson)
The Oracle documentation states that Oracle Change Data Capture (CDC) will be de-supported in the future and replaced with Oracle GoldenGate (OGG). So are we justified in assuming that OGG provides all the necessary features to actually replace CDC?
In this presentation, we will examine CDC and its application in real-time BI solutions and data warehouses. We will also look at the feature set of OGG and decide whether it is a suitable replacement for CDC across all of these applications. Where gaps in the product are identified -- such as the lack of support for subscription groups -- we will see techniques that can be used to bridge them without sacrificing the performance and scalability of OGG.
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T... (VMware Tanzu)
Pivotal HAWQ, one of the world's most advanced enterprise SQL-on-Hadoop technologies, coupled with the Hortonworks Data Platform, the only 100% open source Apache Hadoop data platform, can turbocharge your analytic efforts. The slides from this technical webinar present a deep dive into this powerful modern data architecture for analytics and data science.
Learn more here: http://pivotal.io/big-data/pivotal-hawq
Oracle GoldenGate Cloud Service Overview (Jinyu Wang)
This new PaaS solution in Oracle Public Cloud extends real-time data replication from on-premises to the cloud, advancing real-time data movement with powerful data streaming capabilities for enterprise solutions.
HAWQ: a massively parallel processing SQL engine in Hadoop (BigData Research)
HAWQ, developed at Pivotal, is a massively parallel processing SQL engine sitting on top of HDFS. As a hybrid of an MPP database and Hadoop, it inherits the merits of both. It adopts a layered architecture and relies on the distributed file system for data replication and fault tolerance. In addition, it is standard SQL compliant, and unlike other SQL engines on Hadoop, it is fully transactional. This paper presents the novel design of HAWQ, including query processing, the scalable software interconnect based on the UDP protocol, transaction management, fault tolerance, read-optimized storage, the extensible framework for supporting various popular Hadoop-based data stores and formats, and the various optimization choices considered to enhance query performance. The extensive performance study shows that HAWQ is about 40x faster than Stinger, which in turn is reported to be 35x-45x faster than the original Hive.
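Because HAWQ descends from the PostgreSQL/Greenplum code base, it speaks the standard PostgreSQL wire protocol, so an ordinary Postgres client can submit queries to it. A minimal sketch, assuming a HAWQ master reachable at hawq-master:5432 and an existing sales table (both hypothetical):

```python
# Querying HAWQ with a standard PostgreSQL client. The host, database,
# user, and table are hypothetical.
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect(host="hawq-master", port=5432,
                        dbname="analytics", user="gpadmin")
cur = conn.cursor()
# The query is parsed and planned on the master, then dispatched to segment
# servers that read the underlying data from HDFS in parallel.
cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
for region, total in cur.fetchall():
    print(region, total)
cur.close()
conn.close()
```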
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic (DataWorks Summit)
The document summarizes Mayo Clinic's implementation of a big data platform to process and analyze large volumes of daily healthcare data, including HL7 messages, for enterprise-wide clinical and non-clinical usage. The platform, built on Hadoop and using technologies like Storm and Elasticsearch, reliably handles 20-50 times more data than the clinic's current daily volumes. It provides ultra-fast free-text search capabilities. The system supports applications like processing data for colorectal surgery, exceeding requirements and outperforming previous RDBMS-only systems. Ongoing work involves further enhancing capabilities and integrating additional components as part of a unified data platform.
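A minimal sketch of one step such a pipeline performs: parse an HL7 v2 message and index it for free-text search. This is an illustration using the `hl7` and `elasticsearch` Python packages, not Mayo Clinic's actual code; the sample message and endpoint are invented.

```python
import hl7                               # pip install hl7
from elasticsearch import Elasticsearch  # pip install elasticsearch

raw = ("MSH|^~\\&|LAB|CLINIC|EHR|CLINIC|20150204||ORU^R01|123|P|2.3\r"
       "PID|1||12345^^^MRN||DOE^JOHN\r")

msg = hl7.parse(raw)
doc = {
    # python-hl7 splits on '|', so the message type (MSH-9) lands at index 8
    # because the field separator itself counts as MSH-1.
    "message_type": str(msg.segment("MSH")[8]),
    "patient_id":   str(msg.segment("PID")[3]),
    "patient_name": str(msg.segment("PID")[5]),
    "raw":          raw,                 # kept for full-text search
}

es = Elasticsearch("http://localhost:9200")  # hypothetical endpoint
es.index(index="hl7-messages", document=doc)
```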
This document discusses architecting Hadoop for adoption and data applications. It begins by explaining how traditional systems struggle as data volumes increase and how Hadoop can help address this issue. Potential Hadoop use cases are presented such as file archiving, data analytics, and ETL offloading. Total cost of ownership (TCO) is discussed for each use case. The document then covers important considerations for deploying Hadoop such as hardware selection, team structure, and impact across the organization. Lastly, it discusses lessons learned and the need for self-service tools going forward.
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo... (DataWorks Summit)
The document discusses re-platforming existing enterprise business intelligence and analytic workloads from platforms like Oracle, Teradata, SAP and IBM to the Hadoop platform. It notes that many existing analytic workloads are struggling with increasing data volumes and are too costly. Hadoop offers a modern distributed platform that can address these issues through the use of a production-grade SQL database like VectorH on Hadoop. The document provides guidelines for re-platforming workloads and notes potential benefits such as improved performance, reduced costs and leveraging the Hadoop ecosystem.
Partners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop (Eric Sun)
Teradata Connectors for Hadoop enable high-volume data movement between Teradata and Hadoop platforms. LinkedIn conducted a proof-of-concept using the connectors for use cases like copying clickstream data from Hadoop to Teradata for analytics and publishing dimension tables from Teradata to Hadoop for machine learning. The connectors help address challenges of scalability and tight processing windows for these large-scale data transfers.
Discover HDP 2.1: Apache Solr for Hadoop Search (Hortonworks)
This presentation covers Apache Solr for Hadoop search using the Hortonworks Data Platform (HDP). The agenda includes an overview of Apache Solr and Hadoop search, a demo of Hadoop search, and a question-and-answer section. The presentation discusses how Solr provides scalable indexing of data stored in HDFS and powerful search capabilities, and includes a reference architecture showing how Solr integrates with Hadoop for search and indexing.
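To make the indexing and search capability concrete, here is a sketch using the pysolr client. The collection name, URL, and documents are hypothetical; per the summary above, in an HDP deployment the indexed data itself comes from HDFS.

```python
# Indexing and searching documents in Solr from Python via pysolr.
import pysolr  # pip install pysolr

solr = pysolr.Solr("http://solr-host:8983/solr/logs", timeout=10)

# Index a couple of documents; commit makes them searchable immediately.
solr.add([
    {"id": "1", "text": "ERROR disk failure on datanode-7"},
    {"id": "2", "text": "INFO block replication complete"},
], commit=True)

# Full-text query using the standard Lucene query syntax.
for hit in solr.search("text:failure"):
    print(hit["id"], hit["text"])
```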
This document discusses data management trends and Oracle's unified data management solution. It provides a high-level comparison of HDFS, NoSQL, and RDBMS databases. It then describes Oracle's Big Data SQL which allows SQL queries to be run across data stored in Hadoop. Oracle Big Data SQL aims to provide easy access to data across sources using SQL, unified security, and fast performance through smart scans.
Predictive Analytics and Machine Learning…with SAS and Apache Hadoop (Hortonworks)
In this interactive webinar, we walk through use cases showing how you can use advanced analytics such as SAS Visual Statistics and SAS In-Memory Statistics with the Hortonworks Data Platform (HDP) to reveal insights in your big data and redefine how your organization solves complex problems.
The document describes the Seagate Hadoop Workflow Accelerator, which enables organizations to optimize Hadoop workflows and centralize data storage. It accelerates Hadoop applications by leveraging ClusterStor's high-performance Lustre parallel file system and bypassing the HDFS software layer. This provides improved Hadoop performance, flexibility to scale compute and storage independently, and reduced total cost of ownership.
Powering Big Data Success On-Prem and in the Cloud (Hortonworks)
How do you optimize Apache Spark workloads in the cloud? How do you tune your resources for maximum performance and efficiency? Find out how the new Hortonworks Flex Support Subscription enables IT agility and success in the cloud (a minimal Spark tuning sketch follows the list below). We will cover:
* Options for running Data Science, Analytics and ETL workloads in the cloud
* Hortonworks support offerings including new Flex Support Subscription
* How to run Cloud workloads more efficiently with SmartSense
* Case study on the impact of SmartSense
https://hortonworks.com/webinar/powering-big-data-success-cloud/
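As promised above, a minimal sketch of the kind of resource tuning the webinar discusses: sizing Spark executors explicitly instead of accepting defaults. The values are illustrative; the right numbers depend on node size and workload.

```python
from pyspark.sql import SparkSession  # pip install pyspark

spark = (
    SparkSession.builder
    .appName("cloud-etl")
    .config("spark.executor.instances", "8")       # match to cluster size
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "12g")
    .config("spark.sql.shuffle.partitions", "64")  # default 200 is often too high
    .getOrCreate()
)

# A small aggregation to exercise the configured shuffle.
df = spark.range(1_000_000).selectExpr("id % 10 AS key", "id AS value")
print(df.groupBy("key").count().collect())
spark.stop()
```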
HP Converged Systems and Hortonworks - Webinar Slides (Hortonworks)
Our experts will walk you through some key design considerations when deploying a Hadoop cluster in production. We'll also share practical best practices around HP and Hortonworks Data Platform to get you started on building your modern data architecture.
Learn how to:
- Leverage best practices for deployment
- Choose a deployment model
- Design your Hadoop cluster
- Build a Modern Data Architecture and vision for the Data Lake
1) The document discusses big data strategies and technologies, including Oracle's big data solutions. It describes Oracle's Big Data Appliance, an integrated hardware and software platform for running Apache Hadoop.
2) Key technologies that enable deeper analytics on big data are discussed including advanced analytics, data mining, text mining and Oracle R. Use cases are provided in industries like insurance, travel and gaming.
3) An example use case of a "smart mall" is described where customer profiles and purchase data are analyzed in real-time to deliver personalized offers. The technology pattern for implementing such a use case with Oracle's real-time decisions and big data platform is outlined.
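As a toy illustration of the "smart mall" pattern just described (not Oracle Real-Time Decisions itself), the sketch below joins a stored customer profile with an incoming purchase event to pick a personalized offer in real time. All profiles, rules, and offers are invented.

```python
from typing import Optional

# Invented customer profiles, keyed by customer id.
PROFILES = {
    "c42": {"segment": "frequent_shopper", "favorite": "electronics"},
}

# Invented offer rules: (segment, purchase category) -> offer text.
OFFER_RULES = {
    ("frequent_shopper", "electronics"): "10% off headphones today",
}

def offer_for(event: dict) -> Optional[str]:
    """Return a personalized offer for a purchase event, if a rule matches."""
    profile = PROFILES.get(event["customer_id"])
    if profile is None:
        return None
    return OFFER_RULES.get((profile["segment"], event["category"]))

print(offer_for({"customer_id": "c42", "category": "electronics"}))
```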
Oracle Unified Information Architecture + Analytics by Example (Harald Erb)
The talk first gives an architectural overview of the UIA components and how they interact. A use case then shows how, in the "UIA Data Reservoir", current data can be kept cost-effectively "as is" in a Hadoop File System (HDFS) and combined with refined data in an Oracle 12c Data Warehouse, evaluated via direct access in Oracle Business Intelligence, or explored for new relationships with Endeca Information Discovery.
Big Data Integration Webinar: Getting Started With Hadoop Big Data (Pentaho)
This document discusses getting started with big data analytics using Hadoop and Pentaho. It provides an overview of installing and configuring Hadoop and Pentaho on a single machine or cluster. Dell's Crowbar tool is presented as a way to quickly deploy Hadoop clusters on Dell hardware in about two hours. The document also covers best practices like leveraging different technologies, starting with small datasets, and not overloading networks. A demo is given and contact information provided.
The document discusses NoSQL databases and Oracle's NoSQL Database product. It outlines key features of Oracle NoSQL Database including its scalability, high availability, elastic configuration, ACID transactions, and commercial support. Benchmark results show Oracle NoSQL Database can achieve over 1 million operations per second and scale linearly with additional servers. The document also provides information on licensing and support options for Oracle NoSQL Database Community Edition and Enterprise Edition.
Cisco Big Data Warehouse Expansion Featuring MapR Distribution (Appfluent Technology)
The document discusses Cisco's Big Data Warehouse Expansion solution featuring MapR Distribution including Apache Hadoop. The solution reduces data warehouse management costs by enabling organizations to store and analyze more data at lower costs. It does this by offloading infrequently used data from the existing data warehouse to low-cost big data stores running on Cisco UCS hardware optimized for MapR Distribution. This provides benefits like enhanced analytics, improved performance, reduced costs and risks, and competitive advantages from being able to utilize more company data assets.
How Oracle has managed to separate its flagship database's SQL engine, which processes the queries, from the access drivers that allow it to read data both from files on the Hadoop Distributed File System and from the data warehousing tool Hive.
The document discusses how MySQL can be used to unlock insights from big data. It describes how MySQL provides both SQL and NoSQL access to data stored in Hadoop, allowing organizations to analyze large, diverse datasets. Tools like Apache Sqoop and the MySQL Applier for Hadoop are used to import data from MySQL to Hadoop for advanced analytics, while solutions like MySQL Fabric allow databases to scale out through data sharding.
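Since the summary mentions Apache Sqoop for moving MySQL data into Hadoop, here is a minimal sketch of such an import. The host, schema, and paths are hypothetical; Sqoop must be on the PATH with the MySQL JDBC driver on its classpath.

```python
import subprocess

subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://mysql-host/sales",  # hypothetical database
    "--username", "etl",
    "--password-file", "/user/etl/.mysql-pass",    # avoid passwords on argv
    "--table", "orders",
    "--target-dir", "/data/raw/orders",            # HDFS output directory
    "--num-mappers", "4",                          # parallel import streams
], check=True)
```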
The document discusses opportunities for enriching a data warehouse with Hadoop. It outlines challenges with ETL and analyzing large, diverse datasets. The presentation recommends integrating Hadoop and the data warehouse to create a "data reservoir" to store all potentially valuable data. Case studies show companies using this approach to gain insights from more data, improve analytics performance, and offload ETL processing to Hadoop. The document advocates developing skills and prototypes to prove the business value of big data before fully adopting Hadoop solutions.
Solution Use Case Demo: The Power of Relationships in Your Big Data (InfiniteGraph)
In this security solution demo, we have integrated Oracle NoSQL DB with InfiniteGraph to demonstrate the power of using the right tools for the solution. By integrating Oracle's key-value technology with the InfiniteGraph distributed graph database, we are able to create new views of existing Call Detail Record (CDR) data to enable discovery of connections, paths and behaviors that might otherwise be missed.
Discover how to add value to your existing Big Data to increase revenues and performance!
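InfiniteGraph itself is a Java product; purely as an illustration of the idea, this sketch uses networkx to show the kind of connection discovery the demo performs on Call Detail Records. The CDRs below are invented.

```python
import networkx as nx  # pip install networkx

# Each CDR links a caller to a callee.
cdrs = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
        ("alice", "eve"), ("eve", "dave")]

g = nx.Graph()
g.add_edges_from(cdrs)

# Path discovery: how are alice and dave connected, and how closely?
for path in nx.all_shortest_paths(g, "alice", "dave"):
    print(" -> ".join(path))   # e.g. alice -> eve -> dave
```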
The document discusses Oracle's data integration products and big data solutions. It outlines five core capabilities of Oracle's data integration platform, including data availability, data movement, data transformation, data governance, and streaming data. It then describes eight core products that address real-time and streaming integration, ELT integration, data preparation, streaming analytics, dataflow ML, metadata management, data quality, and more. The document also outlines five cloud solutions for data integration including data migrations, data warehouse integration, development and test environments, high availability, and heterogeneous cloud. Finally, it discusses pragmatic big data solutions for data ingestion, transformations, governance, connectors, and streaming big data.
MySQL Applier for Apache Hadoop: Real-Time Event Streaming to HDFS (Mats Kindahl)
This presentation from MySQL Connect gives a brief introduction to big data and the tooling used to gain insights into your data. It also introduces an experimental prototype of the MySQL Applier for Hadoop, which can be used to incorporate changes from MySQL into HDFS using the replication protocol.
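The Applier prototype described above is native code; as a rough Python analogue of the same replication-protocol idea, this sketch tails the MySQL binlog with the mysql-replication package and appends row events to a file in HDFS via the hdfs package. Connection settings and paths are hypothetical.

```python
from pymysqlreplication import BinLogStreamReader          # pip install mysql-replication
from pymysqlreplication.row_event import WriteRowsEvent
from hdfs import InsecureClient                            # pip install hdfs
import json

stream = BinLogStreamReader(
    connection_settings={"host": "mysql-host", "port": 3306,
                         "user": "repl", "passwd": "secret"},
    server_id=101,                     # must be unique among replicas
    only_events=[WriteRowsEvent],
    blocking=True,
)

client = InsecureClient("http://namenode:9870")            # WebHDFS endpoint

for event in stream:
    for row in event.rows:
        line = json.dumps(row["values"], default=str) + "\n"
        # Assumes the target file already exists; append adds each insert.
        client.write("/data/mysql/inserts.jsonl", data=line,
                     append=True, encoding="utf-8")
```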
This document provides an agenda for a presentation on Oracle GoldenGate. The agenda includes an overview of Oracle GoldenGate, a discussion of Oracle GoldenGate 12.2, Oracle GoldenGate for Big Data, the Oracle GoldenGate Foundation Suite including Studio, Management Pack, and Veridata, and Oracle GoldenGate Cloud Service. The presentation will cover the key capabilities and benefits of these Oracle GoldenGate products and services.
This document is a presentation on Big Data by Oleksiy Razborshchuk from Oracle Canada. The presentation covers Big Data concepts, Oracle's Big Data solution including its differentiators compared to DIY Hadoop clusters, and use cases and implementation examples. The agenda includes discussing Big Data, Oracle's solution, and use cases. Key points covered are the value of Oracle's Big Data Appliance which provides faster time to value and lower costs compared to building your own Hadoop cluster, and how Oracle provides an integrated Big Data environment and analytics platform. Examples of Big Data solutions for financial services are also presented.
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins... (EMC)
Pivotal has set up and operationalized a 1000-node Hadoop cluster called the Analytics Workbench. It takes special setup and skills to manage such a large deployment. This session shares how we set it up and how you would manage it.
After this session you will be able to:
Objective 1: Understand what it takes to operationalize a 1000-node Hadoop cluster.
Objective 2: Understand how to set up and manage the day-to-day challenges of a large Hadoop deployment.
Objective 3: Have a view of the tools necessary to solve the challenges of managing a large Hadoop cluster.
Enterprise-class security with PostgreSQL - 2 (Ashnikbiz)
For businesses that handle personal data every day, the security of their database is of utmost importance.
With an increasing number of hacking attacks and fraud, organizations want their open source databases to be fully equipped with top security features.
The document provides an overview of Oracle's converged systems approach. It discusses Oracle's engineered systems like Exadata, Exalogic, Big Data Appliance which are designed to work together. It notes that these systems provide benefits like extreme performance, lower costs, reduced risk, and faster deployment times. The document also discusses Oracle's approach to private and public cloud infrastructure and how customers can deploy Oracle cloud services either on-premises or in Oracle's data centers.
The document discusses how big data and analytics can transform businesses. It notes that the volume of data is growing exponentially due to increases in smartphones, sensors, and other data producing devices. It also discusses how businesses can leverage big data by capturing massive data volumes, analyzing the data, and having a unified and secure platform. The document advocates that businesses implement the four pillars of data management: mobility, in-memory technologies, cloud computing, and big data in order to reduce the gap between data production and usage.
Hadoop and SQL: Delivering Analytics Across the Organization (Seeling Cheung)
This document summarizes a presentation given by Nicholas Berg of Seagate and Adriana Zubiri of IBM on delivering analytics across organizations using Hadoop and SQL. Some key points discussed include Seagate's plans to use Hadoop to enable deeper analysis of factory and field data, the evolving Hadoop landscape and rise of SQL, and a performance comparison showing IBM's Big SQL outperforming Spark SQL, especially at scale. The document provides an overview of Seagate and IBM's strategies and experiences with Hadoop.
zData Inc. Big Data Consulting and Services - Overview and Summary (zData Inc.)
This slide deck is a summary of zData Inc., a leading Big Data consulting and services provider. zData focuses on commercial and enterprise corporations, employing experts in all areas of the field, from software engineers to data scientists. They work with top hardware and software providers for on-site and off-site consulting, managed services, training, and long-term scalable data solutions.