IBM Analytics Platform Group
Enterprise Graph Analytics
Enterprise large-scale graph analytics and computing based on a distributed
graph database (Titan DB with HBase/Solr), distributed in-memory graph
computing (TinkerPop Hadoop-Gremlin, SparkGraphComputer), and Hadoop 2
Jun(Terry) Yang • yangjuncn@cn.ibm.com • Linkedin.com/in/terryjunyang
Jing Chen(Jerry) He • jinghe@us.ibm.com • Linkedin.com/in/jing-chen-jerry-he-1553511
Hadoop Summit 2017
2© IBM 2017 Hadoop Summit 2017
Agenda
• Challenges in hybrid data analytics
• Enterprise data quality analytics system based on graphed metadata
• Graph in enterprise data quality analytics solution
Hybrid data analytics and challenges
How was “total quantity” calculated? Show me the lineage.
What are the source-to-target mappings for the DW?
Who read the “sales” data outside working hours? How do we ensure data quality?
Data Warehouse Architect
Auditor
Business Person
Data Architect
How to handle the challenges?
Data Governance
Data Lifecycle Management
Data Quality Management
• Correctness
• Consistency
• Completeness
• Timeliness
Metadata
…
Master Data Management
…
What is Metadata?
• The data used to describe other data
− Simple Metadata
− Rich Metadata
File system metadata
• inode attributes for file management
• File-system object attributes such as modify time, access time, owner, and permissions
DBMS/DW/NoSQL metadata
• Schema for data management
• Ownership information of data
• Server/database information of data
How should metadata be managed in a hybrid data analytics environment?
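As a small illustration of file system metadata, the sketch below reads the attributes named on this slide (modify time, access time, owner, permission) with Python's standard `os.stat`. The helper name `file_metadata` and the dictionary keys are illustrative choices, not something from the talk.

```python
import os
import stat
import tempfile


def file_metadata(path):
    """Return a small dict of file-system metadata for path."""
    st = os.stat(path)
    return {
        "size": st.st_size,                       # bytes
        "modify_time": st.st_mtime,               # seconds since the epoch
        "access_time": st.st_atime,
        "owner_uid": st.st_uid,                   # numeric owner id (0 on Windows)
        "permission": stat.filemode(st.st_mode),  # e.g. '-rw-------'
        "inode": st.st_ino,
    }


# Demo on a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sales,2017\n")
    demo_path = tmp.name

meta = file_metadata(demo_path)
os.remove(demo_path)
print(meta["size"], meta["permission"])
```

Each key here would become a property on a "data" vertex once the metadata is loaded into a graph.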
Agenda
• Challenges in hybrid data analytics
• Enterprise data quality analytics system based on graphed metadata
• Graph in enterprise data quality analytics solution
Advantage of Graph in Metadata management
Traditional solution
• Limited to one server/system
• Metadata managed within a single server/system
Property Graph based solution
• Integrates metadata
• Handles storage pressure
• Efficient processing and querying
• Lineage
• Wide range of metadata managed
Property Graph
[Figure: a property graph. Each vertex and each edge carries a label (e.g. label1) and key:value properties (Key1:value1, Key2:value2).]
G = (V, E): a graph G consists of vertices V and edges E
• Built for relationships
• Intuitive modeling
• Expressive querying
• Native analysis
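The property graph model above can be sketched in a few lines of plain Python. This is a minimal in-memory illustration, not how Titan stores data; the class names, method names, and sample vertices (`alice`, `report_job`) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: int          # id of the source vertex
    in_v: int           # id of the target vertex
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """G = (V, E): vertices keyed by id plus a list of directed edges."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = Vertex(vid, label, props)
        return self.vertices[vid]

    def add_edge(self, label, out_v, in_v, **props):
        edge = Edge(label, out_v, in_v, props)
        self.edges.append(edge)
        return edge

    def out_neighbors(self, vid, edge_label=None):
        """Vertices reached by following outgoing edges from vid."""
        return [
            self.vertices[e.in_v]
            for e in self.edges
            if e.out_v == vid and (edge_label is None or e.label == edge_label)
        ]


g = PropertyGraph()
g.add_vertex(1, "user", name="alice")
g.add_vertex(2, "program", name="report_job")
g.add_edge("run", 1, 2, ts_hour=22)
print([v.properties["name"] for v in g.out_neighbors(1, "run")])
```

Both vertices and edges hold a label and a property map, which is what separates a property graph from a plain adjacency list.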
Using Graph Analytics to Find Complex Patterns
1st degree relationship
2nd degree relationship
3rd degree relationship
• Graph queries are a natural way to analyze relationship patterns
− Less complex than SQL
− Can handle high degrees of relationship with ease
• Graph schema facilitates visualization and exploration of relationships
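To make the 1st, 2nd, and 3rd degree relationships concrete, here is a minimal breadth-first sketch that groups vertices by hop distance from a starting vertex. The function name and the toy adjacency list are hypothetical; a graph database would express the same idea as a repeated traversal step.

```python
from collections import deque


def relationships_by_degree(adjacency, start, max_degree):
    """Group vertices by hop distance (1st, 2nd, ... degree) from start."""
    seen = {start}
    by_degree = {}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_degree:
            continue  # do not expand beyond the requested degree
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                by_degree.setdefault(depth + 1, []).append(neighbor)
                frontier.append((neighbor, depth + 1))
    return by_degree


# Hypothetical undirected graph, written as an adjacency list.
adjacency = {
    "ann": ["bob", "cai"],
    "bob": ["ann", "dee"],
    "cai": ["ann", "dee"],
    "dee": ["bob", "cai", "eli"],
    "eli": ["dee"],
}
print(relationships_by_degree(adjacency, "ann", 3))
# {1: ['bob', 'cai'], 2: ['dee'], 3: ['eli']}
```

In SQL the same question typically needs one self-join per degree, which is the complexity the slide contrasts against.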
Case study - Audit data access
• Data theft risk in an enterprise hybrid data environment
– Most data is stolen by insiders.
– Most data theft happens outside working hours.
– Over-granting of privileges may lead to data theft.
Enterprise data quality analytics system based on graphed metadata
• Data ingest: finance data, consumption data, credit data, behavioral data, graphed metadata, …
• Feature Selection: statistical learning, data analysis, (graphed) metadata analysis, …
• Advanced Feature Selection: Gradient Boosting Decision Tree, Support Vector Machine, Random Forests, PageRank (graph), …
• Modeling: customer risk rating, consumption capacity, graph model, …
• Recommendation: consumer behavior, fraud detection, risk analytics (audit), …
Data ingest
[Figure: user → (Run) → program → (Read) → data]
• Program: name, job id, params, config, inputs, outputs, start_ts, finish_ts, …
• User: id, name, group, permission, …
• Data: name, size, location, department, permission, parent, children, …
• Run/Read edges: ts_hour, ts_min, ts_sec, status, …
Metadata Integration, Graph-based Traversal
Identify entities and relationships, then map metadata to a graph:
• Entities → Vertices
− User
− Program
− Data
− …
• Relationships → Edges
− User runs program
− Program reads data
− …
• Attributes → Properties
− Name
− …
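The entity/relationship/attribute mapping on this slide can be sketched as a small loader that turns flat job records into vertices and labeled edges. The record layout, field names, and sample values below are assumptions for illustration; the talk does not show the actual ingest format.

```python
def metadata_to_graph(job_records):
    """Turn flat job-log records into vertices and labeled edges."""
    vertices = {}   # (label, name) -> properties
    edges = []      # (out_vertex, edge_label, in_vertex, properties)
    for rec in job_records:
        user = ("user", rec["user"])
        program = ("program", rec["program"])
        data = ("data", rec["data"])
        # Entities become vertices; attributes become vertex properties.
        vertices.setdefault(user, {"name": rec["user"]})
        vertices.setdefault(program, {"name": rec["program"]})
        vertices.setdefault(
            data, {"name": rec["data"], "department": rec["department"]}
        )
        # Relationships become edges; timing and status become edge properties.
        edge_props = {"ts_hour": rec["ts_hour"], "status": rec["status"]}
        edges.append((user, "run", program, dict(edge_props)))
        edges.append((program, "read", data, dict(edge_props)))
    return vertices, edges


# Hypothetical records; real ones would come from job history and audit logs.
records = [
    {"user": "u1", "program": "etl_daily", "data": "/dw/sales/q1",
     "department": "sales", "ts_hour": 10, "status": "ok"},
    {"user": "u2", "program": "adhoc_dump", "data": "/dw/sales/q1",
     "department": "sales", "ts_hour": 23, "status": "ok"},
]
vertices, edges = metadata_to_graph(records)
print(len(vertices), len(edges))  # 5 4
```

The shared data vertex `/dw/sales/q1` is created once, so both programs end up connected to the same node, which is what makes lineage and audit traversals possible.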
Feature Selection
Who read the sensitive sales data outside working hours?
Query: userFeaSele = graph.traversal().V().has("department","sales").inE("read").outV().hasLabel('program').inE("run").has("ts_hour",not(between(9,17))).outV()
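For readers without a Gremlin console at hand, the same walk (sales data ← read ← program ← run ← user, keeping only runs outside working hours) can be emulated in plain Python. This is a sketch over a hypothetical toy graph; treating working hours as the half-open interval 9 to 17 is an assumption.

```python
def users_reading_outside_hours(vertices, edges, department, start=9, end=17):
    """Users who ran a program that read `department` data outside [start, end)."""
    # Data vertices owned by the department: V().has("department", ...)
    targets = {v for v, props in vertices.items()
               if props.get("department") == department}
    # Programs with a "read" edge into those vertices: inE("read").outV()
    programs = {src for src, label, dst, _ in edges
                if label == "read" and dst in targets
                and vertices[src]["label"] == "program"}
    # Users whose "run" edge into such a program falls outside working hours
    return sorted({src for src, label, dst, props in edges
                   if label == "run" and dst in programs
                   and not (start <= props["ts_hour"] < end)})


# Hypothetical toy graph: vertex id -> properties, edges as 4-tuples.
vertices = {
    "u1": {"label": "user"}, "u2": {"label": "user"},
    "p1": {"label": "program"}, "p2": {"label": "program"},
    "d1": {"label": "data", "department": "sales"},
}
edges = [
    ("u1", "run", "p1", {"ts_hour": 10}),
    ("u2", "run", "p2", {"ts_hour": 23}),
    ("p1", "read", "d1", {}),
    ("p2", "read", "d1", {}),
]
print(users_reading_outside_hours(vertices, edges, "sales"))  # ['u2']
```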
Advanced Feature Selection: which users have access to a large amount of data?
Query: … withComputer(SparkGraphComputer) …
userAdvFeaSele = userFeaSele.pageRank().by('pageRank').order().by('pageRank',decr).limit(30)
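The PageRank-then-top-N step can be illustrated with a small power-iteration sketch in plain Python. It is not the SparkGraphComputer implementation; the damping factor, iteration count, and toy access graph are assumptions, and how the score relates to "access to a large amount of data" depends on the edge direction chosen, which the slide does not spell out.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: node -> list of successors."""
    nodes = set(out_links) | {n for targets in out_links.values() for n in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links.get(n))
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for node, targets in out_links.items():
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


def top_n(rank, n):
    """Highest-ranked nodes first, ties broken by name for determinism."""
    return sorted(rank, key=lambda node: (-rank[node], node))[:n]


# Hypothetical access graph: user -> program -> data.
out_links = {
    "u1": ["p1"], "u2": ["p1", "p2"],
    "p1": ["d1"], "p2": ["d1", "d2"],
}
ranks = pagerank(out_links)
print(top_n(ranks, 3))
```

Sorting in descending order before applying the limit is what selects the highest-ranked vertices rather than the lowest.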
Modeling
• Model risk analysis using the graphed metadata together with information from the ERP system.
• Analyze each user against employee information from the ERP (years of service, age, role) to identify suspects. For example, a non-sales person, such as an application R&D engineer, who reads sales data would be a suspect.
• Audit Recommendation.
[Figure: risk analysis model. Inputs: Graph: user list (userAdvFeaSele) from Advanced Feature Selection; ERP: employee information; ERP: violation information; other systems. Output: Audit Recommendation, a risk analysis report listing suspects who stole sensitive data.]
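A minimal sketch of how the candidate user list might be combined with ERP records to produce an audit shortlist. The scoring weights, field names (`department`, `years_of_service`), and sample employees are all invented for illustration; the talk does not describe the actual risk model.

```python
def flag_suspects(user_list, erp_employees, erp_violations, data_department="sales"):
    """Rank candidate users by a simple, hand-made risk score."""
    suspects = []
    for user in user_list:
        employee = erp_employees.get(user)
        if employee is None:
            continue  # no ERP record, nothing to score
        score = 0
        if employee["department"] != data_department:
            score += 2      # reads data owned by another department
        if employee["years_of_service"] < 1:
            score += 1      # short tenure
        score += erp_violations.get(user, 0)  # prior recorded violations
        if score > 0:
            suspects.append((user, score))
    # Highest score first, ties broken by user id.
    return sorted(suspects, key=lambda item: (-item[1], item[0]))


# Hypothetical inputs: graph-derived user list plus ERP lookups.
user_list = ["u1", "u2", "u3"]
erp_employees = {
    "u1": {"department": "sales", "years_of_service": 5},
    "u2": {"department": "r&d", "years_of_service": 0.5},
    "u3": {"department": "r&d", "years_of_service": 4},
}
erp_violations = {"u2": 1}
print(flag_suspects(user_list, erp_employees, erp_violations))
# [('u2', 4), ('u3', 2)]
```

A production model would more likely learn these weights from labeled incidents than hard-code them.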
Agenda
• Challenges in hybrid data analytics
• Enterprise data quality analytics system based on graphed metadata
• Graph in enterprise data quality analytics solution
Graph in enterprise data quality analytics solution
• Data Source: user data, machine data, log data, behavioral data, third-party data, graphed metadata
• Ingest (load)
• Enterprise data quality system: feature analysis, lineage, metadata management, cleansing
• Platform: Hadoop, HBase, Hive, HDFS, Spark, Titan, Solr, …
• Business Application: risk management, data audit, cost analytics, …
How to choose an enterprise graph database?
Evaluate a graph database from the following perspectives:
• Data storing features
• Operation and manipulation features
• Graph data structures
• Query features
• Schema and instance representation
• Easy and centralized management
• Expose service
• Security features
• Fast computing
Titan
• What is Titan
− Distributed Graph Database
− Based on TinkerPop (Gremlin)
− Open Source
• Titan Features
− Distributed
− Scalable: billions of edges and vertices
− Real-time
− Transactional database (concurrent users/ACID/..)
− Global graph compute: graph data analytics, report, ETL
− Search: geo, numeric range, and full text search
Titan solution architecture
[Figure: Titan solution architecture, top to bottom]
• Application
• Management API; TinkerPop API (Gremlin)
• Internal API layer
• Database layer (Tx, Data, Mgmt, Optimizer)
• OLAP I/O Interface; Storage and Index Interface Layer
• HBase (storage backend); Solr (external index backend)
• Big data platform: Spark (Gremlin GraphComputer) and Hadoop
• OLTP and OLAP paths
Key points:
• Optimized for storing and querying billions of vertices and edges over a cluster
• Supports thousands of concurrent users
• Can execute local queries (OLTP) or distributed queries across a cluster (OLAP)
Backend – HBase & Solr
• HBase
− Tight integration with the Hadoop ecosystem.
− Native support for strong consistency.
− Linear scalability with the addition of more machines.
− Strictly consistent reads and writes.
− Convenient base classes for backing Hadoop MapReduce jobs with HBase tables.
− Support for exporting metrics via JMX.
− Open source under the liberal Apache 2 license.
• Solr
− Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project.
− Solr is a standalone enterprise search server with a REST-like API.
− Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more.
✓ Data storing features
✓ Operation and manipulation features
✓ Graph data structures
✓ Query features
✓ Schema and instance representation
Easy and centralized management
Expose service
Security features
Fast computing
Integration and management: Titan in Ambari
• Titan deployment: installation, uninstallation, Titan client deployment, Titan server deployment
• Titan server operation: start server, stop server, service check
• Titan configuration: HBase backend, Solr backend, SparkGraphComputer, Titan server, Titan environment, Titan security
• Titan security support: SSL, SASL, LDAP, Kerberos, Knox, HBase access control
✓ Data storing features
✓ Operation and manipulation features
✓ Graph data structures
✓ Query features
✓ Schema and instance representation
✓ Easy and centralized management
Expose service
Security features
Fast computing
Titan service
[Figure: a Titan client reaches the Titan server either remotely, through Gremlin Server ({RESTful}, {WebSocket}), or locally, through the Gremlin Console (gremlin>). Titan engine: Mgmt API and TinkerPop API (Gremlin), internal API layer, database layer, OLAP I/O interface, storage and index interface layer over HBase and Solr, with Spark as the Gremlin GraphComputer.]
✓ Data storing features
✓ Operation and manipulation features
✓ Graph data structures
✓ Query features
✓ Schema and instance representation
✓ Easy and centralized management
✓ Expose service
Security features
Fast computing
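As a hedged illustration of the {RESTful} path, the sketch below builds the HTTP POST that a Gremlin Server HTTP endpoint accepts: a JSON body with a `gremlin` script and optional `bindings`. The URL `http://localhost:8182`, the traversal source name `g`, and the sample script are assumptions; the request is only constructed here, not sent, and a real server must have its HTTP channelizer enabled.

```python
import json
import urllib.request


def build_gremlin_request(script, bindings=None, url="http://localhost:8182"):
    """Build (but do not send) an HTTP POST for a Gremlin Server endpoint."""
    payload = {"gremlin": script}
    if bindings:
        payload["bindings"] = bindings  # parameters referenced by the script
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_gremlin_request(
    "g.V().has('department', dept).count()", {"dept": "sales"}
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To execute against a running server: urllib.request.urlopen(request)
```

Passing values through `bindings` rather than string concatenation keeps user input out of the script text.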
Titan security enhancement
[Figure: a remote Titan client and a local client connect to the Titan server inside a Kerberized cluster. Security features shown: SSL, Knox, SASL, LDAP/OS/Kerberized Titan user, HBase access control. Titan engine: Mgmt API and TinkerPop API (Gremlin), internal API layer, database layer, OLAP I/O interface, storage and index interface layer over HBase and Solr, with Spark as the Gremlin GraphComputer.]
✓ Data storing features
✓ Operation and manipulation features
✓ Graph data structures
✓ Query features
✓ Schema and instance representation
✓ Easy and centralized management
✓ Expose service
✓ Security features
Fast computing
Integrate TinkerPop SparkGraphComputer with Titan DB
[Figure: Titan engine (Mgmt API and TinkerPop API (Gremlin), internal API layer, database layer, OLAP I/O interface, storage and index interface layer over HBase and Solr) feeding a Graph RDD to the Gremlin GraphComputer on Spark.]
• Gremlin GraphComputer components: Spark-Gremlin (SparkGraphComputer), Hadoop-Gremlin
• Vertex programs: PageRankVertexProgram, PeerPressureVertexProgram, BulkDumperVertexProgram, BulkLoaderVertexProgram, TraversalVertexProgram
✓ Data storing features
✓ Operation and manipulation features
✓ Graph data structures
✓ Query features
✓ Schema and instance representation
✓ Easy and centralized management
✓ Expose service
✓ Security features
✓ Fast computing
Open source Graph Database
JanusGraph: a new Linux Foundation project formed to continue development of the TitanDB graph database.
The last Titan release, 1.0.0, was on Sep 20, 2015.
References & Contacts
• Graph
− Titan: http://titan.thinkaurelius.com
− JanusGraph: http://janusgraph.org
− TinkerPop: https://tinkerpop.apache.org
Jun(Terry) Yang
Team Lead
yangjuncn@cn.ibm.com
Linkedin.com/in/terryjunyang
Jing Chen(Jerry) He
Architect
jinghe@us.ibm.com
Linkedin.com/in/jing-chen-jerry-he-1553511
Thanks!
Questions?
TijiLMAHESHWARI
 
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjksPpt. Nikhil.pptxnshwuudgcudisisshvehsjks
Ppt. Nikhil.pptxnshwuudgcudisisshvehsjks
panchariyasahil
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
Cleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdfCleaned_Lecture 6666666_Simulation_I.pdf
Cleaned_Lecture 6666666_Simulation_I.pdf
alcinialbob1234
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
DPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdfDPR_Expert_Recruitment_notice_Revised.pdf
DPR_Expert_Recruitment_notice_Revised.pdf
inmishra17121973
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
Ad

Hadoop summit 2017 enterprise graph analytics

  • 1. IBM Analytics Platform Group: Enterprise Graph Analytics
      • Enterprise large-scale graph analytics and computing based on a distributed graph database (Titan DB with HBase/Solr), distributed in-memory graph computing (TinkerPop Hadoop-Gremlin SparkGraphComputer), and Hadoop 2
      • Jun (Terry) Yang • yangjuncn@cn.ibm.com • Linkedin.com/in/terryjunyang
      • Jing Chen (Jerry) He • jinghe@us.ibm.com • Linkedin.com/in/jing-chen-jerry-he-1553511
      • Hadoop Summit 2017
  • 2. Agenda
      • Challenges in hybrid data analytics
      • Enterprise data quality analytics system based on graphed metadata
      • Graph in enterprise data quality analytics solution
  • 3. Hybrid data analytics and challenges
      • How was “total quantity” calculated? Show me the lineage.
      • What are the source-to-target mappings for the DW?
      • Who read the “sales” data outside working hours?
      • How to ensure data quality?
      • Roles raising these questions: Data Warehouse Architect, Auditor, Business Person, Data Architect
  • 4. How to handle the challenges?
      • Data Governance
          − Data Lifecycle Management
          − Data Quality Management: Correctness, Consistency, Completeness, Timeliness, Metadata, …
          − Master Data Management
          − …
  • 5. What is Metadata?
      • The data used to describe other data
          − Simple metadata
          − Rich metadata
      • File system metadata
          − inode attributes for file management
          − File system object attributes include metadata such as modify time, access, owner, permission, etc.
      • DBMS/DW/NoSQL metadata
          − Schema for data management
          − Ownership information of data
          − Server/database information of data
      • How to manage metadata in a hybrid data analytics environment?
  • 6. Agenda
      • Challenges in hybrid data analytics
      • Enterprise data quality analytics system based on graphed metadata
      • Graph in enterprise data quality analytics solution
  • 7. Advantage of Graph in Metadata management
      • Traditional solution
          − Limited to one server/system
          − Metadata managed within a server/system
      • Property Graph based solution
          − Integrates metadata
          − Handles storage pressure
          − Efficient processing and querying
          − Lineage
          − Wide range managed
  • 8. Property Graph
      • G = (V, E): a graph consists of vertices and edges
      • Born for relationships
      • Intuitive modeling
      • Expressive querying
      • Native analysis
      • Figure labels: Vertex, Edge, Label (label1), Properties (Key1:value1, Key2:value2)
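The property graph model on this slide can be sketched in a few lines of plain Python. This is only an illustration of the data structure, not Titan's or TinkerPop's implementation; the class names, the `vid` field, and the sample vertices are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: int                      # hypothetical vertex identifier
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: int                    # id of the source vertex
    in_v: int                     # id of the target vertex
    properties: dict = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: dict = field(default_factory=dict)   # vid -> Vertex
    edges: list = field(default_factory=list)

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = Vertex(vid, label, props)
        return self.vertices[vid]

    def add_edge(self, label, out_v, in_v, **props):
        # An edge may only connect vertices that already exist.
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("both endpoints must exist")
        edge = Edge(label, out_v, in_v, props)
        self.edges.append(edge)
        return edge


# Sample data (made up): a user vertex runs a program vertex at 22:00.
g = PropertyGraph()
g.add_vertex(1, "user", name="alice")
g.add_vertex(2, "program", name="report_job")
g.add_edge("run", 1, 2, ts_hour=22)
```

Both vertices and edges carry a label and a key/value property map, which is what distinguishes a property graph from a bare G = (V, E).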
  • 9. Using Graph Analytics to Find Complex Patterns
      • Graph queries are a natural way to analyze relationship patterns
          − Less complex than SQL
          − Can handle high degrees of relationship with ease
      • Graph schema facilitates visualization and exploration of relationships
      • Figure labels: 1st degree relationship, 2nd degree relationship, 3rd degree relationship
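As a rough illustration of 1st, 2nd and 3rd degree relationships, the following plain-Python sketch groups vertices by the number of hops needed to reach them, using breadth-first search. The function name and the sample adjacency map are assumptions for the example; a graph database would express the same idea as a traversal.

```python
from collections import deque


def neighbors_by_degree(adjacency, start, max_degree):
    """Return {degree: set of vertices first reached at that degree}."""
    seen = {start}
    result = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_degree:
            continue  # do not expand beyond the requested degree
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                result.setdefault(depth + 1, set()).add(nxt)
                queue.append((nxt, depth + 1))
    return result


# Sample relationships (made up for the example).
adjacency = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
}
degrees = neighbors_by_degree(adjacency, "alice", 3)
```

In SQL the same question typically needs one self-join per degree, which is the complexity the slide refers to.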
  • 10. Case study - Audit data access
      • Data theft risk in hybrid enterprise environments
          − Most data is stolen by insiders.
          − Most data theft happens outside working hours.
          − Over-granting of privileges may lead to data theft.
  • 11. Enterprise data quality analytics system based on graphed metadata
      • Data ingest: Finance data, Consumption data, Credit data, Behavioral data, Graphed metadata, …
      • Feature Selection: Statistical learning, Data analysis, (Graphed) Metadata analysis, …
      • Advanced Feature Selection: Gradient Boosting Decision Tree, Support Vector Machine, Random Forests, PageRank (Graph), …
      • Modeling: Customer risk rating, Consumption capacity, Graph model, …
      • Recommendation: Consumer behavior, Fraud detection, Risk analytics (Audit), …
  • 12. Data ingest: Metadata to Graph
      • Identify entities and relationships
          − Entities → Vertices: User, Program, Data, …
          − Relationships → Edges: User runs program, Program reads data, …
          − Attributes → Properties: Name, …
      • Metadata Integration; Graph-based Traversal
      • Figure labels: user, program, data, Run, Read
      • Property lists shown in the figure:
          − name, job id, params, config, inputs, outputs, start_ts, finish_ts, …
          − id, name, group, permission, …
          − name, size, location, department, permission, parent, children, …
          − ts_hour, ts_min, ts_sec, status, …
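A minimal plain-Python sketch of the metadata-to-graph step: each job record is turned into user, program and data vertices plus `run` and `read` edges. The record layout, the function name, and the choice to put `ts_hour` on the `run` edge and `department` on the data vertex are assumptions for illustration; the slide does not spell out the mapping.

```python
def metadata_to_graph(job_records):
    """Turn job metadata records into vertices and edges."""
    vertices = {}  # (label, name) -> properties
    edges = []     # (edge label, source key, target key, properties)
    for rec in job_records:
        user = ("user", rec["user"])
        program = ("program", rec["program"])
        vertices.setdefault(user, {"name": rec["user"]})
        vertices.setdefault(program, {"name": rec["program"]})
        # Relationship "user runs program" becomes a 'run' edge.
        edges.append(("run", user, program, {"ts_hour": rec["ts_hour"]}))
        for data_name, department in rec["inputs"]:
            data = ("data", data_name)
            vertices.setdefault(
                data, {"name": data_name, "department": department}
            )
            # Relationship "program reads data" becomes a 'read' edge.
            edges.append(("read", program, data, {}))
    return vertices, edges


# Sample job records (made up for the example).
records = [
    {"user": "alice", "program": "etl_job", "ts_hour": 22,
     "inputs": [("sales_2017", "sales")]},
    {"user": "bob", "program": "etl_job", "ts_hour": 10,
     "inputs": [("sales_2017", "sales")]},
]
vertices, edges = metadata_to_graph(records)
```

Keying vertices by `(label, name)` means the same user, program or data set seen in several records maps to a single vertex, which is what integrates metadata from different sources.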
  • 13. Feature Selection
      • Feature Selection: Who read the sensitive sales data outside working hours?
          − Query: userFeaSele = graph.traversal().V().has("department","sales").inE("read").outV().hasLabel('program').inE("run").has("ts_hour",not(within(9,17))).outV()
      • Advanced Feature Selection: Find the users who have access to a large amount of data
          − Query: … withComputer(SparkGraphComputer) … userAdvFeaSele = userFeaSele.pageRank().by('pageRank').order().by('pageRank').limit(30)
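The first traversal can be mirrored step by step in plain Python to show what it selects. This is a sketch over made-up data, not the Gremlin engine. It assumes working hours run from 9 to 17 inclusive. Note that in TinkerPop `within(9,17)` tests membership in the listed values rather than a range, so a range predicate such as `outside(9,17)` may be closer to the stated intent; the sketch uses the range reading.

```python
def users_reading_sales_off_hours(vertices, edges, first_hour=9, last_hour=17):
    """Users who ran a program that read sales data outside working hours."""
    # V().has("department", "sales")
    sales_data = {vid for vid, v in vertices.items()
                  if v.get("department") == "sales"}
    # .inE("read").outV().hasLabel("program")
    programs = {src for label, src, dst, props in edges
                if label == "read" and dst in sales_data
                and vertices[src]["label"] == "program"}
    # .inE("run").has("ts_hour", <outside working hours>).outV()
    return {src for label, src, dst, props in edges
            if label == "run" and dst in programs
            and not (first_hour <= props["ts_hour"] <= last_hour)}


# Sample graph (made up for the example).
vertices = {
    "u1": {"label": "user", "name": "alice"},
    "u2": {"label": "user", "name": "bob"},
    "u3": {"label": "user", "name": "carol"},
    "p1": {"label": "program", "name": "sales_report"},
    "p2": {"label": "program", "name": "hr_report"},
    "d1": {"label": "data", "name": "sales_2017", "department": "sales"},
    "d2": {"label": "data", "name": "payroll", "department": "hr"},
}
edges = [
    ("run", "u1", "p1", {"ts_hour": 22}),
    ("run", "u2", "p1", {"ts_hour": 10}),
    ("run", "u3", "p2", {"ts_hour": 23}),
    ("read", "p1", "d1", {}),
    ("read", "p2", "d2", {}),
]
suspects = users_reading_sales_off_hours(vertices, edges)
```

Only `u1` is returned: `u2` touched the sales data during working hours, and `u3` ran a late job that did not read sales data.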
  • 14. Modeling
      • Model risk analysis with graphed metadata and information in ERP.
      • Analyze each user against employee information from ERP (years of service, age, role) to identify suspects. A non-sales person, for example an application R&D person, would be a suspect.
      • Audit recommendation.
      • Figure labels: Advanced Feature Selection; Graph: User List (userAdvFeaSele); Other system; ERP: Employee information; ERP: Violation information; Risk analysis model; Audit Recommendation; Risk analysis report; Suspects who stole sensitive data
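The modeling step combines the user list from the graph with employee information from ERP. A deliberately simple sketch of that join follows, assuming a hypothetical ERP lookup keyed by user name and a single rule (role is not sales); the real risk model on the slide is not described in enough detail to reproduce.

```python
def flag_suspects(ranked_users, erp_employees, expected_role="sales"):
    """Return users from the graph result whose ERP role is not the expected one."""
    suspects = []
    for user in ranked_users:
        employee = erp_employees.get(user)
        if employee is None:
            continue  # no ERP record: left for manual review in this sketch
        if employee["role"] != expected_role:
            suspects.append((user, employee["role"]))
    return suspects


# Hypothetical inputs: a user list from the graph query and ERP records.
user_list = ["alice", "bob", "carol"]
erp = {
    "alice": {"role": "application R&D", "years_of_service": 2},
    "bob": {"role": "sales", "years_of_service": 7},
}
report = flag_suspects(user_list, erp)
```

Here `alice` is flagged because an R&D role has no obvious reason to read sales data, which is the example the slide gives.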
  • 15. Agenda
      • Challenges in hybrid data analytics
      • Enterprise data quality analytics system based on graphed metadata
      • Graph in enterprise data quality analytics solution
  • 16. Graph in enterprise data quality analytics solution
      • Data Source: User data, Machine data, Log data, Behavioral data, Third-party data
      • Ingest (load)
      • Enterprise data quality system: Graphed metadata, Feature analysis, Lineage, Metadata management, Cleansing
      • Hadoop, HBase, Hive, HDFS, Spark, Titan, Solr, …
      • Business Application: Risk management, Data audit, Cost analytics, …
  • 17. How to choose Enterprise Graph Database?
      • Evaluate a graph database from the following perspectives:
          − Data storing features
          − Operation and manipulation features
          − Graph data structures
          − Query features
          − Schema and instance representation
          − Easy and centralized management
          − Expose service
          − Security features
          − Fast computing
  • 18. Titan
      • What is Titan
          − Distributed graph database
          − Based on TinkerPop (Gremlin)
          − Open source
      • Titan features
          − Distributed
          − Scalable: billions of edges and vertices
          − Real-time
          − Transactional database (concurrent users, ACID, ...)
          − Global graph compute: graph data analytics, reporting, ETL
          − Search: geo, numeric range, and full-text search
  • 19. Titan solution architecture
      • Optimized for storing and querying billions of vertices and edges over a cluster
      • Supports thousands of concurrent users
      • Can execute local queries (OLTP) or distributed queries across a cluster (OLAP)
      • Figure labels: Application; Management API; TinkerPop API - Gremlin; Internal API layer; Database layer (Tx, Data, Mgmt, Optimizer); OLAP I/O Interface; Storage and Index Interface Layer; HBase Storage Backend; Solr External Index Backend; Gremlin GraphComputer; Spark; Hadoop; Big Data Platform; OLAP; OLTP
  • 20. Backend – HBase & Solr
      • HBase
          − Tight integration with the Hadoop ecosystem.
          − Native support for strong consistency.
          − Linear scalability with the addition of more machines.
          − Strictly consistent reads and writes.
          − Convenient base classes for backing Hadoop MapReduce jobs with HBase tables.
          − Support for exporting metrics via JMX.
          − Open source under the liberal Apache 2 license.
      • Solr
          − Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project.
          − Solr is a standalone enterprise search server with a REST-like API.
          − Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more.
      • Evaluation criteria covered so far: Data storing features; Operation and manipulation features; Graph data structures; Query features; Schema and instance representation. Remaining: Easy and centralized management; Expose service; Security features; Fast computing.
  • 21. Integration and management: Titan in Ambari
      • Titan deployment: Installation, Uninstallation, Titan client deployment, Titan server deployment
      • Titan server operation: Start server, Stop server, Service check
      • Titan configuration: HBase backend, Solr backend, SparkGraphComputer, Titan server, Titan environment, Titan security
      • Titan security support: SSL, SASL, LDAP, Kerberos, Knox, HBase access control
      • Evaluation criteria covered so far: Data storing features; Operation and manipulation features; Graph data structures; Query features; Schema and instance representation; Easy and centralized management. Remaining: Expose service; Security features; Fast computing.
  • 22. Remote Titan service
      • Figure labels: Titan client (local; Gremlin Console, gremlin> prompt); Titan server (Gremlin Server, Titan Engine); {RESTful}; {Web Socket}; Mgmt API; TP API - Gremlin; Internal API layer; Database layer; OLAP I/O; Storage and Index Interface Layer; HBase; Solr; Spark; Gremlin GraphComputer
      • Evaluation criteria covered so far: Data storing features; Operation and manipulation features; Graph data structures; Query features; Schema and instance representation; Easy and centralized management; Expose service. Remaining: Security features; Fast computing.
  • 23. Titan security enhancement
      • Figure labels: Cluster; Remote Titan client; Titan server; local; Mgmt API; TP API - Gremlin; Internal API layer; Database layer; OLAP I/O Interface; Storage and Index Interface Layer; HBase; Solr; Spark; Gremlin GraphComputer
      • Security labels: SSL; Knox; SASL; LDAP/OS/Kerberized Titan user; HBase access control; Kerberized cluster; Security description
      • Evaluation criteria covered so far: Data storing features; Operation and manipulation features; Graph data structures; Query features; Schema and instance representation; Easy and centralized management; Expose service; Security features. Remaining: Fast computing.
  • 24. Integrate TinkerPop SparkGraphComputer with Titan DB
      • Vertex programs: PageRankVertexProgram, PeerPressureVertexProgram, BulkDumperVertexProgram, BulkLoaderVertexProgram, TraversalVertexProgram
      • Figure labels: Mgmt API; TP API - Gremlin; Internal API layer; Database layer; OLAP I/O Interface; Storage and Index Interface Layer; HBase; Solr; Gremlin GraphComputer; Graph RDD; Spark-gremlin; SparkGraphComputer; Hadoop-gremlin; Spark
      • Evaluation criteria covered: Data storing features; Operation and manipulation features; Graph data structures; Query features; Schema and instance representation; Easy and centralized management; Expose service; Security features; Fast computing.
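To give a feel for what a vertex program such as PageRankVertexProgram computes, here is a single-machine power-iteration sketch in plain Python. It is not the TinkerPop implementation and ignores distribution entirely; the damping factor, iteration count, dangling-vertex handling and sample links are assumptions for the example.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of vertex -> list of targets."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every vertex starts each round with the teleport share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling vertex: spread its rank evenly over all vertices.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Sample links (made up): two users run one program that reads one data set.
links = {
    "alice": ["etl_job"],
    "bob": ["etl_job"],
    "etl_job": ["sales_2017"],
}
ranks = pagerank(links)
```

In a GraphComputer each round corresponds to vertices exchanging messages along their edges, which is why this kind of whole-graph computation is run as an OLAP job on Spark rather than as an OLTP traversal.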
  • 25. Open source Graph Database
      • JanusGraph: a new Linux Foundation project formed to continue development of the TitanDB graph database.
      • The last Titan release, 1.0.0, was on Sep 20, 2015.
  • 26. References & Contacts
      • Graph
          − Titan: http://titan.thinkaurelius.com
          − JanusGraph: http://janusgraph.org
          − TinkerPop: https://tinkerpop.apache.org
      • Jun (Terry) Yang, Team Lead • yangjuncn@cn.ibm.com • Linkedin.com/in/terryjunyang
      • Jing Chen (Jerry) He, Architect • jinghe@us.ibm.com • Linkedin.com/in/jing-chen-jerry-he-1553511
  • 27. Thanks! Questions?