1© Cloudera, Inc. All rights reserved.
Enterprise Metadata Integration
Mirko Kämpf | Cloudera
GraphConnect 2017 – London
Who is speaking?
Solutions Architect @ Cloudera
- time series analysis, network analysis, data enrichment pipelines
- personal interest: QA systems and semantic search
Data Science Activities
The Detection of Emerging Trends Using Wikipedia Traffic Data
and Context Networks (PLOS ONE, 2015)
Hadoop.TS (IJCA, 2013)
Fluctuations in Wikipedia Access-Rate and Edit-Event Data.
(Physica A, 2012).
Our Approach: Multilayer Metadata Integration …
• Status dashboards are provided per use case.
• Each dashboard offers facts from multiple layers:
- (L1) technical layer
- (L2) operational metadata (Hadoop specific only)
- (L3) application specific operational metadata
- (L4) quality metrics (second order metadata)
• Our Achievements:
• Graph database (Neo4j) allows context exploration.
• Cluster-spanning metadata exploration is now possible.
• Exposure of inherent but sometimes hidden facts becomes as easy as writing an email.
Integration of facts
to gain business
knowledge
Intro
People do mining … for centuries!
http://www.montanregion-erzgebirge.de/welterbe-erleben/montanregion-fuer-bergbauspezialisten/geschichtliches.html
gold & diamonds,
ore & coal,
minerals,
oil …
Outcome drives whole economy
People use computers … for decades!
1938
Z1: the world’s first freely programmable
device, created by Konrad Zuse.
U.S. Department of Energy uses Intel
Supercomputer at Argonne National Laboratory.
2015
http://www.intel.com/content/dam/www/public/us/en/images/photography-business/RWD/aurora-aerial-reflection-floor-rwd.png
http://www.horst-zuse.homepage.t-online.de/z1.html
DATA
MINING
http://codecondo.com/9-free-books-for-learning-data-mining-data-analysis/
Blog: About Learning Data Mining & Data Analysis
If data is the new oil …
… metadata are nuggets
and brilliants of our age.
Screenshot taken from:
https://www.quora.com/Who-should-get-credit-for-the-quote-data-is-the-new-oil
Diamonds: beautiful even as raw material
Brilliant: result of expert’s work
Even more exciting in combination
with other material and skills …
Success Factors:
• Idea & Vision
• Material
• Skills / Methods
• Tools
http://www.burkhard-beyer.net/Reportage_Goldschmied.html
Be very careful with initial success …
… work towards a professional level!
High quality and reproducibility
are results of a
Professional Management
It is hard to believe what
you can get and which
options arise …
Manage overwhelming
excitement!
Start new activities
not randomly …
Let’s Think Data Driven!
• Build a mid-term or, better, a long-term strategy.
• Try to stay independent of a particular technology or tool.
What matters most is the data, not a fancy toolset.
• After initial success, slow down and control the speed of expansion.
• Focus on maximizing the accessibility of data.
Google’s goal was to make the data of the internet accessible.
You should become your own Google!
Dataset Profiles / Flow Descriptors
• Our material is data & metadata:
- Data about data: descriptive data, Dublin Core metadata model, …
- Derived data: statistics extracted from processes, documents, …
- Results of ML/AI procedures: extracted structure and learned models
- Outcome of crowd-based operations: Wikipedia with its inherent
structure, communication logs, access and edit history.
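A dataset profile of this kind can be pictured as a small record that combines descriptive metadata with derived statistics. The following is only a minimal sketch: the class name `DatasetProfile`, the helper `column_statistics`, and the sample values are assumptions made for illustration and are not taken from the talk.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class DatasetProfile:
    """Descriptive metadata plus statistics derived from the data itself."""
    # Data about data, e.g. Dublin Core elements such as "title" or "creator".
    descriptive: dict
    # Derived data, keyed by column name (hypothetical layout).
    derived: dict = field(default_factory=dict)


def column_statistics(values):
    """Compute simple derived statistics for one numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# Hypothetical example data, for illustration only.
profile = DatasetProfile(
    descriptive={"title": "Wikipedia hourly page views", "creator": "example team"}
)
profile.derived["views"] = column_statistics([120, 80, 100])
```

Keeping the descriptive and the derived part in one record means both can later be exported as facts about the same dataset.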
Knowledge Extraction for
Better Data Science
Science:
According to Wikipedia:
Science is a systematic
enterprise that builds and
organizes knowledge in the
form of testable explanations
and predictions about
the universe.
https://en.wikipedia.org/wiki/Science
Data Science:
My observation:
Commercial Data Science
is a systematic enterprise
that builds and organizes
knowledge in the form of
testable explanations and
predictions about the
market / business context.
https://en.wikipedia.org/wiki/Infographic#/media/File:Gartner_Hype_Cycle_for_Emerging_Technologies.gif
Details
Look into nature ….
Context
Look into nature ….
Result: Visualization of Facts
• An image shows what the text says.
> Multi-channel communication
• Data Science benefits from such an approach.
> Today we still use infographics
Difference:
The biologist who created the image on the left observed by eye.
Today, we use more and more data analysis methods.
Process: Knowledge Extraction is a Natural Process
• Combine multiple sources
• Repeat observation
• Incorporate context to explain
differences/variation
• Cross-checks to identify
anomalies
Process: Knowledge Extraction is a Natural Process
Knowledge
Facts
Data
How did we implement EMDM?
- Hadoop-based: for scalability.
- Open graph data model: for flexibility and connectivity.
- Data-centric: following the Big Data paradigm.
Big Data Processing:
e.g., with Hadoop
Big Graph Processing on Hadoop:
e.g., with Giraph
Project Name should stand for:
Graphs, Hadoop, and the ecosystem …
Data Science Process Model (DSPM)
• DSPM defines core artifacts for knowledge management
• Describes analysis / transformation context
• Allows repeatable execution
• Process properties become measurable
• Supports comparison of results from multiple procedures
• All those facts are essential ingredients for business optimization.
• But: Logging & tracking should never block creativity!
• Remember: Scientists often act like artists.
Toolbox and
Management Methods
Data Science Process Model (DSPM)
Representation of domain knowledge
(in our case it is data science in general)
Human
Interaction
Ontology
Toolbox and
Management Methods
Ability to solve
a problem using
IT and data
Technology Aspects
- represent and interact with facts & data
Data Governance
Certified QM
Semantic Logging
• A named property is a key-value pair: (K,V)
• A property of a thing S, S => (K,V), is a triple (S,P,O):
K becomes P; V becomes O
• Many such triples in one common context named G:
G => (S,P,O) is called a quad or named graph
• Log4j is the logging standard we build on.
• Using structured data instead of plain strings allows easy parsing
(e.g., Apache log format).
• Triple representation avoids format-specific parsing and makes log data
part of the linked data graph.
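The mapping from key-value pairs to triples and quads can be sketched in a few lines. This is an illustration under stated assumptions, not the toolbox's actual code: it uses Python's standard `logging` module as a stand-in for Log4j, writes each fact as an N-Quads-style line with a plain string literal as the object, and all identifiers (`urn:example:…`, `to_quads`, `format_quad`, `log_facts`) are hypothetical.

```python
import logging


def to_quads(graph, subject, properties):
    """Turn the key-value pairs logged about one subject into (G, S, P, O) quads.

    Each key K becomes the predicate P and each value V becomes the object O.
    """
    return [(graph, subject, key, str(value)) for key, value in properties.items()]


def format_quad(quad):
    """Render one quad as a single N-Quads-style line (object as string literal)."""
    graph, subject, predicate, obj = quad
    # Escape backslashes, quotes, and newlines so the literal stays on one line.
    literal = obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'<{subject}> <{predicate}> "{literal}" <{graph}> .'


def log_facts(logger, graph, subject, properties):
    """Emit one log record per quad instead of one free-form message string."""
    for quad in to_quads(graph, subject, properties):
        logger.info(format_quad(quad))


# Hypothetical identifiers, for illustration only.
logging.basicConfig(level=logging.INFO, format="%(message)s")
log_facts(
    logging.getLogger("semantic"),
    graph="urn:example:run:1",
    subject="urn:example:job:etl-42",
    properties={"urn:example:rowsRead": 1000, "urn:example:status": "OK"},
)
```

Because every log line is already a complete statement, a consumer can load the lines into a graph store without a format-specific parser.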
Etosha Toolbox
Data extractors,
data transformers,
ontology-based orchestration.
People and machines
contribute facts.
Iterative approach with
closed feedback loops.
Scalable environment …
CONCEPT
Multi-layer metadata capturing
Operational metrics
Metrics about fast & static data
Business metrics
Contextualized presentation
Ad-hoc queries for exploration
Graph-analytics
> Knowledge exposure
> Self-Service DS and BI can
speak the same language.
INITIAL IMPLEMENTATION
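Context exploration over captured facts can be illustrated with a small breadth-first search. The sketch below is plain in-memory Python and does not use the Neo4j API; the function name `explore_context`, the node identifiers, and the relationship names are hypothetical and chosen only for illustration.

```python
from collections import deque


def explore_context(triples, start, max_hops=2):
    """Return {node: hop distance} for all nodes within max_hops of start.

    Triples (subject, predicate, object) are treated as undirected edges,
    so the context of a node includes both what it points to and what
    points to it.
    """
    neighbours = {}
    for subject, _predicate, obj in triples:
        neighbours.setdefault(subject, set()).add(obj)
        neighbours.setdefault(obj, set()).add(subject)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the requested radius
        for nxt in neighbours.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


# Hypothetical facts from different metadata layers, for illustration only.
facts = [
    ("job:etl-42", "reads", "dataset:clicks"),
    ("dataset:clicks", "storedOn", "cluster:A"),
    ("cluster:A", "operatedBy", "team:ops"),
]
context = explore_context(facts, "job:etl-42", max_hops=2)
```

With a radius of two hops, the context of the job contains the dataset it reads and the cluster that stores it, but not the operating team, which is three hops away.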
Results: Access Facts & Context of Critical Processes
DEMO of context exploration:
https://www.youtube.com/watch?v=ZE7Gcanv90s&feature=youtu.be
Results: Better Collaboration for
(Hadoop) Knowledge Workers
• Our Achievements:
• The open graph model is language-, OS-, and hardware-independent.
• Merging of knowledge partitions enables cluster-spanning metadata exploration.
• Query beans expose facts from multiple stores to a web-based interface.
• Next Steps:
• Improve implicit triplification (query a Solr index and get RDF data)
• Standardize the process and integrate with existing ontologies.
• Grow a community … and enter the Apache Incubator.
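Merging knowledge partitions can be pictured as a union of named graphs, one per cluster. The following is a minimal sketch assuming each partition is a set of (graph, subject, predicate, object) quads; the functions `merge_partitions` and `facts_about` and all identifiers are hypothetical and do not describe the project's actual implementation.

```python
def merge_partitions(*partitions):
    """Union several sets of (graph, subject, predicate, object) quads.

    The graph name records which partition a fact came from, so facts
    stay traceable after the merge.
    """
    merged = set()
    for partition in partitions:
        merged.update(partition)
    return merged


def facts_about(store, subject):
    """Return all (graph, predicate, object) facts known about one subject."""
    return sorted((g, p, o) for g, s, p, o in store if s == subject)


# Hypothetical partitions from two clusters, for illustration only.
cluster_a = {("cluster:A", "dataset:clicks", "rowCount", "1000")}
cluster_b = {("cluster:B", "dataset:clicks", "replicaOf", "cluster:A")}
store = merge_partitions(cluster_a, cluster_b)
clicks_facts = facts_about(store, "dataset:clicks")
```

After the merge, a single lookup returns facts about the dataset from both clusters, which is the essence of cluster-spanning exploration.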
Thank you!
mirko@cloudera.com
@semanpix
Telstra Presentation GraphSummit Melbourne: Optimising Business Outcomes with...
Neo4j
 
Hands-On GraphRAG Workshop: GraphSummit Melbourne 2024
Hands-On GraphRAG Workshop: GraphSummit Melbourne 2024Hands-On GraphRAG Workshop: GraphSummit Melbourne 2024
Hands-On GraphRAG Workshop: GraphSummit Melbourne 2024
Neo4j
 
Démonstration Digital Twin Building Wire Management
Démonstration Digital Twin Building Wire ManagementDémonstration Digital Twin Building Wire Management
Démonstration Digital Twin Building Wire Management
Neo4j
 
Swiss Life - Les graphes au service de la détection de fraude dans le domaine...
Swiss Life - Les graphes au service de la détection de fraude dans le domaine...Swiss Life - Les graphes au service de la détection de fraude dans le domaine...
Swiss Life - Les graphes au service de la détection de fraude dans le domaine...
Neo4j
 
Démonstration Supply Chain - GraphTalk Paris
Démonstration Supply Chain - GraphTalk ParisDémonstration Supply Chain - GraphTalk Paris
Démonstration Supply Chain - GraphTalk Paris
Neo4j
 
The Art of Possible - GraphTalk Paris Opening Session
The Art of Possible - GraphTalk Paris Opening SessionThe Art of Possible - GraphTalk Paris Opening Session
The Art of Possible - GraphTalk Paris Opening Session
Neo4j
 
How Siemens bolstered supply chain resilience with graph-powered AI insights ...
How Siemens bolstered supply chain resilience with graph-powered AI insights ...How Siemens bolstered supply chain resilience with graph-powered AI insights ...
How Siemens bolstered supply chain resilience with graph-powered AI insights ...
Neo4j
 
Knowledge Graphs for AI-Ready Data and Enterprise Deployment - Gartner IT Sym...
Knowledge Graphs for AI-Ready Data and Enterprise Deployment - Gartner IT Sym...Knowledge Graphs for AI-Ready Data and Enterprise Deployment - Gartner IT Sym...
Knowledge Graphs for AI-Ready Data and Enterprise Deployment - Gartner IT Sym...
Neo4j
 
Neo4j Graph Data Modelling Session - GraphTalk
Neo4j Graph Data Modelling Session - GraphTalkNeo4j Graph Data Modelling Session - GraphTalk
Neo4j Graph Data Modelling Session - GraphTalk
Neo4j
 
Neo4j: The Art of Possible with Graph Technology
Neo4j: The Art of Possible with Graph TechnologyNeo4j: The Art of Possible with Graph Technology
Neo4j: The Art of Possible with Graph Technology
Neo4j
 
Astra Zeneca: How KG and GenAI Revolutionise Biopharma and Life Sciences
Astra Zeneca: How KG and GenAI Revolutionise Biopharma and Life SciencesAstra Zeneca: How KG and GenAI Revolutionise Biopharma and Life Sciences
Astra Zeneca: How KG and GenAI Revolutionise Biopharma and Life Sciences
Neo4j
 

Enterprise Metadata Integration, Cloudera

  • 1. 1© Cloudera, Inc. All rights reserved. Enterprise Metadata Integration Mirko Kämpf | Cloudera GraphConnect 2017 – London
  • 2. Who is speaking? Solutions Architect @ Cloudera - time series analysis, network analysis, data enrichment pipelines - personal interest: QA systems and semantic search. Data Science Activities: The Detection of Emerging Trends Using Wikipedia Traffic Data and Context Networks (PLOS ONE, 2015); Hadoop.TS (IJCA, 2013); Fluctuations in Wikipedia Access-Rate and Edit-Event Data (Physica A, 2012).
  • 3. Our Approach: Multilayer Metadata Integration … • Status dashboards are provided per use case. • Each dashboard offers facts from multiple layers: - (L1) technical layer - (L2) operational metadata (Hadoop specific only) - (L3) application-specific operational metadata - (L4) quality metrics (second-order metadata) • Our Achievements: • Graph database (Neo4J) allows context exploration. • Cluster-spanning metadata exploration is now possible. • Exposure of inherent but sometimes hidden facts becomes as easy as writing an email. Integration of facts to gain business knowledge
  • 4. Intro
  • 5. People have been mining … for centuries! http://www.montanregion-erzgebirge.de/welterbe-erleben/montanregion-fuer-bergbauspezialisten/geschichtliches.html gold & diamonds, ore & coal, minerals, oil … The outcome drives the whole economy
  • 6. People have been using computers … for decades! 1938: Z1, the world’s first freely programmable device, created by Konrad Zuse. 2015: U.S. Department of Energy uses Intel Supercomputer at Argonne National Laboratory. http://www.intel.com/content/dam/www/public/us/en/images/photography-business/RWD/aurora-aerial-reflection-floor-rwd.png http://www.horst-zuse.homepage.t-online.de/z1.html
  • 7. DATA MINING http://codecondo.com/9-free-books-for-learning-data-mining-data-analysis/ Blog: About Learning Data Mining & Data Analysis
  • 8. If data is the new oil … … metadata are the nuggets and brilliants of our age. Screenshot taken from: https://www.quora.com/Who-should-get-credit-for-the-quote-data-is-the-new-oil
  • 9. Diamonds: beautiful even as raw material. Brilliant: result of an expert’s work. Even more exciting in combination with other materials and skills …
  • 10. Success Factors: • Idea & Vision • Material • Skills / Methods • Tools http://www.burkhard-beyer.net/Reportage_Goldschmied.html
  • 11. Be very careful with initial success … … work towards a professional level! High quality and reproducibility are results of professional management. It is hard to believe what you can get and which options arise … Manage overwhelming excitement! Do not start new activities at random …
  • 12. Let’s Think Data Driven! • Build a mid-term or, better, a long-term strategy. • Try to stay independent of any particular technology or tool: data, not the fancy toolset, is what matters most. • After initial success, slow down and control the speed of expansion. • Focus on maximizing the accessibility of data. Google’s goal was to make the data of the internet accessible. You should become your own Google! • Idea & Vision • Material • Skills / Methods • Tools
  • 13. Dataset Profiles / Flow Descriptors • Our material is data & metadata: - Data about data: descriptive data, Dublin Core metadata model, … - Derived data: statistics extracted from processes, documents, … - Results of ML/AI procedures: extracted structure and learned models - Outcome of crowd-based operations: Wikipedia with its inherent structure, communication logs, access and edit history. • Idea & Vision • Material • Skills / Methods • Tools
  • 14. Knowledge Extraction for Better Data Science
  • 15. Science: According to Wikipedia: Science is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe. https://en.wikipedia.org/wiki/Science
  • 16. Data Science: My observation: Commercial Data Science is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the market / business context. https://en.wikipedia.org/wiki/Infographic#/media/File:Gartner_Hype_Cycle_for_Emerging_Technologies.gif
  • 17. Details: Look into nature …
  • 18. Context: Look into nature …
  • 19. Result: Visualization of Facts • An image shows what the text says. > Multi-channel communication • Data Science benefits from such an approach. > Today we still use infographics. Difference: The biologist who created the one on the left observed by eye. Today, we increasingly use data analysis methods.
  • 20. Process: Knowledge Extraction is a Natural Process • Combine multiple sources • Repeat observations • Incorporate context to explain differences/variation • Cross-check to identify anomalies
  • 21. Process: Knowledge Extraction is a Natural Process Knowledge Facts Data
  • 22. How did we implement EMDM? - Hadoop-based: for scalability. - Open Graph Data Model: for flexibility and connectivity. - Data-centric: following the Big Data paradigm.
  • 23. Big Data Processing: e.g., with Hadoop
  • 24. Big Graph Processing on Hadoop: e.g., with Giraph
  • 25. Project Name should stand for: Graphs, Hadoop, and the ecosystem …
  • 26. Project Name should stand for: Graphs, Hadoop, and the ecosystem …
  • 27. Data Science Process Model (DSPM) • DSPM defines core artifacts for knowledge management • Describes analysis / transformation context • Allows repeatable execution • Process properties become measurable • Supports comparison of results from multiple procedures • All those facts are essential ingredients of business optimization. • But: Logging & tracking should never block creativity! • Remember: Scientists often act like artists. • Idea & Vision • Material • Skills / Methods • Tools Toolbox and Management Methods
  • 28. Data Science Process Model (DSPM) • Idea & Vision • Material • Skills / Methods • Tools Representation of domain knowledge (in our case it is data science in general) Human Interaction Ontology Toolbox and Management Methods Ability to solve a problem using IT and data Technology Aspects - represent and interact with facts & data Data Governance Certified QM
  • 29. Semantic Logging • Property with name: (K,V): key-value pair • Property of a thing: S => (K,V): (S,P,O) is a triple; K becomes P, V becomes O • Many of those triples in one common context with name G: G => (S,P,O) is called a quad or named graph • Log4J is the logging standard we build on. • Using structured data instead of plain strings allows easy parsing (e.g., Apache log format). • Triple representation avoids specific parsing and makes log data part of the linked data graph. • Idea & Vision • Material • Skills / Methods • Tools
  • 30. Etosha Toolbox (CONCEPT): Data extractors; Data transformers; Ontology-based orchestration; People and machines contribute facts; Iterative approach with closed feedback loops; Scalable environment … • Idea & Vision • Material • Skills / Methods • Tools
  • 31. INITIAL IMPLEMENTATION: Multi-layer metadata capturing; Operational metrics; Metrics about fast & static data; Business metrics; Contextualized presentation; Ad-hoc queries for exploration; Graph analytics > Knowledge exposure > Self-Service. DS and BI can speak the same language. • Idea & Vision • Material • Skills / Methods • Tools
  • 32. Results: Access Facts & Context of Critical Processes. DEMO of context exploration: https://www.youtube.com/watch?v=ZE7Gcanv90s&feature=youtu.be
  • 33. Results: Better Collaboration for (Hadoop) Knowledge Workers • Our Achievements: • The open graph model is language-, OS-, and hardware-independent. • Merging of knowledge partitions enables cluster-spanning metadata exploration. • Query beans expose facts from multiple stores to a web-based interface. • Next Steps: • Improve implicit triplification (query the Solr index and get RDF data). • Standardize the process and integrate with existing ontologies. • Grow a community … and enter the Apache Incubator.
  • 34. Thank you! [email protected] @semanpix
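
The Semantic Logging slide (29) describes how a named property (K,V) becomes a triple (S,P,O) once it is attached to a thing S, and how triples sharing a context G form quads. A minimal Python sketch of that mapping follows. It is an illustration only, not the Etosha implementation: the type names, the helper functions, the identifiers such as `job:wordcount-17`, and the N-Quads-like line format are all assumptions made for this example.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    subject: str    # S: the thing being described
    predicate: str  # P: taken from the property key K
    obj: str        # O: taken from the property value V


class Quad(NamedTuple):
    graph: str      # G: name of the common context (named graph)
    subject: str
    predicate: str
    obj: str


def properties_to_triples(subject: str, props: Dict[str, str]) -> List[Triple]:
    """Attach key-value pairs to a thing: K becomes P, V becomes O."""
    return [Triple(subject, key, value) for key, value in props.items()]


def triples_to_quads(graph: str, triples: List[Triple]) -> List[Quad]:
    """Place triples in one common context named G."""
    return [Quad(graph, t.subject, t.predicate, t.obj) for t in triples]


def format_quad(q: Quad) -> str:
    """Render a quad as one log line (N-Quads-like order, not validated)."""
    return f'<{q.subject}> <{q.predicate}> "{q.obj}" <{q.graph}> .'


# Hypothetical identifiers, chosen only for illustration.
props = {"status": "SUCCEEDED", "runtimeSeconds": "42"}
triples = properties_to_triples("job:wordcount-17", props)
quads = triples_to_quads("log:cluster-a", triples)
lines = [format_quad(q) for q in quads]
```

Each entry in `lines` is a self-contained statement, so a consumer can split on the fixed positions instead of writing a parser per log format, which is the benefit the slide attributes to the triple representation.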

Editor's Notes

  • #4: Results tell us about very specific properties of the system. Let's look into thermodynamics: http://images.google.de/imgres?imgurl=http%3A%2F%2F3.bp.blogspot.com%2F-tEkIR2kcyCY%2FVEcQJGrqb3I%2FAAAAAAAAABU%2F9Nj4hxeuqa0%2Fs1600%2FTHAI1.jpg&imgrefurl=http%3A%2F%2Fkonwersatorium1-ms-pjwstk.blogspot.com%2F2014%2F10%2Fthe-human-artificial-intelligence_22.html&h=958&w=965&tbnid=WscyQ01kH-s7CM%3A&docid=sGVehcJYs2-e1M&ei=gy6aV4zmJMX1UqSwsYAO&tbm=isch&iact=rc&uact=3&dur=774&page=1&start=0&ndsp=36&ved=0ahUKEwjMs_6BxpbOAhXFuhQKHSRYDOAQMwhEKAowCg&bih=1058&biw=1804 https://openclipart.org/download/242296/remix-fossasia-2016-contest4.svg
  • #6: http://www.montanregion-erzgebirge.de/welterbe-erleben/montanregion-fuer-bergbauspezialisten/geschichtliches.html
  • #7: http://www.horst-zuse.homepage.t-online.de/z1.html Z1: The Z1 is considered the world's first freely programmable computer. It was completed in 1938 and financed entirely from private funds. Konrad Zuse's first computer, the Z1, built between 1936 and 1938, fell victim to the bombs of World War II, and with it all of the construction documents. In 1986, Konrad Zuse decided to rebuild the Z1. The Z1 contains all the building blocks of a modern computer, such as a control unit, program control, memory, micro-sequences, and floating-point arithmetic. Konrad Zuse built the Z1 in his parents' apartment, where they let him use the living room. To build the Z1, Zuse gave up his position at the Henschel aircraft works in 1936 and set up his workshop in his parents' living room. Zuse's parents were not exactly enthusiastic about the project, but they supported him wherever they could.
  • #8: http://codecondo.com/9-free-books-for-learning-data-mining-data-analysis/
  • #9: https://www.quora.com/Who-should-get-credit-for-the-quote-data-is-the-new-oil
  • #10: Top: https://pagewizz.com/edelsteine-1/
  • #11: http://www.burkhard-beyer.net/Reportage_Goldschmied.html
  • #14: This has to be managed or cultivated. Creativity is good but often not scalable!
  • #16: wikipedia
  • #17: https://en.wikipedia.org/wiki/Infographic#/media/File:Gartner_Hype_Cycle_for_Emerging_Technologies.gif
  • #18: wikipedia
  • #19: wikipedia
  • #20: wikipedia
  • #21: http://clipart-work.net/clipart/onion-clipart.html
  • #22: http://clipart-work.net/clipart/onion-clipart.html
  • #30: CSV => Neo4J https://www.youtube.com/watch?v=Eh_79goBRUk https://blog.logentries.com/2016/06/self-describing-logging-using-log4j/ => JSON structure contains meaning => Using a standard format gives us semantic logging
  • #31: There are two key characteristics of RDF stores (aka triple stores): the first and by far the most relevant is that they represent, store and query data as a graph. The second is that they are semantic, which is a rather pompous way of saying that they can store not only data but also explicit descriptions of the meaning of that data. The RDF and linked data community often refer to these explicit descriptions as ontologies. In case you’re not familiar with the concept, an ontology is a machine readable description of a domain that typically includes a vocabulary of terms and some specification of how these terms inter-relate, imposing a structure on the data for such domain. This is also known as a schema. In this post both terms schema and ontology will be used interchangeably to refer to these explicitly described semantics. https://github.com/SciGraph/SciGraph/wiki/Neo4jMapping
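
Note #30 observes that a JSON structure carries meaning and that a standard format yields semantic logging. The sketch below illustrates this idea with Python's standard `logging` module standing in for Log4J, which the slides name as the actual basis. The `JsonFormatter` class, the `facts` key passed through `extra`, and the logger name are assumptions made for this example, not part of any library or of the original project.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Serialize each record as a JSON object instead of a free-form string."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Structured fields arrive via extra={"facts": {...}}; this key is a
        # convention assumed for this sketch, not a logging-module standard.
        payload.update(getattr(record, "facts", {}))
        return json.dumps(payload, sort_keys=True)


# Write to an in-memory buffer so the example is self-contained.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("etl.demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info(
    "job finished",
    extra={"facts": {"job": "wordcount-17", "status": "SUCCEEDED"}},
)

# A consumer needs only a JSON parser, not a format-specific log parser.
record = json.loads(buffer.getvalue())
```

Because every field has a name, each key-value pair in `record` can be turned into a triple about the job without any string matching on the message text.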