SlideShare a Scribd company logo
Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Deep Learning using Spark and DL4J for
fun and profit
Adam Gibson and Dhruv Kumar
2015
Version 1.0
Page2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Who are we?
Adam Gibson
- Co founder of Skymind
- Wrote DeepLearning4J, ND4J
Dhruv Kumar
- Sr Solutions Architect, HWX
- MS Umass, Mahout, ASF
Page3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
In this talk
- What’s Deep Learning?
- Architectures
- Implementation and Libraries in Real Life
- Demo!
Page4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Deep Learning
• One of the many pattern recognition techniques in Data
Science
• Excels at rich media applications:
• Image recognition
• Speech translation
• Voice recognition
• Loosely inspired by human brain models
• Synonymous with Artificial Neural Networks, Multi Layer
Networks
Page5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Enterprise use cases
Page6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Doing this in real life for enterprise
Page7 © Hortonworks Inc. 2011 – 2014. All Rights ReservedPage7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HDP FOR DATA AT
REST
HDF FOR DATA IN
MOTION
ACTIONABLE
INTELLIGENCE
MODERN DATA APPS
Modern Data Applications
in Enterprise: Connected,
Fast, Intelligent
PERISHABLE
INSIGHTS
HISTORICAL
INSIGHTS
INTERNET
OF
ANYTHING
How do we realize MDA in a Hadoop Centric World?
HDF
Hadoop
HDFS
HBase Hive SOLR
YARN
Storm
Service
Management /
Workflow
SIEM
Spark
Raw Network Stream
Network Metadata Stream
Data Stores
Syslog
Raw Application Logs
Other Streaming Telemetry
www.hortonworks.com
NiFi 1
NiFi 2
Storm 1
Kafka 1
Storm 2
Kafka 2
Storm 3
Kafka 3
DataNode 1
HBase 1
Source 1
Source 2
Source 3
Source N
NiFi Nodes
Edge Nodes
Master NodesClients 1
Clients 2
DataNode 2
Hbase 2
DataNode 3
Hbase 3
DataNode 4
Hbase 4
DataNode 5
Hbase 5
DataNode 6
Hbase 6
DataNode 7
Hbase 7
DataNode 8
Hbase 8
DataNode 9 DataNode 10
DataNode 31 DataNode 32
Master 1
Master 2
Master 3
Master 4
Master 5
Worker Nodes
HDF
HDP
World Azure
Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Storm/Spark Streaming
Storm
Detailed Reference Architecture
HDF
Flume
Sink to
HDFS
Transform
Interactive
UI Framework
Hive
Hive
HDFS
HDFS
SOURCE DATA
Server logs
Application Logs
Firewall Logs
CRM/ERP
Sensor
Kafka
Kafka
Stream to
HDF
Forward to
Storm
Real Time Storage
Spark-ML
Pig
Alerts
Bolt to
HDFS
Dashboard
Silk
JMS
Alerts
Hive Server
HiveServer
Reporting
BI Tools
High Speed
Ingest
Real-Time
Batch Interactive
Machine Learning
Models
Spark
Pig
Alerts SQOOP
Flume
Iterative ML
Hbase/Pheonix
HBaseEvent Enrichment
Spark-Thrift
Pig
Page11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
For Model Building: Typical Workflow
11
1.Ingest training data and store it
2.Split data set into: training, testing and validation sets
3.Vectorize and extract features to go into next step
4.Architect multi layer network, initialize
5.Feed data and train
6.Test and Validate
7.Repeat steps 4 and 5 until desired
8.Store model
9.Put model in app, start generalizing on real data.
Page12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
So what do you get?
12
1.Ingest training data and store it using Nifi or other ingest tools
2.Split data set into: training, testing and validation sets
3.Vectorize and extract features to go into next step
4.Architect multi layer network, initialize
5.Feed data and train
6.Test and Validate
7.Repeat steps 4 and 5 until desired
8.Store model
9.Put model in app, start generalizing on real data.
Steps 2, 3, 4 and 5:
Use libraries such as
Deeplearning4j
Page13 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Deeplearning4j Architecture
13
Page14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
DL4J: Canova for Vectorization and Ingest
• Canova uses an input/output format system (similar to
how Hadoop uses MapReduce)
• Supports all major types of input data (text, CSV, audio,
image and video)
• Can be extended for specialized input formats
• Connects to Kafka
14
Page15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
ND4J:
• N-dimensional vector library
• Scientific computing for JVM
• DL4J uses it to do linear algebra for backpropagation
• Supports GPUs via CUDA and Native via Jblas
• Deploys on Android
• DL4J code remains unchanged whether using GPU or
CPU
15
Page16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 16
How to chose a
Neural Net in
DL4J core?
Page17 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Demo!
Page18 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Thank You
hortonworks.com
Ad

More Related Content

What's hot (20)

Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
DataWorks Summit/Hadoop Summit
 
Dynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the fly
DataWorks Summit
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data Insights
DataWorks Summit
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DataWorks Summit
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeN
DataWorks Summit
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
DataWorks Summit/Hadoop Summit
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
DataWorks Summit
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral Processing
DataWorks Summit
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
DataWorks Summit/Hadoop Summit
 
Hybrid Data Platform
Hybrid Data Platform Hybrid Data Platform
Hybrid Data Platform
DataWorks Summit/Hadoop Summit
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
DataWorks Summit/Hadoop Summit
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
DataWorks Summit/Hadoop Summit
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
DataWorks Summit
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
DataWorks Summit
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to Tez
Jan Pieter Posthuma
 
Efficient Data Formats for Analytics with Parquet and Arrow
Efficient Data Formats for Analytics with Parquet and ArrowEfficient Data Formats for Analytics with Parquet and Arrow
Efficient Data Formats for Analytics with Parquet and Arrow
DataWorks Summit/Hadoop Summit
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
DataWorks Summit
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the Cloud
DataWorks Summit/Hadoop Summit
 
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks
 
Dynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the fly
DataWorks Summit
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data Insights
DataWorks Summit
 
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARNDeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DeathStar: Easy, Dynamic, Multi-Tenant HBase via YARN
DataWorks Summit
 
Big Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeNBig Data Simplified - Is all about Ab'strakSHeN
Big Data Simplified - Is all about Ab'strakSHeN
DataWorks Summit
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
DataWorks Summit
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral Processing
DataWorks Summit
 
Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop Real-time Hadoop: The Ideal Messaging System for Hadoop
Real-time Hadoop: The Ideal Messaging System for Hadoop
DataWorks Summit/Hadoop Summit
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
DataWorks Summit/Hadoop Summit
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
DataWorks Summit
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
DataWorks Summit
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to Tez
Jan Pieter Posthuma
 
Efficient Data Formats for Analytics with Parquet and Arrow
Efficient Data Formats for Analytics with Parquet and ArrowEfficient Data Formats for Analytics with Parquet and Arrow
Efficient Data Formats for Analytics with Parquet and Arrow
DataWorks Summit/Hadoop Summit
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
DataWorks Summit
 
Operationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the CloudOperationalizing YARN based Hadoop Clusters in the Cloud
Operationalizing YARN based Hadoop Clusters in the Cloud
DataWorks Summit/Hadoop Summit
 
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks
 

Viewers also liked (20)

Brief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep LearningBrief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep Learning
Adam Gibson
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4j
DataWorks Summit
 
I'm being followed by drones
I'm being followed by dronesI'm being followed by drones
I'm being followed by drones
DataWorks Summit/Hadoop Summit
 
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
Yahoo Developer Network
 
Deep Learning for Java (DL4J)
Deep Learning for Java (DL4J)Deep Learning for Java (DL4J)
Deep Learning for Java (DL4J)
신동 강
 
Distributed deep rl on spark strata singapore
Distributed deep rl on spark   strata singaporeDistributed deep rl on spark   strata singapore
Distributed deep rl on spark strata singapore
Adam Gibson
 
Deep Learning: DL4J and DataVec
Deep Learning: DL4J and DataVecDeep Learning: DL4J and DataVec
Deep Learning: DL4J and DataVec
Josh Patterson
 
Drones and the Internet of Things: realising the potential of airborne comput...
Drones and the Internet of Things: realising the potential of airborne comput...Drones and the Internet of Things: realising the potential of airborne comput...
Drones and the Internet of Things: realising the potential of airborne comput...
Prayukth K V
 
Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016
Adam Gibson
 
Drones Collaboration and IoT enabling digitalization of sensors - Angelo Fien...
Drones Collaboration and IoT enabling digitalization of sensors - Angelo Fien...Drones Collaboration and IoT enabling digitalization of sensors - Angelo Fien...
Drones Collaboration and IoT enabling digitalization of sensors - Angelo Fien...
Codemotion
 
Big Data Analytics Tokyo
Big Data Analytics TokyoBig Data Analytics Tokyo
Big Data Analytics Tokyo
Adam Gibson
 
Recurrent nets and sensors
Recurrent nets and sensorsRecurrent nets and sensors
Recurrent nets and sensors
Adam Gibson
 
Hubba Deep Learning
Hubba Deep LearningHubba Deep Learning
Hubba Deep Learning
Ivan Goloskokovic
 
Introduction to TensorFlow
Introduction to TensorFlowIntroduction to TensorFlow
Introduction to TensorFlow
Matthias Feys
 
Business Disruption with Robots, Drones and Bots The IoT Tech Expo Oct 20 2016
Business Disruption with Robots, Drones and Bots The IoT Tech Expo Oct 20 2016Business Disruption with Robots, Drones and Bots The IoT Tech Expo Oct 20 2016
Business Disruption with Robots, Drones and Bots The IoT Tech Expo Oct 20 2016
Sudha Jamthe
 
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshopDeep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Hortonworks
 
large scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraphlarge scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraph
DataWorks Summit
 
Future of ai on the jvm
Future of ai on the jvmFuture of ai on the jvm
Future of ai on the jvm
Adam Gibson
 
Skymind 深度学习 - T11 Summit
Skymind 深度学习 - T11 SummitSkymind 深度学习 - T11 Summit
Skymind 深度学习 - T11 Summit
Shu Wei Goh
 
Deep Learning on Production with Spark
Deep Learning on Production with SparkDeep Learning on Production with Spark
Deep Learning on Production with Spark
Shu Wei Goh
 
Brief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep LearningBrief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep Learning
Adam Gibson
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4j
DataWorks Summit
 
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
August 2016 HUG: Better together: Fast Data with Apache Spark™ and Apache Ign...
Yahoo Developer Network
 
Deep Learning for Java (DL4J)
Deep Learning for Java (DL4J)Deep Learning for Java (DL4J)
Deep Learning for Java (DL4J)
신동 강
 
Distributed deep rl on spark strata singapore
Distributed deep rl on spark   strata singaporeDistributed deep rl on spark   strata singapore
Distributed deep rl on spark strata singapore
Adam Gibson
 
Deep Learning: DL4J and DataVec
Deep Learning: DL4J and DataVecDeep Learning: DL4J and DataVec
Deep Learning: DL4J and DataVec
Josh Patterson
 
Drones and the Internet of Things: realising the potential of airborne comput...
Drones and the Internet of Things: realising the potential of airborne comput...Drones and the Internet of Things: realising the potential of airborne comput...
Drones and the Internet of Things: realising the potential of airborne comput...
Prayukth K V
 
Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016
Adam Gibson
 
Drones Collaboration and IoT enabling digitalization of sensors - Angelo Fien...
Drones Collaboration and IoT enabling digitalization of sensors - Angelo Fien...Drones Collaboration and IoT enabling digitalization of sensors - Angelo Fien...
Drones Collaboration and IoT enabling digitalization of sensors - Angelo Fien...
Codemotion
 
Big Data Analytics Tokyo
Big Data Analytics TokyoBig Data Analytics Tokyo
Big Data Analytics Tokyo
Adam Gibson
 
Recurrent nets and sensors
Recurrent nets and sensorsRecurrent nets and sensors
Recurrent nets and sensors
Adam Gibson
 
Introduction to TensorFlow
Introduction to TensorFlowIntroduction to TensorFlow
Introduction to TensorFlow
Matthias Feys
 
Business Disruption with Robots, Drones and Bots The IoT Tech Expo Oct 20 2016
Business Disruption with Robots, Drones and Bots The IoT Tech Expo Oct 20 2016Business Disruption with Robots, Drones and Bots The IoT Tech Expo Oct 20 2016
Business Disruption with Robots, Drones and Bots The IoT Tech Expo Oct 20 2016
Sudha Jamthe
 
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshopDeep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Hortonworks
 
large scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraphlarge scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraph
DataWorks Summit
 
Future of ai on the jvm
Future of ai on the jvmFuture of ai on the jvm
Future of ai on the jvm
Adam Gibson
 
Skymind 深度学习 - T11 Summit
Skymind 深度学习 - T11 SummitSkymind 深度学习 - T11 Summit
Skymind 深度学习 - T11 Summit
Shu Wei Goh
 
Deep Learning on Production with Spark
Deep Learning on Production with SparkDeep Learning on Production with Spark
Deep Learning on Production with Spark
Shu Wei Goh
 
Ad

Similar to Deep Learning using Spark and DL4J for fun and profit (20)

Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
Hortonworks
 
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
Raúl Marín
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
Hortonworks
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 
Curb your insecurity with HDP
Curb your insecurity with HDPCurb your insecurity with HDP
Curb your insecurity with HDP
DataWorks Summit/Hadoop Summit
 
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Pardeep Kumar Mishra (Big Data / Hadoop Consultant)
 
Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0
Rommel Garcia
 
Realtime Analytics in Hadoop
Realtime Analytics in HadoopRealtime Analytics in Hadoop
Realtime Analytics in Hadoop
Rommel Garcia
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
Hortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Innovative Management Services
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
Ameet Paranjape
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with spark
Hortonworks
 
Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop Solution
MapR Technologies
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
Mac Moore
 
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
HortonworksJapan
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Hortonworks
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
Hortonworks
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
skumpf
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]
Hortonworks
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
Hortonworks
 
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
[Hortonworks] Future Of Data: Madrid - HDF & Data in motion
Raúl Marín
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
Hortonworks
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 
Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0Realtime analytics + hadoop 2.0
Realtime analytics + hadoop 2.0
Rommel Garcia
 
Realtime Analytics in Hadoop
Realtime Analytics in HadoopRealtime Analytics in Hadoop
Realtime Analytics in Hadoop
Rommel Garcia
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
Hortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Innovative Management Services
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
Ameet Paranjape
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with spark
Hortonworks
 
Webinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop SolutionWebinar: Selecting the Right SQL-on-Hadoop Solution
Webinar: Selecting the Right SQL-on-Hadoop Solution
MapR Technologies
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
Mac Moore
 
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
HortonworksJapan
 
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Discover hdp 2.2: Data storage innovations in Hadoop Distributed Filesystem (...
Hortonworks
 
Discover hdp 2.2 hdfs - final
Discover hdp 2.2   hdfs - finalDiscover hdp 2.2   hdfs - final
Discover hdp 2.2 hdfs - final
Hortonworks
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
skumpf
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]
Hortonworks
 
Ad

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit/Hadoop Summit
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
DataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
DataWorks Summit/Hadoop Summit
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
DataWorks Summit/Hadoop Summit
 

Recently uploaded (20)

IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 

Deep Learning using Spark and DL4J for fun and profit

  • 1. Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Deep Learning using Spark and DL4J for fun and profit Adam Gibson and Dhruv Kumar 2015 Version 1.0
  • 2. Page2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Who are we? Adam Gibson - Co founder of Skymind - Wrote DeepLearning4J, ND4J Dhruv Kumar - Sr Solutions Architect, HWX - MS Umass, Mahout, ASF
  • 3. Page3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved In this talk - What’s Deep Learning? - Architectures - Implementation and Libraries in Real Life - Demo!
  • 4. Page4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Deep Learning • One of the many pattern recognition techniques in Data Science • Excels at rich media applications: • Image recognition • Speech translation • Voice recognition • Loosely inspired by human brain models • Synonymous with Artificial Neural Networks, Multi Layer Networks
  • 5. Page5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Enterprise use cases
  • 6. Page6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Doing this in real life for enterprise
  • 7. Page7 © Hortonworks Inc. 2011 – 2014. All Rights ReservedPage7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HDP FOR DATA AT REST HDF FOR DATA IN MOTION ACTIONABLE INTELLIGENCE MODERN DATA APPS Modern Data Applications in Enterprise: Connected, Fast, Intelligent PERISHABLE INSIGHTS HISTORICAL INSIGHTS INTERNET OF ANYTHING
  • 8. How do we realize MDA in a Hadoop Centric World? HDF Hadoop HDFS HBase Hive SOLR YARN Storm Service Management / Workflow SIEM Spark Raw Network Stream Network Metadata Stream Data Stores Syslog Raw Application Logs Other Streaming Telemetry
  • 9. www.hortonworks.com NiFi 1 NiFi 2 Storm 1 Kafka 1 Storm 2 Kafka 2 Storm 3 Kafka 3 DataNode 1 HBase 1 Source 1 Source 2 Source 3 Source N NiFi Nodes Edge Nodes Master NodesClients 1 Clients 2 DataNode 2 Hbase 2 DataNode 3 Hbase 3 DataNode 4 Hbase 4 DataNode 5 Hbase 5 DataNode 6 Hbase 6 DataNode 7 Hbase 7 DataNode 8 Hbase 8 DataNode 9 DataNode 10 DataNode 31 DataNode 32 Master 1 Master 2 Master 3 Master 4 Master 5 Worker Nodes HDF HDP World Azure
  • 10. Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Storm/Spark Streaming Storm Detailed Reference Architecture HDF Flume Sink to HDFS Transform Interactive UI Framework Hive Hive HDFS HDFS SOURCE DATA Server logs Application Logs Firewall Logs CRM/ERP Sensor Kafka Kafka Stream to HDF Forward to Storm Real Time Storage Spark-ML Pig Alerts Bolt to HDFS Dashboard Silk JMS Alerts Hive Server HiveServer Reporting BI Tools High Speed Ingest Real-Time Batch Interactive Machine Learning Models Spark Pig Alerts SQOOP Flume Iterative ML Hbase/Pheonix HBaseEvent Enrichment Spark-Thrift Pig
  • 11. Page11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved For Model Building: Typical Workflow 11 1.Ingest training data and store it 2.Split data set into: training, testing and validation sets 3.Vectorize and extract features to go into next step 4.Architect multi layer network, initialize 5.Feed data and train 6.Test and Validate 7.Repeat steps 4 and 5 until desired 8.Store model 9.Put model in app, start generalizing on real data.
  • 12. Page12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved So what do you get? 12 1.Ingest training data and store it using Nifi or other ingest tools 2.Split data set into: training, testing and validation sets 3.Vectorize and extract features to go into next step 4.Architect multi layer network, initialize 5.Feed data and train 6.Test and Validate 7.Repeat steps 4 and 5 until desired 8.Store model 9.Put model in app, start generalizing on real data. Steps 2, 3, 4 and 5: Use libraries such as Deeplearning4j
  • 13. Page13 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Deeplearning4j Architecture 13
  • 14. Page14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved DL4J: Canova for Vectorization and Ingest • Canova uses an input/output format system (similar to how Hadoop uses MapReduce) • Supports all major types of input data (text, CSV, audio, image and video) • Can be extended for specialized input formats • Connects to Kafka 14
  • 15. Page15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved ND4J: • N-dimensional vector library • Scientific computing for JVM • DL4J uses it to do linear algebra for backpropagation • Supports GPUs via CUDA and Native via Jblas • Deploys on Android • DL4J code remains unchanged whether using GPU or CPU 15
  • 16. Page16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 16 How to chose a Neural Net in DL4J core?
  • 17. Page17 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Demo!
  • 18. Page18 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Thank You hortonworks.com

Editor's Notes

  • #8: TALK TRACK I’m about to go over the products, consulting and training that Hortonworks offers, and I want you to keep this image in mind. Remember: The Internet of Anything is doubling the amount of data in the world every 2 years. Connected Data Platforms deliver an open-architected solution to manage data, both in motion and at rest, empowering your organization to gain Actionable Intelligence delivered to your end users through Modern Data Apps. Hortonworks DataFlow (aka HDF) manages your data in motion—bringing it to where you need it for real-time analysis to capture perishable insights or into storage for historical analysis. Hortonworks Data Platform (aka HDP) stores the data at rest and provides historical insights through deep, detailed analysis of everything that’s already happened. Those historical insights from HDP help optimize your data ingest with HDF, which in turn optimizes your data at rest. This is how HDF, HDP, and Modern Data Applications deliver actionable intelligence to your end users. And Actionable Intelligence is the beating heart animating the Future of Data. [NEXT SLIDE]
  • #9: CapOne – Ingesting from everywhere Email, Syslog, Applog, Netflow… Moving to “Cloud Only model”….even looking to use “docker Containers” in Amazon…
  • #11: The team puts together a detailed architecture of the proposed solution using HDP and HDF. The architecture considers sources data from the numerous sources including Server Logs, Application Logs, XML and Senso data. This data is easily accepted into the flexible schema of HDP using HDF and Sqoop. The data is processed using Pig and analyzed using Spark. Then the data is made available in a real-time dashboard as well as to visualization and reporting tools. [NEXT SLIDE]