SlideShare a Scribd company logo
JDG 7 & Spark
Integration
Infinispan Spark connector
tedwon
Agenda
● Brief Introduction to JBoss Data Grid 7
● Brief Introduction to Apache Spark
● Features of Infinispan Spark connector
● Demo
What is JBoss Data Grid
● Distributed cache
● In-memory NoSQL
● Key/value data store
● Built from the Infinispan community project
● In architecture, there is no master node
● Peer-to-peer membership clustering
● The most strength is extremely highly availability and scalability as in-memory
○ In my personal opinion
● Good harmony with real-time data processing framework for data services
● As a metaphor, In-memory HDFS for real-time processing
What is JBoss Data Grid 7
● JDG 7 was released at July, 2016
● JDG 7 is based on Infinispan 8.3.0 and EAP 7.0.0.GA
○ The previous version JDG 6.6.0 is based on Infinispan 6.4.0
● JDG 7 now becomes much more powerful as a competitive software product
● There are new features and enhancements
○ New GUI Admin console
○ Easy to install and configure clustering and cache
○ Easy to monitor and change configuration
○ Easy to create a new cache through admin console even in runtime
○ Provides API for integration with Apache Spark
JDG 7 - New Features and Enhancements
1. DISTRIBUTED STREAMS
2. REMOTE TASK EXECUTION
3. APACHE SPARK INTEGRATION
4. APACHE HADOOP INTEGRATION
5. NEW ADMINISTRATION CONSOLE FOR SERVER DEPLOYMENTS
6. CONTROLLED SHUTDOWN AND RESTART OF CLUSTER
7. NODE.JS (JAVASCRIPT) HOT ROD CLIENT
8. CASSANDRA CACHE STORE
9. HOT ROD C++ ENHANCEMENTS
10. HOT ROD C# ENHANCEMENTS
What is Apache Spark
● General-purpose Lightning-fast cluster computing framework
● General-purpose
○ One common data processing engine
○ Batch, SQL, Streaming, MLlib, Graph
○ One platform to rule them all
● Lightning-fast
○ RDD - Resilient Distributed Dataset
○ Read-only multiset of data items distributed over a cluster of machines
● Works with Hadoop HDFS and process that data in parallel
○ Distributed file System
● MapReduce interfaces with funtional programming style
○ No MR chaining workflow in Hadoop
What is Apache Spark
Apache Spark - One platform to rule them all
What is Apache Spark RDD
● RDD consists of mulitle data partitions over a cluster of machines
● RDD is intermediate data during a Spark data processing job
● RDD is distributed in memory of each worker node JVM
● Data processing job is transformations of RDDs
● RDDs are disappeared after finishing a job
● Not able to share RDDs between Spark jobs
What is Apache Spark RDD
Infinispan Spark connector
JDG 7’s new feautre
What is Infinispan Spark connector
● https://github.com/infinispan/infinispan-spark
● JDG 7 Document https://goo.gl/9BXp98
● RDD and DStream integration with Apache Spark 1.6
○ DStream is a continuous sequence of RDDs for real-time stream processing
● Use JDG as a data source for Spark
● Easy to read & write cache data in a Spark job
● Provides seamless funtional programming style and syntactic sugar
● Good to share RDD with other Spark jobs
Supported Configurations
Features of Infinispan Spark connector
1. Create an Spark RDD from a JDG cache data
○ Read cache data from Spark job
2. Write any key/value based RDD to a JDG cache
○ Write intermediate or final data to cache
3. Create a Spark DStream from cache-level events
○ Insert, Modify and Delete event in a cache
4. Write any key/value DStream to JDG
○ Write any DStream to cache
5. Use JDG server side filters to create a cache based RDD
○ Using Infinispan Query DSL
Prerequisite & Version compatibility
● JDK 8, Scala 2.10.4, Spark 1.6
● JBoss Data Grid 7.0.0 Server
○ Infinispan Server 8.2.4.Final
● Run JDG in Remote Client-Server mode
Demo
https://github.com/tedwon/infinispan-spark-connector-examples
Performance Considerations
● The number of Spark workers should be greater than JDG nodes
○ To take advantage of the parallelism
● Support locality with co-located in the same node as JDG and Spark worker
○ Spark worker only processes data in the local node with connector
Thank you
References
1. https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/7.0/html-single/Developer_Guide/index.
html#Integration_with_Apache_Spark
2. http://spark.apache.org/docs/1.6.2/programming-guide.html
3. https://github.com/infinispan/infinispan-spark
4. https://github.com/infinispan/infinispan
5. https://github.com/tedwon/infinispan-spark-connector-examples
6. https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/7.0/html-single/7.0.0_Release_Notes/in
dex.html#chap-New_Features_and_Enhancements
7. https://en.wikipedia.org/wiki/Apache_Spark
8. https://hub.docker.com/r/gustavonalle/infinispan-spark/
Ad

More Related Content

What's hot (20)

Infinispan, transactional key value data grid and nosql database
Infinispan, transactional key value data grid and nosql databaseInfinispan, transactional key value data grid and nosql database
Infinispan, transactional key value data grid and nosql database
Alexander Petrov
 
ClustrixDB at Samsung Cloud
ClustrixDB at Samsung CloudClustrixDB at Samsung Cloud
ClustrixDB at Samsung Cloud
MariaDB plc
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of view
Pierre Baillet
 
MongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin HansonMongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin Hanson
hungarianhc
 
Cosmosdb graph
Cosmosdb graphCosmosdb graph
Cosmosdb graph
Mohit Chhabra
 
GridFS: The Perfect Solution for Media Storage
GridFS: The Perfect Solution for Media StorageGridFS: The Perfect Solution for Media Storage
GridFS: The Perfect Solution for Media Storage
MongoDB
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
MongoDB
 
Conquering Data Migration from Oracle to Postgres
Conquering Data Migration from Oracle to PostgresConquering Data Migration from Oracle to Postgres
Conquering Data Migration from Oracle to Postgres
EDB
 
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
Summer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointSummer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpoint
Christopher Dubois
 
WiredTiger Overview
WiredTiger OverviewWiredTiger Overview
WiredTiger Overview
WiredTiger
 
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applicationsFOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
Ashnikbiz
 
Spark Core
Spark CoreSpark Core
Spark Core
Todd McGrath
 
Azure document db/Cosmos DB
Azure document db/Cosmos DBAzure document db/Cosmos DB
Azure document db/Cosmos DB
Mohit Chhabra
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at Hotstar
KafkaZone
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon
 
An Overview of Apache Spark
An Overview of Apache SparkAn Overview of Apache Spark
An Overview of Apache Spark
Yasoda Jayaweera
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDB
ElieHannouch
 
MongoDB Administration 101
MongoDB Administration 101MongoDB Administration 101
MongoDB Administration 101
MongoDB
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
lucenerevolution
 
Infinispan, transactional key value data grid and nosql database
Infinispan, transactional key value data grid and nosql databaseInfinispan, transactional key value data grid and nosql database
Infinispan, transactional key value data grid and nosql database
Alexander Petrov
 
ClustrixDB at Samsung Cloud
ClustrixDB at Samsung CloudClustrixDB at Samsung Cloud
ClustrixDB at Samsung Cloud
MariaDB plc
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of view
Pierre Baillet
 
MongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin HansonMongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin Hanson
hungarianhc
 
GridFS: The Perfect Solution for Media Storage
GridFS: The Perfect Solution for Media StorageGridFS: The Perfect Solution for Media Storage
GridFS: The Perfect Solution for Media Storage
MongoDB
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
MongoDB
 
Conquering Data Migration from Oracle to Postgres
Conquering Data Migration from Oracle to PostgresConquering Data Migration from Oracle to Postgres
Conquering Data Migration from Oracle to Postgres
EDB
 
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB
 
Summer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointSummer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpoint
Christopher Dubois
 
WiredTiger Overview
WiredTiger OverviewWiredTiger Overview
WiredTiger Overview
WiredTiger
 
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applicationsFOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
Ashnikbiz
 
Azure document db/Cosmos DB
Azure document db/Cosmos DBAzure document db/Cosmos DB
Azure document db/Cosmos DB
Mohit Chhabra
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at Hotstar
KafkaZone
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon
 
An Overview of Apache Spark
An Overview of Apache SparkAn Overview of Apache Spark
An Overview of Apache Spark
Yasoda Jayaweera
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDB
ElieHannouch
 
MongoDB Administration 101
MongoDB Administration 101MongoDB Administration 101
MongoDB Administration 101
MongoDB
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
lucenerevolution
 

Viewers also liked (20)

Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with Esper
Ted Won
 
RHQ 공감 Seminar 6th
RHQ 공감 Seminar 6thRHQ 공감 Seminar 6th
RHQ 공감 Seminar 6th
Ted Won
 
JBoss Community's Application Monitoring Platform
JBoss Community's Application Monitoring PlatformJBoss Community's Application Monitoring Platform
JBoss Community's Application Monitoring Platform
Ted Won
 
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
Ted Won
 
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Ted Won
 
Nara - Personalized Web Recommendation Service Quick Review
Nara - Personalized Web Recommendation Service Quick ReviewNara - Personalized Web Recommendation Service Quick Review
Nara - Personalized Web Recommendation Service Quick Review
Ted Won
 
Real-time Big Data Analytics Practice with Unstructured Data
Real-time Big Data Analytics Practice with Unstructured DataReal-time Big Data Analytics Practice with Unstructured Data
Real-time Big Data Analytics Practice with Unstructured Data
Ted Won
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Cloudera, Inc.
 
Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with Esper
Ted Won
 
How to Avoid Problems with Lump-sum Relocation Allowances
How to Avoid Problems with Lump-sum Relocation AllowancesHow to Avoid Problems with Lump-sum Relocation Allowances
How to Avoid Problems with Lump-sum Relocation Allowances
Parsifal Corporation
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
Hari Shreedharan
 
AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based Programming
Samuel Lampa
 
지금 핫한 Real-time In-memory Stream Processing 이야기
지금 핫한 Real-time In-memory Stream Processing 이야기지금 핫한 Real-time In-memory Stream Processing 이야기
지금 핫한 Real-time In-memory Stream Processing 이야기
Ted Won
 
Tools For jQuery Application Architecture (Extended Slides)
Tools For jQuery Application Architecture (Extended Slides)Tools For jQuery Application Architecture (Extended Slides)
Tools For jQuery Application Architecture (Extended Slides)
Addy Osmani
 
SimplifyStreamingArchitecture
SimplifyStreamingArchitectureSimplifyStreamingArchitecture
SimplifyStreamingArchitecture
Maheedhar Gunturu
 
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Ververica
 
Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
Hari Shreedharan
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Charles Givre
 
Going bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data typesGoing bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data types
Pawel Szulc
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFi
Lev Brailovskiy
 
Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with Esper
Ted Won
 
RHQ 공감 Seminar 6th
RHQ 공감 Seminar 6thRHQ 공감 Seminar 6th
RHQ 공감 Seminar 6th
Ted Won
 
JBoss Community's Application Monitoring Platform
JBoss Community's Application Monitoring PlatformJBoss Community's Application Monitoring Platform
JBoss Community's Application Monitoring Platform
Ted Won
 
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
Ted Won
 
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Ted Won
 
Nara - Personalized Web Recommendation Service Quick Review
Nara - Personalized Web Recommendation Service Quick ReviewNara - Personalized Web Recommendation Service Quick Review
Nara - Personalized Web Recommendation Service Quick Review
Ted Won
 
Real-time Big Data Analytics Practice with Unstructured Data
Real-time Big Data Analytics Practice with Unstructured DataReal-time Big Data Analytics Practice with Unstructured Data
Real-time Big Data Analytics Practice with Unstructured Data
Ted Won
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Cloudera, Inc.
 
Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with Esper
Ted Won
 
How to Avoid Problems with Lump-sum Relocation Allowances
How to Avoid Problems with Lump-sum Relocation AllowancesHow to Avoid Problems with Lump-sum Relocation Allowances
How to Avoid Problems with Lump-sum Relocation Allowances
Parsifal Corporation
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
Hari Shreedharan
 
AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based Programming
Samuel Lampa
 
지금 핫한 Real-time In-memory Stream Processing 이야기
지금 핫한 Real-time In-memory Stream Processing 이야기지금 핫한 Real-time In-memory Stream Processing 이야기
지금 핫한 Real-time In-memory Stream Processing 이야기
Ted Won
 
Tools For jQuery Application Architecture (Extended Slides)
Tools For jQuery Application Architecture (Extended Slides)Tools For jQuery Application Architecture (Extended Slides)
Tools For jQuery Application Architecture (Extended Slides)
Addy Osmani
 
SimplifyStreamingArchitecture
SimplifyStreamingArchitectureSimplifyStreamingArchitecture
SimplifyStreamingArchitecture
Maheedhar Gunturu
 
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Ververica
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Charles Givre
 
Going bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data typesGoing bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data types
Pawel Szulc
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFi
Lev Brailovskiy
 
Ad

Similar to JDG 7 & Spark Integration (20)

Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analytics
inoshg
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
Synergetics Learning and Cloud Consulting
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
datamantra
 
Apache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource ManagerApache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource Manager
haridasnss
 
Retour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantRetour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenant
Swiss Data Forum Swiss Data Forum
 
Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009
fschupp
 
Apache Spark e AWS Glue
Apache Spark e AWS GlueApache Spark e AWS Glue
Apache Spark e AWS Glue
Laercio Serra
 
Zeppelin and spark sql demystified
Zeppelin and spark sql demystifiedZeppelin and spark sql demystified
Zeppelin and spark sql demystified
Omid Vahdaty
 
spark ...................................
spark ...................................spark ...................................
spark ...................................
itsTIM66
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for Beginners
Anirudh
 
Spark Worshop
Spark WorshopSpark Worshop
Spark Worshop
Juan Pedro Moreno
 
Putting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech AnalysticsPutting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech Analystics
Gareth Rogers
 
Apache Spark
Apache SparkApache Spark
Apache Spark
masifqadri
 
Spark
SparkSpark
Spark
fatemehjamalii
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
Juan Pedro Moreno
 
4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf
4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf
4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf
yafora8192
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
Robbie Strickland
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, Streaming
Petr Zapletal
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Databricks
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
Robert Sanders
 
Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analytics
inoshg
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
datamantra
 
Apache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource ManagerApache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource Manager
haridasnss
 
Retour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantRetour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenant
Swiss Data Forum Swiss Data Forum
 
Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009
fschupp
 
Apache Spark e AWS Glue
Apache Spark e AWS GlueApache Spark e AWS Glue
Apache Spark e AWS Glue
Laercio Serra
 
Zeppelin and spark sql demystified
Zeppelin and spark sql demystifiedZeppelin and spark sql demystified
Zeppelin and spark sql demystified
Omid Vahdaty
 
spark ...................................
spark ...................................spark ...................................
spark ...................................
itsTIM66
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for Beginners
Anirudh
 
Putting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech AnalysticsPutting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech Analystics
Gareth Rogers
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
Juan Pedro Moreno
 
4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf
4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf
4Introduction+to+Spark.pptx sdfsdfsdfsdfsdf
yafora8192
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
Robbie Strickland
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, Streaming
Petr Zapletal
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Databricks
 
Ad

More from Ted Won (9)

Undertow RequestBufferingHandler 소개
Undertow RequestBufferingHandler 소개Undertow RequestBufferingHandler 소개
Undertow RequestBufferingHandler 소개
Ted Won
 
JBoss EAP 7 & JDG 7 최신 기술 소개
JBoss EAP 7 & JDG 7 최신 기술 소개JBoss EAP 7 & JDG 7 최신 기술 소개
JBoss EAP 7 & JDG 7 최신 기술 소개
Ted Won
 
JBoss Modules Internal
JBoss Modules InternalJBoss Modules Internal
JBoss Modules Internal
Ted Won
 
오픈 소스 컨트리뷰션 가이드
오픈 소스 컨트리뷰션 가이드오픈 소스 컨트리뷰션 가이드
오픈 소스 컨트리뷰션 가이드
Ted Won
 
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Ted Won
 
Jenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Jenkins X - automated CI/CD solution for cloud native applications on KubernetesJenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Jenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Ted Won
 
Hawkular overview
Hawkular overviewHawkular overview
Hawkular overview
Ted Won
 
Building Real-time CEP Application with Open Source Projects
Building Real-time CEP Application with Open Source Projects Building Real-time CEP Application with Open Source Projects
Building Real-time CEP Application with Open Source Projects
Ted Won
 
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
Ted Won
 
Undertow RequestBufferingHandler 소개
Undertow RequestBufferingHandler 소개Undertow RequestBufferingHandler 소개
Undertow RequestBufferingHandler 소개
Ted Won
 
JBoss EAP 7 & JDG 7 최신 기술 소개
JBoss EAP 7 & JDG 7 최신 기술 소개JBoss EAP 7 & JDG 7 최신 기술 소개
JBoss EAP 7 & JDG 7 최신 기술 소개
Ted Won
 
JBoss Modules Internal
JBoss Modules InternalJBoss Modules Internal
JBoss Modules Internal
Ted Won
 
오픈 소스 컨트리뷰션 가이드
오픈 소스 컨트리뷰션 가이드오픈 소스 컨트리뷰션 가이드
오픈 소스 컨트리뷰션 가이드
Ted Won
 
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Ted Won
 
Jenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Jenkins X - automated CI/CD solution for cloud native applications on KubernetesJenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Jenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Ted Won
 
Hawkular overview
Hawkular overviewHawkular overview
Hawkular overview
Ted Won
 
Building Real-time CEP Application with Open Source Projects
Building Real-time CEP Application with Open Source Projects Building Real-time CEP Application with Open Source Projects
Building Real-time CEP Application with Open Source Projects
Ted Won
 
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
Ted Won
 

Recently uploaded (20)

Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025
kashifyounis067
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and CollaborateMeet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Maxim Salnikov
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
Maxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINKMaxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINK
younisnoman75
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Societal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainabilitySocietal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainability
Jordi Cabot
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New VersionPixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
saimabibi60507
 
Adobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest VersionAdobe Illustrator Crack FREE Download 2025 Latest Version
Adobe Illustrator Crack FREE Download 2025 Latest Version
kashifyounis067
 
Automation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath CertificateAutomation Techniques in RPA - UiPath Certificate
Automation Techniques in RPA - UiPath Certificate
VICTOR MAESTRE RAMIREZ
 
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage DashboardsAdobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
Adobe Marketo Engage Champion Deep Dive - SFDC CRM Synch V2 & Usage Dashboards
BradBedford3
 
Kubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptxKubernetes_101_Zero_to_Platform_Engineer.pptx
Kubernetes_101_Zero_to_Platform_Engineer.pptx
CloudScouts
 
How can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptxHow can one start with crypto wallet development.pptx
How can one start with crypto wallet development.pptx
laravinson24
 
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Exceptional Behaviors: How Frequently Are They Tested? (AST 2025)
Andre Hora
 
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software DevelopmentSecure Test Infrastructure: The Backbone of Trustworthy Software Development
Secure Test Infrastructure: The Backbone of Trustworthy Software Development
Shubham Joshi
 
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AIScaling GraphRAG:  Efficient Knowledge Retrieval for Enterprise AI
Scaling GraphRAG: Efficient Knowledge Retrieval for Enterprise AI
danshalev
 
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
F-Secure Freedome VPN 2025 Crack Plus Activation  New VersionF-Secure Freedome VPN 2025 Crack Plus Activation  New Version
F-Secure Freedome VPN 2025 Crack Plus Activation New Version
saimabibi60507
 
Exploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the FutureExploring Wayland: A Modern Display Server for the Future
Exploring Wayland: A Modern Display Server for the Future
ICS
 
Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025Adobe After Effects Crack FREE FRESH version 2025
Adobe After Effects Crack FREE FRESH version 2025
kashifyounis067
 
Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025Avast Premium Security Crack FREE Latest Version 2025
Avast Premium Security Crack FREE Latest Version 2025
mu394968
 
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and CollaborateMeet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Meet the Agents: How AI Is Learning to Think, Plan, and Collaborate
Maxim Salnikov
 
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
Interactive odoo dashboards for sales, CRM , Inventory, Invoice, Purchase, Pr...
AxisTechnolabs
 
Maxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINKMaxon CINEMA 4D 2025 Crack FREE Download LINK
Maxon CINEMA 4D 2025 Crack FREE Download LINK
younisnoman75
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Societal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainabilitySocietal challenges of AI: biases, multilinguism and sustainability
Societal challenges of AI: biases, multilinguism and sustainability
Jordi Cabot
 
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Proactive Vulnerability Detection in Source Code Using Graph Neural Networks:...
Ranjan Baisak
 
Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]Get & Download Wondershare Filmora Crack Latest [2025]
Get & Download Wondershare Filmora Crack Latest [2025]
saniaaftab72555
 
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New VersionPixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
Pixologic ZBrush Crack Plus Activation Key [Latest 2025] New Version
saimabibi60507
 

JDG 7 & Spark Integration

  • 1. JDG 7 & Spark Integration Infinispan Spark connector tedwon
  • 2. Agenda ● Brief Introduction to JBoss Data Grid 7 ● Brief Introduction to Apache Spark ● Features of Infinispan Spark connector ● Demo
  • 3. What is JBoss Data Grid ● Distributed cache ● In-memory NoSQL ● Key/value data store ● Built from the Infinispan community project ● In architecture, there is no master node ● Peer-to-peer membership clustering ● The most strength is extremely highly availability and scalability as in-memory ○ In my personal opinion ● Good harmony with real-time data processing framework for data services ● As a metaphor, In-memory HDFS for real-time processing
  • 4. What is JBoss Data Grid 7 ● JDG 7 was released at July, 2016 ● JDG 7 is based on Infinispan 8.3.0 and EAP 7.0.0.GA ○ The previous version JDG 6.6.0 is based on Infinispan 6.4.0 ● JDG 7 now becomes much more powerful as a competitive software product ● There are new features and enhancements ○ New GUI Admin console ○ Easy to install and configure clustering and cache ○ Easy to monitor and change configuration ○ Easy to create a new cache through admin console even in runtime ○ Provides API for integration with Apache Spark
  • 5. JDG 7 - New Features and Enhancements 1. DISTRIBUTED STREAMS 2. REMOTE TASK EXECUTION 3. APACHE SPARK INTEGRATION 4. APACHE HADOOP INTEGRATION 5. NEW ADMINISTRATION CONSOLE FOR SERVER DEPLOYMENTS 6. CONTROLLED SHUTDOWN AND RESTART OF CLUSTER 7. NODE.JS (JAVASCRIPT) HOT ROD CLIENT 8. CASSANDRA CACHE STORE 9. HOT ROD C++ ENHANCEMENTS 10. HOT ROD C# ENHANCEMENTS
  • 7. ● General-purpose Lightning-fast cluster computing framework ● General-purpose ○ One common data processing engine ○ Batch, SQL, Streaming, MLlib, Graph ○ One platform to rule them all ● Lightning-fast ○ RDD - Resilient Distributed Dataset ○ Read-only multiset of data items distributed over a cluster of machines ● Works with Hadoop HDFS and process that data in parallel ○ Distributed file System ● MapReduce interfaces with funtional programming style ○ No MR chaining workflow in Hadoop What is Apache Spark
  • 8. Apache Spark - One platform to rule them all
  • 9. What is Apache Spark RDD ● RDD consists of mulitle data partitions over a cluster of machines ● RDD is intermediate data during a Spark data processing job ● RDD is distributed in memory of each worker node JVM ● Data processing job is transformations of RDDs ● RDDs are disappeared after finishing a job ● Not able to share RDDs between Spark jobs
  • 10. What is Apache Spark RDD
  • 11. Infinispan Spark connector JDG 7’s new feautre
  • 12. What is Infinispan Spark connector ● https://github.com/infinispan/infinispan-spark ● JDG 7 Document https://goo.gl/9BXp98 ● RDD and DStream integration with Apache Spark 1.6 ○ DStream is a continuous sequence of RDDs for real-time stream processing ● Use JDG as a data source for Spark ● Easy to read & write cache data in a Spark job ● Provides seamless funtional programming style and syntactic sugar ● Good to share RDD with other Spark jobs
  • 14. Features of Infinispan Spark connector 1. Create an Spark RDD from a JDG cache data ○ Read cache data from Spark job 2. Write any key/value based RDD to a JDG cache ○ Write intermediate or final data to cache 3. Create a Spark DStream from cache-level events ○ Insert, Modify and Delete event in a cache 4. Write any key/value DStream to JDG ○ Write any DStream to cache 5. Use JDG server side filters to create a cache based RDD ○ Using Infinispan Query DSL
  • 15. Prerequisite & Version compatibility ● JDK 8, Scala 2.10.4, Spark 1.6 ● JBoss Data Grid 7.0.0 Server ○ Infinispan Server 8.2.4.Final ● Run JDG in Remote Client-Server mode
  • 17. Performance Considerations ● The number of Spark workers should be greater than JDG nodes ○ To take advantage of the parallelism ● Support locality with co-located in the same node as JDG and Spark worker ○ Spark worker only processes data in the local node with connector
  • 19. References 1. https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/7.0/html-single/Developer_Guide/index. html#Integration_with_Apache_Spark 2. http://spark.apache.org/docs/1.6.2/programming-guide.html 3. https://github.com/infinispan/infinispan-spark 4. https://github.com/infinispan/infinispan 5. https://github.com/tedwon/infinispan-spark-connector-examples 6. https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/7.0/html-single/7.0.0_Release_Notes/in dex.html#chap-New_Features_and_Enhancements 7. https://en.wikipedia.org/wiki/Apache_Spark 8. https://hub.docker.com/r/gustavonalle/infinispan-spark/