SlideShare a Scribd company logo
IoT Applications and Patterns
using Apache Spark &
Apache Bahir
Luciano Resende
June 14th, 2018
© 2018 IBM Corporation 1
About me - Luciano Resende
2
Data Science Platform Architect – IBM – CODAIT
• Have been contributing to open source at ASF for over 10 years
• Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache
Toree, Apache Spark among other projects related to AI/ML platforms
lresende@apache.org
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
Open Source Community Leadership
C O D A I T
Founding Partner 188+ Project Committers 77+ Projects
Key Open source steering committee
memberships OSS Advisory Board
Open Source
Center for Open Source
Data and AI Technologies
CODAIT
codait.org
codait (French)
= coder/coded
https://m.interglot.com/fr/en/codait
CODAIT aims to make AI solutions
dramatically easier to create, deploy,
and manage in the enterprise
Relaunch of the Spark Technology
Center (STC) to reflect expanded
mission
5
Agenda
6
Introductions
- Apache Spark
- Apache Bahir
IoT Applications
Live Demo
Summary
References
Q&A
Apache Spark
7
Apache Spark Introduction
8
Spark Core
Spark
SQL
Spark
Streaming
Spark
ML
Spark
GraphX
executes SQL
statements
performs
streaming
analytics using
micro-batches
common
machine
learning and
statistical
algorithms
distributed
graph
processing
framework
general compute engine, handles
distributed task dispatching,
scheduling and basic I/O
functions
large variety of data sources
and formats can be supported,
both on-premise or cloud
BigInsights
(HDFS)
Cloudant
dashDB
SQL
DB
Apache Spark Evolution
9
Apache Spark – Spark SQL
10
Spark
SQL
Unified data access APIS: Query
structured data sets with SQL or
Dataset/DataFrame APIs
Fast, familiar query language across
all of your enterprise data
RDBMS
Data Sources
Structured
Streaming
Data Sources
Apache Spark – Spark SQL
11
You can run SQL statement with SparkSession.sql(…) interface:
val spark = SparkSession.builder()
.appName(“Demo”)
.getOrCreate()
spark.sql(“create table T1 (c1 int, c2 int) stored as parquet”)
val ds = spark.sql(“select * from T1”)
You can further transform the resultant dataset:
val ds1 = ds.groupBy(“c1”).agg(“c2”-> “sum”)
val ds2 = ds.orderBy(“c1”)
The result is a DataFrame / Dataset[Row]
ds.show() displays the rows
Apache Spark – Spark SQL
You can read from data sources using SparkSession.read.format(…)
val spark = SparkSession.builder()
.appName(“Demo”)
.getOrCreate()
case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)
// loading csv data to a Dataset of Bank type
val bankFromCSV = spark.read.csv(“hdfs://localhost:9000/data/bank.csv").as[Bank]
// loading JSON data to a Dataset of Bank type
val bankFromJSON = spark.read.json(“hdfs://localhost:9000/data/bank.json").as[Bank]
// select a column value from the Dataset
bankFromCSV.select(‘age).show() will return all rows of column “age” from this dataset.
12
Apache Spark – Spark SQL
You can also configure a specific data source with specific options
val spark = SparkSession.builder()
.appName(“Demo”)
.getOrCreate()
case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)
// loading csv data to a Dataset of Bank type
val bankFromCSV = sparkSession.read
.option("header", ”true") // Use first line of all files as header
.option("inferSchema", ”true") // Automatically infer data types
.option("delimiter", " ")
.csv("/users/lresende/data.csv”)
.as[Bank]
bankFromCSV.select(‘age).show() // will return all rows of column “age” from this dataset.
13
Apache Spark – Spark SQL – Data Sources
Data Sources under the covers
- Data source registration (e.g. spark.read.datasource)
- Provide BaseRelation implementation
• That implements support for table scans:
– TableScans, PrunedScan, PrunedFilteredScan, CatalystScan
- Detailed information available at
• https://developer.ibm.com/code/2016/11/10/exploring-apache-spark-datasource-api/
14
Apache Spark – Spark SQL – Data Sources
Data Sources V1 Limitations
- Leak upper-level API in the data source (DataFrame/SQLContext)
- Hard to extend the Data Sources API for more optimizations
- Zero transaction guarantee in the write APIs
- Limited Extensibility
15
Apache Spark – Spark SQL – Data Sources
Data Sources V2
- Support for row-based scan and columnar scan
- Column pruning and filter push-down
- Can report basic statistics and data partitioning
- Transactional write API
- Streaming source and sink support for micro-batch and continuous
mode
- Detailed information available at
• https://developer.ibm.com/code/2018/04/16/introducing-apache-spark-data-sources-api-v2/
16
Apache Spark – Spark SQL Structured Streaming
Unified programming model for streaming, interactive and batch queries
17Image source: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
Considers the data stream as unbounded
table
Apache Spark – Spark SQL Structured Streaming
SQL regular APIs
val spark = SparkSession.builder()
.appName(“Demo”)
.getOrCreate()
val input = spark.read
.schema(schema)
.format(”csv")
.load(”input-path")
val result = input
.select(”age”)
.where(”age > 18”)
result.write
.format(”json”)
. save(” dest-path”)
18
Structured Streaming APIs
val spark = SparkSession.builder()
.appName(“Demo”)
.getOrCreate()
val input = spark.readStream
.schema(schema)
.format(”csv")
.load(”input-path")
val result = input
.select(”age”)
.where(”age > 18”)
result.write
.format(”json”)
. startStream(” dest-path”)
Apache Spark – Spark Streaming
19
Spark
Streaming
Micro-batch event processing for
near-real time analytics
e.g. Internet of Things (IoT) devices,
Twitter feeds, Kafka (event hub), etc.
No multi-threading or parallel process
programming required
Apache Spark – Spark Streaming
Also known as discretized stream or DStream
Abstracts a continuous stream of data
Based on micro-batching
Based on RDDs
20
Apache Spark – Spark Streaming
val sparkConf = new SparkConf()
.setAppName("MQTTWordCount")
val ssc = new StreamingContext(sparkConf, Seconds(2))
val lines = MQTTUtils.createStream(ssc, brokerUrl, topic, StorageLevel.MEMORY_ONLY_SER_2)
val words = lines.flatMap(x => x.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()
21
Apache Bahir
22
Origins of the Apache Bahir Project
MAY/2016: Established as a top-level Apache Project.
- PMC formed by Apache Spark committers/pmc, Apache Members
- Initial contributions imported from Apache Spark
AUG/2016: Apache Flink community join Apache Bahir
- Initial contributions of Flink extensions
- In October 2016 Robert Metzger elected committer
Origins of the Bahir name
Naming an Apache Project is a science !!!
- We needed a name that wasn’t used yet
- Needed to be related to Spark
We ended up with : Bahir
- A name of Arabian origin that means Sparkling,
- Also associated with a guy who succeeds at everything
Why Apache Bahir
It’s an Apache project
- And if you are here, you know what it means
Benefits of curating your extensions at Apache Bahir
- Apache Governance
- Apache License
- Apache Community
- Apache Brand
25
Why Apache Bahir
Flexibility
- Release flexibility
• Bounded to platform or component release
Shared infrastructure
- Release, CI, etc
Shared knowledge
- Collaborate with experts on both platform and component areas
26
Bahir extensions for Apache Spark
MQTT – Enables reading data from MQTT Servers using Spark Streaming or Structured streaming.
• http://bahir.apache.org/docs/spark/current/spark-sql-streaming-mqtt/
• http://bahir.apache.org/docs/spark/current/spark-streaming-mqtt/
Couch DB/Cloudant – Enables reading data from CouchDB/Cloudant using Spark SQL and Spark
Streaming.
Twitter – Enables reading social data from twitter using Spark Streaming.
• http://bahir.apache.org/docs/spark/current/spark-streaming-twitter/
Akka – Enables reading data from Akka Actors using Spark Streaming or Structured Streaming.
• http://bahir.apache.org/docs/spark/current/spark-streaming-akka/
ZeroMQ – Enables reading data from ZeroMQ using Spark Streaming.
• http://bahir.apache.org/docs/spark/current/spark-streaming-zeromq/
27
Bahir extensions for Apache Spark
Google Cloud Pub/Sub – Add spark streaming connector to Google Cloud Pub/Sub
28
Apache Spark extensions in Bahir
Adding Bahir extensions into your application
- Using SBT
libraryDependencies += "org.apache.bahir" %% "spark-streaming-mqtt" % "2.2.0”
- Using Maven
<dependency>
<groupId>org.apache.bahir</groupId>
<artifactId>spark-streaming-mqtt_2.11 </artifactId>
<version>2.2.0</version>
</dependency>
29
Apache Spark extensions in Bahir
Submitting applications with Bahir extensions to Spark
- Spark-shell
bin/spark-shell --packages org.apache.bahir:spark-streaming_mqtt_2.11:2.2.0 …..
- Spark-submit
bin/spark-submit --packages org.apache.bahir:spark-streaming_mqtt_2.11:2.2.0 …..
30
Internet of
Things - IoT
31
IoT – Definition by Wikipedia
The Internet of things (IoT) is the network of
physical devices, vehicles, home
appliances, and other
items embedded with electronics, software,
sensors, actuators, and network
connectivity which enable these objects to
connect and exchange data.
32
IoT – Interaction between multiple entities
33
Things Software
People
control
observe
inform
command
actuate
inform
IoT Patterns – Some of them …
35
• Remote control
• Security analysis
• Edge analytics
• Historical data analysis
• Distributed Platforms
• Real-time decisions
MQTT – M2M / IoT Connectivity Protocol
37
Connect
+
Publish
+
Subscribe
~1990
IBM / Eurotech
2010
Published
2011
Eclipse M2M / Paho
2014
OASIS
Open spec
+ 40 client
implementatio
ns
Minimal
overhead
Tiny
Clients
(Java 170KB)
History
Header
2-4 bytes
(publish)
14 bytes
(connect)
V5
May 2018
MQTT – Quality of Service
38
MQTT
Broker
QoS0
QoS1
QoS2
At most once
At least once
Exactly once
. No connection failover
. Never duplicate
. Has connection failover
. Can duplicate
. Has connection failover
. Never duplicate
MQTT – World usage
Smart Home Automation
Messaging
Notable Mentions:
- IBM IoT Platform
- AWS IoT
- Microsoft IoT Hub
- Facebook Messanger
39
Live Demo
40
IoT Simulator using MQTT
The demo environment
https://github.com/lresende/bahir-iot-demo
41
Node.js Web app
Simulates Elevator IoT devices
Elevator simulator
Metrics:
• Weight
• Speed
• Power
• Temperature
• System
MQTT
Mosquitto
Summary
42
Summary – Take away points
Apache Spark
- IoT Analytics Runtime with support for ”Continuous Applications”
Apache Bahir
- Bring access to IoT data via supported connectors (e.g. MQTT)
IoT Applications
- Using Spark and Bahir to start processing IoT data in near real
time using Spark Streaming and Spark Structured Streaming
43
Join the Apache
Bahir community
44
References
Apache Bahir
http://bahir.apache.org
Documentation for Apache Spark extensions
http://bahir.apache.org/docs/spark/current/documentation/
Source Repositories
https://github.com/apache/bahir
https://github.com/apache/bahir-website
Demo Repository
https://github.com/lresende/bahir-iot-demo
45Image source: http://az616578.vo.msecnd.net/files/2016/03/21/6359412499310138501557867529_thank-you-1400x800-c-default.gif
46March 30 2018 / © 2018 IBM Corporation
Ad

More Related Content

What's hot (20)

Using Apache Spark with IBM SPSS Modeler
Using Apache Spark with IBM SPSS ModelerUsing Apache Spark with IBM SPSS Modeler
Using Apache Spark with IBM SPSS Modeler
Global Knowledge Training
 
Spark Summit EU talk by Steve Loughran
Spark Summit EU talk by Steve LoughranSpark Summit EU talk by Steve Loughran
Spark Summit EU talk by Steve Loughran
Spark Summit
 
Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Spark
felixcss
 
Apache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingApache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster Computing
All Things Open
 
Real-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKReal-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNK
Data Con LA
 
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data AnalysisApache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
DataWorks Summit/Hadoop Summit
 
Apache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and BeyondApache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and Beyond
DataWorks Summit/Hadoop Summit
 
S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?
Hortonworks
 
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
Timothy Spann
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Alex Zeltov
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
airisData
 
Migrating pipelines into Docker
Migrating pipelines into DockerMigrating pipelines into Docker
Migrating pipelines into Docker
DataWorks Summit/Hadoop Summit
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)
W2O Group
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaApache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Databricks
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
DataWorks Summit/Hadoop Summit
 
Spark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSpark,Hadoop,Presto Comparition
Spark,Hadoop,Presto Comparition
Sandish Kumar H N
 
August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache Oozie
Yahoo Developer Network
 
Data Science with Spark & Zeppelin
Data Science with Spark & ZeppelinData Science with Spark & Zeppelin
Data Science with Spark & Zeppelin
Vinay Shukla
 
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
Apache Fink 1.0: A New Era  for Real-World Streaming AnalyticsApache Fink 1.0: A New Era  for Real-World Streaming Analytics
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
Slim Baltagi
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
DataWorks Summit
 
Spark Summit EU talk by Steve Loughran
Spark Summit EU talk by Steve LoughranSpark Summit EU talk by Steve Loughran
Spark Summit EU talk by Steve Loughran
Spark Summit
 
Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Spark
felixcss
 
Apache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster ComputingApache Spark: Lightning Fast Cluster Computing
Apache Spark: Lightning Fast Cluster Computing
All Things Open
 
Real-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKReal-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNK
Data Con LA
 
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data AnalysisApache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + Livy: Bringing Multi Tenancy to Interactive Data Analysis
DataWorks Summit/Hadoop Summit
 
S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?
Hortonworks
 
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
Timothy Spann
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Alex Zeltov
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
airisData
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)
W2O Group
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaApache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Databricks
 
Spark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSpark,Hadoop,Presto Comparition
Spark,Hadoop,Presto Comparition
Sandish Kumar H N
 
August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache Oozie
Yahoo Developer Network
 
Data Science with Spark & Zeppelin
Data Science with Spark & ZeppelinData Science with Spark & Zeppelin
Data Science with Spark & Zeppelin
Vinay Shukla
 
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
Apache Fink 1.0: A New Era  for Real-World Streaming AnalyticsApache Fink 1.0: A New Era  for Real-World Streaming Analytics
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
Slim Baltagi
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
DataWorks Summit
 

Similar to IoT Applications and Patterns using Apache Spark & Apache Bahir (20)

Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache Bahir
Luciano Resende
 
Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark API
Databricks
 
H2O PySparkling Water
H2O PySparkling WaterH2O PySparkling Water
H2O PySparkling Water
Sri Ambati
 
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
CloudxLab
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Joe Stein
 
Storlets fb session_16_9
Storlets fb session_16_9Storlets fb session_16_9
Storlets fb session_16_9
Eran Rom
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Databricks
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
Vienna Data Science Group
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
DataStax Academy
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Provectus
 
Writing Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySparkWriting Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySpark
Databricks
 
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
HostedbyConfluent
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
Michael Rys
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Jen Aman
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Michael Rys
 
Leveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot ArchitecturesLeveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot Architectures
Thanigai Vellore
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
Guido Schmutz
 
Building highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache SparkBuilding highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache Spark
Martin Toshev
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
Building iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache BahirBuilding iot applications with Apache Spark and Apache Bahir
Building iot applications with Apache Spark and Apache Bahir
Luciano Resende
 
Writing Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark APIWriting Continuous Applications with Structured Streaming PySpark API
Writing Continuous Applications with Structured Streaming PySpark API
Databricks
 
H2O PySparkling Water
H2O PySparkling WaterH2O PySparkling Water
H2O PySparkling Water
Sri Ambati
 
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
CloudxLab
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Joe Stein
 
Storlets fb session_16_9
Storlets fb session_16_9Storlets fb session_16_9
Storlets fb session_16_9
Eran Rom
 
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Writing Continuous Applications with Structured Streaming Python APIs in Apac...
Databricks
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
DataStax Academy
 
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Data Summer Conf 2018, “Building unified Batch and Stream processing pipeline...
Provectus
 
Writing Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySparkWriting Continuous Applications with Structured Streaming in PySpark
Writing Continuous Applications with Structured Streaming in PySpark
Databricks
 
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
Wikipedia’s Event Data Platform, Or: JSON Is Okay Too With Andrew Otto | Curr...
HostedbyConfluent
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
Michael Rys
 
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlibElasticsearch And Apache Lucene For Apache Spark And MLlib
Elasticsearch And Apache Lucene For Apache Spark And MLlib
Jen Aman
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Michael Rys
 
Leveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot ArchitecturesLeveraging Hadoop in Polyglot Architectures
Leveraging Hadoop in Polyglot Architectures
Thanigai Vellore
 
Spark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka StreamsSpark (Structured) Streaming vs. Kafka Streams
Spark (Structured) Streaming vs. Kafka Streams
Guido Schmutz
 
Building highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache SparkBuilding highly scalable data pipelines with Apache Spark
Building highly scalable data pipelines with Apache Spark
Martin Toshev
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
Ad

More from Luciano Resende (20)

A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdf
Luciano Resende
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 Analytics
Luciano Resende
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Luciano Resende
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
Luciano Resende
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooks
Luciano Resende
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Luciano Resende
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloads
Luciano Resende
 
Jupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewJupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway Overview
Luciano Resende
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for Code
Luciano Resende
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
Luciano Resende
 
Building analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernelsBuilding analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernels
Luciano Resende
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
Luciano Resende
 
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
Luciano Resende
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine Learning
Luciano Resende
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
Luciano Resende
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gateway
Luciano Resende
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open source
Luciano Resende
 
SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine Learning
Luciano Resende
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conference
Luciano Resende
 
Asf icfoss-mentoring
Asf icfoss-mentoringAsf icfoss-mentoring
Asf icfoss-mentoring
Luciano Resende
 
A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdf
Luciano Resende
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 Analytics
Luciano Resende
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Luciano Resende
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
Luciano Resende
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooks
Luciano Resende
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Luciano Resende
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloads
Luciano Resende
 
Jupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewJupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway Overview
Luciano Resende
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for Code
Luciano Resende
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
Luciano Resende
 
Building analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernelsBuilding analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernels
Luciano Resende
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
Luciano Resende
 
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
Luciano Resende
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine Learning
Luciano Resende
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
Luciano Resende
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gateway
Luciano Resende
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open source
Luciano Resende
 
SystemML - Declarative Machine Learning
SystemML - Declarative Machine LearningSystemML - Declarative Machine Learning
SystemML - Declarative Machine Learning
Luciano Resende
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conference
Luciano Resende
 
Ad

Recently uploaded (20)

定制学历(美国Purdue毕业证)普渡大学电子版毕业证
定制学历(美国Purdue毕业证)普渡大学电子版毕业证定制学历(美国Purdue毕业证)普渡大学电子版毕业证
定制学历(美国Purdue毕业证)普渡大学电子版毕业证
Taqyea
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
Customer Segmentation using K-Means clustering
Customer Segmentation using K-Means clusteringCustomer Segmentation using K-Means clustering
Customer Segmentation using K-Means clustering
Ingrid Nyakerario
 
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptxISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
pankaj6188303
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docxMASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
santosh162
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
Modern_Distribution_Presentation.pptx Aa
Modern_Distribution_Presentation.pptx AaModern_Distribution_Presentation.pptx Aa
Modern_Distribution_Presentation.pptx Aa
MuhammadAwaisKamboh
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf
axonneurologycenter1
 
定制学历(美国Purdue毕业证)普渡大学电子版毕业证
定制学历(美国Purdue毕业证)普渡大学电子版毕业证定制学历(美国Purdue毕业证)普渡大学电子版毕业证
定制学历(美国Purdue毕业证)普渡大学电子版毕业证
Taqyea
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
Customer Segmentation using K-Means clustering
Customer Segmentation using K-Means clusteringCustomer Segmentation using K-Means clustering
Customer Segmentation using K-Means clustering
Ingrid Nyakerario
 
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptxISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
ISO 9001_2015 FINALaaaaaaaaaaaaaaaa - MDX - Copy.pptx
pankaj6188303
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Minions Want to eat presentacion muy linda
Minions Want to eat presentacion muy lindaMinions Want to eat presentacion muy linda
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Simple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptxSimple_AI_Explanation_English somplr.pptx
Simple_AI_Explanation_English somplr.pptx
ssuser2aa19f
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docxMASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
MASAkkjjkttuyrdquesjhjhjfc44dddtions.docx
santosh162
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
Conic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptxConic Sectionfaggavahabaayhahahahahs.pptx
Conic Sectionfaggavahabaayhahahahahs.pptx
taiwanesechetan
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
Modern_Distribution_Presentation.pptx Aa
Modern_Distribution_Presentation.pptx AaModern_Distribution_Presentation.pptx Aa
Modern_Distribution_Presentation.pptx Aa
MuhammadAwaisKamboh
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Stack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptxStack_and_Queue_Presentation_Final (1).pptx
Stack_and_Queue_Presentation_Final (1).pptx
binduraniha86
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf
axonneurologycenter1
 

IoT Applications and Patterns using Apache Spark & Apache Bahir

  • 1. IoT Applications and Patterns using Apache Spark & Apache Bahir Luciano Resende June 14th, 2018 © 2018 IBM Corporation 1
  • 2. About me - Luciano Resende 2 Data Science Platform Architect – IBM – CODAIT • Have been contributing to open source at ASF for over 10 years • Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache Toree, Apache Spark among other projects related to AI/ML platforms [email protected] https://www.linkedin.com/in/lresende @lresende1975 https://github.com/lresende
  • 3. Open Source Community Leadership C O D A I T Founding Partner 188+ Project Committers 77+ Projects Key Open source steering committee memberships OSS Advisory Board Open Source
  • 4. Center for Open Source Data and AI Technologies CODAIT codait.org codait (French) = coder/coded https://m.interglot.com/fr/en/codait CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise Relaunch of the Spark Technology Center (STC) to reflect expanded mission 5
  • 5. Agenda 6 Introductions - Apache Spark - Apache Bahir IoT Applications Live Demo Summary References Q&A
  • 7. Apache Spark Introduction 8 Spark Core Spark SQL Spark Streaming Spark ML Spark GraphX executes SQL statements performs streaming analytics using micro-batches common machine learning and statistical algorithms distributed graph processing framework general compute engine, handles distributed task dispatching, scheduling and basic I/O functions large variety of data sources and formats can be supported, both on-premise or cloud BigInsights (HDFS) Cloudant dashDB SQL DB
  • 9. Apache Spark – Spark SQL 10 Spark SQL Unified data access APIS: Query structured data sets with SQL or Dataset/DataFrame APIs Fast, familiar query language across all of your enterprise data RDBMS Data Sources Structured Streaming Data Sources
  • 10. Apache Spark – Spark SQL 11 You can run SQL statement with SparkSession.sql(…) interface: val spark = SparkSession.builder() .appName(“Demo”) .getOrCreate() spark.sql(“create table T1 (c1 int, c2 int) stored as parquet”) val ds = spark.sql(“select * from T1”) You can further transform the resultant dataset: val ds1 = ds.groupBy(“c1”).agg(“c2”-> “sum”) val ds2 = ds.orderBy(“c1”) The result is a DataFrame / Dataset[Row] ds.show() displays the rows
  • 11. Apache Spark – Spark SQL You can read from data sources using SparkSession.read.format(…) val spark = SparkSession.builder() .appName(“Demo”) .getOrCreate() case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer) // loading csv data to a Dataset of Bank type val bankFromCSV = spark.read.csv(“hdfs://localhost:9000/data/bank.csv").as[Bank] // loading JSON data to a Dataset of Bank type val bankFromJSON = spark.read.json(“hdfs://localhost:9000/data/bank.json").as[Bank] // select a column value from the Dataset bankFromCSV.select(‘age).show() will return all rows of column “age” from this dataset. 12
  • 12. Apache Spark – Spark SQL You can also configure a specific data source with specific options val spark = SparkSession.builder() .appName(“Demo”) .getOrCreate() case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer) // loading csv data to a Dataset of Bank type val bankFromCSV = sparkSession.read .option("header", ”true") // Use first line of all files as header .option("inferSchema", ”true") // Automatically infer data types .option("delimiter", " ") .csv("/users/lresende/data.csv”) .as[Bank] bankFromCSV.select(‘age).show() // will return all rows of column “age” from this dataset. 13
  • 13. Apache Spark – Spark SQL – Data Sources Data Sources under the covers - Data source registration (e.g. spark.read.datasource) - Provide BaseRelation implementation • That implements support for table scans: – TableScans, PrunedScan, PrunedFilteredScan, CatalystScan - Detailed information available at • https://developer.ibm.com/code/2016/11/10/exploring-apache-spark-datasource-api/ 14
  • 14. Apache Spark – Spark SQL – Data Sources Data Sources V1 Limitations - Leak upper-level API in the data source (DataFrame/SQLContext) - Hard to extend the Data Sources API for more optimizations - Zero transaction guarantee in the write APIs - Limited Extensibility 15
  • 15. Apache Spark – Spark SQL – Data Sources Data Sources V2 - Support for row-based scan and columnar scan - Column pruning and filter push-down - Can report basic statistics and data partitioning - Transactional write API - Streaming source and sink support for micro-batch and continuous mode - Detailed information available at • https://developer.ibm.com/code/2018/04/16/introducing-apache-spark-data-sources-api-v2/ 16
  • 16. Apache Spark – Spark SQL Structured Streaming Unified programming model for streaming, interactive and batch queries 17Image source: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html Considers the data stream as unbounded table
  • 17. Apache Spark – Spark SQL Structured Streaming SQL regular APIs val spark = SparkSession.builder() .appName(“Demo”) .getOrCreate() val input = spark.read .schema(schema) .format(”csv") .load(”input-path") val result = input .select(”age”) .where(”age > 18”) result.write .format(”json”) . save(” dest-path”) 18 Structured Streaming APIs val spark = SparkSession.builder() .appName(“Demo”) .getOrCreate() val input = spark.readStream .schema(schema) .format(”csv") .load(”input-path") val result = input .select(”age”) .where(”age > 18”) result.write .format(”json”) . startStream(” dest-path”)
  • 18. Apache Spark – Spark Streaming 19 Spark Streaming Micro-batch event processing for near-real time analytics e.g. Internet of Things (IoT) devices, Twitter feeds, Kafka (event hub), etc. No multi-threading or parallel process programming required
  • 19. Apache Spark – Spark Streaming Also known as discretized stream or DStream Abstracts a continuous stream of data Based on micro-batching Based on RDDs 20
  • 20. Apache Spark – Spark Streaming val sparkConf = new SparkConf() .setAppName("MQTTWordCount") val ssc = new StreamingContext(sparkConf, Seconds(2)) val lines = MQTTUtils.createStream(ssc, brokerUrl, topic, StorageLevel.MEMORY_ONLY_SER_2) val words = lines.flatMap(x => x.split(" ")) val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _) wordCounts.print() ssc.start() ssc.awaitTermination() 21
  • 22. Origins of the Apache Bahir Project MAY/2016: Established as a top-level Apache Project. - PMC formed by Apache Spark committers/pmc, Apache Members - Initial contributions imported from Apache Spark AUG/2016: Apache Flink community join Apache Bahir - Initial contributions of Flink extensions - In October 2016 Robert Metzger elected committer
  • 23. Origins of the Bahir name Naming an Apache Project is a science !!! - We needed a name that wasn’t used yet - Needed to be related to Spark We ended up with : Bahir - A name of Arabian origin that means Sparkling, - Also associated with a guy who succeeds at everything
  • 24. Why Apache Bahir It’s an Apache project - And if you are here, you know what it means Benefits of curating your extensions at Apache Bahir - Apache Governance - Apache License - Apache Community - Apache Brand 25
  • 25. Why Apache Bahir Flexibility - Release flexibility • Bounded to platform or component release Shared infrastructure - Release, CI, etc Shared knowledge - Collaborate with experts on both platform and component areas 26
  • 26. Bahir extensions for Apache Spark MQTT – Enables reading data from MQTT Servers using Spark Streaming or Structured streaming. • http://bahir.apache.org/docs/spark/current/spark-sql-streaming-mqtt/ • http://bahir.apache.org/docs/spark/current/spark-streaming-mqtt/ Couch DB/Cloudant – Enables reading data from CouchDB/Cloudant using Spark SQL and Spark Streaming. Twitter – Enables reading social data from twitter using Spark Streaming. • http://bahir.apache.org/docs/spark/current/spark-streaming-twitter/ Akka – Enables reading data from Akka Actors using Spark Streaming or Structured Streaming. • http://bahir.apache.org/docs/spark/current/spark-streaming-akka/ ZeroMQ – Enables reading data from ZeroMQ using Spark Streaming. • http://bahir.apache.org/docs/spark/current/spark-streaming-zeromq/ 27
  • 27. Bahir extensions for Apache Spark Google Cloud Pub/Sub – Add spark streaming connector to Google Cloud Pub/Sub 28
  • 28. Apache Spark extensions in Bahir Adding Bahir extensions into your application - Using SBT libraryDependencies += "org.apache.bahir" %% "spark-streaming-mqtt" % "2.2.0” - Using Maven <dependency> <groupId>org.apache.bahir</groupId> <artifactId>spark-streaming-mqtt_2.11 </artifactId> <version>2.2.0</version> </dependency> 29
  • 29. Apache Spark extensions in Bahir Submitting applications with Bahir extensions to Spark - Spark-shell bin/spark-shell --packages org.apache.bahir:spark-streaming_mqtt_2.11:2.2.0 ….. - Spark-submit bin/spark-submit --packages org.apache.bahir:spark-streaming_mqtt_2.11:2.2.0 ….. 30
  • 31. IoT – Definition by Wikipedia The Internet of things (IoT) is the network of physical devices, vehicles, home appliances, and other items embedded with electronics, software, sensors, actuators, and network connectivity which enable these objects to connect and exchange data. 32
  • 32. IoT – Interaction between multiple entities 33 Things Software People control observe inform command actuate inform
  • 33. IoT Patterns – Some of them … 35 • Remote control • Security analysis • Edge analytics • Historical data analysis • Distributed Platforms • Real-time decisions
  • 34. MQTT – M2M / IoT Connectivity Protocol 37 Connect + Publish + Subscribe ~1990 IBM / Eurotech 2010 Published 2011 Eclipse M2M / Paho 2014 OASIS Open spec + 40 client implementatio ns Minimal overhead Tiny Clients (Java 170KB) History Header 2-4 bytes (publish) 14 bytes (connect) V5 May 2018
  • 35. MQTT – Quality of Service 38 MQTT Broker QoS0 QoS1 QoS2 At most once At least once Exactly once . No connection failover . Never duplicate . Has connection failover . Can duplicate . Has connection failover . Never duplicate
  • 36. MQTT – World usage Smart Home Automation Messaging Notable Mentions: - IBM IoT Platform - AWS IoT - Microsoft IoT Hub - Facebook Messanger 39
  • 38. IoT Simulator using MQTT The demo environment https://github.com/lresende/bahir-iot-demo 41 Node.js Web app Simulates Elevator IoT devices Elevator simulator Metrics: • Weight • Speed • Power • Temperature • System MQTT Mosquitto
  • 40. Summary – Take away points Apache Spark - IoT Analytics Runtime with support for ”Continuous Applications” Apache Bahir - Bring access to IoT data via supported connectors (e.g. MQTT) IoT Applications - Using Spark and Bahir to start processing IoT data in near real time using Spark Streaming and Spark Structured Streaming 43
  • 41. Join the Apache Bahir community 44
  • 42. References Apache Bahir http://bahir.apache.org Documentation for Apache Spark extensions http://bahir.apache.org/docs/spark/current/documentation/ Source Repositories https://github.com/apache/bahir https://github.com/apache/bahir-website Demo Repository https://github.com/lresende/bahir-iot-demo 45Image source: http://az616578.vo.msecnd.net/files/2016/03/21/6359412499310138501557867529_thank-you-1400x800-c-default.gif
  • 43. 46March 30 2018 / © 2018 IBM Corporation