Session 9 - Big Data and Machine Learning with FIWARE
Fernando López, Cloud & Platform Senior Expert
fernando.lopez@fiware.org
@flopezaguilar
FIWARE Foundation, e.V.
Learning Goals
● Introduction to Big Data
● Differences between Apache Flink and Spark
● FIWARE connectors
● (Work in Progress) Machine Learning in FIWARE
Introduction to Big Data
Big Data
Big Data Analytics
[Figure: Big Data analytics technologies plotted by size of the data handled per second (100 events (10KBs), 1k events (1MBs), 100k events (100MBs)) against time to act (millis, seconds, minutes, hours, days)]
Technologies shown in the figure:
● Indexed Storage (RDBMS, Apache Solr)
● Interactive Processing (e.g. Drill, BigQuery, OLAP)
● MapReduce (e.g. Spark, Hadoop)
● Realtime Analytics (CEP, Stream Processing)
● In-Memory Computing (e.g. Spark, SAP Hana, VoltDB)
NGSI-LD
Based on
Source: https://docbox.etsi.org/ISG/CIM/Open/NGSI-LD_introduction.pdf
https://www.webfirst.com/services/open-data-solutions
ETL architecture
Source: https://www.red-gate.com/simple-talk/sql/database-delivery/database-lifecycle-management-for-etl-systems/
Lambda architecture
Kappa architecture
Source: Siddharth Mittal
Simple Smart solutions: Reference Architecture
[Diagram: reference architecture; components labelled: Draco, Kurento, Wirecloud, QuantumLeap, Knowage, Flink, CrateDB]
FIWARE Cosmos: Orion Flink Connector
Features
▪ The Cosmos Generic Enabler makes Big Data analysis of context data easier by integrating
with some of the most popular Big Data platforms.
▪ Batch Processing
▪ Stream Processing (Real-time)
▪ Direct data ingestion
▪ Direct connection with Context Broker
▪ Multiple Sinks
Apache Flink
▪ Framework and distributed processing engine for stateful computations over unbounded
and bounded data streams.
▪ Designed to run in all common cluster environments, perform computations at in-memory
speed and at any scale.
Architecture
Connection
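[Diagram: the ORION Context Broker sends HTTP POST notifications to the OrionSource of the orion-flink-connector, which is packaged in a Flink Job (JAR) running on the Flink Cluster; the OrionSink sends HTTP POST/PUT/PATCH requests back to the Context Broker]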
OrionSource
▪ Receives data from the Orion Context Broker on a given port.
▪ The received data is a stream of NgsiEvent objects.
val eventStream = env.addSource(new OrionSource(9001))
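For the source to receive anything, the Context Broker needs a subscription whose notification URL points at the host and port where the OrionSource listens. The following is a minimal sketch of creating such an NGSI v2 subscription with the JDK 11 HTTP client. The host names `orion` and `flink-taskmanager`, the entity type `Room`, and the object and function names are assumptions made for illustration; they are not part of the slides or of the connector.

```scala
import java.net.URI
import java.net.http.HttpRequest

// Illustrative helper, not part of the orion-flink-connector.
object OrionSubscriptionSketch {
  /** NGSI v2 subscription body: notify `notifyUrl` when `attr` changes on entities of `entityType`. */
  def subscriptionBody(entityType: String, attr: String, notifyUrl: String): String =
    s"""{
       |  "description": "Notify Flink of $attr changes",
       |  "subject": {
       |    "entities": [{ "idPattern": ".*", "type": "$entityType" }],
       |    "condition": { "attrs": ["$attr"] }
       |  },
       |  "notification": {
       |    "http": { "url": "$notifyUrl" },
       |    "attrs": ["$attr"]
       |  }
       |}""".stripMargin

  /** Builds (but does not send) the POST /v2/subscriptions request. */
  def subscriptionRequest(orionBaseUrl: String, body: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"$orionBaseUrl/v2/subscriptions"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(body))
      .build()

  def main(args: Array[String]): Unit = {
    // Assumed host names; port 9001 matches the OrionSource port used above.
    val body = subscriptionBody("Room", "temperature", "http://flink-taskmanager:9001")
    val request = subscriptionRequest("http://orion:1026", body)
    println(s"${request.method()} ${request.uri()}")
    println(body)
    // To actually send it:
    // java.net.http.HttpClient.newHttpClient()
    //   .send(request, java.net.http.HttpResponse.BodyHandlers.ofString())
  }
}
```

The request is only built and printed here, so the sketch can be run without a Context Broker; sending it is left as the commented-out call.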
OrionSink
▪ Sends data back to the Orion Context Broker:
▪ Takes a stream of OrionSinkObjects as a source:
• content: Message content in String format. If it is a JSON, it needs to be stringified.
• url: URL to which the message should be sent
• contentType: Type of HTTP content of the message (JSON, Plain)
• method: HTTP method of the message (POST, PUT, PATCH)
OrionSink.addSink( processedDataStream )
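To make the four fields concrete, here is a hedged sketch of the HTTP request that such a message describes. `SinkMessage` is a hypothetical stand-in that only mirrors the field names listed above; it is not the connector's `OrionSinkObject`, and the attribute name `temperature_min`, the host `orion` and the entity id `Room1` are assumptions.

```scala
import java.net.URI
import java.net.http.HttpRequest

// Hypothetical stand-in mirroring the four fields of an OrionSinkObject.
final case class SinkMessage(content: String, url: String, contentType: String, method: String)

object SinkMessageSketch {
  /** The content must already be a string, so the JSON document is written out here. */
  def minTemperaturePayload(value: Float): String =
    s"""{"temperature_min": {"value": $value, "type": "Float"}}"""

  /** Builds (but does not send) the HTTP request described by the message. */
  def toRequest(msg: SinkMessage): HttpRequest =
    HttpRequest.newBuilder(URI.create(msg.url))
      .header("Content-Type", msg.contentType)
      .method(msg.method, HttpRequest.BodyPublishers.ofString(msg.content))
      .build()

  def main(args: Array[String]): Unit = {
    val msg = SinkMessage(
      content = minTemperaturePayload(20.5f),
      url = "http://orion:1026/v2/entities/Room1/attrs", // assumed broker host and entity id
      contentType = "application/json",
      method = "POST"
    )
    val request = toRequest(msg)
    println(s"${request.method()} ${request.uri()} ${msg.content}")
  }
}
```

In NGSI v2, a POST to `/v2/entities/{id}/attrs` appends or updates attributes of an existing entity, which is why the sink URL in the next example ends in `/attrs`.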
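Basic example
// Imports assumed by this example; the connector package name follows the
// connector's examples repository.
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.fiware.cosmos.orion.flink.connector._

// "BasicExample" is a placeholder object name. Temp_Node is defined elsewhere in the
// example and is assumed to expose the fields `id` and `temperature`.
object BasicExample {
  final val URL_CB = "http://flinkexample_orion_1:1026/v2/entities/"
  final val CONTENT_TYPE = ContentType.JSON
  final val METHOD = HTTPMethod.POST

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Create Orion Source. Receive notifications on port 9001
    val eventStream = env.addSource(new OrionSource(9001))

    // Process event stream
    val processedDataStream = eventStream
      // One notification may carry several entities
      .flatMap(event => event.entities)
      // Extract the temperature attribute of each entity
      .map(entity => {
        val temp = entity.attrs("temperature").value.asInstanceOf[Number].floatValue()
        new Temp_Node(entity.id, temp)
      })
      // Group by entity id and keep the minimum temperature
      // over a 5-second window that slides every 2 seconds
      .keyBy("id")
      .timeWindow(Time.seconds(5), Time.seconds(2))
      .min("temperature")
      // Wrap each result in a message addressed to the entity's attributes
      .map(tempNode => {
        val url = URL_CB + tempNode.id + "/attrs"
        OrionSinkObject(tempNode.toString, url, CONTENT_TYPE, METHOD)
      })

    // Add Orion Sink
    OrionSink.addSink( processedDataStream )
    // …
  }
}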
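The windowing step in the basic example can be hard to picture, so the following plain-Scala sketch simulates it without any Flink dependency: each reading is assigned to every 5-second window (sliding every 2 seconds) that contains its timestamp, and the minimum temperature is then taken per entity and window. The `Reading` type, the explicit timestamps and the function names are assumptions for illustration; in the Flink job the windows are managed by the runtime, and this simulation only mirrors the assignment rule for non-negative timestamps.

```scala
// Plain-Scala simulation of a 5 s window sliding every 2 s (no Flink dependency).
final case class Reading(id: String, temperature: Float, timestampMs: Long)

object SlidingMinSketch {
  val SizeMs = 5000L
  val SlideMs = 2000L

  /** Start times of every sliding window that contains the (non-negative) timestamp. */
  def windowStarts(ts: Long): Seq[Long] = {
    val lastStart = ts - (ts % SlideMs) // latest window start that is <= ts
    Iterator.iterate(lastStart)(_ - SlideMs).takeWhile(_ > ts - SizeMs).toSeq
  }

  /** Minimum temperature per (entity id, window start). */
  def minPerWindow(readings: Seq[Reading]): Map[(String, Long), Float] =
    readings
      .flatMap(r => windowStarts(r.timestampMs).map(start => ((r.id, start), r.temperature)))
      .groupBy(_._1)
      .map { case (key, values) => key -> values.map(_._2).min }

  def main(args: Array[String]): Unit = {
    val readings = Seq(
      Reading("Room1", 22.0f, 1000L),
      Reading("Room1", 20.5f, 4500L),
      Reading("Room1", 25.0f, 6200L)
    )
    minPerWindow(readings).toSeq.sortBy(_._1._2).foreach {
      case ((id, start), min) => println(s"$id window [$start, ${start + SizeMs}) min=$min")
    }
  }
}
```

Because the window is longer than the slide, each reading lands in two or three overlapping windows, which is why a single low reading can show up as the minimum in several consecutive results.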
FIWARE Cosmos: Orion Spark Connector
Spark Components
Spark Scheduler
[Figure: DAG of datasets A to G connected by map, union, groupBy and join operations and divided into Stage 1, Stage 2 and Stage 3; the legend marks cached data partitions]
▪ Dryad-like DAGs
▪ Pipelines functions within a stage
▪ Cache-aware work reuse & locality
▪ Partitioning-aware to avoid shuffles
Motivation of Spark
▪ Iterative algorithms (machine learning, graphs)
▪ Interactive data mining tools (R, Excel, Python)
Connection
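[Diagram: the ORION Context Broker sends HTTP POST notifications to the OrionReceiver of the orion-spark-connector, which is packaged in a Spark Job (JAR) running on the Spark Cluster; the OrionSink sends HTTP POST/PUT/PATCH requests back to the Context Broker]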
FIWARE & Machine Learning
Machine Learning
Machine Learning Development Lifecycle
Machine Learning Algorithms
▪ Some solutions have high algorithmic complexity.
▪ Some can be parallelized in a cluster (FlinkML).
▪ Others can use a GPU (e.g. TensorFlow).
▪ Even though each case may be different, we try to set up a generic life cycle.
ML Standard Solution
▪ Each problem requires an analysis of which ML algorithm suits our data.
▪ Later, the training dataset needs to be set up.
▪ Each problem may be slightly different (“same same but different”).
▪ We can provide some solutions for some cases and use a proper dataset.
▪ The tool to use (Spark, Flink, TensorFlow) depends on the chosen ML algorithm (not every
ML algorithm is available on every platform).
Current Status
Orion Connector
Orion Source/Receiver + Orion Sink ✔ ✔
RTD Documentation ✔ ✔
Unit Tests ✔ ✔
Examples ✔ ✔
Step-by-step tutorial ✔
Support NGSI LD
Summary: Terms
● OLAP, Online Analytical Processing.
● OLTP, Online Transaction Processing.
● RDBMS, Relational Database Management System.
● ETL, Extract, Transform, Load.
● ERP, Enterprise Resource Planning.
● CRM, Customer Relationship Management.
● OSV, Output Slot Vector.
● BI, Business Intelligence.
● HDFS, Hadoop Distributed File System.
● DAG, Directed Acyclic Graph. The DAG defines the dataflow of the application, and the vertices of the
graph define the operations that are to be performed on the data.
References
▪ FIWARE Catalogue:
• https://www.fiware.org/developers/catalogue
▪ FIWARE Academy:
• https://fiware-academy.readthedocs.io/en/latest/processing/wirecloud
▪ Installation, administration & reference documentation is available on Read The Docs:
• https://fiware-cosmos-flink.readthedocs.io
▪ GitHub:
• https://github.com/ging/fiware-cosmos-orion-flink-connector
• https://github.com/ging/fiware-cosmos-orion-spark-connector
• https://github.com/ging/fiware-cosmos-orion-flink-connector-examples
• https://github.com/ging/fiware-cosmos-orion-spark-connector-examples
Question & Answer
fiware-tech-help@lists.fiware.org