i4Trust Website
i4Trust Community
End-to-end AI Solution With PySpark & Real-time Data Processing With Apache NiFi
Rihab Feki, Machine Learning Engineer and Evangelist
Sherifa Fayed, Technical Expert and Evangelist
FIWARE Foundation
Learning goals
● Managing real-time data with the Context Broker
● Data transformation (JSON-LD to CSV) and persistence with Apache NiFi
● Setting up a Google Cloud environment
○ Creating a Dataproc cluster and connecting it to Jupyter Notebook
○ Using Google Cloud Storage Service (GCS)
● Modeling an ML solution based on PySpark for multi-class classification
● Deploying the ML model with Flask and getting predictions in real time
2
End to End AI service architecture powered by FIWARE
3
What is Apache NiFi?
4
● System to process and distribute data
● Supports powerful and scalable directed graphs of data routing and transformation
● Web-based user interface
● Tracking data flow from beginning to end
5
Connecting NiFi to the Context Broker
[Diagram: cURL or Postman sends requests to the NGSI-LD Context Broker (port mapping 1026:1026), which uses MongoDB (27017:27017) and notifies NiFi or Draco (5050:5050). Entity: steel plate geometric measurements]
6
Link to dataset
End to End AI service architecture powered by FIWARE
7
Dataflow overview
8
Ingesting
Data processing and persistence with NiFi
9
The overall NiFi workflow
10
Overview of the NiFi workflow
11
● ListenHTTP: Configured as source for receiving notifications from the Context Broker
● GetFile: Reads data in JSON-LD format
● JoltTransformJSON: Transforms the nested JSON into a flat attribute-value JSON file, which is used to form the CSV file
● ConvertRecord: Converts each JSON file to a CSV file
● MergeContent: Merges the resulting CSV record files into one aggregated CSV dataset (Note: a minimum number of entries can be set before the merge is triggered, as well as a maximum number of flow files per merge)
● PutGCSObject: Saves the resulting CSV in Google Cloud Storage bucket
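The transformation steps above can be sketched outside NiFi in plain Python. This is only an illustration of what JoltTransformJSON, ConvertRecord and MergeContent achieve together, not the NiFi configuration used in the demo. The entity type `SteelPlate`, the ids and the attribute `X_Maximum` are assumptions made for the example.

```python
import csv
import io


def flatten_entity(entity):
    """Reduce a normalized NGSI-LD entity to plain attribute/value pairs."""
    row = {}
    for name, attr in entity.items():
        if name == "@context":
            continue  # the linked-data context is not a data column
        if isinstance(attr, dict) and "value" in attr:
            row[name] = attr["value"]   # Property: keep only its value
        elif isinstance(attr, dict) and "object" in attr:
            row[name] = attr["object"]  # Relationship: keep the target id
        else:
            row[name] = attr            # id, type and other plain members
    return row


def merge_to_csv(entities):
    """Flatten each entity and merge the records into one CSV document."""
    rows = [flatten_entity(e) for e in entities]
    fieldnames = list(rows[0]) if rows else []
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample entities (ids, type and values are made up).
sample = [
    {"id": "urn:ngsi-ld:SteelPlate:001", "type": "SteelPlate",
     "X_Minimum": {"type": "Property", "value": 42},
     "X_Maximum": {"type": "Property", "value": 50}},
    {"id": "urn:ngsi-ld:SteelPlate:002", "type": "SteelPlate",
     "X_Minimum": {"type": "Property", "value": 645},
     "X_Maximum": {"type": "Property", "value": 651}},
]
csv_text = merge_to_csv(sample)
print(csv_text)
```

In the NiFi flow the merged file would then be handed to PutGCSObject; here it simply stays in memory as a string.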
Demo: Data transformation and persistence
12
End to End AI service architecture powered by FIWARE
13
What is PySpark?
14
PySpark is an interface for Apache Spark in Python.
With PySpark you can perform exploratory data analysis at scale, build machine learning pipelines, and create ETL jobs for a data platform.
What is Cloud Dataproc?
Dataproc is a managed Spark and Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, machine learning and big data processing.
15
The main benefits of Dataproc
● It’s a managed service: No need for a system administrator to set it up.
● It’s fast: Cluster creation in about 90 seconds.
● It’s cheaper than building your own cluster: Because you can spin up a Dataproc cluster
when you need to run a job and shut it down afterward, so you only pay when jobs are
running.
● It’s integrated with other Google Cloud services: Including Cloud Storage, BigQuery, and
Cloud Bigtable, so it’s easy to get data into and out of it.
16
What makes Dataproc special?
The typical mode of operation for Hadoop/Spark, on premises or in the cloud, requires you to deploy a cluster first and then fill that cluster with jobs.
17
What makes Dataproc special?
Rather than submitting the job to an already-deployed cluster, you submit the job to Dataproc, which creates a cluster on your behalf on demand.
➢ A cluster is now a means to an end for job execution.
18
Let’s see how Dataproc makes it easy and scalable...
19
Data scientists are big fans of Jupyter Notebooks.
However, setting up an Apache Spark cluster with Jupyter Notebooks can be complicated.
Apache Spark and JupyterLab architecture on Google Cloud
20
How does it work?
1. Setting up the Google cloud environment and creating a project
2. Creating a Google Cloud Storage bucket for your cluster
3. Creating a Dataproc Cluster with Jupyter and Component Gateway
4. Accessing the JupyterLab web UI on Dataproc
5. Creating a Notebook and developing the AI algorithm with PySpark
21
Creating a Dataproc cluster using cloud shell
22
# CLUSTER_NAME, REGION and BUCKET_NAME must be set in the shell beforehand
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
  --region=${REGION} \
  --image-version=1.4 \
  --master-machine-type=n1-standard-4 \
  --worker-machine-type=n1-standard-4 \
  --bucket=${BUCKET_NAME} \
  --optional-components=ANACONDA,JUPYTER \
  --enable-component-gateway
Component gateway for additional cluster components
23
Steel plates faults prediction
24
● Features: 27 geometric measurements of the steel plates
● Fault types: 7
○ Pastry
○ Z_Scratch
○ K_Scatch
○ Stains
○ Dirtiness
○ Bumps
○ Other_Faults
Dataset format: CSV | Number of Samples: 1941
Link to dataset
Demo:
Cloud environment set up
Modeling the ML solution based on PySpark
25
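A minimal sketch of how the multi-class problem could be modelled with PySpark ML. It assumes the CSV stores the seven fault types as 0/1 indicator columns that first have to be collapsed into one label column named `fault` (a hypothetical name). It also swaps in a random forest purely for illustration, since the slides do not name the algorithm used in the notebook.

```python
# Fault names as listed on the slide (spelling kept as in the dataset).
FAULT_COLUMNS = ["Pastry", "Z_Scratch", "K_Scatch", "Stains",
                 "Dirtiness", "Bumps", "Other_Faults"]


def fault_label(row):
    """Return the name of the fault whose indicator column is set to 1."""
    for name in FAULT_COLUMNS:
        if int(row[name]) == 1:
            return name
    raise ValueError("row has no fault flagged")


def build_pipeline(feature_columns):
    """Assemble an indexing, vectorizing and classification pipeline."""
    # Imported lazily so fault_label() works without a Spark installation.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    # Turn the string label in "fault" into a numeric "label" column.
    indexer = StringIndexer(inputCol="fault", outputCol="label")
    # Pack the geometric measurements into a single feature vector.
    assembler = VectorAssembler(inputCols=feature_columns, outputCol="features")
    classifier = RandomForestClassifier(labelCol="label", featuresCol="features")
    return Pipeline(stages=[indexer, assembler, classifier])


# Hypothetical row with exactly one fault flagged.
example = {name: 0 for name in FAULT_COLUMNS}
example["Bumps"] = 1
print(fault_label(example))
```

In a Dataproc notebook the DataFrame would typically come from `spark.read.csv("gs://<bucket>/<file>.csv", header=True, inferSchema=True)`, the `fault` column would be derived from the indicator columns, and the fitted model could be stored with `model.write().overwrite().save(<path>)`. Bucket, file and path are placeholders.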
ML model deployment with Flask architecture
26
[Architecture diagram. Components: cURL or Postman; Orion Context Broker (1026:1026); database port 27017:27017; web service for model prediction (5000:5000); saved model (.parquet); model training in Jupyter Notebook]
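As a rough sketch of the prediction service, the Flask app below accepts an NGSI-LD notification and returns a prediction. The route `/predict`, the `predict` callable and the feature names are assumptions made for this example, and the actual service in the repository may be organised differently.

```python
def features_from_notification(notification, feature_names):
    """Pull the feature values out of the first entity in a notification."""
    entity = notification["data"][0]
    return [entity[name]["value"] for name in feature_names]


def create_app(predict, feature_names):
    """Build a Flask app that wraps a prediction function (hypothetical)."""
    # Flask is imported lazily so the helper above runs without it installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict_endpoint():
        features = features_from_notification(request.get_json(), feature_names)
        return jsonify({"prediction": predict(features)})

    return app


# Hypothetical notification body, shaped like an NGSI-LD notification.
notification = {
    "id": "urn:ngsi-ld:Notification:001",
    "type": "Notification",
    "data": [{
        "id": "urn:ngsi-ld:SteelPlate:001",
        "type": "SteelPlate",
        "X_Minimum": {"type": "Property", "value": 42},
        "X_Maximum": {"type": "Property", "value": 50},
    }],
}
values = features_from_notification(notification, ["X_Minimum", "X_Maximum"])
print(values)
```

To serve it on the port shown in the diagram, one could call `create_app(my_predict, names).run(host="0.0.0.0", port=5000)`, where `my_predict` wraps the saved model.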
Useful links
● Source code and documentation
https://github.com/RihabFekii/PySpark-AI-service_Data-processing-NiFi
● Jupyter Notebook for Steel faults classification based on PySpark
https://github.com/RihabFekii/PySpark-AI-service_Data-processing-NiFi/blob/master/PySpark/PySpark_Steel_faults_Classification.ipynb
● Data processing and persistence with Apache NiFi documentation
https://github.com/RihabFekii/PySpark-AI-service_Data-processing-NiFi/tree/master/Nifi
● NGSI-LD Context Broker
○ Docker hub: https://hub.docker.com/r/fiware/orion-ld
○ Documentation: https://github.com/FIWARE/context.Orion-LD
● Google Cloud Console: https://console.cloud.google.com/
● Flask Apps with Docker: https://runnable.com/docker/python/docker-compose-with-flask-apps
27
Summary
28
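● The Context Broker only holds the current context; it does not persist historical data, so persistence has to be handled separately (here with NiFi)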
● The Google Cloud Dataproc service gives data scientists an easy way to set up, control and secure data science environments, and makes it simple and fast to integrate them with other open source data tools.
● Once the Dataproc cluster is created, its configuration is largely fixed: optional components cannot be added afterwards, so dependencies and libraries are best declared at creation time.
● Dataproc jobs are limited to some programming languages.
● Apache NiFi might not be the easiest tool for data processing, but it manages and automates data flows and is a good fit for large-scale or real-time data.
● Other cloud platforms could be used (AWS, Azure, Databricks, etc.)
Thank you!
http://fiware.org
Follow @FIWARE on Twitter
30
Q&A
31
Annex
32
Creating an entity in the Context Broker
● Unique id and type
● Attributes of the created entity
33
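A minimal sketch of the entity creation request, built with Python's standard library. The broker host, the entity id, the type `SteelPlate` and the attribute values are assumptions. Only the port 1026 and the attribute name X_Minimum come from the slides.

```python
import json
import urllib.request

# Assumed host; 1026 is the Context Broker port shown in the slides.
BROKER_URL = "http://localhost:1026"
CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def build_create_entity_request(entity_id, entity_type, measurements):
    """Build (but do not send) a POST request that creates an NGSI-LD entity."""
    entity = {"id": entity_id, "type": entity_type, "@context": CORE_CONTEXT}
    for name, value in measurements.items():
        # Each measurement becomes a Property attribute of the entity.
        entity[name] = {"type": "Property", "value": value}
    return urllib.request.Request(
        url=f"{BROKER_URL}/ngsi-ld/v1/entities",
        data=json.dumps(entity).encode("utf-8"),
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )


# Hypothetical id, type and measurement values.
create_request = build_create_entity_request(
    "urn:ngsi-ld:SteelPlate:001", "SteelPlate",
    {"X_Minimum": 42, "X_Maximum": 50},
)
print(create_request.get_method(), create_request.full_url)
```

With a broker running, `urllib.request.urlopen(create_request)` would send it. cURL or Postman, as used in the demo, work just as well.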
Subscribing to changes and listening
● Posting the subscription to Orion
● Subscribing to all entities of a certain type
● Subscribing to relevant attributes
● Sending notifications to the port NiFi is listening on
34
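The subscription described above could look roughly like this. The NiFi host name `nifi`, the base path `contentListener` (ListenHTTP's default) and the entity type are assumptions for the sketch. Port 5050 is the one shown in the architecture diagram.

```python
import json
import urllib.request

BROKER_URL = "http://localhost:1026"                # assumed broker host
NIFI_ENDPOINT = "http://nifi:5050/contentListener"  # assumed ListenHTTP address
CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def build_subscription_request(entity_type, attributes):
    """Build (but do not send) a POST request that registers a subscription."""
    subscription = {
        "type": "Subscription",
        "description": "Notify NiFi when steel plate measurements change",
        # Subscribe to all entities of a certain type.
        "entities": [{"type": entity_type}],
        # Only changes to these attributes trigger a notification.
        "watchedAttributes": attributes,
        "notification": {
            "attributes": attributes,
            "format": "normalized",
            # Where the broker sends notifications: the port NiFi listens on.
            "endpoint": {"uri": NIFI_ENDPOINT, "accept": "application/json"},
        },
        "@context": CORE_CONTEXT,
    }
    return urllib.request.Request(
        url=f"{BROKER_URL}/ngsi-ld/v1/subscriptions",
        data=json.dumps(subscription).encode("utf-8"),
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )


subscription_request = build_subscription_request("SteelPlate", ["X_Minimum"])
print(subscription_request.get_method(), subscription_request.full_url)
```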
Inducing a change and receiving a notification
35
● Changing the value of X_Minimum
● The processor's Out count jumps to 1
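Changing X_Minimum can be sketched as a partial attribute update against the broker. The host, the entity id and the new value are assumptions. If a matching subscription exists, the broker should then notify NiFi.

```python
import json
import urllib.request

BROKER_URL = "http://localhost:1026"  # assumed broker host


def build_attribute_update_request(entity_id, attribute, value):
    """Build (but do not send) a PATCH request that updates one attribute."""
    body = {"type": "Property", "value": value}
    return urllib.request.Request(
        url=f"{BROKER_URL}/ngsi-ld/v1/entities/{entity_id}/attrs/{attribute}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# Hypothetical entity id and new value.
update_request = build_attribute_update_request(
    "urn:ngsi-ld:SteelPlate:001", "X_Minimum", 43)
print(update_request.get_method(), update_request.full_url)
```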
Setting up the cloud environment
37
Creating a project in Google Cloud Platform
38
We can manage the project via the Cloud Shell
Creating a Google Cloud Storage bucket
39
➢ Store datasets
➢ Store Notebooks
➢ Store logs
➢ Store output files
Creating a Dataproc cluster using cloud shell
40
# CLUSTER_NAME, REGION and BUCKET_NAME must be set in the shell beforehand
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
  --region=${REGION} \
  --image-version=1.4 \
  --master-machine-type=n1-standard-4 \
  --worker-machine-type=n1-standard-4 \
  --bucket=${BUCKET_NAME} \
  --optional-components=ANACONDA,JUPYTER \
  --enable-component-gateway
Creating a Dataproc cluster using GUI
41
Component gateway for additional cluster components
42
Overview of the Dataproc cluster
43
Dataproc cluster web interfaces
44
Dataproc cluster: JupyterLab interface
45
Creating a Jupyter Notebook and provisioning data from a Google Cloud Storage bucket
46
Link to Notebook
Submitting a PySpark job using the Dataproc GUI
47
Submitting a PySpark job to the Dataproc cluster
48
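Besides the GUI, a job can also be submitted from a script. The sketch below only builds the `gcloud dataproc jobs submit pyspark` command line. The script location, cluster name and region are placeholders, not values from the demo.

```python
import shlex


def dataproc_submit_command(script_uri, cluster, region):
    """Build the argument list for submitting a PySpark script to a cluster."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "pyspark", script_uri,
        f"--cluster={cluster}",
        f"--region={region}",
    ]


# Placeholder values; replace them with your own bucket, cluster and region.
command = dataproc_submit_command(
    "gs://my-bucket/jobs/steel_faults.py", "my-cluster", "europe-west1")
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would run it, provided the gcloud CLI is installed and authenticated.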
www.egm.io
Fluid Machine Learning lifecycle with FIWARE
Benoit Orihuela – i4Trust Training Webinar
A TYPICAL ML LIFECYCLE
• A Data Scientist
• Get and clean up data
• Prepare and train a ML model
• An IT person
• Package and deploy the ML model
• An end user
• Discover the available ML models (with respect to privacy)
• Ask to use one or more of them (and optionally pay for it)
• Get real time data (predictions, outliers,…) from a ML model
ML lifecycle with FIWARE - i4Trust - 12/05/2021 3
WHAT DO WE AIM AT?
Bridge the gap between data scientists and operations (MLOps)
Develop the Machine Learning as a Service (MLaaS) model
And also:
More and more use cases requiring ML / AI activities
FIWARE needs to offer a rich variety of tools
THE TRAINING AND PREPARATION PHASE
THE DISCOVERY AND REGISTRATION PHASE
THE PREDICTION PHASE
DEMONSTRATIONS
• Demonstration #1 - End to end demonstration of a ML model development, deployment and use
• Use of Jupyter notebook as interface
• Applied to a simplistic water flow calculation
• Demonstration #2 – Events generation from video stream analysis
• Realtime extraction of context information from a video stream
Thank You!
Benoit ORIHUELA
Lead Architect
Tel: +33 687427107
E-mail: benoit.orihuela@egm.io
www.egm.io
www.egm.io
MLaaS for image analysis
Anwar ALFATAYRI
2
REAL LIFE EXAMPLE: SOCIAL DISTANCING
Number of people : 14
Groups of 2 people : 1
Groups of 3 people : 2
Groups of 4 people : 1
Groups >4 People: 0
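TWO APPROACHES: Machine learning on the edge
[Diagram: the image is analysed on the street, and the result ("3 people detected") is sent to the FIWARE Cloud]
3
TWO APPROACHES: Machine learning as a service
[Diagram: the image is sent from the street to the FIWARE Cloud through a REST API, and the result is "3 people detected"]
4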
EU Opp_Clara Pezuela - German chapter.pptx
FIWARE
 

Session 8 - Creating Data Processing Services | Train the Trainers Program

  • 1. End-to-end AI Solution With PySpark & Real-time Data Processing With Apache NiFi. Rihab Feki, Machine Learning Engineer and Evangelist; Sherifa Fayed, Technical Expert and Evangelist; FIWARE Foundation
  • 2. Learning goals
      ● Managing real-time data with the Context Broker
      ● Data transformation (JSON-LD to CSV) and persistence with Apache NiFi
      ● Setting up a Google Cloud environment
          ○ Creating a Dataproc cluster and connecting it to Jupyter Notebook
          ○ Using Google Cloud Storage Service (GCS)
      ● Modeling an ML solution based on PySpark for multi-class classification
      ● Deploying the ML model with Flask and getting predictions in real time
  • 3. End to End AI service architecture powered by FIWARE
  • 4. What is Apache NiFi?
      ● System to process and distribute data
      ● Supports powerful and scalable directed graphs of data routing and transformation
      ● Web-based user interface
      ● Tracking data flow from beginning to end
  • 5. Connecting NiFi to the Context Broker. Diagram components: cURL or Postman; NGSI-LD Context Broker (port 1026:1026); NiFi or Draco (port 5050:5050); MongoDB (port 27017:27017)
  • 6. Entity: Steel plate geometric measurements (Link to dataset)
  • 7. End to End AI service architecture powered by FIWARE
  • 9. Data processing and persistence with NiFi
  • 10. The overall NiFi workflow
  • 11. Overview of the NiFi workflow
      ● ListenHTTP: Configured as the source for receiving notifications from the Context Broker
      ● GetFile: Reads data in JSON-LD format
      ● JoltTransformJSON: Transforms nested JSON into a simple attribute-value JSON file, which is used to form the CSV file
      ● ConvertRecord: Converts each JSON file to a CSV file
      ● MergeContent: Merges the resulting CSV record files into an aggregated CSV dataset (note: a minimum number of entries can be set to trigger the merge, and a maximum number of flow files can also be set)
      ● PutGCSObject: Saves the resulting CSV in a Google Cloud Storage bucket
  • 12. Demo: Data transformation and persistence
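The transformation chain described on slide 11 (flatten nested JSON, convert to CSV, merge into one dataset) can be approximated in plain Python to make the data shapes concrete. This is only a sketch, not the NiFi flow itself: the entity type `SteelPlate`, the entity ids, and the attribute `X_Maximum` are hypothetical, and the input is assumed to be NGSI-LD entities in normalized form.

```python
import csv
import io


def flatten_entity(entity):
    """Reduce a normalized NGSI-LD entity to a flat attribute/value dict."""
    flat = {}
    for key, attr in entity.items():
        if key == "@context":
            continue  # context metadata is not a data column
        if isinstance(attr, dict) and "value" in attr:
            flat[key] = attr["value"]  # Property: keep only its value
        else:
            flat[key] = attr  # id, type and other plain members
    return flat


def entities_to_csv(entities):
    """Merge flattened entities into one CSV text with a single header row.

    Assumes every entity carries the same attributes as the first one.
    """
    rows = [flatten_entity(e) for e in entities]
    fieldnames = list(rows[0].keys()) if rows else []
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical entities, shaped like the data carried in a notification
sample = [
    {"id": "urn:ngsi-ld:SteelPlate:001", "type": "SteelPlate",
     "X_Minimum": {"type": "Property", "value": 42},
     "X_Maximum": {"type": "Property", "value": 50}},
    {"id": "urn:ngsi-ld:SteelPlate:002", "type": "SteelPlate",
     "X_Minimum": {"type": "Property", "value": 645},
     "X_Maximum": {"type": "Property", "value": 651}},
]
csv_text = entities_to_csv(sample)
print(csv_text)
```

In the NiFi flow these three steps are separate processors (JoltTransformJSON, ConvertRecord, MergeContent); the sketch folds them into two functions purely for readability.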
  • 13. End to End AI service architecture powered by FIWARE
  • 14. What is PySpark? PySpark is an interface for Apache Spark in Python. It is used for performing exploratory data analysis at scale, building machine learning pipelines, and creating ETLs for a data platform.
  • 15. What is Cloud Dataproc? Dataproc is a managed Spark and Hadoop service that lets you take advantage of open source data tools for big data processing: batch processing, querying, streaming, and machine learning.
  • 16. The main benefits of Dataproc
      ● It’s a managed service: no need for a system administrator to set it up.
      ● It’s fast: cluster creation takes about 90 seconds.
      ● It’s cheaper than building your own cluster: you can spin up a Dataproc cluster when you need to run a job and shut it down afterward, so you only pay while jobs are running.
      ● It’s integrated with other Google Cloud services, including Cloud Storage, BigQuery, and Cloud Bigtable, so it’s easy to get data into and out of it.
  • 17. What makes Dataproc special? The typical mode of operation of Hadoop/Spark, on premises or in the cloud, requires you to deploy a cluster and then fill that cluster with jobs.
  • 18. What makes Dataproc special? Rather than submitting the job to an already-deployed cluster, you submit the job to Dataproc, which creates a cluster on your behalf on demand. ➢ A cluster is now a means to an end for job execution.
  • 19. Let’s see how Dataproc makes it easy and scalable... Data scientists are big fans of Jupyter Notebooks. However, getting an Apache Spark cluster set up with Jupyter Notebooks can be complicated.
  • 20. Apache Spark and Jupyter Lab architecture on Google Cloud
  • 21. How it works
      1. Setting up the Google Cloud environment and creating a project
      2. Creating a Google Cloud Storage bucket for your cluster
      3. Creating a Dataproc cluster with Jupyter and Component Gateway
      4. Accessing the JupyterLab web UI on Dataproc
      5. Creating a Notebook and developing the AI algorithm with PySpark
  • 22. Creating a Dataproc cluster using Cloud Shell
      gcloud beta dataproc clusters create ${CLUSTER_NAME} \
        --region=${REGION} \
        --image-version=1.4 \
        --master-machine-type=n1-standard-4 \
        --worker-machine-type=n1-standard-4 \
        --bucket=${BUCKET_NAME} \
        --optional-components=ANACONDA,JUPYTER \
        --enable-component-gateway
  • 23. Component gateway for additional cluster components
  • 24. Steel plates faults prediction
      ● Features: 27 geometric measurements of the steel plates
      ● Fault types: 7
          ○ Pastry
          ○ Z_Scratch
          ○ K_Scatch
          ○ Stains
          ○ Dirtiness
          ○ Bumps
          ○ Other_Faults
      ● Dataset format: CSV | Number of samples: 1941 (Link to dataset)
  • 25. Demo: Cloud environment setup and modeling the ML solution based on PySpark
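Before a multi-class model can be trained on the seven fault types listed on slide 24, each sample needs a single class label. The sketch below assumes the CSV stores the fault types as seven 0/1 indicator columns named as on the slide; that column layout and the helper name `fault_label` are assumptions for illustration and are not taken from the authors' notebook.

```python
# Fault type names exactly as listed on the slide
FAULT_TYPES = [
    "Pastry", "Z_Scratch", "K_Scatch", "Stains",
    "Dirtiness", "Bumps", "Other_Faults",
]


def fault_label(row):
    """Return the index of the single fault column set to 1 in a CSV row.

    `row` maps column names to values (for example from csv.DictReader).
    Raises ValueError when zero or several fault columns are set.
    """
    active = [i for i, name in enumerate(FAULT_TYPES) if int(row[name]) == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one fault flag, found {len(active)}")
    return active[0]


# Hypothetical row: all fault columns are 0 except Stains
sample_row = {name: "0" for name in FAULT_TYPES}
sample_row["Stains"] = "1"
label = fault_label(sample_row)
print(label, FAULT_TYPES[label])
```

The resulting integer can serve as the label column of a classifier; in PySpark the same mapping would typically be applied as a column expression rather than row by row.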
  • 26. ML model deployment with Flask architecture. Diagram labels: Jupyter Notebook; Model training; Saved Model (.parquet); Model prediction; www; Orion Context Broker; cURL or Postman; ports 1026:1026, 5000:5000, 27017:27017
  • 27. Useful links
      ● Source code and documentation: https://github.com/RihabFekii/PySpark-AI-service_Data-processing-NiFi
      ● Jupyter Notebook for Steel faults classification based on PySpark: https://github.com/RihabFekii/PySpark-AI-service_Data-processing-NiFi/blob/master/PySpark/PySpark_Steel_faults_Classification.ipynb
      ● Data processing and persistence with Apache NiFi documentation: https://github.com/RihabFekii/PySpark-AI-service_Data-processing-NiFi/tree/master/Nifi
      ● NGSI-LD Context Broker
          ○ Docker hub: https://hub.docker.com/r/fiware/orion-ld
          ○ Documentation: https://github.com/FIWARE/context.Orion-LD
      ● Google Cloud Console: https://console.cloud.google.com/
      ● Flask Apps with Docker: https://runnable.com/docker/python/docker-compose-with-flask-apps
  • 28. Summary
      ● The Context Broker keeps only the current context; it does not persist historical data.
      ● The Google Cloud Dataproc service gives data scientists an easy way to set up, control and secure data science environments. It also makes it simple and fast to integrate them with other open source data tools.
      ● Once the Dataproc cluster is created, it is not possible to change the configuration or install new dependencies or libraries.
      ● Dataproc jobs are limited to some programming languages.
      ● Apache NiFi might not be the easiest tool for data processing, but it manages and automates data flows, and it is a good fit for large-scale or real-time data.
      ● Other cloud platforms could be used (AWS, Azure, Databricks, etc.).
  • 32. Creating an entity in the Context Broker. Callouts: unique id and type; attributes of the created entity
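As a rough illustration of the two callouts on this slide (a unique id and type, followed by the entity's attributes), the following sketch builds an NGSI-LD entity body in Python. The id, the type `SteelPlate`, the attribute values, and the helper name `make_entity` are hypothetical; only the attribute name `X_Minimum` appears in the slides.

```python
import json

# NGSI-LD core context published by ETSI
CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def make_entity(entity_id, entity_type, measurements):
    """Build a normalized NGSI-LD entity with one Property per measurement."""
    entity = {"id": entity_id, "type": entity_type}
    for name, value in measurements.items():
        entity[name] = {"type": "Property", "value": value}
    entity["@context"] = [CORE_CONTEXT]
    return entity


# Hypothetical steel plate entity with two measurements
entity = make_entity(
    "urn:ngsi-ld:SteelPlate:001",
    "SteelPlate",
    {"X_Minimum": 42, "X_Maximum": 50},
)
print(json.dumps(entity, indent=2))
```

A body like this would be sent with a POST to the broker's `/ngsi-ld/v1/entities` endpoint using the `application/ld+json` content type, for example via cURL or Postman as shown in the deck.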
  • 33. Subscribing to changes and listening. Callouts: posting the subscription to Orion; subscribing to all entities of a certain type; subscribing to relevant attributes; sending the notification to the port NiFi is listening on
  • 34. Subscribing to changes and listening
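The callouts on slide 33 map onto the fields of an NGSI-LD subscription. The sketch below shows one plausible shape of that payload; the entity type, the watched attribute list, and the notification URI (host `nifi`, path `/notify`) are hypothetical, while port 5050 is the one shown for NiFi in the deck.

```python
import json

# NGSI-LD core context published by ETSI
CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"


def make_subscription(entity_type, watched_attributes, notify_uri):
    """Build an NGSI-LD subscription for all entities of one type."""
    return {
        "type": "Subscription",
        # subscribe to all entities of a certain type
        "entities": [{"type": entity_type}],
        # only changes to these attributes trigger a notification
        "watchedAttributes": list(watched_attributes),
        "notification": {
            # attributes included in each notification
            "attributes": list(watched_attributes),
            "format": "keyValues",
            # where the broker sends notifications (the NiFi listener)
            "endpoint": {"uri": notify_uri, "accept": "application/json"},
        },
        "@context": [CORE_CONTEXT],
    }


subscription = make_subscription(
    "SteelPlate",                 # hypothetical entity type
    ["X_Minimum"],                # attribute changed in the demo
    "http://nifi:5050/notify",    # hypothetical host and path
)
print(json.dumps(subscription, indent=2))
```

Posting this body to `/ngsi-ld/v1/subscriptions` on the broker registers the subscription; the endpoint URI has to match the port and base path configured on the ListenHTTP processor.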
  • 35. Inducing a change and receiving a notification
  • 36. Inducing a change and receiving a notification. Callouts: changing the value of X_Minimum; the processor Out count jumps to 1
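Changing `X_Minimum` as on slide 36 amounts to a partial attribute update against the broker. The sketch below only constructs the HTTP request with the standard library and does not send it; the broker host `localhost` and the entity id are hypothetical, and port 1026 is the one shown for the Context Broker in the deck.

```python
import json
import urllib.request


def build_attr_update(broker_url, entity_id, attr_name, new_value):
    """Build (but do not send) a PATCH request updating one attribute."""
    url = f"{broker_url}/ngsi-ld/v1/entities/{entity_id}/attrs"
    body = json.dumps(
        {attr_name: {"type": "Property", "value": new_value}}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )


request = build_attr_update(
    "http://localhost:1026",          # hypothetical broker address
    "urn:ngsi-ld:SteelPlate:001",     # hypothetical entity id
    "X_Minimum",
    100,
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it. If a matching subscription exists, the broker then notifies the listener, which is what makes the ListenHTTP Out count increase in the demo.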
  • 37. Setting up the cloud environment
  • 38. Creating a project in Google Cloud Platform. We can manage the project via the Cloud Shell.
  • 39. Creating a Google Cloud Storage bucket ➢ Store datasets ➢ Store Notebooks ➢ Store logs ➢ Store output files
  • 40. Creating a Dataproc cluster using Cloud Shell
      gcloud beta dataproc clusters create ${CLUSTER_NAME} \
        --region=${REGION} \
        --image-version=1.4 \
        --master-machine-type=n1-standard-4 \
        --worker-machine-type=n1-standard-4 \
        --bucket=${BUCKET_NAME} \
        --optional-components=ANACONDA,JUPYTER \
        --enable-component-gateway
  • 41. Creating a Dataproc cluster using GUI
  • 42. Component gateway for additional cluster components
  • 43. Overview of the Dataproc cluster
  • 44. Dataproc cluster web interfaces
  • 45. Dataproc cluster: Jupyter lab interface
  • 46. Creating a Jupyter Notebook and provisioning data from Google Cloud Bucket (Link to Notebook)
  • 47. Submitting a PySpark job using Dataproc GUI
  • 48. Submitting a PySpark job to Dataproc cluster
  • 49. www.egm.io Fluid Machine Learning lifecycle with FIWARE Benoit Orihuela – i4Trust Training Webinar
  • 50. A TYPICAL ML LIFECYCLE (ML lifecycle with FIWARE - i4Trust - 12/05/2021)
      ● A Data Scientist
          ○ Get and clean up data
          ○ Prepare and train an ML model
      ● An IT person
          ○ Package and deploy the ML model
      ● An end user
          ○ Discover the available ML models (with respect to privacy)
          ○ Ask to use one or more of them (and optionally pay for it)
          ○ Get real-time data (predictions, outliers, …) from an ML model
  • 51. WHAT DO WE AIM AT? Bridge the gap between data scientists and operations (MLOps). Develop the Machine Learning as a Service (MLaaS) model. And also: more and more use cases require ML / AI activities, and FIWARE needs to offer a rich variety of tools.
  • 52. THE TRAINING AND PREPARATION PHASE
  • 53. THE DISCOVERY AND REGISTRATION PHASE
  • 54. THE PREDICTION PHASE
  • 55. DEMONSTRATIONS
      ● Demonstration #1: End-to-end demonstration of ML model development, deployment and use
          ○ Use of Jupyter notebook as interface
          ○ Applied to a simplistic water flow calculation
      ● Demonstration #2: Events generation from video stream analysis
          ○ Real-time extraction of context information from a video stream
  • 57. www.egm.io MLaaS for Image analysis. Anwar ALFATAYRI
  • 58. REAL LIFE EXAMPLE: SOCIAL DISTANCING. Number of people: 14; groups of 2 people: 1; groups of 3 people: 2; groups of 4 people: 1; groups of more than 4 people: 0
  • 59. TWO APPROACHES: Machine learning on the edge. Diagram labels: Street; Image; 3 people detected; FIWARE Cloud
  • 60. TWO APPROACHES: Machine learning as a service. Diagram labels: Street; Image; REST API; FIWARE Cloud; 3 people detected