Machine Learning in the IoT with Apache NiFi
Michael Bironneau
April 2017
@OpenEnergi
What problem are we solving?
Image from Wikimedia Commons: https://en.wikipedia.org/wiki/Pearl_Street_Station
[Chart: Installed Capacity (GW) vs. Generation (GW)]
Our Solution
[Chart: Total Power (MW) over one day, 0:00 to 22:30. Average upwards flex: 120%; average downwards flex: 35%]
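The speaker notes explain that gateways dispatch assets when grid frequency is too low or too high, and that aggregation means no single asset has to respond in proportion to frequency. The sketch below shows one way on/off assets can add up to a roughly proportional fleet response. It is a hypothetical scheme, not Open Energi's algorithm: each asset gets its own trigger frequency, evenly spread below the 50 Hz nominal frequency of the GB grid, and turns its demand down when the measured frequency drops below that trigger. Only the low-frequency side is shown; high frequency would mirror it by turning demand up. The `Asset` fields and the 49.5 to 49.95 Hz band are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Asset:
    name: str
    power_kw: float          # demand that can be turned down on request
    trigger_hz: float = 0.0  # set by spread_triggers(); 0.0 means "never triggers"


def spread_triggers(assets: List[Asset], low_hz: float = 49.5, high_hz: float = 49.95) -> List[Asset]:
    """Give each asset a distinct trigger frequency, evenly spaced from high_hz down to low_hz.

    Hypothetical scheme: because thresholds are staggered, the number of assets that
    respond grows with the size of the frequency drop, even though each asset is on/off.
    """
    n = len(assets)
    step = (high_hz - low_hz) / (n - 1) if n > 1 else 0.0
    for i, asset in enumerate(assets):
        asset.trigger_hz = high_hz - i * step
    return assets


def aggregate_turn_down_kw(assets: List[Asset], grid_hz: float) -> float:
    """Total demand reduction: every asset whose trigger lies above the measured frequency turns down."""
    return sum(a.power_kw for a in assets if grid_hz < a.trigger_hz)
```

With ten 10 kW assets, a dip to 49.72 Hz turns five of them down (50 kW), while any frequency at or above the top of the band turns none down.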
Our Data
• ~20k telemetry messages/second
• ~5k messages/second report a change of state that requires secondary processing (e.g. validating a forecast)
• Most messages require aggregation for reporting purposes
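Speaker note #13 adds that aggregations have a low-latency requirement and that one message can feed several streams. A minimal sketch of that pattern, assuming hypothetical message fields (`asset_id`, `ts` in epoch seconds, `power_kw`) and tumbling (fixed, non-overlapping) windows: each message is fanned out to a 1-minute and a 15-minute per-asset average. It illustrates the logic only, not the production implementation.

```python
from collections import defaultdict
from typing import Dict, Tuple


def window_start(ts_seconds: int, window_seconds: int) -> int:
    """Start time of the tumbling window that contains ts_seconds."""
    return ts_seconds - (ts_seconds % window_seconds)


class TumblingAverager:
    """Per-key average of a numeric field over fixed, non-overlapping time windows."""

    def __init__(self, window_seconds: int):
        self.window_seconds = window_seconds
        # (key, window start) -> [running sum, count]
        self._acc = defaultdict(lambda: [0.0, 0])

    def add(self, key: str, ts_seconds: int, value: float) -> None:
        slot = self._acc[(key, window_start(ts_seconds, self.window_seconds))]
        slot[0] += value
        slot[1] += 1

    def averages(self) -> Dict[Tuple[str, int], float]:
        return {k: total / count for k, (total, count) in self._acc.items()}


def fan_out(message: dict, aggregators: Dict[str, TumblingAverager]) -> None:
    """Feed one telemetry message into every aggregation stream."""
    for aggregator in aggregators.values():
        aggregator.add(message["asset_id"], message["ts"], message["power_kw"])
```

Keeping a running sum and count per window means each message costs a constant amount of work per stream, regardless of window length.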
Why Apache NiFi?
• Data provenance
• Built-in mechanism for backpressure and fault handling
• Easy to use
• Built-in processors for Azure services
• Easy to extend
• Performance not our main concern, but nice to know that it scales
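Backpressure stops a fast producer from overwhelming a slow consumer. In NiFi it is configured per connection: once the queue on a connection reaches its threshold, the processor feeding that connection is no longer scheduled until the queue drains. The class below is a simplified, single-threaded illustration of that idea, using a hypothetical `Connection` type with an object-count threshold only; it is not NiFi's implementation.

```python
from collections import deque
from typing import Any, Optional


class Connection:
    """Bounded queue between two flow steps with an object-count backpressure threshold."""

    def __init__(self, object_threshold: int):
        self.object_threshold = object_threshold
        self._queue = deque()

    def back_pressure_applied(self) -> bool:
        return len(self._queue) >= self.object_threshold

    def offer(self, item: Any) -> bool:
        """Accept the item, or return False so the upstream step stops and retries later."""
        if self.back_pressure_applied():
            return False
        self._queue.append(item)
        return True

    def poll(self) -> Optional[Any]:
        """Downstream step takes the oldest item (None if empty), relieving backpressure."""
        return self._queue.popleft() if self._queue else None
```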
Downsides
• Source control of flows – possible but diffs not very readable
• Automated flow testing and CI still remain difficult
• Script components not easy to maintain
• Not all processors work in clustered mode
Examples
Computing Response After Dispatch
[Chart: Response (kW) vs. time elapsed (s), 0 to 30 s]
[Chart: Dynamic Demand Response. Active power (kW) vs. time elapsed (s), showing connected power consumption, the baseline, the response, and the duration of the request]
Flow: Extract JSON properties → Look up previous state and cache current state → Compute and publish state-change metrics
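The three steps of this flow (extract, look up and cache state, compute metrics) can be sketched as a single function. This is a minimal illustration under stated assumptions: the message fields (`asset_id`, `ts`, `dispatched`, `active_power_kw`) are hypothetical, a plain dict stands in for the distributed cache, and the response to a turn-down request is taken as the baseline (the last reading before dispatch started) minus the current active power, which matches the shape of the chart.

```python
from typing import Callable, Dict


def process_telemetry(message: dict, state_cache: Dict[str, dict], publish: Callable[[dict], None]) -> None:
    """Look up the asset's previous state, cache the current one, and publish response metrics."""
    asset_id = message["asset_id"]
    previous = state_cache.get(asset_id)  # look up previous state
    dispatched = message["dispatched"]
    power_kw = message["active_power_kw"]

    baseline_kw = None
    dispatch_start = None
    if dispatched and previous is not None:
        if previous["dispatched"]:
            # Dispatch still in progress: carry the baseline and start time forward.
            baseline_kw = previous["baseline_kw"]
            dispatch_start = previous["dispatch_start"]
        else:
            # State change: dispatch has just started, so the last
            # pre-dispatch reading becomes the baseline.
            baseline_kw = previous["active_power_kw"]
            dispatch_start = message["ts"]

    # Cache the current state for the next message from this asset.
    state_cache[asset_id] = {
        "dispatched": dispatched,
        "active_power_kw": power_kw,
        "baseline_kw": baseline_kw,
        "dispatch_start": dispatch_start,
    }

    if baseline_kw is not None:
        publish({
            "asset_id": asset_id,
            "elapsed_s": message["ts"] - dispatch_start,
            "response_kw": baseline_kw - power_kw,  # turn-down response
        })
```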
Dynamic Demand Response Forecasting
[Chart: Forecasted Response. Active power (kW) vs. time (s) before and after a dispatch request, with the forecast response after the request]
Flow: Extract properties from JSON → Metadata/state lookups and caching → Score model (Python script)
First approach: pure NiFi solution
Observations
• Fun example, but not practical
• NiFi scripting is not easy to test or maintain
• Long, messy flows are not easy to troubleshoot
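One way to ease the testing and maintenance problems noted above is to keep the scoring logic in a plain Python module with no NiFi dependencies, unit-test it on its own, and have the script processor do little more than call it. A minimal sketch of the three steps of the flow (extract properties, look up metadata, score), assuming a hypothetical per-asset linear model whose feature names, coefficients and intercept come from the metadata lookup; a plain dict stands in for the cache.

```python
import json
from typing import Dict, List


def score_linear(features: List[float], coefficients: List[float], intercept: float) -> float:
    """Hypothetical linear model: intercept + sum of coefficient * feature."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))


def forecast_response(raw_json: str, model_metadata: Dict[str, dict]) -> float:
    """Extract properties from JSON, look up the asset's model metadata, and score it."""
    message = json.loads(raw_json)                  # extract properties
    model = model_metadata[message["asset_id"]]     # metadata lookup (cache stand-in)
    features = [float(message[name]) for name in model["features"]]
    return score_linear(features, model["coefficients"], model["intercept"])
```

The script processor would then only need to read the flow file content, call `forecast_response`, and write the result back, while the model logic is covered by ordinary unit tests.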
Flow: Extract JSON properties → Filter → Get Forecast
2nd approach – use NiFi as an orchestrator
Observations
• As practical and maintainable as the HTTP service it calls
• Where did all the logic go? This is boring!
• Why use NiFi at all?
– Traditional stream processing (e.g. Storm)
– Serverless (e.g. Azure Functions)
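In the second approach the logic moves behind an HTTP service that the flow calls; the speaker notes mention that the PostHTTP processor's output carries both the forecast and an expected error. A minimal standard-library sketch of such a service follows. The port, field names and model (a naive persistence forecast with a fixed placeholder error) are all hypothetical; the point is that request handling is a pure function that can be unit-tested without NiFi or a running server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_forecast(body: bytes) -> bytes:
    """Pure handler: JSON telemetry in, JSON forecast plus expected error out.

    Placeholder model (hypothetical): forecast the current active power
    unchanged and report a fixed expected squared error.
    """
    message = json.loads(body)
    response = {
        "asset_id": message["asset_id"],
        "forecast_kw": float(message["active_power_kw"]),
        "expected_sq_error": 1.0,  # placeholder, not a real error estimate
    }
    return json.dumps(response).encode("utf-8")


class ForecastHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload, status = handle_forecast(self.rfile.read(length)), 200
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON or missing fields: report it so the flow can route to failure.
            payload, status = json.dumps({"error": str(exc)}).encode("utf-8"), 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Run the service (blocks forever); port 8080 is an arbitrary choice."""
    HTTPServer(("", port), ForecastHandler).serve_forever()
```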
[Chart: Forecasting Error. Bitumen tank setpoint (°C) and mean squared error over the day, plotted against date/time]
Real-time Model Validation
[Flowchart: Setpoint change → Forecast → Receive data → Observe difference → Increment accumulated square error → Acceptable error? (Y: model remains valid; N: invalid model → fit model parameters)]
Flow: Extract JSON properties → Filter → Get and cache forecast → Validate and re-fit if required
Real-time Validation with NiFi
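The validation loop in the flowchart can be sketched as follows. Following speaker note #26, the squared error is accumulated under a single key per model, so each update is one read and one write of a (sum, count) pair. The acceptance threshold `max_mse`, the `refit` callback and the reset after re-fitting are assumptions for illustration; a dict stands in for the distributed cache.

```python
from typing import Callable, Dict, Tuple


def accumulate_sq_error(cache: Dict[str, Tuple[float, int]], model_id: str,
                        forecast: float, observed: float) -> float:
    """Add one squared error to the model's running (sum, count) and return the mean."""
    total, count = cache.get(model_id, (0.0, 0))
    total += (observed - forecast) ** 2
    count += 1
    cache[model_id] = (total, count)  # single key per model: one read, one write
    return total / count


def validate(cache: Dict[str, Tuple[float, int]], model_id: str, forecast: float,
             observed: float, max_mse: float, refit: Callable[[str], None]) -> bool:
    """Return True if the model is still acceptable; otherwise re-fit it and reset its error."""
    mse = accumulate_sq_error(cache, model_id, forecast, observed)
    if mse <= max_mse:
        return True
    refit(model_id)             # invalid model: fit parameters again (hypothetical hook)
    cache[model_id] = (0.0, 0)  # start accumulating afresh for the re-fitted model
    return False
```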
Next step
• Store the errors in a max-heap and use them to retrain models in priority order
• Better reporting
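The proposed max-heap of errors, used to decide which model to retrain first, maps directly onto Python's `heapq` module. `heapq` only provides a min-heap, so this sketch stores negated errors; the model IDs and error values are illustrative.

```python
import heapq
from typing import List, Tuple


class RetrainQueue:
    """Max-heap of (error, model) pairs: the model with the largest error is retrained first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []

    def push(self, model_id: str, error: float) -> None:
        # Negate the error so heapq's min-heap behaves as a max-heap.
        heapq.heappush(self._heap, (-error, model_id))

    def pop_worst(self) -> Tuple[str, float]:
        neg_error, model_id = heapq.heappop(self._heap)
        return model_id, -neg_error

    def __len__(self) -> int:
        return len(self._heap)
```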
Architecture
[Diagram: Enrichment/Aggregation and Forecasting/Optimisation components; three groups, each with a Model Registry/Proxy in front of model instances (labelled "Model 1") and Persistence]
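The diagram places a Model Registry/Proxy in front of the model instances. A minimal in-memory sketch of what such a component might do, assuming (hypothetically) that callers address models by name and the proxy routes each request to the latest registered version unless a specific version is requested, so retrained models can be swapped in without changing the calling flow. The versioning scheme, names and in-memory storage are assumptions, and the persistence layer shown in the diagram is omitted.

```python
from typing import Callable, Dict, List, Optional

# A model is anything that maps a feature dict to a score (hypothetical interface).
Model = Callable[[dict], float]


class ModelRegistry:
    """Hypothetical in-memory registry that doubles as a proxy for scoring requests."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Model]] = {}

    def register(self, name: str, model: Model) -> int:
        """Add a new version of a model and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(model)
        return len(versions)

    def score(self, name: str, features: dict, version: Optional[int] = None) -> float:
        """Route the request to the requested version, or to the latest one by default."""
        versions = self._versions[name]
        model = versions[-1] if version is None else versions[version - 1]
        return model(features)
```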
Thank you for listening

Editor's Notes

  • #4: We are trying to improve the efficiency of power networks: utilisation was about 20% at the time of the first power station and is still only about 25% now.
  • #5: The under-utilisation is even worse for renewables! Consumers end up footing the bill for this chronic inefficiency.
  • #6: Why? Because we think we can’t control demand, so we have to over-supply in case of spikes…
  • #7: Let’s control demand!
  • #8: The gateway holds hard-coded information on the assets it controls, the sensors that tell it when an asset has stored energy, and constraints (such as peak tariff avoidance). This enables it to dispatch assets when grid frequency is too low or too high.
  • #9: Because the response is aggregated, each asset need not respond in proportion to grid frequency, and remains free to perform its operational duties 94% of the time. Our service is invisible to the end customer (except for the monthly checks).
  • #10: Dynamic Demand can deliver approx. £85,000 per MW/yr; FCDM / Static FFR £22,000–£26,000 per MW/yr; STOR £10,000–£15,000 per MW/yr.
  • #11: Open Energi is turning the energy system on its head: instead of supply adjusting to meet demand, demand adjusts to meet supply. By harnessing small amounts of flexible energy demand from energy-intensive equipment, we can create a virtual power station and displace fossil-fuelled peaking power stations. This is enabling a user-led transformation in how our energy system works, so that businesses and consumers are not only making it happen but also seeing the benefits. It’s a vital part of our transition to a zero-carbon economy, because we cannot maximise our use of renewables unless our demand for energy becomes more responsive.
  • #12: Basically, we’re 20x cheaper than building a new power station because we just tap into existing infrastructure.
  • #13: This is not huge data on its own, but there is a low-latency requirement for aggregations, and one message can feed into multiple streams.
  • #15: There are ongoing discussions to improve flow testing, CI and source control.
  • #20: This is only one half of the flow!
  • #23: To the third point, using NiFi gives better traceability, instantaneous feedback on pipeline health (i.e. metrics) and a simple UI.
  • #25: Timeframe for all this – minutes to hours.
  • #26: As output of the PostHTTP processor we get not only the forecast but also the expected error. We keep track of the accumulated square error so that we can have a single “reducible” map key in the distributed cache.
  • #29: NiFi cluster – 5 nodes, 28 flows; Flink cluster – 4 nodes, 16 jobs; Persistence – Azure.
  • #31: In blue – tools used primarily by data science team. In grey – tools used primarily by software team. Others – shared infrastructure.
  • #32: NiFi: auditability, shallow learning curve (easy to use), nice UI. Flink: ultimate control, windowing, steeper learning curve.