CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache S...Databricks
Cybercrime is one of the greatest threats to every company in the world today and a major problem for mankind in general. The damage due to cybercrime is estimated to reach around $6 trillion by 2021. Security professionals are struggling to cope with the threat, so powerful and easy-to-use tools are necessary to aid in this battle. For this purpose, we created a security-focused anomaly detection framework that can identify anomalous access patterns. It is built on top of Apache Spark and can be applied in parallel over multiple tenants, which allows the model to be trained over the data of thousands of customers on a Databricks cluster in less than an hour. The model leverages proven techniques from recommendation engines to produce high-quality anomalies. We thoroughly evaluated the model’s ability to identify actual anomalies using synthetically generated data, and also by staging an actual attack and showing that the model clearly identifies it as anomalous behavior. We plan to open source this library as part of a cyber-ML toolkit we will be offering.
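As a rough illustration of the recommendation-engine approach described above, the sketch below scores observed user-resource access pairs with implicit-feedback collaborative filtering in PySpark; a low predicted affinity for an access that actually occurred suggests an anomaly. The column names, input path, and threshold are assumptions for illustration, not the toolkit's actual API.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.ml.feature import StringIndexer
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("access-anomaly-sketch").getOrCreate()

# Hypothetical access log: one row per (user, resource) with an access count.
access = spark.read.parquet("/data/access_logs")  # assumed path

# ALS needs numeric ids, so index the raw string columns first.
user_idx = StringIndexer(inputCol="user_id", outputCol="user").fit(access)
res_idx = StringIndexer(inputCol="resource_id", outputCol="resource").fit(access)
indexed = res_idx.transform(user_idx.transform(access))

# Implicit-feedback ALS learns each user's typical access profile.
als = ALS(userCol="user", itemCol="resource", ratingCol="access_count",
          implicitPrefs=True, rank=16, regParam=0.1, coldStartStrategy="drop")
model = als.fit(indexed)

# Score the observed accesses: a low predicted affinity for a pair that
# actually occurred is a candidate anomalous access.
scored = model.transform(indexed)
anomalies = scored.filter(F.col("prediction") < 0.05)  # assumed threshold
anomalies.select("user_id", "resource_id", "prediction").show()
```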
Auto-Pilot for Apache Spark Using Machine LearningDatabricks
At Qubole, users run Spark at scale on the cloud (900+ concurrent nodes). At such scale, tuning Spark configurations is essential for efficiently running SLA-critical jobs, but it continues to be a difficult undertaking, largely driven by trial and error. In this talk, we will address the problem of auto-tuning SQL workloads on Spark. The same technique can also be adapted for non-SQL Spark workloads. In our earlier work [1], we proposed a model based on simple rules and insights. It was simple yet effective at optimizing queries and finding the right instance types to run queries. However, with respect to auto-tuning Spark configurations we saw scope for improvement. On exploration, we found previous works addressing auto-tuning using machine learning techniques. One major drawback of the simple model [1] is that it cannot use multiple runs of a query to improve its recommendation, whereas the major drawback of machine learning techniques is that they lack domain-specific knowledge. Hence, we decided to combine both techniques. Our auto-tuner interacts with both models to arrive at good configurations. Once a user selects a query to auto-tune, the next configuration is computed from the models and the query is run with it. Metrics from the event log of the run are fed back to the models to obtain the next configuration. The auto-tuner will continue exploring good configurations until it reaches the fixed budget specified by the user. We found that in practice, this method gives much better configurations than those chosen even by experts on real workloads, and converges quickly to an optimal configuration. In this talk, we will present a novel ML model technique and the way it was combined with our earlier approach. Results on real workloads will be presented along with limitations and challenges in productionizing them. [1] Margoor et al., 'Automatic Tuning of SQL-on-Hadoop Engines', 2018, IEEE CLOUD
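The tuning loop described above can be summarized roughly as follows; `run_query`, `next_config`, and the search space are placeholders standing in for Qubole's actual models, which combine rule-based insights with a learned model rather than the plain random search shown here.

```python
import random

# Hypothetical search space of Spark configurations to explore.
SEARCH_SPACE = {
    "spark.sql.shuffle.partitions": [200, 400, 800, 1600],
    "spark.executor.memory": ["4g", "8g", "16g"],
    "spark.executor.cores": [2, 4],
}

def run_query(config):
    """Placeholder: submit the query with `config` and return metrics
    parsed from the Spark event log (here just a fake runtime)."""
    return {"runtime_s": random.uniform(100, 1000)}

def next_config(history):
    """Placeholder model: pick a random configuration. The real auto-tuner
    combines domain rules with an ML model fit on `history`."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def autotune(budget_runs=10):
    history, best = [], None
    for _ in range(budget_runs):          # fixed budget set by the user
        cfg = next_config(history)
        metrics = run_query(cfg)          # metrics are fed back to the models
        history.append((cfg, metrics))
        if best is None or metrics["runtime_s"] < best[1]["runtime_s"]:
            best = (cfg, metrics)
    return best

print(autotune())
```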
Using Apache Spark to analyze large datasets in the cloud presents a range of challenges. Different stages of your pipeline may be constrained by CPU, memory, disk and/or network IO. But what if all those stages have to run on the same cluster? In the cloud, you have limited control over the hardware your cluster runs on.
You may have even less control over the size and format of your raw input files. Performance tuning is an iterative and experimental process. It’s frustrating with very large datasets: what worked great with 30 billion rows may not work at all with 400 billion rows. But with strategic optimizations and compromises, 50+ TiB datasets can be no big deal.
Using the Spark UI and simple metrics, we will explore how to diagnose and remedy issues in jobs such as the following (a hedged configuration sketch follows the list):
Sizing the cluster based on your dataset (shuffle partitions)
Ingestion challenges – well begun is half done (globbing S3, small files)
Managing memory (sorting GC – when to go parallel, when to go G1, when offheap can help you)
Shuffle (give a little to get a lot – configs for better out of box shuffle) – Spill (partitioning for the win)
Scheduling (FAIR vs FIFO, is there a difference for your pipeline?)
Caching and persistence (it’s the cost of doing business, so what are your options?)
Fault tolerance (blacklisting, speculation, task reaping)
Making the best of a bad deal (skew joins, windowing, UDFs, very large query plans)
Writing to S3 (dealing with write partitions, HDFS and s3DistCp vs writing directly to S3)
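As a concrete starting point for several of the knobs named above, the fragment below sets a handful of commonly tuned properties when building a session. The specific values are illustrative assumptions that depend entirely on your dataset and cluster; they are not recommendations from the talk.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("tuning-sketch")
    # Size shuffle partitions to the dataset (roughly total shuffle bytes / ~128 MB).
    .config("spark.sql.shuffle.partitions", 4000)
    # G1 GC often behaves better than the parallel collector for large heaps.
    .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")
    # Off-heap memory can relieve GC pressure for heavy shuffles and caching.
    .config("spark.memory.offHeap.enabled", "true")
    .config("spark.memory.offHeap.size", "8g")
    # Fault tolerance: blacklisting and speculation.
    .config("spark.blacklist.enabled", "true")
    .config("spark.speculation", "true")
    # FAIR scheduling if several jobs share the cluster.
    .config("spark.scheduler.mode", "FAIR")
    .getOrCreate()
)
```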
In this session you will learn how H&M has created a reference architecture for deploying their machine learning models on Azure utilizing Databricks, following DevOps principles. The architecture is currently used in production and has been iterated over multiple times to solve some of the discovered pain points. The presenting team is currently responsible for ensuring that best practices are implemented on all H&M use cases, covering hundreds of models across the entire H&M group. This architecture not only lets data scientists use notebooks for exploration and modeling, but also gives engineers a way to build robust, production-grade code for deployment. The session will in addition cover topics like lifecycle management, traceability, automation, scalability and version control.
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Databricks
This document discusses using TigerGraph for real-time fraud detection at scale by integrating real-time deep-link graph analytics with Spark AI. It provides examples of common TigerGraph use cases including recommendation engines, fraud detection, and risk assessment. It then discusses how TigerGraph can power explainable AI by extracting over 100 graph-based features from entities and their relationships to feed machine learning models. Finally, it shares a case study of how China Mobile used TigerGraph for real-time phone-based fraud detection by analyzing over 600 million phone numbers and 15 billion call connections as a graph to detect various types of fraud in real-time.
Tactical Data Science Tips: Python and Spark TogetherDatabricks
This document summarizes a talk given by Bill Chambers on processing data with Spark and Python. It discusses 5 ways to process data: RDDs, DataFrames, Koalas, UDFs, and pandasUDFs. It then covers two data science use cases - growth forecasting and churn prediction - and how they were implemented using these different processing methods based on characteristics like the number of input rows, features, and required models. The talk emphasizes using DataFrames and pandasUDFs for optimal performance and flexibility. It also highlights tracking models with MLFlow for consistency in production.
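A minimal sketch of the pandas-UDF pattern mentioned above, training one small model per group (for example, per customer segment) with `applyInPandas` from Spark 3.x; the input path, column names, and the use of scikit-learn are assumptions for illustration, not the speaker's code.

```python
import pandas as pd
from pyspark.sql import SparkSession
from sklearn.linear_model import LogisticRegression

spark = SparkSession.builder.appName("pandas-udf-sketch").getOrCreate()

# Hypothetical churn dataset: one row per customer with a segment key.
df = spark.read.parquet("/data/churn_features")  # assumed path

def fit_per_segment(pdf: pd.DataFrame) -> pd.DataFrame:
    """Runs on the workers: receives one pandas DataFrame per segment."""
    X, y = pdf[["tenure", "usage"]], pdf["churned"]
    model = LogisticRegression().fit(X, y)
    return pd.DataFrame({"segment": [pdf["segment"].iloc[0]],
                         "train_accuracy": [model.score(X, y)]})

result_schema = "segment string, train_accuracy double"
per_segment = df.groupBy("segment").applyInPandas(fit_per_segment, schema=result_schema)
per_segment.show()
```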
Apache Spark AI Use Case in Telco: Network Quality Analysis and Prediction wi...Databricks
In this talk, we will present how we analyze, predict, and visualize network quality data, as a spark AI use case in a telecommunications company. SK Telecom is the largest wireless telecommunications provider in South Korea with 300,000 cells and 27 million subscribers. These 300,000 cells generate data every 10 seconds, the total size of which is 60TB, 120 billion records per day.
In order to address previous problems with Spark on HDFS, we have developed a new data store for Spark SQL, consisting of Redis and RocksDB, that allows us to distribute and store these data in real time and analyze them right away. Not satisfied with analyzing network quality in real time, we also set out to predict network quality in the near future in order to quickly detect and recover from network device failures, by designing a network signal pattern-aware DNN model and a new in-memory data pipeline from Spark to TensorFlow.
In addition, by integrating Apache Livy and MapboxGL with Spark SQL and our new store, we have built a geospatial visualization system that shows the current population and signal strength of 300,000 cells on the map in real time.
AI on Spark for Malware Analysis and Anomalous Threat DetectionDatabricks
At Avast, we believe everyone has the right to be safe. We are dedicated to creating a world that provides safety and privacy for all, no matter where you are, who you are, or how you connect. With over 1.5 billion attacks stopped and 30 million new executable files monthly, big data pipelines are crucial for the security of our customers. At Avast we are leveraging Apache Spark machine learning libraries and TensorFlowOnSpark for a variety of tasks ranging from marketing and advertisement, through network security, to malware detection. This talk will cover our main cybersecurity use cases of Spark. After describing our cluster environment, we will first demonstrate anomaly detection on time series of threats. With thousands of types of attacks and malware, AI helps human analysts select and focus on the most urgent or dire threats. We will walk through our setup, from distributed training of deep neural networks with TensorFlow to deploying and monitoring a streaming anomaly detection application with the trained model. Next we will show how we use Spark for analysis and clustering of malicious files and for large-scale experimentation to automatically process and handle changes in malware. In the end, we will give a comparison with other tools we used for solving those problems.
Blue Pill/Red Pill: The Matrix of Thousands of Data StreamsDatabricks
Designing a streaming application which has to process data from 1 or 2 streams is easy. Any streaming framework which provides scalability, high throughput, and fault tolerance would work. But when the number of streams starts growing to the order of 100s or 1000s, managing them can be daunting. How would you share resources among 1000s of streams, with all of them running 24×7? Manage their state, apply advanced streaming operations, and add or delete streams without restarting? This talk explains common scenarios and shows techniques that can handle thousands of streams using Spark Structured Streaming.
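One technique in this space is to run many independent Structured Streaming queries inside a single Spark application and isolate them with FAIR scheduler pools; the sketch below, with assumed topic names, broker address, and paths, shows the basic shape (the talk covers far more advanced state and lifecycle management than this).

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("many-streams-sketch")
         .config("spark.scheduler.mode", "FAIR")
         .getOrCreate())

topics = [f"tenant_{i}" for i in range(100)]  # assumed: one stream per tenant

queries = []
for topic in topics:
    # Each query gets its own scheduler pool so no single stream starves the rest.
    spark.sparkContext.setLocalProperty("spark.scheduler.pool", topic)
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # assumed
              .option("subscribe", topic)
              .load())
    query = (stream.selectExpr("CAST(value AS STRING) AS value")
             .writeStream.format("parquet")
             .option("path", f"/lake/{topic}")               # assumed sink
             .option("checkpointLocation", f"/chk/{topic}")  # required per query
             .queryName(topic)
             .start())
    queries.append(query)

spark.streams.awaitAnyTermination()
```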
Stream, Stream, Stream: Different Streaming Methods with Apache Spark and KafkaDatabricks
At NMC (Nielsen Marketing Cloud) we provide our customers (marketers and publishers) real-time analytics tools to profile their target audiences. To achieve that, we need to ingest billions of events per day into our big data stores, and we need to do it in a scalable yet cost-efficient manner.
In this session, we will discuss how we continuously transform our data infrastructure to support these goals. Specifically, we will review how we went from CSV files and standalone Java applications all the way to multiple Kafka and Spark clusters, performing a mixture of streaming and batch ETLs, and supporting 10x data growth. We will share our experience as early adopters of Spark Streaming and Spark Structured Streaming, and how we overcame technical barriers (and there were plenty). We will present a rather unique solution of using Kafka to imitate streaming over our Data Lake, while significantly reducing our cloud services’ costs. Topics include the following (a minimal Kafka-source sketch follows the list):
Kafka and Spark Streaming for stateless and stateful use-cases
Spark Structured Streaming as a possible alternative
Combining Spark Streaming with batch ETLs
“Streaming” over Data Lake using Kafka
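As a rough sketch of the last item, reading the same Kafka topic both as an unbounded stream and as a bounded batch is what makes it possible to "imitate streaming" over data that also lands in the lake; the topic name, broker address, and paths below are assumptions, not NMC's actual setup.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-lake-sketch").getOrCreate()

kafka_opts = {
    "kafka.bootstrap.servers": "broker:9092",  # assumed
    "subscribe": "events",                     # assumed topic
}

# Streaming read: continuously consume new events and land them in the lake.
stream = spark.readStream.format("kafka").options(**kafka_opts).load()
(stream.selectExpr("CAST(value AS STRING) AS value")
 .writeStream.format("parquet")
 .option("path", "/lake/events")
 .option("checkpointLocation", "/chk/events")
 .start())

# Batch read over the very same topic (e.g., for a backfill ETL): the Kafka
# source also supports bounded reads with explicit offset ranges.
batch = (spark.read.format("kafka").options(**kafka_opts)
         .option("startingOffsets", "earliest")
         .option("endingOffsets", "latest")
         .load())
print(batch.count())
```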
SparkML: Easy ML Productization for Real-Time BiddingDatabricks
dataxu bids on ads in real time on behalf of its customers at the rate of 3 million requests a second and trains on past bids to optimize future bids. Our system trains thousands of advertiser-specific models and runs on multi-terabyte datasets. In this presentation we will share the lessons learned from our transition towards a fully automated Spark-based machine learning system and how this has drastically reduced the time to get a research idea into production. We'll also share how we:
- continually ship models to production
- train models in an unattended fashion with auto-tuning capabilities
- tune and overbook cluster resources for maximum performance
- ported our previous ML solution into Spark
- evaluate the performance of high-rate bidding models
Speakers: Maximo Gurmendez, Javier Buquet
Going Real-Time: Creating Frequently-Updating Datasets for Personalization: S...Spark Summit
Streaming applications have often been complex to design and maintain because of the significant upfront infrastructure investment required. However, with the advent of Spark an easy transition to stream processing is now available, enabling personalization applications and experiments to consume near real-time data without massive development cycles.
Our decision to evaluate Spark as our stream processing engine was primarily led by the following considerations: 1) ease of development for the team (already familiar with Spark for batch), 2) the scope/requirements of our problem, 3) re-usability of code from Spark batch jobs, and 4) Spark support from infrastructure teams within the company.
In this session, we will present our experience using Spark for stream processing unbounded datasets in the personalization space. The datasets consisted of, but were not limited to, the stream of playback events that are used as feedback for all personalization algorithms. These plays are used to extract specific behaviors which are highly predictive of a customer’s enjoyment of our service. This dataset is massive and has to be further enriched by other online and offline Netflix data sources. These datasets, when consumed by our machine learning models, directly affect the customer’s personalized experience, which means that the impact is high and the tolerance for failure is low. We’ll talk about the experiments we did to compare Spark with other streaming solutions like Apache Flink, the impact that we had on our customers, and most importantly, the challenges we faced.
Take-aways for the audience:
1) A great example of stream processing large, personalization datasets at scale.
2) An increased awareness of the costs/requirements for making the transition from batch to streaming successfully.
3) Exposure to some of the technical challenges that should be expected along the way.
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinDatabricks
This document summarizes the history and development of Apache Spark and Project Hydrogen. It discusses how Spark introduced DataFrames and Tungsten to improve performance. It then explains two challenges in supporting machine learning frameworks - data exchange and incompatible execution models between Spark and ML frameworks. Project Hydrogen aims to address these by introducing vectorized data exchange and a barrier execution model to unify Spark and distributed ML training. This allows 10 to 100x faster training and reconciles the different execution models.
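Barrier execution mode shipped with Project Hydrogen as a public RDD API; the toy sketch below shows its shape (all tasks of the stage start together and can discover each other's addresses before, say, wiring up a distributed training job), though the actual hand-off to an ML framework is far more involved.

```python
from pyspark.sql import SparkSession
from pyspark import BarrierTaskContext

spark = SparkSession.builder.appName("barrier-sketch").getOrCreate()

def train_partition(iterator):
    ctx = BarrierTaskContext.get()
    # All barrier tasks are scheduled together; getTaskInfos() exposes the
    # addresses of every task so an ML framework could wire up its workers.
    workers = [info.address for info in ctx.getTaskInfos()]
    ctx.barrier()  # global synchronization point across all tasks
    # A real job would hand `workers` to e.g. a distributed TensorFlow setup here.
    yield (ctx.partitionId(), len(workers))

rdd = spark.sparkContext.parallelize(range(8), 4)
print(rdd.barrier().mapPartitions(train_partition).collect())
```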
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowDatabricks
This document discusses platforms for democratizing data science and enabling enterprise-grade machine learning applications. It introduces Flock, a platform that aims to automate the machine learning lifecycle, including tracking experiments, managing models, and deploying models for production. It demonstrates Flock by instrumenting Python code for a LightGBM model to track parameters, log models to MLflow, convert the model to ONNX, optimize it, and deploy it as a REST API. Future work discussed includes improving Flock's data governance, generalizing auto-tracking capabilities, and integrating with other systems like SQL and Spark for end-to-end pipeline provenance.
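The MLflow side of that lifecycle looks roughly like the sketch below, which logs parameters, a metric, and a LightGBM model from a run; the dataset and parameter values are invented for illustration, and Flock's auto-instrumentation would generate this kind of tracking code rather than having you write it by hand.

```python
import lightgbm as lgb
import mlflow
import mlflow.lightgbm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

params = {"objective": "binary", "num_leaves": 31, "learning_rate": 0.1}

with mlflow.start_run():
    mlflow.log_params(params)                       # track hyperparameters
    model = lgb.train(params, lgb.Dataset(X_tr, y_tr), num_boost_round=50)
    acc = accuracy_score(y_te, model.predict(X_te) > 0.5)
    mlflow.log_metric("accuracy", acc)              # track evaluation metrics
    mlflow.lightgbm.log_model(model, "model")       # store the model artifact
```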
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkDatabricks
Around the world, businesses are turning to AI to transform the way they operate and serve their customers. But before they can implement these technologies, companies must address the roadblock of moving from batch analytics to making real-time decisions by rapidly accessing and analyzing the relevant information amidst a sea of data. Yaron will explain how to make Spark handle multivariate real-time, historical and event data simultaneously to provide immediate and intelligent responses. He will present several time sensitive use-cases including fraud detection, prevention of outages and customer recommendations to demonstrate how to perform predictive analytics and real-time actions with Spark.
Speaker: Yaron Ekshtein
Unified Framework for Real Time, Near Real Time and Offline Analysis of Video...Spark Summit
The document discusses Conviva's Unified Framework (CUF) for analyzing video streaming data in real-time, near real-time, and offline using Spark and Databricks. It summarizes Conviva's platform for measuring video quality of experience across devices and networks. The framework unifies the three analysis stacks onto Spark to share code and insights. Using Databricks improves the offline analysis speed and enables data scientists to independently explore large datasets and build machine learning models.
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDatabricks
Does more data always improve ML models? Is it better to use distributed ML instead of single node ML?
In this talk I will show that while more data often improves DL models in high-variance problem spaces (with semi-structured or unstructured data) such as NLP, image, and video, more data does not significantly help in high-bias problem spaces where traditional ML is more appropriate. Additionally, even in the deep learning domain, single node models can still outperform distributed models via transfer learning.
Data scientists have pain points: running many models in parallel, automating the experimental setup, and getting others (especially analysts) within an organization to use their models. Databricks solves these problems using pandas UDFs, the ML Runtime, and MLflow.
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Spark Summit
The document discusses Sparkle, a solution built by Comcast to address challenges in processing massive amounts of data and enabling data science workflows at scale. Sparkle is a centralized processing system with SQL and machine learning capabilities that is highly scalable and accessible via a REST API. It is used by Comcast to power various use cases including churn modeling, price elasticity analysis, and direct mail campaign optimization.
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Spark Summit
In this presentation, we are going to talk about the state-of-the-art infrastructure we have established at Walmart Labs for the Search product using Spark Streaming and DataFrames. First, we have been able to successfully use multiple micro-batch Spark Streaming pipelines to update and process information like product availability, pick-up-today, etc., along with updating our product catalog information in our search index, at up to 10,000 Kafka events per second in near real time. Earlier, all the product catalog changes in the index had a 24-hour delay; using Spark Streaming we have made it possible to see these changes in near real time. This addition has provided a great boost to the business by giving end customers instant access to features like availability of a product, store pickup, etc.
Second, we have built a scalable anomaly detection framework purely using Spark DataFrames that is being used by our data pipelines to detect abnormalities in search data. Anomaly detection is an important problem not only in the search domain but also in many other domains such as performance monitoring, fraud detection, etc. During this work, we realized that not only are Spark DataFrames able to process information faster, but they are also more flexible to work with. One can write Hive-like queries, Pig-like code, UDFs, UDAFs, Python-like code, etc. all in the same place very easily, and can build DataFrame templates which can be used and reused by multiple teams effectively. We believe that if implemented correctly, Spark DataFrames can potentially replace Hive/Pig in the big data space and have the potential of becoming a unified data language.
We conclude that Spark Streaming and DataFrames are the key to processing extremely large streams of data in real time with ease of use.
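A stripped-down version of the kind of DataFrame-only anomaly check described above might flag points that deviate from a rolling baseline by several standard deviations; the metric, column names, input path, and threshold here are assumptions, not Walmart's actual framework.

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("df-anomaly-sketch").getOrCreate()

# Hypothetical hourly search metric, e.g. number of zero-result queries.
metrics = spark.read.parquet("/data/search_metrics")  # assumed cols: ts, metric

# Rolling baseline over the previous 24 observations.
w = Window.orderBy("ts").rowsBetween(-24, -1)
scored = (metrics
          .withColumn("mean", F.avg("metric").over(w))
          .withColumn("std", F.stddev("metric").over(w))
          .withColumn("zscore", (F.col("metric") - F.col("mean")) / F.col("std")))

anomalies = scored.filter(F.abs(F.col("zscore")) > 3)  # assumed threshold
anomalies.show()
```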
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...Databricks
Debugging big data analytics in Data-Intensive Scalable Computing (DISC) systems is a time-consuming effort. Today’s DISC systems offer very little tooling for debugging and, as a result, programmers spend countless hours analyzing log files and performing trial and error debugging. To aid this effort, UCLA developed BigDebug, an interactive debugging tool and automated fault localization service to help Apache Spark developers in debugging big data analytics.
To emulate interactive step-wise debugging without reducing throughput, BigDebug provides simulated breakpoints that enable a user to inspect a program without actually pausing the entire distributed computation. It also supports on-demand watchpoints that enable a user to retrieve intermediate data using a guard predicate and transfer the selected data on demand. To understand the flow of individual records within a pipeline of RDD transformations, BigDebug provides data provenance capability, which can help understand how errors propagate through data processing steps. To support efficient trial-and-error debugging, BigDebug enables users to change program logic in response to an error at runtime through a realtime code fix feature, and selectively replay the execution from that step. Finally, BigDebug proposes an automated fault localization service that leverages all the above features together to isolate failure-inducing inputs, diagnose the root cause of an error, and resume the workflow for only affected data and code.
The BigDebug system should contribute to improving Spark developer productivity and the correctness of their big data applications. This big data debugging effort is led by UCLA Professors Miryung Kim and Tyson Condie, and has produced several research papers in top software engineering and database conferences. The current version of BigDebug is publicly available at https://sites.google.com/site/sparkbigdebug/.
Simplify and Scale Data Engineering Pipelines with Delta LakeDatabricks
We’re always told to ‘Go for the Gold!’, but how do we get there? This talk will walk you through the process of moving your data to the finish line to get that gold medal! A common data engineering pipeline architecture uses tables that correspond to different quality levels, progressively adding structure to the data: data ingestion (‘Bronze’ tables), transformation/feature engineering (‘Silver’ tables), and machine learning training or prediction (‘Gold’ tables). Combined, we refer to these tables as a ‘multi-hop’ architecture. It allows data engineers to build a pipeline that begins with raw data as a ‘single source of truth’ from which everything flows. In this session, we will show how to build a scalable data engineering pipeline using Delta Lake, so you can be the champion in your organization.
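The multi-hop idea can be sketched as a chain of Delta tables, each stage reading the previous one as a stream; the paths, schema, and cleaning logic below are placeholders for illustration, not the talk's exact pipeline.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("delta-multihop-sketch").getOrCreate()

# Bronze: raw ingestion, stored as-is.
(spark.readStream.format("json").schema("id INT, amount DOUBLE, ts TIMESTAMP")
 .load("/landing/events")                      # assumed landing zone
 .writeStream.format("delta")
 .option("checkpointLocation", "/chk/bronze")
 .start("/delta/bronze"))

# Silver: cleaned / conformed records built from Bronze.
(spark.readStream.format("delta").load("/delta/bronze")
 .filter(F.col("amount") > 0)                  # placeholder cleaning rule
 .writeStream.format("delta")
 .option("checkpointLocation", "/chk/silver")
 .start("/delta/silver"))

# Gold: aggregates ready for ML or BI.
(spark.readStream.format("delta").load("/delta/silver")
 .groupBy(F.window("ts", "1 hour")).agg(F.sum("amount").alias("hourly_amount"))
 .writeStream.format("delta").outputMode("complete")
 .option("checkpointLocation", "/chk/gold")
 .start("/delta/gold"))
```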
Data Warehousing with Spark Streaming at ZalandoDatabricks
Zalando's AI-driven products and distributed landscape of analytical data marts cannot wait for long-running, hard-to-recover, monolithic batch jobs taking all night to calculate already outdated data. Modern data integration pipelines need to deliver fast and easy-to-consume data sets of high quality. Based on Spark Streaming and Delta, the central data warehousing team was able to deliver widely-used master data as S3 or Kafka streams and snapshots at the same time.
The talk will cover challenges in our fashion data platform and a detailed architectural deep dive about separation of integration from enrichment, providing streams as well as snapshots and feeding the data to distributed data marts. Finally, lessons learned and best practices about Delta’s MERGE command, Scala API vs Spark SQL and schema evolution give more insights and guidance for similar use cases.
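For the MERGE piece specifically, the Python DeltaTable API looks roughly like the sketch below (an upsert of a batch of updates into a master-data table); the paths and join key are assumptions for illustration.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-merge-sketch").getOrCreate()

updates = spark.read.parquet("/staging/customer_updates")   # assumed source
target = DeltaTable.forPath(spark, "/delta/customers")      # assumed target

(target.alias("t")
 .merge(updates.alias("s"), "t.customer_id = s.customer_id")  # assumed key
 .whenMatchedUpdateAll()      # overwrite existing rows with new values
 .whenNotMatchedInsertAll()   # insert rows seen for the first time
 .execute())
```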
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...Databricks
Getting cars to drive autonomously is one of the most exciting problems these days. One of the key challenges is making them drive safely, which requires processing large amounts of data. In our talk we would like to focus on only one task of a self-driving car, namely road detection. Road detection is a software component which needs to be safe, since it is responsible for keeping the car in the current lane. In order to track the progress of such a software component, a well-designed KPI (key performance indicator) evaluation pipeline is required. In this presentation we would like to show you how we incorporate Spark in our pipeline to deal with huge amounts of data and operate under strict scalability constraints for gathering relevant KPIs. Additionally, we would like to mention several lessons learned from using Spark in this environment.
Insights Without Tradeoffs: Using Structured StreamingDatabricks
Apache Spark 2.0 introduced Structured Streaming, which allows users to continually and incrementally update their view of the world as new data arrives, while still using the same familiar Spark SQL abstractions. Michael Armbrust from Databricks talks about the progress made since the release of Spark 2.0 on robustness, latency, expressiveness and observability, using examples of production end-to-end continuous applications.
Speaker: Michael Armbrust
Video: http://go.databricks.com/videos/spark-summit-east-2017/using-structured-streaming-apache-spark
This talk was originally presented at Spark Summit East 2017.
Analyzing 2TB of Raw Trace Data from a Manufacturing Process: A First Use Cas...Databricks
As semiconductor devices have advanced, manufacturing systems have improved the productivity and efficiency of wafer fabrication. Owing to such improvement, the number of wafers yielded from the fabrication process has been rapidly increasing. However, current software systems for semiconductor wafers are not aimed at processing a large number of wafers. To resolve this issue, BISTel (a world-class provider of manufacturing intelligence solutions and services for manufacturers) has built several big data products, such as Trace Analyzer (TA) and Map Analyzer (MA), using Apache Spark. TA analyzes raw trace data from a manufacturing process. It captures details on all variable changes, big and small, and gives the traces' statistical summary (e.g., min, max, slope, average). Several of BISTel's customers, which are top-tier semiconductor companies, use TA to analyze the massive raw trace data from their manufacturing processes. In particular, TA is able to manage terabytes of data by applying Apache Spark's APIs. MA is an advanced pattern recognition tool that sorts wafer yield maps and automatically identifies common yield loss patterns. Some semiconductor companies also use MA to identify clustering patterns for more than 100,000 wafers, which can be considered big data in the semiconductor area. This talk will introduce these two products, which are built on Apache Spark, and present how to handle large-scale semiconductor data from a software engineering perspective.
Speakers: Seungchul Lee, Daeyoung Kim
Tech-Talk at Bay Area Spark Meetup
Apache Spark™ has rapidly become a key tool for data scientists to explore, understand and transform massive datasets and to build and train advanced machine learning models. The question then becomes: how do I deploy these models to a production environment? How do I embed what I have learned into customer-facing data applications? Like all things in engineering, it depends.
In this meetup, we will discuss best practices from Databricks on how our customers productionize machine learning models and do a deep dive with actual customer case studies and live demos of a few example architectures and code in Python and Scala. We will also briefly touch on what is coming in Apache Spark 2.X with model serialization and scoring options.
Streaming Analytics for Financial EnterprisesDatabricks
Streaming Analytics (or Fast Data processing) is becoming an increasingly popular subject in the financial sector. There are two main reasons for this development. First, more and more data has to be analyzed in real time to prevent fraud; all transactions that are being processed by banks have to pass an ever-growing number of tests to make sure that the money is coming from and going to legitimate sources. Second, customers want to have frictionless mobile experiences while managing their money, such as immediate notifications and personal advice based on their online behavior and other users’ actions.
A typical streaming analytics solution follows a ‘pipes and filters’ pattern that consists of three main steps: detecting patterns on raw event data (Complex Event Processing), evaluating the outcomes with the aid of business rules and machine learning algorithms, and deciding on the next action. At the core of this architecture is the execution of predictive models that operate on enormous amounts of never-ending data streams.
In this talk, I’ll present an architecture for streaming analytics solutions that covers many use cases that follow this pattern: actionable insights, fraud detection, log parsing, traffic analysis, factory data, the IoT, and others. I’ll go through a few architectural challenges that arise when dealing with streaming data, such as latency issues, event time vs. server time, and exactly-once processing. The solution is built on the KISSS stack: Kafka, Ignite, and Spark Structured Streaming. The solution is open source and available on GitHub.
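To make the event-time versus server-time point concrete, here is a minimal Structured Streaming fragment that windows on the event's own timestamp and uses a watermark to bound late data; the topic, schema, and thresholds are illustrative assumptions rather than the KISSS stack's actual code.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("event-time-sketch").getOrCreate()

schema = "account STRING, amount DOUBLE, event_time TIMESTAMP"  # assumed schema

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # assumed
          .option("subscribe", "transactions")               # assumed topic
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Aggregate on event time, not processing time, and tolerate 10 minutes of lateness.
per_account = (events
               .withWatermark("event_time", "10 minutes")
               .groupBy(F.window("event_time", "5 minutes"), "account")
               .agg(F.sum("amount").alias("total")))

(per_account.writeStream.outputMode("update")
 .format("console")
 .option("checkpointLocation", "/chk/fraud")  # checkpointing tracks progress for fault tolerance
 .start()
 .awaitTermination())
```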
This document discusses Industry 4.0 and the transition to smart manufacturing. It outlines a roadmap with the following phases:
1. Develop a platform aligned with Lambda Architecture separating data ingestion, streaming, and storage layers. Introduce a data lake.
2. Migrate data to the data lake using technologies like Kafka, MQTT, and REST. Use Redis, Spark, and MongoDB for real-time analytics.
3. Integrate migrated applications and introduce ML models initially using existing data sets to train linear models. Gradually transition to more advanced hybrid models like SOM-SVM.
This document discusses analytics and IoT. It covers key topics like data collection from IoT sensors, data storage and processing using big data tools, and performing descriptive, predictive, and prescriptive analytics. Cloud platforms and visualization tools that can be used to build end-to-end IoT and analytics solutions are also presented. The document provides an overview of building IoT solutions for collecting, analyzing, and gaining insights from sensor data.
Designing data pipelines for analytics and machine learning in industrial set...DataWorks Summit
Machine learning has made it possible for technologists to do amazing things with data. Its arrival coincides with the evolution of networked manufacturing systems driven by IoT. In this presentation we’ll examine the rise of IoT and ML from a practitioner's perspective to better understand how applications of AI can be built in industrial settings. We'll walk through a case study that combines multiple IoT and ML technologies to monitor and optimize an industrial heating and cooling (HVAC) system. Through this instructive example you'll see how the following components can be put into action:
1. A StreamSets data pipeline that sources from MQTT and persists to OpenTSDB
2. A TensorFlow model that predicts anomalies in streaming sensor data
3. A Spark application that derives new event streams for real-time alerts
4. A Grafana dashboard that displays factory sensors and alerts in an interactive view
By walking through this solution step-by-step, you'll learn how to build the fundamental capabilities needed in order to handle endless streams of IoT data and derive ML insights from that data:
1. How to transport IoT data through scalable publish/subscribe event streams
2. How to process data streams with transformations and filters
3. How to persist data streams with the timeliness required for interactive dashboards
4. How to collect labeled datasets for training machine learning models
At the end of this presentation you will have learned how a variety of tools can be used together to build ML enhanced applications and data products for instrumented manufacturing systems.
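As a sketch of component 3 from the case study above (the Spark application that derives alert streams), one could consume the sensor topic and emit an alert event whenever a reading crosses a threshold; the topic names, JSON schema, broker address, and threshold below are assumptions for illustration, not the presenters' code.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hvac-alerts-sketch").getOrCreate()

schema = "sensor_id STRING, temperature DOUBLE, ts TIMESTAMP"  # assumed schema

readings = (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker:9092")  # assumed
            .option("subscribe", "hvac-sensors")               # assumed topic
            .load()
            .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
            .select("r.*"))

# Derive a new event stream containing only out-of-range readings.
alerts = (readings
          .filter(F.col("temperature") > 80.0)  # assumed alert threshold
          .withColumn("alert", F.lit("HIGH_TEMPERATURE")))

# Publish alerts back to Kafka for dashboard / notification consumers.
(alerts.select(F.to_json(F.struct("sensor_id", "temperature", "ts", "alert")).alias("value"))
 .writeStream.format("kafka")
 .option("kafka.bootstrap.servers", "broker:9092")
 .option("topic", "hvac-alerts")                # assumed output topic
 .option("checkpointLocation", "/chk/hvac-alerts")
 .start()
 .awaitTermination())
```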
Speakers
Ian Downard, Sr. Developer Evangelist, MapR
William Ochandarena, Senior Director of Product Management, MapR
Wikibon #IoT #HyperConvergence Presentation via @theCUBE John Furrier
SiliconANGLE Media Research team at Wikibon prepared this presentation to share their findings on a new category called #IoT #HyperConvergence Analytics
Crowd Chat Conversation here:
https://www.crowdchat.net/chat/c3BvdF9vYmpfMTg4Mg==
More and more data is streaming in from many sources in order to drive operations in real-time.
When driving decisions with speed at scale is the norm, the traditional trade-off in analytics between simple but fast and slow but sophisticated has to give way.
Traditionally, fast data comes to rest in a database after the simpler in-flight analytics. Only after it comes to rest can a database perform sophisticated analytics. But in-flight and at-rest analytics have to come together in a single, hyper-converged analytics platform.
A presentation pertaining to the integration of real-time data with the cloud, with significant potential in the areas of industrial IT, real-time sensor information processing, and smart grids applied to various vertical industries. This is related to my blog post at www.cloudshoring.in
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Kai Wähner
This document provides an overview of streaming analytics and compares different streaming analytics frameworks. It begins with real-world use cases in various industries and then defines what a data stream is. The core components of a streaming analytics processing pipeline are described, including ingestion, preprocessing, and real-time and batch processing. Popular open-source frameworks like Apache Storm and AWS Kinesis are highlighted. The document concludes by noting that both streaming analytics frameworks and products are growing significantly to enable real-time analytics on streaming data.
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Otávio Carvalho
Work presented in partial fulfillment of the requirements for the degree of Bachelor in Computer Science - Federal University of Rio Grande do - Brazil
ML on Big Data: Real-Time Analysis on Time SeriesSigmoid
This document discusses building a machine learning model for real-time time series analysis on big data. It describes using Spark and Kafka to ingest streaming sensor data and train a model to identify patterns and predict failures. The training phase identifies concepts in historical data to build a knowledge base. In real-time, incoming data is processed in microbatches to identify patterns and sequences matching the concepts, triggering alerts. Challenges addressed include handling large volumes of small files and sharing data between batches for signals spanning multiple batches.
The document discusses the need for a model management framework to ease the development and deployment of analytical models at scale. It describes how such a framework could capture and template models created by data scientists, enable faster model iteration through a brute force approach, and visually compare models. The framework would reduce complexity for data scientists and allow business analysts to participate in modeling. It is presented as essential for enabling predictive modeling on data from thousands of sensors in an Internet of Things platform.
IIoT + Predictive Analytics: Solving for Disruption in Oil & Gas and Energy &...DataWorks Summit
The electric grid has evolved from linear generation and delivery to a complex mix of renewables, prosumer-generated electricity, and electric vehicles (EVs). Smart meters are generating loads of data. As a result, traditional forecasting models and technologies can no longer adequately predict supply and demand. Extreme weather, an aging infrastructure, and the burgeoning worldwide population are also contributing to increased outage frequency.
In oil and gas, commodity pricing pressures, resulting workforce reductions, and the need to reduce failures, automate workflows, and increase operational efficiencies are driving operators to shift analytics initiatives to advanced data-driven applications to complement physics-based tools.
While sensored equipment and legacy surveillance applications are generating massive amounts of data, just 2% is understood and being leveraged. Operationalizing it along with external datasets enables a shift from time-based to condition-based maintenance, better forecasting and dramatic reductions in unplanned downtime.
The session includes plenty of real-world anecdotes. For example, how an electric power holding company reduced the time it took to investigate energy theft from six months to less than one hour, producing theft leads in minutes and an expected multi-million dollar ROI. How a global offshore contract drilling services provider implemented an open source IIoT solution across its fleet of assets in less than a year, enabling remote monitoring, predictive analytics and maintenance.
Key takeaways:
• How are new processes for data collection, storage and democratization making it accessible and usable at scale?
• Beyond time series data, what other data types are important to assess?
• What advantage are open source technologies providing to enterprises deploying IIoT?
• Why is collaboration important across industrial verticals to increase IIoT open source adoption?
Speaker
Kenneth Smith, General Manager, Energy, Hortonworks
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...HostedbyConfluent
If you are a data scientist or a platform engineer, you can probably relate to the pains of working with the current explosive growth of Data/ML technologies and tooling. With many overlapping options and steep learning curves for each, it's increasingly challenging for data science teams. Many platform teams have started thinking about building an abstracted ML platform layer to support generalized ML use cases. But there are many complexities involved, especially as the underlying real-time data is shifting into the mainstream.
In this talk, we'll discuss why ML platforms can benefit from a simple and "invisible" abstraction. We'll offer some evidence on why you should consider leveraging streaming technologies even if your use cases are not real-time yet. We'll share learnings (combining both ML and Infra perspectives) about some of the hard complexities involved in building such simple abstractions, the design principles behind them, and some counterintuitive decisions you may come across along the way.
By the end of the talk, I hope data scientists will walk away with some tips on how to evaluate ML platforms, and platform engineers will have picked up a few architectural and design tricks.
This document discusses big data, including opportunities and risks. It covers big data technologies, the big data market, opportunities and risks related to capital trends, and issues around algorithmic accountability and privacy. The document contains several sections that describe topics like the Internet of Things, Hadoop, analytics approaches for static versus streaming data, big data challenges, and deep learning. It also includes examples of big data use cases and discusses hype cycles, adoption curves, and strategies for big data adoption.
The document discusses challenges in the oil and gas industry with legacy equipment and the need for scalable IoT solutions. It proposes that AWS IoT architecture can address issues of security, scalability and integrating old and new devices by connecting things at the edge to analytics and computing resources in the cloud. Examples are given of how Ambyint has used high-resolution sensor data and machine learning on AWS to develop autonomous solutions for well optimization, improving productivity and reducing costs.
AI-Powered Streaming Analytics for Real-Time Customer ExperienceDatabricks
Interacting with customers in the moment and in a relevant, meaningful way can be challenging to organizations faced with hundreds of various data sources at the edge, on-premises, and in multiple clouds.
To capitalize on real-time customer data, you need a data management infrastructure that allows you to do three things:
1) Sense: capture event data and stream data from a source, e.g. social media, web logs, machine logs, IoT sensors.
2) Reason: automatically combine and process this data with existing data for context.
3) Act: respond appropriately in a reliable, timely, consistent way.
In this session we'll describe and demo an AI-powered streaming solution that can tackle the entire end-to-end sense-reason-act process at any latency (real-time, streaming, and batch) using Spark Structured Streaming.
The solution uses AI (e.g. A* and NLP for data structure inference and machine learning algorithms for ETL transform recommendations) and metadata to automate data management processes (e.g. parse, ingest, integrate, and cleanse dynamic and complex structured and unstructured data) and guide user behavior for real-time streaming analytics. It’s built on Spark Structured Streaming to take advantage of unified API’s, multi-latency and event time-based processing, out-of-order data delivery, and other capabilities.
You will gain a clear understanding of how to use Spark Structured Streaming for data engineering using an intelligent data streaming solution that unifies fast-lane data streaming and batch lane data processing to deliver in-the-moment next best actions that improve customer experience.
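As a rough illustration of the sense-reason-act flow on Spark Structured Streaming (not the vendor's actual product), the sketch below reads events, enriches them against a static reference table, and writes responses per micro-batch. The source path, reference table, filter condition, and sink are all placeholders.

```python
# Hedged sketch of sense-reason-act with Structured Streaming; names are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("sense-reason-act").getOrCreate()

# Sense: capture events from a streaming source.
events = (spark.readStream
          .format("json")
          .schema("customer_id STRING, action STRING, ts TIMESTAMP")
          .load("/landing/clickstream"))

# Reason: combine the stream with existing data for context (stream-static join).
customers = spark.read.parquet("/tables/customer_profiles")
enriched = events.join(customers, "customer_id", "left")

# Act: respond per micro-batch, e.g. append next-best-action records to a serving table.
def act(batch_df, batch_id):
    (batch_df.filter(col("action") == "cart_abandon")
             .write.mode("append").parquet("/tables/next_best_actions"))

query = (enriched.writeStream
         .foreachBatch(act)
         .option("checkpointLocation", "/chk/nba")
         .start())
```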
The document discusses developing effective models for analyzing power grid data in real-time and at scale. It outlines an iterative approach using historical sensor data to develop and validate event detection models. Initial runs revealed significant data quality issues that were addressed through exploratory analysis and custom filters. Once cleaned, the models effectively detected events. Ongoing work involves refining the models, applying them to real-time streams, and establishing an open data repository.
Machine Learning Applied to Real Time Scoring in Manufacturing and Energy Uti...Kai Wähner
Kai Wähner (@KaiWaehner) is a Technology Evangelist and Community Director at TIBCO Software, a leading provider of integration and analytics middleware. Kai is experienced in a broad variety of topics such as Big Data, Advanced Analytics, and Machine Learning; he loves to write articles, blog about new technologies, and give talks. The talk covers three different projects where Kai's team built analytic models with technologies such as R, Apache Spark, or H2O.ai and deployed them to real-time processing. The use cases include predictive maintenance in manufacturing as well as fraud detection in banking and context-specific pricing in insurance. For one of the cases, Kai will show in detail how it was built and deployed using supervised/unsupervised ML.
The talk was done together with my colleague Ankitaa Bhowmick.
The document discusses migrating a data warehouse to the Databricks Lakehouse Platform. It outlines why legacy data warehouses are struggling, how the Databricks Platform addresses these issues, and key considerations for modern analytics and data warehousing. The document then provides an overview of the migration methodology, approach, strategies, and key takeaways for moving to a lakehouse on Databricks.
Data Lakehouse Symposium | Day 1 | Part 1Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
Data Lakehouse Symposium | Day 1 | Part 2Databricks
The world of data architecture began with applications. Next came data warehouses. Then text was organized into a data warehouse.
Then one day the world discovered a whole new kind of data that was being generated by organizations. The world found that machines generated data that could be transformed into valuable insights. This was the origin of what is today called the data lakehouse. The evolution of data architecture continues today.
Come listen to industry experts describe this transformation of ordinary data into a data architecture that is invaluable to business. Simply put, organizations that take data architecture seriously are going to be at the forefront of business tomorrow.
This is an educational event.
Several of the authors of the book Building the Data Lakehouse will be presenting at this symposium.
The document discusses the challenges of modern data, analytics, and AI workloads. Most enterprises struggle with siloed data systems that make integration and productivity difficult. The future of data lies with a data lakehouse platform that can unify data engineering, analytics, data warehousing, and machine learning workloads on a single open platform. The Databricks Lakehouse platform aims to address these challenges with its open data lake approach and capabilities for data engineering, SQL analytics, governance, and machine learning.
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
In this session, learn how to quickly supplement your on-premises Hadoop environment with a simple, open, and collaborative cloud architecture that enables you to generate greater value with scaled application of analytics and AI on all your data. You will also learn five critical steps for a successful migration to the Databricks Lakehouse Platform along with the resources available to help you begin to re-skill your data teams.
Democratizing Data Quality Through a Centralized PlatformDatabricks
Bad data leads to bad decisions and broken customer experiences. Organizations depend on complete and accurate data to power their business, maintain efficiency, and uphold customer trust. With thousands of datasets and pipelines running, how do we ensure that all data meets quality standards, and that expectations are clear between producers and consumers? Investing in shared, flexible components and practices for monitoring data health is crucial for a complex data organization to rapidly and effectively scale.
At Zillow, we built a centralized platform to meet our data quality needs across stakeholders. The platform is accessible to engineers, scientists, and analysts, and seamlessly integrates with existing data pipelines and data discovery tools. In this presentation, we will provide an overview of our platform’s capabilities, including:
Giving producers and consumers the ability to define and view data quality expectations using a self-service onboarding portal
Performing data quality validations using libraries built to work with Spark
Dynamically generating pipelines that can be abstracted away from users
Flagging data that doesn’t meet quality standards at the earliest stage and giving producers the opportunity to resolve issues before use by downstream consumers
Exposing data quality metrics alongside each dataset to provide producers and consumers with a comprehensive picture of health over time
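Not Zillow's platform, but a minimal sketch of the kind of Spark-based validation such a service might run; the dataset path, columns, and thresholds are hypothetical.

```python
# Hypothetical data quality check: fail fast if null rates exceed per-column limits.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import col, count, when

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.read.parquet("/data/listings")            # placeholder dataset

def null_rate(frame: DataFrame, column: str) -> float:
    """Fraction of rows where `column` is null."""
    total, nulls = frame.select(
        count("*"), count(when(col(column).isNull(), 1))
    ).first()
    return nulls / total if total else 0.0

expectations = {"price": 0.01, "zip_code": 0.0}       # max allowed null rate per column

failures = {}
for column, max_null_rate in expectations.items():
    rate = null_rate(df, column)
    if rate > max_null_rate:
        failures[column] = rate

if failures:
    raise ValueError(f"Data quality checks failed: {failures}")
```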
Learn to Use Databricks for Data ScienceDatabricks
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever: one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data. Join us to hear how Databricks' open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale, all on one unified platform.
Why APM Is Not the Same As ML MonitoringDatabricks
Application performance monitoring (APM) has become the cornerstone of software engineering allowing engineering teams to quickly identify and remedy production issues. However, as the world moves to intelligent software applications that are built using machine learning, traditional APM quickly becomes insufficient to identify and remedy production issues encountered in these modern software applications.
As a lead software engineer at NewRelic, my team built high-performance monitoring systems including Insights, Mobile, and SixthSense. As I transitioned to building ML Monitoring software, I found the architectural principles and design choices underlying APM to not be a good fit for this brand new world. In fact, blindly following APM designs led us down paths that would have been better left unexplored.
In this talk, I draw upon my (and my team’s) experience building an ML Monitoring system from the ground up and deploying it on customer workloads running large-scale ML training with Spark as well as real-time inference systems. I will highlight how the key principles and architectural choices of APM don’t apply to ML monitoring. You’ll learn why, understand what ML Monitoring can successfully borrow from APM, and hear what is required to build a scalable, robust ML Monitoring architecture.
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
Autonomy and ownership are core to working at Stitch Fix, particularly on the Algorithms team. We enable data scientists to deploy and operate their models independently, with minimal need for handoffs or gatekeeping. By writing a simple function and calling out to an intuitive API, data scientists can harness a suite of platform-provided tooling meant to make ML operations easy. In this talk, we will dive into the abstractions the Data Platform team has built to enable this. We will go over the interface data scientists use to specify a model and what that hooks into, including online deployment, batch execution on Spark, and metrics tracking and visualization.
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
In this talk, I will dive into the stage level scheduling feature added to Apache Spark 3.1. Stage level scheduling extends upon Project Hydrogen by improving big data ETL and AI integration and also enables multiple other use cases. It is beneficial any time the user wants to change container resources between stages in a single Apache Spark application, whether those resources are CPU, Memory or GPUs. One of the most popular use cases is enabling end-to-end scalable Deep Learning and AI to efficiently use GPU resources. In this type of use case, users read from a distributed file system, do data manipulation and filtering to get the data into a format that the Deep Learning algorithm needs for training or inference and then sends the data into a Deep Learning algorithm. Using stage level scheduling combined with accelerator aware scheduling enables users to seamlessly go from ETL to Deep Learning running on the GPU by adjusting the container requirements for different stages in Spark within the same application. This makes writing these applications easier and can help with hardware utilization and costs.
There are other ETL use cases where users want to change CPU and memory resources between stages, for instance when there is data skew or when the data size is much larger in certain stages of the application. In this talk, I will go over the feature details, cluster requirements, the API, and use cases. I will demo how the stage level scheduling API can be used by Horovod to seamlessly go from data preparation to training using the TensorFlow Keras API on GPUs.
The talk will also touch on other new Apache Spark 3.1 functionality, such as pluggable caching, which can be used to enable faster dataframe access when operating from GPUs.
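For context, a minimal sketch of the stage-level scheduling API as exposed in PySpark 3.1; the resource amounts, discovery script path, and per-partition training stub are placeholders, and the cluster must support dynamic allocation.

```python
# Request GPU executors only for the training stage of an RDD pipeline.
from pyspark.sql import SparkSession
from pyspark.resource import (ResourceProfileBuilder,
                              ExecutorResourceRequests, TaskResourceRequests)

spark = SparkSession.builder.appName("stage-level-scheduling").getOrCreate()
sc = spark.sparkContext

# ETL stage runs with whatever resources the application started with.
prepared = sc.textFile("/data/raw").map(lambda line: line.split(","))

# Build a profile asking for GPU executors for the deep learning stage.
ereqs = (ExecutorResourceRequests()
         .cores(4).memory("16g")
         .resource("gpu", 1, discoveryScript="/opt/spark/getGpus.sh"))
treqs = TaskResourceRequests().cpus(1).resource("gpu", 1)
gpu_profile = ResourceProfileBuilder().require(ereqs).require(treqs).build

def train_partition(rows):
    # placeholder for per-partition training or inference
    yield sum(1 for _ in rows)

# Only this stage is scheduled onto containers matching gpu_profile.
counts = prepared.withResources(gpu_profile).mapPartitions(train_partition).collect()
```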
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
In this talk, I would like to introduce an open-source tool built by our team that simplifies the data conversion from Apache Spark to deep learning frameworks.
Imagine you have a large dataset, say 20 GBs, and you want to use it to train a TensorFlow model. Before feeding the data to the model, you need to clean and preprocess your data using Spark. Now you have your dataset in a Spark DataFrame. When it comes to the training part, you may have the problem: How can I convert my Spark DataFrame to some format recognized by my TensorFlow model?
The existing data conversion process can be tedious. For example, to convert an Apache Spark DataFrame to a TensorFlow Dataset file format, you need to either save the Apache Spark DataFrame on a distributed filesystem in parquet format and load the converted data with third-party tools such as Petastorm, or save it directly in TFRecord files with spark-tensorflow-connector and load it back using TFRecordDataset. Both approaches take more than 20 lines of code to manage the intermediate data files, rely on different parsing syntax, and require extra attention for handling vector columns in the Spark DataFrames. In short, all these engineering frictions greatly reduced the data scientists’ productivity.
The Databricks Machine Learning team contributed a new Spark Dataset Converter API to Petastorm to simplify these tedious data conversion process steps. With the new API, it takes a few lines of code to convert a Spark DataFrame to a TensorFlow Dataset or a PyTorch DataLoader with default parameters.
In the talk, I will use an example to show how to use the Spark Dataset Converter to train a Tensorflow model and how simple it is to go from single-node training to distributed training on Databricks.
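A short sketch of the converter flow described above; the cache location, input path, column names, and batch size are placeholders.

```python
# Convert a preprocessed Spark DataFrame into a TensorFlow Dataset via Petastorm.
from pyspark.sql import SparkSession
from petastorm.spark import SparkDatasetConverter, make_spark_converter

spark = SparkSession.builder.appName("spark-to-tf").getOrCreate()

# Directory where the converter materializes the DataFrame as cached Parquet files.
spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF,
               "file:///tmp/petastorm_cache")

df = spark.read.parquet("/data/preprocessed")     # output of the Spark preprocessing
converter = make_spark_converter(df)

with converter.make_tf_dataset(batch_size=64) as tf_dataset:
    # Batches arrive as namedtuples of the DataFrame columns; map them to the
    # (features, label) tuples a Keras model expects (column names assumed).
    tf_dataset = tf_dataset.map(lambda batch: (batch.features, batch.label))
    # model.fit(tf_dataset, steps_per_epoch=..., epochs=...)

converter.delete()  # remove the cached files when done
```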
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
There is no doubt Kubernetes has emerged as the next generation of cloud-native infrastructure to support a wide variety of distributed workloads. Apache Spark has evolved to run both machine learning and large-scale analytics workloads. There is growing interest in running Apache Spark natively on Kubernetes. By combining the flexibility of Kubernetes and scalable data processing with Apache Spark, you can run any data and machine learning pipelines on this infrastructure while effectively utilizing the resources at your disposal.
In this talk, Rajesh Thallam and Sougata Biswas will share how to effectively run your Apache Spark applications on Google Kubernetes Engine (GKE) and Google Cloud Dataproc, and orchestrate the data and machine learning pipelines with managed Apache Airflow on GKE (Google Cloud Composer). The following topics will be covered:
– Understanding key traits of Apache Spark on Kubernetes
– Things to know when running Apache Spark on Kubernetes, such as autoscaling
– Demonstration of running analytics pipelines on Apache Spark orchestrated with Apache Airflow on a Kubernetes cluster.
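Not the speakers' exact setup, but a minimal sketch of the kind of configuration involved when pointing a Spark session at a Kubernetes cluster. The API server address, namespace, container image, and bucket are placeholders, and real deployments typically go through spark-submit in cluster mode rather than a Python driver.

```python
# Illustrative Spark-on-Kubernetes configuration (values are placeholders).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("spark-on-k8s")
         .master("k8s://https://203.0.113.10:443")                         # K8s API server
         .config("spark.kubernetes.namespace", "data-pipelines")
         .config("spark.kubernetes.container.image", "gcr.io/my-project/spark-py:3.1.1")
         .config("spark.executor.instances", "4")
         # Autoscaling-friendly settings: dynamic allocation with shuffle tracking.
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
         .getOrCreate())

spark.read.parquet("gs://my-bucket/events/").groupBy("event_type").count().show()
```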
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
Pipelines have become ubiquitous, as the need for stringing multiple functions to compose applications has gained adoption and popularity. Common pipeline abstractions such as “fit” and “transform” are even shared across divergent platforms such as Python Scikit-Learn and Apache Spark.
Scaling pipelines at the level of simple functions is desirable for many AI applications, however is not directly supported by Ray’s parallelism primitives. In this talk, Raghu will describe a pipeline abstraction that takes advantage of Ray’s compute model to efficiently scale arbitrarily complex pipeline workflows. He will demonstrate how this abstraction cleanly unifies pipeline workflows across multiple platforms such as Scikit-Learn and Spark, and achieves nearly optimal scale-out parallelism on pipelined computations.
Attendees will learn how pipelined workflows can be mapped to Ray’s compute model and how they can both unify and accelerate their pipelines with Ray.
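The talk describes Raghu's Ray-based abstraction; as a generic illustration only, the sketch below expresses fit/transform stages as Ray tasks so independent branches of a pipeline can run in parallel.

```python
# Illustrative-only: fit/transform pipeline stages as Ray remote tasks.
import ray
import numpy as np

ray.init(ignore_reinit_error=True)

@ray.remote
def fit_scaler(X):
    # "fit": compute scaling parameters once.
    return X.mean(axis=0), X.std(axis=0) + 1e-9

@ray.remote
def transform(X, params):
    # "transform": apply the fitted parameters.
    mean, std = params
    return (X - mean) / std

X_train = np.random.rand(10_000, 8)
X_test = np.random.rand(1_000, 8)

params_ref = fit_scaler.remote(X_train)              # fit stage
train_ref = transform.remote(X_train, params_ref)    # transform stages fan out
test_ref = transform.remote(X_test, params_ref)      # and run in parallel

X_train_scaled, X_test_scaled = ray.get([train_ref, test_ref])
```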
Sawtooth Windows for Feature AggregationsDatabricks
In this talk about Zipline, we will introduce a new type of windowing construct called a sawtooth window. We will describe various properties of sawtooth windows that we utilize to achieve online-offline consistency, while still maintaining high throughput, low read latency, and tunable write latency for serving machine learning features. We will also talk about a simple deployment strategy for correcting feature drift due to operations that are not "abelian groups" and that operate over change data.
We want to present multiple anti-patterns utilizing Redis in unconventional ways to get the maximum out of Apache Spark. All examples presented are tried and tested in production at scale at Adobe. The most common integration is spark-redis, which interfaces with Redis as a DataFrame backing store or as an upstream for Structured Streaming. We deviate from the common use cases to explore where Redis can plug gaps while scaling out high-throughput applications in Spark.
Niche 1: Long-Running Spark Batch Job – Dispatch New Jobs by Polling a Redis Queue
· Why?
  o Custom queries on top of a table; we load the data once and query N times
· Why not Structured Streaming?
· Working solution using Redis
Niche 2: Distributed Counters
· Problems with Spark Accumulators
· Utilize Redis hashes as distributed counters
· Precautions for retries and speculative execution
· Pipelining to improve performance
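A hedged sketch of the second niche (not Adobe's code): Spark executors incrementing Redis hash fields as distributed counters, with pipelining to cut round trips. The host, key names, and classification logic are placeholders, and, as the outline notes, retries and speculative execution need extra precautions to avoid double counting.

```python
# Redis hashes as distributed counters from Spark executors, via redis-py.
import redis
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redis-counters").getOrCreate()
sc = spark.sparkContext

def count_partition(rows):
    r = redis.Redis(host="redis-host", port=6379)        # placeholder host
    pipe = r.pipeline(transaction=False)
    for value in rows:
        field = "even" if value % 2 == 0 else "odd"      # stand-in classification
        pipe.hincrby("job:1234:counters", field, 1)      # atomic per-field increment
    pipe.execute()                                       # one round trip per partition

sc.parallelize(range(100_000), numSlices=8).foreachPartition(count_partition)
print(redis.Redis(host="redis-host", port=6379).hgetall("job:1234:counters"))
```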
Re-imagine Data Monitoring with whylogs and SparkDatabricks
In the era of microservices, decentralized ML architectures, and complex data pipelines, data quality has become a bigger challenge than ever. When data is involved in complex business processes and decisions, bad data can, and will, affect the bottom line. As a result, ensuring data quality across the entire ML pipeline is both costly and cumbersome, while data monitoring is often fragmented and performed ad hoc. To address these challenges, we built whylogs, an open source standard for data logging. It is a lightweight data profiling library that enables end-to-end data profiling across the entire software stack. The library implements a language- and platform-agnostic approach to data quality and data monitoring. It can work with different modes of data operation, including streaming, batch, and IoT data.
In this talk, we will provide an overview of the whylogs architecture, including its lightweight statistical data collection approach and various integrations. We will demonstrate how the whylogs integration with Apache Spark achieves large scale data profiling, and we will show how users can apply this integration into existing data and ML pipelines.
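For orientation, a minimal whylogs example on a pandas DataFrame showing the profiling idea; the Spark integration discussed in the talk follows a similar profile-and-merge pattern, though its exact module path depends on the whylogs version.

```python
# Profile a small DataFrame with whylogs; only statistics are kept, not raw data.
import pandas as pd
import whylogs as why

df = pd.DataFrame({
    "temperature": [21.3, 22.1, None, 23.8],
    "status": ["ok", "ok", "alert", "ok"],
})

results = why.log(df)              # lightweight statistical profile of the dataset
profile_view = results.view()
print(profile_view.to_pandas())    # per-column counts, null counts, distributions
```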
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
Machine learning (ML) models are typically part of prediction queries that consist of a data processing part (e.g., for joining, filtering, cleaning, featurization) and an ML part invoking one or more trained models. In this presentation, we identify significant and unexplored opportunities for optimization. To the best of our knowledge, this is the first effort to look at prediction queries holistically, optimizing across both the ML and SQL components.
We will present Raven, an end-to-end optimizer for prediction queries. Raven relies on a unified intermediate representation that captures both data processing and ML operators in a single graph structure.
This allows us to introduce optimization rules that
(i) reduce unnecessary computations by passing information between the data processing and ML operators
(ii) leverage operator transformations (e.g., turning a decision tree to a SQL expression or an equivalent neural network) to map operators to the right execution engine, and
(iii) integrate compiler techniques to take advantage of the most efficient hardware backend (e.g., CPU, GPU) for each operator.
We have implemented Raven as an extension to Spark’s Catalyst optimizer to enable the optimization of SparkSQL prediction queries. Our implementation also allows the optimization of prediction queries in SQL Server. As we will show, Raven is capable of improving prediction query performance on Apache Spark and SQL Server by up to 13.1x and 330x, respectively. For complex models, where GPU acceleration is beneficial, Raven provides up to 8x speedup compared to state-of-the-art systems. As part of the presentation, we will also give a demo showcasing Raven in action.
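As a toy illustration of rule (ii), the sketch below translates a decision tree into an equivalent SQL CASE expression so the "model" can run inside the SQL engine. The tree, features, and thresholds are invented; Raven's real intermediate representation is far more general.

```python
# Toy decision-tree-to-SQL translation (hypothetical features and thresholds).
tree = {
    "feature": "credit_score", "threshold": 650,
    "left": {"value": 0},                               # score <= 650 -> reject
    "right": {
        "feature": "income", "threshold": 40000,
        "left": {"value": 0},
        "right": {"value": 1},                          # high score and income -> approve
    },
}

def tree_to_sql(node):
    # Leaves carry a constant prediction; internal nodes become CASE WHEN branches.
    if "value" in node:
        return str(node["value"])
    return (f"CASE WHEN {node['feature']} <= {node['threshold']} "
            f"THEN {tree_to_sql(node['left'])} "
            f"ELSE {tree_to_sql(node['right'])} END")

print(f"SELECT *, {tree_to_sql(tree)} AS approved FROM applications")
```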
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
Semantic segmentation is the classification of every pixel in an image/video. The segmentation partitions a digital image into multiple objects to simplify/change the representation of the image into something that is more meaningful and easier to analyze [1][2]. The technique has a wide variety of applications ranging from perception in autonomous driving scenarios to cancer cell segmentation for medical diagnosis.
Exponential growth in the datasets that require such segmentation is driven by improvements in the accuracy and quality of the sensors generating the data extending to 3D point cloud data. This growth is further compounded by exponential advances in cloud technologies enabling the storage and compute available for such applications. The need for semantically segmented datasets is a key requirement to improve the accuracy of inference engines that are built upon them.
Streamlining the accuracy and efficiency of these systems directly affects the value of the business outcome for organizations that are developing such functionalities as a part of their AI strategy.
This presentation details workflows for labeling, preprocessing, modeling, and evaluating performance/accuracy. Scientists and engineers leverage domain-specific features/tools that support the entire workflow from labeling the ground truth, handling data from a wide variety of sources/formats, developing models and finally deploying these models. Users can scale their deployments optimally on GPU-based cloud infrastructure to build accelerated training and inference pipelines while working with big datasets. These environments are optimized for engineers to develop such functionality with ease and then scale against large datasets with Spark-based clusters on the cloud.
Massive Data Processing in Adobe Using Delta LakeDatabricks
At Adobe Experience Platform, we ingest TBs of data every day and manage PBs of data for our customers as part of the Unified Profile offering. At the heart of this is complex ingestion of a mix of normalized and denormalized data with various linkage scenarios, powered by a central Identity Linking Graph. This helps power various marketing scenarios that are activated in multiple platforms and channels like email, advertisements, etc. We will go over how we built a cost-effective and scalable data pipeline using Apache Spark and Delta Lake and share our experiences.
What are we storing?
Multi Source – Multi Channel Problem
Data Representation and Nested Schema Evolution
Performance Trade Offs with Various formats
Go over anti-patterns used
(String FTW)
Data Manipulation using UDFs
Writer Worries and How to Wipe them Away
Staging Tables FTW
Datalake Replication Lag Tracking
Performance Time!
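Not Adobe's pipeline, but a minimal sketch of the core Delta Lake upsert pattern this kind of workload relies on: merging a staged batch of profile updates into a large table. Paths and column names are placeholders.

```python
# MERGE a staging batch of profile updates into a Delta table (illustrative names).
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = (SparkSession.builder.appName("profile-upserts")
         .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

updates = spark.read.parquet("/staging/profile_updates")   # staging table

target = DeltaTable.forPath(spark, "/delta/unified_profiles")
(target.alias("t")
 .merge(updates.alias("u"), "t.profile_id = u.profile_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())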
Monterey College of Law’s mission is to zseoali2660
Monterey College of Law’s mission is to provide a quality legal education in a community law school setting with graduates who are dedicated to professional excellence, integrity, and community service.
Mastering Data Science: Unlocking Insights and Opportunities at Yale IT Skill...smrithimuralidas
The Data Science Course at Yale IT Skill Hub in Coimbatore provides in-depth training in data analysis, machine learning, and AI using Python, R, SQL, and tools like Tableau. Ideal for beginners and professionals, it covers data wrangling, visualization, and predictive modeling through hands-on projects and real-world case studies. With expert-led sessions, flexible schedules, and 100% placement support, this course equips learners with skills for Coimbatore’s booming tech industry. Earn a globally recognized certification to excel in data-driven roles. The Data Analytics Course at Yale IT Skill Hub in Coimbatore offers comprehensive training in data visualization, statistical analysis, and predictive modeling using tools like Power BI, Tableau, Python, and R. Designed for beginners and professionals, it features hands-on projects, expert-led sessions, and real-world case studies tailored to industries like IT and manufacturing. With flexible schedules, 100% placement support, and globally recognized certification, this course equips learners to excel in Coimbatore’s growing data-driven job market.
Ethical Frameworks for Trustworthy AI – Opportunities for Researchers in Huma...Karim Baïna
Artificial Intelligence (AI) is reshaping societies and raising complex ethical, legal, and geopolitical questions. This talk explores the foundations and limits of Trustworthy AI through the lens of global frameworks such as the EU’s HLEG guidelines, UNESCO’s human rights-based approach, OECD recommendations, and NIST’s taxonomy of AI security risks.
We analyze key principles like fairness, transparency, privacy, robustness, and accountability — not only as ideals, but in terms of their practical implementation and tensions. Special attention is given to real-world contexts such as Morocco’s deployment of 4,000 intelligent cameras and the country’s positioning in AI readiness indexes. These examples raise critical issues about surveillance, accountability, and ethical governance in the Global South.
Rather than relying on standardized terms or ethical "checklists", this presentation advocates for a grounded, interdisciplinary, and context-aware approach to responsible AI — one that balances innovation with human rights, and technological ambition with social responsibility.
This rich Trustworthy and Responsible AI framework context presents a serious opportunity for researchers in the human and social sciences: will they operate as gatekeepers, reinforcing existing ethical constraints, or become revolutionaries, pioneering new paradigms that redefine how AI interacts with society, knowledge production, and policymaking?
apidays New York 2025 - Agentic AI Future by Seena Ganesh (Staples)apidays
Agentic AI Future: Agents Reshaping Digital Transformation and API Strategy
Seena Ganesh, Vice President Engineering - B2C & B2B eCommerce & Digital AI at Staples
apidays New York 2025
API Management for Surfing the Next Innovation Waves: GenAI and Open Banking
Convene 360 Madison, New York
May 14 & 15, 2025
------
Check out our conferences at https://www.apidays.global/
Do you want to sponsor or talk at one of our conferences?
https://apidays.typeform.com/to/ILJeAaV8
Learn more on APIscene, the global media made by the community for the community:
https://www.apiscene.io
Explore the API ecosystem with the API Landscape:
https://apilandscape.apiscene.io/
apidays New York 2025 - How AI is Transforming Product Management by Shereen ...apidays
From Data to Decisions: How AI is Transforming Product Management
Shereen Moussa, Digital Product Owner at PepsiCo
apidays New York 2025
API Management for Surfing the Next Innovation Waves: GenAI and Open Banking
Convene 360 Madison, New York
May 14 & 15, 2025
An Algorithmic Test Using The Game of PokerGraham Ware
In an interview you may be presented with a poker set and asked to create a game that mimics the market and data science. Here is a fun way we created such a scenario.
apidays New York 2025 - Turn API Chaos Into AI-Powered Growth by Jeremy Water...apidays
Turn API Chaos Into AI-Powered Growth
Jeremy Waterkotte, Solutions Consultant, Alliances at Boomi
apidays New York 2025
API Management for Surfing the Next Innovation Waves: GenAI and Open Banking
Convene 360 Madison, New York
May 14 & 15, 2025
How Data Annotation Services Drive Innovation in Autonomous Vehicles.docxsofiawilliams5966
Autonomous vehicles represent the cutting edge of modern technology, promising to revolutionize transportation by improving safety, efficiency, and accessibility.
Comprehensive Roadmap of AI, ML, DS, DA & DSA.pdfepsilonice
This outlines a comprehensive roadmap for mastering artificial intelligence, machine learning, data science, data analysis, and data structures and algorithms, guiding learners from beginner to advanced levels by building upon foundational Python knowledge.
Brain, Bytes & Bias: ML Interview Questions You Can’t Miss!yashikanigam1
Preparing for a machine learning role? Get ready to tackle real-world problem-solving questions! From regression vs. classification to the ETL process, expect a deep dive into algorithms and data pipelines. Most live courses for professionals and best online professional certificates now include mock interviews and case studies to gear you up. Mastering these ML interview questions not only helps in cracking top tech interviews but also builds your confidence.
At Tutort Academy, we train you with real-time scenarios and curated interview prep for success.
The final presentation of our time series forecasting project for the "Data Science for Society and Business" Master's program at Constructor University Bremen
apidays New York 2025 - The Evolution of Travel APIs by Eric White (Eviivo)apidays
From Rates and Bookings to AI Intelligence: The Evolution of Travel APIs
Eric White, CTO at Eviivo
apidays New York 2025
API Management for Surfing the Next Innovation Waves: GenAI and Open Banking
Convene 360 Madison, New York
May 14 & 15, 2025
2. Pranav Prakash, Quartic.ai: Application and challenges of streaming analytics and machine learning on multi-variate time series data for smart manufacturing. #UnifiedDataAnalytics #SparkAISummit
3. Pranav Prakash
• Co-Founder, VP Engineering at Quartic.ai
• Ex-LinkedIn SlideShare
• Passionate about: A.I., Computer Vision, 3D Printing, Music, Caffeine
4. What you'll learn in the next 40 mins:
• A cool startup solving some real-life use cases
• Downtime Reduction use case of a critical asset in the Pharma world, and a "secret" to solve such problems
• Challenges in Industrial Stream Processing
• Spark-specific stuff that we learned
5. We enable Industry 4.0
• AI-powered smart manufacturing platform
• Processing billions of sensor data points every day
• Work with top Pharma companies on multiple use cases
• Team of 22 techies including Engineers & Data Scientists + 4 Domain Veterans
#UnifiedDataAnalytics #SparkAISummit
7. We started by building solutions for pharmaceutical manufacturing and created a DIY platform:
• Increased uptime of sterilization autoclave by 7 days
• Increased yield of protein from fermentation process
• Incubated egg harvester – increase uptime during critical flu season
• Cold-chain monitoring for pharma refrigeration – reduced downtime and waste
• Predictive health monitoring of air handlers for clean rooms in pharma
• Enable continuous validation of biologic production process
• Medical Device Assembly – reduce recalls caused by poor quality.
8. Case study – an Intelligent Asset Health Monitoring system for an Industrial Autoclave
• Mission - Improve the reliability of a complex asset
• Details - 13 different modes (cycles)
• Runs 24/7
• Critical asset
9. Equipment Reliability
• Capture process and condition data
• Establish baseline and measure deviations
• Forecast the future
• Classify errors early
• "Advisory Mode" AI
10. SCADA = Supervisory Control and Data Acquisition; PLC = Programmable Logic Controller
11. System Design Params: Data
– Speed: 10ms – 2 hours
– Volume: couple 1,000s of sensors per asset; 10,000s of assets per enterprise
– Data Type: String, Numeric, Boolean, Array
– Timeseries, Discrete
13. System Design Params: Use Cases
– Automatic Model Param Tuning, Model Training
– 1000s of ML Models Deployment
– Complex Event Processing (CEP)
– Statistical & Analytical Processing
• Rule Recommendation
• Near Real Time Stream Processing
14. Challenges: ML
– Multiple granularities
– Late data arrival
– Model deployment on a heterogeneous data stream
– Flash flood of data
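One common way to handle the late-data-arrival challenge in Spark Structured Streaming is event-time watermarking. The sketch below is generic: the broker, topic, schema, window size, and lateness tolerance are assumptions, not values from the talk.

```python
# Tolerate late sensor events with a watermark, then aggregate per asset and window.
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, from_json, window
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("late-data").getOrCreate()

schema = StructType([
    StructField("asset_id", StringType()),
    StructField("ts", TimestampType()),
    StructField("reading", DoubleType()),
])

parsed = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder
          .option("subscribe", "asset-sensors")               # placeholder
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

aggregated = (parsed
              .withWatermark("ts", "10 minutes")              # accept events up to 10 min late
              .groupBy(window(col("ts"), "1 minute"), col("asset_id"))
              .agg(avg("reading").alias("avg_reading")))

query = (aggregated.writeStream
         .outputMode("append")                                # emit windows once the watermark passes
         .format("parquet")
         .option("path", "/data/asset_aggregates")
         .option("checkpointLocation", "/chk/asset_aggregates")
         .start())
```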
15. Multiple Granularities
Stream 1 (poll frequency = 1s), columns TS, Sensor A, Sensor B; sample timestamps: 12:03:01.198, 12:03:02.283, 12:03:03.316, 12:03:04.572, 12:03:05.283, 12:03:06.342
Stream 2 (poll frequency = 5s), columns TS, Sensor C, Sensor D; sample timestamps: 12:03:01.230, 12:03:06.233, 12:03:11.316, 12:03:16.520, 12:03:21.283
- Both belong to the same "Asset"
- Target Feature – C/D or A/B
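One possible way (not necessarily Quartic.ai's) to reconcile the two polling granularities is to resample both sensor groups onto the coarser 5-second grid and join per asset before deriving the target features. Table paths and column names below are illustrative.

```python
# Align 1s and 5s sensor streams on a common 5-second grid, then build ratio features.
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, window

spark = SparkSession.builder.appName("granularity-align").getOrCreate()

fast = spark.read.parquet("/data/sensors_ab")    # ~1s samples: asset_id, ts, a, b
slow = spark.read.parquet("/data/sensors_cd")    # ~5s samples: asset_id, ts, c, d

def to_5s_grid(df, value_cols):
    # Average each sensor over 5-second event-time windows per asset.
    return (df.groupBy(col("asset_id"), window(col("ts"), "5 seconds").alias("w"))
              .agg(*[avg(c).alias(c) for c in value_cols]))

fast_5s = to_5s_grid(fast, ["a", "b"])
slow_5s = to_5s_grid(slow, ["c", "d"])

features = (fast_5s.join(slow_5s, ["asset_id", "w"])
            .withColumn("a_over_b", col("a") / col("b"))
            .withColumn("c_over_d", col("c") / col("d")))
```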