Providentia Worldwide
S. Ryan Quick @phaedo, Providentia Worldwide. April 2020
HPC Impact
EDA Telemetry Neural Networks
Systems Intelligence
Ecosystem Management
Systems Intelligence Principles
Methodology for leveraging multiple data domains through complex data processing
[Figure labels: Disparate / Unlike Domains; Messaging Middleware; Insight]
• Aggregation
• Event Statistics
• Atomic Pattern Recognition
• The simple example is drawn as a “waterfall” for illustration only; the operations themselves are parallel and stateless
• The pattern is an example of the type and method of telemetry we use for EDA environmental and in-workload collection to feed AI and neural networks inline
• There are literally thousands of metrics for a single operation, and millions per job
Multiple-Domain Simple Data Access
[Figure: a Metrics Calculator fed by three event sources: App Login, CPU, and DB Access.]
Available source fields:
• App Login event source: app login r/sec, app successful login r/sec, app failed login r/sec
• CPU event source: cpu 1m load avg, cpu 5m load avg, cpu 15m load avg, cpu blocked proc cnt, cpu running proc cnt, cpu waiting proc cnt, cpu user %, cpu idle %, cpu system %, cpu io wait %
• DB Access event source: db active queries, db slow queries, db selects, db updates, db deletes, db rows fetched, db table locks held, db row locks held
Checks shown in the figure, each followed on its "yes" branch:
• app failed login / app successful login * 100 > 3?
• AVG(cpu waiting / cpu running) / cpu 1m load avg * 100 > 0.5?
• DB slow queries > 4?
Result when all three answer yes: Anomaly Detected: Potential Login Attack
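The three checks in the figure can be sketched as independent, stateless predicates over a single telemetry sample. This is a minimal illustration, not the presenter's implementation: the `Sample` class, the function names, and the handling of zero denominators are assumptions, and the pairing of each threshold with its expression is read off the flattened figure.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One telemetry sample (hypothetical container; fields mirror the slide)."""
    app_failed_login_rate: float       # app failed login r/sec
    app_successful_login_rate: float   # app successful login r/sec
    cpu_waiting: Sequence[float]       # cpu waiting proc cnt, one value per reading
    cpu_running: Sequence[float]       # cpu running proc cnt, one value per reading
    cpu_load_1m: float                 # cpu 1m load avg
    db_slow_queries: int               # db slow queries


def failed_login_check(s: Sample) -> bool:
    # Failed logins as a percentage of successful logins, threshold 3.
    if s.app_successful_login_rate == 0:
        # Assumption: any failures with zero successes count as a "yes".
        return s.app_failed_login_rate > 0
    return s.app_failed_login_rate / s.app_successful_login_rate * 100 > 3


def cpu_wait_check(s: Sample) -> bool:
    # AVG(cpu waiting / cpu running) / cpu 1m load avg * 100, threshold 0.5.
    ratios = [w / r for w, r in zip(s.cpu_waiting, s.cpu_running) if r > 0]
    if not ratios or s.cpu_load_1m == 0:
        return False  # Assumption: no usable readings means "no".
    return mean(ratios) / s.cpu_load_1m * 100 > 0.5


def slow_query_check(s: Sample) -> bool:
    return s.db_slow_queries > 4


# Each check reads only the sample it is given, so the checks can run in
# parallel and in any order; the "waterfall" is just their conjunction.
CHECKS = (failed_login_check, cpu_wait_check, slow_query_check)


def is_potential_login_attack(s: Sample) -> bool:
    return all(check(s) for check in CHECKS)


suspicious = Sample(5.0, 100.0, [4, 6], [2, 2], 4.0, 7)
quiet = Sample(1.0, 100.0, [0, 1], [2, 2], 1.0, 0)
print(is_potential_login_attack(suspicious), is_potential_login_attack(quiet))
```

Keeping each predicate free of shared state is what allows the same rule to be evaluated per event on a stream rather than as a sequential pipeline.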
• Affinity + Simple Case
• Stream + Augmented Datasource
• Parallel Stream
• Frequency-Shifted Stream
• “Correlative/Normalized View”: similar to the SQL “join” concept, we relate data fields in disparate stream sources
• Many examples (for other talks :)
• This illustrates the mechanisms by which we can combine and augment data types for complex events in AI/neural networks and utilize inline training and active models.
• It also allows us to introduce the notion of insight, which is crucial to the incremental improvement model, especially for “slight touch ecosystems” like coral reefs
Multiple-Domain Complex Event Processing Approaches
[Figure labels: Complex Event Processor; CPU Source; Zookeeper Source; RabbitMQ Source; Application Event Source; Parallel Source; Disparate Normalization; Correlative/Normalized View (three instances)]
Zookeeper fields: approx-data-sz, avg-latency, ephemeral-count, followers, max-fd-cnt, max-latency, min-latency, open-fd-cnt, num-alive-connections, outstanding-requests, packets-received, packets-sent, pending-syncs, synced-followers, watch-cnt, znode-cnt
RabbitMQ fields: message total, message ready, message unacked, rate.publish, rate.deliver, rate.redeliver, rate.confirm, rate.ack, connection.total, connection.idle, channel.total, channel.publisher, channel.consumer, channel.duplex, channel.inactive, exchange.rate.phaedo, q.total, q.idle, q.messages.phaedo, q.consumers.phaedo, q.memory.phaedo, q.ingress.phaedo, q.egress.phaedo, binding.total
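One way to picture a correlative/normalized view is as a join of disparate streams on a shared time bucket. The sketch below is an assumption-laden illustration rather than the presenter's mechanism: the `correlative_view` function, the 10-second bucket width, and the sample values are all hypothetical, and only the field names `rate.publish` and `avg-latency` come from the slide.

```python
from collections import defaultdict


def correlative_view(streams, width_s=10.0):
    """Join events from several sources on a common time bucket.

    streams: dict mapping a source name to an iterable of
             (timestamp_seconds, {field: value}) pairs.
    Returns {bucket_index: {"source.field": value}}, keeping only buckets
    in which every source reported (an inner join).
    """
    rows = defaultdict(dict)
    seen = defaultdict(set)
    for source, events in streams.items():
        for ts, fields in events:
            key = int(ts // width_s)      # normalize differing sample rates
            seen[key].add(source)
            for name, value in fields.items():
                # A later sample in the same bucket overwrites an earlier one.
                rows[key][f"{source}.{name}"] = value
    everyone = set(streams)
    return {key: rows[key] for key in sorted(rows) if seen[key] == everyone}


# Hypothetical samples: RabbitMQ reports more often than Zookeeper.
streams = {
    "rabbitmq": [(0, {"rate.publish": 120.0}),
                 (5, {"rate.publish": 180.0}),
                 (12, {"rate.publish": 90.0})],
    "zookeeper": [(3, {"avg-latency": 2.0}),
                  (27, {"avg-latency": 9.0})],
}
view = correlative_view(streams)
print(view)
```

Prefixing each field with its source keeps the joined row unambiguous when two sources expose similarly named metrics. A last-value-wins policy per bucket is only one choice; averaging or keeping the maximum within the bucket would be equally reasonable.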
Semiconductor EDA
Designing the Digital Future
HPC HTC
• “High Throughput Computing”
• Very predictable, common engineering pipeline
• The toolset is geared to repeat the steps in the pattern hundreds or thousands of times per iteration, per engineer, constantly. Each adjustment cascades into hundreds or thousands of small jobs.
• Jobs are very short-lived: average time on a single core is under 3 s. The job scheduler itself is often a bottleneck on large, shared systems.
• EDA requires multiple phases of HDL synthesizers and HLL compilers, so it can produce different sorts of computational bottlenecks at different phases of the pipeline, as well as for different design choices in the engineering decisions.
EDA Characteristics
Well-established Sector
• Traditional enterprise storage (NFS3)
• 10-100M small <=1M files/dir
• User- and group-based access controls
• POSIX, locking not required
• The OS scheduler is often sufficient. Sometimes job submission is separated out by a login node.
• The license model is well understood and is generally per-core or time-based. Codes are generally proprietary.
• A turnkey deployment is up and running in minutes on a system of nearly any size. Very little motivation to alter the status quo.
EDA Characteristics
What Would It Take to Try Something New?
• All on-prem, with cloud tests successful but not adopted:
  • too costly
  • intellectual property concerns
  • ROI delayed
  • data management difficulties
• Storage enhancements show improvements, and large shops adopt those, but NFS3 performs well for most small-to-medium practitioners.
EDA Environments
What Would It Take to Try Something New?
• The EDA process is well known, easy to hire for, and well understood in the industry. Why rock the boat?
• Any perturbation to the system would need to overcome the cost of change, which in semiconductor fabrication can be immense.
• Even where bottlenecks are known (storage, compute, scheduling), they are understood and manageable. New is new and unpredictable, with unknown value…
EDA Pipelines at Scale?
For valuable and motivational change in
semiconductor EDA, we need disruption both
in behavior and environment simultaneously.
External focus for HTC/Systems Intelligence
• Two primary mechanisms for augmenting the EDA process:
  • Internally (inside the EDA pipeline).
  • Externally (augmenting and enhancing the pipelining environment). This is our focus for this project, but the usual neural network caveats apply.
Neural Networks for EDA Pipelines
[Figure: Systems Intelligence, EDA Messaging Substrate (Data Analytics; Command & Control), shown with blocks labeled Internal and External.]
Semiconductor Electronic Design Automation («precondition» API to workflow data): Chip Specification; Design entry/Functional verification; RTL synthesis; Partitioning of chip; Design for test (DFT) insertion; Floor planning; Placement stage; Clock tree synthesis (CTS); Routing stage; Final verification; GDS II
Infrastructure Automation («precondition» API to all components; «precondition» API backwards compatible): Systems Provisioning; Network Provisioning; Application Deployment; Configuration Management; Platform Management; Change Orchestration
Capabilities: User/group file CRUD; Workflow scheduling; Job management; License management
Semiconductor EDA
Designing the Digital Future
“When we think of sensing technologies as devices
that order the world, rather than devices that describe
it, then alternative relationships between the social and
the technical are strikingly brought to light.”
— Genevieve Bell (Intel) @feraldata
EDA Workflow and Supporting Infrastructure SI Messaging
[Figure: the Semiconductor Electronic Design Automation pipeline, Infrastructure Automation, and External Capabilities and Infrastructure (User/group file CRUD; Workflow scheduling; Job management; License management), shown alongside the EDA SI Messaging Substrate and two Insight labels.]
EDA SI Messaging Substrate components:
• CEP Ingest
• Data Analytics: inline models, offline models, Atomic Pattern Recognition, Parallel Stream, Stream Augmentation, Frequency-Shifted Streams, Affinity Streams, Aggregation/Statistics
• Command & Control: data/scores/metrics, decisioning, orchestration, validation, feedback
EDA Workflow and AI/NN Frameworks
[Figure: the Semiconductor Electronic Design Automation pipeline, Infrastructure Automation, External Capabilities and Infrastructure, and the SI Messaging Substrate, shown alongside the EDA ML / AI / NN Workflow and three Insight labels.]
Messaging-Based Machine Learning / AI / Neural Networks Workflow components:
• CEP/Ingest from Existing Datasources
• Data Analytics and Normalization
• Inline learning models
• Offline / replay learning models
• Clustering, Classification, Decision Trees
• Model Run, Model Training
• Pattern Enhancements
• Reactive Systems: scoring/metrics, decisioning, orchestration, validation, feedback
• Insight Consumers
• Ecosystem Insight and KPI Enhancements
• Ecosystem Messaging Platform
Unique position for AI and NN
Why Artificial Intelligence/Neural Networks for this Problem?
• Small, incremental human-driven changes are not cost-effective in today’s DevOps systems
• It is difficult to design sprints around, and test the efficacy of, continuous observation for “Minority Report”-style changes, and even harder to measure ROI
• Command and control systems can be designed to allow incremental change directly from NNs based on deployments (e.g., allowing each “reef” to tune itself based on its own ecosystem)
• The “show your work”/“show your rationale” problems carry less weight in EDA, relative to delivering results, than in other domains
Insight: “looking inward”
Insight provides a mechanism for self-tuning behavior of the running system at all levels:
• algorithms, models, data access, expert systems, KPIs, behaviors, reports, accuracy, efficiency, even insight itself
• In-built feedback mechanism for capturing behavior and performance
• Mechanism to ensure that changes over time are accounted for and noticed, if not understood
• Allows for inline and ongoing training without having to maintain offline (and outdated) training datasets
• Allows for locale-specific NN training (the NN-locale problem).
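To make the idea of inline training without a stored dataset concrete, here is a minimal sketch that keeps a per-locale baseline updated one observation at a time. It is a deliberately simple stand-in, using running statistics (Welford's algorithm) rather than a neural network, and it is not the presenter's method; the `InlineBaseline` class and the example job runtimes are hypothetical.

```python
import math


class InlineBaseline:
    """Running mean/variance updated per observation (Welford's algorithm).

    Nothing is stored except three numbers, so the baseline keeps learning
    inline and there is no offline training set to go stale.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def score(self, x):
        """Absolute z-score of x against what has been learned so far."""
        if self.n < 2:
            return 0.0  # not enough history to judge
        std = math.sqrt(self._m2 / (self.n - 1))
        return 0.0 if std == 0 else abs(x - self.mean) / std

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def observe(self, x):
        """Score first, then learn, so x does not mask its own anomaly."""
        z = self.score(x)
        self.update(x)
        return z


# Hypothetical per-job runtimes in seconds for one locale ("reef").
baseline = InlineBaseline()
for runtime in [2.9, 3.1, 3.0, 2.8, 3.2]:
    baseline.observe(runtime)
outlier_score = baseline.score(30.0)
print(round(baseline.mean, 3), outlier_score > 3)
```

Running one such baseline per deployment is a small-scale analogue of locale-specific training: each instance learns only from the ecosystem it observes.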
Program Status
Where are we now?
• Telemetry data from workload systems is feeding the messaging platform
• Synthetic workload (provided from a partner benchmarking suite) is being modified for user emulation
• NN-specific topology choice and models are under discussion with the wider team, considering that we will need to utilize simultaneous learning, model promotion, results propagation, etc.
• Insight mechanisms are developed in the messaging substrate automatically, with common APIs available to higher-level structures. Common reporting in dashboards, etc.
• Always looking for helpers to take things farther; will report more later as we (un)shelter…
