Testing Safety Critical Systems: Theory and Experiences. 2 May 2011, Jaap van Ekris
Some people live on the edge… “How would you feel if you were getting ready to launch and knew you were sitting on top of two million parts, all built by the lowest bidder on a government contract?” (John Glenn)
Actually, we all do…
Agenda: The challenge; Process and organization; System design; Verification techniques; Trends; Reality
THE CHALLENGE Why is testing safety critical systems so hard?
Software is dangerous... Capers Jones: at least 2 high-severity errors per 10 KLoC. Industry consensus is that software will never be more reliable than 10^-5 per usage and 10^-9 per operating hour.
We even accept loss... Lost/misdirected luggage: chance of failure 10^-2 per suitcase. Airplane: chance of failure 10^-8 per flight hour. Storm surge barrier: chance of failure 10^-7 per usage. Nuclear power plant: As Low As Reasonably Practicable (ALARP).
The value of testing: “Program testing can be used to show the presence of bugs, but never to show their absence!” (Edsger W. Dijkstra)
PROCESS AND ORGANIZATION Who does what in safety critical software development?
IEC 61508: Safety Integrity Level and acceptable risk
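As a rough illustration of how a Safety Integrity Level relates to acceptable risk, the sketch below encodes the target failure-measure bands as they are commonly tabulated for IEC 61508: average probability of failure on demand for low-demand functions, and dangerous failures per hour for high-demand or continuous functions. The names `LOW_DEMAND_PFD`, `HIGH_DEMAND_PFH` and `sil_for` are hypothetical, and the standard itself remains the authority on the exact boundaries.

```python
# Target failure-measure bands per SIL, as commonly tabulated for IEC 61508.
# Each entry is (lower bound inclusive, upper bound exclusive).
LOW_DEMAND_PFD = {  # average probability of failure on demand
    1: (1e-2, 1e-1),
    2: (1e-3, 1e-2),
    3: (1e-4, 1e-3),
    4: (1e-5, 1e-4),
}
HIGH_DEMAND_PFH = {  # dangerous failures per hour
    1: (1e-6, 1e-5),
    2: (1e-7, 1e-6),
    3: (1e-8, 1e-7),
    4: (1e-9, 1e-8),
}


def sil_for(value, bands):
    """Return the SIL whose band contains value, or None if no band matches."""
    for sil, (low, high) in bands.items():
        if low <= value < high:
            return sil
    return None


if __name__ == "__main__":
    print(sil_for(5e-5, LOW_DEMAND_PFD))   # a demand-mode target inside the SIL 4 band
    print(sil_for(5e-9, HIGH_DEMAND_PFH))  # a per-hour target inside the SIL 4 band
```

A value outside every band returns `None`, which is a reminder that a target below the SIL 4 band cannot be claimed with a single safety function under this scheme.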
IEC 61508: Risk distribution
IEC 61508: A process for safety critical functions
SYSTEM DESIGN What do safety critical systems look like?
A short introduction to storm surge barriers…
Design principles: Keep it simple... Risk analysis drives design (decisions). Safety first (production later). Fail-to-safe. There shall be no single source of (catastrophic) failure.
A simple design of a storm surge barrier: Relay (€10.00/piece). Water detector (€17.50). Design documentation (sponsored by Heineken).
Risk analysis:
Relay failure. Chance: small. Cause: aging. Effect: catastrophic.
Water detector fails. Chance: huge. Causes: rust, driftwood, seagulls (eating, shitting). Effect: catastrophic.
Measurement errors. Chance: colossal. Causes: waves, wind. Effect: false positive.
Broken cable. Chance: medium. Causes: digging, seagulls. Effect: catastrophic.
System Architecture
Risk analysis
Typical risks identified: Components making the wrong decisions; Power failure; Hardware failure of PLCs/servers; Network failure; Ship hitting water sensors; Human maintenance error
Risk ≠ system crash: Wrongful functional behaviour; Data accuracy; Lack of response speed; Understandability of the GUI; Tolerance towards illogical inputs
Systems do misbehave...
Risks can be external as well
Eliminating risk isn’t the goal… No matter how good the environment analysis has been, some scenarios will be missed, and some scenarios are too expensive to prevent: accept the risk and communicate it to stakeholders.
Risk reality does change over time...
9/11... Really tested our “test abortion” procedure. Introduced a fundamentally new risk to ATC systems. Changed the ATC system dramatically. Doubled our test cases overnight.
Stuur X: Component architecture design
Stuur X: Functionality, initial global design [state diagram; states: Init, Wacht (wait), Start_D, W_O_D, Sluit_?; labels: water level < 3 meter, water level > 3 meter, “Start” signal to diesels, “Diesels ready”, “Close Barrier”]
Stuur X: Functionality, final global design
Stuur X: Functionality, Wait_For_Diesels, detailed design
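The speaker notes for these design slides point out a potential catastrophic deadlock: the controller can wait forever if the diesels never report ready. The diagrams themselves are not reproduced here, so the following is only a minimal sketch of one way to guard such a wait state with a timeout. The transition order, the `FAIL_SAFE` state, the timeout value and the behaviour at exactly 3 meters are all assumptions made for illustration; only the state names and the 3 meter threshold come from the slide.

```python
from enum import Enum, auto


class State(Enum):
    INIT = auto()
    WACHT = auto()      # waiting for high water
    START_D = auto()    # send "Start" signal to the diesels
    W_O_D = auto()      # wait for "Diesels ready"
    SLUIT = auto()      # send "Close Barrier"
    FAIL_SAFE = auto()  # hypothetical escalation state, not on the slide


CLOSE_LEVEL_M = 3.0       # threshold named on the slide
DIESEL_TIMEOUT_TICKS = 5  # assumed value, for illustration only


def step(state, water_level_m, diesels_ready, ticks_waiting):
    """Return (next_state, next_ticks_waiting) for one control cycle."""
    if state is State.INIT:
        return State.WACHT, 0
    if state is State.WACHT:
        # Assumption: the barrier only starts closing strictly above 3 m.
        if water_level_m > CLOSE_LEVEL_M:
            return State.START_D, 0
        return State.WACHT, 0
    if state is State.START_D:
        return State.W_O_D, 0
    if state is State.W_O_D:
        if diesels_ready:
            return State.SLUIT, 0
        # Without this guard the machine would stay in W_O_D forever.
        if ticks_waiting + 1 >= DIESEL_TIMEOUT_TICKS:
            return State.FAIL_SAFE, 0
        return State.W_O_D, ticks_waiting + 1
    return state, 0  # SLUIT and FAIL_SAFE are terminal in this sketch


def run(cycles, water_level_m, diesels_ready_after=None):
    """Drive the machine for a number of cycles and return the final state.

    diesels_ready_after is the cycle index from which the diesels report
    ready; None means they never do.
    """
    state, ticks = State.INIT, 0
    for cycle in range(cycles):
        ready = diesels_ready_after is not None and cycle >= diesels_ready_after
        state, ticks = step(state, water_level_m, ready, ticks)
    return state
```

With silent diesels, `run(20, 3.5)` ends in `FAIL_SAFE` rather than hanging in `W_O_D`. What the safe response should actually be once the timeout fires is a design decision that the risk analysis has to answer, not this sketch.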
VERIFICATION What is getting tested, and how?
The end is nigh...
Challenge: time and resource limitations. 64 bits of input isn’t that uncommon. 2^64 is the global rice production in 1000 years, measured in individual grains. Fully testing all binary inputs on a 64-bit stimulus-response system takes 2 centuries.
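To get a feel for the numbers, the short calculation below works out what test rate the "2 centuries" figure would imply. The slide does not state the assumed rate, so this only derives it from the two numbers given; the function name is hypothetical and a year is taken as 365.25 days.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def required_tests_per_second(input_bits, years):
    """Test rate needed to try every input combination within the given time."""
    return 2 ** input_bits / (years * SECONDS_PER_YEAR)


if __name__ == "__main__":
    rate = required_tests_per_second(64, 200)
    print(f"{rate:.2e} tests per second")
```

Under these assumptions the rate comes out at roughly 2.9 billion tests per second, sustained for 200 years. Every extra input bit doubles it, which is why exhaustive testing is not an option and the test set has to be chosen by other means.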
Goals of testing safety critical systems: Verify correct functional safety behaviour. Verify safety behaviour during degraded and failure conditions.
An example of safety critical components
IEC 61508 SIL4: Required verification activities
Design validation and verification: Peer reviews by system architect, 2nd designer, programmers, test manager (system testing). Fault Tree Analysis / Failure Mode and Effect Analysis. Performance modeling. Static verification / dynamic simulation (by Twente University).
Programming (in C/C++): Coding standard based on “Safer C” by Les Hatton. May only use safe subset of the compiler. Verified by Lint and 5 other tools. Code is peer reviewed by a 2nd developer. Certified and calibrated compiler.
Unit tests. Focus: conformance to specifications. Required coverage: 100% with respect to code paths, input equivalence classes, boundary value analysis, and probabilistic testing. Execution: fully automated scripts, running 24x7; creates 100 Mb/hour of logs and measurement data. Upon bug detection: three strikes and you are out (after 3 implementation errors the unit is built by another developer); two strikes and you are out (the need for a 2nd rebuild implies a redesign by another designer).
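Equivalence classes and boundary value analysis can be illustrated with the 3 meter water level threshold from the design slides. The sketch below is not the project's test suite: `must_close` is a hypothetical decision function, and the choice that exactly 3.0 m does not trigger closing is an assumption, since the slide only names "< 3 meter" and "> 3 meter".

```python
CLOSE_LEVEL_M = 3.0  # threshold taken from the design slides


def must_close(water_level_m):
    """Hypothetical decision function: close strictly above the threshold."""
    return water_level_m > CLOSE_LEVEL_M


def boundary_cases(threshold, step=0.01):
    """Boundary value analysis: just below, on, and just above the threshold."""
    return [
        (threshold - step, False),
        (threshold, False),  # assumed: the threshold itself does not close
        (threshold + step, True),
    ]


def equivalence_cases():
    """One representative input per equivalence class."""
    return [
        (0.5, False),  # class: clearly below the threshold
        (7.0, True),   # class: clearly above the threshold
    ]


def failing_cases(decide, cases):
    """Return every (input, expected) pair that the decision function gets wrong."""
    return [(value, expected) for value, expected in cases if decide(value) != expected]


if __name__ == "__main__":
    cases = boundary_cases(CLOSE_LEVEL_M) + equivalence_cases()
    print(failing_cases(must_close, cases))
    # A typical off-by-one variant (>= instead of >) is caught only on the boundary.
    print(failing_cases(lambda level: level >= CLOSE_LEVEL_M, cases))
```

The equivalence-class representatives alone would pass both variants; only the on-threshold case separates them, which is the reason boundary values are listed as a separate coverage requirement.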
Representative testing is difficult
Integration testing. Focus: functional behaviour of the chain of components; failure scenarios based on risk analysis. Required coverage: 100% coverage on input classes; probabilistic testing. Execution: fully automated scripts, running 24x7, at 10 times real-time speed; creates 250 Mb/hour of logs and measurement data. Upon detection: each bug gets a root-cause analysis.
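The slides do not say which statistical method lies behind "probabilistic testing". One common reading, shown here purely as an assumption, is to ask how many independent, representative, failure-free test runs are needed before a failure probability per demand can be claimed with a given confidence. For a claimed probability p and confidence C, the number of runs n must satisfy (1 - p)^n <= 1 - C. The function name is hypothetical.

```python
import math


def failure_free_runs_needed(failure_probability, confidence):
    """Smallest n with (1 - p)**n <= 1 - confidence, assuming independent runs."""
    # log1p keeps precision when the failure probability is very small.
    n = math.log(1.0 - confidence) / math.log1p(-failure_probability)
    return math.ceil(n)


if __name__ == "__main__":
    # 10^-7 per usage is the storm surge barrier figure quoted earlier in the deck.
    print(failure_free_runs_needed(1e-7, 0.99))
```

For 10^-7 per usage at 99% confidence this comes out at about 46 million failure-free runs, which gives some intuition for why the test scripts run around the clock and faster than real time.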
Redundancy is a nasty beast: You do get functional behaviour of your entire system. It is nearly impossible to see if all your components are working correctly.
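A small sketch makes the problem concrete. It assumes a majority voter over an odd number of redundant boolean channels, which is an illustrative architecture rather than the one used in the project; the function name `vote` is hypothetical. The voted output is identical whether or not one channel has failed, so a test that only observes the system output cannot tell the two situations apart.

```python
def vote(readings):
    """Majority vote over an odd number of redundant boolean channels.

    Returns (decision, dissenters), where dissenters lists the indices of the
    channels that disagree with the majority.
    """
    if len(readings) % 2 == 0:
        raise ValueError("an odd number of channels is required for a majority")
    decision = sum(readings) * 2 > len(readings)
    dissenters = [i for i, reading in enumerate(readings) if reading != decision]
    return decision, dissenters


if __name__ == "__main__":
    healthy = vote([True, True, True])
    degraded = vote([True, True, False])
    print(healthy[0] == degraded[0])  # same system-level behaviour
    print(healthy[1], degraded[1])    # only the dissenter list shows the failure
```

Exposing the dissenting channels as test and monitoring output is one way to make masked failures visible; whether that is feasible depends on the actual redundancy mechanism.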
System testing. Focus: functional behaviour; failure scenarios based on risk analysis. Required coverage: 100% complete environment (simulation); 100% coverage on input classes. Execution: fully automated scripts, running 24x7, at 10 times real-time speed; creates 250 Mb/hour of logs and measurement data. Upon detection: each bug gets a root-cause analysis.
Acceptance testing. Functional acceptance. Failure behaviour: all top 50 (FMECA) risks tested. A year of operational verification. Execution: tests performed on a working storm surge barrier; creates 250 Mb/hour of logs and measurement data. Upon detection: each bug gets a root-cause analysis.
GUI acceptance testing. Looking for quality in use for interactive systems: understandability of the GUI; structural investigation of the performance of the system-human interactions. Looking for “abuse” by the users. Looking at real-life handling of emergency operations.
Avalanche testing. To test the capabilities of alarming. Usually starts with one simple trigger. Generally followed by millions of alarms. Generally brings your network and systems to the breaking point.
Crash and recovery procedure testing. Validation of system behaviour after massive crash and restart. Usually identifies many issues about emergency procedures. Sometimes identifies issues around power supply. Usually identifies some (combination of) systems incapable of unattended recovery...
Testing safety critical functions is dangerous...
A risk analysis of testing. There should always be a way out of a test procedure. Some things are too dangerous to test. Some tests introduce more risks than they try to mitigate.
Root-cause analysis. A painful process, by design. Is extremely thorough. Assumes that the error found is a symptom of an underlying collection of (process) flaws. Searches for the underlying causes of the error, and looks for possible similar errors that might have followed a similar path.
Failed gates of a potential deadlock
TRENDS What is the newest and hottest?
Model Driven Design
A real-life example
A root-cause analysis of this flaw
REALITY What are the real-life challenges of a test manager of safety critical systems?
Testing in reality
It requires a specific breed of people. The fates of developers and testers are linked to safety critical systems into eternity.
Conclusions: Stop reading newspapers. Safety critical testing is a lot of work, making sure nothing happens. Technically it isn’t that much different; we’re just more rigorous and use a specific breed of people....
Safeguarding life, property and the environment. www.dnv.com

2011-05-02 - VU Amsterdam - Testing safety critical systems


Editor's Notes

  • #3: Copyright CIBIT Adviseurs|Opleiders 2005. Jaap van Ekris, safety critical systems. Fields of work: nuclear power plants, air traffic control, storm surge barriers. Errors cost many human lives.
  • #4: Glenn's advantage was that it only had to work once... And that was the 1960s (when that was still possible), and astronauts still had guts. Source: http://www.historicwings.com/features98/mercury/seven-left-bottom.html
  • #5: When I started my career, my mentor told me: “From now on, your goal is to stay off the front page of the newspapers.” I can tell you it is hard, but so far I’ve succeeded ... ALMOST: T5
  • #8: But we still live (unknowingly) in that world...
  • #9: Please note that these failure rates include electromechanical failure as well! Electrocution by a light switch: chance of 10^-5 per usage.
  • #10: Glenn's advantage was that it only had to work once... Source: http://www.historicwings.com/features98/mercury/seven-left-bottom.html
  • #14: FTA and FMEA are opposites, and good checks on each other (NASA). Although NASA does not have a flawless track record...
  • #16: Goal: allowed only once every 10,000 years.
  • #21: You start with your primary concern. The process is simple: you break your problem down until you have those 2 million parts and you know the contribution of each component. You take the most important 10, or 100, and take targeted measures.
  • #25: Tickles security: hard on the outside, butter-soft on the inside.
  • #31: Once we take deadlocks and redundancy into account, our picture looks like this: not really simple anymore...
  • #32: There is a bug in this one: this code is NOT fail-safe because it has a potential catastrophic deadlock (when the diesels don’t report Ready)...
  • #34: Please be reminded: the presented code has a deadlock!
  • #39: Do you know the difference between validation and verification? Validation = meets external expectations, does what it is supposed to do. Verification = meets internal expectations, conforming to specs.
  • #47: Funny example: printing a screen to a PostScript printer using colossal bitmaps instead of small vector drawings...
  • #49: Most beautiful example: UPSes using too much power to charge, killing all fuses... Current example: found out that the identity management server was a single point of failure...
  • #56: This is functional nonsense: DirMsgResponse is sent to the output, no matter what.
  • #60: Dijkstra put mathematicians in the line of ships, just to remind them of the danger: a practice still used by Boeing and Airbus (maiden flight). Testers, as John Glenn actually was, put their lives on the line each and every time. At Eurocontrol, each bug had a body count attached to it... When a system fails in production, it is actual blood on our hands. I lose about a colleague a year. Quit when you think it is routine...