08 - Installations and Maintenance of Health IT Systems - Unit 9 - Creating Fault-Tolerant Systems, Backups, and Decommissioning - Lecture A
Health IT Systems
Creating Fault-Tolerant Systems,
Backups, and Decommissioning
Lecture a
This material (Comp8_Unit9a) was developed by Duke University, funded by the Department of Health and Human Services,
Office of the National Coordinator for Health Information Technology under Award Number IU24OC000024.
Learning Objectives
1. Define availability, reliability, redundancy, and fault
tolerance (Lecture a)
2. Explain areas and outline rules for implementing
fault-tolerant systems (Lecture a)
3. Perform risk assessment (Lecture a)
4. Follow best practice guidelines for common
implementations (Lecture b)
5. Develop strategies for backup and restore of
operating systems, applications, configuration
settings, and databases (Lecture c)
6. Decommission systems and data (Lecture c)
Health IT Workforce Curriculum
Version 3.0/Spring 2012
Installation and Maintenance of Health IT Systems
Creating Fault-Tolerant Systems, Backups, and Decommissioning
Lecture a
Redundancy and Fault Tolerance
Dependence on EHRs is increasing.
EHR systems require redundant, or failover, resources
and fault tolerance to ensure uptime and data integrity so
that they can perform as specified.
Failure vs. fault: a failure is the system's inability to
comply with its specifications or precise requirements;
a fault is the cause of that failure.
Fault tolerance is resilience in a system, or the ability to
continue performing to specification despite problems.
Ask the vendor how fault tolerance is designed and coded into
the EHR application.
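To make "redundant, or failover, resources" concrete, here is a minimal Python sketch of client-side failover: a request is tried against a primary resource and, if that raises a connection-type error, against each secondary in turn. The server names and the `fetch_record` function are hypothetical stand-ins, not part of any EHR vendor's API, and treating only `OSError` as a failover trigger is an assumption of this sketch.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class AllResourcesFailed(Exception):
    """Raised when the primary and every secondary resource have failed."""


def with_failover(resources: Sequence[str], call: Callable[[str], T]) -> T:
    """Try `call` against each resource in order and return the first success.

    `resources` lists the primary first, followed by its secondaries.
    Only OSError (which includes ConnectionError and TimeoutError) is treated
    as a fault that triggers failover; other exceptions propagate unchanged.
    """
    errors: List[str] = []
    for resource in resources:
        try:
            return call(resource)
        except OSError as exc:
            errors.append(f"{resource}: {exc}")  # record the fault, try the next one
    raise AllResourcesFailed("; ".join(errors))


# Hypothetical stand-in for a real EHR query: the primary server is down.
def fetch_record(server: str) -> str:
    if server == "ehr-primary":
        raise ConnectionError("primary unreachable")
    return f"record served by {server}"


print(with_failover(["ehr-primary", "ehr-secondary"], fetch_record))
```

Here the call fails over and returns the secondary's answer. In a real deployment the vendor's application, a load balancer, or a cluster manager usually performs this switch, which is why the slide advises asking the vendor how it is built in.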
Creating Fault Tolerance
Redundancy: secondary or backup systems
Reliability: infrequent failure; redundant components
Availability: accessible when needed, with no downtime;
available systems are both reliable and accessible
Computer hardware: servers and workstations
Data storage: hard disks
Network and power: network switches and Internet access;
mains, generators, and batteries
Virtualization: isolation of the system from the hardware
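The link between redundancy, reliability, and availability can be shown with a little arithmetic. This sketch uses the common steady-state definition of availability in terms of mean time between failures (MTBF) and mean time to repair (MTTR), terms not used on the slide, and assumes redundant copies fail independently and fail over instantly, so the system is down only when every copy is down. The MTBF and MTTR figures are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a component is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components where any one suffices.

    Assumes independent failures and instantaneous failover, so the system
    is unavailable only when all copies are down at the same time.
    """
    return 1 - (1 - component_availability) ** copies


# Hypothetical component: fails on average every 990 h, takes 10 h to repair.
single = availability(mtbf_hours=990, mttr_hours=10)  # 0.99
for n in (1, 2, 3):
    print(n, round(parallel_availability(single, n), 6))
```

Under these assumptions each added copy multiplies the downtime fraction by 0.01 (0.99, 0.9999, 0.999999). Real redundant components often share failure causes (the same power feed, the same software bug), so actual gains are usually smaller.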
System Failure and Downtime
Forrester Consulting report on server failures during the prior two years:
experienced downtime.
Only 1% of server outages were resolved within five
minutes.
68% had an impact on clinical activities.
More than 50% affected administrative processes.
How much downtime is acceptable?
Answering this requires a good understanding of business processes.
Critical system downtime can have a significant negative
impact on patient health.
(Forrester Consulting Report, 2010)
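One way to frame "How much downtime is acceptable?" is as an availability target that translates into an annual downtime budget. This minimal sketch assumes a 365-day year (8,760 hours) and uses example targets that do not come from the Forrester report; a 99.9% target, for instance, allows about 8.76 hours of downtime per year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored for simplicity


def allowed_downtime_hours(availability_target: float) -> float:
    """Annual downtime permitted by an availability target such as 0.999."""
    if not 0.0 <= availability_target <= 1.0:
        raise ValueError("availability_target must be between 0 and 1")
    return (1.0 - availability_target) * HOURS_PER_YEAR


# Example targets only; the right value depends on the business processes served.
for target in (0.99, 0.999, 0.9999):
    hours = allowed_downtime_hours(target)
    print(f"{target:.2%} uptime -> {hours:.2f} h/yr ({hours * 60:.0f} min)")
```

Each additional "nine" cuts the budget by a factor of ten, which is why the target should follow from how long clinical and administrative processes can actually tolerate an outage.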
Three Areas for Fault Tolerance
1. Hardware fault tolerance: compensates for hardware failure
Often simplest to implement
Extra hardware resources as secondaries or backups
E.g., secondary network cards, error checking and correcting (ECC)
memory, redundant power supplies, redundant disks / file storage
2. Software fault tolerance: compensates for poor programming or poor data
Involves program verification (code review) and assertion checking
Compensating for faults such as poorly formatted input data
E.g., sanity check, double-entry comparison, and multiple-version
programs
3. System fault tolerance: compensates for non-computer or inter-device
failures
Most complex; involves the highest number of variables
System may include facilities that are not computer-based
E.g., detection of sensor failure, graceful reaction to intersystem
communication failure, graceful shutdown in unexpected circumstances
(A Conceptual Framework for System Fault Tolerance - 1.1 What is a System?, 1995)
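The three software fault-tolerance techniques named above (sanity check, double-entry comparison, and multiple-version programs) can each be sketched in a few lines of Python. Everything here is illustrative: the heart-rate limits, the record-number format, and the three "versions" of a weight-based calculation are hypothetical, and real multiple-version programming uses versions written by independent teams, not three small functions.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def sanity_check_heart_rate(bpm: int) -> bool:
    """Sanity check: reject values outside a plausible range.

    The 20-300 bpm limits are hypothetical and for illustration only.
    """
    return 20 <= bpm <= 300


def double_entry_matches(first: str, second: str) -> bool:
    """Double-entry comparison: accept a value only if two entries agree.

    Leading/trailing whitespace is ignored; everything else must match exactly.
    """
    return first.strip() == second.strip()


def majority_vote(versions: Sequence[Callable[[float], float]],
                  x: float) -> Optional[float]:
    """Multiple-version programming: run each independently written version
    and return the result a strict majority agrees on, or None if none does."""
    if not versions:
        raise ValueError("at least one version is required")
    value, count = Counter(v(x) for v in versions).most_common(1)[0]
    return value if count > len(versions) // 2 else None


# Three hypothetical versions of "5 units per kg of body weight";
# version_c contains a deliberate bug that the other two outvote.
def version_a(weight_kg: float) -> float:
    return 5.0 * weight_kg


def version_b(weight_kg: float) -> float:
    return weight_kg * 5.0


def version_c(weight_kg: float) -> float:
    return 5.0 + weight_kg  # bug: adds instead of multiplying


print(sanity_check_heart_rate(72), sanity_check_heart_rate(720))
print(double_entry_matches("MRN-1234", "MRN-1234 "))
print(majority_vote([version_a, version_b, version_c], 70.0))
```

The vote compares floating-point results for exact equality, which works for this example; a production voter would usually compare within a tolerance and log the dissenting version so the fault can be fixed.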
Six Rules of Fault Tolerance
In A Conceptual Framework for System Fault Tolerance,
the Center for High Integrity Software Systems Assurance
summarizes six rules:
1. Know precisely what the system is supposed to do.
2. Look at what can go wrong.
3. Study your application & determine appropriate fault
containment regions & earliest feasible time to deal with
potential faults.
4. Completely understand application requirements & use
them to make appropriate time/space trade-offs.
5. Concentrate on credible faults first.
6. Determine application failure margins.
(A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together, 1995)
Six Rules of Fault Tolerance
(cont'd)
Rule 1: Know precisely what the system is
supposed to do.
How long can the system be allowed to deviate from
its specifications before being declared a failure?
What abnormal conditions must be
accommodated?
Rule 2: Look at what can go wrong.
Group causes into classes.
Define fault floor.
(A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together, 1995)
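Rule 1's question about how long the system may deviate from specification can be turned into a concrete monitoring rule. The sketch below is a hypothetical deviation monitor, not something described in the framework: it reports "degraded" while out-of-spec behavior is within an allowed window and "failure" once the deviation lasts longer than that window. The 30-second limit and the observation times are made-up example values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviationMonitor:
    """Declares failure only when out-of-spec behavior lasts too long.

    `max_deviation_s` encodes Rule 1's "how long can the system deviate
    from specification?"; the value used below is a hypothetical example.
    """
    max_deviation_s: float
    deviation_started: Optional[float] = None

    def observe(self, now_s: float, within_spec: bool) -> str:
        if within_spec:
            self.deviation_started = None  # back to normal; reset the clock
            return "ok"
        if self.deviation_started is None:
            self.deviation_started = now_s  # first out-of-spec observation
        if now_s - self.deviation_started > self.max_deviation_s:
            return "failure"
        return "degraded"


# Hypothetical spec: response time may exceed its target for at most 30 s.
monitor = DeviationMonitor(max_deviation_s=30)
for t, ok in [(0, True), (10, False), (25, False), (45, False), (50, True)]:
    print(t, monitor.observe(t, ok))
```

With these inputs the monitor reports ok, degraded, degraded, failure, ok: a short slowdown is tolerated as an accommodated abnormal condition, while a sustained one is declared a failure.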
Six Rules of Fault Tolerance
(cont'd)
Rule 3: Study your application & determine
appropriate fault containment regions & earliest
feasible time to deal with potential faults.
Fault tolerance generally requires more resources
(time and space).
Rule 4: Completely understand application
requirements & use them to make appropriate
time/space trade-offs.
Consider costs and classify faults by likelihood.
(A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together, 1995)
Six Rules of Fault Tolerance
(cont'd)
Rule 5: Concentrate on credible faults first.
Ignore less likely faults unless tolerating them requires little
additional cost. Mitigate the most likely faults first.
Rule 6: Determine application failure margins.
Balance the degree of fault tolerance needed with
the cost of implementation.
Does a small expenditure now save a great deal
later?
(A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together, 1995)
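Rules 5 and 6 can be read together as a simple screening procedure. This sketch is one possible interpretation, not the framework's own method: keep faults that are credible (likelihood at or above a threshold) or cheap to handle, order them most likely first, and fund a mitigation only when the expected annual loss it avoids exceeds its cost. It assumes a mitigation removes the loss entirely, and every fault, rate, and dollar figure is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fault:
    name: str
    likelihood_per_year: float  # estimated occurrences per year
    loss_per_event: float       # estimated cost of one occurrence
    mitigation_cost: float      # estimated annual cost of the countermeasure


def plan_mitigations(faults: List[Fault], credible_threshold: float,
                     cheap_cost: float) -> List[str]:
    """Rule 5: keep credible faults plus cheap-to-handle ones, most likely first.
    Rule 6: fund a mitigation only if the expected loss it avoids exceeds its
    cost (assumes the mitigation eliminates that loss)."""
    candidates = [f for f in faults
                  if f.likelihood_per_year >= credible_threshold
                  or f.mitigation_cost <= cheap_cost]
    candidates.sort(key=lambda f: f.likelihood_per_year, reverse=True)
    return [f.name for f in candidates
            if f.likelihood_per_year * f.loss_per_event > f.mitigation_cost]


# Hypothetical estimates for illustration only.
faults = [
    Fault("power outage", 2.0, 20_000, 5_000),
    Fault("disk failure", 0.5, 30_000, 2_000),
    Fault("server room fire", 0.2, 100_000, 30_000),
    Fault("data center flood", 0.01, 1_000_000, 50_000),
    Fault("loose network cable", 0.05, 2_000, 50),
]
print(plan_mitigations(faults, credible_threshold=0.1, cheap_cost=100))
```

With these numbers the flood is screened out as not credible and not cheap, the fire is credible but its expected loss (20,000) is below the mitigation cost (30,000), and the unlikely but very cheap cable fix is kept, echoing "Does a small expenditure now save a great deal later?"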
Risk Assessment
Identify what is to be protected
Examples: EHR server or clinical record
Include a rating of importance
Types of loss or liability
Identify risks to each component
Examples: power failure or record alteration
Risk = Threat × Probability × Impact
Threat types: intentional or accidental, human or system, internal or external
Identify mitigation strategies for each risk
Examples: UPS with power monitoring or automatic backup
Policies (for people) or Controls (for systems or equipment)
(Benson, n.d.; Maniscalchi, 2009)
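The three risk-assessment steps can be recorded in a small risk register that applies the slide's formula, Risk = Threat × Probability × Impact. This is a minimal sketch under stated assumptions: the 1-to-5 rating scales, the tie-break on asset importance, and the individual ratings are hypothetical choices; only the example assets, risks, and mitigations come from the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RiskItem:
    asset: str        # step 1: what is to be protected
    importance: int   # rating of importance, 1 (low) to 5 (high); assumed scale
    risk: str         # step 2: what can go wrong
    threat: int       # threat rating, 1-5 (assumed scale)
    probability: int  # probability rating, 1-5 (assumed scale)
    impact: int       # impact rating, 1-5 (assumed scale)
    mitigation: str   # step 3: policy (people) or control (systems/equipment)

    @property
    def score(self) -> int:
        # Risk = Threat x Probability x Impact
        return self.threat * self.probability * self.impact


def rank_risks(items: List[RiskItem]) -> List[Tuple[int, str, str, str]]:
    """Highest risk score first; ties broken by asset importance."""
    ordered = sorted(items, key=lambda r: (r.score, r.importance), reverse=True)
    return [(r.score, r.asset, r.risk, r.mitigation) for r in ordered]


# Hypothetical ratings for the slide's two examples.
register = [
    RiskItem("EHR server", 5, "power failure", 3, 3, 5,
             "UPS with power monitoring (control)"),
    RiskItem("clinical record", 5, "record alteration", 4, 2, 4,
             "automatic backup (control)"),
]
for row in rank_risks(register):
    print(row)
```

With these ratings the power-failure risk scores 45 and the record-alteration risk scores 32, so the UPS would be addressed first. Multiplying ordinal ratings is a rough prioritization aid, not a precise measure of risk.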
Summary: Lecture a
Fault tolerance is the ability to keep running despite problems
Implemented using redundancy to increase
reliability and provide availability
Three areas: hardware, software, and system
Risk assessment identifies assets, risks, and
mitigation strategies
References: Lecture a
Benson, C. (n.d.). Security Planning. Available from: http://technet.microsoft.com/en-us/library/cc723503.aspx
Maniscalchi, J. (2009, June). Threat vs. Vulnerability vs. Risk. Available from: http://www.digitalthreat.net/2009/06/threat-vs-vulnerability-vs-risk/
A Conceptual Framework for System Fault Tolerance - 1.1 What is a System? (1995, March 30). Retrieved from National Institute of Standards and Technology website: http://hissa.nist.gov/chissa/SEI_Framework/framework_3.html
A Conceptual Framework for System Fault Tolerance - 5 Putting It All Together. (1995, March 30). Retrieved from National Institute of Standards and Technology website: http://hissa.nist.gov/chissa/SEI_Framework/framework_20.html
Server Availability Trends In The Time Of Electronic Health Records. (2010, January). Forrester Research, Inc. Available from: http://www.himss.org/content/files/Stratus%20Tech%20-%20ServerAvailabilityTrends_EHR_ForresterPaper.pdf
Acknowledgement: The following reference generally informed the unit:
Shackhow, T., et al. (2008, June). EHR Meltdown: How to Protect Your Patient Data. Fam Pract Manag, 15(6), A3-A8. Available from: http://www.aafp.org/fpm/2008/0600/pa3.html