Unit 11: Dependability and Security
This document discusses the importance of dependability in systems and defines key concepts. It notes that system failures can affect many users and lead to rejection, high costs, and data loss. Dependability is defined as the degree of confidence users have that a system will operate as expected without failure. The four main dimensions of dependability are availability, reliability, safety, and security. Reliability is the probability a system delivers services correctly over time, while availability is the probability a system can deliver services upon request. Ensuring dependability requires avoiding errors, effective testing, and fault tolerance mechanisms.
Why Dependability Is Important
◦ System failures affect a large number of people.
◦ Users often reject systems that are unreliable, unsafe, or insecure.
◦ System failure costs may be enormous.
◦ Undependable systems may cause information loss.
Keep the following sources of failure in mind while developing a dependable system:
◦ Hardware failure
◦ Software failure
◦ Operational failure

Dependability Properties
The dependability of a computer system is a property of the system that reflects its trustworthiness. Trustworthiness here essentially means the degree of confidence a user has that the system will operate as they expect and that the system will not 'fail' in normal use. Dependability cannot be meaningfully expressed as a single number. Programs running on computers may not operate as expected and may occasionally corrupt the data that is managed by the system.

Principal Dimensions of Dependability
There are four principal dimensions of dependability:
Availability: Informally, the availability of a system is the probability that it will be up and running and able to deliver useful services to users at any given time.
Reliability: Informally, the reliability of a system is the probability, over a given period of time, that the system will correctly deliver services as expected by the user.
Safety: Informally, the safety of a system is a judgment of how likely it is that the system will cause damage to people or its environment.
Security: Informally, the security of a system is a judgment of how likely it is that the system can resist accidental or deliberate intrusions.

Other Dependability Properties
Repairability: System failures are inevitable, but the disruption caused by failure can be minimized if the system can be repaired quickly.
Maintainability: As systems are used, new requirements emerge, and it is important to maintain the usefulness of a system by changing it to accommodate these new requirements.
Survivability: A very important attribute for Internet-based systems is survivability, the ability of a system to continue to deliver service while under attack and, potentially, while part of the system is disabled.
Error tolerance: This property can be considered as part of usability and reflects the extent to which the system has been designed so that user input errors are avoided and tolerated.

Ensure the Following When Developing Dependable Software
◦ You avoid the introduction of accidental errors into the system during software specification and development.
◦ You design verification and validation processes that are effective in discovering residual errors that affect the dependability of the system.
◦ You design protection mechanisms that guard against external attacks that can compromise the availability or security of the system.
◦ You configure the deployed system and its supporting software correctly for its operating environment.

Availability and Reliability
System availability and reliability are closely related properties that can both be expressed as numerical probabilities. The availability of a system is the probability that the system will be up and running to deliver services to users on request. The reliability of a system is the probability that the system's services will be delivered as defined in the system specification. Reliability and availability are closely related, but sometimes one is more important than the other. If users expect continuous service from a system, then the system has a high availability requirement.
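Availability can be illustrated numerically. The sketch below is not from the original notes; it uses one common formulation, steady-state availability = MTBF / (MTBF + MTTR), and the MTBF and MTTR figures are invented purely for the example.

```python
# Minimal sketch (assumed example, not from the notes):
# availability expressed as a numerical probability.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A system that fails on average every 1,000 hours and takes 2 hours
# to repair is available about 99.8% of the time.
print(f"Availability: {availability(1000, 2):.4f}")  # 0.9980
```

This also shows why availability depends on repair time as well as failure frequency: halving the repair time improves availability even if the failure rate is unchanged.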
Availability and Reliability (continued)
The definition of reliability states that the environment in which the system is used, and the purpose for which it is used, are taken into account. If you measure system reliability in one environment, you cannot assume that the reliability will be the same if the system is used in a different way.
Reliability: The probability of failure-free operation over a specified time, in a given environment, for a specific purpose.
Availability: The probability that a system, at a point in time, will be operational and able to deliver the requested services.
A strict definition of reliability relates the system implementation to its specification. Availability and reliability are obviously linked, as system failures may crash the system. Availability does not just depend on the number of system crashes, but also on the time needed to repair the faults that have caused the failure. System reliability and availability problems are mostly caused by system failures. Some of these failures are a consequence of specification errors or of failures in other related systems, such as a communications system.

Reliability Terminology
Human error or mistake: Human behavior that results in the introduction of faults into a system.
System fault: A characteristic of a software system that can lead to a system error. For example, in a weather station that transmits its readings every hour, the fault is the inclusion of code that adds 1 hour to the time of the last transmission without checking whether the time is greater than or equal to 23.00 (see the code sketch below).
System error: An erroneous system state that can lead to system behavior that is unexpected by system users.
System failure: An event that occurs at some point in time when the system does not deliver a service as expected by its users. For example, no weather data is transmitted because the computed transmission time is invalid.

System Reliability and Availability
When an input or a sequence of inputs causes faulty code in a system to be executed, an erroneous state is created that may lead to a software failure. Most inputs do not lead to system failure. However, some inputs or input combinations (the set Ie) cause system failures or erroneous outputs to be generated. If inputs in the set Ie are executed by frequently used parts of the system, then failures will be frequent. However, if the inputs in Ie are executed by code that is rarely used, then users will hardly ever see failures.

System Error and System Failure
A system fault does not necessarily result in a system error, and a system error does not necessarily lead to a system failure, for several reasons:
◦ Not all code in a program is executed. The code that includes a fault (e.g., the failure to initialize a variable) may never be executed because of the way that the software is used.
◦ Errors are transient. A state variable may have an incorrect value caused by the execution of faulty code. However, before this is accessed and causes a system failure, some other system input may be processed that resets the state to a valid value.
◦ The system may include fault detection and protection mechanisms. These ensure that the erroneous behavior is discovered and corrected before the system services are affected.
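To make the fault/error/failure distinction concrete, here is a minimal, hypothetical sketch of the clock-update fault described above. The function names and the integer-hour representation are assumptions for illustration, not part of the original example.

```python
# Hypothetical sketch of the clock-update fault described above.
# Times are modelled as whole hours 0..23 on a 24-hour clock;
# all names are invented for illustration.

def next_transmission_time_faulty(last_hour: int) -> int:
    # Fault: adds 1 hour without checking whether the time is >= 23,
    # so 23 becomes 24, an invalid time (an erroneous system state).
    return last_hour + 1

def next_transmission_time_fixed(last_hour: int) -> int:
    # Fix: wrap around to 0 after 23, so the time is always valid.
    return (last_hour + 1) % 24

print(next_transmission_time_faulty(23))  # 24 -> invalid, may lead to a failure
print(next_transmission_time_fixed(23))   # 0  -> valid
```

Note that the faulty version only produces an invalid time when the last transmission was at 23.00, which is why such a fault can lie dormant: a failure only becomes visible if that erroneous state then prevents data from being transmitted.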
Complementary Approaches to Improving Reliability
Fault avoidance: Development techniques are used that minimize the possibility of human errors and/or trap mistakes before they result in the introduction of system faults. Examples of such techniques include avoiding error-prone programming language constructs, such as pointers, and the use of static analysis to detect program anomalies.
Fault detection and removal: The use of verification and validation techniques that increase the chances that faults will be detected and removed before the system is used. Systematic testing and debugging is an example of a fault detection technique.
Fault tolerance: Techniques that ensure that faults in a system do not result in system errors, or that system errors do not result in system failures.

Safety
Safety-critical systems are systems where it is essential that system operation is always safe; that is, the system should never damage people or the system's environment, even if the system fails. Examples of safety-critical systems include control and monitoring systems in aircraft, process control systems in chemical and pharmaceutical plants, and automobile control systems. Hardware control of safety-critical systems is simpler to implement and analyze than software control, but we now build systems of such complexity that they cannot be controlled by hardware alone. Software control is essential because of the need to manage large numbers of sensors and actuators with complex control laws.

Two Classes of Safety-Critical Software
Primary safety-critical software: Software that is embedded as a controller in a system. Malfunctioning of such software can cause a hardware malfunction, which results in human injury or environmental damage.
Secondary safety-critical software: Software that can indirectly result in an injury. An example of such software is a computer-aided engineering design system whose malfunctioning might result in a design fault in the object being designed. This fault may cause injury to people if the designed system malfunctions.

Why Not All Reliable Systems Are Safe
◦ We can never be 100% certain that a software system is fault-free and fault tolerant. Undetected faults can be dormant for a long time, and software failures can occur after many years of reliable operation.
◦ The specification may be incomplete, in that it does not describe the required behavior of the system in some critical situations.
◦ Hardware malfunctions may cause the system to behave in an unpredictable way and present the software with an unanticipated environment.
◦ The system operators may generate inputs that are not individually incorrect but which, in some situations, can lead to a system malfunction.

Safety Terminology
Accident: An unplanned event or sequence of events which results in human death or injury, or damage to property or the environment.
Hazard: A condition with the potential for causing or contributing to an accident.
Damage: A measure of the loss resulting from a mishap. Damage can range from many people being killed as a result of an accident to minor injury or property damage.
Hazard severity: An assessment of the worst possible damage that could result from a particular hazard.
Hazard probability: The probability of the events occurring which create a hazard. Probability values tend to be arbitrary but range from 'probable' (say, a 1/100 chance of the hazard occurring) to 'implausible' (no conceivable situations are likely in which the hazard could occur).
Risk: A measure of the probability that the system will cause an accident.

Ways of Assuring Safety
Hazard avoidance: The system is designed so that hazards are avoided. For example, a cutting system that requires the operator to use two hands to press separate buttons simultaneously avoids the hazard of the operator's hands being in the blade pathway (see the sketch below).
Hazard detection and removal: The system is designed so that hazards are detected and removed before they result in an accident. For example, a chemical plant system may detect excessive pressure and open a relief valve to reduce the pressure before an explosion occurs.
Damage limitation: The system may include protection features that minimize the damage that may result from an accident. For example, an aircraft engine normally includes automatic fire extinguishers. If a fire occurs, it can often be controlled before it poses a threat to the aircraft.
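As a hedged sketch of the two-button hazard-avoidance interlock described above (all names are invented for illustration):

```python
# Hypothetical sketch of the hazard-avoidance example above: a cutting
# machine only activates the blade while two separate buttons, one for
# each hand, are pressed at the same time.

def blade_may_run(left_button_pressed: bool, right_button_pressed: bool) -> bool:
    # The blade can only run when both hands are on the buttons,
    # so neither hand can be in the blade pathway.
    return left_button_pressed and right_button_pressed

assert blade_may_run(True, True) is True
assert blade_may_run(True, False) is False   # one hand free -> hazard avoided
```

The design choice here is that the safe state (blade stopped) is the default, and activation requires positive evidence that both hands are away from the blade.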
Security
Security reflects the ability of a system to protect itself against external attacks, which may be accidental or deliberate. Security failures may lead to loss of availability, damage to the system or its data, or the leakage of information to unauthorized people. If you really want a secure system, it is best not to connect it to the Internet. Military systems, systems for electronic commerce, and systems that involve the processing and interchange of confidential information must be designed so that they achieve a high level of security.

Security Terminology
The terms below are illustrated with examples from a clinic's patient record system:
Asset: The records of each patient that is receiving or has received treatment.
Exposure: Potential financial loss from future patients who do not seek treatment because they do not trust the clinic to maintain their data; financial loss from legal action by the sports star; loss of reputation.
Vulnerability: A weak password system which makes it easy for users to set guessable passwords, or user IDs that are the same as names.
Attack: An impersonation of an authorized user.
Threat: An unauthorized user will gain access to the system by guessing the credentials (login name and password) of an authorized user.
Control: A password checking system that disallows user passwords that are proper names or words that are normally included in a dictionary.

Types of Security Threats
Threats to the confidentiality of the system and its data: These can disclose information to people or programs that are not authorized to have access to that information.
Threats to the integrity of the system and its data: These threats can damage or corrupt the software or its data.
Threats to the availability of the system and its data: These threats can restrict access to the software or its data for authorized users.

Controls to ensure security are comparable to those for reliability and safety:
Vulnerability avoidance: Controls that are intended to ensure that attacks are unsuccessful. The strategy here is to design the system so that security problems are avoided.
Attack detection and neutralization: Controls that are intended to detect and repel attacks. These controls involve including functionality in a system that monitors its operation and checks for unusual patterns of activity.
Exposure limitation and recovery: Controls that support recovery from problems. These can range from automated backup strategies and information 'mirroring' to insurance policies that cover the costs associated with a successful attack on the system.
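The 'control' in the security terminology above, a password checker that rejects proper names and ordinary dictionary words, might look roughly like the sketch below; the word lists and the minimum length are invented stand-ins, not a real dictionary or policy.

```python
# Hypothetical sketch of a vulnerability-avoidance control: reject
# passwords that are proper names or common dictionary words.
# The word lists are tiny stand-ins; a real checker would use a full dictionary.

COMMON_WORDS = {"password", "welcome", "dragon", "summer"}
PROPER_NAMES = {"alice", "bob", "smith", "jones"}

def password_allowed(password: str) -> bool:
    candidate = password.lower()
    if candidate in COMMON_WORDS or candidate in PROPER_NAMES:
        return False           # guessable: dictionary word or proper name
    return len(password) >= 8  # assumed minimum length, purely for illustration

print(password_allowed("dragon"))      # False
print(password_allowed("t7#kQz9!pf"))  # True
```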
Success Criteria
Generally, complex sociotechnical systems are developed to tackle what are sometimes called 'wicked problems'. A wicked problem is a problem that is so complex, and involves so many related entities, that there is no definitive problem specification. Different stakeholders see the problem in different ways, and no one has a full understanding of the problem as a whole. The nature of security and dependability attributes sometimes makes it even more difficult to decide if a system is successful. The intention of a new system may be to improve security by replacing an existing system with a more secure data environment.

System Engineering
Systems engineering encompasses all of the activities involved in procuring, specifying, designing, implementing, validating, deploying, operating, and maintaining sociotechnical systems.

Three Overlapping Stages of Sociotechnical Systems Engineering
Procurement or acquisition: During this stage, the purpose of a system is decided; high-level system requirements are established; decisions are made on how functionality will be distributed across hardware, software, and people; and the components that will make up the system are purchased.
Development: During this stage, the system is developed. Development processes include all of the activities involved in system development, such as requirements definition, system design, hardware and software engineering, system integration, and testing. Operational processes are defined and the training courses for system users are designed.
Operation: At this stage, the system is deployed, users are trained, and the system is brought into use. The planned operational processes usually then have to change to reflect the real working environment where the system is used. Over time, the system evolves as new requirements are identified. Eventually, the system declines in value and is decommissioned or replaced.

System Procurement
The initial phase of systems engineering is system procurement (sometimes called system acquisition). At this stage, decisions are made on the scope of the system that is to be purchased, system budgets and timescales, and the high-level system requirements. Using this information, further decisions are then made on whether to procure a system, the type of system required, and the supplier or suppliers of the system.

Drivers of System Procurement Decisions
◦ The state of other organizational systems
◦ The need to comply with external regulations
◦ External competition
◦ Business reorganization
◦ Available budget

Important Points of the Procurement Process (including COTS)
◦ Off-the-shelf (COTS) components do not usually match requirements exactly, unless the requirements have been written with these components in mind.
◦ When a system is to be built specially, the specification of requirements is part of the contract for the system being acquired. It is therefore a legal as well as a technical document.
◦ After a contractor has been selected to build a system, there is a contract negotiation period where you may have to negotiate further changes to the requirements and discuss issues such as the cost of changes to the system.
◦ Once a COTS system has been selected, you may negotiate with the supplier on costs, license conditions, possible changes to the system, and so on.

System Development
The goals of the system development process are to develop or acquire all of the components of a system and then to integrate these components to create the final system. During procurement, business and high-level functional and non-functional system requirements are defined. This systems engineering process was an important influence on the 'waterfall' model of the software process. Although it is now accepted that the waterfall model is not usually appropriate for software development, most systems development processes are plan-driven processes that still follow this model.
Fundamental Activities of System Development
Requirements development: The high-level and business requirements identified during the procurement process have to be developed in more detail. Requirements may have to be allocated to hardware, software, or processes and prioritized for implementation.
System design: This process overlaps significantly with the requirements development process. It involves establishing the overall architecture of the system, identifying the different system components, and understanding the relationships between them.
Subsystem engineering: This stage involves developing the software components of the system; configuring off-the-shelf hardware and software; designing, if necessary, special-purpose hardware; and defining the operational processes for the system.
System integration: The components are put together to create a new system. Only then do the emergent system properties become apparent.
System testing: This is usually an extensive, prolonged activity where problems are discovered. The subsystem engineering and system integration phases are re-entered to repair these problems, tune the performance of the system, and implement new requirements. System testing may involve both testing by the system developer and acceptance/user testing by the organization that has procured the system.
System deployment: This is the process of making the system available to its users, transferring data from existing systems, and establishing communication with other systems in the environment.

System Operation
Operational processes are the processes that are involved in using the system for its defined purpose. For example, operators of an air traffic control system follow specific processes when aircraft enter and leave airspace, when they have to change height or speed, when an emergency occurs, and so on. The key benefit of having system operators is that people have a unique capability of being able to respond effectively to unexpected situations, even when they have never had direct experience of these situations. A problem that may only emerge after the system goes into operation is the operation of the new system alongside existing systems.

Reason's Swiss Cheese Model
In this model, the defenses built into a system are compared to slices of Swiss cheese. Some types of Swiss cheese, such as Emmental, have holes, and so the analogy is that the latent conditions in a system are comparable to the holes in the cheese slices. The position of these holes is not static but changes depending on the state of the overall sociotechnical system. If each slice represents a barrier, failures can occur when the holes line up at the same time as a human operational error.
An active failure of system operation gets through the holes and leads to an overall system failure.
To reduce the probability that system failure will result from human error, designers should:
◦ Design the system so that different types of barriers are included. This means that the 'holes' will probably be in different places, and so there is less chance of the holes lining up and failing to trap an error.
◦ Minimize the number of latent conditions in a system. Effectively, this means reducing the number and size of system 'holes'.
Human errors are inevitable, and systems should include barriers to detect these errors before they lead to system failure. Reason's Swiss cheese model explains how human error plus latent defects in the barriers can lead to system failure.
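A small, hypothetical simulation can illustrate the model's main point: adding diverse barriers makes it much less likely that the 'holes' line up. It assumes, purely for illustration, that barriers fail independently with fixed probabilities, which is a simplification of the model (in the model, hole positions shift with the state of the sociotechnical system).

```python
# Hypothetical simulation of Reason's Swiss cheese model: a failure
# reaches the user only if the "holes" in every barrier line up at the
# same time. The probabilities below are invented purely for illustration.

import random

def failure_reaches_user(hole_probs: list[float]) -> bool:
    # Each barrier independently has a "hole" with its own probability.
    return all(random.random() < p for p in hole_probs)

def estimate_failure_rate(hole_probs: list[float], trials: int = 100_000) -> float:
    return sum(failure_reaches_user(hole_probs) for _ in range(trials)) / trials

random.seed(1)
print(estimate_failure_rate([0.1]))            # one barrier: roughly 10% get through
print(estimate_failure_rate([0.1, 0.1, 0.1]))  # three diverse barriers: roughly 0.1%
```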