Inductionn + Chapter 1 Part 1

Uploaded by

alhindal63

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

12 views

Inductionn + Chapter 1 Part 1

Uploaded by

alhindal63

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 22

Induction

Chapter 1: Fault Tolerance and

Resilience in Cloud Computing
Environments
Dr. Sarah Abu Ghazalah
Course Description
• This course serves as a guide to the students to the most crucial security
technologies in different context, such as physical system, infrastructure, cloud
computing, network, wireless network, access control, blockchain and active
directory.
Course Learning Outcomes
CLOs Aligned PLOs
1 Knowledge and Understanding
1.1 List security technologies for a given asset K1
1.2 Describe the utilization of biometric in systems K1
2 Skills
2.1 Analyze how can cloud computing tolerate failures and apply appropriate mechanisms
S1

2.2 Implement Access Control Model on a given data/application S2

2.3 Justify the deployment of security technologies on Blockchain/Active S4
directory/Networking
3 Value
3.1 Ability to works as a group to use various security defense technologies necessary for a V2
given application.
No List of Topics
Fault Tolerance and Resilience in Cloud Computing Environment
Basic Concept of Fault Tolerance
1 Different Level of Fault Tolerance in Cloud Computing
Fault Tolerance Against Crash Failures in Cloud Computing
Fault Tolerance as a Service in Cloud Computing
Physical Security Essentials
Physical Security Threats
2 Physical Security Prevention and Mitigation Measures
Recovery form Physical Security Breach
Integration of Physical and Logical Security
Biometrics
3 Biometric System Architecture
Security Considerations

Course
Infrastructure Security
4 Communication security Goals
Attacks and Countermeasures
Access Control
5 DAC, MAC and RBAC

Contents
Strengthen the Infrastructure: Authentication Systems
Active Directory
Avenues to Compromise
Attractive Accounts for Credential Theft
Reducing the Active Directory Attack Surface
6 Implementing Least-Privilege Administrative Models
Implementing Secure Administrative Hosts
Securing Domain Controllers Against Attack
Monitoring Active Directory for Signs of Compromise
Audit Policy Recommendations
Network Security
Remote Access Architecture
AAA Server
7
SSO Technologies
Virtual Private Networks (VPNs)
PKI Architecture
8 Blockchain
Total
Course Book

Cyber Security and IT Infrastructure

Protection, John R. Vacca, Syngress;
st
1 edition (2014), ISBN: 0124166814
Assessments

Week Percentage of Total

# Assessment task
Due Assessment Score

1 Assignment 9 10
Quiz 4, 8 2 Quizzes (each 5
2
Marks)
3 Presentation 9 10
4 Mid- Term Theory Exam 6-7 20
5 Final Examination 12 50
Introduction
• Service providers have been
building massive data centers
that are distributed over
several geographical regions
to efficiently meet the
demand for their Cloud-
based services.
• In general, these data centers
are built using hundreds of
thousands of servers, and
virtualization technology is
used to provision computing
resources.
Introduction
• Due to the highly complex nature of the underlying infrastructure,
even carefully engineered data centers are subject to a large number
of failures.
• These failures evidently reduce the overall reliability and availability
of the cloud computing service.
• As a result, fault tolerance becomes of paramount importance to the
users as well as the service providers to ensure correct and
continuous system operation even in the presence of an unknown
and unpredictable number of failures.
Cloud Computing Fault Model
• A failure represents the condition in which the system deviates from
fulfilling its intended functionality or the expected behavior.

Fault Error Failure

So, What is Fault Tolerance?

Fault tolerance is the ability of the

system to perform its function even in
the presence of failures
Cloud Computing Architecture

Failure in a given layer normally has an impact on the services offered by the layers
above it. For example, failure in a user-level middleware (PaaS) may produce errors
in the software services built on top of it (SaaS applications). Similarly, failures in
the physical hardware or the IaaS layer will have an impact on most PaaS and SaaS
services.
This implies that the impact of failures in the IaaS layer or the physical hardware is
significantly high; hence, it is important to characterize typical hardware faults
and develop corresponding fault tolerance techniques
Failure Behavior of Servers
Failure Behavior of the Network
• Servers are connected using a set of
network switches and routers.
• In all rack-mounted servers are first
connected via a 1 Gbps link to a top-of-
rack switch (ToR), which is in turn
connected to two (primary and backup)
aggregation switches (AggS). An AggS
connects tens of switches (ToR) to
redundant access routers (AccR). This
implies that each AccR handles traffic
from thousands of servers and routes it
to core routers that connect different
data centers to the Internet.
Failure Behavior of the Network
• A link failure happens when the connection between two devices on a specific interface is down,
and a device failure happens when the device is not routing/forwarding packets correctly (due to
power outage or hardware crash).

Figure 1.3b
BASIC CONCEPTS ON FAULT TOLERANCE

• Crash faults that cause the system components to completely stop

functioning or remain inactive during failures (power outage, hard
disk crash).
•Byzantine faults that lead the system components to behave
arbitrarily or maliciously during failure, causing the system to behave
unpredictably incorrect.

https://www.youtube.com/watch?v=VWG9xcwjxUg
Fault Tolerance Methods

Monitoring Checkpoint

Replication
Fault Tolerance Methods
• The most widely adopted methods to achieve fault tolerance against
crash faults and byzantine faults are as follows:
1- Checking and monitoring: The system is constantly monitored at
runtime to validate, verify, and ensure that correct system
specifications are being met. This technique, though very simple, plays
a key role in failure detection and subsequent reconfiguration.
2- Checkpoint and restart: The system state is captured and saved
based on predefined parameters (after every 1024 instructions or every
60 seconds). When the system undergoes a failure, it is restored to the
previously known correct state using the latest checkpoint information.
Fault Tolerance Methods
3- Replication: Critical system components are duplicated using
additional hardware, software, and network resources in such a way
that a copy of the critical components is available even after a failure
happens.
Replication mechanisms are mainly used in
two formats:

Active Passive

• In active replication, all the replicas are simultaneously invoked, and each
replica processes the same request at the same time. This implies that all
the replicas have the same system state at any given point of time, and it
can continue to deliver its service even in case of a single replica failure.
• In passive replication, only one processing unit (the primary replica)
processes the requests, while the backup replicas only save the system
state during normal execution periods. Backup replicas take over the
execution process only when the primary replica fails.
Replication Mechanisms (Cont.)
• Semiactive replication technique is derived from traditional
approaches wherein primary and backup replicas execute all the
instructions but only the output generated by the primary replica is
made available to the user.
• Output generated by the backup replicas is logged and suppressed
within the system so that it can readily resume the execution process
when the primary replica failure happens.