All Material For Exit CS Module II
Compiled by
Destalem Hagos (MSc)
Teklay Hagos (MSc)
February 2023
Page 1 of 248
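The module contains four parts:
Computer Networking and Security
Computer Organization and Architecture
Introduction to Artificial Intelligence
Formal Language Theory and Compiler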
Executive summary
This module introduces students to fundamental and advanced concepts in computer science and
helps them prepare for the national exit examination in the relevant areas. Data Communication
and Computer Networking explains the concepts and principles of data communications and
computer networks. Computer Security and Privacy covers information security, including
security threats, security vulnerabilities, security goals, and security mechanisms. The module
also covers network and system administration and computer organization and architecture.
Operating Systems identifies basic concepts such as process scheduling, inter-process
communication, threads, CPU scheduling, scheduling criteria, scheduling algorithms, and process
synchronization. Formal Language and Complexity Theory introduces some fundamental concepts
in automata theory and formal languages, including grammar, finite automaton, regular
expression, formal language, pushdown automaton, and Turing machine. Compiler Design teaches
basic techniques used in compiler construction such as lexical analysis, syntax analysis, semantic
analysis, intermediate code generation, and target code generation. Introduction to Artificial
Intelligence defines reasoning, knowledge representation, and learning techniques of artificial
intelligence.
Table of Contents
Executive summary
Part I: Computer Networking and Security
1.1. Data Communications and Computer networking
1.2. Network and System Administration
1.4. Computer Security and privacy
Part II: Computer Organization and Architecture
2.1. Introduction
2.1.1. Types of computer Architecture
2.1.2. Logic gates and Boolean algebra
2.1.3. Types of Sequential Circuits
2.1.4. Types of Latches
2.1.5. Number System
2.1.6. Register transfer languages
2.1.7. Computer Registers
2.1.8. Computer Instructions
2.2. Operating Systems
2.2.1. Introduction
2.2.2. Basic elements of computer system
2.2.3. Operating system Services
2.2.4. Operating system categories
2.2.5. Operating System Types
2.2.6. Process Management
2.2.7. Operating System Scheduling Techniques
2.2.8. Inter-process communication (IPC)
2.2.9. Memory Management
2.2.10. File system Interface
Part III: Introduction to Artificial Intelligence
3.1. Introduction to AI
3.2. Main goals of Artificial Intelligence
3.3. Types of Artificial Intelligence
3.4. Types of AI Agents
3.5. The Nature of Environments
3.6. Knowledge Representation and Reasoning
3.7. Syntax of propositional logic
3.8. Knowledge Representation
3.9. The relation between knowledge and intelligence
3.10. Machine Learning Basics
3.11. Natural Language Processing (NLP) Basics
Part IV: Formal Language Theory and Compiler
4.1. Introduction to the Theory of Computation
4.1.1. Grammars
4.1.2. Types of Grammars
4.1.3. Regular Expression and Regular Grammars
4.1.4. EQUIVALENCE WITH FINITE AUTOMATA
4.1.5. Introduction to Turing machines
4.2. Compiler Design
4.2.1. Language Processing System
4.2.2. The Structure of a Compiler
4.2.3. Intermediate Code Generation
4.2.4. Compiler-Construction Tools
Part I: Computer Networking and Security
1.1. Data Communications and Computer networking
Activity
Define data communication.
What is computer networking?
Explain and differentiate between the TCP/IP and OSI models.
Mention and briefly explain the types of computer networks.
Explain the concepts and principles of data communications and computer networks
Demonstrate data transmission and transmission media
Describe Protocols and various networking components
Differentiate TCP/IP & OSI Reference Model
Differentiate LAN and WAN Technologies
Explain and implement IP addressing.
Build small to medium level computer networks
Demonstrate subnets
The fundamental purpose of a communications system is the exchange of data between two
parties. Figure 1.1 presents one particular example, which is communication between a
workstation and a server over a public telephone network.
Another example is the exchange of voice signals between two telephones over the same
network. The key components of the model are as follows:
Source: This device generates the data to be transmitted; examples are telephones and
personal computers.
Transmitter: Usually, the data generated by a source system are not transmitted directly
in the form in which they were generated. Rather, a transmitter transforms and encodes
the information in such a way as to produce electromagnetic signals that can be
transmitted across some sort of transmission system. For example, a modem takes a
digital bit stream from an attached device such as a personal computer and transforms that
bit stream into an analog signal that can be handled by the telephone network.
Transmission system: This can be a single transmission line or a complex network
connecting source and destination.
Receiver: The receiver accepts the signal from the transmission system and converts it
into a form that can be handled by the destination device. For example, a modem will
accept an analog signal coming from a network or transmission line and convert it into a
digital bit stream.
Destination: Takes the incoming data from the receiver.
To get some flavor for the focus of data communication, Figure 1.2 provides a new perspective
on the communications model of Figure 1.1a. We trace the details of this figure using electronic
mail as an example. Suppose that the input device and transmitter are components of a personal
computer. The user of the PC wishes to send a message m to another user. The user activates the
electronic mail package on the PC and enters the message via the keyboard (input device). The
character string is briefly buffered in main memory. We can view it as a sequence of bits (g) in
memory. The personal computer is connected to some transmission medium, such as a local
network or a telephone line, by an I/O device (transmitter), such as a local network transceiver
or a modem. The input data are transferred to the transmitter as a sequence of voltage shifts
[g(t)] representing bits on some communications bus or cable. The transmitter is connected
directly to the medium and converts the incoming stream [g(t)] into a signal [s(t)] suitable for
transmission; specific alternatives are described in later sections.
The transmitted signal s(t) presented to the medium is subject to a number of impairments,
discussed in a later section, before it reaches the receiver. Thus, the received signal r(t) may differ
from s(t). The receiver will attempt to estimate the original s(t), based on r(t) and its knowledge
of the medium, producing a sequence of bits. These bits are sent to the output personal computer,
where they are briefly buffered in memory as a block of bits. In many cases, the destination
system will attempt to determine if an error has occurred and, if so, cooperate with the source
system to eventually obtain a complete, error-free block of data. These data are then presented to
the user via an output device, such as a printer or screen. The message as viewed by the user will
usually be an exact copy of the original message (m).
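The electronic mail walk-through above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names (`to_bits`, `from_bits`, `channel`, `send_message`) are hypothetical, bits stand in directly for the signal s(t), impairments are modelled as random bit flips, and a CRC-32 check with simple retransmission stands in for the cooperation between destination and source that the text mentions.

```python
import random
import zlib

def to_bits(data: bytes) -> list[int]:
    """Source side: view the buffered bytes as a sequence of bits (g)."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]

def from_bits(bits: list[int]) -> bytes:
    """Destination side: rebuild bytes from the received bit sequence."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

def channel(bits: list[int], error_rate: float, rng: random.Random) -> list[int]:
    """Medium (assumption): each bit is flipped independently with probability error_rate."""
    return [bit ^ 1 if rng.random() < error_rate else bit for bit in bits]

def send_message(message: str, error_rate: float = 0.01,
                 seed: int = 1, max_tries: int = 1000) -> tuple[str, int]:
    """Send until an error-free block arrives; return (received text, attempts used)."""
    rng = random.Random(seed)
    payload = message.encode("utf-8")
    # Append a 4-byte CRC-32 so the destination can detect a damaged block.
    block = payload + zlib.crc32(payload).to_bytes(4, "big")
    sent_bits = to_bits(block)
    for attempt in range(1, max_tries + 1):
        received = from_bits(channel(sent_bits, error_rate, rng))
        data, check = received[:-4], received[-4:]
        if zlib.crc32(data).to_bytes(4, "big") == check:
            return data.decode("utf-8"), attempt
    raise RuntimeError("no error-free block received")

text, attempts = send_message("hello")
print(text, "received after", attempts, "attempt(s)")
```

With `error_rate=0.0` the received bits always equal the sent bits and the first attempt succeeds; with a noisy channel the block is resent until the check passes, which is why the user usually sees an exact copy of the original message.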
Now consider a telephone conversation. In this case the input to the telephone is a message (m) in
the form of sound waves. The sound waves are converted by the telephone into electrical signals
of the same frequency. These signals are transmitted without modification over the telephone
line. Hence the input signal g(t) and the transmitted signal s(t) are identical. The signal s(t) will
suffer some distortion over the medium, so that r(t) will not be identical to s(t). Nevertheless, the
signal r(t) is converted back into a sound wave with no attempt at correction or improvement of
signal quality. Thus, the received message is not an exact replica of m. However, the received
sound message is generally comprehensible to the listener. The discussion so far does not touch
on other key aspects of data communications, including data link control techniques for
controlling the flow of data and detecting and correcting errors, and multiplexing techniques for
transmission efficiency.
Transmission Media
In a data transmission system, the transmission medium is the physical path between transmitter
and receiver. For guided media, electromagnetic waves are guided along a solid medium, such as
copper twisted pair, copper coaxial cable, and optical fiber. For unguided media, wireless
transmission occurs through the atmosphere, outer space, or water.
The characteristics and quality of a data transmission are determined both by the characteristics
of the medium and the characteristics of the signal. In the case of guided media, the medium itself
is more important in determining the limitations of transmission.
For unguided media, the bandwidth of the signal produced by the transmitting antenna is more
important than the medium in determining transmission characteristics. One key property of
signals transmitted by antenna is directionality. In general, signals at lower frequencies are
omnidirectional; that is, the signal propagates in all directions from the antenna. At higher
frequencies, it is possible to focus the signal into a directional beam.
Classification Of Transmission Media:
In considering the design of data transmission systems, key concerns are data rate and distance:
the greater the data rate and distance the better. A number of design factors relating to the
transmission medium and the signal determine the data rate and distance:
Bandwidth: All other factors remaining constant, the greater the bandwidth of a signal,
the higher the data rate that can be achieved.
Transmission impairments: Impairments, such as attenuation, limit the distance. For
guided media, twisted pair generally suffers more impairment than coaxial cable, which in
turn suffers more than optical fiber.
Interference: Interference from competing signals in overlapping frequency bands can
distort or wipe out a signal. Interference is of particular concern for unguided media, but
is also a problem with guided media. For guided media, interference can be caused by
emanations from nearby cables. For example, twisted pairs are often bundled together and
conduits often carry multiple cables. Interference can also be experienced from unguided
transmissions. Proper shielding of a guided medium can minimize this problem.
Number of receivers: A guided medium can be used to construct a point-to-point link or
a shared link with multiple attachments. In the latter case, each attachment introduces
some attenuation and distortion on the line, limiting distance and/or data rate.
Guided Media
Guided media are the physical media through which signals are transmitted. They are also known
as bounded media.
Types of Guided media
Twisted pair
Twisted pair is a physical medium made up of two insulated copper wires twisted around each
other in a regular spiral pattern. Twisted pair cable is cheap compared to other transmission
media, easy to install, and lightweight. The frequency range quoted for voice-grade telephone
twisted pair is 0 to 3.5 kHz; twisted pair used for data networking operates at much higher
frequencies.
Types of Twisted Pair
There are two types of twisted pair: shielded twisted pair (STP) and unshielded twisted pair (UTP).
Coaxial Cable
Coaxial cable is a very commonly used transmission medium; for example, TV cable is usually
coaxial. The cable is called coaxial because its two conductors share a common axis: an inner
conductor is surrounded by a concentric outer conductor. It supports higher frequencies than
twisted pair cable.
Fiber Optic
Fiber optic cable is a cable that uses light signals for communication. It holds optical fibers
coated in plastic that send data as pulses of light. The plastic coating protects the optical fibers
from heat, cold, and electromagnetic interference from other types of wiring. Fiber optics provide
faster data transmission than copper wires.
Unguided Transmission
Unguided transmission sends electromagnetic waves without using any physical conductor.
Therefore, it is also known as wireless transmission. In unguided media, air is the medium
through which the electromagnetic energy flows. Unguided transmission is broadly
classified into three categories:
Radio waves
Radio waves are electromagnetic waves that are transmitted in all directions of free space.
Radio waves are omnidirectional, i.e., the signals are propagated in all directions. The frequency
range of radio waves is from 3 kHz to 1 GHz. In the case of radio waves, the sending and
receiving antennas need not be aligned, i.e., the wave sent by the sending antenna can be received by
any receiving antenna. An example of radio wave transmission is FM radio.
Microwaves
Microwaves are of two types:
Terrestrial microwave
Satellite microwave communication.
Terrestrial Microwave Transmission
Terrestrial microwave transmission is a technology that transmits a focused beam of a radio
signal from one ground-based microwave transmission antenna to another. Microwaves are
electromagnetic waves with frequencies in the range from 1 GHz to 300 GHz. Microwaves
are unidirectional: the sending and receiving antennas must be aligned, because the waves sent by
the sending antenna are narrowly focused. The antennas are mounted on towers to send a
beam to another antenna that is kilometers away. It works on line-of-sight transmission, i.e., the
antennas mounted on the towers are in direct sight of each other.
Satellite Microwave Communication
A satellite is a physical object that revolves around the earth at a known height. Satellite
communication is more reliable nowadays as it offers more flexibility than cable and fiber optic
systems. We can communicate with any point on the globe by using satellite communication.
Infrared
Infrared transmission is a wireless technology used for communication over short ranges. The
frequency of infrared is in the range from 300 GHz to 400 THz. It is used for short-range
communication such as data transfer between two cell phones, TV remote operation, and data
transfer between a computer and a cell phone that reside in the same closed area.
Transmission modes
The way in which data is transmitted from one device to another device is known as transmission
mode. The transmission mode is also known as the communication mode. Each communication
channel has a direction associated with it, and transmission media provide the direction. The
transmission mode is defined in the physical layer. The Transmission mode is divided into three
categories:
Simplex mode
Half-duplex mode
Full-duplex mode
Simplex mode
In Simplex mode, the communication is unidirectional, i.e., the data flow in one direction. A
device can only send the data but cannot receive it or it can receive the data but cannot send the
data.
Half-Duplex mode
In a half-duplex channel, the direction can be reversed, i.e., the station can transmit and receive the
data as well. Messages flow in both the directions, but not at the same time. The entire bandwidth
of the communication channel is utilized in one direction at a time.
Full-duplex mode
In Full duplex mode, the communication is bi-directional, i.e., the data flow in both the directions.
Both the stations can send and receive the message simultaneously. Full-duplex mode has two
simplex channels. One channel has traffic moving in one direction, and another channel has traffic
flowing in the opposite direction. The Full-duplex mode is the fastest mode of communication
between devices. The most common example of the full-duplex mode is a telephone network.
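The difference between the three modes can be summarised as a rule for when a station is allowed to transmit. The sketch below is a minimal illustration, not a model of any real interface: the names `Mode` and `can_transmit` are hypothetical, and station "A" is assumed to be the designated sender in simplex mode.

```python
from enum import Enum

class Mode(Enum):
    SIMPLEX = "simplex"
    HALF_DUPLEX = "half-duplex"
    FULL_DUPLEX = "full-duplex"

def can_transmit(mode: Mode, sender: str, other_is_sending: bool) -> bool:
    """Return True if `sender` ("A" or "B") may start transmitting right now."""
    if mode is Mode.SIMPLEX:
        # Data flows in one direction only; "A" is assumed to be the sender.
        return sender == "A"
    if mode is Mode.HALF_DUPLEX:
        # Both stations can send, but only one direction is active at a time.
        return not other_is_sending
    # Full duplex: two simplex channels, so both stations may send at once.
    return True

for mode in Mode:
    print(mode.value, can_transmit(mode, "B", other_is_sending=True))
```

In this sketch station "B" is refused in simplex mode (it can only receive) and in half-duplex mode while "A" is sending, but is allowed in full-duplex mode.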
Multiplexing
Multiplexing is a technique used to combine multiple data streams and send them over a single
medium. The hardware used for multiplexing is known as a multiplexer (MUX); it combines n
input lines to generate a single output line, so multiplexing follows a many-to-one approach.
Demultiplexing is achieved by a device called a demultiplexer (DEMUX) at the receiving end.
The DEMUX separates a signal into its component signals (one input and n outputs), so
demultiplexing follows a one-to-many approach. The following are some of the reasons for
using multiplexing techniques:
The transmission medium is used to send the signal from sender to receiver. The medium
can only have one signal at a time.
If there are multiple signals to share one medium, then the medium must be divided in
such a way that each signal is given some portion of the available bandwidth. For
example, if there are 10 signals and the bandwidth of the medium is 100 units, then each
signal is allocated 10 units.
When multiple signals share the common medium, there is a possibility of collision.
Multiplexing concept is used to avoid such collision.
Transmission services are very expensive.
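The bandwidth-sharing example in the list above (10 signals on a medium of 100 units) can be worked through in code. This is a simplified sketch: the function name `allocate_bands` is hypothetical, the bands are assumed to be equal in width, and the guard bands that practical frequency-division systems place between channels are ignored.

```python
def allocate_bands(total_bandwidth: float, num_signals: int,
                   start: float = 0.0) -> list[tuple[float, float]]:
    """Split total_bandwidth into equal, non-overlapping (low, high) bands."""
    if num_signals <= 0:
        raise ValueError("num_signals must be positive")
    share = total_bandwidth / num_signals
    return [(start + i * share, start + (i + 1) * share)
            for i in range(num_signals)]

bands = allocate_bands(100, 10)
for number, (low, high) in enumerate(bands, start=1):
    print(f"signal {number}: {low} to {high} units")
```

Because every signal occupies its own band, the signals can share the medium at the same time without colliding.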
Multiplexing Techniques
Multiplexing techniques can be classified as indicated in the diagram.
Frequency-division Multiplexing (FDM)
Wavelength Division Multiplexing (WDM)
Wavelength Division Multiplexing is the same as FDM except that the optical signals are transmitted
through the fiber optic cable.
WDM is used on fiber optics to increase the capacity of a single fiber.
It is used to utilize the high data rate capability of fiber optic cable.
It is an analog multiplexing technique.
Optical signals from different sources are combined to form a wider band of light with the
help of multiplexer.
At the receiving end, demultiplexer separates the signals to transmit them to their
respective destinations.
Time Division Multiplexing
It is a digital technique. In Time Division Multiplexing, all signals operate at the same
frequency but at different times. The total time available in the channel is distributed among
the different users. Each user is allocated a different time interval, known as a time slot, in
which the sender transmits its data. A user takes control of the channel for a fixed amount of
time. In Time Division Multiplexing, data is not transmitted simultaneously; rather, it is
transmitted one unit at a time.
There are two types of TDM:
Synchronous TDM
Asynchronous TDM
Synchronous TDM
In synchronous TDM, the capacity of the channel is not fully utilized because empty slots, which
carry no data, are also transmitted.
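The wasted capacity of synchronous TDM can be seen in a small sketch. The assumptions here are illustrative only: each source owns one fixed slot per frame, slots are filled in round-robin order, an idle source leaves its slot empty (shown as "-"), and the function names `synchronous_tdm` and `utilization` are hypothetical.

```python
def synchronous_tdm(sources: list[list[str]], empty: str = "-") -> list[list[str]]:
    """Build frames with one fixed slot per source, in round-robin order."""
    num_frames = max((len(source) for source in sources), default=0)
    frames = []
    for i in range(num_frames):
        # A source with nothing left to send still occupies its slot.
        frames.append([source[i] if i < len(source) else empty
                       for source in sources])
    return frames

def utilization(frames: list[list[str]], empty: str = "-") -> float:
    """Fraction of transmitted slots that actually carry data."""
    slots = [slot for frame in frames for slot in frame]
    return sum(slot != empty for slot in slots) / len(slots) if slots else 0.0

sources = [["A1", "A2", "A3"], ["B1"], ["C1", "C2"]]
frames = synchronous_tdm(sources)
for frame in frames:
    print(frame)
print("utilization:", round(utilization(frames), 2))
```

Here three frames of three slots are sent, but only six of the nine slots carry data, so a third of the channel capacity is spent on empty slots.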
Asynchronous TDM
Computer Network
A computer network is a group of computers connected to each other through communication
links so that the devices can interact with each other. The aim of a computer network is the
sharing of resources among the devices. There are several types of networks, varying from
simple to complex.
Application of Computer Network
Resource sharing: Resource sharing is the sharing of resources such as programs, printers, and
data among the users on the network, regardless of the physical location of the
resource and the user.
Communication medium: A computer network behaves as a communication medium among the
users. For example, a company with more than one computer has an email system which the
employees use for daily communication.
E-commerce: Computer networks are also important in business, because business can be done
over the internet. For example, amazon.com conducts its business over the internet.
Computer Network Architecture
Computer Network Architecture is defined as the physical and logical design of the software,
hardware, protocols, and media used for the transmission of data. Simply put, it describes how
computers are organized and how tasks are allocated among them.
Two types of network architecture are used:
Peer-To-Peer network
Client/Server network
Peer-To-Peer network
Peer-To-Peer network is a network in which all the computers are linked together with equal
privilege and responsibilities for processing the data. Peer-To-Peer network is useful for small
environments, usually up to 10 computers. Peer-To-Peer network has no dedicated server. Special
permissions are assigned to each computer for sharing the resources, but this can lead to a
problem if the computer with the resource is down.
If one computer stops working, the other computers will continue to work.
It is easy to set up and maintain, as each computer manages itself.
Client/Server network
A Client/Server network is a network model in which end users, called clients, access
resources such as songs, video, etc. from a central computer known as the server. The central
controller is the server, while all other computers in the network are called clients. The
server performs all the major operations such as security and network management.
The server is responsible for managing all the resources such as files, directories, printers, etc. All
the clients communicate with each other through the server. For example, if client 1 wants to send
some data to client 2, it first sends a request to the server for permission. The server
sends a response to client 1 to initiate its communication with client 2.
A metropolitan area network (MAN) is a network that covers a larger geographic area by
interconnecting different LANs to form a larger network.
Government agencies use MAN to connect to the citizens and private industries.
In MAN, various LANs are connected to each other through a telephone exchange line.
The most widely used protocols in MAN are RS-232, Frame Relay, ATM, ISDN, OC-3,
ADSL, etc.
It has a higher range than Local Area Network (LAN).
Uses Of Metropolitan Area Network:
MAN is used in communication between the banks in a city.
It can be used in an Airline Reservation.
It can be used in a college within a city.
It can also be used for communication in the military.
WAN (Wide Area Network)
A Wide Area Network is a network that extends over a large geographical area such as
states or countries.
A Wide Area Network is a much bigger network than a LAN.
A Wide Area Network is not limited to a single location; it spans a large
geographical area through telephone lines, fibre optic cables, or satellite links.
The internet is one of the biggest WANs in the world.
A Wide Area Network is widely used in the fields of business, government, and education.
Internetworking
Topology defines the structure of the network and shows how all the components are
interconnected to each other.
There are two types of topologies:
1. physical and
2. logical topology.
Physical topology: The geometric representation of all the nodes in a network. There are six
types of physical topology: Bus Topology, Ring Topology, Tree Topology, Star
Topology, Mesh Topology, and Hybrid Topology.
Logical topology: A concept in networking that defines the architecture of the communication
mechanism for all nodes in a network.
Computer Network Models
A communication subsystem is a complex piece of hardware and software. Early attempts at
implementing the software for such subsystems were based on a single, complex, unstructured
program with many interacting components. The resulting software was very difficult to test and
modify. To overcome this problem, a layered approach was developed: the networking
task is divided into several layers, and each layer is assigned a particular function. Therefore, we
can say that networking tasks depend upon the layers.
Network Layered Architecture
The main aim of the layered architecture is to divide the design into small pieces. Each lower
layer adds its services to the higher layer to provide a full set of services to manage
communications and run the applications. It provides modularity and clear interfaces, i.e.,
provides interaction between subsystems. It ensures the independence between layers by
providing the services from lower to higher layer without defining how the services are
implemented. Therefore, any modification in a layer will not affect the other layers. There are two
important network architectures: the OSI reference model and the TCP/IP reference model.
The basic elements of layered architecture are services, protocols, and interfaces.
Service: It is a set of actions that a layer provides to the higher layer.
Protocol: It defines a set of rules that a layer uses to exchange information with its peer
entity. These rules mainly concern both the contents and the order of the messages used.
Interface: It is a way through which the message is transferred from one layer to another
layer.
In an n-layer architecture, layer n on one machine communicates with layer n on
another machine, and the rules used in this conversation are known as the layer-n protocol.
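The idea that each layer serves the layer above it and talks only to its peer on the other machine can be illustrated with header wrapping. The following is a minimal sketch, not a real protocol stack: the four layer names in `LAYERS` and the bracketed text "headers" are hypothetical choices made for readability.

```python
# Hypothetical four-layer stack, listed from the highest layer to the lowest.
LAYERS = ["application", "transport", "network", "data-link"]

def send_down(message: str) -> str:
    """Sender: each layer wraps the data handed down from the layer above."""
    data = message
    for layer in LAYERS:
        data = f"[{layer}]{data}"
    return data

def receive_up(data: str) -> str:
    """Receiver: each layer strips only the header written by its peer layer."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not data.startswith(header):
            raise ValueError(f"expected a {layer} header")
        data = data[len(header):]
    return data

wire = send_down("hi")
print(wire)
print(receive_up(wire))
```

On the wire the lowest layer's header is outermost, so the receiving side removes headers in the reverse order in which they were added. Each layer reads only its own header, which is why changing one layer's implementation does not affect the others.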
OSI Model
OSI stands for Open Systems Interconnection. It is a reference model that describes how information
from a software application in one computer moves through a physical medium to the software
application in another computer. OSI consists of seven layers, and each layer performs a
particular network function. The OSI model was developed by the International Organization for
Standardization (ISO) in 1984, and it is now considered an architectural model for
inter-computer communications. The OSI model divides the whole task into seven smaller and
manageable tasks. Each layer is assigned a particular task.
Characteristics of OSI Model
The OSI model is divided into two groups of layers: upper layers and lower layers. The upper
layers of the OSI model mainly deal with application-related issues, and they are implemented
only in software. The application layer is closest to the end user. Both the end user and the
application layer interact with the software applications. The term "upper layer" is also used to
refer to the layer just above another layer. The lower layers of the OSI model deal with data
transport issues. The data link layer and the physical layer are implemented in hardware and
software. The physical layer is the lowest layer of the OSI model and is closest to the physical
medium. The physical layer is mainly responsible for placing the information on the physical
medium.
Seven Layers of OSI Model
There are seven OSI layers. Each layer has different functions.
Physical layer
The main functionality of the physical layer is to transmit the individual bits from one node to
another node. It is the lowest layer of the OSI model. It establishes, maintains and deactivates the
physical connection. It specifies the mechanical, electrical and procedural network interface
specifications.
Functions of a Physical layer:
Line Configuration: It defines how two or more devices can be connected
physically.
Data Transmission: It defines the transmission mode between two devices on the network:
simplex, half-duplex or full-duplex.
Topology: It defines how network devices are arranged.
Signals: It determines the type of signal used for transmitting the information.
Data-Link Layer
This layer is responsible for the error-free transfer of data frames. It defines the format of the data
on the network. This layer provides a reliable and efficient communication between two or more
devices. It is mainly responsible for the unique identification of each device that resides on a local
network.
Data link layer contains two sub-layers:
Logical Link Control Layer
It is responsible for transferring packets to the network layer of the receiving machine.
It identifies the network layer protocol from the header. It also provides flow
control.
Media Access Control Layer
The Media Access Control layer is the link between the Logical Link Control layer and the
network's physical layer. It is used for transferring frames over the network medium.
Functions of the Data-link layer
Framing: The data link layer translates the physical layer's raw bit stream into units known
as frames. It adds a header and a trailer to each frame. The header contains the hardware
destination and source addresses.
Physical Addressing: The data link layer adds a header to the frame that contains the
destination address. The frame is transmitted to the destination address mentioned in the
header.
Flow Control: Flow control is one of the main functions of the data link layer. It is the
technique through which a constant data rate is maintained on both sides so that no
data is lost or corrupted. It ensures that a transmitting station with higher processing
speed, such as a server, does not overwhelm a receiving station with lower processing speed.
Error Control: Error control is achieved by adding a calculated value, the CRC (Cyclic
Redundancy Check), to the data link layer's trailer, which is added to the
message frame before it is sent to the physical layer. If an error is detected, the
receiver requests retransmission of the corrupted frames.
Access Control: When two or more devices are connected to the same communication
channel, the data link layer protocols are used to determine which device has control
over the link at a given time.
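The error-control step above can be sketched in Python. This is only a minimal illustration: it
assumes a 4-byte CRC-32 trailer computed with the standard zlib.crc32 function, and the helper
names make_frame and check_frame are hypothetical. Real data link protocols define their own
frame layouts and CRC polynomials.

```python
import struct
import zlib

def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 trailer to the payload (hypothetical frame layout)."""
    crc = zlib.crc32(payload)
    return payload + struct.pack("!I", crc)

def check_frame(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailer."""
    payload, trailer = frame[:-4], frame[-4:]
    (received_crc,) = struct.unpack("!I", trailer)
    return zlib.crc32(payload) == received_crc

frame = make_frame(b"hello, link layer")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit "in transit"

print(check_frame(frame))      # True
print(check_frame(corrupted))  # False
```

A receiver that gets False here would discard the frame and ask for it to be sent again.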
Network Layer
The network layer is layer 3. It manages logical device addressing and tracks the location of
devices on the network.
It determines the best path to move data from source to destination based on the network
conditions, the priority of service, and other factors. The network layer is responsible for routing
and forwarding packets. Routers are layer 3 devices; they are specified in this layer and
are used to provide routing services within an internetwork. The protocols used to route
network traffic are known as network layer protocols.
Functions of Network Layer:
Internetworking: Internetworking is the main responsibility of the network layer. It
provides a logical connection between different devices.
Addressing: The network layer adds the source and destination addresses to the header of the
packet. Addressing is used to identify the device on the internet.
Routing: Routing is a major component of the network layer; it determines the best
path out of the multiple paths from source to destination.
Packetizing: The network layer receives segments from the upper layer and converts
them into packets. This process is known as packetizing.
Transport Layer
The transport layer is layer 4. It ensures that messages are delivered in the order in which they
are sent and that there is no duplication of data. The main responsibility of the transport layer
is to transfer the data completely. It receives the data from the upper layer and converts it into
smaller units known as segments. This layer can be termed an end-to-end layer, as it provides
a connection between source and destination to deliver the data reliably.
The two protocols used in this layer are:
Transmission Control Protocol
It is a standard protocol that allows systems to communicate over the internet.
It establishes and maintains a connection between hosts. When data is sent over a TCP
connection, TCP divides the data into smaller units known as segments. Segments
may travel over the internet along different routes and may arrive out of order at the
destination. TCP puts the segments back into the correct order at the
receiving end.
User Datagram Protocol
User Datagram Protocol is a transport layer protocol.
It is an unreliable transport protocol: the receiver does not send any acknowledgment
when a packet is received, and the sender does not wait for any acknowledgment.
Functions of Transport Layer:
Service-point addressing: Computers run several programs simultaneously. For this reason,
data must be delivered not only from one computer to another computer but also from one
process to another process. The transport layer adds a header that contains an address known
as a service-point address or port address. The responsibility of the
network layer is to transmit the data from one computer to another computer, and the
responsibility of the transport layer is to deliver the message to the correct process.
Segmentation and reassembly: When the transport layer receives a message from the upper
layer, it divides the message into multiple segments, and each segment is assigned a
sequence number that uniquely identifies it. When the message has arrived at the
destination, the transport layer reassembles the message based on the sequence numbers.
Connection control: The transport layer provides two services: connection-oriented service and
connectionless service. A connectionless service treats each segment as an individual packet, and
the packets may travel along different routes to reach the destination. A connection-oriented
service makes a connection with the transport layer at the destination machine before delivering
the packets. In connection-oriented service, all the packets travel along a single route.
Flow control: The transport layer is also responsible for flow control, but it is performed
end-to-end rather than across a single link.
Error control: The transport layer is also responsible for error control. Error control is
performed end-to-end rather than across a single link. The sending transport layer ensures that
the message reaches the destination without any error.
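Segmentation and reassembly can be illustrated with a short Python sketch. It assumes one sequence
number per segment, as described above; the function names segment and reassemble and the chunk
size of 8 bytes are hypothetical choices made for the example.

```python
import random

def segment(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split a message into (sequence_number, chunk) pairs."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Sort the segments by sequence number and join the chunks."""
    return b"".join(chunk for _, chunk in sorted(segments))

message = b"transport layer segmentation example"
segments = segment(message, 8)
random.shuffle(segments)                 # simulate out-of-order arrival
print(reassemble(segments) == message)   # True
```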
Session Layer
It is layer 5 in the OSI model. The session layer is used to establish, maintain and synchronize
the interaction between communicating devices.
Functions of Session layer:
Dialog control: Session layer acts as a dialog controller that creates a dialog between two
processes or we can say that it allows the communication between two processes which can be
either half-duplex or full-duplex.
Synchronization: Session layer adds some checkpoints when transmitting the data in a sequence.
If some error occurs in the middle of the transmission of data, then the transmission will take
place again from the checkpoint. This process is known as Synchronization and recovery.
Presentation Layer
A Presentation layer is mainly concerned with the syntax and semantics of the information
exchanged between the two systems. It acts as a data translator for a network. This layer is a part
of the operating system that converts the data from one presentation format to another format. The
Presentation layer is also known as the syntax layer.
Functions of Presentation layer:
Translation: The processes in two systems exchange the information in the form of
character strings, numbers and so on. Different computers use different encoding methods,
the presentation layer handles the interoperability between the different encoding methods.
It converts the data from sender-dependent format into a common format and changes the
common format into receiver-dependent format at the receiving end.
Encryption: Encryption is needed to maintain privacy. Encryption is the process of
converting the sender's information into another form; the resulting message is then
sent over the network.
Compression: Data compression is a process of compressing the data, i.e., it reduces the
number of bits to be transmitted. Data compression is very important in multimedia such
as text, audio, video.
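The translation and compression functions can be sketched with the Python standard library. This
is an illustration only: UTF-8 is assumed as the common format and zlib as the compression
scheme, neither of which is prescribed by the OSI model.

```python
import zlib

text = "network " * 50

encoded = text.encode("utf-8")           # translation into a common byte format
compressed = zlib.compress(encoded)      # fewer bytes to transmit
restored = zlib.decompress(compressed).decode("utf-8")  # receiver reverses both steps

print(len(encoded), len(compressed))
print(restored == text)                  # True
```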
Application Layer
An application layer serves as a window for users and application processes to access network
service. It handles issues such as network transparency, resource allocation, etc. An application
layer is not an application, but it performs the application layer functions.
This layer provides the network services to the end-users.
Functions of Application layer:
File transfer, access, and management (FTAM): An application layer allows a user to access
the files in a remote computer, to retrieve the files from a computer and to manage the files in a
remote computer.
Mail services: An application layer provides the facility for email forwarding and storage.
Directory services: An application layer provides distributed database sources and is used to
provide global information about various objects.
TCP/IP model
The TCP/IP model was developed prior to the OSI model. It is a working model and is not exactly
similar to the OSI model. The TCP/IP model consists of five layers: the application layer,
transport layer, network layer, data link layer and physical layer. The first four layers
(physical, data link, network and transport) provide physical standards, network interface,
internetworking, and transport functions that correspond to the first four layers of the OSI
model. The three topmost layers of the OSI model (session, presentation and application) are
represented in the TCP/IP model by a single layer called the application layer. TCP/IP is a
hierarchical protocol suite made up of interactive modules, each of which provides specific
functionality.
Network Addressing and Routing
Layer 3 is called the Network layer in the OSI model and the Internet layer in the TCP/IP model.
The network layer manages options pertaining to host and network addressing, managing
sub-networks, and internetworking. It takes responsibility for routing packets from
source to destination within or outside a subnet. Two different subnets may have different
addressing schemes or incompatible addressing types. The same holds for protocols: two different
subnets may operate on different protocols which are not compatible with each other.
The network layer has the responsibility to route packets from source to destination, mapping
between different addressing schemes and protocols. The Internet Protocol (IP) is a widely
respected and deployed network layer protocol which helps end devices communicate over the
internet. It comes in two flavors: IPv4, which has ruled the world for decades but is now running
out of address space, and IPv6, which was created to replace IPv4 and is intended to mitigate its
limitations.
Network Addressing
Layer-3 network addressing is one of the major tasks of the network layer. Network addresses are
always logical, i.e. they are software-based addresses which can be changed by appropriate
configuration. A network address points to a host, node or server, or it can represent a whole
network. A network address is configured on a network interface card and is generally
mapped by the system to the MAC address (hardware address or layer-2 address) of the machine
for Layer-2 communication. IP addressing provides a mechanism to differentiate between hosts and
networks. Because IP addresses are assigned in a hierarchical manner, a host always resides under
a specific network. A host which needs to communicate outside its subnet needs to know the
destination network address to which the packet is to be sent.
Hosts in different subnets need a mechanism to locate each other. This task can be done by DNS.
A DNS server provides the Layer-3 address of a remote host mapped to its domain name or
FQDN (fully qualified domain name). When a host acquires the Layer-3 address (IP address) of the
remote host, it forwards all packets for that host to its gateway. A gateway is a router equipped
with the information needed to route packets to the destination host. Routers rely on routing
tables, which hold the following information:
Address of destination network
Method to reach the network
Upon receiving a forwarding request, a router forwards the packet to its next hop (adjacent
router) towards the destination. The next router on the path does the same, and eventually the
data packet reaches its destination.
Network address can be of one of the following:
Unicast (destined to one host)
Multicast (destined to group)
Broadcast (destined to all)
Anycast (destined to nearest one)
A router never forwards broadcast traffic by default. Multicast traffic receives special
treatment, as it is mostly video or audio streams with high priority. Anycast is similar to
unicast, except that the packets are delivered to the nearest destination when multiple
destinations are available.
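Some of these address types can be recognised with Python's standard ipaddress module. The
addresses below are arbitrary examples chosen for illustration. Anycast is not shown because an
anycast address looks like an ordinary unicast address; only the routing differs.

```python
import ipaddress

network = ipaddress.ip_network("192.168.1.0/24")
host = ipaddress.ip_address("192.168.1.10")     # a unicast address
group = ipaddress.ip_address("224.0.0.1")       # a multicast group address

print(host in network)              # True
print(network.broadcast_address)    # 192.168.1.255
print(group.is_multicast)           # True
print(host.is_multicast)            # False
```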
Network Routing
When a device has multiple paths to reach a destination, it always selects one path by preferring it
over others. This selection process is termed as Routing. Routing is done by special network
devices called routers or it can be done by means of software processes. The software-based routers
have limited functionality and limited scope.
A router is always configured with some default route. A default route tells the router where to
forward a packet if there is no route found for specific destination. In case there are multiple
paths existing to reach the same destination, router can make decision based on the following
information:
Hop Count
Bandwidth
Metric
Prefix-length
Delay
Routes can be statically configured or dynamically learnt. One route can be configured to be
preferred over others.
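As a rough sketch of how a router might choose among routes, the Python example below applies
only one of the criteria listed above, prefix length, and falls back to a default route. The
routing table and next-hop addresses are hypothetical, and real routers also weigh metrics such
as hop count, bandwidth and delay.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop)
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "10.0.0.1"),        # default route
    (ipaddress.ip_network("192.168.0.0/16"), "10.0.0.2"),
    (ipaddress.ip_network("192.168.5.0/24"), "10.0.0.3"),
]

def next_hop(destination: str) -> str:
    """Return the next hop of the matching route with the longest prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [(net.prefixlen, hop) for net, hop in ROUTES if addr in net]
    return max(matches, key=lambda match: match[0])[1]

print(next_hop("192.168.5.7"))   # 10.0.0.3 (most specific route)
print(next_hop("192.168.9.1"))   # 10.0.0.2
print(next_hop("8.8.8.8"))       # 10.0.0.1 (default route)
```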
Unicast routing
Most of the traffic on the internet and intranets, known as unicast data or unicast traffic, is
sent to a specified destination. Routing unicast data over the internet is called unicast routing.
Broadcast routing
By default, broadcast packets are not routed or forwarded by the routers on any network.
Routers create broadcast domains. A router can, however, be configured to forward broadcasts in
some special cases. A broadcast message is destined to all network devices.
Multicast Routing
Multicast routing is a special case of broadcast routing with significant differences and
challenges. In broadcast routing, packets are sent to all nodes even if they do not want them. In
multicast routing, the data is sent only to the nodes which want to receive the packets.
Anycast Routing
Anycast packet forwarding is a mechanism where multiple hosts can have the same logical address.
When a packet destined to this logical address is received, it is sent to the host which is
nearest in the routing topology. Anycast routing can be done with the help of a DNS server:
whenever an anycast packet is received, DNS is queried for where to send it, and DNS provides the
IP address which is the nearest one configured on it.
Routing Protocols
Distance Vector is a simple routing protocol which takes routing decisions based on the number of
hops between source and destination. A route with fewer hops is considered the best
route. Every router advertises its set of best routes to other routers. Ultimately, all routers
build up their view of the network topology based on the advertisements of their peer routers. An
example is the Routing Information Protocol (RIP).
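The exchange of hop-count advertisements can be sketched as follows. This is a simplified
simulation, not an implementation of RIP: the four-router topology is hypothetical, every link
costs one hop, and timers, split horizon and route withdrawal are ignored.

```python
# Hypothetical topology: router -> directly connected neighbours
NEIGHBOURS = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}

def distance_vector(neighbours):
    """Exchange hop-count tables between neighbours until nothing changes."""
    tables = {router: {router: 0} for router in neighbours}
    changed = True
    while changed:
        changed = False
        for router, peers in neighbours.items():
            for peer in peers:
                # The peer advertises its table; each route costs one extra hop.
                for dest, hops in list(tables[peer].items()):
                    if hops + 1 < tables[router].get(dest, float("inf")):
                        tables[router][dest] = hops + 1
                        changed = True
    return tables

tables = distance_vector(NEIGHBOURS)
print(tables["A"]["D"])   # 3 hops: A -> B -> C -> D
```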
Link State Routing Protocol
A Link State protocol is slightly more complicated than Distance Vector. It takes into account the
states of the links of all the routers in a network. This technique helps routers build a common
graph of the entire network. All routers then calculate their best paths for routing purposes.
Examples are Open Shortest Path First (OSPF) and Intermediate System to Intermediate System (IS-IS).
Routing Algorithms
Time to Live (TTL) can be used to avoid infinite looping of packets. There exists another
approach for flooding, which is called Selective Flooding to reduce the overhead on the network.
In this method, the router does not flood out on all the interfaces, but selective ones.
Shortest Path
Routing decisions in networks are mostly taken on the basis of cost between source and
destination. Hop count plays a major role here. Shortest path is a technique which uses various
algorithms to decide a path with the minimum cost, such as the minimum number of hops. Common
shortest path algorithms are:
Dijkstra's algorithm
Bellman Ford algorithm
Floyd Warshall algorithm
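As an illustration of the first algorithm in this list, here is a compact Python sketch of
Dijkstra's algorithm using a priority queue. The graph and its link costs are hypothetical; with
all costs set to 1 the result would be plain hop counts.

```python
import heapq

def dijkstra(graph, source):
    """Return the lowest total cost from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for neighbour, cost in graph[node].items():
            new_cost = d + cost
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    return dist

# Hypothetical network: node -> {neighbour: link cost}
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

print(dijkstra(GRAPH, "A"))   # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```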
Internetworking
In real-world scenarios, networks under the same administration are generally scattered
geographically. There may be a requirement to connect two different networks of the same kind as
well as of different kinds. Routing between two networks is called internetworking. Networks can
be considered different based on various parameters such as protocol, topology, Layer-2 network
and addressing scheme. In internetworking, routers have knowledge of each other's addresses and
of the addresses beyond them. Routes to other networks can be statically configured, or routers
can learn them by using an internetworking routing protocol.
Routing protocols which are used within an organization or administration are called Interior
Gateway Protocols (IGPs). RIP and OSPF are examples of IGPs. Routing between different
organizations or administrations uses an Exterior Gateway Protocol (EGP), and in practice there is
only one EGP in use, i.e., the Border Gateway Protocol (BGP).
Network Layer Protocols
Every computer in a network has an IP address by which it can be uniquely identified and
addressed. An IP address is Layer-3 (Network Layer) logical address. This address may change
every time a computer restarts. A computer can have one IP at one instance of time and another IP
at some different time.
Address Resolution Protocol (ARP)
While communicating, a host needs Layer-2 (MAC) address of the destination machine which
belongs to the same broadcast domain or network. A MAC address is physically burnt into the
Network Interface Card (NIC) of a machine and it never changes. On the other hand, IP address
on the public domain is rarely changed. If the NIC is changed in case of some fault, the MAC
address also changes. This way, for Layer-2 communication to take place, a mapping between the
two is required.
To learn the MAC address of a remote host on a broadcast domain, a computer wishing to initiate
communication sends out an ARP broadcast message asking, "Who has this IP address?" Because
it is a broadcast, all hosts on the network segment (broadcast domain) receive this packet and
process it. The ARP packet contains the IP address of the destination host that the sending host
wishes to talk to. When a host receives an ARP packet destined to it, it replies with its own MAC
address. Once the host gets the destination MAC address, it can communicate with the remote host
using the Layer-2 link protocol. This IP-to-MAC mapping is saved in the ARP cache of both the
sending and the receiving host. The next time they need to communicate, they can refer directly to
their respective ARP caches. Reverse ARP is a mechanism where a host knows the MAC address of a
remote host but needs to know its IP address to communicate.
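The role of the ARP cache can be sketched with a small simulation. Nothing here touches a real
network: the class name Host, the LAN dictionary and the MAC addresses are all hypothetical, and
the "broadcast" is just a dictionary lookup standing in for the request and reply.

```python
# Hypothetical broadcast domain: IP address -> MAC address of the owner
LAN = {
    "192.168.1.10": "aa:aa:aa:aa:aa:10",
    "192.168.1.20": "aa:aa:aa:aa:aa:20",
}

class Host:
    def __init__(self, lan):
        self.lan = lan
        self.arp_cache = {}
        self.broadcasts_sent = 0

    def resolve(self, ip):
        """Return the MAC address for ip, broadcasting only on a cache miss."""
        if ip in self.arp_cache:
            return self.arp_cache[ip]
        self.broadcasts_sent += 1       # "Who has this IP address?"
        mac = self.lan.get(ip)          # only the owner of the IP replies
        if mac is not None:
            self.arp_cache[ip] = mac    # remember the mapping for next time
        return mac

host = Host(LAN)
print(host.resolve("192.168.1.20"))     # resolved by broadcast
print(host.resolve("192.168.1.20"))     # answered from the cache
print(host.broadcasts_sent)             # 1
```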
Internet Control Message Protocol (ICMP)
ICMP is network diagnostic and error reporting protocol. ICMP belongs to IP protocol suite and
uses IP as carrier protocol. After constructing ICMP packet, it is encapsulated in IP packet.
Because IP itself is a best-effort non-reliable protocol, so is ICMP.
Any feedback about network is sent back to the originating host. If some error in the network
occurs, it is reported by means of ICMP. ICMP contains dozens of diagnostic and error reporting
messages.
ICMP-echo and ICMP-echo-reply are the most commonly used ICMP messages to check the
reachability of end-to-end hosts. When a host receives an ICMP-echo request, it is bound to send
back an ICMP-echo-reply. If there is any problem in the transit network, the ICMP will report
that problem.
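To make the ICMP echo message more concrete, the sketch below builds an echo request in Python
with the struct module. It only constructs the bytes and does not send them, since sending raw
ICMP usually needs elevated privileges. The function names are hypothetical; the layout (type 8,
code 0, checksum, identifier, sequence number) and the 16-bit one's-complement checksum follow
the standard ICMP format.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"                          # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)
    return header + payload

packet = build_echo_request(identifier=1, sequence=1, payload=b"ping")
print(internet_checksum(packet))   # 0 for an intact packet
```

A receiver can verify the packet the same way: recomputing the checksum over the whole message
gives 0 when nothing was corrupted.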
Internet Protocol Version 4 (IPv4)
IPv4 is a 32-bit addressing scheme used as the TCP/IP host addressing mechanism. IP addressing
enables every host on the TCP/IP network to be uniquely identifiable. IPv4 provides a hierarchical
addressing scheme which enables it to divide the network into sub-networks, each with a
well-defined number of hosts. IP addresses are divided into the following classes:
Class A: It uses first octet for network addresses and last three octets for host addressing.
Class B: It uses first two octets for network addresses and last two for host addressing.
Class C: It uses first three octets for network addresses and last one for host addressing.
Class D: It provides a flat IP addressing scheme, in contrast to the hierarchical structure
of the three classes above, and is used for multicast.
Class E: It is reserved for experimental use.
IPv4 also has well-defined address spaces to be used as private addresses (not routable on the
internet) and public addresses (provided by ISPs and routable on the internet). Though IP is not
a reliable protocol, it provides a "best-effort delivery" mechanism.
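The class of an IPv4 address is determined by its first octet, which the following Python sketch
makes explicit. The function name ipv4_class is hypothetical; the private-address check uses the
is_private attribute of the standard ipaddress module.

```python
import ipaddress

def ipv4_class(address: str) -> str:
    """Return the classful category (A to E) based on the first octet."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    if first_octet < 240:
        return "D"
    return "E"

print(ipv4_class("10.1.2.3"))       # A
print(ipv4_class("172.16.0.1"))     # B
print(ipv4_class("192.168.1.1"))    # C
print(ipaddress.ip_address("192.168.1.1").is_private)   # True
print(ipaddress.ip_address("8.8.8.8").is_private)       # False
```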
Internet Protocol Version 6 (IPv6)
Exhaustion of IPv4 addresses gave birth to the next-generation Internet Protocol version 6. IPv6
addresses its nodes with 128-bit wide addresses, providing plenty of address space for future use
on the entire planet or beyond. IPv6 has introduced anycast addressing but has removed the
concept of broadcasting. IPv6 enables devices to self-acquire an IPv6 address and communicate
within that subnet. This auto-configuration removes the dependency on Dynamic Host
Configuration Protocol (DHCP) servers. This way, even if the DHCP server on that subnet is
down, the hosts can communicate with each other.
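A few of these properties can be checked with Python's ipaddress module. The addresses are
illustrative only: 2001:db8::1 comes from the range reserved for documentation, and fe80::1 is a
link-local address of the kind a device can assign to itself.

```python
import ipaddress

addr = ipaddress.ip_address("2001:db8::1")     # documentation-range example
link_local = ipaddress.ip_address("fe80::1")   # self-assigned addresses use fe80::/10

print(addr.version)               # 6
print(addr.max_prefixlen)         # 128
print(link_local.is_link_local)   # True
print(2 ** 128)                   # size of the IPv6 address space
```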
IPv6 provides the new feature of IPv6 mobility. Mobile IPv6-equipped machines can roam around
without needing to change their IP addresses. IPv6 is still in a transition phase and is expected
to replace IPv4 completely in the coming years. At present, relatively few networks run on
IPv6. There are some transition mechanisms available that allow IPv6-enabled networks to
communicate across IPv4 networks. These are:
Tunneling
NAT-PT
Transport Layer Services
The transport layer offers an end-to-end connection between two processes on remote hosts. It
breaks the data it receives into smaller segments, numbers each byte, and hands the segments over
to the lower layer for delivery.
Functions
This layer is the first one which breaks the data supplied by the application
layer into smaller units called segments. It numbers every byte in the segment and
keeps account of them.
This layer ensures that data is received in the same sequence in which it was sent.
This layer provides end-to-end delivery of data between hosts which may or may not
belong to the same subnet.
All server processes that intend to communicate over the network are equipped with
well-known Transport Service Access Points (TSAPs), also known as port numbers.
End-to-End Communication
A process on one host identifies its peer host on remote network by means of TSAPs, also known
as Port numbers. TSAPs are very well defined and a process which is trying to communicate with
its peer knows this in advance.
For example, when a DHCP client wants to communicate with remote DHCP server, it always
requests on port number 67. When a DNS client wants to communicate with remote DNS server, it
always requests on port number 53 (UDP).
The Transmission Control Protocol (TCP) is one of the most important protocols of the Internet
Protocol suite. It is the most widely used protocol for data transmission in communication
networks such as the internet.
Features
TCP is a reliable protocol. That is, the receiver acknowledges the data it receives,
so that the sender always knows whether a data packet has reached the destination
or needs to be resent.
TCP ensures that the data reaches the intended destination in the same order it was sent.
TCP is connection oriented. TCP requires that a connection between two remote points be
established before sending actual data.
TCP provides full-duplex service, i.e. each end can act as both receiver and sender.
Header
Source Port (16-bits): It identifies source port of the application process on the sending
device.
Destination Port (16-bits): It identifies destination port of the application process on the
receiving device.
Sequence Number (32-bits): Sequence number of data bytes of a segment in a session.
Acknowledgement Number (32-bits): When the ACK flag is set, this number contains the
next sequence number of the data byte expected and works as an acknowledgement of the
previous data received.
Data Offset (4-bits): This field implies both, the size of TCP header (32-bit words) and
the offset of data in current packet in the whole TCP segment.
Reserved (3-bits): Reserved for future use and all are set zero by default.
NS: Nonce Sum bit is used by Explicit Congestion Notification signaling process.
CWR: When a host receives a packet with the ECE bit set, it sets Congestion Window
Reduced to acknowledge that the ECE was received.
ECE: It has two meanings:
If the SYN bit is clear (0), ECE means that the IP packet has its CE (congestion
experienced) bit set.
If the SYN bit is set to 1, ECE means that the device is ECN capable.
URG: It indicates that the Urgent Pointer field has significant data and should be processed.
ACK: It indicates that the Acknowledgement field has significance. If ACK is cleared to 0,
it indicates that the packet does not contain any acknowledgement.
PSH: When set, it is a request to the receiving station to PUSH data as soon as it comes
to the receiving application without buffering it.
RST: Reset flag has the following features:
It is used to refuse an incoming connection.
It is used to reject a segment.
It is used to restart a connection.
SYN: This flag is used to set up a connection between hosts.
FIN: This flag is used to release a connection and no more data is exchanged thereafter.
Because packets with SYN and FIN flags have sequence numbers, they are processed in
correct order.
Window Size: This field is used for flow control between two stations and indicates the
amount of buffer (in bytes) the receiver has allocated for a segment, i.e. how much data
the receiver is expecting.
Checksum: This field contains the checksum of Header, Data, and Pseudo Headers.
Urgent Pointer: It points to the urgent data byte if URG flag is set to 1.
Options: It facilitates additional options which are not covered by the regular header.
Option field is always described in 32-bit words. If this field contains data less than 32-bit,
padding is used to cover the remaining bits to reach 32- bit boundary.
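The fixed part of the header described above is 20 bytes long and can be decoded with Python's
struct module. The sketch below is illustrative only: parse_tcp_header is a hypothetical helper,
options are ignored, and the sample segment is built by hand rather than captured from a network.

```python
import struct

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR", "NS"]

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are ignored)."""
    (src, dst, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "source_port": src,
        "destination_port": dst,
        "sequence": seq,
        "acknowledgement": ack,
        "header_bytes": (offset_flags >> 12) * 4,  # data offset counts 32-bit words
        "flags": [name for bit, name in enumerate(FLAG_NAMES)
                  if offset_flags & (1 << bit)],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }

# Hand-built SYN segment: port 49152 -> 80, sequence 1000, data offset 5 words.
syn = struct.pack("!HHIIHHHH", 49152, 80, 1000, 0, (5 << 12) | 0x002, 65535, 0, 0)
header = parse_tcp_header(syn)
print(header["flags"], header["header_bytes"])   # ['SYN'] 20
```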
Addressing
TCP communication between two remote hosts is done by means of port numbers
(TSAPs). Port numbers range from 0 to 65535 and are divided as follows:
System Ports (0 – 1023)
User Ports (1024 – 49151)
Private/Dynamic Ports (49152 – 65535)
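The three ranges translate directly into a small Python function. The name port_category is a
hypothetical choice for this example.

```python
def port_category(port: int) -> str:
    """Classify a port number into one of the three ranges."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers range from 0 to 65535")
    if port <= 1023:
        return "system"
    if port <= 49151:
        return "user"
    return "private/dynamic"

print(port_category(53))      # system
print(port_category(8080))    # user
print(port_category(50000))   # private/dynamic
```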
Connection Management
TCP communication works in Server/Client model. The client initiates the connection and
the server either accepts or rejects it. Three-way handshaking is used for connection
management.
Establishment
The client initiates the connection and sends a segment with a sequence number. The server
acknowledges it with its own sequence number and an ACK of the client's segment, which is one
more than the client's sequence number. After receiving the ACK of its segment, the client sends
an acknowledgement of the server's response.
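The sequence and acknowledgement numbers exchanged during establishment can be traced with a
short Python sketch. It models only the numbers, not real sockets; the function name and the
initial sequence numbers 100 and 300 are arbitrary assumptions.

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments as (sender, flags, sequence, acknowledgement)."""
    syn = ("client", "SYN", client_isn, None)
    syn_ack = ("server", "SYN+ACK", server_isn, client_isn + 1)
    ack = ("client", "ACK", client_isn + 1, server_isn + 1)
    return [syn, syn_ack, ack]

for step in three_way_handshake(client_isn=100, server_isn=300):
    print(step)
# ('client', 'SYN', 100, None)
# ('server', 'SYN+ACK', 300, 101)
# ('client', 'ACK', 101, 301)
```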
Release
Either the server or the client can send a TCP segment with the FIN flag set to 1. When the
receiving end responds by acknowledging the FIN, that direction of TCP communication is closed and
the connection is released.
Bandwidth Management
TCP uses the concept of window size to accommodate the need for bandwidth management.
The window size tells the sender at the remote end how many bytes of data the receiver at
this end can receive. TCP begins with a slow start phase using window size 1 and increases the
window size exponentially after each successful communication. For example, the client uses a
window size of 2 and sends 2 bytes of data. When the acknowledgement of this segment is received,
the window size is doubled to 4 and the next segment sent will be 4 data bytes long. When the
acknowledgement of the 4-byte data segment is received, the client sets the window size to 8, and
so on. If an acknowledgement is missed, i.e. data is lost in the transit network or a NACK is
received, then the window size is reduced to half and the slow start phase starts again.
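The doubling and halving described above can be traced with a few lines of Python. This follows
the simplified model in the text rather than a real TCP stack, which also tracks a slow start
threshold and a separate congestion window; the function name and event labels are hypothetical.

```python
def window_sizes(events, start=1):
    """Return the window size after each event: 'ack' doubles it, 'loss' halves it."""
    window = start
    sizes = [window]
    for event in events:
        if event == "ack":
            window *= 2
        elif event == "loss":
            window = max(1, window // 2)
        sizes.append(window)
    return sizes

print(window_sizes(["ack", "ack", "ack", "loss", "ack"]))   # [1, 2, 4, 8, 4, 8]
```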
Error Control and Flow Control
TCP uses port numbers to know what application process it needs to handover the data segment.
Along with that, it uses sequence numbers to synchronize itself with the remote host. All data
segments are sent and received with sequence numbers. The Sender knows which last data
segment was received by the Receiver when it gets ACK. The Receiver knows about the last
segment sent by the Sender by referring to the sequence number of recently received packet.
If the sequence number of a segment recently received does not match with the sequence number
the receiver was expecting, then it is discarded and NACK is sent back. If two segments arrive
with the same sequence number, the TCP timestamp value is compared to make a decision.
Congestion Control
When a large amount of data is fed to a system which is not capable of handling it, congestion
occurs. TCP controls congestion by means of the window mechanism. TCP sets a window size
telling the other end how much data to send. TCP may use algorithms such as the following for
congestion control:
Slow Start
Timeout React
Timer Management
TCP uses different types of timers to control and manage various tasks:
Keep-alive timer:
This timer is used to check the integrity and validity of a connection.
When keep-alive time expires, the host sends a probe to check if the connection still exists.
Retransmission timer:
This timer maintains stateful session of data sent.
If the acknowledgement of sent data is not received within the retransmission time, the
data segment is sent again.
Persist timer:
To resume the session a host needs to send Window Size with some larger value.
If this segment never reaches the other end, both ends may wait for each other for infinite
time.
When the Persist timer expires, the host resends its window size to let the other end know.
Persist Timer helps avoid deadlocks in communication.
Timed-Wait:
After releasing a connection, either of the hosts waits for a Timed-Wait time to terminate
the connection completely.
This is in order to make sure that the other end has received the acknowledgement of its
connection termination request.
Timed-Wait can be a maximum of 240 seconds (4 minutes).
Crash Recovery
TCP is a very reliable protocol. It assigns a sequence number to each byte sent in a segment. It
provides a feedback mechanism, i.e. when a host receives a packet, it is bound to ACK that
packet with the next sequence number expected (if it is not the last segment).
When a TCP server crashes midway through communication and restarts its process, it sends a TPDU
broadcast to all its hosts. The hosts can then send the last data segment which was never
acknowledged and carry on.
User Datagram Protocol (UDP)
The User Datagram Protocol (UDP) is the simplest transport layer communication protocol
in the TCP/IP protocol suite. It involves a minimum amount of communication
mechanism. UDP is said to be an unreliable transport protocol, but it uses IP services, which
provide a best-effort delivery mechanism. In UDP, the receiver does not generate an
acknowledgement of a packet received and, in turn, the sender does not wait for any
acknowledgement of a packet sent. This shortcoming makes the protocol unreliable, but it also
makes it lighter on processing.
Requirement of UDP
A question may arise: why do we need an unreliable protocol to transport data? We deploy
UDP where acknowledgement packets would consume a significant amount of bandwidth along with the
actual data. For example, in video streaming, thousands of packets are forwarded towards
the users. Acknowledging all the packets is troublesome and may waste a huge amount of
bandwidth. The best-effort delivery mechanism of the underlying IP protocol does its best
to deliver the packets, but even if some packets in a video stream get lost, the impact is not
calamitous and can be ignored easily. The loss of a few packets in video and voice traffic sometimes
goes unnoticed.
Features
UDP is used when acknowledgement of data does not hold any significance.
UDP is stateless.
UDP is a suitable protocol for streaming applications such as VoIP and multimedia streaming.
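The stateless, acknowledgement-free behaviour listed above can be seen in a short sketch that sends one datagram over the loopback interface. The payload `b"frame-1"` and the two-second receive timeout are illustrative assumptions.

```python
import socket

# Receiver: bind to an OS-chosen free port on the loopback interface.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)           # do not block forever if the datagram is lost
address = receiver.getsockname()

# Sender: no connect(), no handshake, no waiting for an acknowledgement.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"frame-1", address)

payload, source = receiver.recvfrom(1024)
print(payload)

sender.close()
receiver.close()
```

The sender never learns whether the datagram arrived. Any reliability the application needs has to be built on top.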
UDP Header
The UDP header contains four fields:
1. Source Port: This 16-bit field identifies the source port of the packet.
2. Destination Port: This 16-bit field identifies the destination port, i.e. the receiving application.
3. Length: The Length field specifies the entire length of the UDP packet (including the header). It is
a 16-bit field and its minimum value is 8 bytes, i.e., the size of the UDP header itself.
4. Checksum: This field stores the checksum value generated by the sender before
sending. IPv4 treats this field as optional, so when the checksum is not computed,
all its bits are set to zero.
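As a minimal sketch, the standard UDP header layout (source port, destination port, length, checksum, each 16 bits) can be packed and unpacked with Python's `struct` module. The helper names `build_datagram` and `parse_header` and the port numbers used are assumptions for illustration. The checksum is left at zero, which in IPv4 means "not computed".

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")   # four 16-bit fields, network byte order


def build_datagram(src_port, dst_port, payload):
    """Prefix `payload` with a UDP header; checksum 0 means "not computed" in IPv4."""
    length = UDP_HEADER.size + len(payload)      # header (8 bytes) + data
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_header(datagram):
    """Return (source port, destination port, length, checksum)."""
    return UDP_HEADER.unpack(datagram[:UDP_HEADER.size])


datagram = build_datagram(5000, 53, b"hello")
print(parse_header(datagram))  # (5000, 53, 13, 0)
```

An empty payload gives a length of exactly 8, the minimum value mentioned above.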
UDP application
Computer systems and computerized systems help human beings to work efficiently and explore
the unthinkable. When these devices are connected together to form a network, their capabilities are
enhanced many times over. Some basic services a computer network can offer are:
Directory Services
These services map a name to its value, which can be variable or fixed. This
software system helps to store the information, organize it, and provide various means of
accessing it.
Accounting
In an organization, a number of users have user names and passwords mapped to them.
Directory Services provide a means of storing this information in encrypted form and making it available
when requested.
Authentication and Authorization
User credentials are checked to authenticate a user at the time of login and/or periodically. User
accounts can be set into hierarchical structure and their access to resources can be controlled
using authorization schemes.
Domain Name Services
DNS is widely used and is one of the essential services on which the internet works. This system maps
domain names, which are easier to remember and recall, to IP addresses. Because
the network operates with the help of IP addresses and humans tend to remember website names,
DNS returns the IP address mapped to a website's name when a user requests
that name.
File Services
File services include sharing and transferring files over the network.
File Sharing
One of the reasons that gave birth to networking was file sharing. File sharing enables users
to share their data with other users. A user can upload a file to a specific server, which is
accessible by all intended users. As an alternative, a user can share a file on their own
computer and provide access to intended users.
File Transfer
This is the activity of copying or moving a file from one computer to another computer or to multiple
computers, with the help of the underlying network. The network enables its users to locate other users in the
network and transfer files.
Communication Services
Email
Electronic mail is a communication method and something a computer user cannot work without.
It is the basis of today's internet features. An email system has one or more email servers. All its
users are provided with unique IDs. When a user sends email to another user, it is actually
transferred between the users with the help of an email server.
Social Networking
Recent technologies have made technical life social. Computer-savvy people can find other
people they know or friends, connect with them, and share thoughts, pictures, and videos.
Internet Chat
Internet chat provides instant text transfer services between two hosts. Two or more people can
communicate with each other using text-based Internet Relay Chat services. These days, voice
chat and video chat are very common.
Discussion Boards
Discussion boards provide a mechanism to connect multiple people with the same interests. They
enable users to post queries, questions, suggestions etc., which can be seen by all other users.
Others may respond as well.
Remote Access
This service enables a user to access data residing on a remote computer. This feature is
known as Remote Desktop. It can be used from a remote device, e.g. a mobile phone or home
computer.
Application Services
These provide network-based services to users, such as web services,
database management, and resource sharing.
Resource Sharing
To use resources efficiently and economically, a network provides a means to share them. These may
include servers, printers, storage media etc.
Databases
This application service is one of the most important services. It stores data and information,
processes it, and enables the users to retrieve it efficiently by using queries. Databases help
organizations to make decisions based on statistics.
Web Services
The World Wide Web has become a synonym for the internet. It is used to connect to the internet and
access files and information services provided by internet servers.
Application Layer applications
The application layer is the topmost layer in the TCP/IP model. It is responsible for handling high-level
protocols and issues of representation. This layer allows the user to interact with the application.
When one application layer protocol wants to communicate with another application layer, it
forwards its data to the transport layer. There is an ambiguity in the application layer:
not every application can be placed inside it, only those that interact with the
communication system. For example, a text editor cannot be considered part of the application layer, while
a web browser can, because it uses HTTP, an application layer protocol, to interact with the network.
Following are the main protocols used in the application layer:
HTTP: HTTP stands for Hypertext Transfer Protocol. This protocol allows us to access data
over the World Wide Web. It transfers data in the form of plain text, audio, and video. It is known
as the Hypertext Transfer Protocol because it is efficient in a hypertext environment, where
there are rapid jumps from one document to another.
SNMP: SNMP stands for Simple Network Management Protocol. It is a framework used for
managing devices on the internet by using the TCP/IP protocol suite.
SMTP: SMTP stands for Simple Mail Transfer Protocol. It is the TCP/IP protocol that supports
e-mail. This protocol is used to send data to another
e-mail address.
DNS: DNS stands for Domain Name System. An IP address is used to identify the connection of
a host to the internet uniquely. But people prefer to use the names instead of addresses. Therefore,
the system that maps the name to the address is known as Domain Name System.
TELNET: It is an abbreviation for Terminal Network. It establishes the connection between the
local computer and remote computer in such a way that the local terminal appears to be a terminal
at the remote system.
FTP: FTP stands for File Transfer Protocol. FTP is a standard internet protocol used for
transmitting the files from one computer to another computer.
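For quick revision, the protocols listed above can be paired with their well-known port numbers. The port numbers and transports below are the standard IANA assignments and are not stated in the text above. The table name `WELL_KNOWN` and the helper `describe` are illustrative.

```python
# Protocol -> (well-known port, usual transport protocol).
WELL_KNOWN = {
    "FTP":    (21,  "TCP"),   # control connection
    "TELNET": (23,  "TCP"),
    "SMTP":   (25,  "TCP"),
    "DNS":    (53,  "UDP"),   # DNS also uses TCP, e.g. for large responses
    "HTTP":   (80,  "TCP"),
    "SNMP":   (161, "UDP"),
}


def describe(protocol):
    """Return a one-line summary for a protocol name (case-insensitive)."""
    port, transport = WELL_KNOWN[protocol.upper()]
    return f"{protocol.upper()} uses {transport} port {port}"


print(describe("http"))  # HTTP uses TCP port 80
```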
System administration begins with a policy – a decision about what we want and what
should be, in relation to what we can afford.
The highest-level aim in system administration is to work towards a predictable system.
Predictability has limits. It is the basis of reliability, hence trust and therefore security.
Scalable systems are those that grow in accordance with policy; i.e. they continue to
function predictably, even as they increase in size.
Restriction of unnecessary privilege protects a system from accidental and malicious
damage, infection by viruses and prevents users from concealing their actions with false
identities. It is desirable to restrict users' privileges for the greater good of everyone on the
network.
System components
In system administration, the word system is used to refer both to the operating system of a
computer and often, collectively the set of all computers that cooperate in a network.
Network infrastructure
Humans: who use and run the fixed infrastructure, and cause most problems.
Host computers: computer devices that run software. These might be in a fixed location,
or mobile devices.
Network hardware: This covers a variety of specialized devices including the following
key components:
o Routers: dedicated computing devices that direct traffic around the Internet.
Routers talk at the IP address level, or 'layer 3', simplistically speaking.
o Switches: fixed hardware devices that direct traffic around local area networks.
Switches talk at the level of Ethernet or 'layer 2' protocols, in common parlance.
o Cables: There are many types of cable that interconnect devices: fiber optic cables,
twisted pair cables, null-modem cables etc.
An operating system has a number of key elements:
(i) a technical layer of software for driving the hardware of the computer, like disk drives, the
keyboard and the screen;
(ii) a filesystem which provides a way of organizing files logically; and
(iii) a simple user interface which enables users to run their own programs and to manipulate
their files in a simple way.
Of central importance to an operating system is a core software
system or kernel which is responsible for allocating and sharing the resources of the
system between several running programs or processes. It is supplemented by a number of
supporting services (paging, RPC, FTP, WWW etc.) which either assist the kernel or
extend its resource sharing to the network domain.
The operating system can be responsible for sharing the resources of a single computer, but
increasingly we are seeing distributed operating systems in which execution of programs and
sharing of resources happens without regard for hardware boundaries; or network operating
systems in which a central server adds functionality to relatively dumb workstations.
Sometimes programs which do not affect the job of sharing resources are called user
programs. In short, a computer system is composed of many subsystems, some of which are
software systems and some of which are hardware systems. The operating system runs
interactive programs for humans, services for local and distributed users and support programs
which work together to provide the infrastructure which enables machine resources to be
shared between many processes.
Most Unix-like operating systems support symmetric multi-threaded processing and all support simultaneous
logins by multiple users.
The purpose of a multi-user operating system is to allow multiple users to share the resources of a
single host. In order to do this, it is necessary to protect users from one another by giving them a
unique identity or user name and a private login area, i.e. by restricting their privilege. In short,
we need to simulate a virtual workstation for each individual user, with private files and private
processes.
power to a large part of the world. As with all rapid commercial developments, the focus in
developing home operating systems was on immediate functionality, not on planning for the
future. The home computer revolution preceded the network revolution by a number of years, and
these systems did not address security issues. Operating systems developed during this period include
Windows, Macintosh, DOS and Amiga-DOS. All of these systems are completely insecure.
A fundamental prerequisite for security is the ability to restrict access to certain system resources.
The main reason why DOS, Windows 9x and the Macintosh are so susceptible to virus attacks is
that any user can change the operating system's files. Properly configured and bug-free
Unix/NT systems are theoretically immune to such attacks, if privilege is not abused, because
ordinary users do not have the privileges required to change system files. Unfortunately, the key
phrases 'properly configured' and 'bug-free' highlight the flaw in this dream.
Shells or command interpreters
Today it is common for operating systems to provide graphical window systems for all kinds of
tasks. These are often poorly suited to system administration because they only allow us to choose
between pre-programmed operations which the program designers foresaw when they wrote the
program. Most operating systems provide an alternative command line user interface which has
some form of interpreted language, thus allowing users to express what they want with more
freedom and precision. Windows proprietary shells are rudimentary; UNIX shells are rich in
complexity and some of them are available for installation on Windows.
Shells can be used to write simple programs called scripts or batch files which often simplify
repetitive administrative tasks.
Full system auditing involves logging every single operation that the computer performs. This
consumes vast amounts of disk space and CPU time and is generally inadvisable unless one has a
specific reason to audit the system. Part of auditing used to be called system accounting from the
days when computer accounts really were accounts for real money. In the mainframe days, users
would pay for system time in dollars and thus accounting was important since it showed who
owed what [133], but this practice remains mainly on large super-computing installations today
and 'computing farms'.
Auditing has become an issue again in connection with security. Organizations have become
afraid of break-ins from system crackers and want to be able to trace the activities of the system in
order to be able to look back and find out the identity of a cracker. The other side of the coin is
that system accounting is so resource consuming that the loss of performance might be more
important to an organization than the threat of intrusion.
Privileged accounts
Operating systems that restrict user privileges need an account which can be used to configure
and maintain the system. Such an account must have access to the whole system, without regard
for restrictions. It is therefore called a privileged account. In UNIX the privileged account is
called root, also referred to colloquially as the super-user. In Windows, the Administrator account
is similar to UNIX's root, except that the Administrator does not have automatic access to
everything as does root. Instead, he/she must first be granted access to an object. However, the
Administrator always has the right to grant themselves access to a resource, so in practice this
feature just adds an extra level of caution. These accounts place virtually no restriction on what
the account holder can do. In a sense, they provide the privileged user with a skeleton key, a
universal pass to any part of the system.
The two most popular classes of operating system today are Unix-like operating systems (i.e.
those which are either derived from or inspired by System V or BSD) and Microsoft Windows
NT-like operating systems. We shall only discuss Windows NT and later derivatives of the
Windows family, in a network context. For the sake of placing the generalities in this book in a
clearer context, it is useful to compare 'Unix' with Windows.
The file and directory structures of UNIX and Windows are rather different, but it is natural that
both systems have the same basic elements.
Unix-like operating systems are many and varied, but they are basically similar in concept.
Filesystems
Files and filesystems are at the very heart of what system administration is about. Almost every
task in host administration or network configuration involves making changes to files. We need to
acquire a basic understanding of the principles of filesystems, so what better way than to examine
some of the most important filesystems in use today? Specifically, what we are interested in is the
user interfaces to common filesystems, not the technical details, which are rather fickle. We could,
for instance, mention the fact that old filesystems were only 32-bit addressable and therefore
supported a maximum partition size of 2GB or 4GB, depending on their implementation details,
or that newer filesystems are 64-bit addressable and therefore have essentially no storage limits.
We could mention the fact that UNIX uses an index node system of block addressing, while DOS
uses a tabular lookup system: the list goes on. These technical details are of only passing interest
since they change at an alarming pace. What is more constant is the user functionality of the
filesystems: how they allow file access to be restricted to groups of users, and what commands are
necessary to manage this.
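The 2GB and 4GB figures quoted above follow from simple arithmetic. The sketch below assumes byte-granular offsets, and assumes that the "implementation detail" in question is whether the 32-bit offset is stored as a signed or an unsigned integer. Both are assumptions for illustration.

```python
GIB = 2**30   # bytes in one gibibyte
EIB = 2**60   # bytes in one exbibyte

signed_32 = 2**31     # offsets stored in a signed 32-bit integer
unsigned_32 = 2**32   # offsets stored in an unsigned 32-bit integer
unsigned_64 = 2**64   # offsets stored in an unsigned 64-bit integer

print(signed_32 // GIB, "GiB")     # 2 GiB
print(unsigned_32 // GIB, "GiB")   # 4 GiB
print(unsigned_64 // EIB, "EiB")   # 16 EiB
```

Sixteen exbibytes is far beyond current disk sizes, which is why 64-bit filesystems are described as having essentially no storage limit.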
UNIX has a hierarchical filesystem, which makes use of directories and subdirectories to form a
tree. All filesystems on UNIX-like operating systems are based on a system of index nodes, or
inodes, in which every file has an index entry stored in a special part of the filesystem. The inodes
contain an extensible system of pointers to the actual disk blocks which are associated with the
file. The inode contains essential information needed to locate a file on the disk. The top or start
of the UNIX file tree is called the root filesystem or '/'. Although the details of where common
files are located differ for different versions of UNIX, some basic features are the same.
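The index entry of a file can be inspected from Python with `os.stat`. The sketch below creates a file and a hard link to it in a temporary directory and shows that both names refer to the same inode. The file names are hypothetical, and the example assumes a filesystem that supports hard links.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "report.txt")
    alias = os.path.join(workdir, "report-link.txt")

    with open(original, "w") as handle:
        handle.write("hello")

    os.link(original, alias)            # second directory entry, same inode

    info = os.stat(original)
    same_inode = info.st_ino == os.stat(alias).st_ino
    link_count = info.st_nlink          # number of names pointing at the inode
    size = info.st_size                 # size in bytes, stored in the inode

print(same_inode, link_count, size)
```

The file name lives in the directory entry. Attributes such as the size and the link count live in the inode, which is why two names can share them.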
The main subdirectories of the root directory, together with the most important files, are described
below.
/bin Executable (binary) programs. On most systems this is a separate directory from /usr/bin. In
SunOS, this is a pointer (link) to /usr/bin.
/etc Miscellaneous programs and configuration files. This directory has become very messy
over the history of UNIX and has become a dumping ground for almost anything. Recent
versions of UNIX have begun to tidy up this directory by creating subdirectories such as /etc/mail
and /etc/inet.
/usr This contains the main meat of UNIX. This is where application software lives, together
with all of the basic libraries used by the OS.
/usr/bin More executables from the OS.
/usr/sbin Executables that are mainly of interest to system administrators.
/usr/local This is where users' custom software is normally added.
/sbin A special area for (often statically linked) system binaries. They are placed here to
distinguish commands used solely by the system administrator from user commands, and so
that they lie on the system root partition, where they are guaranteed to be accessible during
booting.
/sys This holds the configuration data which go to build the system kernel.
/export Only network servers use this. It contains the disk space set aside for client
machines which do not have their own disks. It is like a 'virtual disk' for diskless clients.
/dev and /devices A place where all the 'logical devices' are collected. These are called
'device nodes' in Unix and are created by mknod. Logical devices are UNIX's official entry
points for writing to devices. For instance, /dev/console is a route to the system console,
while /dev/kmem is a route for reading kernel memory. Device nodes enable devices to be
treated as though they were files.
/home (Called /users on some systems.) Each user has a separate login directory where files
can be kept. These are normally stored under /home by some convention decided by the
system administrator.
/root On newer Unix-like systems, root has been given a home directory which is no longer
the root of the filesystem '/'. The name root then loses its logic.
/var System V and mixed systems have a separate directory for spooling. Under old BSD
systems, /usr/spool contains spool queues and system data. /var/spool and /var/adm etc.
Chapter Review Questions
1. One of the following is correct about transmission mode.
A. In simplex mode, data is transmitted between the source and the receiver in both directions.
B. Data can be transmitted in both directions simultaneously in half duplex.
C. Data is transmitted in both directions alternately in full duplex.
D. Telephone communication is a typical example of full duplex.
2. What does a Network Protocol define?
A. It defines the Syntax of a Message
B. It defines the Semantics of a Message
C. It defines the actions to take on receipt of a message
D. All
3. One of the following is the Default Mask of Class B.
A. 255.0.0.0 C. 255.248.0.0
B. 255.255.255.0 D. 255.255.0.0
4. What is the Class of the IP Address 132.21.34.78?
A. Class A C. Class B
B. Class C D. Class D
5. Select the incorrect statement about network connectivity.
A. Every client computer is equal in the client/server architecture.
B. There is higher security in a peer-to-peer network than in client/server.
C. In peer-to-peer connectivity, there is no dedicated Server.
D. Client /Server network is more expensive than peer-to- peer.
6. Which one of the following pair of IP Addresses are within the same Block (Network)?
A. 189.34.455.25 and 189.34.455.24 C. 191.23.16.55 and 192.23.16.56
B. 246.22.31.66 and 246.22.32.254 D. 221.22.45.66 and 221.22.45.189
7. Which one of the following is an advantage of computer networks?
A. Information Hacking C. Rapid Spread of Computer Viruses
B. Resource Sharing D. Vulnerability to remote exploits
8. One of the following statements is a valid MAC Address.
A. AB-34-SD-44-00-FK C. F0-EE-09-D0-90-AB
B. B0-EE-H9-D0-90-00 D. CC-EE-09-D0-90-FZ
9. One of the following is the most popular LAN Architecture (Technology)?
A. Ethernet C. Token Ring
B. FDDI D. DNS
10. Which one of the following statements is not true?
A. MAC Address is used at the Data link layer.
B. Transport layer uses IP address.
C. IP address is a logical address
D. MAC address is a physical address.
11. The Application Layer Protocol that dynamically assigns IP addresses to hosts is called:
A. HTTP C. DHCP
B. DNS D. SNMP
12. The standard connector used by UTP cables is _______.
A. BNC C. ST
B. RJ-45 D. SC
13. One of the following is wrong about internetworking devices?
A. Repeaters are dedicated to extend the length of cable segments
B. Hub is a multi-port signal regenerator.
C. Router provides connectivity between two different LANs.
D. Both Switch and Hub assign IP addresses to computers
14. One of the following is a parameter used to select either a Peer-to-Peer or Server based
network:
A. Size of the organization C. Level of Network Security required
B. Network budget D. All of the above
15. Which one of the following is true?
A. LAN has no geographic area limitations.
B. CSMA/CD is a type of Random-Access protocol.
C. Logical Topology refers to the way cables and computers are arranged
D. Physical topology refers to the data flow in a computer network
16. A Network Device that has two ports and used only to regenerate a weak signal is:
A. Hub C. Router
B. Switch D. Repeater
17. One of the following is not true?
A. The OSI Reference Model divides communication functions into Seven Layers.
B. The OSI Reference model is developed by the standard organization called ISO.
C. It discourages competition and innovation
D. The OSI Reference Model was developed for Interoperability purpose.
18. As data units are encapsulated down the OSI reference model, which one of the following
is the correct order?
A. User Data → Packet → Frame → Bits → Segment
B. Bits → Frame → Packet → Segment → User Data
C. User Data → Segment → Frame → Bits → Packet
D. User Data → Segment → Packet → Frame → Bits
19. The layer in the OSI Reference model responsible for Data format conversion, Data
compression and Encryption (Decryption) is ______.
A. Presentation C. Data Link
B. Session D. Application
20. What is the purpose of flow control?
A. To ensure that data is retransmitted if an acknowledgment is not received
B. To reassemble segments in the correct order at the destination device
C. To provide a means for the receiver to govern the amount of data sent by the sender
D. To regulate the size of each segment
21. The Technique that converts digital data to analogue signal is called
A. Encoding C. Modulation
B. Demodulation D. Decoding
22. Communication between a radio station and its listeners involves ________transmission.
A. Simplex C. half-duplex
B. full-duplex D. automatic
23. How does a host on an Ethernet LAN know when to transmit after a collision has
occurred?
A. In a CSMA/CD collision domain, multiple stations can successfully transmit data
simultaneously.
B. You can improve the CSMA/CD network by adding more hubs.
C. After a collision, the station that detected the collision has first priority to resend the
lost data.
D. After a collision, all stations run a random backoff algorithm. When the backoff delay
period has expired, all stations have equal priority to transmit data.
24. Which one of the following is not a component of an analogue signal?
A. Discrete pulse C. Amplitude
B. Phase D. Wavelength
25. Which of the following does not describe router functions?
A. Packet switching C. Broadcast forwarding
B. Packet filtering D. Internetwork communication
26. Routers operate at layer __. LAN switches operate at layer __.
A. 3, 2 C. 2, 3
B. 3, 4 D. 2, 1
27. Acknowledgments, sequencing, and flow control are characteristics of which OSI layer?
A. Layer 3 C. Layer 5
B. Layer 4 D. Layer 7
28. Which two of the following are private IP addresses?
A. 12.0.0.1 C. 192.172.19.39
B. 172.40.14.36 D. 192.168.24.43
29. What is the maximum number of IP addresses that can be assigned to hosts on a local
subnet that uses the 255.255.255.224 subnet mask?
A. 14 B. 15 C. 31 D. 30
30. You want to implement a network medium that is not susceptible to EMI. Which type of
cabling should you use?
A. coaxial B. Microwave C. Category 6 UTP D. Fiber-optic
31. Which one of the following routing protocols is mapped incorrectly?
A. RIP-Distance vector C. OSPF-Link state
B. EIGRP-Hybrid D. IGRP-Link State
32. The likelihood of a threat source taking advantage of vulnerability is termed as_______
A. Vulnerability
B. Threat
C. Risk
D. Exposure
33. The term ____ describes viruses, worms, Trojan horse attack applets and attack scripts.
A. Phishing
B. Virus
C. Malware
D. Spam
34. What role does biometrics have in logical access control?
A. Certification
B. Authorization
C. Authentication
D. Confirmation
35. The practice of embedding a message in a document, image, video or sound recording so
that its very existence is hidden is called?
A. Steganography.
B. Shielding.
C. Data diddling.
D. Anonymity.
36. A cryptographic _______________ is an algorithm that takes an arbitrary amount of data
input and produces a fixed-size output.
A. Hash Function
B. Encryption
C. Digital Signature
D. None
37. In __________ attack, the victim is targeted from a large number of individual
compromised systems simultaneously.
A. DDoS
B. Rootkit
C. DoS
D. Spyware
38. A form of cryptosystem in which encryption and decryption are performed using
different keys is:
A. Symmetric key encryption
B. Secret- key
C. Conventional encryption.
D. Asymmetric encryption
39. Copyright provides what form of protection:
A. Protects an author's right to distribute his/her works.
B. Protects information that provides a competitive advantage.
C. Protects the right of an author to prevent unauthorized use of his/her works.
D. Protects the right of an author to prevent viewing of his/her works.
40. Which one of the following users is responsible for creating accounts for all other users and
assigning roles to them in the database server?
A. End user
B. Application user
C. Database administrator
D. Application Programmer
Threat
• A set of circumstances that has the potential to cause loss or harm
• A potential for violation of security
Risk
• The potential for loss of or damage to information assets.
Control
• An action, device, procedure or technique that eliminates or reduces a vulnerability.
Who is vulnerable?
Financial institutions and banks
Internet service providers
Pharmaceutical companies
Government and defense agencies
Contractors to various government agencies
Multinational corporations
Anyone on the network
Why do we need security?
Protecting systems against attacks
Protect vital information while still allowing access to those who need it
• Trade secrets, medical records, etc.
Provide authentication and access control for resources
Guarantee availability of resources
To find means of mitigating risks
What should we protect?
Determining what to protect requires that we first identify what has value and to whom
Assets include:
Hardware
• Computer component
• Network and communication channels
• Mobile devices
Software
• Operating system
• Off-the-shelf programs and apps
• Custom or customized programs and apps
Data
• Files
• Databases
Computer Security Goals
The three primary security goals are Confidentiality, Integrity and Availability
1. Confidentiality
the ability of a system to ensure that assets are viewable only by authorized parties
2. Integrity
the ability of a system to ensure that assets are modifiable only by authorized parties
3. Availability
the ability of a system to ensure that assets are usable by and accessible to all
authorized parties
Additional goals of computer security
Aside from CIA, authentication, non-repudiation and auditability are also desirable system
properties.
Authentication:
the ability of a system to confirm the identity of a sender
Non-repudiation
the ability of a system to confirm that a sender cannot convincingly deny having sent a
message
Auditability
the ability of a system to trace all actions related to a given asset
Harmful acts
Harm to an information system can be effected in four different ways:
1. Interception
2. Interruption
3. Modification
4. Fabrication
Each of these four acts can cause harm to a system by affecting its ability to ensure
confidentiality, integrity and availability
Additional Details to CIA
Confidentiality
Confidentiality is roughly equivalent to privacy and avoids the unauthorized disclosure of
information. It involves the protection of data, providing access for those who are allowed to see
it while disallowing others from learning anything about its content. It prevents essential
information from reaching the wrong people while making sure that the right people can get it.
Data encryption is a good example to ensure confidentiality.
Use the "need to know" basis for data access
How do we know who needs which data?
Access control specifies who can access what
How do we know a user is the person or system that they claim to be?
Need to verify their identity
Identification and authentication
Similarly, access to physical assets should be granted only on a "need" basis
Example: access to a data center, a computer room, or use of a desktop
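The need-to-know idea above can be sketched as a simple access-control list with a default-deny rule. The user names, resource names and the helper `can_access` are hypothetical. A real system would first authenticate the user before consulting such a list.

```python
# Hypothetical access-control list: resource -> users who need access to it.
ACL = {
    "payroll.db": {"hana", "samuel"},
    "server-room": {"samuel"},
}


def can_access(user, resource):
    """Grant access only if the user is explicitly listed for the resource."""
    return user in ACL.get(resource, set())


print(can_access("hana", "payroll.db"))     # listed, so access is granted
print(can_access("hana", "server-room"))    # not listed, so access is denied
print(can_access("hana", "backup-tapes"))   # unknown resource, denied by default
```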
Confidentiality is:
Difficult to ensure
Easiest to assess in terms of success
Tools for Confidentiality
Integrity
Integrity refers to the methods for ensuring that data is real, accurate and safeguarded from
unauthorized user modification. It is the property that information has not been altered in an
unauthorized way, and that the source of the information is genuine.
Integrity vs. confidentiality
Integrity is concerned with unauthorized modification of assets
Confidentiality is concerned with access to assets
Integrity is more difficult to measure than confidentiality
Integrity is not binary -there are degrees of integrity
Integrity is context dependent-integrity means different things in different situations
Integrity could refer to any combination of precision, accuracy, currency, consistency,
meaningfulness, usefulness etc.
Tools for Integrity
Availability
Availability is the property in which information is accessible and modifiable in a timely fashion
by those authorized to do so. It is the guarantee of reliable and constant access to our sensitive
data by authorized people.
Not understood very well yet
Availability is a complex issue
• Context-dependent
• Could refer to any combination of asset properties
Example: usefulness, sufficient capacity, progressing at a proper pace, completion in an
acceptable amount of time
An asset can be considered available when there is:-
• A timely request response
• Fair allocation of resources (no starvation)
• Fault tolerance (no total breakdown)
• Ease of use
• Controlled concurrency (concurrency control, deadlock control etc.)
Tools for Availability
o Physical Protections
o Computational Redundancies
Computer Security threats
In computer security, a threat is a possible danger that might exploit a vulnerability to breach
security and therefore cause possible harm.
A threat can be either intentional (e.g., hacking by an individual cracker or a criminal
organization) or accidental (e.g., the possibility of a computer malfunctioning, or of a natural
disaster such as an earthquake, fire, tornado, or hurricane), or otherwise a circumstance,
capability, action, or event.
Threat classification
Threat communities
Threat communities are subsets of the overall threat agent population that share key
characteristics. The notion of threat communities is a powerful tool for understanding who and
what we're up against as we try to manage risk.
If the organization were to come under attack, what components of the organization would be
likely targets? For example, how likely is it that terrorists would target the company information
or systems?
The following threat communities are examples of the human malicious threat landscape
many organizations face:
1. Insiders (Internal)
• Employees
• Contractors (and vendors)
• Partners
2. Outsiders (External)
• Cyber-criminals (professional hackers and crackers)
• Spies
• Non-professional hackers
• Activists
• Nation-state intelligence services (e.g., counterparts to the CIA, etc.)
• Malware (virus/worm/etc.) authors
Malicious Code (Malware)
Virus
A hidden, self-replicating section of computer software, usually malicious logic, that propagates
by infecting another program or system memory.
Viruses can be divided into two groups:
A transient virus is active only when its host program is active.
A resident virus establishes itself in the computer's memory and can remain active without its
host.
Worm
A computer program that can run independently, can propagate a complete working version of
itself on to other hosts in a network, and may consume computer resources destructively.
Trojan horse
A computer program that appears to have a useful function but also has a hidden and malicious
purpose that evades security mechanisms, sometimes by exploiting the legitimate authorizations
of the user who invokes the program.
Example: you download a game app for your smartphone. When you launch the app, you are able to
play the game, but the app secretly copies your contact list and transfers the information to a
remote server.
Zombie
Malicious software that enables a computer to be controlled by a remote master machine
Logic bomb
Malicious program logic that activates when specified conditions are met.
Time bomb
A type of logic bomb that activates at a specific date/time
Hiding a virus
Viruses can be hidden in many places, for example in:
• boot sector
• memory
• application programs
• library files (e.g., .dll files)
• other widely shared files and programs
Network security attacks
Network advantages
• Resource sharing
• Distribution of workload
• Increased reliability
• easy expandability and scalability
Network vulnerability
Several characteristics make networks vulnerable to attack, including:
• Anonymity
• Many points of attack
• Resource and workload sharing
• Complex network architecture
• Unknown network boundaries
Example: wireless nodes
Software Threats
Adversary
An adversary (a person, hacker, or cracker who is interested in attacking your network) can use
any kind of attack to threaten the network infrastructure. A network may face several different
attacks from an adversary. The following section covers some of the most common ones.
Computer Software Security threats
Reconnaissance Attack (Investigation)
In this kind of attack, an adversary collects as much information about your network as is
needed for other attacks.
This information includes the IP address range, server locations, running OS, software versions,
types of devices, etc.
Packet capturing software, the ping command, the traceroute command, and whois lookups are
some example tools that can be used to collect this information. The adversary uses this
information to map your infrastructure for the next possible attack.
Passive attack
In this attack an adversary deploys a sniffer tool and waits for sensitive information to be
captured. This information can be used for other types of attacks.
Passive attacks include the use of packet sniffer tools and traffic analysis software, filtering
clear text passwords from unencrypted traffic, and seeking authentication information in
unprotected communication. Once an adversary finds any sensitive or authentication information,
they use it without the knowledge of the user.
Active Attack
In this attack an adversary does not wait for sensitive or authentication information to appear,
but actively tries to break or bypass the secured systems. Active attacks include viruses, worms,
Trojan horses, stealing login information, inserting malicious code, and penetrating the network
backbone. Active attacks are the most dangerous in nature. They result in disclosure of sensitive
information, modification of data, or complete data loss.
Distributed Attack
In this attack an adversary hides malicious code in trusted software. Later this software is
distributed to many other users through the internet without their knowledge. Once an end user
installs the infected software, it starts silently sending sensitive information to the adversary.
Pirated software is heavily used for this purpose.
Insider Attack
According to a survey, more than 70% of attacks come from insiders. Insider attacks fall into two
categories: intentional and accidental. In an intentional attack, the attacker deliberately
damages network infrastructure or data. Intentional attacks are usually carried out by disgruntled
or frustrated employees for money or revenge. In an accidental attack, the damage is caused by
carelessness or lack of knowledge.
Hijacking
This attack usually takes place during a running session. The hacker joins a running session and
silently disconnects one party, then starts communicating with the remaining party using the
identity of the disconnected one. The remaining party believes it is still talking with the
original party and may send sensitive information to the adversary.
Phishing
Phishing attacks have been gaining popularity over the last couple of years. In this attack an
adversary creates a fake email address or website that looks like a reputable mail address or
popular site. The attacker then sends email under that name. These emails contain a convincing
message, sometimes with a link that leads to a fake site that looks exactly like the original.
Without knowing the truth, the user tries to log on with their account information; the hacker
records this authentication information and uses it on the real site.
Spoofing
In this kind of attack an adversary changes the source address of a packet so that the receiver
assumes the packet comes from someone else. This technique is typically used to bypass firewall
rules.
Buffer overflow attack
This attack can be used as a DoS technique. An adversary sends more data to an application than
its buffer can hold. The overflow can corrupt adjacent memory and cause the service to fail, so
the attack is often used to halt a service or server.
Exploit attack
An exploit attack is used after a reconnaissance attack. Once an attacker has learned from
reconnaissance which OS or software is running on the target system, they start exploiting
vulnerabilities in that particular software or OS.
Packet capturing attack
This attack is a form of passive attack. The attacker uses packet capturing software that
captures all packets from the wire, and later extracts information from these packets. This
information can be used to mount several other kinds of attack.
Ping sweep attack
In this attack an attacker pings all possible IP addresses on a subnet to find out which hosts are
up. Once a live system is found, the attacker scans it for listening ports. From the listening
ports, the attacker can learn which services are running on that system and then try to exploit
the vulnerabilities associated with those services.
DNS Query attack
DNS queries are used to discover information about public servers on the internet. Most operating
systems include tools for DNS queries, such as nslookup in Windows, and dig and host in Linux.
These tools query a DNS server for information about a specified domain. The DNS server responds
with information about the domain such as server IP addresses, mail servers, and technical
contacts. An adversary can use this information in a phishing or ping sweep attack.
MiTM attacks
In this attack an adversary captures data in the middle of transmission, changes it, and then
sends it on to the destination. The receiver thinks that the message came from the original
source. For example, in a share trading company Jack sends a message to Rick telling him to hold
the shares. An adversary intercepts this message and alters it so that it looks like Jack is
telling Rick to sell. When Rick receives the message, he believes Jack wants him to sell, and he
sells the shares. This is known as a man-in-the-middle (MiTM) attack.
Botnets
Botnets are armies of remote-controlled devices used for sending spam (including phishing
scams), propagating malware, and launching DDoS attacks. In terms of the scope of damage they
cause, botnets are the prime mover of most cyber security threats in communication service
provider (CSP) networks across the globe.
Computer Security Techniques
What is biometrics?
From the Greek bios (life) and metron (measure), the term "biometrics" refers to technologies
for measuring and analyzing a person's physiological or behavioral characteristics. In practice,
biometrics is used to protect network and physical security through physical and behavioral
biometric techniques.
Biometrics is the automatic recognition of people based on their anatomical (e.g., face,
fingerprint, iris, retina) and behavioral (e.g., signature, posture) characteristics. It is a form
of information that helps identify a person by physical and behavioral traits.
Biometrics
The physical biometric techniques include
• Fingerprinting
• hand and finger geometry
• facial recognition
• iris and retinal scanning
• Vascular pattern recognition.
Behavioral biometric techniques include
• speaker and voice recognition
• signature verification, and
• Keystroke dynamics.
Mitigating network security threats
Port scanning
A port scanner is a software program designed to examine one or more IP addresses and record
which ports are open and which known vulnerabilities are present. A network administrator or
security analyst can use a port scanner to evaluate the strengths and weaknesses of a network.
• An attacker can also use a port scanner to assess how or at which point to attack
• High-quality port scanners are freely available to white hats and black hats alike.
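As an illustration of the idea, the following is a minimal sketch of a TCP connect scan written
with Python's standard socket module. The function name scan_ports and the timeout value are
assumptions made for this example; real scanners are far more capable, and a scan should only be
run against hosts you are authorized to test.

```python
import socket

def scan_ports(host, ports, timeout=0.5):
    """Return the ports in `ports` that accept a TCP connection on `host`."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports

if __name__ == "__main__":
    # Scan a few well-known ports on the local machine only.
    print(scan_ports("127.0.0.1", [22, 80, 443]))
```

A port that accepts the connection is reported as open; closed or filtered ports are skipped.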
Segmenting network
One way of controlling threats from port scanners is to implement network segmentation. In a
segmented network, many hosts belong to protected sub-networks that are not directly visible
to the outside world.
Authentication, Authorization and Accounting (AAA)
Authentication
• The Process of identifying a user or computer
Authorization
• The process of determining level of access for user or computer
Accounting
• The process of keeping a log of activity by a user or computer
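The three AAA steps can be sketched in a few lines of Python. Everything named here (the USERS
and PERMISSIONS tables, the handle_request function, the sample passwords) is hypothetical and
exists only for illustration; a production system would rely on a dedicated AAA server and a
proper password-hashing scheme rather than a plain SHA-256 digest.

```python
import hashlib
import hmac

# Hypothetical data for illustration only.
USERS = {
    "alice": {"pw_hash": hashlib.sha256(b"s3cret").hexdigest(), "role": "admin"},
    "bob": {"pw_hash": hashlib.sha256(b"hunter2").hexdigest(), "role": "guest"},
}
PERMISSIONS = {"admin": {"read", "write"}, "guest": {"read"}}
AUDIT_LOG = []  # accounting: one record per request

def authenticate(user, password):
    """Authentication: is this really the claimed user?"""
    record = USERS.get(user)
    if record is None:
        return False
    supplied = hashlib.sha256(password.encode()).hexdigest()
    return hmac.compare_digest(supplied, record["pw_hash"])

def authorize(user, action):
    """Authorization: what level of access does this user have?"""
    role = USERS[user]["role"]
    return action in PERMISSIONS.get(role, set())

def handle_request(user, password, action):
    """Run authentication, then authorization, and log the outcome."""
    allowed = authenticate(user, password) and authorize(user, action)
    AUDIT_LOG.append((user, action, "allowed" if allowed else "denied"))
    return allowed
```

Every request leaves a record in AUDIT_LOG whether it was allowed or denied, which is the
accounting part of AAA.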
User Authentication
Two widely used protocols provide authentication, authorization and accounting:
• Remote Authentication Dial-In User Service (RADIUS)
• Terminal Access Controller Access-Control System Plus (TACACS+)
One significant difference is that TACACS+ relies on TCP connections while RADIUS uses
UDP.
What is firewall?
Firewalls are an indispensable tool for securing computer networks. A firewall is a device
(hardware, software, or both) that is designed to:
• Prevent unauthorized outside users from accessing a network or workstation
• Prevent inside users from transmitting sensitive information or accessing unsecure
resources.
A firewall protects a local network from the outside global network. Firewalls work by inspecting
each inbound or outbound packet and determining whether it should be blocked or allowed to
pass through. A properly implemented firewall can reduce or eliminate many network threats. It
can be implemented in a router, gateway, or special host.
Types of firewall
a. Packet filtering gateways
b. Stateless and stateful firewalls
c. Application proxy gateways
d. Circuit-level gateways
e. Guard firewalls
f. Personal firewalls
a. Packet filtering gateway
All internet traffic in the network takes the form of packets. A packet consists of the following
information:
• Source IP address
• Destination IP address
• The data
• Error checking information
• Protocol information
• Additional options
b. Stateless and Stateful Firewall
Stateful filtering is the most modern firewall approach; it combines the capabilities of NAT
firewalls, circuit-level firewalls, and application firewalls into a common system.
This approach validates a connection before allowing data to be transferred. These firewalls
filter traffic initially by packet characteristics and rules, and also include a session
validation check to make sure that the specific session is allowed.
Stateless firewalls watch the traffic packet by packet and filter it based on the firewall's
individual rules. Each packet is individually checked and filtered.
They do not attempt to correlate a packet with the packets that came before it in order to judge
whether there is malicious potential or intent. However, it is often necessary to watch a set of
packets between a source and a destination to infer malicious intent. Stateful firewalls can
watch traffic streams from end to end.
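The difference between the two approaches can be sketched as follows. This is a simplified
model, not how any particular firewall product works: the Packet fields, the ALLOWED_PORTS rule
set, and the use of a SYN flag to mark the start of a session are assumptions made for the
example.

```python
from collections import namedtuple

# Simplified packet: source, destination, destination port, and whether it opens a connection.
Packet = namedtuple("Packet", "src dst dport syn")

ALLOWED_PORTS = {80, 443}  # hypothetical rule set

def stateless_allow(pkt):
    """Stateless: judge each packet alone, using only its own header fields."""
    return pkt.dport in ALLOWED_PORTS

class StatefulFirewall:
    """Stateful: remember validated sessions and accept only packets that belong to one."""

    def __init__(self):
        self.sessions = set()

    def allow(self, pkt):
        key = (pkt.src, pkt.dst, pkt.dport)
        if pkt.syn:
            # A new connection is checked against the rules, then recorded.
            if stateless_allow(pkt):
                self.sessions.add(key)
                return True
            return False
        # Any other packet must belong to a session that was already validated.
        return key in self.sessions
```

A packet that claims to be mid-connection on port 80 passes the stateless check but is rejected
by the stateful firewall unless an opening packet for that session was seen first.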
c. Application proxy gateway
These firewalls understand and work at layer 7 of the OSI model, i.e., the application layer of
the network stack. An application firewall inspects the payload of the IP packet, which contains
a TCP/UDP segment, and within that segment it inspects the application-layer data.
d. Circuit-level gateway
Circuit-level filtering works at the session layer of the OSI model. Traffic to the remote
computer is made to appear as though it originated from the circuit-level firewall. This
modification partially hides information about the protected network, but has the drawback that
it does not filter individual packets in a given connection.
Cryptography
What is Cryptography?
Cryptography (from the Greek kryptos, "hidden, secret", and graphein, "writing") is the practice
and study of techniques for secure communication in the presence of third parties called
adversaries. The process of converting plaintext to cipher text is known as encryption; restoring
the plaintext from the cipher text is decryption.
Cryptographic Terminology
Encryption
The process of encoding a message so that its meaning is not obvious
Encryption: the process of converting an original message into a form that cannot be
understood by unauthorized individuals.
Cryptography
The art / science of keeping a message secure
Cryptanalysis
The art / science of breaking cipher text
Cryptanalysis: Techniques used for deciphering a message without any knowledge of the
enciphering details fall into the area of cryptanalysis. Cryptanalysis is what the lay person calls
"breaking the code." It is about analyzing hidden messages using a statistical/analytical
approach.
Cryptology
Cryptography + cryptanalysis
Cryptology is the science of encryption and decryption. It encompasses two disciplines:
cryptography and cryptanalysis together are called cryptology. It is concerned more broadly
with reading hidden messages.
Plain text (Also known as clear text)
The original form of a message (unencrypted form of a message)
By convention written in UPPERCASE
Cipher text (Cryptogram)
The encrypted form of a message
By convention written in lowercase
A key or crypto variable: the information used in conjunction with the algorithm to create the
cipher text from the plaintext. It can be a series of bits used in a mathematical algorithm or the
knowledge of how to manipulate the plaintext
Key space: the entire range of values that can possibly be used to construct an individual key
Cryptosystems: The combination of algorithm, key and key management functions used to
perform cryptographic operations.
Steganography: The process of hiding messages, usually within graphic images
To encrypt a message, we need
an encryption algorithm
an encryption key
The plaintext.
Types of cryptography
We can divide all the cryptography algorithms (ciphers) into two groups:
Symmetric key (also called secret-key) cryptography algorithms and
Asymmetric (also called public-key) cryptography algorithms
Symmetric key cryptography
In symmetric-key cryptography, the same key is used by both parties. The sender uses this key
and an encryption algorithm to encrypt data; the receiver uses the same key and the
corresponding decryption algorithm to decrypt the data. The key is shared.
Symmetric encryption is a form of cryptosystem in which encryption and decryption are
performed using the same key. It is also known as conventional encryption
In a transposition cipher, there is no substitution of characters; instead, their locations
change. A character in the first position of the plaintext may appear in the tenth position of
the cipher text. A character in the eighth position may appear in the first position. In other
words, a transposition cipher reorders (permutes) the symbols in a block of symbols. The simplest
such cipher is the rail fence technique, in which the plaintext is written down as a sequence of
diagonals and then read off as a sequence of rows.
For example, to encipher the message "meet me after the toga party" with a rail fence of depth 2,
we write the following:
mematrhtgpry
etefeteoaat
The encrypted message is mematrhtgpryetefeteoaat.
This sort of thing would be trivial to cryptanalyze.
Similarly, encrypt "Let us meet at hawassa" by yourself.
ltsettaas
eumeahwsa
The encrypted message is ltsettaaseumeahwsa.
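The depth-2 rail fence used in both examples can be sketched in Python as follows. The function
names are chosen for this illustration, and the sketch assumes, as the examples do, that spaces
are dropped and letters are lowercased before enciphering.

```python
def rail_fence_encrypt(plaintext):
    """Depth-2 rail fence: letters at even positions form the top row, odd positions the bottom."""
    letters = [c for c in plaintext.lower() if c.isalpha()]
    return "".join(letters[0::2]) + "".join(letters[1::2])

def rail_fence_decrypt(ciphertext):
    """Split the cipher text back into its two rows and interleave them."""
    half = (len(ciphertext) + 1) // 2  # the top row holds the extra letter when the length is odd
    top, bottom = ciphertext[:half], ciphertext[half:]
    out = []
    for i in range(len(ciphertext)):
        out.append(top[i // 2] if i % 2 == 0 else bottom[i // 2])
    return "".join(out)

if __name__ == "__main__":
    print(rail_fence_encrypt("meet me after the toga party"))
    print(rail_fence_encrypt("Let us meet at hawassa"))
```

Decryption needs no key beyond the depth, which is why the cipher is trivial to break.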
Simple modern ciphers
The traditional ciphers we have studied so far are character-oriented. With the advent of the
computer, ciphers need to be bit-oriented. This is so because the information to be encrypted is
not just text; it can also consist of numbers, graphics, audio, and video data.
It is convenient to convert these types of data into a stream of bits, encrypt the stream, and then
send the encrypted stream. In addition, when text is treated at the bit level, each character is
replaced by 8 (or 16) bits, which means the number of symbols becomes 8 (or 16) times larger.
Mingling and mangling bits provides more security than mingling and mangling characters. Modern
ciphers use a different strategy than the traditional ones. A modern symmetric cipher is a
combination of simple ciphers. In other words, a modern cipher uses several simple ciphers to
achieve its goal. We first discuss these simple ciphers.
Simple modern cipher examples
One-time pad cipher
XOR Cipher
Stream Ciphers
Block Ciphers
Diffusion and confusion
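As a small illustration of a bit-oriented cipher, the sketch below XORs a message with a key.
When the key is random, as long as the message, and used only once, this is the one-time pad
idea. The function name xor_bytes and the sample message are assumptions made for this example.

```python
import secrets

def xor_bytes(data, key):
    """XOR each byte of `data` with the key, repeating the key if it is shorter."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

message = b"HOLD THE SHARES"
pad = secrets.token_bytes(len(message))  # random key as long as the message
ciphertext = xor_bytes(message, pad)
recovered = xor_bytes(ciphertext, pad)   # XOR with the same key undoes the encryption
```

Because XOR is its own inverse, the same function serves for both encryption and decryption.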
Symmetric-key cryptography Algorithms
Public Key Cryptography
Public key cryptography was the first revolution in cryptography in hundreds of years. It was
originally introduced in the 1976 paper "New Directions in Cryptography" by Diffie and Hellman.
In asymmetric or public-key cryptography, there are two keys: a private key and a public key. The
private key is kept by the receiver. The public key is announced to the public.
Characteristics of public key encryption
Note
Two encryption/decryption possibilities exist in a key pair:
1. The plaintext is encrypted with the private key and decrypted with the public key.
2. The plaintext is encrypted with the public key and decrypted with the private key.
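Both possibilities can be seen in a toy RSA key pair. The primes and exponent below are small
textbook-sized values chosen for this sketch; they offer no security, and real systems use very
large keys together with padding schemes.

```python
# Toy RSA key pair (illustrative values only, far too small to be secure).
p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse; needs Python 3.8+)

def apply_key(value, exponent):
    """Raise `value` to `exponent` modulo n; used for both keys."""
    return pow(value, exponent, n)

m = 65  # a message encoded as a number smaller than n

# Possibility 2: encrypt with the public key, decrypt with the private key (confidentiality).
c = apply_key(m, e)
assert apply_key(c, d) == m

# Possibility 1: transform with the private key, recover with the public key (signing).
s = apply_key(m, d)
assert apply_key(s, e) == m
```

The second use is the basis of digital signatures: anyone can check the result with the public
key, but only the holder of the private key could have produced it.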
Given the usefulness of public key encryption, do we still need symmetric key encryption?
Absolutely. Symmetric key encryption systems are typically far faster than public key
encryption; a figure often quoted is around 10,000 times faster.
How it works?
Public Key Cryptography: Security Uses
Authentication
Digital Signatures
Proving that a message is generated by a particular individual
Non-repudiation: the signing individual cannot deny having signed, because only he or she knows
the private key.
Hash function:
A cryptographic hash function is a mathematical transformation that takes a message of arbitrary
length and computes from it a fixed-length (short) number.
Properties
(Let the hash of a message m be h(m))
• For any m, it is relatively easy to compute h(m).
• Given h(m), there is no way to find an m that hashes to h(m) that is substantially easier than
going through all possible values of m and computing h(m) for each one.
• It is computationally infeasible to find two values that hash to the same thing.
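The fixed-length property is easy to observe with Python's standard hashlib module. SHA-256 is
used here only as one example of a cryptographic hash function; the helper name h mirrors the
notation above.

```python
import hashlib

def h(message: bytes) -> str:
    """Return the SHA-256 hash of `message` as a hexadecimal string."""
    return hashlib.sha256(message).hexdigest()

short_digest = h(b"abc")
long_digest = h(b"a" * 10_000)

# Inputs of very different lengths give outputs of the same length (64 hex digits = 256 bits).
print(len(short_digest), len(long_digest))
# A one-character change in the input gives a completely different hash.
print(h(b"abc") == h(b"abd"))
```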
Password hashing
The system stores a hash of the password (not the password itself). When a password is supplied,
it computes the password's hash and compares it with the stored value.
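A minimal sketch of this store-and-compare scheme is shown below. It goes slightly beyond the
description above by adding a random salt and a deliberately slow key-derivation function
(PBKDF2), which is common practice; the function names and the iteration count are assumptions
made for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor

def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, stored_digest):
    """Recompute the hash of the supplied password and compare it with the stored value."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)

# Registration: only the salt and digest are stored, never the password itself.
salt, stored = hash_password("correct horse")
```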
Message integrity
Using cryptographic hash functions to generate a MAC
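One standard way to build a MAC from a hash function is HMAC, available in Python's hmac module.
The sketch below assumes the sender and receiver already share a secret key; the key and
message values are made up for the example.

```python
import hashlib
import hmac

def make_mac(key, message):
    """Compute an HMAC-SHA256 tag over `message` using the shared secret `key`."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify_mac(key, message, tag):
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_mac(key, message), tag)

shared_key = b"hypothetical shared secret"
tag = make_mac(shared_key, b"hold the shares")
```

If the message is altered in transit, the recomputed tag no longer matches, so the receiver can
detect the modification.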
Public key Cryptography Algorithms
RSA (Rivest-Shamir-Adleman) algorithm
Elliptic Curve Digital Signature Algorithm (ECDSA)
Elliptic-curve Diffie - Hellman (ECDH)
Program Security
TOCTTOU (Time of Check to Time of Use)
Undocumented access point
Controls against program threats
We can control program threats through the following
Sandboxing
Modularity
Encapsulation
information hiding
Mutual suspicion and confinement
Genetic diversity
Review Questions (Computer Security and Privacy)
6. What is the inverse of confidentiality, integrity, and availability (C.I.A.) triad in risk
management?
A. misuse, exposure, destruction
B. authorization, non-repudiation, integrity
C. disclosure, alteration, destruction
D. confidentiality, integrity, availability
7. The practice of embedding a message in a document, image, video or sound recording
so that its very existence is hidden is called?
A. Anonymity.
B. Steganography.
C. Shielding.
D. Data diddling.
8. ____ are devices or programs that control the flow of network traffic between networks
or hosts that employ differing security postures
A. Spoofing
B. Scanning
C. Network address translation
D. Firewall
9. Which of the following is not mandatory security of the operating system?
A. Access authorization
B. Trustworthiness
C. Authentication usage
D. Cryptographic usage
10. A form of cryptosystem in which encryption and decryption are performed using the
same key is:
A. Symmetric key encryption
B. Conventional encryption.
C. Asymmetric encryption
D. All but C
11. Which of the following entity is ultimately responsible for information security within
an organization?
A. IT Security Officer
B. Project Managers
C. Department Directors
D. Senior Management
12. The absence of which element of the CIA triad most directly leads to denial of service (DoS)?
A. confidentiality
B. availability
C. integrity
D. none
13. Cryptography does not concern itself with:
A. Availability
B. Authenticity
C. Integrity
D. Confidentiality
14. Existence of a weakness, design or implementation error that can lead to an unexpected,
undesirable event compromising the security of a system
A. Threat
B. Vulnerability
C. Exploit
D. Risk
15. Which of the following is an example of a simple substitution algorithm?
A. Rivest, Shamir, Adleman (RSA)
B. Data Encryption Standard (DES)
C. Caesar cipher
D. Blowfish
16. What is the effective length of a secret key in the Data Encryption Standard (DES)
algorithm?
A. 56-bit
B. 64-bit
C. 32-bit
D. 16-bit
17. A program security vulnerability that occurs when two concurrently executing
processes produce incorrect computational results.
A. Buffer over flow
B. Race condition
C. TOCTTOU
D. Incomplete mediation
18. A Proxy server is used for which of the following?
A. To provide security against unauthorized users
B. To process client requests for web pages
C. To process client requests for database access
D. To provide TCP/IP
19. Trojan-Horse programs
A. are legitimate programs that allow unauthorized access
B. are hacker programs that do not show up on the system
C. really do not usually work
D. are usually immediately discovered
20. All of the following are examples of real security and privacy risks. EXCEPT:
A. Hackers
B. Spam
C. Viruses
D. Identity theft
21. Suppose an employee demands the root access to a UNIX system, where you are the
administrator; that right or access should not be given to the employee unless that
employee has work that requires certain rights, privileges. It can be considered as a
perfect example of which principle of cyber security?
A. Least privileges
B. Open Design
C. Separation of Privileges
D. Both A & C
22. Which one of the following is considered the most secure Linux operating system, one
that also provides anonymity and an incognito option for securing the user's
information?
A. Ubuntu
B. Tails
C. Fedora
D. All of the above
23. Which of the following is known as the oldest phone hacking technique used by hackers
to make free calls?
A. Phreaking
B. Phishing
C. Cracking
D. Spraining
24. __ is a computer crime in which a criminal breaks into a computer system to explore
details of information etc.
A. Hacking
B. Spoofing
C. Eavesdropping
D. Phishing
25. The ability to recover and read deleted or damaged files from a criminal's computer is
an example of a law enforcement specialty called:
A. robotics
B. simulation
C. computer forensics
D. animation
26. Software, such as viruses, worms and Trojan horses, that has a malicious intent is
known as:
A. spyware
B. adware
C. spam
D. malware
27. An electronic file that uniquely identifies individuals and websites on the internet and
enables secure, confidential communications.
A. Digital signature
B. Digital certificates
C. Encryption
D. Firewalls
28. The interception of the private content of a transaction, if unprotected, as it travels
its route over the internet is called:
A. spoofing
B. Snooping
C. sniffing
D. eavesdropping
Part II: Computer Organization and Architecture
Module Objective
On completion of the module successfully, students will be able to:
Carry out design and development of complex elements, such as user interfaces,
multiprocessing, and fault-tolerant components;
Describe the basic structure and operation of a digital computer
Explain in detail the operation of the arithmetic unit including the algorithms &
implementation of fixed-point and floating-point addition, subtraction, multiplication &
division.
Identify different ways of communicating with I/O devices and standard I/O interfaces.
Describe different performance enhancement of computer architecture
Explain the basic structure of computer hardware & software
Identify the processes involved in the basic operations of CPU
Demonstrate basic concepts of circuits and their design
2.1. Introduction
Computer organization
Design of the components and functional blocks from which computer systems
are built.
Analogy: a civil engineer's task during building construction (cement, bricks, iron
rods and other building materials)
Computer Architecture
How to integrate the components to build a computer system that achieves a desired
level of performance.
Analogy: an architect's task during the planning of a building (overall layout, floor
plan etc.)
Computer architecture comprises rules, methods, and procedures that describe the execution
and functionality of the entire computer system. In general terms, computer architecture refers to
how a computer system is designed using compatible technologies.
The term architecture in computer literature can be traced to the work of Lyle R. Johnson and
Frederick P. Brooks, members of the Machine Organization department at IBM, in 1959. Johnson
noted that his description of formats, instruction types, hardware limitations, and speed
improvements was at the level of system architecture, a term that seemed more useful than
machine organization. Subsequently, computer users came to use the term in many less precise
ways.
Earlier, computer architects designed computer architectures on paper, and the designs were then
built directly into a final hardware form. Later, they physically assembled prototype designs in
the form of transistor-transistor logic (TTL) computers. Since the 1990s, new computer
architectures have typically been built, examined, and tweaked inside another computer
architecture, in a computer architecture simulator, or inside an FPGA as a soft microprocessor,
before committing to the ultimate hardware form.
Input and output mechanisms and peripherals.
The von Neumann design thus constitutes the foundation of modern computing. The Harvard
architecture, a similar model, had dedicated data addresses and buses for reading and writing to
memory. The von Neumann architecture won out because it was easier to implement in real hardware.
Logic Gates
The logic gates are the main structural part of a digital system.
Logic Gates are a block of hardware that produces signals of binary 1 or 0 when input
logic requirements are satisfied.
Each gate has a distinct graphic symbol, and its operation can be described by means of
algebraic expressions.
The seven basic logic gates include: AND, OR, XOR, NOT, NAND, NOR, and XNOR.
The relationship between the input-output binary variables for each gate can be
represented in tabular form by a truth table.
Each gate has one or two binary input variables designated by A and B and one binary
output variable designated by x.
AND GATE:
The AND gate is an electronic circuit which gives a high output only if all its inputs are
high. The AND operation is represented by a dot (.) sign.
OR GATE:
The OR gate is an electronic circuit which gives a high output if one or more of its inputs are
high. The operation performed by an OR gate is represented by a plus (+) sign.
NOT GATE:
The NOT gate is an electronic circuit which produces an inverted version of the input at its
output. It is also known as an Inverter.
NAND GATE:
The NOT-AND (NAND) gate is equal to an AND gate followed by a NOT gate. The NAND gate
gives a high output if any of the inputs are low. The NAND gate is represented by an
AND gate with a small circle on the output. The small circle represents inversion.
NOR GATE:
The NOT-OR (NOR) gate is equal to an OR gate followed by a NOT gate. The NOR gate
gives a low output if any of the inputs are high. The NOR gate is represented by an OR gate with
a small circle on the output. The small circle represents inversion.
Exclusive-OR / XOR GATE:
The 'Exclusive-OR' gate is a circuit which will give a high output if one of its inputs is high but
not both of them. The XOR operation is represented by an encircled plus sign.
EXCLUSIVE-NOR/Equivalence GATE:
The 'Exclusive-NOR' gate is a circuit that performs the inverse operation of the XOR gate. It
gives a low output if one of its inputs is high but not both of them. Its symbol is an XOR gate
with a small circle on the output; the small circle represents inversion.
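The gate descriptions above can be checked with a short sketch. It assumes the logic levels high and low are modelled as the integers 1 and 0; the names GATES, NOT and truth_table are illustrative only.

```python
# Two-input logic gates modelled on the integers 0 (low) and 1 (high).
GATES = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "XOR":  lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),   # AND followed by NOT
    "NOR":  lambda a, b: 1 - (a | b),   # OR followed by NOT
    "XNOR": lambda a, b: 1 - (a ^ b),   # XOR followed by NOT
}


def NOT(a):
    """Single-input inverter."""
    return 1 - a


def truth_table(name):
    """Outputs of a two-input gate for AB = 00, 01, 10, 11."""
    gate = GATES[name]
    return [gate(a, b) for a in (0, 1) for b in (0, 1)]


for name in GATES:
    print(f"{name:5s}", truth_table(name))
print("NOT  ", [NOT(a) for a in (0, 1)])
```

Each printed list is the output column of the gate's truth table, read from top (A=0, B=0) to bottom (A=1, B=1).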
Basic gates
The OR, AND, and NOT are the three basic logic gates as they together can construct the logic
circuit for any given Boolean expression.
Derived gates
The logic gates which are derived from the basic gates (AND, OR, and NOT) are called
derived gates. These derived gates have their own unique symbols, truth tables, and Boolean
expressions. The most common derived gates are the NAND, NOR, EX-OR, and EX-NOR
gates.
Universal gate
A universal gate is a gate which can implement any Boolean function without need to use any
other gate type. The NAND and NOR gates are universal gates. In practice, this is advantageous
since NAND and NOR gates are economical and easier to fabricate and are the basic gates used
in all IC digital logic families.
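As an illustration of universality, the sketch below builds NOT, AND and OR from NAND alone and checks them against the basic gates for every input combination. The function names are hypothetical; the OR construction relies on De Morgan's law.

```python
def nand(a, b):
    """The only primitive gate used below."""
    return 1 - (a & b)


def not_(a):
    # NAND with both inputs tied together acts as an inverter.
    return nand(a, a)


def and_(a, b):
    # AND is NAND followed by an inverter.
    return not_(nand(a, b))


def or_(a, b):
    # De Morgan: A + B = (A'.B')'
    return nand(not_(a), not_(b))


# Exhaustive check against the basic gates.
for a in (0, 1):
    assert not_(a) == 1 - a
    for b in (0, 1):
        assert and_(a, b) == (a & b)
        assert or_(a, b) == (a | b)
print("NAND-only NOT, AND and OR match the basic gates")
```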
Boolean algebra
Boolean algebra can be considered as an algebra that deals with binary variables and logic
operations. Boolean algebraic variables are designated by letters such as A, B, x, and y. The
basic operations performed are AND, OR, and complement.
The Boolean algebraic functions are mostly expressed with binary variables, logic operation
symbols, parentheses, and equal sign. For a given value of variables, the Boolean function can be
either 1 or 0. For instance, consider the Boolean function:
F = x + y'z
The logic diagram for the Boolean function F = x + y'z can be represented as:
The Boolean function F = x + y'z is transformed from an algebraic expression into a logic
diagram composed of AND, OR, and inverter gates.
Inverter at input 'y' generates its complement y'.
There is an AND gate for the term y'z, and an OR gate is used to combine the two terms
(x and y'z).
The variables of the function are taken to be the inputs of the circuit, and the variable
symbol of the function is taken as the output of the circuit.
The truth table for the Boolean function F = x + y'z can be represented as:
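The truth table of F = x + y'z follows directly from evaluating the function for all eight input combinations. A minimal sketch, assuming inputs are the integers 0 and 1:

```python
from itertools import product


def F(x, y, z):
    """F = x + y'z: OR of x with the AND of the complement of y and z."""
    return x | ((1 - y) & z)


# One row per input combination, in the order 000, 001, ..., 111.
rows = [(x, y, z, F(x, y, z)) for x, y, z in product((0, 1), repeat=3)]

print("x y z | F")
for x, y, z, f in rows:
    print(f"{x} {y} {z} | {f}")
```

F is 1 whenever x is 1, and also for the single row x=0, y=0, z=1 where the term y'z is 1.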
Combinational Circuits
A combinational circuit consists of logic gates whose outputs at any time are determined
directly from the present combination of inputs without any regard to previous inputs. A
combinational circuit performs a specific information-processing operation fully specified
logically by a set of Boolean functions. The basic components of a combinational circuit are:
input variables, logic gates, and output variables.
As described above, a combinational circuit has a set of outputs which depends only on the
present combination of inputs. Its block diagram consists of 'n' input variables entering the
logic gates and 'm' output variables leaving them.
Examples of combinational circuits: Adder, Subtractor, Converter, and Encoder/Decoder,
multiplexer/de-multiplexer
The 'n' input variables come from an external source whereas the 'm' output variables go to an
external destination. In many applications, the source or destination are storage registers.
Decoder
A combinational circuit that converts binary information from N input lines to 2^N output lines
is known as a decoder. The binary information is passed in on the N input lines, and the output
lines define the 2^N-bit code for that information. In simple words, the decoder performs the
reverse operation of the encoder. For simplicity, only one output line is activated at a time. The
produced 2^N-bit output code is equivalent to the binary information.
Encoders
A combinational circuit that converts binary information from 2^N input lines to N output lines
is known as an encoder. The binary information is passed in on the 2^N input lines, and the output
lines define the N-bit code for that information. In simple words, the encoder performs the
reverse operation of the decoder. For simplicity, only one input line is activated at a time. The
produced N-bit output code is equivalent to the binary information.
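The decoder and encoder can be sketched as a pair of inverse functions. This is an illustrative model, not a hardware description: bits are listed most significant first, output line 0 comes first, and the names decode and encode are assumptions.

```python
def decode(bits):
    """N-to-2^N decoder: activate the single output line selected by the input bits."""
    index = int("".join(str(b) for b in bits), 2)
    return [1 if i == index else 0 for i in range(2 ** len(bits))]


def encode(lines):
    """2^N-to-N encoder: return the binary code of the single active input line."""
    if sum(lines) != 1:
        raise ValueError("exactly one input line must be active")
    n = (len(lines) - 1).bit_length()      # number of output bits N
    index = lines.index(1)
    return [(index >> shift) & 1 for shift in range(n - 1, -1, -1)]


print(decode([1, 0]))        # 2-to-4 decoder, input 10
print(encode([0, 0, 1, 0]))  # 4-to-2 encoder, line 2 active
```

Feeding the output of decode into encode returns the original bits, which is the "reverse operation" relationship described above.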
Multiplexer
A multiplexer is a combinational circuit that has 2^n input lines and a single output line. Simply,
the multiplexer is a multi-input and single-output combinational circuit. The binary information
is received from the input lines and directed to the output line. On the basis of the values of the
selection lines, one of these data inputs is connected to the output.
Unlike the encoder and decoder, the multiplexer has n selection lines alongside its 2^n input
lines, so there is a total of 2^n possible combinations of the selection lines. A multiplexer is
also called a Mux.
De-multiplexer
A de-multiplexer is a combinational circuit that has only 1 input line and 2^n output lines.
Simply, the de-multiplexer is a single-input and multi-output combinational circuit. The information
is received from the single input line and directed to one of the output lines. On the basis of the
values of the selection lines, the input is connected to one of these outputs. The de-multiplexer
is the opposite of the multiplexer.
Unlike the encoder and decoder, the de-multiplexer has n selection lines alongside its 2^n outputs,
so there is a total of 2^n possible combinations of the selection lines. A de-multiplexer is also
called a De-mux.
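A multiplexer and de-multiplexer can be modelled in a few lines. The sketch assumes the selection bits are given most significant first; mux, demux and _index are hypothetical names.

```python
def _index(select):
    """Convert a list of selection bits (MSB first) to a line number."""
    return int("".join(str(b) for b in select), 2)


def mux(inputs, select):
    """2^n-to-1 multiplexer: route the selected data input to the single output."""
    if len(inputs) != 2 ** len(select):
        raise ValueError("need 2^n data inputs for n selection lines")
    return inputs[_index(select)]


def demux(data, select):
    """1-to-2^n de-multiplexer: route the single input to the selected output."""
    outputs = [0] * (2 ** len(select))
    outputs[_index(select)] = data
    return outputs


print(mux([0, 1, 1, 0], [0, 1]))   # selects data input 1
print(demux(1, [1, 1]))            # drives output line 3
```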
Sequential circuits
The sequential circuit is a special type of circuit that has a series of inputs and outputs. The
outputs of the sequential circuits depend on both the combination of present inputs and previous
outputs. The previous output is treated as the present state. So, the sequential circuit contains the
combinational circuit and its memory storage elements. A sequential circuit does not always need
to contain a combinational circuit; it can consist of memory elements only.
The difference between combinational circuits and sequential circuits is given below:
A sequential circuit is a combinational logic circuit, with input variables (X), logic gates
(computational circuits), and output variables (Z), combined with memory elements whose stored
state is fed back to the inputs.
Examples: flip-flops, counters, registers, and clocks.
The synchronization of the outputs is done with either only negative edges of the clock signal or
only positive edges.
Clock Signal and Triggering
Clock signal
A clock signal is a periodic signal in which the ON time and OFF time need not be the same. When
the ON time and OFF time of the clock signal are the same, a square wave is used to represent the
clock signal. Below is a diagram which represents the clock signal:
In a square-wave clock, the signal stays at each logic level, high (5 V) or low (0 V), for an
equal amount of time. It repeats with a fixed time period, which is equal to twice the 'ON time'
or 'OFF time'.
Types of Triggering
There are two types of triggering in sequential circuits:
Level triggering
Logic High and Logic Low are the two levels of the clock signal. In level triggering, the circuit
is activated only while the clock pulse is at a particular level. There are the following types
of level triggering:
Positive level triggering
In positive level triggering, the circuit is active while the clock signal is at Logic High.
Below is the diagram of positive level triggering:
Negative level triggering
In negative level triggering, the circuit is active while the clock signal is at Logic Low.
Below is the diagram of negative level triggering:
Edge triggering
In the clock signal, two types of transitions occur: from Logic Low to Logic High, or from Logic
High to Logic Low. In edge triggering, the circuit responds only at one of these transitions.
Based on the transitions of the clock signal, there are the following types of edge triggering:
Positive edge triggering
In positive edge triggering, the circuit responds to the transition of the clock signal from Logic
Low to Logic High. Negative edge triggering is the mirror case, in which the circuit responds to
the transition from Logic High to Logic Low.
The diagram of positive edge triggering is given below.
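The difference between the two kinds of edge can be illustrated by scanning a sampled clock signal for transitions. The sketch assumes the clock is given as a list of 0/1 samples; the function name edges is hypothetical.

```python
def edges(samples):
    """Return (sample index, kind) for every transition in a list of 0/1 clock samples."""
    found = []
    for i in range(1, len(samples)):
        previous, current = samples[i - 1], samples[i]
        if previous == 0 and current == 1:
            found.append((i, "positive"))   # Logic Low to Logic High
        elif previous == 1 and current == 0:
            found.append((i, "negative"))   # Logic High to Logic Low
    return found


clock = [0, 0, 1, 1, 0, 0, 1, 1]
print(edges(clock))
```

A positive-edge-triggered circuit would act only at the samples marked "positive", and a negative-edge-triggered circuit only at those marked "negative".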
Counters
A counter is a special type of sequential circuit used to count pulses. It is built from a
collection of flip-flops to which a clock signal is applied.
The counter is one of the most common applications of the flip-flop. On each clock pulse, the
output of the counter moves to a predefined state, so the number of pulses can be counted using
the output of the counter.
There are the following types of counters:
o Asynchronous Counters
o Synchronous Counters
Asynchronous or ripple counters
The asynchronous counter is also known as the ripple counter. Below is a diagram of the 2-bit
asynchronous counter in which we used two T flip-flops. Apart from the T flip-flop, we can
also use the JK flip-flop by setting both of its inputs to 1 permanently. The external clock is
applied to the clock input of the first flip-flop, FF-A, and the output of FF-A is passed to the
clock input of the next flip-flop, FF-B.
Synchronous counters
In the asynchronous counter, the output of each flip-flop is passed to the clock input of the next
flip-flop, so the flip-flops are connected like a chain. The drawback of this arrangement is that
the propagation delay of each stage accumulates, which delays the count.
The synchronous counter is designed to remove this drawback.
In the synchronous counter, the same clock pulse is passed to the clock input of all the
flip-flops, so the clock signals received by all the flip-flops are identical. Below is the
diagram of a 2-bit synchronous counter in which the inputs of the first flip-flop, FF-A, are
set to 1, so the first flip-flop works as a toggle flip-flop. The output of the first flip-flop is
passed to both inputs of the next JK flip-flop.
Ripple Counter
A ripple counter is a special type of asynchronous counter in which the clock pulse ripples
through the circuit. A ripple counter is formed by combining n flip-flops. Such a counter
(a MOD-2^n counter) can count 2^n states, and then it resets to its initial value.
Features of the Ripple Counter:
o The flip flops are driven by different clock pulses rather than one common clock.
o It is an example of an asynchronous counter.
o The flip flops are used in toggle mode.
o The external clock pulse is applied to only one flip flop. The output of this flip flop is
treated as a clock pulse for the next flip flop.
o In the counting sequence, the flip flop to which the external clock pulse is applied acts as the LSB.
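The ripple behaviour listed above can be simulated in a few lines. The sketch assumes negative-edge-triggered flip-flops in toggle mode wired as an up counter (an assumption, since the original diagram is not reproduced here); ripple_count is a hypothetical name.

```python
def ripple_count(n_flip_flops, pulses):
    """Simulate an up-counting ripple counter; return the count after each clock pulse."""
    q = [0] * n_flip_flops          # q[0] is FF-A, the LSB
    history = []
    for _ in range(pulses):
        stage = 0
        # The external pulse toggles FF-A. Each stage that falls from 1 to 0
        # produces a negative edge that clocks (toggles) the next stage.
        while stage < n_flip_flops:
            q[stage] ^= 1
            if q[stage] == 1:       # rose from 0 to 1: no edge for the next stage
                break
            stage += 1
        history.append(sum(bit << i for i, bit in enumerate(q)))
    return history


print(ripple_count(2, 5))   # a 2-bit counter wraps after 2^2 = 4 states
```

With two flip-flops the count runs 1, 2, 3 and then returns to 0, which matches the 2^n states described above.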
Latches
A latch is a special type of logic circuit. A latch has two stable states, low and high. Because
of these states, latches are also referred to as bistable multivibrators. A latch is a storage
device that holds data using a feedback path. The latch stores 1 bit until that bit is changed.
While the enable input is set to 1, the latch continuously samples its inputs and updates the
stored data.
Based on the enable signal, the circuit works in two states. When the enable input is high, the
latch is transparent and its output follows the inputs; when the enable input is low, the latch
holds its stored value regardless of the inputs.
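The effect of the enable input can be sketched with a gated D latch, chosen here only as the simplest example of a latch (the text above does not single out a latch type). The class name GatedDLatch is hypothetical.

```python
class GatedDLatch:
    """Stores one bit; follows the data input only while enable is 1."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:          # transparent: the stored bit follows the input
            self.q = d
        return self.q       # enable low: the previous value is held


latch = GatedDLatch()
print(latch.update(d=1, enable=1))  # stores 1
print(latch.update(d=0, enable=0))  # input ignored, still 1
print(latch.update(d=0, enable=1))  # stores 0
```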
D (Data or Delay) flip-flop.
D flip flop is a widely used flip flop in digital systems. The D flip flop is mostly used in shift-
registers, counters, and input synchronization.
T (Toggle) flip-flop.
The T flip flop is used in much the same way as the JK flip-flop. Unlike the JK flip flop, the T
flip flop has only a single input in addition to the clock input. The T flip flop is constructed
by connecting both of the inputs of a JK flip flop together as a single input.
Examples:
(10100)2, (11011)2, (11001)2, (000101)2, (011010)2.
Decimal Number System
Decimal numbers are used in our day-to-day life. The decimal number system contains ten
digits from 0 to 9 (base 10). Here, the successive place values or positions to the left of the
decimal point hold units, tens, hundreds, thousands, and so on.
The position in the decimal number system specifies the power of the base (10). 0 is the
minimum value of a digit, and 9 is the maximum value of a digit. For example, the decimal
number 2541 consists of the digit 1 in the units position, 4 in the tens position, 5 in the hundreds
position, and 2 in the thousands position, and the value will be written as:
(2×1000) + (5×100) + (4×10) + (1×1)
= (2×10^3) + (5×10^2) + (4×10^1) + (1×10^0)
= 2000 + 500 + 40 + 1
= 2541
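The positional expansion above can be reproduced programmatically. A minimal sketch for non-negative integers; the function name expand is an assumption.

```python
def expand(number, base=10):
    """Return (digit, place value) pairs of a non-negative integer, most significant first."""
    digits = []
    while True:
        number, digit = divmod(number, base)
        digits.append(digit)            # least significant digit first
        if number == 0:
            break
    return [(d, base ** i) for i, d in reversed(list(enumerate(digits)))]


terms = expand(2541)
print(" + ".join(f"({d}×{w})" for d, w in terms))
print(sum(d * w for d, w in terms))
```

The same function works for other bases, for example expand(0o230, 8) for an octal number.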
Octal Number System
The octal number system has base 8, which means it has only eight digits, from 0 to 7. There are
only eight possible digit values to represent a number. Each octal digit can be represented with
only three bits, and each group of three bits has a distinct value between 0 and 7.
Below, we have described certain characteristics of the octal number system:
Characteristics:
1. An octal number system carries eight digits starting from 0, 1, 2, 3, 4, 5, 6, and 7.
2. It is also known as the base 8 number system.
3. The rightmost position of an octal number represents the 0 power of the base (8). Example: 8^0
4. The last (leftmost) position represents the x power of the base (8). Example: 8^x, where x
is the last position minus 1
(0.25)10 = (.01)2
Decimal to Octal Conversion
For converting decimal to octal, there are two steps required to perform, which are as follows:
1. In the first step, we perform the division operation on the integer part and the successive
quotients with the base of octal (8).
2. Next, we perform the multiplication on the fractional part and the successive fractions with
the base of octal (8).
Example 1: (152.25)10
Step 1:
Divide the number 152 and its successive quotients with base 8.
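The two steps can be sketched as follows for the example (152.25)10. The function name to_octal and the max_digits limit on the fractional part are assumptions made for the illustration.

```python
def to_octal(integer_part, fraction, max_digits=6):
    """Convert a non-negative decimal number, given as integer and fractional parts, to octal."""
    # Step 1: divide the integer part by 8 repeatedly; the remainders,
    # read from last to first, are the octal digits.
    digits = []
    n = integer_part
    while n > 0:
        n, remainder = divmod(n, 8)
        digits.append(str(remainder))
    integer_digits = "".join(reversed(digits)) or "0"

    # Step 2: multiply the fractional part by 8 repeatedly; the integer
    # parts of the products, read from first to last, are the octal digits.
    fraction_digits = []
    f = fraction
    while f != 0 and len(fraction_digits) < max_digits:
        f *= 8
        digit = int(f)
        fraction_digits.append(str(digit))
        f -= digit

    if fraction_digits:
        return integer_digits + "." + "".join(fraction_digits)
    return integer_digits


print(to_octal(152, 0.25))
```

For 152 the successive divisions give remainders 0, 3 and 2, and 0.25 × 8 = 2.0, so (152.25)10 = (230.2)8.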
The microoperations most often encountered in digital computers are classified into four
categories:
Register transfer microoperations
Arithmetic microoperations (on numeric data stored in the registers)
Logic microoperations (bit manipulations on non-numeric data)
Shift microoperations
Flynn's Classification of Computers
M.J. Flynn proposed a classification for the organization of a computer system by the number of
instructions and data items that are manipulated simultaneously.
The sequence of instructions read from memory constitutes an instruction stream.
The operations performed on the data in the processor constitute a data stream.
Flynn's classification divides computers into four major groups that are:
1. Single instruction stream, single data stream (SISD)
2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)
Flynn's classification of Computers
Some popular operating systems are Linux, Windows, OS X, Solaris, Chrome OS, etc.
An operating system is a program that acts as an intermediary between a user of a computer and
the computer hardware. It is a set of programs that coordinates all activities among computer
hardware resources, acts as an interface between the user and the computer hardware, and
controls the execution of all kinds of programs.
Abstraction
Applications do not need to be tailored for each possible device that might be present on a
system
Arbitration
Applications
Operating system
User Interface
o The part of the OS that you interface with.
Kernel
o The core of the OS. Interacts with the BIOS (at one end), and the UI (at the other
end).
File Management System
o Organizes and manages files.
Processor: It controls the processes within the computer and carries out its data
processing functions. When there is only one processor available, it is usually
referred to as the central processing unit (CPU).
An operating system supplies different kinds of services to both users and programs. It provides
application programs (which run within the operating system) with an environment in which to
execute, and it provides users with the services to run various programs in a convenient manner.
User Interface
Program Execution
File system manipulation
Input / Output Operations
Communication
Resource Allocation
Error Detection
Accounting
Security and protection
This chapter will give a brief description of what services an operating system usually provides
to users and those programs that are and will be running within it.
Usually, the user interface of an operating system comes in three forms, and operating systems
are further subdivided by the interface they offer. These are:
The command line interface (CLI), which uses text commands and a method for entering those
commands.
The batch interface (BI), in which commands and the directives that control them are entered
into files, and those files get executed.
The graphical user interface (GUI), which is a window system with a pointing device (like a mouse
or trackball) to direct I/O, choose from menus, and make selections, and a keyboard to enter text.
The operating system must have the capability to load a program into memory and execute that
program. Furthermore, the program must be able to end its execution, either normally or
abnormally / forcefully.
Programs need to read and write files and directories. The file handling portion of the
operating system also allows users to create and delete files by name (along with
extension), search for a given file, and list file information. Some programs include
permissions management for allowing or denying access to files or directories based on file
ownership.
A program which is currently executing may require I/O, which may involve a file or another I/O
device. For efficiency and protection, users cannot control I/O devices directly. So, the OS
provides a means to do I/O, that is, read or write operations on any file or device.
Processes need to exchange information with other processes. Processes executing on the same
computer system or on different computer systems can communicate using operating system
support. Communication between two processes can be done using shared memory or via
message passing.
Resource Allocation
When multiple jobs are running concurrently, resources must be allocated to each of them.
Resources can be CPU cycles, main memory storage, file storage and I/O devices. CPU
scheduling routines are used here to establish how best the CPU can be used.
Error Detection
Errors may occur within CPU, memory hardware, I/O devices and in the user program. For each
type of error, the OS takes adequate action for ensuring correct and consistent computing.
Accounting
This service of the operating system keeps track of which users use how much and what
kinds of computer resources, either for accounting or simply to accumulate usage
statistics.
Protection involves ensuring that all access to system resources is controlled. To make
a system secure, users need to authenticate themselves to the system before using it (usually via
login ID and password).
Category    Name
Desktop     Windows, OS X, UNIX, Linux, Chrome OS
A desktop operating system is a complete operating system that works on desktops, laptops, and
some tablets.
The Macintosh operating system has earned a reputation for its ease of use. Its latest version is
OS X. Chrome OS is a Linux-based operating system designed to work primarily with web apps.
The operating system on mobile devices and many consumer electronics is called a mobile
operating system and resides on firmware.
Android is an open source, Linux-based mobile operating system designed by Google for
smartphones and tablets.
Windows Phone, developed by Microsoft, is a proprietary mobile operating system that runs on
some smartphones.
Example: DOS
Example: Windows
3. Multi-user multi-tasking
Allows two or more users to run programs at the same time. Some operating systems permit
hundreds or even thousands of concurrent users.
Issues:
Limited memory
Slow processors
Small display screens.
Usually, most features of a typical OS are left out, at the expense of the developer.
Emphasis is on I/O operations.
Memory Management and Protection features are usually absent.
Example: Contiki OS
microkernel architecture
multithreading
symmetric multiprocessing
distributed operating systems
object-oriented design
In the early days, computer systems allowed only one program to be executed at a time.
That program therefore had complete control of the system and had access to all or most of the
system's resources.
The more complex the operating system is, the more it is expected to do on behalf of its
users. Even though its main concern is the execution of user programs, it also needs to take
care of various system tasks which are better left outside the kernel itself. So a system
consists of a set of processes: operating system processes, which execute system code, and
user processes, which execute user code. In this chapter, you will learn about the
processes that are used and managed by the operating system.
What is Process?
A process is mainly a program in execution, where the execution of a process must progress in
sequential order or based on some priority or algorithm. In other words, it is an entity that
represents the fundamental unit of work assigned to a system.
When a program gets loaded into memory, it is called a process. A process in memory can be
divided into four sections. These are:
Heap
Stack
Data
Text
Process Concept
A question that arises when discussing operating systems is what to call all
the activities of the CPU. Even on a single-user operating system like Microsoft Windows, a user
may be able to run several programs at one time, such as a word processor,
web browser(s), and an e-mail client. Even when the user can execute only one
program at a time, the operating system may need to support its own internal programmed
activities, like memory management. In these respects, all such activities are similar, so we call
all of them 'processes.'
As a process executes, it changes state. The state of a process is defined in part by the current
activity of that process. Each process may be in one of the following states:
The process model discussed in previous tutorials described a process as an
executing program with a single thread of control. Most modern
operating systems now offer features that enable a process to contain multiple threads of
control. This tutorial covers many concepts associated with multithreaded computer
systems, the issues related to multithreaded programming, and how it affects
the design of operating systems. Then you will learn how Windows XP and
Linux support threads at the kernel level.
A thread is a flow of execution through the process code, with its own program counter that
keeps track of which instruction to execute next and system registers which hold its current
working variables. Threads are also termed lightweight processes. Threads provide a way to
improve application performance through parallelism.
The advantages of multithreaded programming can be categorized into four major headings -
All the threads must have a relationship between them (i.e., user threads and kernel threads).
Here is a list which tells the three common ways of establishing this relationship.
In a single-processor system, only one job can be processed at a time; the remaining jobs must
wait until the CPU is free and can be rescheduled. The aim of multiprogramming is to have some
process running at all times, to maximize CPU utilization. The idea is simple: a
process is executed until it must wait, normally for the completion of some I/O request.
In a simple operating system, the CPU then just stands idle. All this waiting time is wasted; no
fruitful work can be performed. With multiprogramming, you can use this time to process other
jobs productively.
Whenever the CPU becomes idle, the operating system (OS) has to select one of the processes in
the ready queue for execution. The selection is performed by the short-term scheduler (also
known as the CPU scheduler). The scheduler picks a process from the processes in memory
which are ready to be executed and allocates the CPU to that process.
Preemptive Scheduling
CPU scheduling decisions may take place under the following four conditions:
When a process switches from the running state to the waiting state
When a process switches from the running state to the ready state (for example, when
an interrupt occurs)
When a process switches from the waiting state to the ready state (for example, at the
completion of Input / Output)
When a process terminates
CPU scheduling deals with the problem of deciding which of the processes in the ready queue
is to be allocated the CPU. There are several different CPU scheduling algorithms used
nowadays within an operating system. In this tutorial, you will get to know about some of them.
Terminology
Arrival time (AT)
The time at which the process arrives in the ready queue
Burst time (BT)
The time that the CPU actually requires to complete the process
First-Come, First-Served (FCFS) Scheduling
On the negative side, the average waiting time under the FCFS policy is often quite long.
A different approach to CPU scheduling is the shortest-job-first (SJF) scheduling algorithm. This
algorithm associates with each process the length of the process's next CPU burst. When the
CPU is available, it is assigned to the process that has the smallest next CPU burst. If the next
CPU bursts of two processes are the same, FCFS scheduling is used to break the tie.
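The effect of the two policies on waiting time can be compared with a small calculation. The sketch assumes all processes arrive at time 0 and run without preemption; the burst times 24, 3 and 3 milliseconds are hypothetical values chosen for illustration, and average_wait is an assumed name.

```python
def average_wait(bursts):
    """Average waiting time when the jobs run back to back in the given order."""
    waited = 0
    clock = 0
    for burst in bursts:
        waited += clock     # each job waits for everything scheduled before it
        clock += burst
    return waited / len(bursts)


bursts = [24, 3, 3]                     # hypothetical CPU bursts, in arrival order
fcfs = average_wait(bursts)             # FCFS: run in arrival order
sjf = average_wait(sorted(bursts))      # SJF: shortest burst first; sorted() is
                                        # stable, so ties keep their FCFS order
print(f"FCFS: {fcfs} ms, SJF: {sjf} ms")
```

Because the long job arrives first, FCFS makes both short jobs wait behind it, while SJF runs them first and lowers the average.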
The Round Robin (RR) algorithm is one of the most commonly used scheduling algorithms. It uses a
time quantum (time slice). Time quantum: the maximum time for which the CPU is given to a process
in one turn, before that process is paused and the CPU moves to the next process in the queue.
A time quantum is generally from 10 to 100 milliseconds in length. The ready queue is treated as
a circular queue. The Round Robin scheduling algorithm is preemptive. The average waiting time
under the RR policy is often long.
If we use a time quantum of 4 milliseconds, then process P1 gets the first 4 milliseconds. Since it
requires another 20 milliseconds, it is preempted after the first time quantum, and the CPU is
given to the next process in the queue, process P2. Process P2 does not need 4 milliseconds, so it
quits before its time quantum expires. The CPU is then given to the next process, process P3.
Once each process has received 1 time quantum, the CPU is returned to process P1 for an
additional time quantum. The resulting RR schedule is as follows
Let's calculate the average waiting time for this schedule. P1 waits for 6 milliseconds (10 - 4),
P2 waits for 4 milliseconds, and P3 waits for 7 milliseconds. Thus, the average waiting time is
17/3 = 5.67 milliseconds.
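The schedule and waiting times above can be reproduced with a short simulation. The burst times are assumptions inferred from the description (P1 = 24 ms, since it needs 20 ms more after its first quantum; P2 = 3 ms and P3 = 3 ms, consistent with the stated waits), and all processes are assumed to arrive at time 0. The name round_robin is hypothetical.

```python
from collections import deque


def round_robin(bursts, quantum):
    """Return the waiting time of each process under Round Robin scheduling.

    bursts maps process name to CPU burst; all processes arrive at time 0
    and enter the ready queue in dictionary order.
    """
    remaining = dict(bursts)
    ready = deque(bursts)               # circular ready queue
    clock = 0
    finish = {}
    while ready:
        name = ready.popleft()
        run = min(quantum, remaining[name])
        clock += run
        remaining[name] -= run
        if remaining[name] > 0:
            ready.append(name)          # preempted: back to the tail of the queue
        else:
            finish[name] = clock
    # waiting time = completion time - burst time (arrival time is 0)
    return {name: finish[name] - bursts[name] for name in bursts}


waits = round_robin({"P1": 24, "P2": 3, "P3": 3}, quantum=4)
print(waits, sum(waits.values()) / len(waits))
```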
Another class of scheduling algorithms has been created for situations in which processes are
easily classified into different groups. For example, a common division is made between
foreground (interactive) processes and background (batch) processes.
These two types of processes have different response-time requirements and so may have
different scheduling needs. In addition, foreground processes may have priority (externally
defined) over background processes.
Thread Scheduling
The kernel uses system-contention scope (SCS) to decide which kernel thread to schedule onto an
available CPU: competition takes place among all threads in the system.
IPC methods
Socket: provides point-to-point, two-way communication between two
processes.
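Two-way communication over a socket can be sketched with Python's socket.socketpair(), which returns two already-connected sockets. For brevity both ends live in one process here; in practice each end would belong to a different process. The variable names are illustrative.

```python
import socket

# socketpair() returns two connected sockets; data written to one
# end can be read from the other, in both directions.
end_a, end_b = socket.socketpair()

end_a.sendall(b"ping")          # A sends to B
request = end_b.recv(1024)

end_b.sendall(b"pong")          # B replies to A over the same connection
reply = end_a.recv(1024)

end_a.close()
end_b.close()
print(request, reply)
```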
Deadlocks
System Model
A system consists of a fixed number of resources to be distributed among a number of
competing processes. The resources are partitioned into several types, each consisting of
some specific quantity of identical instances. Memory space, CPU cycles, directories and files,
and I/O devices like keyboards, printers and CD-DVD drives are prime examples of resource types.
When a system has 2 CPUs, the resource type CPU has two instances.
1. Request: When the request can't be granted immediately (for example, when
another process is using the resource), the requesting process must wait
until it can obtain the resource.
2. Use: The process can operate on the resource (for example, when the resource is a printer,
the process prints on the printer).
3. Release: The process releases the resource (for example, when it terminates or
exits).
A deadlock state can occur when the following four conditions hold simultaneously within a
system:
Mutual exclusion: At least one resource must be held in a non-
sharable manner; i.e., only a single process at a time can use the resource. If another
process requests that resource, the requesting process must be delayed until the
resource has been released.
Hold and wait: A process must be holding at least one resource and waiting to obtain
additional resources which are currently held by other processes.
No preemption: Resources can't be preempted; i.e., a resource can be released only
voluntarily by the process holding it, after that process has completed its task.
Circular wait: A set of waiting processes must exist such that each process is waiting for
a resource held by the next process in the set, and the last is waiting for a resource held
by the first. The circular-wait condition implies the hold-and-wait condition,
and hence the four conditions are not completely independent.
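The circular-wait condition can be checked mechanically by looking for a cycle in a wait-for graph, in which an edge from P to Q means that P is waiting for a resource held by Q. The sketch below uses a depth-first search; the graph representation and the name has_circular_wait are assumptions made for illustration.

```python
def has_circular_wait(wait_for):
    """Return True if the wait-for graph contains a cycle.

    wait_for maps each process to the processes that hold
    resources it is currently waiting for.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(process):
        state[process] = IN_PROGRESS
        for holder in wait_for.get(process, ()):
            holder_state = state.get(holder, UNVISITED)
            if holder_state == IN_PROGRESS:
                return True             # reached a process already on the current path
            if holder_state == UNVISITED and visit(holder):
                return True
        state[process] = DONE
        return False

    return any(state.get(p, UNVISITED) == UNVISITED and visit(p) for p in wait_for)


deadlocked = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}
safe = {"P1": ["P2"], "P2": ["P3"], "P3": []}
print(has_circular_wait(deadlocked), has_circular_wait(safe))
```

A detection-based approach to deadlock would run a check of this kind periodically and recover when it reports a cycle.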
Normally you can deal with deadlock in one of the three ways
mentioned below:
You can use a protocol to prevent or avoid deadlocks, so that the system never enters a
deadlock state.
You can let the system enter a deadlock state, detect it, and then recover.
You can ignore the issue altogether and assume that deadlocks never occur within the
system.
In this chapter, you will learn about the various working capabilities of IPC (Inter-process
communication) within an Operating system along with usage. Processes executing concurrently
in the operating system might be either independent processes or cooperating processes. A
process is independent if it cannot be affected by the other processes executing in the system.
There are numerous reasons for providing an environment which allows process cooperation:
Information sharing: Since some users may be interested in the same piece of information
(for example, a shared file), you must provide a situation for allowing concurrent access
to that information.
Computation speedup: If you want a particular task to run faster, you must break it into
sub-tasks, each of which executes in parallel with the others. Note that
such a speed-up can be attained only when the computer has multiple
processing elements, like CPUs or I/O channels.
Modularity: You may want to build the system in a modular way by dividing the system
functions into separate processes or threads.
Convenience: Even a single user may work on many tasks at a time. For example, a user
may be editing, formatting, printing, and compiling in parallel.
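Cooperating processes exchange data through one of two models of interprocess communication:
1. shared memory.
2. message passing.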
In the shared-memory model, a region of memory which is shared by the cooperating processes is
established. Processes can then exchange information by reading and writing
data in the shared region. In the message-passing model, communication takes place by way of
messages exchanged among the cooperating processes.
Shared Memory
Interprocess communication (IPC) using shared memory requires the communicating
processes to establish a region of shared memory. Typically, a shared-memory region resides
within the address space of the process creating the shared-memory segment. Other processes
that wish to communicate using this shared-memory segment must attach it to their address
space.
Note that, normally what happens, the operating system tries to check one process from
accessing other's process's memory. Shared memory needs that two or more processes agree to
remove this limitation. They can then exchange information via reading and writing data within
the shared areas.
The form of the data and the location gets established by these processes and are not under the
control of the operating system. The processes are also in charge to ensure that they are not
writing to the same old location simultaneously.
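The create-and-attach steps described above can be sketched with Python's multiprocessing.shared_memory module (Python 3.8 or later). The function name demo_shared_memory is an assumption for illustration, and the second handle is opened in the same process only to keep the example short; a real peer would be a separate process that attaches by the segment's name.

```python
from multiprocessing import shared_memory


def demo_shared_memory(payload: bytes) -> bytes:
    """Write payload through one handle and read it back through another."""
    # The creating process establishes the shared region.
    region = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        # A cooperating process would attach the region by its name.
        peer = shared_memory.SharedMemory(name=region.name)
        try:
            region.buf[:len(payload)] = payload        # writer side
            received = bytes(peer.buf[:len(payload)])  # reader side
        finally:
            peer.close()
    finally:
        region.close()
        region.unlink()  # release the segment once nobody needs it
    return received
```

The operating system does not synchronize access to the region, so real cooperating processes would add their own coordination, for example a lock.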
Basic Hardware
Main memory and the registers built into the processor itself are the only storage that the CPU
can access directly. There are machine instructions that take memory addresses as arguments,
but none that take disk addresses. Therefore, any instruction being executed, and any data used
by it, must be in one of these direct-access storage devices. If the data are not in memory, they
must be moved there before the CPU can operate on them.
Registers built into the CPU are generally accessible within one cycle of the CPU clock.
Most CPUs can decode instructions and perform simple operations on register contents
at the rate of one or more operations per clock tick. The same cannot be said of main memory,
which is accessed via a transaction on the memory bus and may take many CPU cycles.
Usually, a program resides on a disk as a binary executable file. To be executed, the program
must be brought into memory and placed within a process. Depending on the memory
management in use, the process may be moved between disk and memory during its execution.
The processes on the disk that are waiting to be brought into memory for execution form the
input queue. The normal procedure is to select one of the processes in the input queue and to
load that process into memory.
As the process executes, it accesses instructions and data from memory. Eventually, the process
terminates, and its memory space is declared available. Most systems allow a user process to
reside in any part of the physical memory. Therefore, even if the address space of the computer
begins at 00000, the first address of the user process need not be 00000. This approach affects
the addresses that the user program can use.
Normally, the binding of instructions and data to memory addresses can be done at any of the
steps given below:
Compile time: If it is known at compile time where the process will reside in memory,
then absolute code can be generated.
Load time: If it is not known at compile time where the process will reside in memory, then
the compiler must generate relocatable code. In that case, final binding is delayed until load time.
Execution time: If the process can be moved during its execution from one memory
segment to another, then binding must be delayed until run time.
Virtual Memory
In this chapter, you will learn what virtual memory is and how it is managed within the
operating system. Virtual memory is a technique that allows the execution of processes that are
not completely in memory. One main benefit of this scheme is that programs can be larger than
the physical memory.
Also, virtual memory abstracts main memory into a very large, uniform array of storage,
separating logical memory as viewed by the user from physical memory. This technique frees
programmers from concerns about memory-storage limitations.
Virtual memory also allows processes to share files easily and to implement shared
memory. Moreover, it provides an efficient mechanism for process creation. Virtual memory
is not easy to implement, however, and it may substantially decrease performance if it is used
carelessly.
Consider how an executable program might be loaded from disk into memory. One option is to
load the entire program into physical memory at program execution time. However, a problem
with this approach is that you may not initially need the entire program in memory, so memory
is occupied unnecessarily.
An alternative strategy is to load pages only as they are needed. This technique is known as
demand paging and is commonly used in virtual memory systems. With demand-paged virtual
memory, pages are loaded only when they are demanded during program execution; pages that
are never accessed are never loaded into physical memory.
A demand-paging system is similar to a paging system with swapping, where processes reside in
secondary memory (usually a disk). When you want to execute a process, you swap it into
memory. Rather than swapping the entire process into memory, however, you can use a
"lazy swapper." A lazy swapper never swaps a page into memory unless that page will be
needed for execution.
The hardware needed to support demand paging is the same as the hardware for paging and
swapping:
Page table: The page table can mark an entry as invalid by means of a valid-invalid bit.
Secondary memory: Secondary memory holds those pages that are not present in main
memory. The secondary memory is usually a high-speed disk. It is known as the
swap device, and the section of disk used for this purpose is known as swap space.
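The role of the valid-invalid bit can be illustrated with a small simulation. This is a sketch under simplifying assumptions: the class DemandPager and the function count_faults are hypothetical names, physical memory is assumed large enough that no page ever has to be replaced, and "loading" a page is modelled only by flipping its bit.

```python
class DemandPager:
    """Toy page table: one valid-invalid bit per page, nothing else."""

    def __init__(self, num_pages):
        # False means "invalid": the page is still only in swap space.
        self.valid = [False] * num_pages
        self.page_faults = 0

    def access(self, page):
        if not self.valid[page]:
            # Page fault: the lazy swapper brings the page in on demand.
            self.page_faults += 1
            self.valid[page] = True


def count_faults(num_pages, reference_string):
    """Return (number of page faults, sorted list of pages that were loaded)."""
    pager = DemandPager(num_pages)
    for page in reference_string:
        pager.access(page)
    loaded = [p for p, bit in enumerate(pager.valid) if bit]
    return pager.page_faults, loaded
```

For the reference string 0, 1, 0, 3, 1 over five pages, only pages 0, 1, and 3 are ever loaded; pages 2 and 4 are never accessed and never occupy memory.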
In this chapter, you will learn about file attributes, the concept of a file and its storage, and
the operations on files.
File Attributes
A file is named for the convenience of its users and is referred to by its name. A name is usually
a string of characters, such as filename.cpp, with an extension that indicates the file format.
Some systems (like Linux) distinguish between uppercase and lowercase characters in names,
whereas other systems do not. When a file is given a name, it becomes independent of the
process, the user, and even the system that created it. For instance, one user might create the file
filename.cpp, and another user might edit that file by specifying its name. The file's owner might
write the file to a compact disc (CD), send it by e-mail, or copy it across a network, and it could
still be called filename.cpp on the destination system.
A file's attributes vary from one operating system to another but typically consist of these:
Name: The symbolic file name is the only information kept in human-readable form.
Identifier: This unique tag, usually a number, identifies the file within the file system; it is
the non-human-readable name for the file.
Type: This information is needed for systems that support different types of files.
Location: This information is a pointer to a device and to the location of the file on that
device.
Size: The current size of the file (in bytes, words, or blocks) and possibly the maximum
allowed size are included in this attribute.
Protection: Access-control information determines who can do reading, writing,
executing, and so on.
Date, time, and user identification: This information may be kept for creation, last
modification, and last use. These data can be useful for protection, security, and usage
monitoring.
File Operations
A file is an abstract data type. To define a file properly, we need to consider the operations
that can be performed on files. The operating system can provide system calls to create, write,
read, reposition, delete, and truncate files. There are six basic file operations within an operating
system. These include:
Creating a file: Two steps are necessary to create a file. First, space in the file system must
be found for the file. Second, an entry for the new file must be made in the directory.
Writing a file: To write a file, you make a system call specifying both the name of the file
and the information to be written to the file.
Reading a file: To read from a file, you use a system call that specifies the name of the
file and where in memory the next block of the file should be put.
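The six operations named above (create, write, read, reposition, truncate, delete) map onto thin wrappers around system calls in Python's os module. The following is a minimal sketch; the function name file_operations_demo is an assumption, and a temporary file is used so that the example leaves nothing behind.

```python
import os
import tempfile


def file_operations_demo():
    """Exercise create, write, reposition, read, truncate, and delete."""
    fd, path = tempfile.mkstemp()            # create: new directory entry
    try:
        os.write(fd, b"hello world")         # write at the current position
        os.lseek(fd, 0, os.SEEK_SET)         # reposition to the start
        first = os.read(fd, 5)               # read the next 5 bytes
        os.ftruncate(fd, 5)                  # truncate: keep only 5 bytes
        size = os.fstat(fd).st_size          # size attribute after truncation
    finally:
        os.close(fd)
        os.remove(path)                      # delete: remove the entry
    return first, size, os.path.exists(path)
```

Note that these calls work on a file descriptor obtained when the file is opened, so the file name does not have to be looked up again on every read or write.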
The three major jobs of a computer are input, output, and processing. In many cases, the main
job is input/output, and the processing is merely incidental. For example, when you browse a
web page or edit a file, your immediate interest is to read or enter some information, not to
compute an answer. The role of the operating system in computer I/O is to manage and control
I/O operations and I/O devices. In this chapter, you will learn how the operating system handles
input and output devices.
The control of devices connected to the computer is a major concern of operating-system
designers. Because I/O devices vary so widely in their function and speed (consider a mouse, a
hard disk, and a CD-ROM), varied methods are needed to control them. These methods form the
I/O subsystem of the kernel, which separates the rest of the kernel from the complexities of
managing I/O devices.
I/O Hardware
Computers operate a great many kinds of devices. Most fit into the general categories of storage
devices (disks, tapes), transmission devices (network interface cards, modems), and
human-interface devices (screen, keyboard, etc.).
Figure: Types of AI
Artificial Intelligence - Based on functionality
1. Reactive Machines
o Purely reactive machines are the most basic types of Artificial Intelligence.
o Such AI systems do not store memories or past experiences for future actions.
o These machines focus only on the current scenario and react to it with the best possible
action.
o IBM's Deep Blue system is an example of reactive machines.
Why AI Now?
One of the greatest innovators in the field of machine learning was John McCarthy, widely
recognized as the "Father of Artificial Intelligence".
In the mid-1950s, McCarthy coined the term "Artificial Intelligence" and defined it as "the
science of making intelligent machines".
The algorithms have been here since then. Why is AI more interesting now?
The answer is that, until recently:
Computing power was not strong enough
Computer storage was not large enough
Big data was not available
Fast Internet was not available
Another strong force is the major investment from big companies (Google, Microsoft,
Facebook, and YouTube), whose datasets became much too big to handle traditionally.
Interesting Questions
Studying AI raises many interesting questions:
"Can computers think like humans?"
"Can computers be smarter than humans?"
"Can computers take over the world?"
1957 First programming language for numeric and scientific computing (FORTRAN)
1959 John McCarthy and Marvin Minsky founded the MIT Artificial Intelligence Project
Agent Terminology
Depth-First Search
It is implemented recursively or with a LIFO stack data structure. It creates the same set of nodes
as the breadth-first method, only in a different order.
Because only the nodes on a single path from the root to a leaf are stored in each iteration, the
space requirement is linear. With branching factor b and depth m, the storage space is bm.
Disadvantage: this algorithm may not terminate and may go on infinitely along one path. The
solution to this issue is to choose a cut-off depth. If the ideal cut-off is d and the chosen cut-off is
less than d, then this algorithm may fail. If the chosen cut-off is more than d, then execution time
increases.
Its complexity depends on the number of paths. It cannot check for duplicate nodes.
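Depth-first search with a cut-off depth can be sketched as follows. The function name depth_limited_dfs and the adjacency-list dictionary used to represent the graph are assumptions made for illustration. As in the description above, the sketch does not detect duplicate nodes; the cut-off alone guarantees termination.

```python
def depth_limited_dfs(graph, start, goal, cutoff):
    """Return a path from start to goal of at most `cutoff` edges, or None."""
    # Each stack entry holds a node and the path that led to it (LIFO order).
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        if len(path) - 1 >= cutoff:
            continue  # cut-off depth reached: do not expand this node
        # Push children in reverse so the leftmost child is expanded first.
        for child in reversed(graph.get(node, [])):
            stack.append((child, path + [child]))
    return None
```

With the graph A → B, C; B → D; C → E, a cut-off of 2 finds the path A, C, E, while a cut-off of 1 is smaller than the ideal depth and the search fails, as the text warns.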
Logical Connectives: Logical connectives are used to connect two simpler propositions or to
represent a sentence logically. We can create compound propositions with the help of logical
connectives. There are mainly five connectives, which are given as follows:
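Assuming the five connectives are the usual ones of propositional logic (negation, conjunction, disjunction, implication, and biconditional), their truth tables can be sketched as follows. The names CONNECTIVES and truth_table are illustrative only.

```python
from itertools import product

# Each connective maps truth values (p, q) to a truth value.
# Negation is unary, so it ignores q.
CONNECTIVES = {
    "negation": lambda p, q: not p,             # ¬P
    "conjunction": lambda p, q: p and q,        # P ∧ Q
    "disjunction": lambda p, q: p or q,         # P ∨ Q
    "implication": lambda p, q: (not p) or q,   # P → Q
    "biconditional": lambda p, q: p == q,       # P ⇔ Q
}


def truth_table(name):
    """Return rows (p, q, result) for p, q over True/False."""
    op = CONNECTIVES[name]
    return [(p, q, op(p, q)) for p, q in product([True, False], repeat=2)]
```

For example, the implication P → Q is false only in the row where P is true and Q is false.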
Humans are best at understanding, reasoning, and interpreting knowledge. Humans know things,
which is knowledge, and according to their knowledge they perform various actions in the real
world. How machines do all these things comes under knowledge representation and
reasoning. Hence we can describe knowledge representation as follows:
1. Declarative Knowledge:
o Declarative knowledge is to know about something.
o It includes concepts, facts, and objects.
o It is also called descriptive knowledge and expressed in declarative sentences.
o It is simpler than procedural knowledge.
2. Procedural Knowledge
o It is also known as imperative knowledge.
o Procedural knowledge is a type of knowledge which is responsible for knowing how to
do something.
o It can be directly applied to any task.
o It includes rules, strategies, procedures, agendas, etc.
o Procedural knowledge depends on the task on which it can be applied.
3. Meta-knowledge:
o Knowledge about the other types of knowledge is called Meta-knowledge.
4. Heuristic knowledge:
The above diagram shows how an AI system can interact with the real world and which
components help it to show intelligence. An AI system has a perception component by which it
retrieves information from its environment. The input can be visual, audio, or another form of
sensory input. The learning component is responsible for learning from the data captured by the
perception component. In the complete cycle, the main components are knowledge
representation and reasoning. These two components are involved in showing human-like
intelligence in machines. They are independent of each other but also coupled together. Planning
and execution depend on the analysis of knowledge representation and reasoning.
Approaches to knowledge representation:
There are mainly four approaches to knowledge representation, which are given below:
Machine learning is a subset of artificial intelligence that is mainly concerned with the
development of algorithms that allow a computer to learn from data and past experiences
on its own. The term machine learning was first introduced by Arthur Samuel in 1959. We
can define it in a summarized way as:
"Machine learning enables a machine to automatically learn from data, improve performance
from experiences, and predict things without being explicitly programmed."
With the help of sample historical data, known as training data, machine learning
algorithms build a mathematical model that helps in making predictions or decisions without
being explicitly programmed. Machine learning brings computer science and statistics together
to create predictive models. Machine learning constructs or uses algorithms that learn
from historical data. In general, the more data we provide, the better the performance tends to
be.
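As a small illustration of building a model from training data, the sketch below fits a straight line to sample points by ordinary least squares and then uses it to predict an unseen value. The function names fit_line and predict are assumptions for this example; least-squares fitting is just one simple learning method, chosen here because it needs no external library.

```python
def fit_line(xs, ys):
    """Learn (slope, intercept) from training data by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned model to a new input."""
    slope, intercept = model
    return slope * x + intercept
```

The rule y = 2x + 1 is never written into the program; given the training points (1, 3), (2, 5), (3, 7), the model recovers it from the data and predicts a value close to 9 for x = 4.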
11.. 12.C 13. 14. 15. 16. 17. 18. 19. 20.
21.A 22. 23. 24. 25. 26. 27. 28. 29. 30.
31. 32. 33. 34. 35. 36. 37. 38. 39. 40.
41. 42. 43. 44. 45. 46. 47. 48. 49. 50.
Languages
Activity 1.6
Explain the similarities and differences between natural and formal languages.
Define a string.
What is concatenation?
We are all familiar with the notion of natural languages, such as English. Dictionaries define the
term informally as a system suitable for the expression of certain ideas, facts, or concepts,
including a set of symbols and rules for their manipulation. But this is not sufficient as a
definition for the study of formal languages. We need a precise definition for the term.
To define language formally, we start with a finite, nonempty set Σ of symbols, called the
alphabet. From the individual symbols we construct strings, which are finite sequences of
symbols from the alphabet. A set of strings is called a language. For example, if the alphabet Σ =
{a, b}, then abab and aaabbba are strings on Σ. In this module we will use lowercase letters a, b,
c, … for elements of Σ and the letters u, v, w, … for string names. For example, to assign a string
to a name we will write
w = abaaa
to indicate that the string named w has the specific value abaaa.
Since, by definition, the alphabet Σ is finite, we can enumerate all words (strings) over Σ. That
is, we can arrange all words in a right-infinite sequence. This ordering can be done in many
ways, but the most standard ordering is the alphabetical enumeration, in which shorter words
come first and words of the same length are sorted alphabetically. For instance, for the binary
alphabet Σ = {0, 1}, the set Σ* of all words over Σ is sorted by alphabetical enumeration like this:
λ, 0, 1, 00, 01, 10, 11, 000, 001, 010, . . .
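The alphabetical enumeration can be sketched as a Python generator that lists words by increasing length and, within each length, in alphabetical order. The names enumerate_words and first_words are assumptions for illustration, and the empty word λ is represented by the empty string "".

```python
from itertools import count, islice, product


def enumerate_words(alphabet):
    """Yield every word over `alphabet`: shorter words first, then alphabetically."""
    for length in count(0):
        # product() yields tuples in alphabetical order for an ordered alphabet.
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def first_words(alphabet, n):
    """Return the first n words of the enumeration as a list."""
    return list(islice(enumerate_words(alphabet), n))
```

Because the sequence is infinite, the generator never finishes on its own; first_words cuts it off after n words.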
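Definition 1.5: Let A and B be languages. We define the operations union, concatenation, and
star-closure as follows:
Union: A ∪ B = {x | x ∈ A or x ∈ B}.
Concatenation: A ∘ B = {xy | x ∈ A and y ∈ B}.
Star: A∗ = {x1x2 . . . xk | k ≥ 0 and each xi ∈ A}.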
You are already familiar with the union operation. It simply takes all the strings in both A and B
and lumps them together into one language.
The concatenation operation is a little trickier. It attaches a string from A in front of a string from
B in all possible ways to get the strings in the new language.
The star operation is a bit different from the other two because it applies to a single language
rather than to two different languages. That is, the star operation is a unary operation instead of a
binary operation. It works by attaching any number of strings in A together to get a string in the
new language. Because "any number" includes 0 as a possibility, the empty string is always a
member of A∗, no matter what A is.
Example 1.7
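Let the alphabet Σ be the standard 26 letters {a, b, . . . , z}. If A = {good, bad} and
B = {boy, girl}, then
A ∪ B = {good, bad, boy, girl},
A ∘ B = {goodboy, goodgirl, badboy, badgirl}, and
A∗ = {λ, good, bad, goodgood, goodbad, badgood, badbad, goodgoodgood, . . .}.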
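For finite languages, the three operations can be sketched with Python sets. The function names are assumptions for illustration. Because the star of a nonempty language is infinite, star_up_to returns only the words built from at most k strings of A.

```python
def union(a, b):
    """All strings that are in A or in B."""
    return a | b


def concat(a, b):
    """Every string of A followed by every string of B."""
    return {x + y for x in a for y in b}


def star_up_to(a, k):
    """Words formed by concatenating at most k strings of A (a finite part of A*)."""
    result = {""}          # zero strings: the empty string is always included
    layer = {""}
    for _ in range(k):
        layer = concat(layer, a)
        result |= layer
    return result
```

With A = {good, bad} and B = {boy, girl}, concat(A, B) yields the four words goodboy, goodgirl, badboy, and badgirl.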
Example 1.8
Let Σ = {a, b}. Then
Σ* = {λ, a, b, aa, ab, ba, bb, aaa, aab, …}
The set {a, aa, aab} is a language on Σ. Because it has a finite number of
sentences, we call it a finite language. The set L = {a^n b^n : n ≥ 0} is also a
language on Σ. The strings aabb and aaaabbbb are in the language L, but the
string abb is not in L. This language is infinite. Most interesting languages are
infinite.
Since languages are sets, the set operations union, intersection, and difference of two
languages are immediately defined. The complement of a language is defined with respect to Σ*;
that is, the complement of L is Σ* − L. The reverse of a language is the set of all string
reversals, that is, L^R = {w^R : w ∈ L}.
Example 1.9
If L = {a^n b^n : n ≥ 0}, then
the reverse of L is L^R = {b^n a^n : n ≥ 0}.
4.1.1. Grammars
Activity
Define Grammar Formally?
What is linear grammar?
A grammar for the English language tells us whether a particular sentence is well-formed or not.
A typical rule of English grammar is "a sentence can consist of a noun phrase followed by a
predicate."
Example 1.11
Consider the grammar G = ({S}, {a, b}, S, P), with P given by
S → aSb
S → λ
Then
S ⇒ aSb ⇒ aaSbb ⇒ aabb, so we can write
S ⇒* aabb
The string aabb is a sentence in the language generated by G, while aaSbb is a sentential form.
A grammar G completely defines L(G), but it may not be easy to get a very explicit description
of the language from the grammar. Here, however, the answer is fairly clear. It is not hard to
conjecture that L(G) = {a^n b^n : n ≥ 0}.
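The derivation in this example can be reproduced mechanically. The sketch below applies the production S → aSb a chosen number of times and then S → λ, recording every sentential form along the way. The function name derive is an assumption for illustration, and λ is represented by the empty string.

```python
def derive(n):
    """Sentential forms of the derivation of a^n b^n in the grammar S -> aSb | λ."""
    current = "S"
    forms = [current]
    for _ in range(n):
        current = current.replace("S", "aSb")   # apply S -> aSb
        forms.append(current)
    forms.append(current.replace("S", ""))      # apply S -> λ
    return forms
```

Here derive(2) yields S, aSb, aaSbb, aabb, which matches the derivation above; the last element is always a^n b^n, in line with the conjecture about L(G).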
Regular Grammars
Grammars are often an alternative way of specifying languages. Whenever we define a language
family through an automaton or in some other way, we are interested in knowing what kind of
grammar we can associate with the family.
Context-Free Grammars
The productions in a regular grammar are restricted in two ways: The left side must be a single variable,
while the right side has a special form. To create grammars that are more powerful, we must relax some
of these restrictions. By retaining the restriction on the left side, but permitting anything on the right, we
get context-free grammars.
Definition: A grammar G = (V, T, S, P) is said to be context-free if all productions in P have the form
A → x,
where A ∈ V and x ∈ (V ∪ T)*. A language L is said to be context-free if and only if there is a
context-free grammar G such that L = L(G).
In this chapter, we first define regular expressions as a means of representing certain subsets of
strings over Σ and prove that regular sets are precisely those accepted by finite automata or
transition systems. We use pumping lemma for regular sets to prove that certain sets are not
regular. We then discuss closure properties of regular sets. Finally, we give the relation between
regular sets and regular grammars.
Activity 3.1
1. What are the primitive regular expressions?
2. What are the valid mathematical operators used in regular expressions?
3. What is regular expression?
We now consider the class of languages obtained by applying union, concatenation, and Kleene
star finitely many times to the basis elements. These languages are known as regular
languages and the corresponding finite representations are known as regular expressions.
Definition 3.1: Let Σ be an alphabet. A regular expression r over Σ denotes a language L(r) over
Σ. We say that r is a regular expression if r is
1. a, for some a ∈ Σ,
2. λ,
3. ∅,
4. (r1 + r2), where r1 and r2 are regular expressions,
5. (r1 · r2), where r1 and r2 are regular expressions, or
6. (r1∗), where r1 is a regular expression.
In items 1 and 2, the regular expressions a and λ represent the languages {a} and {λ},
respectively. In item 3, the regular expression ∅ represents the empty language. In items 4, 5,
and 6, the expressions represent the languages obtained by taking the union or concatenation
of the languages of r1 and r2, or the star of the language of r1, respectively. The symbols a, λ,
and ∅ are called primitive regular expressions.
Definition 3.2: If r is a regular expression, then the language represented by r is denoted by
L(r). Further, a language L is said to be regular if there is a regular expression r such that L =
L(r).
Remark
1. A regular language over an alphabet Σ is one that can be obtained from the empty set
(∅), {λ}, and {a}, for a ∈ Σ, by finitely many applications of union, concatenation, and
Kleene star.
5. Express the language L over {0,1} that contains 01 or 10 as a substring with a regular
expression.
Solution
L = {x | 01 is a substring of x} ∪ {x | 10 is a substring of x}
= {y01z | y, z ∈ Σ*} ∪ {u10v | u, v ∈ Σ*}
= Σ*{01}Σ* ∪ Σ*{10}Σ*
= {0,1}*{01}{0,1}* ∪ {0,1}*{10}{0,1}*
Since Σ*, {01}, and {10} are regular, and regular languages are closed under concatenation,
union, and Kleene star, L is a regular language. A regular expression representing L is given
below.
(0+1)*01(0+1)* + (0+1)*10(0+1)*
6. Express with a regular expression the set of all strings over {a,b} which do not
contain ab as a substring.
Solution
A string avoids ab exactly when no a is followed by a b, so every b must precede every a. The
language is therefore
{b^n a^m : n, m ≥ 0}
Thus, the regular expression of the language is b*a*.
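Both solutions can be checked by brute force with Python's re module. In Python's syntax the union written + above becomes |, and fullmatch requires the whole string to match. The pattern names and the helper functions words and agrees_up_to are assumptions for this sketch; the check covers only words up to a chosen length, so it supports the solutions without proving them.

```python
import re
from itertools import product

# (0+1)*01(0+1)* + (0+1)*10(0+1)* in Python syntax
CONTAINS_01_OR_10 = re.compile(r"(0|1)*01(0|1)*|(0|1)*10(0|1)*")
# b*a*: all the b's come before all the a's
NO_AB = re.compile(r"b*a*")


def words(alphabet, max_len):
    """All words over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def agrees_up_to(max_len):
    """Compare each expression with the plain-language description of its set."""
    ok_5 = all(
        bool(CONTAINS_01_OR_10.fullmatch(w)) == ("01" in w or "10" in w)
        for w in words("01", max_len)
    )
    ok_6 = all(
        bool(NO_AB.fullmatch(w)) == ("ab" not in w)
        for w in words("ab", max_len)
    )
    return ok_5 and ok_6
```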
Definition 3.3: Two regular expressions r1 and r2 are said to be equivalent if they represent the
same language; in which case, we write r1 ≈ r2.
Theorem 3.2: A language is regular if and only if some regular expression describes it. This
theorem has two directions. We state and prove each direction as a separate lemma.
Proof Idea: Say that we have a regular expression r describing some language L. We show how
to convert r into an NFA recognizing L. By definition, if an NFA recognizes L, then L is regular.
Proof: Let's convert r into an NFA N. We consider the six cases in the formal definition of
regular expressions.
1. r = a, for some a ∈ Σ. Then L(r) = {a}, and the following NFA recognizes L(r).
[Figure: NFA for L(r) = {a}, with a single transition labeled a.]
Note that this machine fits the definition of an NFA but not that of a DFA because it has
some states with no exiting arrow for each possible input symbol. Of course, we could
have presented an equivalent DFA here; but an NFA is all we need for now, and it is
easier to describe.
Figure 3.4: Automaton for L(r1 + r2).
5. r = r1 · r2.
6. r = r1∗.
Figure 3.6: Automaton for L(r1∗).
Automata
Activity 1.8
What is an automaton?
Mention and explain the three different types of automata?
The automaton can produce output of some form. It may have a temporary storage device,
consisting of an unlimited number of cells, each capable of holding a single symbol from an
alphabet (not necessarily the same one as the input alphabet). The automaton can read and
change the contents of the storage cells. Finally, the automaton has a control unit, which can be
in any one of a finite number of internal states, and which can change state in some defined
manner. An automaton is assumed to operate in a discrete timeframe. At any given time, the
control unit is in some internal state, and the input mechanism is scanning a particular symbol on
the input file. The internal state of the control unit at the next time step is determined by the next-
state or transition function. This transition function gives the next state in terms of the current
state, the current input symbol, and the information currently in the temporary storage.
During the transition from one time interval to the next, output may be produced or the
information in the temporary storage changed. The term configuration will be used to refer to a
particular state of the control unit, input file, and temporary storage. The transition of the
automaton from one configuration to the next will be called a move. The coming chapter
discusses automata in detail; here we only introduce the different types of automata.
TYPES OF AUTOMATA
Power of Automata
Finite automata are good models for computers with an extremely limited amount of memory.
We interact with such computers all the time, as they lie at the heart of various electromechanical
devices. We will take a closer look at finite automata from a mathematical perspective. We will
develop a precise definition of a finite automaton, terminology for describing and manipulating
finite automata, and theoretical results that describe their power and limitations.
Strings, by definition, are finite (have only a finite number of symbols). Most languages of
interest are infinite (contain an infinite number of strings). However, in order to work with these
languages, we must be able to specify or describe them in ways that are finite. Finite automata
are used to describe such infinite languages. Finite automata are finite collections of states with
transition rules that take you from one state to another.
More formally, if M = (Q, Σ, δ, q0, F) is a deterministic finite automaton, then its associated
transition graph (state diagram) has exactly |Q| vertices, each one labeled with a different qi ∈ Q.
For every transition rule δ(qi, a) = qj, the graph has an edge (qi, qj) labeled a. The vertex
associated with q0 is called the initial vertex, while those labeled with qf ∈ F are the final
vertices. It is a trivial matter to convert from the (Q, Σ, δ, q0, F) formal definition of a DFA to its
transition graph representation and vice versa.
A deterministic finite automaton can be presented in three ways:
1. Instantaneous description
2. Transition graph (state diagram)
3. Transition table
Example 2.1:
Consider a machine given by M1 = ({q0, q1, q2}, {0, 1}, δ, q0, {q1}), where the transition
function δ (movement) is given by the following instantaneous description.
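Figure 2.7: Transition graph of a DFA with states q0, q1, and q2. State q0 loops on a and moves
to q1 on b; q1 moves to q2 on a or b; q2 loops on a and b.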
The automaton in Figure 2.7 remains in its initial state q0 until the first b is encountered. If this is
also the last symbol of the input, then the string is accepted, since q1 is a final state. If not, the
DFA goes into state q2, from which it can never escape (such a state is called a trap state). We
see clearly from the transition graph that the automaton accepts all strings consisting of an
arbitrary number of a's followed by a single b. All other input strings are rejected. In set notation,
the language accepted by the automaton is
L = {a^n b : n ≥ 0}.
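The behaviour just described can be simulated with a transition table. The sketch below encodes the automaton as a dictionary; the names DFA_DELTA and dfa_accepts are assumptions for illustration, and the table is read off from the description above (q0 loops on a, b leads to q1, and anything after that leads to the trap state q2).

```python
# Transition function δ as a table: (state, symbol) -> next state
DFA_DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",   # q2 is the trap state
}
DFA_FINAL = {"q1"}


def dfa_accepts(word):
    """Run the DFA on a word over {a, b}; accept if it ends in a final state."""
    state = "q0"
    for symbol in word:
        state = DFA_DELTA[(state, symbol)]   # exactly one next state
    return state in DFA_FINAL
```

Each input symbol triggers exactly one table lookup, which is what makes the automaton deterministic.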
In contrast to a DFA, where there is a unique next state for a transition from a state on an input
symbol, we now consider a finite automaton with nondeterministic transitions. A transition is
nondeterministic if there are several (possibly zero) next states from a state on an input symbol
or without any input. A transition without input is called a λ-transition. A nondeterministic
finite automaton is defined along the same lines as a DFA, except that its transitions may be
nondeterministic.
M = (Q, Σ, δ, q0, F), where Q, Σ, q0 and F are as in a DFA, whereas the transition function δ is
as below:
δ : Q × (Σ ∪ {λ}) → 2^Q
Note that there are three major differences between this definition and the definition of a DFA.
1. In a nondeterministic accepter, the range of δ is the power set of Q, Pot(Q) = 2^Q, so that
its value is not a single element of Q but a subset of it. This subset defines the set of all
possible states that can be reached by the transition. If, for instance, the current state is q1,
the symbol a is read, and δ(q1, a) = {q0, q2}, then either q0 or q2 could be the next state of
the NFA.
Like DFAs, nondeterministic accepters can be represented by transition graphs. The vertices are
determined by Q, while an edge (qi, qj) with label a is in the graph if and only if δ(qi, a) contains
qj. Note that since a may be the empty string, there can be some edges labeled λ.
A string is accepted by an NFA if there is some sequence of possible moves that will put the
machine in a final state at the end of the string. A string is rejected (that is, not accepted) only if
there is no possible sequence of moves by which a final state can be reached. Nondeterminism
can therefore be viewed as involving "intuitive" insight by which the best move can be chosen at
every state (assuming that the NFA wants to accept every string).
Note: If S is a set, then Pot(S) denotes the power set of S (2^S), the set of all subsets of S. For a
finite set S of size n, Pot(S) has size 2^n.
Example 2.7:
Consider the NFA in Figure 2.14. The NFA accepts all the words (strings) from {0,1}* that
end with 01. Note that there are two "0" arrows leaving q0, one to q0 itself and the second to q1,
which would be forbidden in a DFA.
Figure 2.14: NFA with start state q0, state q1, and final state q2. State q0 loops on 0 and 1 and
moves to q1 on 0; q1 moves to q2 on 1.
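An NFA can be simulated by tracking the set of all states it could be in after each symbol. The sketch below does this for the automaton of this example; the names NFA_DELTA and nfa_accepts are assumptions for illustration, and q2 is taken as the only final state, which is what the stated language (words ending in 01) requires. This automaton has no λ-transitions, so none are handled.

```python
# δ maps (state, symbol) to a SET of possible next states.
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"},   # two "0" arrows leave q0
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
NFA_FINAL = {"q2"}


def nfa_accepts(word):
    """Accept if SOME sequence of moves ends in a final state."""
    current = {"q0"}
    for symbol in word:
        # Follow every possible move from every state we might be in.
        current = set().union(
            *(NFA_DELTA.get((q, symbol), set()) for q in current)
        )
    return bool(current & NFA_FINAL)
```

A missing table entry stands for the empty set of next states, so a branch with no possible move simply dies out.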
PUSHDOWN AUTOMATA
Activity 4.5
1. Define a pushdown automaton formally.
2. What class of languages is recognized by pushdown automata?
3. Prove that every language generated by a CFG is accepted by a pushdown automaton.
In this section we introduce a new type of computational model called pushdown automata.
These automata are like nondeterministic finite automata but have an extra component called a
stack.
Figure 4.10: Schematic of a finite automaton
Figure 4.11: Schematic of a pushdown automaton
A pushdown automaton (PDA) can write symbols on the stack and read them back later. Writing
a symbol "pushes down" all the other symbols on the stack. At any time the symbol on the top of
the stack can be read and removed. The remaining symbols then move back up. Writing a
symbol on the stack is often referred to as pushing the symbol, and removing a symbol is
referred to as popping it. Note that all access to the stack, for both reading and writing, may be
done only at the top. In other words, a stack is a "last in, first out" storage device. If certain
information is written on the stack and additional information is written afterward, the earlier
information becomes inaccessible until the later information is removed.
A stack is valuable because it can hold an unlimited amount of information. Recall that a finite
automaton is unable to recognize the language {0^n 1^n | n ≥ 0} because it cannot store very large
numbers in its finite memory. A PDA is able to recognize this language because it can use its
stack to store the number of 0s it has seen. Thus the unlimited nature of a stack allows the PDA
to store numbers of unbounded size. The following informal description shows how the
automaton for this language works.
Read symbols from the input. As each 0 is read, push it onto the stack. As soon as 1s are seen,
pop a 0 off the stack for each 1 read. If reading the input is finished exactly when the stack
becomes empty of 0s, accept the input. If the stack becomes empty while 1s remain or if the 1s
are finished while the stack still contains 0s or if any 0s appear in the input following 1s, reject
the input.
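The informal description above translates almost line by line into code. In this sketch a Python list plays the role of the stack; the function name accepts_0n1n is an assumption for illustration, and the code is a direct deterministic reading of the description rather than a simulation of a formally defined PDA.

```python
def accepts_0n1n(word):
    """Decide membership in {0^n 1^n | n >= 0} using a stack."""
    stack = []
    seen_one = False
    for symbol in word:
        if symbol == "0":
            if seen_one:
                return False      # a 0 appears after some 1: reject
            stack.append("0")     # push each 0 that is read
        elif symbol == "1":
            seen_one = True
            if not stack:
                return False      # stack empty while 1s remain: reject
            stack.pop()           # pop one 0 for each 1
        else:
            return False          # symbol outside the alphabet {0, 1}
    return not stack              # accept only if no unmatched 0s are left
```

The stack may grow without bound, which is exactly the capability a finite automaton lacks.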
As mentioned earlier, pushdown automata may be nondeterministic. Deterministic and
nondeterministic pushdown automata are not equivalent in power.
Nondeterministic pushdown automata recognize certain languages that no deterministic
pushdown automata can recognize. Recall that deterministic and nondeterministic finite
automata do recognize the same class of languages, so the pushdown automata situation is
different. We focus on nondeterministic pushdown automata because these automata are
equivalent in power to context-free grammars.
The formal definition of a pushdown automaton is similar to that of a finite automaton, except
for the stack. The stack is a device containing symbols drawn from some alphabet. The machine
may use different alphabets for its input and its stack, so now we specify both an input alphabet
Σ and a stack alphabet Γ.
At the heart of any formal definition of an automaton is the transition function, which describes
its behavior. The domain of the transition function is Q × (Σ ∪ {ε}) × (Γ ∪ {ε}). Thus the current
state, the next input symbol read, and the top symbol of the stack determine the next move of a pushdown
automaton. Either symbol may be ε, causing the machine to move without reading a symbol from
the input or without reading a symbol from the stack.
Example 2.17
The following is the formal description of the PDA that recognizes the language {0^n1^n | n ≥ 0}.
Let M1 be (Q, Σ, Γ, δ, q1, z0, F), where
Q = {q1, q2, q3, q4},
Σ = {0, 1},
Γ = {0, z0},
F = {q1, q4}, and
δ is given by the following transitions, where every combination not listed maps to ∅:
δ(q1, ε, ε) = {(q2, z0)}
δ(q2, 0, ε) = {(q2, 0)}
δ(q2, 1, 0) = {(q3, ε)}
δ(q3, 1, 0) = {(q3, ε)}
δ(q3, ε, z0) = {(q4, ε)}
We can also use a state diagram to describe a PDA, as in Figure 4.12. Such diagrams are similar
to the state diagrams used to describe finite automata, modified to show how the PDA uses its
stack when going from state to state. We write "a, b → c" to signify that when the machine is
reading an a from the input, it may replace the symbol b on the top of the stack with a c. Any of
a, b, and c may be ε. If a is ε, the machine may make this transition without reading any symbol
from the input. If b is ε, the machine may make this transition without reading and popping any
symbol from the stack. If c is ε, the machine does not write any symbol on the stack when going
along this transition.
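Figure 4.12: State diagram for the PDA M1 that recognizes {0^n1^n | n ≥ 0}. The start state q1 goes to q2 on ε, ε → z0; q2 loops on 0, ε → 0 and goes to q3 on 1, 0 → ε; q3 loops on 1, 0 → ε and goes to q4 on ε, z0 → ε. States q1 and q4 are accepting.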
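To see the nondeterminism at work, the transitions of M1 can be explored breadth-first over configurations (state, input position, stack). The sketch below is a rough illustration, not a general PDA tool. It assumes the transitions are read in the "a, b → c" convention with ε written as the empty string, and the names DELTA, ACCEPTING and pda_accepts are made up for the example. It also assumes the machine has no ε-moves that push forever, which holds for M1.

```python
from collections import deque

EPS = ""  # stands for ε

# (state, input symbol, stack top) -> set of (next state, pushed symbol)
DELTA = {
    ("q1", EPS, EPS): {("q2", "z0")},
    ("q2", "0", EPS): {("q2", "0")},
    ("q2", "1", "0"): {("q3", EPS)},
    ("q3", "1", "0"): {("q3", EPS)},
    ("q3", EPS, "z0"): {("q4", EPS)},
}
ACCEPTING = {"q1", "q4"}

def pda_accepts(word, delta=DELTA, start="q1", accepting=ACCEPTING):
    """Return True if some sequence of moves consumes word and ends in an accepting state."""
    start_config = (start, 0, ())            # the stack is a tuple, top at the end
    seen = {start_config}
    queue = deque([start_config])
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(word) and state in accepting:
            return True
        inputs = [EPS] + ([word[pos]] if pos < len(word) else [])
        tops = [EPS] + ([stack[-1]] if stack else [])
        for a in inputs:                      # read a symbol or make an ε-move
            for b in tops:                    # pop the top symbol or leave the stack alone
                for nxt, push in delta.get((state, a, b), ()):
                    new_stack = stack[:-1] if b != EPS else stack
                    if push != EPS:
                        new_stack = new_stack + (push,)
                    config = (nxt, pos + (1 if a != EPS else 0), new_stack)
                    if config not in seen:
                        seen.add(config)
                        queue.append(config)
    return False
```

Keeping a set of visited configurations stands in for "guessing" the right branch: every branch is tried, and the word is accepted as soon as one of them succeeds.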
We turn now to a much more powerful model, first proposed by Alan Turing in 1936, called the
Turing machine. Similar to a finite automaton but with an unlimited and unrestricted memory, a
Turing machine is a much more accurate model of a general-purpose computer. A Turing
machine can do everything that a real computer can do. Nonetheless, even a Turing machine
cannot solve certain problems. In a very real sense, these problems are beyond the theoretical
limits of computation.
The Turing machine model uses an infinite tape as its unlimited memory. It has a tape head that
can read and write symbols and move around on the tape. Initially the tape contains only the
input string and is blank everywhere else. If the machine needs to store information, it may write
this information on the tape. To read the information that it has written, the machine can move its
head back over it. The machine continues computing until it decides to produce an output. The
outputs accept and reject are obtained by entering accepting (final) and rejecting states. If
it doesn't enter an accepting or a rejecting state, it will go on forever, never halting.
The following table shows a comparison of how a Turing machine differs from Finite
Automaton and Pushdown Automaton.
The heart of the definition of a Turing machine is the transition function δ because it tells us how
the machine gets from one step to the next. For a Turing machine, δ takes the form
Q × Γ → Q × Γ × {L, R}. That is, when the machine is in a certain state q, the head is over a tape square
containing a symbol a, and δ(q, a) = (r, b, L), the machine writes the symbol b in place of the a
and goes to state r. The third component is either L or R and indicates whether the head moves to
the left or right after writing. In this case, the L indicates a move to the left.
Definition 5.1: A Turing machine is a 7-tuple (Q, Σ, Γ, δ, q0, ⊔, F), where Q, Σ, Γ, and F are all finite
sets and
1. Q is the set of states,
2. Σ is the input alphabet, not containing the blank symbol ⊔,
3. Γ is the tape alphabet, where ⊔ ∈ Γ and Σ ⊆ Γ,
4. ⊔ is the blank symbol,
5. δ: Q × Γ → Q × Γ × {L, R} is the transition function,
6. q0 ∈ Q is the start state, and
7. F ⊆ Q is the set of accepting states.
A Turing machine M = (Q, Σ, Γ, δ, q0, ⊔, F) computes as follows. Initially, M receives its input
w = w1w2 . . . wn∈ Σ* on the leftmost n squares of the tape, and the rest of the tape is blank (i.e.,
filled with blank symbols). The head starts on the leftmost square of the tape. Note that Σ does
not contain the blank symbol, so the first blank appearing on the tape marks the end of the input.
Once M has started, the computation proceeds according to the rules described by the transition
function. If M ever tries to move its head to the left off the left-hand end of the tape, the head
stays in the same place for that move, even though the transition function indicates L. The
computation continues until M enters either an accepting or a rejecting state, at which point it
halts. If neither occurs, M goes on forever.
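The computation rule just described can be sketched as a short simulator. Several things here are assumptions made for illustration: the blank is written as "_", the transition function is a Python dict, a missing entry is treated as entering a rejecting state, and a step limit stands in for "goes on forever". The example machine DELTA, which marks matching 0s and 1s with X and Y to decide {0^n1^n | n ≥ 0}, is not taken from the text.

```python
BLANK = "_"

def run_tm(delta, start, accept, word, max_steps=10_000):
    """Simulate a one-tape Turing machine; return True on accept, False on reject."""
    tape = list(word) or [BLANK]       # input on the leftmost squares, blanks after it
    state, head = start, 0             # the head starts on the leftmost square
    for _ in range(max_steps):
        if state in accept:
            return True
        key = (state, tape[head])
        if key not in delta:
            return False               # assumption: undefined move means reject
        state, tape[head], move = delta[key]
        if move == "R":
            head += 1
            if head == len(tape):
                tape.append(BLANK)     # the tape is blank beyond the input
        else:
            head = max(head - 1, 0)    # an L move at the left-hand end stays put
    raise RuntimeError("no halt within max_steps")

# Illustrative machine for {0^n 1^n | n >= 0}
DELTA = {
    ("q0", "0"): ("q1", "X", "R"),
    ("q0", "Y"): ("q3", "Y", "R"),
    ("q0", BLANK): ("qacc", BLANK, "R"),
    ("q1", "0"): ("q1", "0", "R"),
    ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "1"): ("q2", "Y", "L"),
    ("q2", "0"): ("q2", "0", "L"),
    ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),
    ("q3", "Y"): ("q3", "Y", "R"),
    ("q3", BLANK): ("qacc", BLANK, "R"),
}
```

For example, run_tm(DELTA, "q0", {"qacc"}, "0011") follows the marking strategy twice and then scans the Ys to the first blank before accepting.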
As a Turing machine computes, changes occur in the current state, the current tape contents, and
the current head location. A setting of these three items is called a configuration of the Turing
machine. Configurations often are represented in a special way. For a state q and two strings u
and v over the tape alphabet Γ, we write
uqv for the configuration where the current state is q, the current tape contents is uv, and the
current head location is the first symbol of v. The tape contains only blanks following the last
symbol of v. For example, 1011q701111 represents the configuration when the tape is 101101111,
the current state is q7, and the head is currently on the second 0.
Here we formalize our intuitive understanding of the way that a Turing machine computes. Say
that configuration C1 yields configuration C2 if the Turing machine can legally go from C1 to C2
in a single step. We define this notion formally as follows.
Suppose that we have a, b, and c in Γ, as well as u and v in Γ* and states qi and qj. In that case,
uaqibv and uqjacv are two configurations. Say that
uaqibv yields uqjacv
if in the transition function δ(qi, b) = (qj, c, L). That handles the case where the Turing machine
moves leftward. For a rightward move, say that
uaqibv yields uacqjv
if δ(qi,b) = (qj,c,R).
Special cases occur when the head is at one of the ends of the configuration. For the left-hand
end, the configuration qibv yields qjcv if the transition is left-moving (because we prevent the
machine from going off the left-hand end of the tape), and it yields cqjv for the right-moving
transition. For the right-hand end, the configuration uaqi is equivalent to uaqi⊔, where ⊔ is the
blank symbol, because we assume that blanks follow the part of the tape represented in the
configuration. Thus we can handle this case as before, with the head no longer at the right-hand end.
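The yields relation, including both special cases, can be written as a single function on configurations. In this sketch a configuration uqv is represented as the triple (u, q, v), every tape symbol is assumed to be a single character, and the blank is written as "_". These representation choices and the name yields are assumptions made for the example.

```python
BLANK = "_"

def yields(config, delta):
    """Return the configuration that (u, q, v) yields in one step."""
    u, q, v = config
    if not v:
        v = BLANK                           # ua qi is equivalent to ua qi followed by a blank
    b, rest = v[0], v[1:]                   # b is the symbol under the head
    r, c, move = delta[(q, b)]              # delta(qi, b) = (qj, c, L or R)
    if move == "R":
        return (u + c, r, rest)             # ua qi bv yields uac qj v
    if not u:
        return ("", r, c + rest)            # left-hand end: qi bv yields qj cv
    return (u[:-1], r, u[-1] + c + rest)    # ua qi bv yields u qj acv
```

With a hypothetical rule δ(q7, 0) = (q2, 1, L), the configuration 1011q701111 from the example above yields 101q2111111.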
The start configuration of M on input w is the configuration q0w, which indicates that the
machine is in the start state q0 with its head at the leftmost position on the tape. In an accepting
configuration, the state of the configuration is an accepting state. In a rejecting configuration, the
state of the configuration is a rejecting (non-final) state. Accepting and rejecting configurations are
halting configurations and do not yield further configurations. A Turing machine M accepts input
w if a sequence of configurations C1,C2, . . . , Ck exists, where
1. C1 is the start configuration of M on input w,
2. each Ci yields Ci+1, and
3. Ck is an accepting configuration.
The collection of strings that M accepts is the language of M, or the language recognized by M,
denoted L(M).
Activity 4.5
1. What is a compiler?
2. List and explain the phases of a compiler.
3. Why do we design compilers?
The high-level language is converted into binary language in various phases. A compiler is a
program that converts high-level language to assembly language. Similarly, an assembler is a
program that converts the assembly language to machine-level language.
For example, a typical program compiled with a C compiler is executed on a host machine in the following
manner.
Before diving straight into the concepts of compilers, we should understand a few other tools that
work closely with compilers.
Preprocessor
A preprocessor, generally considered a part of the compiler, is a tool that produces input for
compilers. It deals with macro-processing, augmentation, file inclusion, language extension, etc.
Interpreter
An interpreter, like a compiler, translates high-level language into low-level machine language.
The difference lies in the way they read the source code or input. A compiler reads the whole
source code at once, creates tokens, checks semantics, generates intermediate code, and translates
the whole program, possibly in many passes. In contrast, an interpreter reads a statement from
the input, converts it to an intermediate code, executes it, then takes the next statement in
sequence. If an error occurs, an interpreter stops execution and reports it, whereas a compiler
reads the whole program even if it encounters several errors.
Assembler
An assembler translates assembly language programs into machine code. The output of an
assembler is called an object file, which contains a combination of machine instructions as well
as the data required to place these instructions in memory.
Linker
A linker is a computer program that links and merges various object files together in order to make
an executable file. These files might have been produced by separate assemblers. The major
task of a linker is to search for and locate the referenced modules and routines in a program and to
determine the memory locations where this code will be loaded, so that the program's instructions
have absolute references.
Cross-compiler
A compiler that runs on platform (A) and is capable of generating executable code for platform
(B) is called a cross-compiler.
Source-to-source Compiler
A compiler that takes the source code of one programming language and translates it into the
source code of another programming language is called a source-to-source compiler.
Pass: A pass refers to the traversal of a compiler through the entire program.
Phase: A phase of a compiler is a distinguishable stage, which takes input from the previous
stage, processes and yields output that can be used as input for the next stage. A pass can have
more than one phase. The compilation process is a sequence of various phases. Each phase takes
input from its previous stage, has its own representation of source program, and feeds its output
to the next phase of the compiler. Let us understand the phases of a compiler.
The first phase of a compiler is called lexical analysis or scanning. The lexical analyzer reads the
stream of characters making up the source program and groups the characters into meaningful
sequences called lexemes. For each lexeme, the lexical analyzer produces as output a token of
the form (token-name, attribute-value) that it passes on to the subsequent phase, syntax analysis.
In the token, the first component token-name is an abstract symbol that is used during syntax
analysis, and the second component attribute-value points to an entry in the symbol table for this
token. Information from the symbol-table entry is needed for semantic analysis and code
generation.
For example, suppose a source program contains the assignment statement
position = initial + rate * 60
The characters in this assignment could be grouped into the following lexemes and mapped into
the following tokens passed on to the syntax analyzer:
1. position is a lexeme that would be mapped into a token (id,1), where id is an abstract
symbol standing for identifier and 1 points to the symbol-table entry for position. The
symbol-table entry for an identifier holds information about the identifier, such as its
name and type.
2. The assignment symbol = is a lexeme that is mapped into the token (=). Since this token
needs no attribute-value, we have omitted the second component. We could have used
any abstract symbol such as assign for the token-name, but for notational convenience we
have chosen to use the lexeme itself as the name of the abstract symbol.
3. initial is a lexeme that is mapped into the token (id,2), where 2 points to the symbol-table
entry for initial.
4. + is a lexeme that is mapped into the token (+).
5. rate is a lexeme that is mapped into the token (id,3), where 3 points to the symbol-table
entry for rate.
6. * is a lexeme that is mapped into the token (*).
7. 60 is a lexeme that is mapped into the token (60).
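The grouping of characters into lexemes and tokens listed above can be sketched with a small scanner. This is a minimal illustration only: it handles identifiers, unsigned integers and the three operators used in the example, and the names TOKEN_RE and tokenize, as well as the choice of Python tuples for tokens and a list for the symbol table, are assumptions made here.

```python
import re

# One alternative per kind of lexeme; leading whitespace is skipped.
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[=+*]))")

def tokenize(source):
    """Return (tokens, symbol_table) for a one-line assignment statement."""
    symbol_table = []        # entry i (counting from 1) is stored at index i - 1
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = match.end()
        if match.group("id"):
            name = match.group("id")
            if name not in symbol_table:
                symbol_table.append(name)               # first occurrence gets a new entry
            tokens.append(("id", symbol_table.index(name) + 1))
        elif match.group("num"):
            tokens.append((match.group("num"),))        # token without attribute-value
        else:
            tokens.append((match.group("op"),))         # the lexeme names the token
    return tokens, symbol_table
```

Calling tokenize("position = initial + rate * 60") produces the seven tokens of the list above, with position, initial and rate as symbol-table entries 1, 2 and 3.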
Syntax Analysis
The second phase of the compiler is syntax analysis or parsing. The parser uses the first
components of the tokens produced by the lexical analyzer to create a tree-like intermediate
representation that depicts the grammatical structure of the token stream. A typical
representation is a syntax tree in which each interior node represents an operation and the
children of the node represent the arguments of the operation. The syntax tree for
position = initial + rate * 60
shows the order in which the operations are to be performed. The tree has an interior node labeled * with (id,3) as its left child and the
integer 60 as its right child. The node (id,3) represents the identifier rate. The node labeled *
makes it explicit that we must first multiply the value of rate by 60. The node labeled +
indicates that we must add the result of this multiplication to the value of initial. The root of the
tree, labeled =, indicates that we must store the result of this addition into the location for the
identifier position. This ordering of operations is consistent with the usual conventions of
arithmetic which tell us that multiplication has higher precedence than addition, and hence that
the multiplication is to be performed before the addition. The subsequent phases of the compiler
use the grammatical structure to help analyze the source program and generate the target
program.
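A small recursive-descent parser shows how the precedence of * over + ends up in the shape of the tree. The sketch assumes tokens are tuples such as ("id", 3), ("*",) and ("60",), and it represents an interior node as a tuple (operator, left child, right child). The grammar used (an expression is a sum of terms, a term is a product of factors) and all function names are illustrative assumptions.

```python
def parse_assignment(tokens):
    """Build a syntax tree for  id = expression  as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def factor():                       # factor -> id | number
        token = advance()
        if token[0] == "id" or token[0].isdigit():
            return token
        raise SyntaxError(f"unexpected token {token}")

    def term():                         # term -> factor { * factor }
        node = factor()
        while peek() == ("*",):
            advance()
            node = ("*", node, factor())
        return node

    def expr():                         # expr -> term { + term }
        node = term()
        while peek() == ("+",):
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if target[0] != "id" or advance() != ("=",):
        raise SyntaxError("expected  id = expression")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()}")
    return tree
```

Because term is called from inside expr, every multiplication is grouped into a subtree before the surrounding addition is built, which gives the ordering described above.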
Semantic Analysis
The semantic analyzer uses the syntax tree and the information in the symbol table to check the
source program for semantic consistency with the language definition. It also gathers type
information and saves it in either the syntax tree or the symbol table, for subsequent use during
intermediate-code generation.
An important part of semantic analysis is type checking, where the compiler checks that each
operator has matching operands. For example, many programming language definitions require
an array index to be an integer; the compiler must report an error if a floating-point number is
used to index an array.
The language specification may permit some type conversions called coercions. For example, a
binary arithmetic operator may be applied to either a pair of integers or to a pair of floating-point
numbers. If the operator is applied to a floating-point number and an integer, the compiler may
convert or coerce the integer into a floating-point number.
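Coercion can be sketched as a pass over an expression tree that computes a type for every node and wraps an integer operand in a conversion when it meets a floating-point operand. The tuple representation of the tree, the operator name inttofloat, and the types mapping from symbol-table index to type are assumptions for this example, which covers only the binary operators + and *.

```python
def check(node, types):
    """Return (tree, type) for an expression tree, inserting coercions where needed."""
    if node[0] == "id":
        return node, types[node[1]]          # look up the identifier's declared type
    if len(node) == 1:
        return node, "int"                   # integer literal such as ("60",)
    op, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)
    if left_type != right_type:              # one operand is int, the other float
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (op, left, right), "float"
    return (op, left, right), left_type
```

If rate is declared as a floating-point variable, the subtree for rate * 60 comes back with the literal wrapped as ("inttofloat", ("60",)), and the multiplication is typed as float.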
Semantic analysis checks whether the parse tree constructed follows the rules of the language. For
example, it checks that values are assigned between compatible data types and reports errors such as adding a string to an integer.
After syntax and semantic analysis of the source program, many compilers generate an explicit
low-level or machine-like intermediate representation, which we can think of as a program for an
abstract machine. This intermediate representation should have two important properties: it should be easy to produce, and it should be easy to translate into the target machine.
Code Optimization
The machine-independent code-optimization phase attempts to improve the intermediate code so
that better target code will result. Usually better means faster, but other objectives may be
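desired, such as shorter code, or target code that consumes less power. For example, a
straightforward algorithm generates the intermediate code using an instruction for each
operator in the tree representation that comes from the semantic analyzer. The optimizer can then transform that code into a shorter sequence such as
t1 = id3 * 60.0
id1 = id2 + t1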
There is a great variation in the amount of code optimization different compilers perform. In
those that do the most, the so-called "optimizing compilers," a significant amount of time is
spent on this phase. There are simple optimizations that significantly improve the running time
of the target program without slowing down compilation too much.
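Two of the simple optimizations mentioned above can be sketched on three-address code for the running example. The unoptimized sequence below (convert 60 to floating point, multiply, add, copy) is the form commonly shown for this example and is an assumption here, as are the tuple layout (destination, operator, argument 1, argument 2) and the function name optimize. The second step assumes that a temporary copied into a variable is not used again afterwards.

```python
def optimize(code):
    """Fold inttofloat of a literal and remove a copy of a just-computed temporary."""
    constants = {}
    folded = []
    for dest, op, a, b in code:
        a, b = constants.get(a, a), constants.get(b, b)   # substitute known constants
        if op == "inttofloat" and a.isdigit():
            constants[dest] = a + ".0"      # do the conversion once, at compile time
        else:
            folded.append((dest, op, a, b))
    result = []
    for dest, op, a, b in folded:
        if op == "copy" and result and result[-1][0] == a and a.startswith("t"):
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dest, prev_op, prev_a, prev_b))  # store directly into dest
        else:
            result.append((dest, op, a, b))
    return result

UNOPTIMIZED = [
    ("t1", "inttofloat", "60", None),   # t1 = inttofloat(60)
    ("t2", "*", "id3", "t1"),           # t2 = id3 * t1
    ("t3", "+", "id2", "t2"),           # t3 = id2 + t2
    ("id1", "copy", "t3", None),        # id1 = t3
]
```

Applied to UNOPTIMIZED, the function returns two instructions, t2 = id3 * 60.0 and id1 = id2 + t2, which match the shorter sequence above apart from the name of the temporary.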
Code Generation
The code generator takes as input an intermediate representation of the source program and maps
it into the target language. If the target language is machine code, registers or memory locations
are selected for each of the variables used by the program. Then, the intermediate instructions are
translated into sequences of machine instructions that perform the same task. A crucial aspect of
code generation is the judicious assignment of registers to hold variables. For example, using
registers R1 and R2, the intermediate code t1 = id3 * 60.0, id1 = id2 + t1 might be translated into the machine code
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
The first operand of each instruction specifies a destination. The F in each instruction tells us that
it deals with floating-point numbers. The code above loads the contents of address id3 into
register R2, then multiplies it by the floating-point constant 60.0. The # signifies that 60.0 is to be
treated as an immediate constant. The third instruction moves id2 into register R1 and the fourth
adds to it the value previously computed in register R2. Finally, the value in register R1 is stored
into the address of id1, so the code correctly implements the assignment statement
position = initial + rate * 60.
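Compiler-construction tools use specialized languages for specifying and implementing specific components, and
many use quite sophisticated algorithms. The most successful tools are those that hide the details
of the generation algorithm and produce components that can be easily integrated into the
remainder of the compiler. Some commonly used compiler-construction tools include parser generators, scanner generators, syntax-directed translation engines, code-generator generators, data-flow analysis engines, and compiler-construction toolkits.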
SELF-TEST QUESTIONS
1. Assume B is a subset of S and S is universal set, Based on this assumption, which one of
the following is not equal to B
A. B- B. B C. D.B S
2. Which of the following alternatives is true about De Morgan's Law?
A. = ⋂ C. S1 S2 = ⋂
B. . = ⋃ D. = ⋃
3. Let R be the relation "a divides b" on the set C = {0, 1, 2, ..., 20}. Which statement is
true about R?
A. R is symmetric C. R is transitive
B. . R is equivalence D. R is not a Relation
4. Let S1 and S2 be two sets that contain the same elements. Which of the
following statements cannot be true?
A. S1 S2 B. S2 S1 C. S1 S1 D. S1 S2
5. If a set A={ 0,1,2} then, identify the false statement from the following list
A. 2A B. {0,1,2} 2A C. 2|A|=3 D. {{0,1,2}} 2A
6. When do we say a function is injective?
A. If different elements in the domain have different images
B. If all elements in the range are images of some element in the domain
C. If all elements in the domain are images of some element in the range
D. If different elements in the range have different domains
7. Which one of the following is not a true statement about a tree?
A. A tree is a connected graph with no circuits or loops.
B. In a tree there is one and only one path between every pair of vertices.
C. A tree with n vertices has n-1 edges.
D. If a disconnected graph with n vertices has n - 1 edges, then it is a tree
8. From the following list of automata which one is the most powerful
A. Finite Automata B. Turing Machine C. Pushdown Automata D. None
9. Sequence of finite symbols, which contain variables as well as terminals, are called ____
A. Strings B. Sentence C. Sentential Form D. Word
10. If every production of a grammar has the form A → Bx or A → x, where A, B ∈ V and x ∈ T*, the grammar is a ____
A. Left Linear Grammar C. Right Linear Grammar
B. Simple Grammar D. Context-Free Grammar
a b
b b b
q 2
a
A. The set of all strings over Σ = {a, b} that end with the substring abb
B. The set of all strings over Σ = {a, b} that start with the substring bab
C. The set of all strings over Σ = {a, b} that end with the substring bab
D. The set of all strings over Σ = {a, b} that start with the substring ab
12. Which of the following strings is accepted by the finite automaton in question 11?
A. aaababaab C. bbbbabab
B. aaabbbaaabba D. aaabbaabbabb
13. From the following lists of automata, which one is with memory?
A. FA B. DFA C. NFA D. PDA
14. Considering the transition graph below, the language accepted by it is
q 0
a q b q b q
1 3 4
b a a
a, b
q 2
a, b
A. The set of all strings (a, b)* starting with ab
B. Set of all string (a, b)* starting with abb
C. The set of strings contains {abb}
D. set of strings contains (ab, abb}
15. What type of finite automaton is the graph in question 14?
A. Deterministic C. Nondeterministic
B. Context free D. Derivation tree
0,1 0,
A. 0110 B. 01001 C. 100100000 D. 000000
Consider the automaton M given below and choose the correct answers for Questions 17-19:
0
0,1
a, b 1 a
1 1
q 2
0
17. M is a _____________
A. nondeterministic finite automaton
B. deterministic finite automaton accepting {0,1}*
C. DFA accepting all strings over {0,1} having 3m 0's and 3n 1's, m, n >=1
D. Deterministic finite automaton
18. M accepts
A.01110 B. 10001 C.01010 D. 11111
19. L (M) is equal to
A. {0^(3m) 1^(3n) | m, n >= 0}
B. {0^(3m) 1^(3n) | m, n >= 1}
C. {w | w has 111 as a substring}
D. {w | w has 3n 1's, n >= 1}
20. Which of the following statements are acceptable?
A. Automata is a complex model of digital computer
B. If a Language is accepted by any DFA there is equivalent NFA that accepts it.
C. Many languages may be accepted by a single machine
D. Some regular languages may not have finite automata
21. The regular expression r= b*ab*ab* represents
A. The set of all strings over Σ = {a, b} that have exactly two a's
B. The set of all strings over Σ = {a, b} that have at least two a's
C. The set of all strings over Σ = {a, b} that have at most two a's
D. All are possible answers