Dbms Os Network
Dbms Os Network
1. What is Data?
a. Data is a collection of raw, unorganized facts and details like text, observations, figures, symbols,
and descriptions of things etc.
In other words, data does not carry any specific purpose and has no significance by itself.
Moreover, data is measured in terms of bits and bytes – which are basic units of information in the
context of computer storage and processing.
b. Data can be recorded and doesn’t have any meaning unless processed.
2. Types of Data
a. Quantitative
i. Numerical form
ii. Weight, volume, cost of an item.
b. Qualitative
i. Descriptive, but not numerical.
p
ii. Name, gender, hair color of a person.
3. What is Information?
a. Info. Is processed, organized, and structured data.
el
b. It provides context of the data and enables decision making.
c. Processed data that make sense to us.
d. Information is extracted from the data, by analyzing and interpreting pieces of data.
e. E.g., you have data of all the people living in your locality, its Data, when you analyze and interpret
the data and come to some conclusion that:
eH
i. There are 100 senior citizens.
ii. The sex ratio is 1.1.
iii. Newborn babies are 100.
These are information.
4. Data vs Information
a. Data is a collection of facts, while information puts those facts into context.
b. While data is raw and unorganized, information is organized.
od
c. Data points are individual and sometimes unrelated. Information maps out that data to provide a
big-picture view of how it all fits together.
d. Data, on its own, is meaningless. When it’s analyzed and interpreted, it becomes meaningful
information.
e. Data does not depend on information; however, information depends on data.
f. Data typically comes in the form of graphs, numbers, figures, or statistics. Information is typically
presented through words, language, thoughts, and ideas.
C
g. Data isn’t sufficient for decision-making, but you can make decisions based on information.
5. What is Database?
a. Database is an electronic place/system where data is stored in a way that it can be easily accessed,
managed, and updated.
b. To make real use Data, we need Database management systems. (DBMS)
6. What is DBMS?
a. A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data. The collection of data, usually referred to as the database,
contains information relevant to an enterprise. The primary goal of a DBMS is to provide a way to
store and retrieve database information that is both convenient and efficient.
b. A DBMS is the database itself, along with all the software and functionality. It is used to perform
different operations, like addition, access, updating, and deletion of the data.
7.
p
a. File-processing systems has major disadvantages.
i. Data Redundancy and inconsistency
ii. Difficulty in accessing data
el
iii. Data isolation
iv. Integrity problems
v. Atomicity problems
vi. Concurrent-access anomalies
vii. Security problems
eH
b. Above 7 are also the Advantages of DBMS (answer to “Why to use DBMS?”)
od
C
LEC-2: DBMS Architecture
p
e. Logical level / Conceptual level:
i. The conceptual schema describes the design of a database at the conceptual level,
describes what data are stored in DB, and what relationships exist among those data.
el
ii. User at logical level does not need to be aware about physical-level structures.
iii. DBA, who must decide what information to keep in the DB use the logical level of
abstraction.
iv. Goal: ease to use.
f. View level / External level:
eH
i. Highest level of abstraction aims to simplify users’ interaction with the system by
providing different view to different end-user.
ii. Each view schema describes the database part that a particular user group is interested
and hides the remaining database from that user group.
iii. At the external level, a database contains several schemas that sometimes called as
subschema. The subschema is used to describe the different view of the database.
iv. At views also provide a security mechanism to prevent users from accessing certain parts
od
of DB.
C
g.
2. Instances and Schemas
a. The collection of information stored in the DB at a particular moment is called an instance of DB.
b.The overall design of the DB is called the DB schema.
c.Schema is structural description of data. Schema doesn’t change frequently. Data may change
frequently.
d. DB schema corresponds to the variable declarations (along with type) in a program.
e. We have 3 types of Schemas: Physical, Logical, several view schemas called subschemas.
f. Logical schema is most important in terms of its effect on application programs, as programmers
construct apps by using logical schema.
g. Physical data independence, physical schema change should not affect logical
schema/application programs.
3. Data Models:
a. Provides a way to describe the design of a DB at logical level.
b. Underlying the structure of the DB is the Data Model; a collection of conceptual tools for describing
data, data relationships, data semantics & consistency constraints.
c. E.g., ER model, Relational Model, object-oriented model, object-relational data model etc.
4. Database Languages:
p
a. Data definition language (DDL) to specify the database schema.
b. Data manipulation language (DML) to express database queries and updates.
c. Practically, both language features are present in a single DB language, e.g., SQL language.
el
d. DDL
i. We specify consistency constraints, which must be checked, every time DB is updated.
e. DML
i. Data manipulation involves
1. Retrieval of information stored in DB.
eH
2. Insertion of new information into DB.
3. Deletion of information from the DB.
4. Updating existing information stored in DB.
ii. Query language, a part of DML to specify statement requesting the retrieval of
information.
5. How is Database accessed from Application programs?
a. Apps (written in host languages, C/C++, Java) interacts with DB.
od
b. E.g., Banking system’s module generating payrolls access DB by executing DML statements from
the host language.
c. API is provided to send DML/DDL statements to DB and retrieve the results.
i. Open Database Connectivity (ODBC), Microsoft “C”.
ii. Java Database Connectivity (JDBC), Java.
6. Database Administrator (DBA)
a. A person who has central control of both the data and the programs that access those data.
C
b. Functions of DBA
i. Schema Definition
ii. Storage structure and access methods.
iii. Schema and physical organization modifications.
iv. Authorization control.
v. Routine maintenance
1. Periodic backups.
2. Security patches.
3. Any upgrades.
7. DBMS Application Architectures: Client machines, on which remote DB users work, and server machines
on which DB system runs.
a. T1 Architecture
i. The client, server & DB all present on the same machine.
b. T2 Architecture
i. App is partitioned into 2-components.
ii. Client machine, which invokes DB system functionality at server end through query
language statements.
iii. API standards like ODBC & JDBC are used to interact between client and server.
c. T3 Architecture
i. App is partitioned into 3 logical components.
ii. Client machine is just a frontend and doesn’t contain any direct DB calls.
iii. Client machine communicates with App server, and App server communicated with DB
system to access data.
iv. Business logic, what action to take at that condition is in App server itself.
v. T3 architecture are best for WWW Applications.
vi. Advantages:
1. Scalability due to distributed application servers.
2. Data integrity, App server acts as a middle layer between client and DB, which
p
minimize the chances of data corruption.
3. Security, client can’t directly access DB, hence it is more secure.
el
eH
od
C
LEC-3: Entity-Relationship Model
1. Data Model: Collection of conceptual tools for describing data, data relationships, data semantics, and consistency
constraints.
2. ER Model
1. It is a high level data model based on a perception of a real world that consists of a collection of basic objects, called
entities and of relationships among these objects.
2. Graphical representation of ER Model is ER diagram, which acts as a blueprint of DB.
3. Entity: An Entity is a “thing” or “object” in the real world that is distinguishable from all other objects.
1. It has physical existence.
2. Each student in a college is an entity.
3. Entity can be uniquely identified. (By a primary attribute, aka Primary Key)
4. Strong Entity: Can be uniquely identified.
5. Weak Entity: Can’t be uniquely identified., depends on some other strong entity.
1. It doesn’t have sufficient attributes, to select a uniquely identifiable attribute.
2. Loan -> Strong Entity, Payment -> Weak, as instalments are sequential number counter can be generated
p
separate for each loan.
3. Weak entity depends on strong entity for existence.
4. Entity set
el
1. It is a set of entities of the same type that share the same properties, or attributes.
2. E.g., Student is an entity set.
3. E.g., Customer of a bank
5. Attributes
eH
1. An entity is represented by a set of attributes.
2. Each entity has a value for each of its attributes.
3. For each attribute, there is a set of permitted values, called the domain, or value set, of that attribute.
4. E.g., Student Entity has following attributes
A. Student_ID
B. Name
C. Standard
D. Course
od
E. Batch
F. Contact number
G. Address
5. Types of Attributes
1. Simple
1. Attributes which can’t be divided further.
C
p
2. Unary, Only one entity participates. e.g., Employee manages employee.
3. Binary, two entities participates. e.g., Student takes Course.
4. Ternary relationship, three entities participates. E.g, Employee works-on branch, employee works-on job.
el
5. Binary are common.
7. Relationships Constraints
1. Mapping Cardinality / Cardinality Ratio
1. Number of entities to which another entity can be associated via a relationship.
eH
2. One to one, Entity in A associates with at most one entity in B, where A & B are entity sets. And an entity
of B is associated with at most one entity of A.
1. E.g., Citizen has Aadhar Card.
3. One to many, Entity in A associated with N entity in B. While entity in B is associated with at most one
entity in A.
1. e.g., Citizen has Vehicle.
4. Many to one, Entity in A associated with at most one entity in B. While entity in B can be associated with
N entity in A.
od
1. Basic ER Features studied in the LEC-3, can be used to model most DB features but when complexity increases, it is
better to use some Extended ER features to model the DB Schema.
2. Specialisation
1. In ER model, we may require to subgroup an entity set into other entity sets that are distinct in some way with other
entity sets.
2. Specialisation is splitting up the entity set into further sub entity sets on the basis of their functionalities,
specialities and features.
3. It is a Top-Down approach.
4. e.g., Person entity set can be divided into customer, student, employee. Person is superclass and other specialised
entity sets are subclasses.
1. We have “is-a” relationship between superclass and subclass.
2. Depicted by triangle component.
5. Why Specialisation?
1. Certain attributes may only be applicable to a few entities of
p
the parent entity set.
2. DB designer can show the distinctive features of the sub entities.
3. To group such entities we apply Specialisation, to overall refine the DB blueprint.
el
3. Generalisation
1. It is just a reverse of Specialisation.
2. DB Designer, may encounter certain properties of two entities are overlapping. Designer may consider to make a
new generalised entity set. That generalised entity set will be a super class.
eH
3. “is-a” relationship is present between subclass and super class.
4. e.g., Car, Jeep and Bus all have some common attributes, to avoid data repetition for the common attributes. DB
designer may consider to Generalise to a new entity set “Vehicle”.
5. It is a Bottom-up approach.
6. Why Generalisation?
1. Makes DB more refined and simpler.
2. Common attributes are not repeated.
4. Attribute Inheritance
od
1. Relational Model (RM) organises the data in the form of relations (tables).
2. A relational DB consists of collection of tables, each of which is assigned a unique name.
3. A row in a table represents a relationship among a set of values, and table is collection of such relationships.
4. Tuple: A single row of the table representing a single data point / a unique record.
5. Columns: represents the attributes of the relation. Each attribute, there is a permitted value, called domain of the
attribute.
6. Relation Schema: defines the design and structure of the relation, contains the name of the relation and all the
columns/attributes.
7. Common RM based DBMS systems, aka RDBMS: Oracle, IBM, MySQL, MS Access.
8. Degree of table: number of attributes/columns in a given table/relation.
9. Cardinality: Total no. of tuples in a given relation.
10. Relational Key: Set of attributes which can uniquely identify an each tuple.
11. Important properties of a Table in Relational Model
1. The name of relation is distinct among all other relation.
p
2. The values have to be atomic. Can’t be broken down further.
3. The name of each attribute/column must be unique.
4. Each tuple must be unique in a table.
el
5. The sequence of row and column has no significance.
6. Tables must follow integrity constraints - it helps to maintain data consistency across the tables.
12. Relational Model Keys
1. Super Key (SK): Any P&C of attributes present in a table which can uniquely identify each tuple.
eH
2. Candidate Key (CK): minimum subset of super keys, which can uniquely identify each tuple. It contains no
redundant attribute.
1. CK value shouldn’t be NULL.
3. Primary Key (PK):
1. Selected out of CK set, has the least no. of attributes.
4. Alternate Key (AK)
1. All CK except PK.
5. Foreign Key (FK)
od
p
primary key must contain unique as well as not null values.
6. FOREIGN KEY: Whenever there is some relationship between two entities, there must be some common
attribute between them. This common attribute must be the primary key of an entity set and will become the
el
foreign key of another entity set. This key will prevent every action which can result in loss of connection
between tables.
eH
od
C
LEC-8: Transform - ER Model to Relational Model
1. Both ER-Model and Relational Model are abstract logical representation of real world enterprises. Because the two
models implies the similar design principles, we can convert ER design into Relational design.
2. Converting a DB representation from an ER diagram to a table format is the way we arrive at Relational DB-design from
an ER diagram.
3. ER diagram notations to relations:
1. Strong Entity
1. Becomes an individual table with entity name, attributes becomes columns of the relation.
2. Entity’s Primary Key (PK) is used as Relation’s PK.
3. FK are added to establish relationships with other relations.
2. Weak Entity
1. A table is formed with all the attributes of the entity.
2. PK of its corresponding Strong Entity will be added as FK.
3. PK of the relation will be a composite PK, {FK + Partial discriminator Key}.
3. Single Values Attributes
p
1. Represented as columns directly in the tables/relations.
4. Composite Attributes
1. Handled by creating a separate attribute itself in the original relation for each composite attribute.
el
2. e.g., Address: {street-name, house-no}, is a composite attribute in customer relation, we add address-street-
name & address-house-name as new columns in the attribute and ignore “address” as an attribute.
5. Multivalued Attributes
1. New tables (named as original attribute name) are created for each multivalued attribute.
eH
2. PK of the entity is used as column FK in the new table.
3. Multivalued attribute’s similar name is added as a column to define multiple values.
4. PK of the new table would be {FK + multivalued name}.
5. e.g., For Strong entity Employee, dependent-name is a multivalued attribute.
1. New table named dependent-name will be formed with columns emp-id, and dname.
2. PK: {emp-id, name}
3. FK: {emp-id}
6. Derived Attributes: Not considered in the tables.
od
7. Generalisation
1. Method-1: Create a table for the higher level entity set. For each lower-level entity set, create a table that
includes a column for each of the attributes of that entity set plus a column for each attribute of the primary key
of the higher-level entity set.
For e.g., Banking System generalisation of Account - saving & current.
1. Table 1: account (account-number, balance)
C
p
el
eH
od
C
LEC-9: SQL in 1-Video
p
1. SQL is Structured Query language used to perform CRUD operations in R-DB, while MySQL is a RDBMS used to
store, manage and administrate DB (provided by itself) using SQL.
el
SQL DATA TYPES (Ref: https://www.w3schools.com/sql/sql_datatypes.asp)
1. In SQL DB, data is stored in the form of tables.
2. Data can be of different types, like INT, CHAR etc.
DATATYPE Description
eH
CHAR string(0-255), string with size = (0, 255], e.g.,
CHAR(251)
VARCHAR string(0-255)
TINYTEXT String(0-255)
TEXT string(0-65535)
od
BLOB string(0-65535)
MEDIUMTEXT string(0-16777215)
MEDIUMBLOB string(0-16777215)
LONGTEXT string(0-4294967295)
C
LONGBLOB string(0-4294967295)
DATE YYYY-MM-DD
TIMESTAMP YYYYMMDDHHMMSS
TIME HH:MM:SS
BOOLEAN 0/1
p
3. Size: TINY < SMALL < MEDIUM < INT < BIGINT.
4. Variable length Data types e.g., VARCHAR, are better to use as they occupy space equal to the actual data size.
5. Values can also be unsigned e.g., INT UNSIGNED.
el
6. Types of SQL commands:
1. DDL (data definition language): defining relation schema.
1. CREATE: create table, DB, view.
2. ALTER TABLE: modification in table structure. e.g, change column datatype or add/remove columns.
eH
3. DROP: delete table, DB, view.
4. TRUNCATE: remove all the tuples from the table.
5. RENAME: rename DB name, table name, column name etc.
2. DRL/DQL (data retrieval language / data query language): retrieve data from the tables.
1. SELECT
3. DML (data modification language): use to perform modifications in the DB
1. INSERT: insert data into a relation
2. UPDATE: update relation data.
od
MANAGING DB (DDL)
1. Creation of DB
1. CREATE DATABASE IF NOT EXISTS db-name;
2. USE db-name; //need to execute to choose on which DB CREATE TABLE etc commands will be executed.
//make switching between DBs possible.
3. DROP DATABASE IF EXISTS db-name; //dropping database.
4. SHOW DATABASES; //list all the DBs in the server.
5. SHOW TABLES; //list tables in the selected DB.
DATA RETRIEVAL LANGUAGE (DRL)
1. Syntax: SELECT <set of column names> FROM <table_name>;
2. Order of execution from RIGHT to LEFT.
3. Q. Can we use SELECT keyword without using FROM clause?
1. Yes, using DUAL Tables.
2. Dual tables are dummy tables created by MySQL, help users to do certain obvious actions without referring to user
defined tables.
3. e.g., SELECT 55 + 11;
SELECT now();
SELECT ucase(); etc.
4. WHERE
1. Reduce rows based on given conditions.
2. E.g., SELECT * FROM customer WHERE age > 18;
5. BETWEEN
1. SELECT * FROM customer WHERE age between 0 AND 100;
2. In the above e.g., 0 and 100 are inclusive.
p
6. IN
1. Reduces OR conditions;
2. e.g., SELECT * FROM officers WHERE officer_name IN ('Lakshay', ‘Maharana Pratap', ‘Deepika’);
el
7. AND/OR/NOT
1. AND: WHERE cond1 AND cond2
2. OR: WHERE cond1 OR cond2
3. NOT: WHERE col_name NOT IN (1,2,3,4);
eH
8. IS NULL
1. e.g., SELECT * FROM customer WHERE prime_status is NULL;
9. Pattern Searching / Wildcard (‘%’, ‘_’)
1. ‘%’, any number of character from 0 to n. Similar to ‘*’ asterisk in regex.
2. ‘_’, only one character.
3. SELECT * FROM customer WHERE name LIKE ‘%p_’;
10. ORDER BY
1. Sorting the data retrieved using WHERE clause.
od
CONSTRAINTS (DDL)
1. Primary Key
p
el
eH
1. PK is not null, unique and only one per table.
2. Foreign Key
1. FK refers to PK of other table.
2. Each relation can having any number of FK.
od
3. UNIQUE
1. Unique, can be null, table can have multiple unique attributes.
2. CREATE TABLE customer (
…
email VARCHAR(1024) UNIQUE,
…
);
4. CHECK
1. CREATE TABLE customer (
…
CONSTRAINT age_check CHECK (age > 12),
…
);
2. “age_check”, can also avoid this, MySQL generates name of constraint automatically.
5. DEFAULT
1. Set default value of the column.
2. CREATE TABLE account (
…
saving-rate DOUBLE NOT NULL DEFAULT 4.25,
…
);
6. An attribute can be PK and FK both in a table.
7. ALTER OPERATIONS
1. Changes schema
2. ADD
1. Add new column.
2. ALTER TABLE table_name ADD new_col_name datatype ADD new_col_name_2 datatype;
3. e.g., ALTER TABLE customer ADD age INT NOT NULL;
3. MODIFY
1. Change datatype of an attribute.
p
2. ALTER TABLE table-name MODIFY col-name col-datatype;
3. E.g., VARCHAR TO CHAR
ALTER TABLE customer MODIFY name CHAR(1024);
el
4. CHANGE COLUMN
1. Rename column name.
2. ALTER TABLE table-name CHANGE COLUMN old-col-name new-col-name new-col-datatype;
3. e.g., ALTER TABLE customer CHANGE COLUMN name customer-name VARCHAR(1024);
eH
5. DROP COLUMN
1. Drop a column completely.
2. ALTER TABLE table-name DROP COLUMN col-name;
3. e.g., ALTER TABLE customer DROP COLUMN middle-name;
6. RENAME
1. Rename table name itself.
2. ALTER TABLE table-name RENAME TO new-table-name;
3. e.g., ALTER TABLE customer RENAME TO customer-details;
od
p
JOINING TABLES
1. All RDBMS are relational in nature, we refer to other tables to get meaningful outcomes.
2. FK are used to do reference to other table.
el
3. INNER JOIN
1. Returns a resultant table that has matching values from both the tables or all the tables.
2. SELECT column-list FROM table1 INNER JOIN table2 ON condition1
INNER JOIN table3 ON condition2
eH
…;
3. Alias in MySQL (AS)
1. Aliases in MySQL is used to give a temporary name to a table or a column in a table for the purpose of
a particular query. It works as a nickname for expressing the tables or column names. It makes the query short
and neat.
2. SELECT col_name AS alias_name FROM table_name;
3. SELECT col_name1, col_name2,... FROM table_name AS alias_name;
4. OUTER JOIN
od
1. LEFT JOIN
1. This returns a resulting table that all the data from left table and the matched data from the right table.
2. SELECT columns FROM table LEFT JOIN table2 ON Join_Condition;
2. RIGHT JOIN
1. This returns a resulting table that all the data from right table and the matched data from the left table.
2. SELECT columns FROM table RIGHT JOIN table2 ON join_cond;
C
3. FULL JOIN
1. This returns a resulting table that contains all data when there is a match on left or right table data.
2. Emulated in MySQL using LEFT and RIGHT JOIN.
3. LEFT JOIN UNION RIGHT JOIN.
4. SELECT columns FROM table1 as t1 LEFT JOIN table2 as t2 ON t1.id = t2.id
UNION
SELECT columns FROM table1 as t1 RIGHT JOIN table2 as t2 ON t1.id = t2.id;
5. UNION ALL, can also be used this will duplicate values as well while UNION gives unique values.
5. CROSS JOIN
1. This returns all the cartesian products of the data present in both tables. Hence, all possible variations
are reflected in the output.
2. Used rarely in practical purpose.
3. Table-1 has 10 rows and table-2 has 5, then resultant would have 50 rows.
4. SELECT column-lists FROM table1 CROSS JOIN table2;
6. SELF JOIN
1. It is used to get the output from a particular table when the same table is joined to itself.
2. Used very less.
3. Emulated using INNER JOIN.
4. SELECT columns FROM table as t1 INNER JOIN table as t2 ON t1.id = t2.id;
7. Join without using join keywords.
1. SELECT * FROM table1, table2 WHERE condition;
2. e.g., SELECT artist_name, album_name, year_recordedFROM artist, albumWHERE artist.id = album.artist_id;
SET OPERATIONS
1. Used to combine multiple select statements.
2. Always gives distinct rows.
Combines multiple tables based on matching Combination is resulting set from two or more
condition. SELECT statements.
p
Column wise combination. Row wise combination.
Data types of two tables can be different. Datatypes of corresponding columns from each
el
table should be the same.
The number of column(s) selected may or may not The number of column(s) selected must be the
be the same from each table. same from each table.
eH
Combines results horizontally. Combines results vertically.
3. UNION
1. Combines two or more SELECT statements.
2. SELECT * FROM table1
UNION
SELECT * FROM table2;
od
3. Number of column, order of column must be same for table1 and table2.
4. INTERSECT
1. Returns common values of the tables.
2. Emulated.
3. SELECT DISTINCT column-list FROM table-1 INNER JOIN table-2 USING(join_cond);
4. SELECT DISTINCT * FROM table1 INNER JOIN table2 ON USING(id);
5. MINUS
C
1. This operator returns the distinct row from the first table that does not occur in the second table.
2. Emulated.
3. SELECT column_list FROM table1 LEFT JOIN table2 ON condition WHERE table2.column_name IS NULL;
4. e.g., SELECT id FROM table-1 LEFT JOIN table-2 USING(id) WHERE table-2.id IS NULL;
SUB QUERIES
1. Outer query depends on inner query.
2. Alternative to joins.
3. Nested queries.
4. SELECT column_list (s) FROM table_name WHERE column_name OPERATOR
(SELECT column_list (s) FROM table_name [WHERE]);
5. e.g., SELECT * FROM table1 WHERE col1 IN (SELECT col1 FROM table1);
6. Sub queries exist mainly in 3 clauses
1. Inside a WHERE clause.
2. Inside a FROM clause.
3. Inside a SELECT clause.
7. Subquery using FROM clause
1. SELECT MAX(rating) FROM (SELECT * FROM movie WHERE country = ‘India’) as temp;
8. Subquery using SELECT
1. SELECT (SELECT column_list(s) FROM T_name WHERE condition), columnList(s) FROM T2_name WHERE
condition;
9. Derived Subquery
1. SELECT columnLists(s) FROM (SELECT columnLists(s) FROM table_name WHERE [condition]) as new_table_name;
10. Co-related sub-queries
1. With a normal nested subquery, the inner SELECT query
runs first and executes once, returning values to be used by
the main query. A correlated subquery, however, executes
once for each candidate row considered by the outer query.
In other words, the inner query is driven by the outer query.
p
JOIN VS SUB-QUERIES
el
JOINS SUBQUERIES
Faster Slower
MySQL VIEWS
1. A view is a database object that has no values. Its contents are based on the base table. It contains rows and columns
od
NOTE: We can also import/export table schema from files (.csv or json).
LEC-11: Normalisation
p
2. Augmentation
1. If B can be determined from A, then adding an attribute to this functional dependency won’t change
anything.
el
2. If A→ B holds, then AX→ BX holds too. ‘X’ being a set of attributes.
3. Transitivity
1. If A determines B and B determines C, we can say that A determines C.
2. if A→ B and B→ C then A→ C.
eH
3. Why Normalisation?
1. To avoid redundancy in the DB, not to store redundant data.
4. What happen if we have redundant data?
1. Insertion, deletion and updation anomalies arises.
5. Anomalies
1. Anomalies means abnormalities, there are three types of anomalies introduced by data redundancy.
2. Insertion anomaly
1. When certain data (attribute) can not be inserted into the DB without the presence of other data.
od
3. Deletion anomaly
1. The delete anomaly refers to the situation where the deletion of data results in the unintended loss of some
other important data.
4. Updation anomaly (or modification anomaly)
1. The update anomaly is when an update of a single data value requires multiple rows of data to be updated.
2. Due to updation to many places, may be Data inconsistency arises, if one forgets to update the data at all the
C
intended places.
5. Due to these anomalies, DB size increases and DB performance become very slow.
6. To rectify these anomalies and the effect of these of DB, we use Database optimisation technique called
NORMALISATION.
6. What is Normalisation?
1. Normalisation is used to minimise the redundancy from a relations. It is also used to eliminate undesirable
characteristics like Insertion, Update, and Deletion Anomalies.
2. Normalisation divides the composite attributes into individual attributes OR larger table into smaller and links them
using relationships.
3. The normal form is used to reduce redundancy from the database table.
7. Types of Normal forms
1. 1NF
1. Every relation cell must have atomic value.
2. Relation must not have multi-valued attributes.
2. 2NF
1. Relation must be in 1NF.
2. There should not be any partial dependency.
1. All non-prime attributes must be fully dependent on PK.
2. Non prime attribute can not depend on the part of the PK.
3. 3NF
1. Relation must be in 2NF.
2. No transitivity dependency exists.
1. Non-prime attribute should not find a non-prime attribute.
4. BCNF (Boyce-Codd normal form)
1. Relation must be in 3NF.
2. FD: A -> B, A must be a super key.
1. We must not derive prime attribute from any prime or non-prime attribute.
8. Advantages of Normalisation
1. Normalisation helps to minimise data redundancy.
2. Greater overall database organisation.
p
3. Data consistency is maintained in DB.
el
eH
od
C
LEC-12: Transaction
1. Transaction
1. A unit of work done against the DB in a logical sequence.
2. Sequence is very important in transaction.
3. It is a logical unit of work that contains one or more SQL statements. The result of all these statements in a
transaction either gets completed successfully (all the changes made to the database are permanent) or if at any
point any failure happens it gets rollbacked (all the changes being done are undone.)
2. ACID Properties
1. To ensure integrity of the data, we require that the DB system maintain the following properties of the transaction.
2. Atomicity
1. Either all operations of transaction are reflected properly in the DB, or none are.
3. Consistency
1. Integrity constraints must be maintained before and after transaction.
2. DB must be consistent after transaction happens.
4. Isolation
p
1. Even though multiple transactions may execute concurrently, the system guarantees that, for every pair of
transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution
after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently in the system.
el
2. Multiple transactions can happen in the system in isolation, without interfering each other.
5. Durability
1. After transaction completes successfully, the changes it has made to the database persist, even if there are
system failures.
eH
3. Transaction states
od
C
1. Active state
1. The very first state of the life cycle of the transaction, all the read and write operations are being
performed. If they execute without any error the T comes to Partially committed state. Although if any
error occurs then it leads to a Failed state.
2. Partially committed state
1. After transaction is executed the changes are saved in the buffer in the main memory. If the changes made
are permanent on the DB then the state will transfer to the committed state and if there is any failure, the T
will go to Failed state.
3. Committed state
1. When updates are made permanent on the DB. Then the T is said to be in the committed state. Rollback
can’t be done from the committed states. New consistent state is achieved at this stage.
4. Failed state
1. When T is being executed and some failure occurs. Due to this it is impossible to continue the execution of
the T.
5. Aborted state
1. When T reaches the failed state, all the changes made in the buffer are reversed. After that the T rollback
completely. T reaches abort state after rollback. DB’s state prior to the T is achieved.
6. Terminated state
1. A transaction is said to have terminated if has either committed or aborted.
p
el
eH
od
C
LEC-13: How to implement Atomicity and Durability in Transactions
p
2. T abort can be done by just deleting the new copy of DB.
3. Hence, either all updates are reflected or none.
9. Durability
1. Suppose, system fails are any time before the updated db-pointer is written to disk.
el
2. When the system restarts, it will read db-pointer & will thus, see the original content of DB and none of the effects of T will
be visible.
3. T is assumed to be successful only when db-pointer is updated.
4. If system fails after db-pointer has been updated. Before that all the pages of the new copy were written to disk. Hence,
when system restarts, it will read new DB copy.
eH
10. The implementation is dependent on write to the db-pointer being atomic. Luckily, disk system provide atomic updates to entire
block or at least a disk sector. So, we make sure db-pointer lies entirely in a single sector. By storing db-pointer at the beginning
of a block.
11. Inefficient, as entire DB is copied for every Transaction.
3. Log-based recovery methods
1. The log is a sequence of records. Log of each transaction is maintained in some stable storage so that if any failure occurs, then
it can be recovered from there.
2. If any operation is performed on the database, then it will be recorded in the log.
3. But the process of storing the logs should be done before the actual transaction is applied in the database.
od
4. Stable storage is a classification of computer data storage technology that guarantees atomicity for any given write operation
and allows software to be written that is robust against some hardware and power failures.
5. Deferred DB Modifications
1. Ensuring atomicity by recording all the DB modifications in the log but deferring the execution of all the write operations
until the final action of the T has been executed.
2. Log information is used to execute deferred writes when T is completed.
3. If system crashed before the T completes, or if T is aborted, the information in the logs are ignored.
C
4. If T completes, the records associated to it in the log file are used in executing the deferred writes.
5. If failure occur while this updating is taking place, we preform redo.
6. Immediate DB Modifications
1. DB modifications to be output to the DB while the T is still in active state.
2. DB modifications written by active T are called uncommitted modifications.
3. In the event of crash or T failure, system uses old value field of the log records to restore modified values.
4. Update takes place only after log records in a stable storage.
5. Failure handling
1. System failure before T completes, or if T aborted, then old value field is used to undo the T.
2. If T completes and system crashes, then new value field is used to redo T having commit logs in the logs.
LEC-14: Indexing in DBMS
1. Indexing is used to optimise the performance of a database by minimising the number of disk accesses required when a query is
processed.
2. The index is a type of data structure. It is used to locate and access the data in a database table quickly.
3. Speeds up operation with read operations like SELECT queries, WHERE clause etc.
4. Search Key: Contains copy of primary key or candidate key
of the table or something else.
5. Data Reference: Pointer holding the address of disk block
where the value of the corresponding key is stored.
6. Indexing is optional, but increases access speed. It is not the
primary mean to access the tuple, it is the secondary mean.
7. Index file is always sorted.
8. Indexing Methods
1. Primary Index (Clustering Index)
1. A file may have several indices, on different search keys. If the data file containing the records is sequentially ordered, a
Primary index is an index whose search key also defines the sequential order of the file.
2. NOTE: The term primary index is sometimes used to mean an index on a primary key. However, such usage is
p
nonstandard and should be avoided.
3. All files are ordered sequentially on some search key. It could be Primary Key or non-primary key.
4. Dense And Sparse Indices
1. Dense Index
el
1. The dense index contains an index record for every search key value in the data file.
2. The index record contains the search-key value and a pointer to the first data record with that search-key value.
The rest of the records with the same search-key value would be stored sequentially after the first record.
3. It needs more space to store index record itself. The index records have the search key and a pointer to the actual
record on the disk.
eH
2. Sparse Index
1. An index record appears for only some of the search-key values.
2. Sparse Index helps you to resolve the issues of dense Indexing in DBMS. In this method of indexing technique, a
range of index columns stores the same data block address, and when data needs to be retrieved, the block
address will be fetched.
5. Primary Indexing can be based on Data file is sorted w.r.t Primary Key attribute or non-key attributes.
6. Based on Key attribute
1. Data file is sorted w.r.t primary key attribute.
2. PK will be used as search-key in Index.
od
3. Sparse Index will be formed i.e., no. of entries in the index file = no. of blocks in datafile.
7. Based on Non-Key attribute
1. Data file is sorted w.r.t non-key attribute.
2. No. Of entries in the index = unique non-key attribute value in the data file.
3. This is dense index as, all the unique values have an entry in the
index file.
4. E.g., Let’s assume that a company recruited many employees in
C
p
el
eH
od
C
LEC-15: NoSQL
1. NoSQL databases (aka "not only SQL") are non-tabular databases and store data differently than relational tables. NoSQL databases
come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph. They
provide flexible schemas and scale easily with large amounts of data and high user loads.
1. They are schema free.
2. Data structures used are not tabular, they are more flexible, has the ability to adjust dynamically.
3. Can handle huge amount of data (big data).
4. Most of the NoSQL are open sources and has the capability of horizontal scaling.
5. It just stores data in some format other than relational.
2. History behind NoSQL
1. NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to
create a complex, difficult-to-manage data model in order to avoid data duplication. Developers (rather than storage) were
becoming the primary cost of software development, so NoSQL databases optimised for developer productivity.
2. Data becoming unstructured more, hence structuring (defining schema in advance) them had becoming costly.
3. NoSQL databases allow developers to store huge amounts of unstructured data, giving them a lot of flexibility.
4. Recognising the need to rapidly adapt to changing requirements in a software system. Developers needed the ability to iterate
quickly and make changes throughout their software stack — all the way down to the database. NoSQL databases gave them
p
this flexibility.
5. Cloud computing also rose in popularity, and developers began using public clouds to host their applications and data. They
wanted the ability to distribute data across multiple servers and regions to make their applications resilient, to scale out instead
of scale up, and to intelligently geo-place their data. Some NoSQL databases like MongoDB provide these capabilities.
el
3. NoSQL Databases Advantages
A. Flexible Schema
1. RDBMS has pre-defined schema, which become an issue when we do not have all the data with us or we need to change
the schema. It's a huge task to change schema on the go.
B. Horizontal Scaling
eH
1. Horizontal scaling, also known as scale-out, refers to bringing on additional nodes to share the load. This is difficult with
relational databases due to the difficulty in spreading out related data across nodes. With non-relational databases, this is
made simpler since collections are self-contained and not coupled relationally. This allows them to be distributed across
nodes more simply, as queries do not have to “join” them together across nodes.
2. Scaling horizontally is achieved through Sharding OR Replica-sets.
C. High Availability
1. NoSQL databases are highly available due to its auto replication feature i.e. whenever any kind of failure happens data
replicates itself to the preceding consistent state.
2. If a server fails, we can access that data from another server as well, as in NoSQL database data is stored at multiple
od
servers.
D. Easy insert and read operations.
1. Queries in NoSQL databases can be faster than SQL databases. Why? Data in SQL databases is typically normalised, so
queries for a single object or entity require you to join data from multiple tables. As your tables grow in size, the joins can
become expensive. However, data in NoSQL databases is typically stored in a way that is optimised for queries. The rule of
thumb when you use MongoDB is data that is accessed together should be stored together. Queries typically do not require
joins, so the queries are very fast.
C
p
2. Column-Oriented / Columnar / C-Store / Wide-Column
1. The data is stored such that each row of a column will be next to other rows from that same column.
2. While a relational database stores data in rows and reads data row by row, a column store is organised as a set of columns.
el
This means that when you want to run analytics on a small number of columns, you can read those columns directly
without consuming memory with the unwanted data. Columns are often of the same type and benefit from more efficient
compression, making reads even faster. Columnar databases can quickly aggregate the value of a given column (adding up
the total sales for the year, for example). Use cases include analytics.
3. e.g., Cassandra, RedShift, Snowflake.
eH
3. Document Based Stores
1. This DB store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of
fields and values. The values can typically be a variety of types including things like strings, numbers, booleans, arrays, or
objects.
2. Use cases include e-commerce platforms, trading platforms, and mobile app development across industries.
3. Supports ACID properties hence, suitable for Transactions.
4. e.g., MongoDB, CouchDB.
4. Graph Based Stores
1. A graph database focuses on the relationship between data elements. Each element is stored as a node (such as a person
od
in a social media graph). The connections between elements are called links or relationships. In a graph database,
connections are first-class elements of the database, stored directly. In relational databases, links are implied, using data to
express the relationships.
2. A graph database is optimised to capture and search the connections between data elements, overcoming the overhead
associated with JOINing multiple tables in SQL.
3. Very few real-world business systems can survive solely on graph queries. As a result graph databases are usually run
alongside other more traditional databases.
C
4. Use cases include fraud detection, social networks, and knowledge graphs.
7. NoSQL Databases Dis-advantages
1. Data Redundancy
1. Since data models in NoSQL databases are typically optimised for queries and not for reducing data duplication, NoSQL
databases can be larger than SQL databases. Storage is currently so cheap that most consider this a minor drawback, and
some NoSQL databases also support compression to reduce the storage footprint.
2. Update & Delete operations are costly.
3. All type of NoSQL Data model doesn’t fulfil all of your application needs
1. Depending on the NoSQL database type you select, you may not be able to achieve all of your use cases in a single
database. For example, graph databases are excellent for analysing relationships in your data but may not provide what
you need for everyday retrieval of the data such as range queries. When selecting a NoSQL database, consider what your
use cases will be and if a general purpose database like MongoDB would be a better option.
4. Doesn’t support ACID properties in general.
5. Doesn’t support data entry with consistency constraints.
8. SQL vs NoSQL
Data Storage Model Tables with fixed rows and Document: JSON documents,
columns Key-value: key-value pairs, Wide-
column: tables with rows and
dynamic columns, Graph: nodes
and edges
Development History Developed in the 1970s with a Developed in the late 2000s with
focus on reducing data a focus on scaling and allowing
duplication for rapid application change
driven by agile and DevOps
practices.
p
Cassandra and HBase, Graph:
Neo4j and Amazon Neptune
el
value: large amounts of data with
simple lookup queries, Wide-
column: large amounts of data
with predictable query patterns,
Graph: analyzing and traversing
eH
relationships between connected
data
MongoDB etc.
1. Relational Databases
1. Based on Relational Model.
2. Relational databases are quite popular, even though it was a system designed in the 1970s. Also known as relational database
management systems (RDBMS), relational databases commonly use Structured Query Language (SQL) for operations such as
creating, reading, updating, and deleting data. Relational databases store information in discrete tables, which can be JOINed
together by fields known as foreign keys. For example, you might have a User table which contains information about all your
users, and join it to a Purchases table, which contains information about all the purchases they’ve made. MySQL, Microsoft SQL
Server, and Oracle are types of relational databases.
3. they are ubiquitous, having acquired a steady user base since the 1970s
4. they are highly optimised for working with structured data.
5. they provide a stronger guarantee of data normalisation
6. they use a well-known querying language through SQL
7. Scalability issues (Horizontal Scaling).
8. Data become huge, system become more complex.
2. Object Oriented Databases
1. The object-oriented data model, is based on the object-oriented-programming paradigm, which is now in wide use.
p
Inheritance, object-identity, and encapsulation (information hiding), with methods to provide an interface to objects, are
among the key concepts of object-oriented programming that have found applications in data modelling. The object-oriented
data model also supports a rich type system, including structured and collection types. While inheritance and, to some extent,
complex types are also present in the E-R model, encapsulation and object-identity distinguish the object-oriented data model
el
from the E-R model.
2. Sometimes the database can be very complex, having multiple relations. So, maintaining a relationship between them can be
tedious at times.
1. In Object-oriented databases data is treated as an object.
2. All bits of information come in one instantly available object package instead of multiple tables.
eH
3. Advantages
1. Data storage and retrieval is easy and quick.
2. Can handle complex data relations and more variety of data types that standard relational databases.
3. Relatively friendly to model the advance real world problems
4. Works with functionality of OOPs and Object Oriented languages.
4. Disadvantages
1. High complexity causes performance issues like read, write, update and delete operations are slowed down.
2. Not much of a community support as isn’t widely adopted as relational databases.
3. Does not support views like relational databases.
od
p
el
eH
od
C
LEC-17: Clustering in DBMS
1. Database Clustering (making Replica-sets) is the process of combining more than one servers or instances connecting a single database.
Sometimes one server may not be adequate to manage the amount of data or the number of requests, that is when a Data Cluster is needed.
Database clustering, SQL server clustering, and SQL clustering are closely associated with SQL is the language used to manage the database
information.
2. Replicate the same dataset on different servers.
3. Advantages
1. Data Redundancy: Clustering of databases helps with data redundancy, as we store the same data at multiple servers. Don’t confuse this
data redundancy as repetition of the same data that might lead to some anomalies. The redundancy that clustering offers is required and is
quite certain due to the synchronisation. In case any of the servers had to face a failure due to any possible reason, the data is available at other
servers to access.
2. Load balancing: or scalability doesn’t come by default with the database. It has to be brought by clustering regularly. It also depends on the
setup. Basically, what load balancing does is allocating the workload among the different servers that are part of the cluster. This indicates that
more users can be supported and if for some reasons if a huge spike in the traffic appears, there is a higher assurance that it will be able to
support the new traffic. One machine is not going to get all of the hits. This can provide scaling seamlessly as required. This links directly to
high availability. Without load balancing, a particular machine could get overworked and traffic would slow down, leading to decrement of
the traffic to zero.
p
3. High availability: When you can access a database, it implies that it is available. High availability refers the amount of time a database is
considered available. The amount of availability you need greatly depends on the number of transactions you are running on your database
and how often you are running any kind of analytics on your data. With database clustering, we can reach extremely high levels of availability
due to load balancing and have extra machines. In case a server got shut down the database will, however, be available.
el
4. How does Clustering Work?
1. In cluster architecture, all requests are split with many computers so that an individual user request is executed and produced by a number of
computer systems. The clustering is serviceable definitely by the ability of load balancing and high-availability. If one node collapses, the
request is handled by another node. Consequently, there are few or no possibilities of absolute system failures.
eH
od
C
LEC-18: Partitioning & Sharding in DBMS (DB Optimisation)
1. A big problem can be solved easily when it is chopped into several smaller sub-problems. That is what the partitioning technique does. It divides a
big database containing data metrics and indexes into smaller and handy slices of data called partitions. The partitioned tables are directly used by
SQL queries without any alteration. Once the database is partitioned, the data definition language can easily work on the smaller partitioned slices,
instead of handling the giant database altogether. This is how partitioning cuts down the problems in managing large database tables.
2. Partitioning is the technique used to divide stored database objects into separate servers. Due to this, there is an increase in performance,
controllability of the data. We can manage huge chunks of data optimally. When we horizontally scale our machines/servers, we know that it gives us
a challenging time dealing with relational databases as it’s quite tough to maintain the relations. But if we apply partitioning to the database that is
already scaled out i.e. equipped with multiple servers, we can partition our database among those servers and handle the big data easily.
3. Vertical Partitioning
1. Slicing relation vertically / column-wise.
2. Need to access different servers to get complete tuples.
4. Horizontal Partitioning
1. Slicing relation horizontally / row-wise.
2. Independent chunks of data tuples are stored in different servers.
5. When Partitioning is Applied?
1. Dataset become much huge that managing and dealing with it become a tedious task.
p
2. The number of requests are enough larger that the single DB server access is taking huge time and hence the system’s response time become
high.
6. Advantages of Partitioning
1. Parallelism
el
2. Availability
3. Performance
4. Manageability
5. Reduce Cost, as scaling-up or vertical scaling might be costly.
7. Distributed Database
eH
1. A single logical database that is, spread across multiple locations (servers) and logically interconnected by network.
2. This is the product of applying DB optimisation techniques like Clustering, Partitioning and Sharding.
3. Why this is needed? READ Point 5.
8. Sharding
1. Technique to implement Horizontal Partitioning.
2. The fundamental idea of Sharding is the idea that instead of having all the data sit on one DB instance, we split it up and introduce a
Routing layer so that we can forward the request to the right instances that actually contain the data.
3. Pros
1. Scalability
od
2. Availability
4. Cons
1. Complexity, making partition mapping, Routing layer to be implemented in the system, Non-uniformity that creates the necessity of Re-
Sharding
2. Not well suited for Analytical type of queries, as the data is spread across different DB instances. (Scatter-Gather problem)
C
p
el
Database Scaling Patterns
H
Step by Step Scaling
e
od
C
- Lakshay
What will you learn?
p
• Step by Step manner, when to choose which Scaling option.
el
• Which Scaling option is feasible practically at the moment.
e H
od
C
A Case Study
Cab Booking APP
p
• Tiny startup.
el
• ~10 customers onboard.
H
• A single small machine DB stores all customers, trips, locations, booking
data, and customer trip history.
e
od
• ~1 trip booking in 5 mins.
C
Your App becoming famous, but…
The PROBLEM begins
p
• Requests scales upto 30 bookings per minute.
el
• Your tiny DB system has started performing poorly.
H
• API latency has increased a lot.
e
• Transactions facing Deadlock, Starvation, and frequent failure.
od
• Sluggish App experience.
C
• Customer dis-satisfaction.
Is there any solution?
p
• We have to apply some kind of performance optimisation measures.
el
• We might have to scale our system going forward.
e H
od
C
Pattern 1
Query Optimisation & Connection Pool Implementation
p
• Cache frequently used non-dynamic data like, booking history, payment
el
history, user profiles etc.
H
• Introduce Database Redundancy. (Or may be use NoSQL)
• Use connection pool libraries to Cache DB connections.
e
od
• Multiple application threads can use same DB connection.
• Good optimisations as of now.
C
• Scaled the business to one more city, and now getting ~100 booking per
minute.
Pattern 2
Vertical Scaling or Scale-up
p
• Upgrading our initial tiny machine.
el
• RAM by 2x and SSD by 3x etc.
H
• Scale up is pocket friendly till a point only.
e
• More you scale up, cost increases exponentially.
od
• Good Optimisation as of now.
C
• Business is growing, you decided to scale it to 3 more cities and now getting
300 booking per minute.
Pattern 3
Command Query Responsibility Segregation (CQRS)
p
• The scaled up big machine is not able to handle all read/write requests.
el
• Separate read/write operations physical machine wise.
H
• 2 more machines as replica to the primary machine.
e
• All read queries to replicas.
od
• All write queries to primary.
• Business is growing, you decided to scale it to 2 more cities.
C
• Primary is not able to handle all write requests.
• Lag between primary and replica is impacting user experience.
Pattern 4
Multi Primary Replication
p
• Why not distribute write request to replica also?
el
• All machines can work as primary & replica.
H
• Multi primary configuration is a logical circular ring.
e
• Write data to any node.
od
• Read data from any node that replies to the broadcast first.
C
• You scale to 5 more cities & your system is in pain again. (~50 req/s)
Pattern 5
Partitioning of Data by Functionality
p
• What about separating the location tables in separate DB schema?
el
• What about putting that DB in separate machines with primary-replica or
H
multi-primary configuration?
e
od
• Backend or application layer has to take responsibility to join the results.
• Planning to expand your business to other country.
C
Pattern 6
Horizontal Scaling or Scale-out
p
• Sharding - multiple shards.
el
• Allocate 50 machines - all having same DB schema - each machine just hold
H
a part of data.
e
od
• Each machine can have their own replicas, may be used in failure recovery.
• Sharding is generally hard to apply. But “No Pain, No Gain”
C
• Scaling the business across continents.
Pattern 7
Data Centre Wise Partition
p
• Requests travelling across continents are having high latency.
el
• What about distributing traffic across data centres?
H
• Data centres across continents.
e
• Enable cross data centre replication which helps disaster recovery.
od
• This always maintain Availability of your system.
C
• Now, Plan for an IPO :p
LEC-20: CAP Theorem
p
5. CAP Theorem NoSQL Databases: NoSQL databases are great for distributed networks. They allow for horizontal scaling, and they can quickly scale
across multiple nodes. When deciding which NoSQL database to use, it’s important to keep the CAP theorem in mind.
1. CA Databases: CA databases enable consistency and availability across all nodes. Unfortunately, CA databases can’t deliver fault tolerance. In
any distributed system, partitions are bound to happen, which means this type of database isn’t a very practical choice. That being said, you still
el
can find a CA database if you need one. Some relational databases, such as MySQL or PostgreSQL, allow for consistency and availability. You can
deploy them to nodes using replication.
2. CP Databases: CP databases enable consistency and partition tolerance, but not availability. When a partition occurs, the system has to turn
off inconsistent nodes until the partition can be fixed. MongoDB is an example of a CP database. It’s a NoSQL database management system
(DBMS) that uses documents for data storage. It’s considered schema-less, which means that it doesn’t require a defined database schema. It’s
eH
commonly used in big data and applications running in different locations. The CP system is structured so that there’s only one primary
node that receives all of the write requests in a given replica set. Secondary nodes replicate the data in the primary nodes, so if the
primary node fails, a secondary node can stand-in. In banking system Availability is not as important as consistency, so we can opt it
(MongoDB).
3. AP Databases: AP databases enable availability and partition tolerance, but not consistency. In the event of a partition, all nodes are available,
but they’re not all updated. For example, if a user tries to access data from a bad node, they won’t receive the most up-to-date version of the
data. When the partition is eventually resolved, most AP databases will sync the nodes to ensure consistency across them. Apache Cassandra is
an example of an AP database. It’s a NoSQL database with no primary node, meaning that all of the nodes remain available. Cassandra allows
for eventual consistency because users can re-sync their data right after a partition is resolved. For apps like Facebook, we value availability
od
more than consistency, we’d opt for AP Databases like Cassandra or Amazon DynamoDB.
C
LEC-21: The Master-Slave Database Concept
p
el
eH
1. Master-Slave is a general way to optimise IO in a system where number of requests goes way high that a single DB server is not able to handle it
efficiently.
2. Its a Pattern 3 in LEC-19 (Database Scaling Pattern). (Command Query Responsibility Segregation)
3. The true or latest data is kept in the Master DB thus write operations are directed there. Reading ops are done only from slaves. This architecture
serves the purpose of safeguarding site er liability, availability, reduce latency etc . If a site receives a lot of traffic and the only available
database is one master, it will be overloaded with reading and writing requests. Making the entire system slow for everyone on the site.
4. DB replication will take care of distributing data from Master machine to Slaves machines. This can be synchronous or asynchronous depending
od
An operating system is a piece of software that manages all the resources of a computer
system, both hardware and software, and provides an environment in which the user can
execute his/her programs in a convenient and efficient manner by hiding underlying
complexity of the hardware and acting as a resource manager.
Why OS?
1. What if there is no OS?
a. Bulky and complex app. (Hardware interaction code must be in app’s
code base)
b. Resource exploitation by 1 App.
c. No memory protection.
2. What is an OS made up of?
a. Collection of system software.
User
Application programs
Operating system
Computer hardware
The operating system provides the means for proper use of the resources in the operation of
the computer system.
LEC-2: Types of OS
OS goals –
p
- Batch-processing operating system [ATLAS, Manchester Univ., late 1950s – early 1960s]
- Multiprogramming operating system [THE, Dijkstra, early 1960s]
- Multitasking operating system [CTSS, MIT, early 1960s]
- Multi-processing operating system [Windows NT]
el
- Distributed system [LOCUS]
- Real time OS [ATCS]
eH
od
C
Single process OS, only 1 process executes at a time from the ready queue. [Oldest]
Batch-processing OS,
1. Firstly, user prepares his job using punch cards.
2. Then, he submits the job to the computer operator.
3. Operator collects the jobs from different users and sort the jobs into batches with
similar needs.
4. Then, operator submits the batches to the processor one by one.
5. All the jobs of one batch are executed together.
p
el
eH
Multiprogramming increases CPU utilization by keeping multiple jobs (code and data)
in thememory so that the CPU always has one to execute in case some job gets busy with
I/O.
- Single CPU
- Context switching for processes.
- Switch happens when current process goes to wait state.
- CPU idle time reduced.
multiprogramming.
- Single CPU
- Able to run more than one task
simultaneously.
- Context switching and time sharing used.
- Increases responsiveness.
- CPU idle time is further reduced.
p
el
eH
od
C
LEC-3: Multi-Tasking vs Multi-Threading
Program: A Program is an executable file which contains a certain set of instructions written
to complete the specific job or operation on your computer.
• It’s a compiled code. Ready to be executed.
• Stored in Disk
Thread:
• Single sequence stream within a process.
• An independent path of execution in a process.
• Light-weight process.
• Used to achieve parallelism by dividing a process’s tasks which are independent path
p
of execution.
• E.g., Multiple tabs in a browser, text editor (When you are typing in an editor, spell-
checking, formatting of text and saving the text are done concurrently by multiple
el
threads.)
Multi-Tasking Multi-Threading
The execution of more than one task A process is divided into several different
eH
simultaneously is called as multitasking. sub-tasks called as threads, which has its
own path of execution. This concept is
called as multithreading.
Concept of more than 1 processes being Concept of more than 1 thread. Threads are
context switched. context switched.
No. of CPU 1. No. of CPU >= 1. (Better to have more than
1)
od
process.
Thread Scheduling:
Threads are scheduled for execution based on their priority. Even though threads are
executing within the runtime, all threads are assigned processor time slices by the operating
system.
p
el
eH
od
C
LEC-4: Components of OS
1. Kernel: A kernel is that part of the operating system which interacts directly with
the hardware andperforms the most crucial tasks.
a. Heart of OS/Core component
b. Very first part of OS to load on start-up.
2. User space: Where application software runs, apps don’t have privileged access to the
underlying hardware. It interacts with kernel.
a. GUI
b. CLI
A shell, also known as a command interpreter, is that part of the operating system that receives
commands from the users and gets them executed.
Functions of Kernel:
p
1. Process management:
a. Scheduling processes and threads on the CPUs.
b. Creating & deleting both user and system process.
el
c. Suspending and resuming processes
d. Providing mechanisms for process synchronization or process
communication.
2. Memory management:
a. Allocating and deallocating memory space as per need.
eH
b. Keeping track of which part of memory are currently being used and by
which process.
3. File management:
a. Creating and deleting files.
b. Creating and deleting directories to organize files.
c. Mapping files into secondary storage.
d. Backup support onto a stable storage media.
4. I/O management: to manage and control I/O operations and I/O devices
od
ii. Buffering
1. Within one job.
2. Eg. Youtube video buffering
iii. Caching
1. Memory caching, Web caching etc.
Types of Kernels:
1. Monolithic kernel
a. All functions are in kernel itself.
b. Bulky in size.
c. Memory required to run is high.
d. Less reliable, one module crashes -> whole kernel is down.
e. High performance as communication is fast. (Less user mode, kernel
mode overheads)
f. Eg. Linux, Unix, MS-DOS.
2. Micro Kernel
a. Only major functions are in kernel.
i. Memory mgmt.
ii. Process mgmt.
b. File mgmt. and IO mgmt. are in User-space.
c. smaller in size.
d. More Reliable
e. More stable
f. Performance is slow.
g. Overhead switching b/w user mode and kernel mode.
h. Eg. L4 Linux, Symbian OS, MINIX etc.
3. Hybrid Kernel:
a. Advantages of both worlds. (File mgmt. in User space and rest in Kernel
space. )
b. Combined approach.
c. Speed and design of mono.
d. Modularity and stability of micro.
p
e. Eg. MacOS, Windows NT/7/10
f. IPC also happens but lesser overheads
4. Nano/Exo kernels…
el
Q. How will communication happen between user mode and kernel mode?
Ans. Inter process communication (IPC).
eH
1. Two processes executing independently, having independent memory space (Memory
protection), But some may need to communicate to work.
A system call is a mechanism using which a user program can request a service from the kernel for
which it does not have the permission to perform.
User programs typically do not have permission to perform operations like accessing I/O devices and
communicating other programs.
System Calls are the only way through which a process can go into kernel mode from user mode.
Types of System Calls:
1) Process Control
a. end, abort
b. load, execute
c. create process, terminate process
d. get process attributes, set process attributes
e. wait for time
f. wait event, signal event
g. allocate and free memory
2) File Management
a. create file, delete file
b. open, close
c. read, write, reposition
d. get file attributes, set file attributes
3) Device Management
a. request device, release device
b. read, write, reposition
c. get device attributes, set device attributes
d. logically attach or detach devices
4) Information maintenance
a. get time or date, set time or date
b. get system data, set system data
c. get process, file, or device attributes
d. set process, file, or device attributes
5) Communication Management
a. create, delete communication connection
b. send, receive messages
c. transfer status information
d. attach or detach remote devices
i. PC On
ii. CPU initializes itself and looks for a firmware program (BIOS) stored in
BIOS Chip (Basic input-output system chip is a ROM chip found on
mother board that allows to access & setup computer system at most
basic level.)
1. In modern PCs, CPU loads UEFI (Unified extensible firmware
interface)
iii. CPU runs the BIOS which tests and initializes system hardware. Bios
loads configuration settings. If something is not appropriate (like missing
RAM) error is thrown and boot process is stopped.
This is called POST (Power on self-test) process.
(UEFI can do a lot more than just initialize hardware; it’s really a tiny
operating system. For example, Intel CPUs have the Intel Management
Engine. This provides a variety of features, including powering Intel’s
Active Management Technology, which allows for remote management
p
of business PCs.)
iv. BIOS will handoff responsibility for booting your PC to your OS’s
bootloader.
el
1. BIOS looked at the MBR (master boot record), a special boot
sector at the beginning of a disk. The MBR contains code that
loads the rest of the operating system, known as a “bootloader.”
The BIOS executes the bootloader, which takes it from there and
eH
begins booting the actual operating system—Windows or Linux,
for example.
In other words,
the BIOS or UEFI examines a storage device on your system to
look for a small program, either in the MBR or on an EFI system
partition, and runs it.
v. The bootloader is a small program that has the large task of booting the
rest of the operating system (Boots Kernel then, User Space). Windows
od
1. A 32-bit OS has 32-bit registers, and it can access 2^32 unique memory addresses. i.e., 4GB of
physical memory.
2. A 64-bit OS has 64-bit registers, and it can access 2^64 unique memory addresses. i.e.,
17,179,869,184 GB of physical memory.
3. 32-bit CPU architecture can process 32 bits of data & information.
4. 64-bit CPU architecture can process 64 bits of data & information.
5. Advantages of 64-bit over the 32-bit operating system:
a. Addressable Memory: 32-bit CPU -> 2^32 memory addresses, 64-bit CPU -> 2^64
memory addresses.
b. Resource usage: Installing more RAM on a system with a 32-bit OS doesn't impact
performance. However, upgrade that system with excess RAM to the 64-bit version of
Windows, and you'll notice a difference.
p
c. Performance: All calculations take place in the registers. When you’re performing math in
your code, operands are loaded from memory into registers. So, having larger registers
allow you to perform larger calculations at the same time.
el
32-bit processor can execute 4 bytes of data in 1 instruction cycle while 64-bit means that
processor can execute 8 bytes of data in 1 instruction cycle.
(In 1 sec, there could be thousands to billons of instruction cycles depending upon a
processor design)
eH
d. Compatibility: 64-bit CPU can run both 32-bit and 64-bit OS. While 32-bit CPU can only
run 32-bit OS.
e. Better Graphics performance: 8-bytes graphics calculations make graphics-intensive apps
run faster.
od
C
Lec-8: Storage Devices Basics
p
el
eH
1. Register: Smallest unit of storage. It is a part of CPU itself.
A register may hold an instruction, a storage address, or any data (such as bit sequence or individual
characters).
Registers are a type of computer memory used to quickly accept, store, and transfer data and
instructions that are being used immediately by the CPU.
2. Cache: Additional memory system that temporarily stores frequently used instructions and data for
quicker processing by the CPU.
od
Comparison
1. Cost:
a. Primary storages are costly.
C
p
el
eH
5. Attributes of process:
a. Feature that allows identifying a process uniquely.
b. Process table
i. All processes are being tracked by OS using a table like data structure.
ii. Each entry in that table is process control block (PCB).
od
Registers in the PCB, it is a data structure. When a processes is running and it's time slice expires, the
current value of process specific registers would be stored in the PCB and the process would be swapped
out. When the process is scheduled to be run, the register values is read from the PCB and written to the
CPU registers. This is the main purpose of the registers in the PCB.
Lec-10: Process States | Process Queues
1. Process States: As process executes, it changes state. Each process may be in one of the following
states.
a. New: OS is about to pick the program & convert it into process. OR the process is being
created.
b. Run: Instructions are being executed; CPU is allocated.
c. Waiting: Waiting for IO.
d. Ready: The process is in memory, waiting to be assigned to a processor.
e. Terminated: The process has finished execution. PCB entry removed from process table.
p
2. Process Queues:
el
eH
a. Job Queue:
i. Processes in new state.
ii. Present in secondary memory.
iii. Job Schedular (Long term schedular (LTS)) picks process from the pool and
loads them into memory for execution.
b. Ready Queue:
i. Processes in Ready state.
od
1. Swapping
a. Time-sharing system may have medium term schedular (MTS).
b. Remove processes from memory to reduce degree of multi-programming.
c. These removed processes can be reintroduced into memory, and its execution can be continued
where it left off. This is called Swapping.
d. Swap-out and swap-in is done by MTS.
e. Swapping is necessary to improve process mix or because a change in memory requirements has
overcommitted available memory, requiring memory to be freed up.
f. Swapping is a mechanism in which a process can be swapped temporarily out of main memory (or
move) to secondary storage (disk) and make that memory available to other processes. At some
later time, the system swaps back the process from the secondary storage to main memory.
p
el
eH
2. Context-Switching
a. Switching the CPU to another process requires performing a state save of the current process and a
state restore of a different process.
b. When this occurs, the kernel saves the context of the old process in its PCB and loads the saved
context of the new process scheduled to run.
c. It is pure overhead, because the system does no useful work while switching.
d. Speed varies from machine to machine, depending on the memory speed, the number of registers
that must be copied etc.
od
3. Orphan process
a. The process whose parent process has been terminated and it is still running.
b. Orphan processes are adopted by init process.
c. Init is the first process of OS.
4. Zombie process / Defunct process
a. A zombie process is a process whose execution is completed but it still has an entry in the process
C
table.
b. Zombie processes usually occur for child processes, as the parent process still needs to read its
child’s exit status. Once this is done using the wait system call, the zombie process is eliminated
from the process table. This is known as reaping the zombie process.
c. It is because parent process may call wait () on child process for a longer time duration and child
process got terminated much earlier.
d. As entry in the process table can only be removed, after the parent process reads the exit status of
child process. Hence, the child process remains a zombie till it is removed from the process table.
LEC-12: Intro to Process Scheduling | FCFS | Convoy Effect
1. Process Scheduling
a. Basis of Multi-programming OS.
b. By switching the CPU among processes, the OS can make the computer more productive.
c. Many processes are kept in memory at a time, when a process must wait or time quantum expires,
the OS takes the CPU away from that process & gives the CPU to another process & this pattern
continues.
2. CPU Scheduler
a. Whenever the CPU become ideal, OS must select one process from the ready queue to be executed.
b. Done by STS.
3. Non-Preemptive scheduling
a. Once CPU has been allocated to a process, the process keeps the CPU until it releases CPU either by
terminating or by switching to wait-state.
b. Starvation, as a process with long burst time may starve less burst time process.
p
c. Low CPU utilization.
4. Preemptive scheduling
a. CPU is taken away from a process after time quantum expires along with terminating or switching
el
to wait-state.
b. Less Starvation
c. High CPU utilization.
5. Goals of CPU scheduling
a. Maximum CPU utilization
eH
b. Minimum Turnaround time (TAT).
c. Min. Wait-time
d. Min. response time.
e. Max. throughput of system.
6. Throughput: No. of processes completed per unit time.
7. Arrival time (AT): Time when process is arrived at the ready queue.
8. Burst time (BT): The time required by the process for its execution.
od
9. Turnaround time (TAT): Time taken from first time process enters ready state till it terminates. (CT - AT)
10. Wait time (WT): Time process spends waiting for CPU. (WT = TAT – BT)
11. Response time: Time duration between process getting into ready queue and process getting CPU for the
first time.
12. Completion Time (CT): Time taken till process gets terminated.
13. FCFS (First come-first serve):
a. Whichever process comes first in the ready queue will be given CPU first.
C
b. In this, if one process has longer BT. It will have major effect on average WT of diff processes, called
Convoy effect.
c. Convoy Effect is a situation where many processes, who need to use a resource for a short time, are
blocked by one process holding that resource for a long time.
i. This cause poor resource management.
LEC-13: CPU Scheduling | SJF | Priority | RR
1. Shortest Job First (SJF) [Non-preemptive]
a. Process with least BT will be dispatched to CPU first.
b. Must do estimation for BT for each process in ready queue beforehand, Correct estimation of BT is
an impossible task (ideally.)
c. Run lowest time process for all time then, choose job having lowest BT at that instance.
d. This will suffer from convoy effect as if the very first process which came is Ready state is having a
large BT.
e. Process starvation might happen.
f. Criteria for SJF algos, AT + BT.
2. SJF [Preemptive]
a. Less starvation.
b. No convoy effect.
c. Gives average WT less for a given set of processes as scheduling short job before a long one
decreases the WT of short job more than it increases the WT of the long process.
p
3. Priority Scheduling [Non-preemptive]
a. Priority is assigned to a process when it is created.
b. SJF is a special case of general priority scheduling with priority inversely proportional to BT.
el
4. Priority Scheduling [Preemptive]
a. Current RUN state job will be preempted if next job has higher priority.
b. May cause indefinite waiting (Starvation) for lower priority jobs. (Possibility is they won’t get
executed ever). (True for both preemptive and non-preemptive version)
i. Solution: Ageing is the solution.
eH
ii. Gradually increase priority of process that wait so long. E.g., increase priority by 1 every 15
minutes.
5. Round robin scheduling (RR)
a. Most popular
b. Like FCFS but preemptive.
c. Designed for time sharing systems.
d. Criteria: AT + time quantum (TQ), Doesn’t depend on BT.
od
e. No process is going to wait forever, hence very low starvation. [No convoy effect]
f. Easy to implement.
g. If TQ is small, more will be the context switch (more overhead).
C
LEC-14: MLQ | MLFQ
p
d. System process: Created by OS (Highest priority)
el
Interactive process (Foreground process): Needs user input (I/O).
Batch process (Background process): Runs silently, no user input required.
e. Scheduling among different sub-queues is implemented as fixed priority preemptive
scheduling. E.g., foreground queue has absolute priority over background queue.
eH
f. If an interactive process comes & batch process is currently executing. Then, batch process will
be preempted.
g. Problem: Only after completion of all the processes from the top-level ready queue, the
further level ready queues will be scheduled.
This came starvation for lower priority process.
h. Convoy effect is present.
3. Comparison:
FCFS SJF PSJF Priority P- RR MLQ MLFQ
p
Priority
Design Simple Complex Complex Complex Complex Simple Complex Complex
Preemption No No Yes No Yes Yes Yes Yes
el
Convoy Yes Yes No Yes Yes No Yes Yes
effect
Overhead No No Yes No Yes Yes Yes Yes
eH
od
C
LEC-15: Introduction to Concurrency
1. Concurrency is the execution of the multiple instruction sequences at the same time. It happens in
the operating system when there are several process threads running in parallel.
2. Thread:
• Single sequence stream within a process.
• An independent path of execution in a process.
• Light-weight process.
• Used to achieve parallelism by dividing a process’s tasks which are independent path of
execution.
• E.g., Multiple tabs in a browser, text editor (When you are typing in an editor, spell
checking, formatting of text and saving the text are done concurrently by multiple threads.)
3. Thread Scheduling: Threads are scheduled for execution based on their priority. Even though
threads are executing within the runtime, all threads are assigned processor time slices by the
p
operating system.
4. Threads context switching
• OS saves current state of thread & switches to another thread of same process.
el
• Doesn’t includes switching of memory address space. (But Program counter, registers &
stack are included.)
• Fast switching as compared to process switching
• CPU’s cache state is preserved.
eH
5. How each thread get access to the CPU?
• Each thread has its own program counter.
• Depending upon the thread scheduling algorithm, OS schedule these threads.
• OS will fetch instructions corresponding to PC of that thread and execute instruction.
6. I/O or TQ, based context switching is done here as well
• We have TCB (Thread control block) like PCB for state storage management while
performing context switching.
od
p
don't know the order in which the threads will attempt to access the
shared data. Therefore, the result of the change in data is dependent
on the thread scheduling algorithm, i.e., both threads are "racing" to
el
access/change the data.
4. Solution to Race Condition
a. Atomic operations: Make Critical code section an atomic operation, i.e.,
Executed in one CPU cycle.
eH
b. Mutual Exclusion using locks.
c. Semaphores
5. Can we use a simple flag variable to solve the problem of race condition?
a. No.
6. Peterson’s solution can be used to avoid race condition but holds good for only 2
process/threads.
7. Mutex/Locks
od
a. Locks can be used to implement mutual exclusion and avoid race condition
by allowing only one thread/process to access critical section.
b. Disadvantages:
i. Contention: one thread has acquired the lock, other threads will be
busy waiting, what if thread that had acquired the lock dies, then all
other threads will be in infinite waiting.
C
ii. Deadlocks
iii. Debugging
iv. Starvation of high priority threads.
LEC-17: Conditional Variable and Semaphores for Threads synchronization
1. Conditional variable
a. The condition variable is a synchronization primitive that lets the thread wait
until a certain condition occurs.
b. Works with a lock
c. Thread can enter a wait state only when it has acquired a lock. When a
thread enters the wait state, it will release the lock and wait until another
thread notifies that the event has occurred. Once the waiting thread enters
the running state, it again acquires the lock immediately and starts executing.
d. Why to use conditional variable?
i. To avoid busy waiting.
e. Contention is not here.
2. Semaphores
p
a. Synchronization method.
b. An integer that is equal to number of resources
c. Multiple threads can go and execute C.S concurrently.
el
d. Allows multiple program threads to access the finite instance of resources
whereas mutex allows multiple threads to access a single shared resource
one at a time.
e. Binary semaphore: value can be 0 or 1.
eH
i. Aka, mutex locks
f. Counting semaphore
i. Can range over an unrestricted domain.
ii. Can be used to control access to a given resource consisting of a finite
number of instances.
g. To overcome the need for busy waiting, we can modify the definition of
the wait () and signal () semaphore operations. When a process executes the
od
wait () operation and finds that the semaphore value is not positive, it must
wait. However, rather than engaging in busy waiting, the process car block
itself. The block- operation places a process into a waiting queue associated
with the semaphore, and the state of the process is switched to the Waiting
state. Then control is transferred to the CPU scheduler, which selects another
process to execute.
C
p
el
eH
1. We have 5 philosophers.
2. They spend their life just being in two states:
a. Thinking
b. Eating
3. They sit on a circular table surrounded by 5 chairs (1 each), in the center of table is a bowl of
noodles, and the table is laid with 5 single forks.
4. Thinking state: When a ph. Thinks, he doesn’t interact with others.
od
5. Eating state: When a ph. Gets hungry, he tries to pick up the 2 forks adjacent to him (Left and
Right). He can pick one fork at a time.
6. One can’t pick up a fork if it is already taken.
7. When ph. Has both forks at the same time, he eats without releasing forks.
8. Solution can be given using semaphores.
a. Each fork is a binary semaphore.
C
p
el
eH
od
C
LEC-21: Deadlock Part-1
p
CPUs.
8. How a process/thread utilize a resource?
a. Request: Request the R, if R is free Lock it, else wait till it is available.
el
b. Use
c. Release: Release resource instance and make it available for other processes
eH
od
p
ii. Protocol (A) can be, each process has to request and be allocated all its resources
before its execution.
iii. Protocol (B) can be, allow a process to request resources only when it has none. It
el
can request any additional resources after it must have released all the resources
that it is currently allocated.
c. No preemption
i. If a process is holding some resources and request another resource that cannot
eH
be immediately allocated to it, then all the resources the process is currently
holding are preempted. The process will restart only when it can regain its old
resources, as well as the new one that it is requesting. (Live Lock may occur).
ii. If a process requests some resources, we first check whether they are available. If
yes, we allocate them. If not, we check whether they are allocated to some other
process that is waiting for additional resources. If so, preempt the desired resource
from waiting process and allocate them to the requesting process.
od
d. Circular wait
i. To ensure that this condition never holds is to impose a proper ordering of
resource allocation.
ii. P1 and P2 both require R1 and R1, locking on these resources should be like, both
try to lock R1 then R2. By this way which ever process first locks R1 will get R2.
C
LEC-22: Deadlock Part-2
1. Deadlock Avoidance: Idea is, the kernel be given in advance info concerning which resources will
use in its lifetime.
By this, system can decide for each request whether the process should wait.
To decide whether the current request can be satisfied or delayed, the system must consider the
resources currently available, resources currently allocated to each process in the system and the
future requests and releases of each process.
a. Schedule process and its resources allocation in such a way that the DL never occur.
b. Safe state: A state is safe if the system can allocate resources to each process (up to its
max.) in some order and still avoid DL.
A system is in safe state only if there exists a safe sequence.
c. In an Unsafe state, the operating system cannot prevent processes from requesting
resources in such a way that any deadlock occurs. It is not necessary that all unsafe states
p
are deadlocks; an unsafe state may lead to a deadlock.
d. The main key of the deadlock avoidance method is whenever the request is made for
resources then the request must only be approved only in the case if the resulting state is a
el
safe state.
e. In a case, if the system is unable to fulfill the request of all processes, then the state of the
system is called unsafe.
f. Scheduling algorithm using which DL can be avoided by finding safe state. (Banker
eH
Algorithm)
2. Banker Algorithm
a. When a process requests a set of resources, the system must determine whether allocating
these resources will leave the system in a safe state. If yes, then the resources may be
allocated to the process. If not, then the process must wait till other processes release
enough resources.
3. Deadlock Detection: Systems haven’t implemented deadlock-prevention or a deadlock avoidance
od
1. In Multi-programming environment, we have multiple processes in the main memory (Ready Queue) to
keep the CPU utilization high and to make computer responsive to the users.
2. To realize this increase in performance, however, we must keep several processes in the memory; that is, we
must share the main memory. As a result, we must manage main memory for all the different processes.
3. Logical versus Physical Address Space
a. Logical Address
i. An address generated by the CPU.
ii. The logical address is basically the address of an instruction or data used by a process.
iii. User can access logical address of the process.
iv. User has indirect access to the physical address through logical address.
v. Logical address does not exist physically. Hence, aka, Virtual address.
vi. The set of all logical addresses that are generated by any program is referred to as Logical
Address Space.
p
vii. Range: 0 to max.
b. Physical Address
i. An address loaded into the memory-address register of the physical memory.
el
ii. User can never access the physical address of the Program.
iii. The physical address is in the memory unit. It’s a location in the main memory physically.
iv. A physical address can be accessed by a user indirectly but not directly.
v. The set of all physical addresses corresponding to the Logical addresses is commonly
known as Physical Address Space.
eH
vi. It is computed by the Memory Management Unit (MMU).
vii. Range: (R + 0) to (R + max), for a base value R.
c. The runtime mapping from virtual to physical address is done by a hardware device called the
memory-management unit (MMU).
d. The user's program mainly generates the logical address, and the user thinks that the program is
running in this logical address, but the program mainly needs physical memory in order to
complete its execution.
od
C
e.
4. How OS manages the isolation and protect? (Memory Mapping and Protection)
a. OS provides this Virtual Address Space (VAS) concept.
b. To separate memory space, we need the ability to determine the range of legal addresses that the
process may access and to ensure that the process can access only these legal addresses.
c. The relocation register contains value of smallest physical address (Base address [R]); the limit
register contains the range of logical addresses (e.g., relocation = 100040 & limit = 74600).
d. Each logical address must be less than the limit register.
e. MMU maps the logical address dynamically by adding the value in the relocation register.
f. When CPU scheduler selects a process for execution, the dispatcher loads the relocation and limit
registers with the correct values as part of the context switch. Since every address generated by the
CPU (Logical address) is checked against these registers, we can protect both OS and other users’
programs and data from being modified by running process.
g. Any attempt by a program executing in user mode to access the OS memory or other uses’
memory results in a trap in the OS, which treat the attempt as a fatal error.
h. Address Translation
p
5. Allocation Method on Physical Memory
a. Contiguous Allocation
b. Non-contiguous Allocation
el
eH
6. Contiguous Memory Allocation
a. In this scheme, each process is contained in a single contiguous block of memory.
b. Fixed Partitioning
i. The main memory is divided into partitions of equal or different sizes.
od
C
ii.
iii. Limitations:
1. Internal Fragmentation: if the size of the process is lesser then the total size of
the partition then some size of the partition gets wasted and remain unused.
This is wastage of the memory and called internal fragmentation.
2. External Fragmentation: The total unused space of various partitions cannot be
used to load the processes even though there is space available but not in the
contiguous form.
3. Limitation on process size: If the process size is larger than the size of maximum
sized partition then that process cannot be loaded into the memory. Therefore, a
limitation can be imposed on the process size that is it cannot be larger than the
size of the largest partition.
4. Low degree of multi-programming: In fixed partitioning, the degree of
multiprogramming is fixed and very less because the size of the partition cannot
be varied according to the size of processes.
c. Dynamic Partitioning
i. In this technique, the partition size is not declared initially. It is declared at the time of
process loading.
p
ii.
iii. Advantages over fixed partitioning
el
1. No internal fragmentation
2. No limit on size of process
3. Better degree of multi-programming
iv. Limitation
1. External fragmentation
eH
od
C
LEC-25: Free Space Management
1. Defragmentation/Compaction
a. Dynamic partitioning suffers from external fragmentation.
b. Compaction to minimize the probability of external fragmentation.
c. All the free partitions are made contiguous, and all the loaded partitions are brought together.
d. By applying this technique, we can store the bigger processes in the memory. The free partitions
are merged which can now be allocated according to the needs of new processes. This technique is
also called defragmentation.
e. The efficiency of the system is decreased in the case of compaction since all the free spaces will be
transferred from several places to a single place.
2. How free space is stored/represented in OS?
a. Free holes in the memory are represented by a free list (Linked-List data structure).
3. How to satisfy a request of a of n size from a list of free holes?
a. Various algorithms which are implemented by the Operating System in order to find out the holes
p
in the linked list and allocate them to the processes.
b. First Fit
i. Allocate the first hole that is big enough.
el
ii. Simple and easy to implement.
iii. Fast/Less time complexity
c. Next Fit
i. Enhancement on First fit but starts search always from last allocated hole.
ii. Same advantages of First Fit.
eH
d. Best Fit
i. Allocate smallest hole that is big enough.
ii. Lesser internal fragmentation.
iii. May create many small holes and cause major external fragmentation.
iv. Slow, as required to iterate whole free holes list.
e. Worst Fit
i. Allocate the largest hole that is big enough.
od
p
logical memory into blocks of same size called Pages. (# Page size = Frame size)
d. Page size is usually determined by the processor architecture. Traditionally, pages in a system had
uniform size, such as 4,096 bytes. However, processor designs often allow two or more, sometimes
el
simultaneous, page sizes due to its benefits.
e. Page Table
i. A Data structure stores which page is mapped to which frame.
ii. The page table contains the base address of each page in the physical memory.
f. Every address generated by CPU (logical address) is divided into two parts: a page number (p) and
eH
a page offset (d). The p is used as an index into a page table to get base address the corresponding
frame in physical memory.
od
g. Page table is stored in main memory at the time of process creation and its base address is stored
C
p
el
eH
f. TLB hit, TLB contains the mapping for the requested logical address.
g. Address space identifier (ASIDs) is stored in each entry of TLB. ASID uniquely identifies each
process and is used to provide address space protection and allow to TLB to contain entries for
several different processes. When TLB attempts to resolve virtual page numbers, it ensures that
od
the ASID for the currently executing process matches the ASID associated with virtual page. If it
doesn’t match, the attempt is treated as TLB miss.
C
LEC-27: Segmentation | Non-Contiguous Memory Allocation
1. An important aspect of memory management that become unavoidable with paging is separation of user’s
view of memory from the actual physical memory.
2. Segmentation is memory management technique that supports the user view of memory.
3. A logical address space is a collection of segments, these segments are based on user view of logical
memory.
4. Each segment has segment number and offset, defining a segment.
<segment-number, offset> {s,d}
5. Process is divided into variable segments based on user view.
6. Paging is closer to the Operating system rather than the User. It divides all the processes into the form of
pages although a process can have some relative parts of functions which need to be loaded in the same
page.
7. Operating system doesn't care about the User's view of the process. It may divide the same function into
different pages and those pages may or may not be loaded at the same time into the memory. It
p
decreases the efficiency of the system.
8. It is better to have segmentation which divides the process into the segments. Each segment contains the
same type of functions such as the main function can be included in one segment and the library functions
el
can be included in the other segment.
eH
od
9.
C
10. Advantages:
a. No internal fragmentation.
b. One segment has a contiguous allocation, hence efficient working within segment.
c. The size of segment table is generally less than the size of page table.
d. It results in a more efficient system because the compiler keeps the same type of functions in one
segment.
11. Disadvantages:
a. External fragmentation.
b. The different size of segment is not good that the time of swapping.
12. Modern System architecture provides both segmentation and paging implemented in some hybrid
approach.
LEC-28: What is Virtual Memory? || Demand Paging || Page Faults
1. Virtual memory is a technique that allows the execution of processes that are not completely in the
memory. It provides user an illusion of having a very big main memory. This is done by treating a part of
secondary memory as the main memory. (Swap-space)
2. Advantage of this is, programs can be larger than physical memory.
3. It is required that instructions must be in physical memory to be executed. But it limits the size of a
program to the size of physical memory. In fact, in many cases, the entire program is not needed at the
same time. So, we want an ability to execute a program that is only partially in memory would give
many benefits:
a. A program would no longer be constrained by the amount of physical memory that is
available.
b. Because each user program could take less physical memory, more programs could be run at
the same time, with a corresponding increase in CPU utilization and throughput.
c. Running a program that is not entirely in memory would benefit both the system and the
p
user.
4. Programmer is provided very large virtual memory when only a smaller physical memory is available.
5. Demand Paging is a popular method of virtual memory management.
el
6. In demand paging, the pages of a process which are least used, get stored in the secondary memory.
7. A page is copied to the main memory when its demand is made, or page fault occurs. There are various
page replacement algorithms which are used to determine the pages which will be replaced.
8. Rather than swapping the entire process into memory, we use Lazy Swapper. A lazy swapper never
swaps a page into memory unless that page will be needed.
eH
9. We are viewing a process as a sequence of pages, rather than one large contiguous address space, using
the term Swapper is technically incorrect. A swapper manipulates entire processes, whereas a Pager is
concerned with individual pages of a process.
10. How Demand Paging works?
a. When a process is to be swapped-in, the pager guesses which pages will be used.
b. Instead of swapping in a whole process, the pager brings only those pages into memory. This,
it avoids reading into memory pages that will not be used anyway.
od
c. Above way, OS decreases the swap time and the amount of physical memory needed.
d. The valid-invalid bit scheme in the page table is used to distinguish between pages that are
in memory and that are on the disk.
i. Valid-invalid bit 1 means, the associated page is both legal and in memory.
ii. Valid-invalid bit 0 means, the page either is not valid (not in the LAS of the process)
or is valid but is currently on the disk.
C
p
e.
f. If a process never attempts to access some invalid bit page, the process will be executed
successfully without even the need pages present in the swap space.
el
g. What happens if the process tries to access a page that was not brought into memory, access
to a page marked invalid causes page fault. Paging hardware noticing invalid bit for a
demanded page will cause a trap to the OS.
h. The procedure to handle the page fault:
eH
i. Check an internal table (in PCB of the process) to determine whether the reference
was valid or an invalid memory access.
ii. If ref. was invalid process throws exception.
If ref. is valid, pager will swap-in the page.
iii. We find a free frame (from free-frame list)
iv. Schedule a disk operation to read the desired page into the newly allocated frame.
v. When disk read is complete, we modify the page table that, the page is now in
memory.
od
vi. Restart the instruction that was interrupted by the trap. The process can now access
the page as through it had always been in memory.
C
p
i.
j. Pure Demand Paging
el
i. In extreme case, we can start executing a process with no pages in memory. When
eH
OS sets the instruction pointer to the first instruction of the process, which is not in
the memory. The process immediately faults for the page and page is brought in the
memory.
ii. Never bring a page into memory until it is required.
k. We use locality of reference to bring out reasonable performance from demand paging.
11. Advantages of Virtual memory
a. The degree of multi-programming will be increased.
b. User can run large apps with less real physical memory.
od
1. Whenever Page Fault occurs, that is, a process tries to access a page which is not currently present in a
frame and OS must bring the page from swap-space to a frame.
2. OS must do page replacement to accommodate new page into a free frame, but there might be a possibility
the system is working in high utilization and all the frames are busy, in that case OS must replace one of the
pages allocated into some frame with the new page.
3. The page replacement algorithm decides which memory page is to be replaced. Some allocated page is
swapped out from the frame and new page is swapped into the freed frame.
4. Types of Page Replacement Algorithm: (AIM is to have minimum page faults)
a. FIFO
i. Allocate frame to the page as it comes into the memory by replacing the oldest page.
ii. Easy to implement.
iii. Performance is not always good
1. The page replaced may be an initialization module that was used long time ago
p
(Good replacement candidate)
2. The page may contain a heavily used variable that was initialized early and is in
content use. (Will again cause page fault)
el
iv. Belady’s anomaly is present.
1. In the case of LRU and optimal page replacement algorithms, it is seen that
the number of page faults will be reduced if we increase the number of
frames. However, Balady found that, In FIFO page replacement algorithm, the
number of page faults will get increased with the increment in number of
eH
frames.
2. This is the strange behavior shown by FIFO algorithm in some of the cases.
b. Optimal page replacement
i. Find if a page that is never referenced in future. If such a page exists, replace this page
with new page.
If no such page exists, find a page that is referenced farthest in future. Replace this page
with new page.
od
1. Counters
a. Associate time field with each page table entry.
b. Replace the page with smallest time value.
2. Stack
a. Keep a stack of page number.
b. Whenever page is referenced, it is removed from the stack & put on
the top.
c. By this, most recently used is always on the top, & least recently used
is always on the bottom.
d. As entries might be removed from the middle of the stack, so Doubly
linked list can be used.
d. Counting-based page replacement – Keep a counter of the number of references that have been
made to each page. (Reference counting)
i. Least frequently used (LFU)
1. Actively used pages should have a large reference count.
2. Replace page with the smallest count.
ii. Most frequently used (MFU)
1. Based on the argument that the page with the smallest count was probably just
brought in and has yet to be used.
iii. Neither MFU nor LFU replacement is common.
p
el
eH
od
C
LEC-30: Thrashing
1. Thrashing
a. If the process doesn’t have the number of frames it needs to support pages in active use, it will
quickly page-fault. At this point, it must replace some page. However, since all its pages are in active
use, it must replace a page that will be needed again right away. Consequently, it quickly faults
again, and again, and again, replacing pages that it must bring back in immediately.
b. This high paging activity is called Thrashing.
c. A system is Thrashing when it spends more time servicing the page faults than executing
processes.
p
el
eH
d. Technique to Handle Thrashing
i. Working set model
1. This model is based on the concept of the Locality Model.
2. The basic principle states that if we allocate enough frames to a process to
od
accommodate its current locality, it will only fault whenever it moves to some
new locality. But if the allocated frames are lesser than the size of the current
locality, the process is bound to thrash.
ii. Page Fault frequency
1. Thrashing has a high page-fault rate.
2. We want to control the page-fault rate.
3. When it is too high, the process needs more frames. Conversely, if the page-fault
C
rate is too low, then the process may have too many frames.
4. We establish upper and lower bounds on the desired page fault rate.
5. If pf-rate exceeds the upper limit, allocate the process another frame, if pf-rate
fails falls below the lower limit, remove a frame from the process.
6. By controlling pf-rate, thrashing can be prevented.
Computer Networks
● Network : A network is a set of devices that are connected with a physical media link. In
a network, two or more nodes are connected by a physical link or two or more networks
are connected by one or more nodes. A network is a collection of devices connected to
each other to allow the sharing of data.
● Ring :
1. Ring topology is a network topology in which nodes are exactly connected to two
or more nodes and thus, forming a single continuous path for the transmission.
2. It does not need any central server to control the connectivity among the nodes.
3. If the single node is damaged, then the whole network fails.
4. Ring topology is very rarely used as it is expensive, difficult to install and
manage.
5. Examples of Ring topology are SONET network, SDH network, etc.
● Bus :
1. Bus topology is a network topology in which all the nodes are connected to a
single cable known as a central cable or bus.
2. It acts as a shared communication medium, i.e., if any device wants to send the
data to other devices, then it will send the data over the bus which in turn sends
the data to all the attached devices.
3. Bus topology is useful for a small number of devices.
4. As if the bus is damaged then the whole network fails.
● Mesh :
1. Mesh topology is a network topology in which all the nodes are individually
connected to other nodes.
2. It does not need any central switch or hub to control the connectivity among the
nodes.
3. Mesh topology is categorized into two parts: Fully connected mesh topology:
In this topology, all the nodes are connected to each other. Partially connected
mesh topology: In this topology, all the nodes are not connected to each other.
4. It is robust as a failure in one cable will only disconnect the specified computer
connected to this cable.
5. Mesh topology is rarely used as installation and configuration are difficult when
connectivity gets more.
6. Cabling cost is high as it requires bulk wiring.
● Tree :
1. Tree topology is a combination of star and bus topology. It is also known as the
expanded star topology.
2. In tree topology, all the star networks are connected to a single bus.
3. Ethernet protocol is used in this topology.
4. In this, the whole network is divided into segments known as star networks which
can be easily maintained. If one segment is damaged, there is no effect on other
segments.
5. Tree topology depends on the "main bus," and if it breaks, then the whole
network gets damaged
● Hybrid :
1. A hybrid topology is a combination of different topologies to form a resulting
topology.
2. If star topology is connected with another star topology, then it remains a star
topology. If star topology is connected with different topology, then it becomes a
Hybrid topology.
3. It provides flexibility as it can be implemented in a different network environment.
● Different Types of Networks : (Imp) - Networks can be divided on the basis of area of
distribution. For example:
● PAN (Personal Area Network): Its range limit is up to 10 meters. It is created for
personal use. Generally, personal devices are connected to this network. For example
computers, telephones, fax, printers, etc.
● LAN (Local Area Network): It is used for a small geographical location like office,
hospital, school, etc.
● HAN (House Area Network): It is actually a LAN that is used within a house and used to
connect homely devices like personal computers, phones, printers, etc.
● CAN (Campus Area Network): It is a connection of devices within a campus area which
links to other departments of the organization within the same campus.
● MAN (Metropolitan Area Network): It is used to connect the devices which span to
large cities like metropolitan cities over a wide geographical area.
● WAN (Wide Area Network): It is used over a wide geographical location that may range
to connect cities and countries.
● GAN (Global Area Network): It uses satellites to connect devices over the global area.
● VPN (Virtual Private Network) : VPN or the Virtual Private Network is a private WAN
(Wide Area Network) built on the internet. It allows the creation of a secured tunnel
(protected network) between different networks using the internet (public network). By
using the VPN, a client can connect to the organization’s network remotely.
● Advantages of VPN :
2. VPN is used for secure transactions and confidential data transfer between
multiple offices located in different geographical locations.
4. VPN encrypts the internet traffic and disguises the online identity.
● Types of VPN :
● Access VPN: Access VPN is used to provide connectivity to remote mobile users and
telecommuters. It serves as an alternative to dial-up connections or ISDN (Integrated
Services Digital Network) connections. It is a low-cost solution and provides a wide
range of connectivity.
● Site-to-Site VPN: A Site-to-Site or Router-to-Router VPN is commonly used in large
companies having branches in different locations to connect the network of one office to
another in different locations. There are 2 sub-categories as mentioned below:
● Intranet VPN: Intranet VPN is useful for connecting remote offices in different
geographical locations using shared infrastructure (internet connectivity and servers)
with the same accessibility policies as a private WAN (wide area network).
● Extranet VPN: Extranet VPN uses shared infrastructure over an intranet, suppliers,
customers, partners, and other entities and connects them using dedicated connections.
2. DataLink Layer
● It is used for transferring the data from one node to another node.
● It receives the data from the network layer and converts the data into data frames
and then attaches the physical address to these frames which are sent to the
physical layer.
● It enables the error-free transfer of data from one node to another node.
● Frame synchronization: Data-link layer converts the data into frames, and it ensures
that the destination must recognize the starting and ending of each frame.
● Flow control: Data-link layer controls the data flow within the network.
● Error control: It detects and corrects the error occurred during the transmission from
source to destination.
● Addressing: Data-link layers attach the physical address with the data frames so that
the individual machines can be easily identified.
● Link management: Data-link layer manages the initiation, maintenance and termination
of the link between the source and destination for the effective exchange of data.
3. Network Layer
● Network layer converts the logical address into the physical address.
● The routing concept means it determines the best route for the packet to travel from
source to the destination.
Functions of network layer :
● Routing: The network layer determines the best route from source to destination. This
function is known as routing.
● Logical addressing: The network layer defines the addressing scheme to identify each
device uniquely.
● Packetizing: The network layer receives the data from the upper layer and converts the
data into packets. This process is known as packetizing.
● Internetworking: The network layer provides the logical connection between the
different types of networks for forming a bigger network.
4. Transport Layer
● It delivers the message through the network and provides error checking so that no error
occurs during the transfer of data.
5. Session Layer
● The main responsibility of the session layer is beginning, maintaining and ending the
communication between the devices.
● Session layer also reports the error coming from the upper layers.
● Session layer establishes and maintains the session between the two users.
6. Presentation Layer
● The presentation layer is also known as a Translation layer as it translates the data from
one format to another format.
● At the sender side, this layer translates the data format used by the application layer to
the common format and at the receiver side, this layer translates the common format into
a format used by the application layer.
Functions of presentation layer:
○ Data conversion
○ Data compression
○ Data encryption
7. Application Layer
● Application layer protocols are file transfer protocol, simple mail transfer protocol,
domain name system, etc.
● The most widely used application protocol is HTTP(Hypertext transfer protocol ). A user
sends the request for the web page using HTTP.
● TCP/IP Reference Model : It is a compressed version of the OSI model with only 4
layers. It was developed by the US Department of Defence (DoD) in the 1860s. The
name of this model is based on 2 standard protocols used i.e. TCP (Transmission
Control Protocol) and IP (Internet Protocol).
1. Link : Decides which links such as serial lines or classic Ethernet must be used
to meet the needs of the connectionless internet layer. Ex - Sonet, Ethernet
2. Internet : The internet layer is the most important layer which holds the whole
architecture together. It delivers the IP packets where they are supposed to be
delivered. Ex - IP, ICMP.
3. Transport : Its functionality is almost the same as the OSI transport layer. It
enables peer entities on the network to carry on a conversation. Ex - TCP, UDP
(User Datagram Protocol)
4. Application : It contains all the higher-level protocols. Ex - HTTP, SMTP, RTP,
DNS.
HTTP is the HyperText Transfer Protocol which defines the set of rules and standards
on how the information can be transmitted on the World Wide Web (WWW). It helps the
web browsers and web servers for communication. It is a ‘stateless protocol’ where each
command is independent with respect to the previous command. HTTP is an
application layer protocol built upon the TCP. It uses port 80 by default.
● DNS (Imp) :
1. DNS is an acronym that stands for Domain Name System.DNS was introduced
by Paul Mockapetris and Jon Postel in 1983.
2. It is a naming system for all the resources over the internet which includes
physical nodes and applications. It is used to locate resources easily over a
network.
3. DNS is an internet which maps the domain names to their associated IP
addresses.
4. Without DNS, users must know the IP address of the web page that you wanted
to access.
● Working of DNS (Imp): If you want to visit the website of "shaurya", then the user will
type "https://www.shaurya.com" into the address bar of the web browser. Once the
domain name is entered, then the domain name system will translate the domain name
into the IP address which can be easily interpreted by the computer. Using the IP
address, the computer can locate the web page requested by the user.
● DNS Forwarder : A forwarder is used with a DNS server when it receives DNS queries
that cannot be resolved quickly. So it forwards those requests to external DNS servers
for resolution. A DNS server which is configured as a forwarder will behave differently
than the DNS server which is not configured as a forwarder.
● SMTP Protocol : SMTP is the Simple Mail Transfer Protocol. SMTP sets the rule for
communication between servers. This set of rules helps the software to transmit emails
over the internet. It supports both End-to-End and Store-and-Forward methods. It is in
always-listening mode on port 25.
● Difference Between TCP (Transmission Control Protocol) and UDP (User Datagram
Protocol):
1. TCP is a connection-oriented protocol, whereas UDP is a connectionless
protocol. A key difference between TCP and UDP is speed, as TCP is
comparatively slower than UDP. Overall, UDP is a much faster, simpler, and
efficient protocol, however, retransmission of lost data packets is only possible
with TCP
2. TCP provides extensive error checking mechanisms. It is because it provides
flow control and acknowledgment of data. UDP has only the basic error
checking mechanism using checksums.
Important Protocols
A protocol is a set of rules which is used to govern all the aspects of information
communication.The main elements of a protocol are:
● Syntax: It specifies the structure or format of the data. It also specifies the order in which
they are presented.
● Timing: Timing specifies two characteristics: When data should be sent and how fast it
can be sent.
2. The main difference between MAC and IP address is that MAC Address is used to
ensure the physical address of a computer. It uniquely identifies the devices on a
network. While IP addresses are used to uniquely identify the connection of a network
with that device taking part in a network.
● Firewall : The firewall is a network security system that is used to monitor the incoming
and outgoing traffic and blocks the same based on the firewall security policies. It acts as
a wall between the internet (public network) and the networking devices (a private
network). It is either a hardware device, software program, or a combination of both. It
adds a layer of security to the network.
Important Key Points
1. What happens when you enter google.com in the web browser? (Most Imp)
Steps :
● Check the browser cache first if the content is fresh and present in the cache display the
same.
● If not, the browser checks if the IP of the URL is present in the cache (browser and OS)
if not then requests the OS to do a DNS lookup using UDP to get the corresponding IP
address of the URL from the DNS server to establish a new TCP connection.
● A new TCP connection is set between the browser and the server using three-way
handshaking.
● An HTTP request is sent to the server using the TCP connection.
● The web servers running on the Servers handle the incoming HTTP request and send
the HTTP response.
● The browser processes the HTTP response sent by the server and may close the TCP
connection or reuse the same for future requests.
● If the response data is cacheable then browsers cache the same.
● Browser decodes the response and renders the content.
2. Hub: Hub is a networking device which is used to transmit the signal to each port
(except one port) to respond from which the signal was received. Hub is operated on a
Physical layer. In this packet filtering is not available. It is of two types: Active Hub,
Passive Hub.
3. A subnet is a network inside a network achieved by the process called subnetting which
helps divide a network into subnets. It is used for getting a higher routing efficiency and
enhances the security of the network. It reduces the time to extract the host address
from the routing table.
4. The reliability of a network can be measured by the following factors:
● Downtime: The downtime is defined as the required time to recover.
● Failure Frequency: It is the frequency when it fails to work the way it is intended.
● Catastrophe: It indicates that the network has been attacked by some unexpected event
such as fire, earthquake.
5. There are mainly two criteria which make a network effective and efficient:
● Performance : performance can be measured in many ways like transmit time and
response time.
● Robustness: robustness specifies the quality or condition of being strong and in good
condition.
● Security: It specifies how to protect data from unauthorized access and viruses.
6. Node and Link : A network is a connection setup of two or more computers directly
connected by some physical mediums like optical fiber or coaxial cable. This physical
medium of connection is known as a link, and the computers that it is connected to are
known as nodes.
7. Gateway and router : A node that is connected to two or more networks is commonly
known as a gateway. It is also known as a router. It is used to forward messages from
one network to another. Both the gateway and router regulate the traffic in the
network. Differences between gateway and router: A router sends the data between
two similar networks while gateway sends the data between two dissimilar networks.
8. NIC (Imp) : NIC stands for Network Interface Card. It is a peripheral card attached to
the PC to connect to a network. Every NIC has its own MAC address that identifies the
PC on the network. It provides a wireless connection to a local area network. NICs were
mainly used in desktop computers.
9. POP3 stands for Post Office Protocol version3. POP is responsible for accessing the
mail service on a client machine. POP3 works on two models such as Delete mode and
Keep mode.
10. Private IP Address - There are three ranges of IP addresses that have been reserved
for IP addresses. They are not valid for use on the internet. If you want to access the
internet on these private IPs, you must use a proxy server or NAT server.
12. Netstat : It is a command line utility program. It gives useful information about the
current TCP/IP setting of a connection.
13. Ping : The "ping" is a utility program that allows you to check the connectivity between
the network devices. You can ping devices using its IP address or name.
14. The processes on each machine that communicate at a given layer are called
peer-peer processes. (P2P).
15. Unicasting: If the message is sent to a single node from the source then it is known as
unicasting. This is commonly used in networks to establish a new connection.
Anycasting: If the message is sent to any of the nodes from the source then it is known
as anycasting. It is mainly used to get the content from any of the servers in the Content
Delivery System.
Multicasting: If the message is sent to a subset of nodes from the source then it is
known as multicasting. Used to send the same data to multiple receivers.
Broadcasting: If the message is sent to all the nodes in a network from a source then it
is known as broadcasting. DHCP and ARP in the local network use broadcasting.