
chap 6 dbms

Uploaded by

pavan
Copyright © All Rights Reserved

6. Fundamentals of Database Transaction Processing


1. Explain the following terms: transaction, concurrency control.
2. Explain the desirable properties of a transaction.
3. Write a note on the system log.
4. Write a note on the commit point of a transaction.
5. Discuss the advantages of NoSQL versus RDBMS.
6. Explain the CAP theorem.
7. Explain the advantages of distributed computing.
8. What is scaling? Explain the types of scaling.
9. Differentiate between ACID and BASE.
10. Bring out the advantages and disadvantages of NoSQL.
11. Draw a state diagram and discuss the typical states during transaction execution.
12. Discuss the different types of failures that occur during transaction execution.
13. Explain the need for concurrency control.
14. Explain the categories of NoSQL.
15. Explain column store and row store with suitable examples.

1. Explain the desirable properties of a transaction.

A transaction should possess four important properties:


1. Atomicity: A transaction is atomic if it either executes all of its actions as a single unit or executes none of them at all.
2. Consistency: A transaction must take the database from one consistent state to another consistent state. Ensuring consistency is the responsibility of both the DBMS and the application developers.
3. Isolation (concurrent changes are invisible): Concurrently executing transactions must not interfere with one another; each transaction should appear to execute as if it were running alone. Enforcing isolation is the responsibility of the concurrency control subsystem.
4. Durability (committed updates persist): The effects of a successfully committed transaction are permanently recorded in the database and must not be lost because of a subsequent failure. Ensuring durability is the responsibility of the recovery subsystem.
The acronym ACID is commonly used to refer to these four properties of a transaction: Atomicity, Consistency, Isolation, and Durability.
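Atomicity and consistency can be seen in a small sketch using Python's built-in sqlite3 module. The table `account`, the function `transfer` and the "no negative balance" rule are hypothetical, chosen only for illustration: if the rule is violated partway through, every change made by the transaction is rolled back, so the database never shows half a transfer.

```python
import sqlite3

def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single atomic transaction."""
    try:
        # The connection's context manager commits if the block succeeds
        # and rolls back every change in the block if an exception escapes.
        with conn:
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                # Assumed consistency rule: balances may not go negative.
                raise ValueError("insufficient funds")
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

A failed transfer leaves both balances exactly as they were before it started, because the first UPDATE is undone together with everything else in the transaction.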

2. Explain the States of Transaction with diagram.

A transaction is an atomic unit of work that is either completed entirely or not done at all.

A transaction must be in one of the following states:

• ACTIVE: A transaction enters this state immediately after it starts execution; in this state it can execute its READ and WRITE operations.
• PARTIALLY COMMITTED: When the transaction ends, it moves to this state. At this point, some recovery protocols need to ensure that a system failure will not result in an inability to record the changes of the transaction permanently.
• FAILED: A transaction enters this state if one of the checks fails or if it is aborted during its active state. The transaction may then have to be rolled back to undo the effect of its WRITE operations on the database.
• COMMITTED: If the checks succeed, the transaction reaches its commit point and enters this state. It has concluded its execution successfully, and all its changes must be recorded permanently in the database.
• TERMINATED: The transaction leaves the system.
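The state diagram can be written down as a transition table. This is a minimal sketch, assuming the transitions of the usual textbook diagram (active to partially committed or failed, partially committed to committed or failed, committed or failed to terminated); the `State` and `Transaction` names are hypothetical.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    TERMINATED = "terminated"

# Edges of the state diagram: which states may follow each state.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},     # end of transaction / abort
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},  # checks pass / checks fail
    State.COMMITTED: {State.TERMINATED},
    State.FAILED: {State.TERMINATED},                            # after rollback
    State.TERMINATED: set(),
}

class Transaction:
    def __init__(self):
        self.state = State.ACTIVE        # a new transaction starts out active
        self.history = [self.state]

    def move_to(self, new_state):
        """Follow one edge of the diagram, rejecting transitions it does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        self.history.append(new_state)
```

A successful run passes through ACTIVE, PARTIALLY_COMMITTED, COMMITTED and TERMINATED; trying to jump straight from ACTIVE to COMMITTED raises an error, since every transaction must first pass the checks in the partially committed state.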

3. Write a note on System Log.

In a DBMS, the system log (also called the journal) keeps track of all transaction operations that affect the values of database items, such as the start of a transaction, each write together with the item's old and new values, and each commit or abort. It is a sequential, append-only file kept on disk, and it is used to recover from failures. Operating systems likewise maintain a log of events, which helps in monitoring, administering and troubleshooting the system, in addition to helping users get information about important processes. Such events include system errors, warnings, startup messages, system changes and abnormal shutdowns. This applies to most versions of the three common operating systems (Windows, Linux and Mac OS).
The recorded events are significant occurrences in the OS that require notifying the user. The log contains information about the software, hardware, system processes and system components. It also indicates whether processes loaded successfully or not. This information can then be used to diagnose the sources of computer problems, while warnings can be used to predict potential system issues.
The syslog has standard components that may vary depending on the OS. However, some components and information are captured regardless of the OS.
All entries are classified by type: error, information, warning, success audit and failure audit on Windows systems, and emergency, alert, critical, error, warning, notice, info and debug on Mac OS and Linux systems.
Each entry contains header information and a description of the event. The header includes the date and time the event occurred, the user logged on and the computer name at the time of the event. It also contains the event ID number, which identifies the event, and the source of the event, such as the name of the system component.
The log is easily viewed using built-in utilities such as the Event Viewer in Windows. In addition to viewing, the Event Viewer is used to manage the log file size, save or archive the log file, clear old events and set overwrite options. Other options include finding or filtering events and restoring the log to its default settings.
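On Linux and Mac OS the eight severity levels listed above are ordered, with emergency the most severe and debug the least (numbered 0 to 7 in the syslog standard, RFC 5424), so filtering events by severity is a threshold comparison. This is a minimal sketch; representing each entry as a (severity, message) tuple is a hypothetical simplification, not the real syslog message format.

```python
# Syslog severities, most severe first (code 0 = emergency ... 7 = debug).
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "info", "debug"]
CODE = {name: code for code, name in enumerate(SEVERITIES)}

def filter_events(entries, threshold="warning"):
    """Keep entries at least as severe as `threshold` (a lower code is more severe)."""
    limit = CODE[threshold]
    return [(sev, msg) for sev, msg in entries if CODE[sev] <= limit]

# Hypothetical log entries.
events = [
    ("info", "service started"),
    ("error", "disk read failure"),
    ("debug", "cache hit"),
    ("warning", "disk 90% full"),
]
```

With the default threshold, only the error and warning entries are kept; lowering the threshold to "debug" keeps everything.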

4. Write a note on commit point of a transaction.

• The commit point of a transaction is reached when all of its operations that access the database have been executed successfully and the effects of all these operations have been recorded in the log. A COMMIT record for the transaction is then written to the log; beyond this point the transaction is said to be committed, and its effects are assumed to be permanently recorded in the database.
• In the case of a failure, transactions that have a start record in the log but no COMMIT record may have to be rolled back (undone), while transactions whose COMMIT record is in the log can be redone from the log records.
• Part of the log is usually kept in a main-memory buffer. Before a transaction reaches its commit point, any portion of the log that has not yet been written to disk must be written to disk, and only then is the COMMIT record written. This is known as force-writing the log before committing a transaction.
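The commit-point rules can be sketched with a toy log. This is a minimal sketch under simplifying assumptions: log records are tuples in the textbook style (start_transaction, write_item with old and new values, commit); the "disk" is just a Python list; a data item is never written to disk before its log record (write-ahead logging); and recovery undoes uncommitted writes and redoes committed ones. The `Log` class and `recover` function are hypothetical.

```python
class Log:
    """Toy system log: new records sit in a main-memory buffer until force-written to 'disk'."""

    def __init__(self):
        self.buffer = []  # records lost if the system crashes
        self.disk = []    # records that survive a crash

    def append(self, record):
        self.buffer.append(record)

    def force_write(self):
        self.disk.extend(self.buffer)
        self.buffer.clear()

    def commit(self, txn):
        # Force-write every buffered record first, then write the COMMIT record.
        self.force_write()
        self.disk.append(("commit", txn))


def recover(disk_log, db):
    """Undo uncommitted writes, then redo committed writes, using only the on-disk log."""
    committed = {rec[1] for rec in disk_log if rec[0] == "commit"}
    for rec in reversed(disk_log):              # undo pass, newest record first
        if rec[0] == "write_item" and rec[1] not in committed:
            _, _, item, old_value, _ = rec
            db[item] = old_value
    for rec in disk_log:                        # redo pass, oldest record first
        if rec[0] == "write_item" and rec[1] in committed:
            _, _, item, _, new_value = rec
            db[item] = new_value
    return db


log = Log()
log.append(("start_transaction", "T1"))
log.append(("write_item", "T1", "X", 10, 20))
log.append(("start_transaction", "T2"))
log.append(("write_item", "T2", "Y", 5, 7))
log.commit("T1")                                # T1 reaches its commit point
log.append(("write_item", "T2", "Z", 1, 2))     # still only in the buffer

# Crash: the buffer is lost. Assume the disk copy of the database already held
# T2's write to Y but not T1's write to X; Z was never written to disk because
# its log record never reached disk.
db_after_crash = {"X": 10, "Y": 7, "Z": 1}
recovered = recover(log.disk, dict(db_after_crash))
```

After recovery, T1's committed write to X is redone (X = 20) and T2's uncommitted write to Y is undone (Y = 5), which matches the rule above: committed transactions are redone, the rest are rolled back.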

5. Explain various transaction operations.


The following are the operations that the recovery manager keeps track of:
• BEGIN_TRANSACTION: This marks the beginning of transaction execution.
• READ or WRITE: These specify read or write operations on the database items that are executed as part of a transaction.
• END_TRANSACTION: This marks the end of transaction execution. At this point it must be checked whether the changes introduced by the transaction can be permanently applied to the database or whether the transaction has to be aborted.
• COMMIT_TRANSACTION: This signals the successful end of the transaction, so that any changes executed by the transaction can be safely committed to the database and will not be undone.
• ROLLBACK (ABORT): This signals that the transaction has ended unsuccessfully, so that any changes or effects that the transaction may have applied to the database must be undone.
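The same operations appear as SQL statements. This is a minimal sketch using Python's built-in sqlite3 module with `isolation_level=None`, so that BEGIN, COMMIT and ROLLBACK are issued explicitly rather than by the driver; the `item` table and the function names are hypothetical. In SQLite, END TRANSACTION is simply an alias for COMMIT, so the "commit or abort?" check at END_TRANSACTION is modelled here by an ordinary Python test.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE item (name TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO item VALUES ('X', 10)")

def run_transaction(delta):
    conn.execute("BEGIN")                                                        # BEGIN_TRANSACTION
    (qty,) = conn.execute("SELECT qty FROM item WHERE name = 'X'").fetchone()    # READ
    conn.execute("UPDATE item SET qty = ? WHERE name = 'X'", (qty + delta,))     # WRITE
    # END_TRANSACTION: decide whether the changes may be applied permanently.
    if qty + delta >= 0:
        conn.execute("COMMIT")                                                   # COMMIT_TRANSACTION
        return "committed"
    conn.execute("ROLLBACK")                                                     # ROLLBACK (ABORT)
    return "rolled back"

def current_qty():
    return conn.execute("SELECT qty FROM item WHERE name = 'X'").fetchone()[0]
```

A transaction that would make the quantity negative is rolled back, so its WRITE never becomes visible.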

6. Explain the advantages of distributed computing.

Distributed Systems
• A distributed system consists of multiple computers and software components that communicate through a computer network (a local area network or a wide area network).
• A distributed system can include many kinds of machines, such as mainframes, workstations and personal computers.
• The computers interact with each other and share the resources of the system to achieve a
common goal.

Advantages of Distributed Computing


• Reliability (fault tolerance)
• Scalability
• Sharing of Resources
• Flexibility
• Speed
• Open system
• Performance
Disadvantages of Distributed Computing
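• Troubleshooting: Diagnosing and troubleshooting problems is more difficult, because a fault may lie in any of the machines or in the network between them.
• Software: Less software support is available, which is the main disadvantage of distributed computing systems.
• Networking: The network infrastructure can cause several problems, such as transmission problems, overloading and loss of messages.
• Security: Easy access in a distributed computing system increases security risks, and sharing data creates data security problems.
Scalability
Scaling is the ability of a system to handle a growing workload by adding resources. There are two ways of scaling:
• Vertical scaling (scaling up): adding more resources, such as CPU, memory or storage, to a single machine.
• Horizontal scaling (scaling out): adding more machines and distributing the data and workload across them.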
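Horizontal scaling usually means partitioning (sharding) the data across machines. This is a minimal sketch assuming simple hash partitioning over hypothetical node names; real systems often use consistent hashing or range partitioning instead, so that adding a node moves less data.

```python
import hashlib

def node_for(key, nodes):
    """Choose the node that stores `key` by hashing the key (same result on every run)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

def place(keys, nodes):
    """Group keys by the node they are assigned to."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement

keys = [f"user{i}" for i in range(1000)]
three_nodes = ["node1", "node2", "node3"]
four_nodes = three_nodes + ["node4"]            # scale out by one machine

placement = place(keys, three_nodes)
moved = sum(node_for(k, three_nodes) != node_for(k, four_nodes) for k in keys)
```

The keys spread roughly evenly over the three nodes. With plain modulo hashing, however, about three quarters of the keys change node when a fourth machine is added (a key stays put only when its hash gives the same remainder modulo 3 and modulo 4), which is why consistent hashing is often preferred.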

7. What is NoSQL?
NoSQL refers to non-relational database management systems, which differ from traditional relational database management systems in some significant ways. They are designed for distributed data stores with very large-scale storage needs (for example, Google or Facebook, which collect terabits of data every day for their users). Such data stores may not require a fixed schema, avoid join operations and typically scale horizontally. Today, data is becoming easier to access and capture through third parties such as Facebook, Google+ and others. Personal user information, social graphs, geolocation data, user-generated content and machine logging data are just a few examples where the data has been increasing exponentially. Providing such services properly requires processing huge amounts of data, which SQL databases were never designed to do. NoSQL databases evolved to handle such huge volumes of data properly.

Example:
Social-network graph:
Each record: UserID1, UserID2
Separate records: UserID, first_name, last_name, age, gender, ...
Task: Find all friends of friends of friends of ... friends of a given user.
Wikipedia pages:
Large collection of documents
Combination of structured and unstructured data
Task: Retrieve all pages about athletics at the Summer Olympics before 1950.
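The social-network task shows why such queries are awkward in a relational schema: each extra level of "friend of" is another self-join on the (UserID1, UserID2) table, whereas a graph-style traversal simply expands one hop at a time. A minimal breadth-first sketch, with hypothetical user IDs and assuming friendship is symmetric:

```python
from collections import defaultdict, deque

# Hypothetical friendship records (UserID1, UserID2).
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 6), (6, 7)]

def build_adjacency(pairs):
    """Turn the edge records into a map from each user to the set of their friends."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def within_hops(adj, start, max_hops):
    """Return users reachable from `start` in 1..max_hops friendship hops (breadth-first)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if seen[user] == max_hops:
            continue
        for friend in adj[user]:
            if friend not in seen:
                seen[friend] = seen[user] + 1
                queue.append(friend)
    return {u for u, d in seen.items() if d > 0}

adj = build_adjacency(edges)
```

For user 1, one hop gives {2, 6}, two hops add 3 and 7, and three hops add 4; deeper levels only change `max_hops`, not the shape of the query.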
8. Discuss the advantages of NoSQL versus RDBMS.
RDBMS
- Structured and organized data
- Structured query language (SQL)
- Data and its relationships are stored in separate tables.
- Data Manipulation Language, Data Definition Language
- Tight consistency
NoSQL
- Stands for Not Only SQL
- No declarative query language
- No predefined schema
- Key-value pair storage, column store, document store, graph databases
- Eventual consistency rather than ACID properties
- Unstructured and unpredictable data
- CAP theorem
- Prioritizes high performance, high availability and scalability
- BASE transactions
Brief history of NoSQL
The term NoSQL was coined by Carlo Strozzi in 1998. He used it to name his open-source, lightweight database, which did not have an SQL interface.

In early 2009, when Last.fm wanted to organize an event on open-source distributed databases, Eric Evans, a Rackspace employee, reused the term to refer to databases that are non-relational, distributed and do not conform to atomicity, consistency, isolation and durability, the four characteristic features of traditional relational database systems.

In the same year, NoSQL was widely discussed and debated at the "no:sql(east)" conference held in Atlanta, USA.

After that, discussion and practice of NoSQL gained momentum, and NoSQL saw unprecedented growth.

9. Explain CAP Theorem (Brewer’s Theorem).


You must understand the CAP theorem when talking about NoSQL databases, or indeed when designing any distributed system. The CAP theorem states that there are three basic requirements that trade off against one another when designing applications for a distributed architecture.

Consistency - This means that the data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
Availability - This means that the system is always on (service availability is guaranteed), with no downtime.
Partition Tolerance - This means that the system continues to function even if communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
Theoretically, it is impossible to fulfill all three requirements at the same time: a distributed system can guarantee at most two of the three. Therefore, current NoSQL databases follow different combinations of C, A and P from the CAP theorem. Here is a brief description of the three combinations CA, CP and AP:

CA - Single site cluster, therefore all nodes are always in contact. When a partition occurs, the
system blocks.
CP - Some data may not be accessible, but the rest is still consistent/accurate.
AP - System is still available under partitioning, but some of the data returned may be inaccurate.
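The difference between CP and AP behaviour during a partition can be made concrete with two replicas of the same data. This is a toy model rather than any particular database's protocol: the `TwoReplicaStore` class, its `mode` flag and the `partitioned` switch are hypothetical, and reads are simplified to always answer from the chosen replica.

```python
class TwoReplicaStore:
    """Two replicas of one key-value map, with a switch that simulates a network partition."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.replicas = [{"x": 0}, {"x": 0}]
        self.partitioned = False

    def write(self, replica, key, value):
        if not self.partitioned:
            for r in self.replicas:            # replicate to every node
                r[key] = value
            return "ok"
        if self.mode == "CP":
            # The other replica is unreachable: refuse the write to stay consistent.
            return "unavailable"
        # AP: accept the write locally and stay available; replicas now disagree.
        self.replicas[replica][key] = value
        return "ok"

    def read(self, replica, key):
        return self.replicas[replica][key]
```

During a partition the CP store rejects the write, so both replicas still agree; the AP store accepts it, so replica 0 returns the new value while replica 1 still returns the old one.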

NoSQL pros/cons
Advantages:
• High scalability
• Distributed computing
• Lower cost
• Schema flexibility, semi-structured data
• No complicated relationships
Disadvantages:
• No standardization
• Limited query capabilities (so far)
• Eventual consistency is not intuitive to program for
BASE
The BASE acronym was defined by Eric Brewer, who is also known for formulating the CAP theorem.

The CAP theorem states that a distributed computer system cannot guarantee all of the following
three properties at the same time:

• Consistency
• Availability
• Partition tolerance
A BASE system gives up strong consistency in favor of availability.

• Basically Available indicates that the system does guarantee availability, in terms of the
CAP theorem.
• Soft state indicates that the state of the system may change over time, even without input.
This is because of the eventual consistency model.
• Eventual consistency indicates that the system will become consistent over time, given
that the system doesn't receive input during that time.
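Eventual consistency can be sketched with replicas that accept writes independently and later exchange state. This minimal sketch assumes a last-writer-wins rule based on a logical timestamp per key, which is only one of several possible reconciliation strategies; the `Replica` class and `sync` function are hypothetical. Between the writes and the sync the replicas disagree (soft state); once writes stop and they sync, they converge.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, ts):
        self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def sync(a, b):
    """Anti-entropy exchange: both replicas keep the newest version of every key."""
    for key in set(a.data) | set(b.data):
        versions = [d[key] for d in (a.data, b.data) if key in d]
        # Tuples compare by timestamp first; equal timestamps fall back to
        # comparing the values, an arbitrary but deterministic tie-break.
        newest = max(versions)
        a.data[key] = newest
        b.data[key] = newest
```

Before `sync`, two replicas holding different cart contents answer reads differently; after it, both return the version with the later timestamp.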
ACID vs BASE
ACID | BASE
Atomicity | Basically Available
Consistency | Soft state
Isolation | Eventual consistency
Durability | -

14. Explain the categories of NoSQL.


There are four general types (the most common categories) of NoSQL databases. Each of these categories has its own specific attributes and limitations. No single solution is better than all the others; however, some databases are better suited to solving specific problems. To clarify the NoSQL landscape, let's discuss the most common categories:

• Key-value stores
• Column-oriented
• Graph
• Document oriented
Key-value stores
• Key-value stores are the most basic type of NoSQL database.
• They are designed to handle huge amounts of data.
• Many are based on Amazon's Dynamo paper.
• Key-value stores allow developers to store schema-less data.
• In key-value storage, the database stores data as a hash table where each key is unique and the value can be a string, JSON, a BLOB (binary large object), etc.
• Keys are typically strings; depending on the store, the values stored against these keys may be strings, hashes, lists, sets or sorted sets.
• For example, a key-value pair might consist of a key like "Name" that is associated with a value like "Robin".
• Key-value stores can be used as collections, dictionaries, associative arrays, etc.
• Key-value stores follow the 'Availability' and 'Partition' aspects of the CAP theorem.
• Key-value stores work well for shopping cart contents, or individual values like color schemes, a landing page URI or a default account number.
Examples of key-value store databases: Redis, Dynamo, Riak, etc.
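At its core a key-value store is the hash table described above: opaque values looked up by a unique key. A minimal in-memory sketch (the `KeyValueStore` class and the `cart:42` key are hypothetical) that stores a shopping cart as a JSON string shows the interface; real stores such as Redis add persistence, expiry and replication on top of this idea.

```python
import json

class KeyValueStore:
    """Schema-less store: each value is opaque to the store and found only by its key."""

    def __init__(self):
        self._table = {}  # Python dicts are hash tables

    def put(self, key, value):
        self._table[key] = value

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        self._table.pop(key, None)

store = KeyValueStore()
store.put("Name", "Robin")
store.put("cart:42", json.dumps([{"sku": "pen", "qty": 2}]))
```

The store never looks inside the cart value; the application serialises and parses the JSON itself, which is what "schema-less" means in practice.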
Column-oriented databases
• Column-oriented databases primarily work on columns, and every column is treated individually.
• Values of a single column are stored contiguously.
• Column stores keep data in column-specific files.
• In column stores, query processors work on columns too.
• All data within each column data file has the same type, which makes it ideal for compression.
• Column stores can improve query performance because a query reads only the columns it needs.
• High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN, MAX).
• Well suited to data warehouses and business intelligence, customer relationship management (CRM), library card catalogs, etc.
Examples of column-oriented databases: BigTable, Cassandra, SimpleDB, etc.
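The difference between a row store and a column store is the physical layout of the same data. This sketch holds three hypothetical employee records both ways: the row layout keeps each record together, while the column layout keeps each column's values contiguous, so an aggregate such as SUM(salary) touches only one list, and a column with repeated values compresses well with run-length encoding.

```python
from itertools import groupby

# Row store: each record is stored together.
rows = [
    ("Asha", "Sales", 500),
    ("Ravi", "Sales", 450),
    ("Mina", "HR", 400),
]

# Column store: each column's values are stored contiguously.
columns = {
    "name":   [r[0] for r in rows],
    "dept":   [r[1] for r in rows],
    "salary": [r[2] for r in rows],
}

def sum_salary_rows(rows):
    # Must walk every full record even though only one field is needed.
    return sum(r[2] for r in rows)

def sum_salary_columns(columns):
    # Reads only the salary column.
    return sum(columns["salary"])

def run_length_encode(values):
    """Compress runs of repeated adjacent values, common in a single-type column."""
    return [(v, len(list(g))) for v, g in groupby(values)]
```

Both layouts give the same total (1350); the column layout simply reads less data to get it, and the `dept` column shrinks to two (value, count) pairs.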
Graph databases
A graph data structure consists of a finite (and possibly mutable) set of ordered pairs, called
edges or arcs, of certain entities called nodes or vertices.

The following picture presents a labeled graph of 6 vertices and 7 edges.

What is a graph database?

• A graph database stores data in a graph.


• It is capable of elegantly representing any kind of data in a highly accessible way.
• A graph database is a collection of nodes and edges.
• Each node represents an entity (such as a student or business) and each edge represents a connection or relationship between two nodes.
• Every node and edge is defined by a unique identifier.
• Each node knows its adjacent nodes.
• As the number of nodes increases, the cost of a local step (or hop) remains the same.
• Indexes are used for lookups.
Here is a comparison between the classic relational model and the graph model :

Relational model | Graph model
Tables | Vertices and edges set
Rows | Vertices
Columns | Key/value pairs
Joins | Edges
Examples of graph databases: OrientDB, Neo4j, Titan, etc.
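The relational-to-graph mapping in the comparison (rows become vertices, columns become key/value properties, joins become edges) can be sketched in a few lines. The tables and the `enrolled_in` / `has_student` edge labels are hypothetical; the point is that once each vertex keeps its adjacent edges, finding neighbours is a direct lookup rather than a join.

```python
# Hypothetical relational data: two entity tables and a join table.
students = [{"id": "s1", "name": "Asha"}, {"id": "s2", "name": "Ravi"}]
courses = [{"id": "c1", "title": "DBMS"}, {"id": "c2", "title": "Networks"}]
enrolment = [("s1", "c1"), ("s2", "c1"), ("s2", "c2")]

# Rows -> vertices; columns -> key/value properties of each vertex.
vertices = {row["id"]: dict(row) for row in students + courses}

# Join-table rows -> labelled edges; each vertex keeps a list of its adjacent vertices.
adjacent = {vid: [] for vid in vertices}
for student, course in enrolment:
    adjacent[student].append(("enrolled_in", course))
    adjacent[course].append(("has_student", student))

def neighbours(vid, label):
    """One hop: follow edges with the given label from vertex `vid`."""
    return [other for lab, other in adjacent[vid] if lab == label]
```

Asking which courses s2 takes, or which students take c1, is a single hop from the starting vertex; no join table is scanned.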
Document Oriented databases
• A collection of documents
• Data in this model is stored inside documents.
• A document is a key-value collection where the key allows access to its value.
• Documents are not typically forced to have a schema and therefore are flexible and easy
to change.
• Documents are stored into collections in order to group different kinds of data.
• Documents can contain many different key-value pairs, or key-array pairs, or even nested
documents.
Here is a comparison between the classic relational model and the document model :

Relational model | Document model
Tables | Collections
Rows | Documents
Columns | Key/value pairs
Joins | Not available
Examples of document-oriented databases: MongoDB, CouchDB, etc.
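Documents in one collection need not share a schema and may nest other documents or arrays. This sketch uses plain Python dictionaries to stand in for a collection; `get_path` and `find` are hypothetical helpers illustrating a query on a (possibly nested) field, and are not the MongoDB or CouchDB API.

```python
# A "collection" of schema-less documents (hypothetical data).
students = [
    {"_id": 1, "name": "Asha", "address": {"city": "Mysuru"}, "courses": ["DBMS", "OS"]},
    {"_id": 2, "name": "Ravi", "email": "ravi@example.com"},   # different fields: allowed
    {"_id": 3, "name": "Mina", "address": {"city": "Bengaluru"}, "courses": ["DBMS"]},
]

def get_path(doc, path):
    """Follow a dotted path such as 'address.city' into nested documents."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc

def find(collection, path, value):
    """Return documents whose field equals `value`, or whose array field contains it."""
    matches = []
    for doc in collection:
        field = get_path(doc, path)
        if field == value or (isinstance(field, list) and value in field):
            matches.append(doc)
    return matches
```

Documents that lack the queried field, like the second one here, are simply skipped rather than causing an error.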
Production deployment
A large number of companies use NoSQL. To name a few:

• Google
• Facebook
• Mozilla
• Adobe
• Foursquare
• LinkedIn
• Digg
• McGraw-Hill Education
• Vermont Public Radio
