chap 6 dbms
ACTIVE STATE: A transaction enters the active state immediately after it starts execution,
where it can execute its READ and WRITE operations.
PARTIALLY COMMITTED: When the transaction ends, it moves to the partially committed state.
At this point, some recovery protocols need to ensure that a system failure will not result
in an inability to record the changes of the transaction permanently.
FAILED: A transaction enters the failed state if one of the checks fails or if it is aborted
during its active state. The transaction may then have to be rolled back to undo the effect of
its WRITE operations on the database.
COMMITTED: When a transaction is committed, it has concluded its execution
successfully and all its changes must be recorded permanently in the database.
TERMINATED: The transaction leaves the system.
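The states above can be read as a small state machine. The following is a minimal sketch, not taken from any particular DBMS: the event names ("end", "abort", "commit", "terminate") and the helper names next_state and run are assumptions chosen for illustration.

```python
from enum import Enum, auto


class TxState(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    COMMITTED = auto()
    FAILED = auto()
    TERMINATED = auto()


# Allowed moves between states; the event names are illustrative assumptions.
TRANSITIONS = {
    (TxState.ACTIVE, "end"): TxState.PARTIALLY_COMMITTED,
    (TxState.ACTIVE, "abort"): TxState.FAILED,
    (TxState.PARTIALLY_COMMITTED, "commit"): TxState.COMMITTED,
    (TxState.PARTIALLY_COMMITTED, "abort"): TxState.FAILED,
    (TxState.COMMITTED, "terminate"): TxState.TERMINATED,
    (TxState.FAILED, "terminate"): TxState.TERMINATED,
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None


def run(events):
    """Apply a sequence of events to a transaction that has just started."""
    state = TxState.ACTIVE
    for event in events:
        state = next_state(state, event)
    return state


print(run(["end", "commit", "terminate"]).name)
print(run(["abort"]).name)
```

A committed transaction cannot fail afterwards, so next_state(TxState.COMMITTED, "abort") raises ValueError in this sketch.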
The OS maintains a log of events that helps in monitoring, administering and troubleshooting the
system in addition to helping users get information about important processes. Some of the
events include system errors, warnings, startup messages, system changes, abnormal shutdowns,
etc. This list is applicable to most versions of the three common OSs (Windows, Linux and Mac
OS).
The events recorded are the significant occurrences in the OS that require notifying the user. The
log contains information about the software, hardware, system processes and system
components. It also indicates whether the processes loaded successfully or not. The information
can then be used to diagnose the sources of computer problems, whereas the warnings can be
used to predict potential system issues and problems.
The syslog has standard components that may vary depending on the OS. However, there are
common components and information that are captured regardless of the OS.
All entries are classified by type such as error, information, warning, success audit and failure
audit for Windows systems, and emergency, alert, critical, error, warning, notice, info and debug
for Mac OS and Linux systems.
Each syslog entry contains header information and a description of the event. The header
includes the date and time the event occurred, the username logged on and the computer name
at the time of the event. It also contains the event ID number that is used to identify the event and
the source of the event such as the name of the system component.
The syslog is easily viewed using built-in utilities such as the Event Viewer in Windows. In
addition to viewing, the Event Viewer is also used to manage the file size, save or archive the log
file, clear old events and set overwrite options. Other options include finding or filtering events
and restoring the log to default settings.
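The entry fields and severity levels described above can be modeled and filtered in a few lines. This is only a sketch: the LogEntry class, the at_least helper and the sample entries are made up for illustration, and the numeric codes follow the usual syslog ordering in which a lower number means a more severe event.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum


class Severity(IntEnum):
    """Syslog-style severities; a lower number means a more severe event."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6
    DEBUG = 7


@dataclass(frozen=True)
class LogEntry:
    timestamp: datetime
    user: str
    computer: str
    event_id: int
    source: str
    severity: Severity
    description: str


def at_least(entries, threshold):
    """Keep entries whose severity is `threshold` or more severe."""
    return [e for e in entries if e.severity <= threshold]


# Sample entries are invented for this example.
entries = [
    LogEntry(datetime(2024, 1, 5, 8, 0), "alice", "host-1", 1001, "kernel",
             Severity.INFO, "System startup"),
    LogEntry(datetime(2024, 1, 5, 8, 2), "alice", "host-1", 2040, "disk",
             Severity.WARNING, "Disk usage above 90%"),
    LogEntry(datetime(2024, 1, 5, 9, 30), "alice", "host-1", 3007, "power",
             Severity.CRITICAL, "Abnormal shutdown detected"),
]

for entry in at_least(entries, Severity.WARNING):
    print(entry.timestamp.isoformat(), entry.severity.name, entry.source, entry.description)
```

Filtering at WARNING keeps the warning and the critical entry and drops the informational startup message, which is the kind of filtering a log viewer offers.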
The commit point refers to the completion of the transaction. When all the operations of
the transaction have been executed and recorded in the log, a COMMIT record is written
to the log, which means that the transaction is complete and its effects are
permanent.
After a failure, transactions with no COMMIT record in the log are rolled back (their
changes are undone), and transactions with a COMMIT record in the log are
redone.
If the log records of a transaction are still held in a main-memory buffer, they are
written to disk before the transaction commits, followed by the COMMIT record
itself. This is known as force-writing the log before committing
a transaction.
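The commit and recovery rules above can be sketched as follows. This is a simplified illustration under stated assumptions, not a real recovery manager: the record layout ("start", "write", "commit" tuples carrying old and new values), the Log class and the recover function are all hypothetical names.

```python
class Log:
    """A transaction log with an in-memory buffer and a durable part."""

    def __init__(self):
        self.buffer = []  # records still in main memory (lost on a crash)
        self.disk = []    # records that survive a crash

    def append(self, record):
        self.buffer.append(record)

    def commit(self, tx):
        # Force-write: every buffered record reaches disk with the COMMIT record.
        self.append(("commit", tx))
        self.disk.extend(self.buffer)
        self.buffer.clear()


def recover(db, log_records):
    """Undo uncommitted writes, then redo committed ones.

    Write records have the form ("write", tx, item, old_value, new_value).
    """
    committed = {rec[1] for rec in log_records if rec[0] == "commit"}

    # Undo pass: scan backwards and restore old values of uncommitted writes.
    for rec in reversed(log_records):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old, _ = rec
            db[item] = old

    # Redo pass: scan forwards and reapply new values of committed writes.
    for rec in log_records:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _, new = rec
            db[item] = new
    return db


log = Log()
log.append(("start", "T1"))
log.append(("write", "T1", "A", 100, 80))
log.append(("start", "T2"))
log.append(("write", "T2", "B", 50, 70))
log.commit("T1")  # T2 never commits

# Assumed on-disk state at the crash: T1's write is missing, T2's write is present.
print(recover({"A": 100, "B": 70}, log.disk))
```

In this run T1 is redone (A becomes 80) and T2 is undone (B returns to 50), because only T1 has a COMMIT record in the durable part of the log.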
Distributed Systems
• A distributed system consists of multiple computers and software components that
communicate through a computer network (a local network or a wide area network).
• A distributed system can be built from any number of possible configurations of
machines, such as mainframes, workstations, personal computers, and so on.
• The computers interact with each other and share the resources of the system to achieve a
common goal.
• Distributed systems scale horizontally, by adding more machines.
7. What is NoSQL?
NoSQL is a class of non-relational database management systems that differ from traditional
relational database management systems in some significant ways. It is designed for distributed
data stores with very large-scale storage needs (for example Google or Facebook, which collect
terabits of data every day for their users). This type of data store may not require a fixed
schema, avoids join operations and typically scales horizontally. Today, data is becoming
easier to access and capture through third parties such as Facebook, Google+ and others.
Personal user information, social graphs, geolocation data, user-generated content and machine
logging data are just a few examples where the data has been increasing exponentially. Providing
these services properly requires processing huge amounts of data, which SQL databases
were never designed for. NoSQL databases evolved to handle this volume of data properly.
Example:
Social-network graph:
Each record: UserID1, UserID2
Separate records: UserID, first_name, last_name, age, gender, ...
Task: Find all friends of friends of friends of ... friends of a given user.
Wikipedia pages:
Large collection of documents
Combination of structured and unstructured data
Task: Retrieve all pages regarding athletics at the Summer Olympics before 1950.
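The social-network task above is awkward in SQL because every extra "friend of" adds another self-join on the friendship records. As a rough sketch, the same task is a breadth-first traversal over the (UserID1, UserID2) pairs; the function name within_hops and the sample friendships are assumptions made for this example.

```python
from collections import deque


def within_hops(friendships, start, max_hops):
    """Return {user: hop_count} for users reachable from `start` in 1..max_hops steps."""
    neighbours = {}
    for a, b in friendships:  # friendship is treated as symmetric
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if hops[user] == max_hops:
            continue  # do not expand beyond the requested depth
        for friend in neighbours.get(user, ()):
            if friend not in hops:
                hops[friend] = hops[user] + 1
                queue.append(friend)
    del hops[start]
    return hops


# Each pair is one (UserID1, UserID2) record; the IDs are made up.
friendships = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 6)]
print(within_hops(friendships, 1, 2))
```

With max_hops set to 2 the result covers friends and friends of friends; raising max_hops extends the search without changing the code.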
8. Discuss the advantages of NoSQL versus RDBMS
RDBMS
- Structured and organized data
- Structured query language (SQL)
- Data and its relationships are stored in separate tables.
- Data Manipulation Language, Data Definition Language
- Tight Consistency
NoSQL
- Stands for Not Only SQL
- No declarative query language
- No predefined schema
- Key-Value pair storage, Column Store, Document Store, Graph databases
- Eventual consistency rather than ACID properties
- Unstructured and unpredictable data
- CAP Theorem
- Prioritizes high performance, high availability and scalability
- BASE Transaction
Brief history of NoSQL
The term NoSQL was coined by Carlo Strozzi in 1998. He used the term to name his
open-source, lightweight database, which did not have an SQL interface.
In early 2009, when last.fm wanted to organize an event on open-source distributed
databases, Eric Evans, a Rackspace employee, reused the term to refer to databases that are
non-relational and distributed and that do not conform to atomicity, consistency, isolation
and durability, the four well-known properties of traditional relational database systems.
In the same year, NoSQL was discussed and debated at length at the "no:sql(east)" conference
held in Atlanta, USA.
After that, discussion and practice of NoSQL gained momentum, and NoSQL saw
unprecedented growth.
CAP Theorem
Consistency - The data in the database remains consistent after the execution of
an operation. For example, after an update operation all clients see the same data.
Availability - The system is always on (guaranteed service availability), with no
downtime.
Partition Tolerance - The system continues to function even if the communication
among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that
cannot communicate with one another.
In theory it is impossible to fulfill all 3 requirements at once. Under the CAP theorem, a
distributed system can satisfy only 2 of the 3 requirements. Therefore, current NoSQL
databases follow different combinations of C, A and P from the CAP theorem. Here is a
brief description of the three combinations CA, CP and AP:
CA - Single site cluster, therefore all nodes are always in contact. When a partition occurs, the
system blocks.
CP - Some data may not be accessible, but the rest is still consistent/accurate.
AP - System is still available under partitioning, but some of the data returned may be inaccurate.
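The difference between the CP and AP choices can be shown with a toy model of two replicas. This is only an illustration under simplifying assumptions: the Cluster and Replica classes are hypothetical, replication is instantaneous when the network is healthy, and a CP system is modeled as one that refuses writes during a partition.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Cluster:
    """Two replicas with a switchable network partition (a toy model)."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.left, self.right = Replica(), Replica()
        self.partitioned = False

    def write(self, replica, key, value):
        other = self.right if replica is self.left else self.left
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse writes that cannot reach both replicas.
                raise RuntimeError("unavailable during partition")
            # Availability first: accept the write locally; the replicas diverge.
            replica.data[key] = value
            return
        replica.data[key] = value
        other.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)


ap = Cluster("AP")
ap.write(ap.left, "x", 1)
ap.partitioned = True
ap.write(ap.left, "x", 2)  # accepted, but only the left replica sees it
print(ap.read(ap.left, "x"), ap.read(ap.right, "x"))
```

The AP cluster keeps answering but the right replica returns the old value, while a Cluster("CP") would raise RuntimeError on the same write and keep both replicas in agreement.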
NoSQL pros/cons
Advantages :
• High scalability
• Distributed Computing
• Lower cost
• Schema flexibility, semi-structured data
• No complicated Relationships
Disadvantages :
• No standardization
• Limited query capabilities (so far)
• Eventual consistency is not intuitive to program for
The BASE
The BASE acronym was defined by Eric Brewer, who is also known for formulating the CAP
theorem.
The CAP theorem states that a distributed computer system cannot guarantee all of the following
three properties at the same time:
• Consistency
• Availability
• Partition tolerance
A BASE system gives up on strong consistency in favor of availability.
• Basically Available indicates that the system does guarantee availability, in terms of the
CAP theorem.
• Soft state indicates that the state of the system may change over time, even without input.
This is because of the eventual consistency model.
• Eventual consistency indicates that the system will become consistent over time, given
that the system doesn't receive input during that time.
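Soft state and eventual consistency can be illustrated with two replicas that accept writes locally and reconcile later. This is a minimal sketch, not how any specific database works: the Node class, the sync function and the caller-supplied version numbers are assumptions (real systems typically use timestamps or vector clocks).

```python
class Node:
    """A replica that stores (version, value) per key and accepts writes locally."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, version):
        self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Reconciliation step: keep whichever copy has the higher version.
        for key, (version, value) in other.store.items():
            if key not in self.store or self.store[key][0] < version:
                self.store[key] = (version, value)


def sync(a, b):
    """Exchange state in both directions so that a and b converge."""
    a.merge_from(b)
    b.merge_from(a)


a, b = Node(), Node()
a.write("cart", ["book"], version=1)
print(b.read("cart"))  # b has not heard about the write yet
sync(a, b)
print(b.read("cart"))  # b changed state without receiving a client write
```

Between the write and the sync, a read on b returns stale data; once writes stop and the replicas have synchronized, both return the same value.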
ACID vs BASE
ACID           BASE
Atomic         Basically Available
Consistency    Soft state
Isolation      Eventual consistency
Durable
Categories of NoSQL databases:
• Key-value stores
• Column-oriented
• Graph
• Document oriented
Key-value stores
• Key-value stores are the most basic type of NoSQL database.
• Designed to handle huge amounts of data.
• Based on Amazon’s Dynamo paper.
• Key-value stores allow developers to store schema-less data.
• In key-value storage, the database stores data as a hash table where each key is unique and
the value can be a string, JSON, BLOB (binary large object), etc.
• In some stores (Redis, for example), the values stored against keys may also be strings,
hashes, lists, sets or sorted sets.
• For example, a key-value pair might consist of a key like "Name" that is associated with a
value like "Robin".
• Key-value stores can be used as collections, dictionaries, associative arrays, etc.
• Key-value stores follow the 'Availability' and 'Partition tolerance' aspects of the CAP theorem.
• Key-value stores would work well for shopping cart contents, or individual values like
color schemes, a landing page URI, or a default account number.
Examples of key-value store databases: Redis, Dynamo, Riak, etc.
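The hash-table model described above can be sketched with a Python dict. The KeyValueStore class and the key names ("Name", "cart:42") are assumptions made for illustration; real stores such as Redis or Riak expose their own client APIs, which are not shown here.

```python
import json


class KeyValueStore:
    """An in-memory key-value store: unique string keys, opaque byte values."""

    def __init__(self):
        self._table = {}  # Python's dict is a hash table

    def put(self, key, value):
        self._table[key] = value  # overwrites, since each key is unique

    def get(self, key, default=None):
        return self._table.get(key, default)

    def delete(self, key):
        self._table.pop(key, None)


store = KeyValueStore()
store.put("Name", b"Robin")
# The store does not interpret values; the application encodes and decodes JSON itself.
store.put("cart:42", json.dumps({"items": ["book", "pen"]}).encode())

cart = json.loads(store.get("cart:42"))
print(store.get("Name").decode(), cart["items"])
```

Because the store never looks inside a value, there is no schema to declare, and the only query it supports is lookup by key.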
Column-oriented databases
• Column-oriented databases work primarily on columns, and every column is treated
individually.
• Values of a single column are stored contiguously.
• A column store keeps data in column-specific files.
• In column stores, query processors work on columns too.
• All data within each column data file has the same type, which makes it ideal for
compression.
• Column stores can improve query performance because a query can access only the columns
it needs.
• High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN, MAX).
• Well suited to data warehouses and business intelligence, customer relationship management
(CRM), library card catalogs, etc.
Examples of column-oriented databases: BigTable, Cassandra, SimpleDB, etc.
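The points above about contiguous single-typed columns, aggregation and compression can be illustrated in memory. This is a sketch with made-up sample data; the helper names average and run_length_encode are assumptions, and run-length encoding is just one simple compression scheme that benefits from same-typed, repetitive column data.

```python
from array import array

# Row-oriented layout: one record per row (sample data is made up).
rows = [
    (1, "Asha", 30, 52000.0),
    (2, "Ben", 41, 61000.0),
    (3, "Chen", 30, 58000.0),
]

# Column-oriented layout: one contiguous, single-typed sequence per column.
columns = {
    "id": array("q", (r[0] for r in rows)),
    "name": [r[1] for r in rows],
    "age": array("q", (r[2] for r in rows)),
    "salary": array("d", (r[3] for r in rows)),
}


def average(column):
    """AVG over a single column; no other column is read."""
    return sum(column) / len(column)


def run_length_encode(column):
    """Compress runs of equal adjacent values into [value, count] pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


print(average(columns["salary"]))                  # touches only the salary column
print(run_length_encode(sorted(columns["age"])))   # repeated values collapse into runs
```

In the row layout the same average would have to walk every full record, including the name and age fields it does not need.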
Graph databases
A graph data structure consists of a finite (and possibly mutable) set of ordered pairs, called
edges or arcs, of certain entities called nodes or vertices.
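The definition above maps directly onto a set of ordered pairs. The following minimal sketch assumes a directed graph; the Graph class, its method names and the node names are illustrative only and do not reflect the API of any graph database.

```python
class Graph:
    """A directed graph stored as a mutable set of ordered (source, target) pairs."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()

    def add_edge(self, source, target):
        self.nodes.update((source, target))
        self.edges.add((source, target))

    def remove_edge(self, source, target):
        self.edges.discard((source, target))

    def successors(self, node):
        """Nodes reachable from `node` by following one edge."""
        return {t for s, t in self.edges if s == node}


g = Graph()
g.add_edge("alice", "bob")  # node names are illustrative
g.add_edge("alice", "carol")
g.add_edge("bob", "carol")
g.remove_edge("alice", "carol")  # the edge set is mutable
print(sorted(g.successors("alice")))
```

Scanning the whole edge set in successors is fine for a sketch; a store built for traversal would keep an adjacency index per node instead.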
Companies using NoSQL in production include:
• Google
• Facebook
• Mozilla
• Adobe
• Foursquare
• LinkedIn
• Digg
• McGraw-Hill Education
• Vermont Public Radio