Week 1 Merged
Week 1 Lecture 1
Class BSCCS2001
Materials https://drive.google.com/drive/folders/19FhdYYKeH3ZshWhoZIJlP_MC1nVnUUmU?usp=sharing
Module # 1
Type Lecture
Week # 1
🚨 DBMS: A database management system (or DBMS) is essentially nothing more than a computerized data-keeping system. (via IBM)
Database Applications:
Banking: transactions
University Database Example
Application program examples
Add new students, instructors and courses
Assign grades to students, compute Grade Point Average (GPA) and generate transcripts
In the early days, database applications were built directly on top of file systems
Data isolation
Integrity problems
Integrity constraints (eg: account balance > 0) become "buried" in program code rather than being stated explicitly
Atomicity of updates
Failures may leave databases in an inconsistent state with partial updates carried out
Example: Transfer of funds from one account to another should either complete or not happen at all
Example: Two people reading a balance (say 100) and updating it by withdrawing money (say 50 each) at
the same time
Security problems
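The concurrent-access example above (two withdrawals of 50 from a balance of 100) can be traced step by step. This is a minimal sketch in plain Python, not a real multi-user system; the variable names are illustrative assumptions.

```python
# Interleaved schedule: both users read the balance before either writes it back.
balance = 100
read_by_user1 = balance              # user 1 reads 100
read_by_user2 = balance              # user 2 reads 100
balance = read_by_user1 - 50         # user 1 writes 50
balance = read_by_user2 - 50         # user 2 overwrites with 50, so user 1's update is lost
interleaved_result = balance

# Serial schedule: user 2 reads only after user 1 has written.
balance = 100
balance = balance - 50               # user 1: reads 100, writes 50
balance = balance - 50               # user 2: reads 50, writes 0
serial_result = balance

print(interleaved_result, serial_result)
```

In the interleaved schedule 100 has been paid out but the stored balance only dropped by 50, which is the kind of inconsistency a file-based application has to guard against by hand.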
Course pre-requisites:
Set Theory
Definition of a set
Intensional definition
Extensional definition
Set-builder notation
Operations on sets:
De-Morgan's Law
Image, Pre-image, Inverse
Definition of functions
Composition of functions
Inverse of functions
Propositional Logic
Truth values and Truth tables
Predicate Logic
Predicates
Quantification
Existential
Universal
Python
Merge sort
Quick sort
Search
Linear search
Binary search
Interpolation search
Data Structures
Arrays
List
Balanced Tree
B - Tree
Hash table/map
Python
Cheatsheet: https://www.pythoncheatsheet.org
C Language: https://www.youtube.com/watch?v=zYierUhIFNQ&list=PLhQjrBD2T382_R182iC2gNZI9HzWFMC_8&index=2 (part of CS50 2020 Lectures)
📚
Week 1 Lecture 2
Class BSCCS2001
Materials
Module # 2
Type Lecture
Week # 1
Why DBMS?
Data Management
Storage
Retrieval
Transaction
Audit
Archival
For
Individuals
Global
1. Physical:
Physical Data or Records Management, more formally known as Book Keeping, has been using physical ledgers
and journals for centuries
The most significant development happened when Henry Brown patented a "receptacle for storing and preserving
papers" on November 2, 1886
Herman Hollerith adapted the punch cards used for weaving looms to act as the memory for a mechanical tabulating
machine in 1890
2. Electronic:
Electronic Data or Records management moves with the advances in technology, especially of memory, storage,
computing and networking
1960s: Data Management with punch cards / tapes and magnetic tapes
1970s:
On October 14, 1979, Apple II platform shipped VisiCalc, marking the birth of spreadsheets
2000s: e-Commerce boomed, NoSQL was introduced for unstructured data management
Durability
Scalability
Security
Retrieval
Ease of Use
Consistency
Efficiency
Cost
Book Keeping
A book register was maintained on which the shop owner wrote the amount received from customers, the amount due for
any customer, inventory details and so on ...
Durability: Physical damage to these registers is a possibility due to rodents, humidity, wear and tear
Scalability: Very difficult to maintain over the years, some shops have numerous registers spanning over the years
Not only small shops but large orgs also used to maintain their transactions in book registers
Durability: These are computer applications and hence data is less prone to physical damage
Scalability: Easier to search, insert and modify records as compared to book ledgers
Easy to Use: Computer applications are used to search and manipulate records in the spreadsheets leading to
reduction in manpower needed to perform routine computations
Consistency: Not guaranteed, but spreadsheets are less prone to mistakes than registers
With rapid scale up of data, there has been considerable increase in the time required to perform most operations
A typical spreadsheet file may have an upper limit on the number of rows
The above mentioned limitations of filesystems paved the way for a comprehensive platform dedicated to management of
data - the Database Management System
1980s
Research relational prototypes evolve into commercial systems - SQL becomes industrial standard
1990s
Early 2000s
Later 2000s
Giant data storage systems - Google BigTable, Yahoo PNuts, Amazon, ...
📚
Week 1 Lecture 3
Class BSCCS2001
Materials
Module # 3
Type Lecture
Week # 1
If the account balance is not enough, it will not allow the fund transfer
If the account numbers are not correct, it will flash a message and terminate the transaction
We will use this banking transaction system to compare various features of a file-based (.csv file) implementation vis-à-vis a
DBMS-based implementation
Source: https://github.com/bhaskariitm/transition-from-files-to-db
Initiating a transaction
Python
def begin_Transaction(credit_account, debit_account, amount):
    temp = []
    success = 0
SQL
Transaction
Python
try:
    for sRec in f_reader1:
        # CONDITION CHECK FOR ENOUGH BALANCE
        if sRec['AcctNo'] == debit_account and int(sRec['Balance']) > int(amount):
            for rRec in f_reader2:
                if rRec['AcctNo'] == credit_account:
                    sRec['Balance'] = str(int(sRec['Balance']) - int(amount))  # DEBIT
                    temp.append(sRec)
                    # CRITICAL POINT: a failure here leaves the debit applied without the credit
                    f_writer.writerow({
                        'Acct1': sRec['AcctNo'],
                        'Acct2': rRec['AcctNo'],
                        'Amount': amount,
                        'D/C': 'D'
                    })
                    rRec['Balance'] = str(int(rRec['Balance']) + int(amount))  # CREDIT
                    temp.append(rRec)
                    f_writer.writerow({
                        'Acct1': rRec['AcctNo'],
                        'Acct2': sRec['AcctNo'],
                        'Amount': amount,
                        'D/C': 'C'
                    })
                    success = success + 1
                    break
    # Rewind the accounts file, skip the header, and keep every untouched record
    f_obj_Account1.seek(0)
    next(f_obj_Account1)
    for record in f_reader1:
        if record['AcctNo'] != temp[0]['AcctNo'] and record['AcctNo'] != temp[1]['AcctNo']:
            temp.append(record)
except Exception:
    print('\nWrong input entered !!!')
SQL
do $$
declare
    -- variable types are assumed; the notes omit the declare section
    amt integer := 5000;
    sendVal text := '1800090';
    recVal text := '1800100';
    sbalance integer;
begin
    select balance into sbalance
    from accounts
    where account_no = sendVal;
    if sbalance < amt then
        raise notice 'Insufficient balance';
    else
        update accounts
        set balance = balance - amt
        where account_no = sendVal;
        insert into ledger(sendAc, recAc, amnt, ttype)
        values(sendVal, recVal, amt, 'D');
        update accounts
        set balance = balance + amt
        where account_no = recVal;
        insert into ledger(sendAc, recAc, amnt, ttype)
        values(sendVal, recVal, amt, 'C');
        commit;
        raise notice 'Successful';
    end if;
end; $$
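For readers without a database server at hand, the same transfer can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: the table layouts, the CHECK constraint, the starting balances and the transfer helper are illustrative and are not taken from the course repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_no TEXT PRIMARY KEY,
                           balance INTEGER NOT NULL CHECK (balance >= 0));
    CREATE TABLE ledger (sendAc TEXT, recAc TEXT, amnt INTEGER, ttype TEXT);
    INSERT INTO accounts VALUES ('1800090', 8000), ('1800100', 1000);
""")

def transfer(conn, send_val, rec_val, amt):
    """Debit send_val and credit rec_val; all four statements commit or none do."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE account_no = ?",
                         (amt, send_val))
            conn.execute("INSERT INTO ledger VALUES (?, ?, ?, 'D')", (send_val, rec_val, amt))
            cur = conn.execute("UPDATE accounts SET balance = balance + ? WHERE account_no = ?",
                               (amt, rec_val))
            if cur.rowcount == 0:
                raise ValueError("unknown receiving account")
            conn.execute("INSERT INTO ledger VALUES (?, ?, ?, 'C')", (send_val, rec_val, amt))
        return True
    except (sqlite3.IntegrityError, ValueError):
        return False

ok = transfer(conn, '1800090', '1800100', 5000)         # succeeds
overdraft = transfer(conn, '1800090', '1800100', 5000)  # would go negative: CHECK rejects it
bad_acct = transfer(conn, '1800090', '9999999', 1000)   # debit is rolled back
balances = dict(conn.execute("SELECT account_no, balance FROM accounts"))
ledger_rows = conn.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]
```

The third call is the interesting one: the debit has already been applied when the missing receiver is detected, and the rollback undoes it, so no partial update survives.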
Closing a transaction
Python
f_obj_Account1.close()
f_obj_Account2.close()
f_obj_Ledger.close()
if success == 1:
    # Rewrite the whole accounts file from the in-memory records
    f_obj_Account = open('Accounts.csv', 'w+', newline='')
    f_writer = csv.DictWriter(f_obj_Account, fieldnames=col_name_Account)
    f_writer.writeheader()
    for data in temp:
        f_writer.writerow(data)
    f_obj_Account.close()
    print("\nTransaction is successful !!")
else:
    print('\nTransaction failed : Confirm Account details')
SQL
Comparison
Parameter | File Handling in Python | DBMS
Scalability with respect to amount of data | Very difficult to handle insert, update and querying of records | In-built features to provide high scalability for a large number of records
Scalability with respect to changes in structure | Extremely difficult to change the structure of records as in the case of adding or removing attributes | Adding or removing attributes can be done seamlessly using simple SQL queries
Time of execution | in seconds | in milliseconds
Persistence | Data processed using temporary data structures have to be manually updated to the file | Data persistence is ensured via automatic, system induced mechanisms
Robustness | Ensuring robustness of data has to be done manually | Backup, recovery and restore need minimum manual intervention
Security | Difficult to implement in Python (Security at OS level) | User-specific access at database level
Programmer's productivity | Most file access operations involve extensive coding to ensure persistence, robustness and security of data | Standard and simple built-in queries reduce the effort involved in coding thereby increasing a programmer's throughput
Arithmetic operations | Easy to do arithmetic computations | Limited set of arithmetic operations are available
Parameterized Comparison
Scalability
File Handling in Python
Number of records: As the # of records increases, the efficiency of flat files reduces:
Structural Change: To add an attribute, the program has to initialize the new attribute of each record with a default
value. It is very difficult to detect and maintain relationships between entities if and when an attribute has to be
removed
DBMS
Number of records: Databases are built to efficiently scale up when the # of records increases drastically.
Structural Changes: When adding an attribute, a default value can be defined that holds for all existing records - the
new attribute gets initialized with the default value. During deletion, constraints are used either to disallow the removal or to
ensure its safe removal
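The difference in handling a structural change can be sketched as follows. The snippet uses Python's csv and sqlite3 modules; the Branch attribute, its default value 'MAIN' and the sample records are assumptions made up for illustration.

```python
import csv
import io
import sqlite3

# Flat file: the program must touch every record to add the new attribute.
old_csv = "AcctNo,Balance\n1800090,8000\n1800100,1000\n"
rows = list(csv.DictReader(io.StringIO(old_csv)))
for row in rows:
    row["Branch"] = "MAIN"            # default value filled in by program code
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["AcctNo", "Balance", "Branch"])
writer.writeheader()
writer.writerows(rows)
new_csv = out.getvalue()

# DBMS: one statement; existing rows pick up the declared default.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (AcctNo TEXT PRIMARY KEY, Balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("1800090", 8000), ("1800100", 1000)])
conn.execute("ALTER TABLE accounts ADD COLUMN Branch TEXT DEFAULT 'MAIN'")
db_rows = conn.execute(
    "SELECT AcctNo, Balance, Branch FROM accounts ORDER BY AcctNo").fetchall()
```

In the file version the whole file is rewritten; in the database version the schema change is a single declarative statement.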
However, if the number of records is really large, then the time required in the initialization process of a database will
be negligible as compared to the time saved by using SQL queries
In order to process a 1GB file, a program in Python would typically take a few seconds
DBMS
The effort to install and configure a DB on a DB server is expensive and time consuming
In order to process a 1GB file, an SQL query would typically take a few milliseconds
Programmer's Productivity
File Handling in Python
Building a file handler: Since the constraints within and across entities have to be enforced manually, the effort
involved in building a file handling application is huge
Maintenance: To maintain the consistency of data, one must regularly check for sanity of data and the relationships
between entities during inserts, updates and deletes
Handling huge data: As the data grows beyond the capacity of the file handler, more effort is needed
DBMS
Configuring the database: The installation and configuration of a database is a specialized job of a DBA. A
programmer, on the other hand, is saved the trouble
Maintenance: DBMS has built-in mechanisms to ensure consistency and sanity of data being inserted, updated or
deleted. The programmer does not need to do such checks
Handling huge data: DBMS can handle even terabytes of data - Programmer does not have to worry
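The built-in consistency checks mentioned above can be seen in a small experiment. This is a sketch using sqlite3; the customers and accounts tables, their columns and the sample values are hypothetical, and other DBMSs differ in syntax and defaults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE accounts (
    account_no TEXT PRIMARY KEY,
    cust_id INTEGER NOT NULL REFERENCES customers(cust_id),
    balance INTEGER NOT NULL CHECK (balance > 0))""")
conn.execute("INSERT INTO customers VALUES (1, 'Asha')")
conn.execute("INSERT INTO accounts VALUES ('1800090', 1, 8000)")

def try_sql(sql):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

negative_balance = try_sql("INSERT INTO accounts VALUES ('1800100', 1, -50)")
unknown_customer = try_sql("INSERT INTO accounts VALUES ('1800101', 99, 500)")
orphaning_delete = try_sql("DELETE FROM customers WHERE cust_id = 1")
```

All three statements are rejected by the database itself; the application code contains no validation logic of its own.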
Arithmetic Operations
File Handling in Python
Extensive support for arithmetic and logical operations on data using Python. These include complex numerical
calculations and recursive computations
DBMS
SQL provides limited support for arithmetic and logical operations. Any complex computation has to be done outside of
SQL
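One common division of labour is to let SQL do the simple aggregation and hand the rest to the host language. The sketch below assumes a made-up accounts table, and the interest rate and period are invented inputs; it only illustrates the split, it does not claim where the exact limits of any particular SQL dialect lie.

```python
import sqlite3
from statistics import median

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_no TEXT, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("A1", 1000), ("A2", 4000), ("A3", 2500)])

# Simple arithmetic and aggregation can stay in SQL.
total, average = conn.execute(
    "SELECT SUM(balance), AVG(balance) FROM accounts").fetchone()

# A recursive computation written in the host language instead.
def project(balance, rate, years):
    """Compound the balance once per year, recursively."""
    if years == 0:
        return balance
    return project(balance * (1 + rate), rate, years - 1)

balances = [row[0] for row in conn.execute("SELECT balance FROM accounts")]
middle = median(balances)
after_two_years = project(1000, 0.10, 2)
```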
File systems are cheaper to install and use. No specialized hardware, software or personnel are required to maintain
filesystems
DBMS
Large databases are served by dedicated database servers which need large storage and processing power
DBMSs are expensive software that have to be installed and regularly updated
Databases are inherently complex and need specialized people to work on them, such as a DBA (Database
Administrator)
The above factors lead to huge costs in implementing and maintaining database management systems
📚
Week 1 Lecture 4
Class BSCCS2001
Materials
Module # 4
Type Lecture
Week # 1
Introduction to DBMS
Levels of Abstraction
Physical Level: describes how a record (eg: instructor) is stored
Logical Level: describes data stored in a database and the relationships among the data fields
Views can also hide information (such as employee's salary) for security purposes
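A view that hides a sensitive column can be sketched with sqlite3 as below. The instructor table, the view name instructor_public and the sample row are assumptions for illustration. SQLite has no per-user permissions, so in a server DBMS the view would additionally be paired with access rights granted on the view only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: the full record, including the salary.
conn.execute(
    "CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)")
conn.execute("INSERT INTO instructor VALUES (1, 'Srinivasan', 'Comp. Sci.', 65000)")
# View level: the same data with the salary column left out.
conn.execute("CREATE VIEW instructor_public AS SELECT id, name, dept FROM instructor")

cursor = conn.execute("SELECT * FROM instructor_public")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```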
Schema and Instances
TLDR: Schema is the way in which data is organized and Instance is the actual value of the data
Schema
Example: The database consists of information about a set of customers and accounts in a bank and the
relationship between them
Customer Schema
Account Schema
Instance
Customer Instance
Account Instance
Columns: Account #, Account Type, Interest Rate, Min. Bal., Balance
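The schema/instance distinction can be made concrete with a small sqlite3 sketch. The column names follow the account columns listed above; the column types and the sample row ('A-101' and its values) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema: the structure, fixed when the table is designed.
conn.execute("""CREATE TABLE account (
    account_no TEXT PRIMARY KEY, account_type TEXT,
    interest_rate REAL, min_bal INTEGER, balance INTEGER)""")
schema = [row[1] for row in conn.execute("PRAGMA table_info(account)")]

# Instance: the rows present at one moment; it changes with every update.
conn.execute("INSERT INTO account VALUES ('A-101', 'Savings', 3.5, 1000, 5000)")
instance_before = conn.execute("SELECT * FROM account").fetchall()
conn.execute("UPDATE account SET balance = balance + 500 WHERE account_no = 'A-101'")
instance_after = conn.execute("SELECT * FROM account").fetchall()
```

The schema list is the same before and after the update, while the instance differs.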
Physical Data Independence - the ability to modify the physical schema without changing the logical schema
In general, the interfaces between various levels and components should be well defined so that changes in some
parts do not seriously influence others.
Data Models
A collection of tools that describe the following ...
Data
Data relationships
Data semantics
Data constraints
Network model
Hierarchical model
XML format
Relational Model
All the data is stored in various tables
Example
Data dictionary contains metadata (that is, data about the data)
Database schema
Integrity constraints
Authorization
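Most systems let you query the data dictionary like any other table. As a sketch, SQLite keeps its catalog in a table called sqlite_master, which stores the schema and the constraint definitions (it has no authorization data, since SQLite has no user accounts). The two example tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE account (account_no TEXT PRIMARY KEY, balance INTEGER CHECK (balance > 0))")

# sqlite_master holds one row of metadata per table, index, view or trigger.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name").fetchall()
table_names = [name for name, _ in catalog]
account_definition = catalog[0][1]   # the stored CREATE TABLE text, constraints included
```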
Pure - used for proving properties about computational power and for optimization
Cannot be used to solve all problems that a C program, for example, can solve
To be able to compute complex functions, SQL is usually embedded in some higher-level language
Application Programming Interfaces or APIs (eg: ODBC / JDBC) which allow SQL queries to be sent to the
databases
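Python's counterpart to ODBC / JDBC is the DB-API, implemented for SQLite by the standard sqlite3 module. The sketch below sends a parameterized query through that API and continues in Python; the accounts table, the helper accounts_above and the threshold are assumptions for illustration.

```python
import sqlite3

def accounts_above(conn, threshold):
    """Send a parameterized SQL query through the API and return plain Python rows."""
    cursor = conn.execute(
        "SELECT account_no, balance FROM accounts WHERE balance > ? ORDER BY account_no",
        (threshold,))
    return cursor.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_no TEXT, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("1800090", 8000), ("1800100", 1000)])

rich = accounts_above(conn, 5000)
# The host language takes over for formatting and any further computation.
report = [f"{acct}: {bal:,}" for acct, bal in rich]
```

Passing the threshold as a parameter rather than pasting it into the SQL string keeps the query text fixed and avoids quoting problems.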
Database Design
The process of designing the general structure of the database:
Logical Design - Deciding on the database schema. Database design requires that we find a good collection of
relation schema
Business decision
What relation schemas should we have and how should the attributes be distributed among the various
relation schemas?
📚
Week 1 Lecture 5
Class BSCCS2001
Materials
Module # 5
Type Lecture
Week # 1
Extend the relational data model by including object orientation and constructs to deal with added data types
Allow attributes of tuples to have complex types, including non-atomic values such as nested relations
Preserve relational foundations, in particular the declarative access to data, while extending modeling power
XML: eXtensible Markup Language
Defined by the WWW Consortium (W3C)
The ability to specify new tags and to create tag structures made XML a great way to exchange data, not just
documents
XML has become the basis for all new generation data interchange formats
A wide variety of tools are available for parsing, browsing and querying XML documents
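As a small illustration of user-defined tags carrying data, the sketch below parses a made-up XML document with Python's standard xml.etree.ElementTree module. The tag names bank, account and balance are invented for this example.

```python
import xml.etree.ElementTree as ET

document = """
<bank>
  <account number="1800090"><balance>8000</balance></account>
  <account number="1800100"><balance>1000</balance></account>
</bank>
"""

root = ET.fromstring(document)
# Walk the tag structure and pull the data out into a plain dictionary.
balances = {acct.get("number"): int(acct.findtext("balance"))
            for acct in root.findall("account")}
```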
Database Engine
3 major components are:
Storage Manager
Query processing
Transaction Manager
Storage Management
Storage Manager is a program module that provides the interface between the low-level data stored in the database and
the application programs and queries submitted to the system
Issues:
Storage access
File organization
Query Processing
Parsing and Translation
Optimization
Evaluation
Equivalent expressions
Cost difference between a good and a bad way of evaluating a query can be enormous
Depends critically on statistical information about relations which the database must maintain
Need to estimate statistics for intermediate results to compute cost of complex expressions
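Why equivalent expressions can differ so much in cost is easy to see on a toy example. The sketch below evaluates the same query in two orders using plain Python lists; the relations, the nested-loop join and the comparison count used as "cost" are simplifying assumptions, not how a real optimizer measures cost.

```python
# Two made-up relations: 1000 accounts spread over 10 branches.
accounts = [{"acct": i, "branch": i % 10, "balance": i * 10} for i in range(1000)]
branches = [{"branch": b, "city": "Chennai" if b == 0 else "Other"} for b in range(10)]

def join(left, right, key):
    """Nested-loop join; also returns the number of tuple pairs compared."""
    out, comparisons = [], 0
    for lrow in left:
        for rrow in right:
            comparisons += 1
            if lrow[key] == rrow[key]:
                out.append({**lrow, **rrow})
    return out, comparisons

# Plan 1: join everything, then select the Chennai rows.
joined, cost1 = join(accounts, branches, "branch")
plan1 = [t for t in joined if t["city"] == "Chennai"]

# Plan 2: select the Chennai branch first, then join the smaller input.
chennai = [b for b in branches if b["city"] == "Chennai"]
plan2, cost2 = join(accounts, chennai, "branch")
```

Both plans return the same 100 tuples, but the first compares ten times as many pairs. Choosing plan 2 requires knowing how selective the city condition is, which is where the statistics come in.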
Transaction Management
What if the system fails?
What if more than one user is concurrently updating the same file?
A transaction is a collection of operations that performs a single logical function in a database application
Transaction-Management component ensures that the database remains in a consistent (correct) state despite
system failures (eg: power failures and operating system crashes) and transaction failures
Concurrency-control manager controls the interaction among the concurrent transactions to ensure consistency of
the database
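As a rough analogy for concurrency control, the sketch below serializes concurrent withdrawals with a lock from Python's threading module. The Account class is hypothetical, and a real DBMS uses more elaborate protocols than a single lock; the point is only that the read, the check and the write happen as one unit.

```python
import threading

class Account:
    """Toy account whose withdraw is a single critical section."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        with self._lock:                  # only one withdrawal at a time gets past here
            if self.balance < amount:
                return False
            self.balance -= amount
            return True

account = Account(100)
results = []
threads = [threading.Thread(target=lambda: results.append(account.withdraw(50)))
           for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whatever order the three threads run in, exactly two withdrawals succeed and the balance ends at 0, never below.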
Database Architecture
The architecture of a database system is greatly influenced by the underlying computer system on which the database is
running:
Centralized
Client-Server
Parallel (multi-processor)
Distributed
Cloud