BY: SURBHI SAROHA
 File Organization & Data warehousing
 File & Record Concept
 Fixed and variable sized Records
 Types of Single level Index
 Multilevel Indexes
 Dynamic Multilevel Indexes using B trees
 Data warehousing: Introduction
 Basic concepts
 Data warehouse architecture
 Various models
 Basic operations
 File organization refers to the logical relationships among the records that
constitute a file, particularly with respect to how a specific record is
identified and accessed.
 In simple terms, storing the records of a file in a certain order is called file organization.
 File structure refers to the format of the label and data blocks and of any logical
control record.
 Objectives of file organization
 It speeds up the selection of records.
 Operations such as insert, delete, and update on records become
faster and easier.
 It prevents duplicate records from being introduced by these operations.
 It stores the records efficiently, at minimal storage cost.
 Various methods have been introduced to organize files. Each method
has advantages and disadvantages depending on how records are accessed or selected, so it is
up to the programmer to choose the file organization method best suited to
the requirements.
 Some types of file organization are:
 Sequential File Organization
 Heap File Organization
 Hash File Organization
 B+ Tree File Organization
 Clustered File Organization
 ISAM (Indexed Sequential Access Method)
 A Database Management System (DBMS) stores data in the form of tables, is commonly
designed using the ER model, and aims to guarantee the ACID properties for transactions.
For example, a college DBMS has tables for students, faculty, etc.
 A data warehouse is separate from the DBMS. It stores a huge amount of data, which
is typically collected from multiple heterogeneous sources such as files, DBMSs, etc.
 The goal is to produce statistical results that help in decision making. For
example, a college might want to quickly see how the
placement of CS students has improved over the last 10 years, in terms of
salaries, counts, etc.
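As a rough illustration of the kind of statistical query described above, the sketch below loads a few made-up placement rows into an in-memory SQLite table and aggregates them per year. The table name `placements`, its columns, and every figure are hypothetical and chosen only to show the shape of such a query.

```python
import sqlite3

# Hypothetical placement facts: (year, branch, salary). All values are made up.
ROWS = [
    (2022, "CS", 6.0),
    (2022, "CS", 8.0),
    (2023, "CS", 9.0),
    (2023, "CS", 11.0),
    (2023, "EE", 7.0),
]


def placement_trend(rows, branch):
    """Return [(year, number of offers, average salary)] for one branch, oldest year first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE placements (year INTEGER, branch TEXT, salary REAL)")
    conn.executemany("INSERT INTO placements VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT year, COUNT(*), AVG(salary) FROM placements "
        "WHERE branch = ? GROUP BY year ORDER BY year",
        (branch,),
    ).fetchall()
    conn.close()
    return result


trend = placement_trend(ROWS, "CS")
print(trend)
```

A real warehouse would hold many years of such rows gathered from several sources; the aggregation itself stays the same.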
 An ordinary database typically stores MBs to GBs of data, and does so for a specific
purpose.
 For data at TB scale, storage shifts to a data warehouse. Besides this, a
transactional database does not lend itself to analytics.
 To perform analytics effectively, an organization keeps a central data warehouse
to closely study its business by organizing, understanding, and using its historical
data for making strategic decisions and analyzing trends.
 Better business analytics: A data warehouse stores all the past data and records of
the company and makes them available for analysis, which improves the company's
understanding of its data.
 Faster queries: A data warehouse is designed to handle large analytical queries, so it
typically runs them faster than a transactional database.
 Improved data quality: The warehouse stores and analyzes the data gathered from
different sources without altering or adding to it, so data quality is maintained.
Any data quality issue that does arise is handled by the data warehouse team.
 Historical insight: The warehouse stores all historical data about the business,
so it can be analyzed at any time to extract insights.
 A file is a collection of records. Records can be accessed using the primary key.
The type and frequency of access are determined by the file
organization used for a given set of records.
 File organization is the logical relationship among the various records; it
defines how file records are mapped onto disk blocks.
 File organization describes the way records are stored in
blocks and how the blocks are placed on the storage medium.
 The first approach to mapping the database to files is to use several files and
store records of only one fixed length in any given file. An alternative approach is to
structure the files so that they can hold records of multiple lengths.
 Files of fixed-length records are easier to implement than files of variable-length
records.
 Records should be selected optimally, i.e., as fast as
possible.
 Insert, delete, and update transactions on the records should be quick and
easy.
 Duplicate records should not be introduced as a result of insert, update, or delete.
 Records should be stored efficiently, for minimal storage cost.
 There are various methods of file organization. Each has pros
and cons depending on how records are accessed or selected. The
programmer chooses the file organization method best suited to the
requirements.
 Sequential file organization
 Heap file organization
 Hash file organization
 B+ tree file organization
 Indexed sequential access method (ISAM)
 Cluster file organization
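To make the difference between three of the organizations listed above more concrete, here is a minimal in-memory sketch of where a new record lands under heap, sequential, and hash organization. Python lists stand in for disk files, and the record layout `(key, name)`, the four buckets, and the modulo hash are all assumptions made for illustration.

```python
import bisect


def heap_insert(file, record):
    """Heap file: place the record wherever there is room (here, at the end)."""
    file.append(record)


def sequential_insert(file, record):
    """Sequential file: keep records sorted on the key (the first tuple field)."""
    bisect.insort(file, record)


def hash_insert(buckets, record):
    """Hash file: a hash of the key selects the bucket (block) for the record."""
    key = record[0]
    buckets[key % len(buckets)].append(record)


records = [(30, "Ravi"), (10, "Asha"), (20, "Meena")]
heap_file, seq_file = [], []
hash_file = [[] for _ in range(4)]  # assumed: 4 buckets

for r in records:
    heap_insert(heap_file, r)
    sequential_insert(seq_file, r)
    hash_insert(hash_file, r)

print(heap_file)  # arrival order
print(seq_file)   # key order
print(hash_file)  # grouped by key % 4
```

The heap file is cheapest to insert into, the sequential file supports ordered scans, and the hash file reaches a record's bucket directly from its key.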
 In relational databases, a record is a group of related data held within the same
structure. More specifically, a record is a grouping of fields within a table that
refer to one particular object. The term record is frequently used synonymously
with row.
 For example, a customer record may include items such as first name, physical
address, email address, date of birth and gender.
 A record is also known as a tuple.
 With fixed-length records, a record length is fixed in advance and every record stored
in the file occupies that many bytes. Unless the block size is a multiple of the record
size, some records cross block boundaries and are stored partly in one block and partly
in the next.
 Fixed-length records lead to the following two problems:
 A record stored partly in more than one block requires access to
all the blocks containing its parts in order to read or write it.
 It is difficult to delete a record in such a file organization, because the space
freed by the deleted record must be filled by another record or marked so that it
is ignored.
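A minimal sketch of fixed-length records using Python's `struct` module is given below. The record layout (4-byte id, 20-byte name, 4-byte balance) and the 64-byte block size are assumptions chosen for illustration. Because every record has the same size, record number `slot` starts at byte `slot * RECORD.size`, and the helper `crosses_block_boundary` shows the first problem mentioned above.

```python
import io
import struct

# Assumed layout: 4-byte int id, 20-byte name, 4-byte float balance (little-endian, no padding).
RECORD = struct.Struct("<i20sf")
BLOCK_SIZE = 64  # assumed block size in bytes, deliberately not a multiple of the record size


def write_record(f, slot, rec_id, name, balance):
    """Write one record into the given slot; short names are padded with zero bytes."""
    f.seek(slot * RECORD.size)
    f.write(RECORD.pack(rec_id, name.encode("utf-8"), balance))


def read_record(f, slot):
    """Read the record stored in the given slot."""
    f.seek(slot * RECORD.size)
    rec_id, raw_name, balance = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw_name.rstrip(b"\x00").decode("utf-8"), balance


def crosses_block_boundary(slot):
    """True if the record in this slot is split across two blocks."""
    first_byte = slot * RECORD.size
    last_byte = first_byte + RECORD.size - 1
    return first_byte // BLOCK_SIZE != last_byte // BLOCK_SIZE


f = io.BytesIO()  # stands in for a file on disk
write_record(f, 0, 1, "Asha", 100.0)
write_record(f, 1, 2, "Ravi", 250.5)
print(RECORD.size, read_record(f, 1), crosses_block_boundary(2))
```

One common way to avoid split records is to store only `BLOCK_SIZE // RECORD.size` whole records per block and leave the remaining bytes unused.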
 Variable-length records are records that vary in size, so a block may hold a varying
number of them. Variable-length records
arise in a database system in the following ways:
 Storage of multiple record types in a file.
 Record types that allow repeating fields, such as multisets or arrays.
 Record types that allow variable lengths for one or more fields.
 Variable-length records pose the following two problems:
 How to represent a single record so that its individual
attributes can be extracted easily.
 How to store variable-length records within a block so that
a record in the block can be extracted easily.
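One possible answer to both problems is length prefixing: each record starts with its field count, and each field is preceded by its length. The sketch below is only one such encoding, with assumed 2-byte prefixes; real systems often use other layouts, such as slotted pages.

```python
import struct


def encode_record(fields):
    """Encode a list of strings as: field count, then (length, bytes) for each field."""
    out = [struct.pack("<H", len(fields))]
    for field in fields:
        data = field.encode("utf-8")
        out.append(struct.pack("<H", len(data)))
        out.append(data)
    return b"".join(out)


def decode_record(buf, offset=0):
    """Decode one record starting at offset; return (fields, offset of the next record)."""
    (count,) = struct.unpack_from("<H", buf, offset)
    offset += 2
    fields = []
    for _ in range(count):
        (length,) = struct.unpack_from("<H", buf, offset)
        offset += 2
        fields.append(buf[offset:offset + length].decode("utf-8"))
        offset += length
    return fields, offset


# Two records of different sizes stored back to back in one block.
block = encode_record(["101", "Asha", "CS"]) + encode_record(["102", "Ravi Kumar"])
first, pos = decode_record(block)
second, end = decode_record(block, pos)
print(first, second)
```

The length prefixes let a reader pull out individual attributes and also find where the next record in the block begins.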
 Indexing is used to optimize the performance of a database by minimizing the
number of disk accesses required when a query is processed.
 The index is a type of data structure. It is used to locate and access the data in a
database table quickly.
 Index structure:
 An index can be created on one or more columns of a table.
 The first column of the index is the search key, which contains a copy of the
primary key or candidate key of the table. The search key values are
stored in sorted order so that the corresponding data can be accessed quickly.
 The second column of the index is the data reference. It contains a set of
pointers holding the address of the disk block where the value of the particular
key can be found.
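The two-column structure described above can be sketched with two parallel lists: sorted search keys and the block address for each key. The key values and block numbers below are made up, and binary search via `bisect` stands in for the lookup a real DBMS would perform.

```python
import bisect

# Hypothetical index: search keys in sorted order, with the block address of each key.
index_keys = [101, 105, 110, 120]
index_ptrs = [0, 0, 1, 2]


def lookup(key):
    """Return the block address for key, or None if the key is not indexed."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return index_ptrs[i]
    return None


print(lookup(110), lookup(111))
```

Because the keys are sorted, the lookup needs only a logarithmic number of comparisons rather than a scan of the whole table.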
DBMS (UNIT 5)
 Ordered indices
 The indices are usually sorted to make searching faster. The indices which are
sorted are known as ordered indices.
 Primary Index
 If the index is created on the primary key of the table, it is
known as primary indexing. Primary key values are unique to each record, so there is
a 1:1 relation between index keys and records.
 As the records are stored in sorted order of the primary key, the searching
operation is quite efficient.
 The primary index can be classified into two types: Dense index and Sparse index.
 Dense index
 The dense index contains an index record for every search key value in the data file, which
makes searching faster.
 The number of records in the index table is the same as the number of records in
the main table.
 It needs more space to store the index records themselves. Each index record holds the search key
and a pointer to the actual record on the disk.
 Sparse index
 Index records appear for only some of the items in the data file. Each index record points to a block.
 Instead of pointing to every record in the main table, the index points to
records at intervals; the remaining records are found by scanning from the nearest indexed record.
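The following sketch contrasts the two kinds of index on a small made-up data file of three blocks. The dense index has one entry per record, while the sparse index keeps only the first key of each block, so a sparse lookup finds the right block and then scans inside it. Block contents and names are assumptions for illustration.

```python
import bisect

# Data file: blocks of records sorted on the key (assumed 3 records per block).
blocks = [
    [(101, "Asha"), (103, "Ravi"), (105, "Meena")],
    [(110, "John"), (112, "Sara"), (115, "Imran")],
    [(120, "Lata"), (125, "Vikram"), (130, "Nina")],
]

# Dense index: one entry per record, mapping key -> block number.
dense_index = {key: b for b, block in enumerate(blocks) for key, _ in block}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0][0] for block in blocks]


def sparse_lookup(key):
    """Find the last block whose first key is <= key, then scan that block."""
    b = bisect.bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, value in blocks[b]:
        if k == key:
            return value
    return None


print(len(dense_index), len(sparse_keys), sparse_lookup(112))
```

The sparse index is a third of the size here, at the price of a short scan within the block.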
 Clustering Index
 A clustering index is defined on an ordered data file. Sometimes the index is
created on non-primary key columns, which may not be unique for each record.
 In this case, to identify records faster, two or more columns are grouped to get
a unique value and the index is created on them. This method is called a clustering
index.
 Records with similar characteristics are grouped, and indexes are
created for these groups.
 Secondary Index
 In sparse indexing, as the size of the table grows, the size of the mapping also
grows. These mappings are usually kept in primary memory so that address
fetches are fast; the actual data is then read from secondary memory using
the address obtained from the mapping. If the mapping grows too large, fetching the
address itself becomes slower, and the sparse index is no longer efficient.
Secondary indexing is introduced to overcome this problem.
 In secondary indexing, another level of indexing is introduced to reduce the size
of the mapping. Large ranges of the column values are selected initially so
that the first-level mapping stays small. Each range is then
divided into smaller ranges. The first-level mapping is stored in primary
memory so that address fetches are fast. The second-level mapping and the
actual data are stored in secondary memory (hard disk).
 As the database grows, its indices also grow. A single-level index
may become too large to keep in main memory, so that searching it requires
multiple disk accesses.
 Multilevel indexing splits the index into several smaller blocks so
that each can be stored in a single disk block. The outer blocks are divided into
inner blocks, which in turn point to the data blocks. The outermost level can easily be kept
in main memory with little overhead.
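A two-level version of this idea can be sketched as follows. The small outer index (assumed to fit in main memory) selects an inner index block, and the inner block selects the data block. All key ranges and block numbers are hypothetical.

```python
import bisect

# Inner (second-level) index blocks: (first key of a data block, data block number).
inner_blocks = [
    [(100, 0), (200, 1), (300, 2)],
    [(400, 3), (500, 4), (600, 5)],
]

# Outer (first-level) index: the first key of each inner block.
outer_keys = [blk[0][0] for blk in inner_blocks]


def find_data_block(key):
    """Return the number of the data block that may contain key, or None."""
    o = bisect.bisect_right(outer_keys, key) - 1
    if o < 0:
        return None  # key is smaller than every indexed key
    inner = inner_blocks[o]
    inner_keys = [k for k, _ in inner]
    i = bisect.bisect_right(inner_keys, key) - 1
    return inner[i][1]


print(find_data_block(250), find_data_block(450))
```

Each lookup touches one outer entry list and one inner block, instead of searching a single large index.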
 In Relational Database Management Systems (RDBMS), indexes are essential
data structures that allow faster data retrieval by reducing the number of disk
accesses required to retrieve data. However, a single-level index can become inefficient
as the database grows. Multilevel indexes solve this problem
by dividing the index into smaller, manageable pieces.
 Indexing helps to optimize the performance of a database by minimizing the
number of disk accesses required when a query is processed. It is a data structure
technique used to quickly locate and access the data in a database.
 An index entry has two parts: the search key (a primary or candidate key)
and the data reference (pointer).
 A B-tree is a self-balancing tree data structure.
 It keeps data sorted: the keys within each node are in ascending order, and the
subtree between two adjacent keys holds only keys that lie between them.
 It makes searching efficient and supports search, insertion, and deletion in
logarithmic time. It allows nodes with more than two children.
 B-trees are used for implementing multilevel indexing.
 Every node of the B-tree stores key values, each with a data pointer pointing
to the block in the disk file containing that key.
 Every node has at most m children, where m is the order of the B-tree.
 A node with K children contains K-1 keys.
 Every non-leaf node except the root must have at least ⌈m/2⌉ child nodes.
 The root must have at least 2 children unless it is a leaf.
 All leaves of a B-tree are at the same level.
 Unlike many other trees, a B-tree grows upward: insertion
happens at a leaf node, and the height increases only when the root splits.
 The time complexity of search, insertion, and deletion in a B-tree is O(log n), where n is the
number of elements in the B-tree.
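The properties above can be seen in a compact sketch of B-tree search and insertion. It follows the common textbook formulation with a minimum degree `t` (an assumption, corresponding to order m = 2t), splits full nodes on the way down, stores only distinct keys, and omits data pointers and deletion for brevity.

```python
import bisect


class Node:
    def __init__(self, leaf=True):
        self.keys = []       # sorted keys
        self.children = []   # len(keys) + 1 children for internal nodes
        self.leaf = leaf


class BTree:
    def __init__(self, t=2):
        self.t = t           # minimum degree; a node holds at most 2t - 1 keys
        self.root = Node()

    def search(self, key, node=None):
        if node is None:
            node = self.root
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if node.leaf:
            return False
        return self.search(key, node.children[i])

    def insert(self, key):
        if len(self.root.keys) == 2 * self.t - 1:
            # The root is full: split it, so the tree grows upward by one level.
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        while not node.leaf:
            i = bisect.bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        bisect.insort(node.keys, key)  # the key always ends up in a leaf

    def _split_child(self, parent, i):
        """Split the full child parent.children[i] and move its median key up."""
        t = self.t
        full = parent.children[i]
        right = Node(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]
        full.keys = full.keys[:t - 1]
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)
        parent.children.insert(i + 1, right)

    def height(self):
        h, node = 1, self.root
        while not node.leaf:
            node = node.children[0]
            h += 1
        return h


tree = BTree(t=2)
for k in range(10, 101, 10):
    tree.insert(k)
print(tree.search(70), tree.search(55), tree.height())
```

Splitting full nodes on the way down means an insertion never has to walk back up the tree; other formulations split only after a leaf overflows.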
 A data warehouse is a collection of data from heterogeneous sources organised
under a unified schema. There are two approaches to constructing a data warehouse,
top-down and bottom-up, explained below.
 1. Top-down approach:
 External Sources –
An external source is a source from which data is collected, irrespective of the type of
data. Data can be structured, semi-structured, or unstructured.
 Staging Area –
Since the data extracted from the external sources does not follow a particular
format, it must be validated before it is loaded into the data warehouse. An
ETL tool is recommended for this purpose.
 E (Extract): Data is extracted from the external data source.
 T (Transform): Data is transformed into the standard format.
 L (Load): Data is loaded into the data warehouse after it has been transformed into the standard
format.
 Data-warehouse –
After cleansing, the data is stored in the data warehouse as the central repository,
together with its metadata; the data marts then hold subsets of this data.
Note that in the top-down approach the data warehouse stores the data in its purest
form.
 Data Marts –
A data mart is also part of the storage component. It stores the information of a particular
function of an organisation, handled by a single authority. An organisation can have as many
data marts as it has functions. A data mart can also be described as
a subset of the data stored in the data warehouse.
 Data Mining –
Data mining is the practice of analysing the large volume of data in the data warehouse. It is
used to find hidden patterns in the database or data warehouse
with the help of data mining algorithms.
 The top-down approach is the one defined by Inmon: the data warehouse is a central
repository for the complete organisation, and data marts
are created from it after the complete data warehouse has been created.
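As a small illustration of a data mart being a subset of the warehouse, the sketch below defines a mart for one function as a view over a warehouse table in SQLite. The table `warehouse`, the view `sales_mart`, and all rows are hypothetical. A real data mart is usually a separately stored structure, so the view is only a stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Central repository holding data for every function of the organisation.
conn.execute("CREATE TABLE warehouse (dept TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse VALUES (?, ?, ?)",
    [("sales", "laptop", 900.0), ("hr", "training", 150.0), ("sales", "phone", 400.0)],
)

# Data mart for the sales function: only the rows that belong to that function.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT item, amount FROM warehouse WHERE dept = 'sales'"
)

mart_rows = conn.execute("SELECT item, amount FROM sales_mart ORDER BY item").fetchall()
print(mart_rows)
```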
 2. Bottom-up approach:
 First, the data is extracted from external sources (as in the top-down
approach).
 Then, the data goes through the staging area (as explained above) and is loaded into
data marts instead of the data warehouse. The data marts are created first and
provide reporting capability. Each data mart addresses a single business area.
 These data marts are then integrated into the data warehouse.
 The raw data arriving at a data warehouse is often
unstructured and contains a lot of unwanted and dirty data.
 Dirty data refers to incomplete and noisy data containing errors.
 To make the data structured and noise-free, dirty data needs to be removed. This
helps convert data into useful information and is achieved using
certain data warehouse operations.
 These operations are a combination of ETL (Extraction, Transformation,
Loading) operations together with data cleaning and data refresh operations.

 Data Cleaning
 In data cleaning, inconsistencies are removed and noisy data containing errors is
rectified.
 For example: removal of redundant (duplicate) data.
 Data Refresh
 In the data refresh operation, data in the data warehouse is refreshed by propagating changes from
the source databases and applying them on a regular schedule. This is necessary because data in operational databases
is updated continuously, and refreshing is how those updates reach the data warehouse.
 Extraction of Data
 Data obtained after cleaning and refresh may still be unstructured and unorganized. The data extraction
process organises it and enables users to extract and retrieve the relevant data.
This is helpful if a user wants to mine the data.

 Transformation of data
 Data obtained from heterogeneous databases has the native structure of its
source database, which may differ from the structure of the data
warehouse. The data is therefore transformed to
match the structure of the data warehouse.
 Data Loading
 Data loading is responsible for loading the data into its target data
repository, which may be a database, a data mart, a data warehouse, etc.
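The operations above can be strung together in a small sketch: extraction from two differently shaped sources, transformation into one standard format with cleaning of duplicate and incomplete rows, and loading into a target table. The sources, field names, and the `customer` table are all hypothetical, and dropping incomplete rows is just one possible cleaning policy.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with different native formats.
CSV_SOURCE = "id,name,city\n1,asha,delhi\n2,ravi,pune\n2,ravi,pune\n"
DICT_SOURCE = [{"ID": 3, "FullName": "Meena ", "City": None}]


def extract():
    """Extraction: pull raw rows out of each source into one common shape."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows += [{"id": d["ID"], "name": d["FullName"], "city": d["City"]} for d in DICT_SOURCE]
    return rows


def transform(rows):
    """Transformation and cleaning: standard format, no duplicates, no incomplete rows."""
    seen, clean = set(), []
    for r in rows:
        if r["city"] is None:   # incomplete (dirty) row: dropped in this sketch
            continue
        rec = (int(r["id"]), r["name"].strip().title(), r["city"].strip().title())
        if rec not in seen:     # remove redundant (duplicate) rows
            seen.add(rec)
            clean.append(rec)
    return clean


def load(conn, records):
    """Loading: write the cleaned records into the target repository."""
    conn.execute("CREATE TABLE IF NOT EXISTS customer (id INTEGER, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute("SELECT * FROM customer ORDER BY id").fetchall()
print(loaded)
```

A data refresh would amount to running the same pipeline again on a schedule against the changed source data.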
DBMS (UNIT 5)
Ad

More Related Content

What's hot (20)

Input-Buffering
Input-BufferingInput-Buffering
Input-Buffering
Dattatray Gandhmal
 
Database fragmentation
Database fragmentationDatabase fragmentation
Database fragmentation
Punjab College Of Technical Education
 
Linking in MS-Dos System
Linking in MS-Dos SystemLinking in MS-Dos System
Linking in MS-Dos System
Satyamevjayte Haxor
 
Distributed & parallel system
Distributed & parallel systemDistributed & parallel system
Distributed & parallel system
Manish Singh
 
Query processing in Distributed Database System
Query processing in Distributed Database SystemQuery processing in Distributed Database System
Query processing in Distributed Database System
Meghaj Mallick
 
Ch 4 linker loader
Ch 4 linker loaderCh 4 linker loader
Ch 4 linker loader
Malek Sumaiya
 
Abstraction
AbstractionAbstraction
Abstraction
adil raja
 
compiler ppt on symbol table
 compiler ppt on symbol table compiler ppt on symbol table
compiler ppt on symbol table
nadarmispapaulraj
 
Introduction to data structures and Algorithm
Introduction to data structures and AlgorithmIntroduction to data structures and Algorithm
Introduction to data structures and Algorithm
Dhaval Kaneria
 
Macro-processor
Macro-processorMacro-processor
Macro-processor
Temesgen Molla
 
Loaders
LoadersLoaders
Loaders
Sathasivam Rangasamy
 
Java Beans
Java BeansJava Beans
Java Beans
Ankit Desai
 
Database language
Database languageDatabase language
Database language
University of Science and Technology Chitttagong
 
PROCEDURAL ORIENTED PROGRAMMING VS OBJECT ORIENTED PROGRAMING
PROCEDURAL ORIENTED PROGRAMMING VS OBJECT ORIENTED PROGRAMING PROCEDURAL ORIENTED PROGRAMMING VS OBJECT ORIENTED PROGRAMING
PROCEDURAL ORIENTED PROGRAMMING VS OBJECT ORIENTED PROGRAMING
Uttam Singh
 
Introduction to Compiler design
Introduction to Compiler design Introduction to Compiler design
Introduction to Compiler design
Dr. C.V. Suresh Babu
 
Recognition-of-tokens
Recognition-of-tokensRecognition-of-tokens
Recognition-of-tokens
Dattatray Gandhmal
 
Dbms architecture
Dbms architectureDbms architecture
Dbms architecture
Shubham Dwivedi
 
Java thread life cycle
Java thread life cycleJava thread life cycle
Java thread life cycle
Archana Gopinath
 
Compilers
CompilersCompilers
Compilers
Bense Tony
 
Optimization of basic blocks
Optimization of basic blocksOptimization of basic blocks
Optimization of basic blocks
ishwarya516
 

Similar to DBMS (UNIT 5) (20)

normalization process in relational data base management
normalization process in relational data base managementnormalization process in relational data base management
normalization process in relational data base management
ssuserf80a8c
 
Module03
Module03Module03
Module03
susir
 
File Organization, Indexing and Hashing.pptx
File Organization, Indexing and Hashing.pptxFile Organization, Indexing and Hashing.pptx
File Organization, Indexing and Hashing.pptx
niqqaanonymous211
 
3620121datastructures.ppt
3620121datastructures.ppt3620121datastructures.ppt
3620121datastructures.ppt
SheejamolMathew
 
lecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptxlecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptx
peter1097
 
file organization ppt on dbms types of f
file organization ppt on dbms types of ffile organization ppt on dbms types of f
file organization ppt on dbms types of f
ar1289589
 
Basic SQL for Bcom Business Analytics.pptx
Basic SQL for Bcom Business Analytics.pptxBasic SQL for Bcom Business Analytics.pptx
Basic SQL for Bcom Business Analytics.pptx
sjcdsdocs
 
File organisation in system analysis and design
File organisation in system analysis and designFile organisation in system analysis and design
File organisation in system analysis and design
Mohitgauri
 
File organisation
File organisationFile organisation
File organisation
Mukund Trivedi
 
fileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdffileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdf
FraolUmeta
 
Database and Research Matrix.pptx
Database and Research Matrix.pptxDatabase and Research Matrix.pptx
Database and Research Matrix.pptx
RahulRoshan37
 
DB LECTURE 4 INDEXINGS PPT NOTES.pptx
DB LECTURE 4 INDEXINGS    PPT NOTES.pptxDB LECTURE 4 INDEXINGS    PPT NOTES.pptx
DB LECTURE 4 INDEXINGS PPT NOTES.pptx
grahamoyigo19
 
Csci12 report aug18
Csci12 report aug18Csci12 report aug18
Csci12 report aug18
karenostil
 
overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam
pratikkadam78
 
Database management system session 6
Database management system session 6Database management system session 6
Database management system session 6
Infinity Tech Solutions
 
Files
FilesFiles
Files
Mukund Trivedi
 
Files
FilesFiles
Files
Mukund Trivedi
 
INDEXING METHODS USED IN DATABASE STORAGE
INDEXING METHODS USED IN DATABASE STORAGEINDEXING METHODS USED IN DATABASE STORAGE
INDEXING METHODS USED IN DATABASE STORAGE
polin38
 
StorageIndexing_Main memory (RAM) for currently used data. Disk for the main ...
StorageIndexing_Main memory (RAM) for currently used data. Disk for the main ...StorageIndexing_Main memory (RAM) for currently used data. Disk for the main ...
StorageIndexing_Main memory (RAM) for currently used data. Disk for the main ...
masooda5
 
StorageIndexing_CS541.ppt indexes for dtata bae
StorageIndexing_CS541.ppt indexes for dtata baeStorageIndexing_CS541.ppt indexes for dtata bae
StorageIndexing_CS541.ppt indexes for dtata bae
syedalishahid6
 
normalization process in relational data base management
normalization process in relational data base managementnormalization process in relational data base management
normalization process in relational data base management
ssuserf80a8c
 
Module03
Module03Module03
Module03
susir
 
File Organization, Indexing and Hashing.pptx
File Organization, Indexing and Hashing.pptxFile Organization, Indexing and Hashing.pptx
File Organization, Indexing and Hashing.pptx
niqqaanonymous211
 
3620121datastructures.ppt
3620121datastructures.ppt3620121datastructures.ppt
3620121datastructures.ppt
SheejamolMathew
 
lecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptxlecture 2 notes indexing in application of database systems.pptx
lecture 2 notes indexing in application of database systems.pptx
peter1097
 
file organization ppt on dbms types of f
file organization ppt on dbms types of ffile organization ppt on dbms types of f
file organization ppt on dbms types of f
ar1289589
 
Basic SQL for Bcom Business Analytics.pptx
Basic SQL for Bcom Business Analytics.pptxBasic SQL for Bcom Business Analytics.pptx
Basic SQL for Bcom Business Analytics.pptx
sjcdsdocs
 
File organisation in system analysis and design
File organisation in system analysis and designFile organisation in system analysis and design
File organisation in system analysis and design
Mohitgauri
 
fileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdffileorganizationandintroductionofdbms-210313163900.pdf
fileorganizationandintroductionofdbms-210313163900.pdf
FraolUmeta
 
Database and Research Matrix.pptx
Database and Research Matrix.pptxDatabase and Research Matrix.pptx
Database and Research Matrix.pptx
RahulRoshan37
 
DB LECTURE 4 INDEXINGS PPT NOTES.pptx
DB LECTURE 4 INDEXINGS    PPT NOTES.pptxDB LECTURE 4 INDEXINGS    PPT NOTES.pptx
DB LECTURE 4 INDEXINGS PPT NOTES.pptx
grahamoyigo19
 
Csci12 report aug18
Csci12 report aug18Csci12 report aug18
Csci12 report aug18
karenostil
 
overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam overview of storage and indexing BY-Pratik kadam
overview of storage and indexing BY-Pratik kadam
pratikkadam78
 
INDEXING METHODS USED IN DATABASE STORAGE
INDEXING METHODS USED IN DATABASE STORAGEINDEXING METHODS USED IN DATABASE STORAGE
INDEXING METHODS USED IN DATABASE STORAGE
polin38
 
StorageIndexing_Main memory (RAM) for currently used data. Disk for the main ...
StorageIndexing_Main memory (RAM) for currently used data. Disk for the main ...StorageIndexing_Main memory (RAM) for currently used data. Disk for the main ...
StorageIndexing_Main memory (RAM) for currently used data. Disk for the main ...
masooda5
 
StorageIndexing_CS541.ppt indexes for dtata bae
StorageIndexing_CS541.ppt indexes for dtata baeStorageIndexing_CS541.ppt indexes for dtata bae
StorageIndexing_CS541.ppt indexes for dtata bae
syedalishahid6
 
Ad

More from Dr. SURBHI SAROHA (20)

Deep learning(UNIT 3) BY Ms SURBHI SAROHA
Deep learning(UNIT 3) BY Ms SURBHI SAROHADeep learning(UNIT 3) BY Ms SURBHI SAROHA
Deep learning(UNIT 3) BY Ms SURBHI SAROHA
Dr. SURBHI SAROHA
 
MOBILE COMPUTING UNIT 2 by surbhi saroha
MOBILE COMPUTING UNIT 2 by surbhi sarohaMOBILE COMPUTING UNIT 2 by surbhi saroha
MOBILE COMPUTING UNIT 2 by surbhi saroha
Dr. SURBHI SAROHA
 
Mobile Computing UNIT 1 by surbhi saroha
Mobile Computing UNIT 1 by surbhi sarohaMobile Computing UNIT 1 by surbhi saroha
Mobile Computing UNIT 1 by surbhi saroha
Dr. SURBHI SAROHA
 
DEEP LEARNING (UNIT 2 ) by surbhi saroha
DEEP LEARNING (UNIT 2 ) by surbhi sarohaDEEP LEARNING (UNIT 2 ) by surbhi saroha
DEEP LEARNING (UNIT 2 ) by surbhi saroha
Dr. SURBHI SAROHA
 
Introduction to Deep Leaning(UNIT 1).pptx
Introduction to Deep Leaning(UNIT 1).pptxIntroduction to Deep Leaning(UNIT 1).pptx
Introduction to Deep Leaning(UNIT 1).pptx
Dr. SURBHI SAROHA
 
Cloud Computing (Infrastructure as a Service)UNIT 2
Cloud Computing (Infrastructure as a Service)UNIT 2Cloud Computing (Infrastructure as a Service)UNIT 2
Cloud Computing (Infrastructure as a Service)UNIT 2
Dr. SURBHI SAROHA
 
Management Information System(Unit 2).pptx
Management Information System(Unit 2).pptxManagement Information System(Unit 2).pptx
Management Information System(Unit 2).pptx
Dr. SURBHI SAROHA
 
Searching in Data Structure(Linear search and Binary search)
Searching in Data Structure(Linear search and Binary search)Searching in Data Structure(Linear search and Binary search)
Searching in Data Structure(Linear search and Binary search)
Dr. SURBHI SAROHA
 
Management Information System(UNIT 1).pptx
Management Information System(UNIT 1).pptxManagement Information System(UNIT 1).pptx
Management Information System(UNIT 1).pptx
Dr. SURBHI SAROHA
 
Introduction to Cloud Computing(UNIT 1).pptx
Introduction to Cloud Computing(UNIT 1).pptxIntroduction to Cloud Computing(UNIT 1).pptx
Introduction to Cloud Computing(UNIT 1).pptx
Dr. SURBHI SAROHA
 
JAVA (UNIT 5)
JAVA (UNIT 5)JAVA (UNIT 5)
JAVA (UNIT 5)
Dr. SURBHI SAROHA
 
DBMS UNIT 4
DBMS UNIT 4DBMS UNIT 4
DBMS UNIT 4
Dr. SURBHI SAROHA
 
JAVA(UNIT 4)
JAVA(UNIT 4)JAVA(UNIT 4)
JAVA(UNIT 4)
Dr. SURBHI SAROHA
 
OOPs & C++(UNIT 5)
OOPs & C++(UNIT 5)OOPs & C++(UNIT 5)
OOPs & C++(UNIT 5)
Dr. SURBHI SAROHA
 
OOPS & C++(UNIT 4)
OOPS & C++(UNIT 4)OOPS & C++(UNIT 4)
OOPS & C++(UNIT 4)
Dr. SURBHI SAROHA
 
DBMS UNIT 3
DBMS UNIT 3DBMS UNIT 3
DBMS UNIT 3
Dr. SURBHI SAROHA
 
JAVA (UNIT 3)
JAVA (UNIT 3)JAVA (UNIT 3)
JAVA (UNIT 3)
Dr. SURBHI SAROHA
 
Keys in dbms(UNIT 2)
Keys in dbms(UNIT 2)Keys in dbms(UNIT 2)
Keys in dbms(UNIT 2)
Dr. SURBHI SAROHA
 
DBMS (UNIT 2)
DBMS (UNIT 2)DBMS (UNIT 2)
DBMS (UNIT 2)
Dr. SURBHI SAROHA
 
JAVA UNIT 2
JAVA UNIT 2JAVA UNIT 2
JAVA UNIT 2
Dr. SURBHI SAROHA
 
Deep learning(UNIT 3) BY Ms SURBHI SAROHA
Deep learning(UNIT 3) BY Ms SURBHI SAROHADeep learning(UNIT 3) BY Ms SURBHI SAROHA
Deep learning(UNIT 3) BY Ms SURBHI SAROHA
Dr. SURBHI SAROHA
 
MOBILE COMPUTING UNIT 2 by surbhi saroha
MOBILE COMPUTING UNIT 2 by surbhi sarohaMOBILE COMPUTING UNIT 2 by surbhi saroha
MOBILE COMPUTING UNIT 2 by surbhi saroha
Dr. SURBHI SAROHA
 
Mobile Computing UNIT 1 by surbhi saroha
Mobile Computing UNIT 1 by surbhi sarohaMobile Computing UNIT 1 by surbhi saroha
Mobile Computing UNIT 1 by surbhi saroha
Dr. SURBHI SAROHA
 
DEEP LEARNING (UNIT 2 ) by surbhi saroha
DEEP LEARNING (UNIT 2 ) by surbhi sarohaDEEP LEARNING (UNIT 2 ) by surbhi saroha
DEEP LEARNING (UNIT 2 ) by surbhi saroha
Dr. SURBHI SAROHA
 
Introduction to Deep Leaning(UNIT 1).pptx
Introduction to Deep Leaning(UNIT 1).pptxIntroduction to Deep Leaning(UNIT 1).pptx
Introduction to Deep Leaning(UNIT 1).pptx
Dr. SURBHI SAROHA
 
Cloud Computing (Infrastructure as a Service)UNIT 2
Cloud Computing (Infrastructure as a Service)UNIT 2Cloud Computing (Infrastructure as a Service)UNIT 2
Cloud Computing (Infrastructure as a Service)UNIT 2
Dr. SURBHI SAROHA
 
Management Information System(Unit 2).pptx
Management Information System(Unit 2).pptxManagement Information System(Unit 2).pptx
Management Information System(Unit 2).pptx
Dr. SURBHI SAROHA
 
Searching in Data Structure(Linear search and Binary search)
Searching in Data Structure(Linear search and Binary search)Searching in Data Structure(Linear search and Binary search)
Searching in Data Structure(Linear search and Binary search)
Dr. SURBHI SAROHA
 
Management Information System(UNIT 1).pptx
Management Information System(UNIT 1).pptxManagement Information System(UNIT 1).pptx
Management Information System(UNIT 1).pptx
Dr. SURBHI SAROHA
 
Introduction to Cloud Computing(UNIT 1).pptx
Introduction to Cloud Computing(UNIT 1).pptxIntroduction to Cloud Computing(UNIT 1).pptx
Introduction to Cloud Computing(UNIT 1).pptx
Dr. SURBHI SAROHA
 

DBMS (UNIT 5)

  • 2.
      ➢ File Organization & Data Warehousing
      ➢ File & Record Concept
      ➢ Fixed and Variable Sized Records
      ➢ Types of Single-Level Index
      ➢ Multilevel Indexes
      ➢ Dynamic Multilevel Indexes Using B-Trees
      ➢ Data Warehousing: Introduction
      ➢ Basic Concepts
      ➢ Data Warehouse Architecture
      ➢ Various Models
      ➢ Basic Operations
  • 3.
      ➢ File organization refers to the logical relationships among the records that constitute a file, particularly with respect to how a specific record is identified and accessed.
      ➢ In simple terms, storing the records of a file in a certain order is called file organization.
      ➢ File structure refers to the format of the label and data blocks and of any logical control record.
      ➢ The objectives of file organization:
      ➢ It makes the selection of records faster.
      ➢ Operations such as inserting, deleting, and updating records become faster and easier.
      ➢ It prevents duplicate records from being introduced by these operations.
      ➢ It stores the records efficiently, at minimal cost.
  • 4.
      ➢ Various methods have been introduced to organize files. Each method has advantages and disadvantages with respect to access or selection, so it is up to the programmer to choose the file organization method best suited to the requirements.
      ➢ Some types of file organization are:
      ➢ Sequential File Organization
      ➢ Heap File Organization
      ➢ Hash File Organization
      ➢ B+ Tree File Organization
      ➢ Clustered File Organization
      ➢ ISAM (Indexed Sequential Access Method)
  • 5.
      ➢ A Database Management System (DBMS) stores data in the form of tables, is typically designed using the ER model, and aims to guarantee the ACID properties. For example, the DBMS of a college has tables for students, faculty, etc.
      ➢ A data warehouse is separate from the DBMS. It stores a huge amount of data, typically collected from multiple heterogeneous sources such as files and DBMSs.
      ➢ The goal is to produce statistical results that help in decision making. For example, a college might want to quickly see how the placement of CS students has improved over the last 10 years in terms of salaries, counts, etc.
  • 6.
      ➢ An ordinary database typically stores MBs to GBs of data, and does so for a specific purpose.
      ➢ For data of TB size, storage shifts to a data warehouse. Besides this, a transactional database does not lend itself to analytics.
      ➢ To perform analytics effectively, an organization keeps a central data warehouse so that it can study its business closely by organizing, understanding, and using its historical data to make strategic decisions and analyze trends.
  • 7.
      ➢ Better business analytics: A data warehouse stores all of a company's past data and records and makes them available for analysis, which deepens the company's understanding of its data.
      ➢ Faster queries: A data warehouse is designed to handle large analytical queries, so it runs them faster than a transactional database.
      ➢ Improved data quality: Data gathered from different sources is stored and analyzed in the warehouse without the warehouse altering or adding data by itself, so data quality is maintained. Any data quality issue that does arise is handled by the data warehouse team.
      ➢ Historical insight: The warehouse stores all historical data about the business, so it can be analyzed at any time to extract insights.
  • 8.
      ➢ A file is a collection of records. Records can be accessed using the primary key. The type and frequency of access that a file supports well depend on the file organization used for its records.
      ➢ File organization is a logical relationship among the various records. It defines how file records are mapped onto disk blocks.
      ➢ File organization describes how the records are stored in blocks and how the blocks are placed on the storage medium.
      ➢ One approach to mapping the database to files is to use several files and store records of only one fixed length in any given file. An alternative approach is to structure the files so that they can hold records of multiple lengths.
      ➢ Files of fixed-length records are easier to implement than files of variable-length records.
  • 9.
      ➢ Record selection should be optimal, i.e., records should be selectable as fast as possible.
      ➢ Insert, delete, and update transactions on the records should be quick and easy.
      ➢ Insert, update, and delete operations must not introduce duplicate records.
      ➢ Records should be stored efficiently, to keep the cost of storage minimal.
  • 10.
      ➢ There are various methods of file organization. Each has pros and cons with respect to access or selection, and the programmer chooses the method best suited to the requirements.
  • 11.
      ➢ Sequential file organization
      ➢ Heap file organization
      ➢ Hash file organization
      ➢ B+ tree file organization
      ➢ Indexed sequential access method (ISAM)
      ➢ Cluster file organization
  • 12.
      ➢ In relational databases, a record is a group of related data held within the same structure. More specifically, a record is a grouping of fields within a table that describe one particular object. The term record is frequently used synonymously with row.
      ➢ For example, a customer record may include items such as first name, physical address, email address, date of birth, and gender.
      ➢ A record is also known as a tuple.
  • 13.
      ➢ With fixed-length records, a record length is set in advance and every record stored in the file occupies that many bytes. If the block size is not a multiple of the record size, a record can cross a block boundary and be split across more than one block.
      ➢ The fixed size gives rise to the following two problems:
      ➢ A record whose parts are stored in more than one block requires access to every block containing a part in order to read or write it.
      ➢ Deleting a record is difficult, because the space it occupied must be filled by another record (or part of one) or else be marked as free.
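The fixed-length scheme above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: a hypothetical 32-byte layout (4-byte id, 20-byte name, 8-byte balance), an in-memory `bytearray` standing in for the file, and deletion handled by moving the last record into the freed slot, which is one of several possible strategies.

```python
import struct

# Hypothetical record layout (an assumption for illustration):
# 4-byte int id, 20-byte name, 8-byte float balance = 32 bytes, no padding.
RECORD = struct.Struct("<i20sd")

def pack_record(rec_id, name, balance):
    """Encode one record; names are truncated or zero-padded to 20 bytes."""
    return RECORD.pack(rec_id, name.encode("ascii")[:20], balance)

def read_record(buf, slot):
    """Slot i always starts at byte i * RECORD.size, so no search is needed."""
    rec_id, raw_name, balance = RECORD.unpack_from(buf, slot * RECORD.size)
    return rec_id, raw_name.rstrip(b"\x00").decode("ascii"), balance

def delete_record(buf, slot):
    """Fill the hole left by a deletion with the last record in the file."""
    size = RECORD.size
    last = len(buf) // size - 1
    if slot != last:
        buf[slot * size:(slot + 1) * size] = buf[last * size:(last + 1) * size]
    del buf[last * size:]

buf = bytearray()
for rec in [(1, "Asha", 10.0), (2, "Ravi", 20.5), (3, "Meera", 30.0)]:
    buf += pack_record(*rec)
```

Because every record has the same size, reading slot `i` is a single offset computation. The cost shows up on deletion, where some other record has to be moved (or the slot tracked in a free list) to reuse the space.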
  • 14.
      ➢ Variable-length records are records that vary in size. Because their sizes differ, they cannot simply be placed at fixed offsets within a block the way fixed-length records can. Variable-length records arise in database systems in the following ways:
      ➢ Storage of multiple record types in a file.
      ➢ Record types that allow repeating fields, such as multisets or arrays.
      ➢ Record types that allow variable lengths for one or more fields.
      ➢ Variable-length records pose the following two problems:
      ➢ Defining how to represent a single record so that its individual attributes can be extracted easily.
      ➢ Defining how to store variable-length records within a block so that a record can be extracted from the block easily.
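One common way to address both problems is sketched below. It is an assumption chosen for illustration, not something the slides prescribe: each field is written with a 2-byte length prefix so that attributes can be pulled apart again, and records are placed in a slotted page whose directory keeps an (offset, length) pair per record. The names `encode_record`, `decode_record`, and `SlottedPage`, and the 8 bytes budgeted per directory entry, are all hypothetical.

```python
import struct

def encode_record(fields):
    """Represent one record as a sequence of length-prefixed UTF-8 fields."""
    out = bytearray()
    for field in fields:
        raw = field.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw
    return bytes(out)

def decode_record(raw):
    """Recover the individual attributes from an encoded record."""
    fields, pos = [], 0
    while pos < len(raw):
        (n,) = struct.unpack_from("<H", raw, pos)
        pos += 2
        fields.append(raw[pos:pos + n].decode("utf-8"))
        pos += n
    return fields

class SlottedPage:
    """A fixed-size block holding variable-length records."""
    SLOT_BYTES = 8  # assumed space budget per directory entry

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.slots = []        # directory: (offset, length) per record
        self.free_end = size   # record data grows downward from the end

    def insert(self, record):
        directory_after = self.SLOT_BYTES * (len(self.slots) + 1)
        if self.free_end - len(record) < directory_after:
            raise ValueError("page full")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1   # slot number identifies the record

    def read(self, slot):
        offset, length = self.slots[slot]
        return bytes(self.data[offset:offset + length])
```

A record is then addressed by (block, slot number) rather than by byte offset, so records can be moved inside the block without invalidating references to them.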
  • 15.
      ➢ Indexing is used to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed.
      ➢ An index is a data structure used to locate and access the data in a database table quickly.
      ➢ Index structure:
      ➢ Indexes can be created on one or more database columns.
  • 16.
      ➢ The first column of the index is the search key, which contains a copy of the primary key or candidate key of the table. The key values are stored in sorted order so that the corresponding data can be accessed easily.
      ➢ The second column of the index is the data reference. It contains a set of pointers holding the address of the disk block where the value of the particular key can be found.
  • 18.
      ➢ Ordered indices
      ➢ Indices are usually sorted to make searching faster. Sorted indices are known as ordered indices.
      ➢ Primary Index
      ➢ If the index is created on the primary key of the table, it is known as primary indexing. Primary keys are unique to each record, so there is a 1:1 relation between key values and records.
      ➢ As primary keys are stored in sorted order, searching is quite efficient.
      ➢ The primary index can be classified into two types: dense index and sparse index.
  • 19.
      ➢ Dense index
      ➢ A dense index contains an index record for every search key value in the data file. This makes searching faster.
      ➢ The number of records in the index table is the same as the number of records in the main table.
      ➢ It needs more space to store the index records themselves. Each index record holds the search key and a pointer to the actual record on disk.
      ➢ Sparse index
      ➢ Index records appear for only a few items in the data file. Each item points to a block.
      ➢ Instead of pointing to every record in the main table, the index points to records in the main table at intervals.
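The difference between the two can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the data file is modeled as a list of sorted blocks, the sparse index keeps one entry per block (its first key), and all function names are hypothetical.

```python
from bisect import bisect_right

def build_blocks(records, block_size):
    """Model the sorted data file as a list of blocks of (key, value) pairs."""
    records = sorted(records)
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def build_dense(blocks):
    """Dense index: one entry per search key, pointing at (block, offset)."""
    return {key: (b, off)
            for b, block in enumerate(blocks)
            for off, (key, _) in enumerate(block)}

def build_sparse(blocks):
    """Sparse index: one entry per block, holding the block's first key."""
    return [(block[0][0], b) for b, block in enumerate(blocks)]

def dense_lookup(blocks, dense, key):
    loc = dense.get(key)
    if loc is None:
        return None
    b, off = loc
    return blocks[b][off][1]

def sparse_lookup(blocks, sparse, key):
    anchors = [k for k, _ in sparse]
    i = bisect_right(anchors, key) - 1   # last index entry with key <= target
    if i < 0:
        return None
    for k, value in blocks[sparse[i][1]]:  # scan within that one block
        if k == key:
            return value
    return None

blocks = build_blocks([(k, f"row{k}") for k in (10, 20, 30, 40, 50, 60, 70)], 3)
dense, sparse = build_dense(blocks), build_sparse(blocks)
```

Here the dense index has seven entries and the sparse index only three. The sparse lookup pays for the smaller index with a short scan inside the block it lands on.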
  • 20.
      ➢ Clustering Index
      ➢ A clustering index is defined on an ordered data file. Sometimes the index is created on non-primary-key columns, which may not be unique for each record.
      ➢ In this case, to identify records faster, two or more columns are grouped to obtain a unique value, and the index is created on them. This method is called a clustering index.
      ➢ Records with similar characteristics are grouped, and indexes are created for these groups.
  • 21.
      ➢ Secondary Index
      ➢ In sparse indexing, the size of the mapping grows as the table grows. The mappings are usually kept in primary memory so that address fetches are fast. The actual data is then read from secondary memory using the address obtained from the mapping. If the mapping grows large, fetching the address itself becomes slower, and the sparse index is no longer efficient. Secondary indexing is introduced to overcome this problem.
      ➢ In secondary indexing, another level of indexing is introduced to reduce the size of the mapping. Large ranges of column values are selected initially, so that the first-level mapping stays small. Each range is then divided into smaller ranges. The first-level mapping is stored in primary memory so that address fetches are fast. The second-level mapping and the actual data are stored in secondary memory (hard disk).
  • 22.
      ➢ As the database grows, its indices grow too. A single-level index may become too large to keep in main memory, so that searching it requires multiple disk accesses.
      ➢ Multilevel indexing splits the index into smaller blocks, each small enough to be stored in a single disk block. The outer blocks are divided into inner blocks, which in turn point to the data blocks. The outermost level can easily be kept in main memory with little overhead.
      ➢ In Relational Database Management Systems (RDBMS), indexes are essential data structures that speed up data retrieval by reducing the number of disk accesses required. However, traditional indexes can become inefficient as the database grows. Multilevel indexes address this by dividing the index into smaller, manageable pieces.
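A rough sketch of the idea, under simplifying assumptions not taken from the slides: the base index is a sorted list of keys, every index block holds `fanout` entries, and each higher level stores the first key of each block of the level below. The functions `build_levels` and `lookup` are hypothetical names.

```python
from bisect import bisect_right

def build_levels(keys, fanout):
    """Stack index levels until the top level fits in a single block."""
    levels = [list(keys)]                 # levels[0] is the base (inner) index
    while len(levels[-1]) > fanout:
        lower = levels[-1]
        levels.append([lower[i] for i in range(0, len(lower), fanout)])
    return levels                         # levels[-1] is the outer index

def lookup(levels, fanout, key):
    """Return the position in levels[0] of the last entry <= key, or None.

    Exactly one block of `fanout` entries is examined per level.
    """
    pos = 0                               # block number to read at this level
    for level in reversed(levels):
        block = level[pos * fanout:(pos + 1) * fanout]
        i = bisect_right(block, key) - 1
        if i < 0:
            return None                   # key is smaller than every entry
        pos = pos * fanout + i            # entry position = next block number
    return pos

keys = list(range(0, 1000, 10))           # 100 base index entries
levels = build_levels(keys, fanout=4)
```

With 100 entries and a fanout of 4, the levels hold 100, 25, 7, and 2 entries, so a lookup touches four small blocks instead of searching one large index.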
  • 23.
      ➢ Indexing helps to optimize the performance of a database. It minimizes the number of disk accesses required when a query is processed. It is a data structure technique used to locate and access the data in a database quickly.
      ➢ Two things are used in indexing: the search key (or candidate key) and the data reference (or pointer).
  • 24.
      ➢ A B-tree is a self-balancing tree data structure.
      ➢ It stores and maintains data in sorted form: the keys within each node are in order, and each child subtree holds the keys that fall between the keys on either side of it, with smaller keys to the left and larger keys to the right.
      ➢ It makes searching efficient and supports search, insertion, and deletion in logarithmic time. It allows nodes with more than two children.
      ➢ The B-tree is used for implementing multilevel indexing.
      ➢ Every node of the B-tree stores key values, each with a data pointer to the block in the disk file that contains that key.
  • 25.
      ➢ Every node has at most m children, where m is the order of the B-tree.
      ➢ A node with K children contains K-1 keys.
      ➢ Every non-leaf node except the root must have at least ⌈m/2⌉ child nodes.
      ➢ The root must have at least 2 children unless it is also a leaf.
      ➢ All leaves of a B-tree are at the same level.
      ➢ Unlike many other trees, a B-tree grows upward: insertion happens at a leaf node, and the height increases only when the root splits.
      ➢ The time complexity of search, insertion, and deletion in a B-tree is O(log n), where n is the number of elements in the B-tree.
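The properties listed above can be made concrete with a small sketch. It covers only search and a structural check, not insertion or deletion; the `Node` class and the `search` and `check` functions are hypothetical names, and the check deliberately does not verify key ordering.

```python
import math
from bisect import bisect_left

class Node:
    """One B-tree node: sorted keys plus child pointers (empty for a leaf)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def search(node, key):
    """Return True if key is stored anywhere in the subtree rooted at node."""
    i = bisect_left(node.keys, key)        # first position with keys[i] >= key
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:                  # reached a leaf without a match
        return False
    return search(node.children[i], key)   # descend into the i-th subtree

def check(node, m, is_root=True):
    """Validate the structural rules for order m; return the subtree height."""
    if len(node.keys) > m - 1:
        raise ValueError("a node may hold at most m-1 keys")
    if not node.children:
        return 0
    k = len(node.children)
    if len(node.keys) != k - 1:
        raise ValueError("a node with K children must hold K-1 keys")
    minimum = 2 if is_root else math.ceil(m / 2)
    if k < minimum:
        raise ValueError("too few children")
    heights = {check(child, m, is_root=False) for child in node.children}
    if len(heights) != 1:
        raise ValueError("leaves are not all at the same level")
    return heights.pop() + 1

# A small hand-built tree of order m = 4.
root = Node([20, 40], [Node([5, 10]), Node([25, 30]), Node([45, 50, 60])])
```

Each step of `search` inspects one node and then follows a single child pointer, so the number of nodes visited is bounded by the height of the tree. That bound is where the O(log n) cost comes from.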
  • 28.
      ➢ A data warehouse is separate from the DBMS. It stores a huge amount of data, typically collected from multiple heterogeneous sources such as files and DBMSs.
      ➢ The goal is to produce statistical results that help in decision making.
      ➢ For example, a college might want to quickly see how the placement of CS students has improved over the last 10 years in terms of salaries, counts, etc.
      ➢ An ordinary database typically stores MBs to GBs of data, and does so for a specific purpose. For data of TB size, storage shifts to a data warehouse. Besides this, a transactional database does not lend itself to analytics. To perform analytics effectively, an organization keeps a central data warehouse so that it can study its business closely by organizing, understanding, and using its historical data to make strategic decisions and analyze trends.
  • 29.
      ➢ Better business analytics: A data warehouse stores all of a company's past data and records and makes them available for analysis, which deepens the company's understanding of its data.
      ➢ Faster queries: A data warehouse is designed to handle large analytical queries, so it runs them faster than a transactional database.
      ➢ Improved data quality: Data gathered from different sources is stored and analyzed in the warehouse without the warehouse altering or adding data by itself, so data quality is maintained. Any data quality issue that does arise is handled by the data warehouse team.
      ➢ Historical insight: The warehouse stores all historical data about the business, so it can be analyzed at any time to extract insights.
  • 30.
      ➢ A data warehouse is a heterogeneous collection of different data sources organised under a unified schema. There are two approaches to constructing a data warehouse, the top-down approach and the bottom-up approach, explained below.
      ➢ 1. Top-down approach:
  • 31.
      ➢ External Sources: An external source is a source from which data is collected, irrespective of the type of data. The data can be structured, semi-structured, or unstructured.
      ➢ Stage Area: Since the data extracted from the external sources does not follow a particular format, it must be validated before it is loaded into the data warehouse. An ETL tool is recommended for this purpose.
      ➢ E (Extract): Data is extracted from the external data source.
      ➢ T (Transform): Data is transformed into the standard format.
      ➢ L (Load): Data is loaded into the data warehouse after it has been transformed into the standard format.
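The three ETL steps above can be sketched with the Python standard library alone. Everything specific here is an assumption made for illustration: the CSV text standing in for an external source, the column names, the `placements` table, and the validation rule that drops rows with a missing salary. A real staging area would normally use a dedicated ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an external source (inconsistent formatting).
RAW_CSV = """student,branch,salary_lpa,year
 asha ,CS,6.5,2023
Ravi,cs,,2023
Meera,IT,7.25,2024
"""

def extract(text):
    """E: read rows from the external source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """T: validate rows and convert them to one standard format."""
    out = []
    for row in rows:
        if not row["salary_lpa"]:
            continue                       # validation: skip incomplete rows
        out.append((row["student"].strip().title(),
                    row["branch"].strip().upper(),
                    float(row["salary_lpa"]),
                    int(row["year"])))
    return out

def load(conn, rows):
    """L: write the standardized rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS placements "
                 "(student TEXT, branch TEXT, salary_lpa REAL, year INTEGER)")
    conn.executemany("INSERT INTO placements VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")         # in-memory stand-in for a warehouse
load(conn, transform(extract(RAW_CSV)))
```

Keeping the three steps as separate functions mirrors the staging area: each one can be tested or replaced on its own, and nothing reaches the warehouse table until it has passed through `transform`.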
  • 32.
      ➢ Data Warehouse: After cleansing, the data is stored in the data warehouse, which acts as the central repository. It holds the metadata together with the cleansed, integrated data, while the data marts hold subject-specific subsets of it. Note that in this top-down approach the data warehouse stores the data in its purest form.
      ➢ Data Marts: A data mart is also part of the storage component. It stores the information of a particular function of an organisation that is handled by a single authority. An organisation can have as many data marts as it has functions. A data mart can also be described as a subset of the data stored in the data warehouse.
      ➢ Data Mining: Data mining is the practice of analysing the large volume of data held in the data warehouse. Data mining algorithms are used to find hidden patterns in the database or data warehouse. This approach is defined by Inmon: the data warehouse is a central repository for the complete organisation, and data marts are created from it after the complete data warehouse has been built.
  • 34.
      ➢ First, the data is extracted from external sources (as in the top-down approach).
      ➢ Then the data goes through the staging area (as explained above) and is loaded into data marts instead of the data warehouse. The data marts are created first and provide reporting capability. Each data mart addresses a single business area.
      ➢ These data marts are then integrated into the data warehouse.
  • 35.
      ➢ Data arriving at a data warehouse is often unstructured and contains a lot of unwanted, dirty data.
      ➢ Dirty data refers to incomplete and noisy data containing errors.
      ➢ To make the data structured and noise free, dirty data needs to be removed. This helps to convert data into useful information and can be achieved using certain data warehouse operations.
      ➢ These operations are a combination of ETL (Extraction, Transformation, Loading) operations together with data cleaning and data refresh operations.
  • 37.
      ➢ Data Cleaning
      ➢ In data cleaning, inconsistencies are removed and noisy data containing errors is rectified.
      ➢ For example: removal of redundant (duplicate) data.
      ➢ Data Refresh
      ➢ In the data refresh operation, the data in the data warehouse is refreshed by propagating changes from the multiple sources and applying them on a regular schedule. This is done because the data in the source databases is updated continually, and refreshing is how the same data reaches the data warehouse.
      ➢ Extraction of Data
      ➢ Data obtained after cleaning and refresh is still unstructured and unorganized. The data extraction process organises it and enables users to extract and retrieve the relevant data. This is helpful if a user wants to mine the data.
  • 38.
      ➢ Transformation of Data
      ➢ Data obtained from heterogeneous databases has the native structure of its source database, which may differ from the structure of the data warehouse. Transformation therefore reorganizes data from the heterogeneous databases into a structure similar to that of the data warehouse.
      ➢ Data Loading
      ➢ Data loading is responsible for loading the data into its target data repository, which may be a database, a data mart, a data warehouse, etc.
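The data cleaning and data refresh operations described above can be sketched as follows. This is a toy illustration under assumptions of its own: rows are plain dictionaries, "dirty" means an exact duplicate or a row with a missing value, the warehouse is a dictionary keyed by `id`, and each source row carries a hypothetical `updated_at` counter used as a refresh watermark.

```python
def clean(rows):
    """Drop exact duplicates and rows with missing values, preserving order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen or any(v in (None, "") for v in row.values()):
            continue
        seen.add(key)
        out.append(row)
    return out

def refresh(warehouse, source_rows, last_refresh):
    """Apply source rows changed since last_refresh; return the new watermark."""
    newest = last_refresh
    for row in source_rows:
        if row["updated_at"] > last_refresh:
            warehouse[row["id"]] = row     # insert new rows, overwrite stale ones
            newest = max(newest, row["updated_at"])
    return newest

source = [
    {"id": 1, "branch": "CS", "updated_at": 5},
    {"id": 1, "branch": "CS", "updated_at": 5},    # duplicate
    {"id": 2, "branch": "", "updated_at": 6},      # incomplete
    {"id": 3, "branch": "IT", "updated_at": 9},
]
warehouse = {}
watermark = refresh(warehouse, clean(source), last_refresh=0)
```

Running `refresh` again with the returned watermark applies only rows that changed in the meantime, which is what keeps a periodic refresh cheap compared with reloading every source in full.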