Unveiling the Core: Internal
Architecture of DBMS
Welcome to an in-depth exploration of the internal architecture of
Database Management Systems (DBMS). This presentation will
demystify the sophisticated mechanisms that enable efficient data
storage, retrieval, and manipulation. Understanding these
foundational components is crucial for any computer science
student or database professional aiming to build robust and high-
performing database applications. We will delve into the intricate
processes that occur behind the scenes, from the moment a query
is submitted to the secure storage and transaction handling of
critical data.
by MD. SHAHAN AL MUNIM
The Journey of a Query: Query Processing
Parsing & Translation
The SQL query is first parsed for syntax and semantic correctness. It is then translated into an internal representation, such
as a relational algebra tree, preparing it for optimization.
Optimization
This critical phase involves identifying the most efficient execution plan for the query. The query optimizer considers various
factors like indexing, join algorithms, and data distribution to minimize cost and maximize performance.
Execution
The chosen execution plan is then carried out by the query execution engine. This involves retrieving data from storage,
performing necessary operations (e.g., sorting, filtering, joining), and returning the results to the user.
Query processing is the engine of any DBMS, transforming high-level user requests into actionable instructions for the system. Each step
is meticulously designed to ensure accuracy and speed, making the difference between a sluggish and a responsive database system.
Effective optimization is key to handling complex queries on massive datasets efficiently.
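The three stages above can be sketched with Python's built-in sqlite3 module. The emp table and idx_dept index here are illustrative, not part of the presentation: a malformed query is rejected at the parsing stage, and EXPLAIN QUERY PLAN exposes the access path the optimizer chose.

```python
import sqlite3

# In-memory database with a small table and an index (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, salary REAL)")
conn.execute("CREATE INDEX idx_dept ON emp (dept)")

# Parsing & translation: a syntactically invalid query is rejected
# before any execution happens.
try:
    conn.execute("SELEC * FROM emp")
except sqlite3.OperationalError as e:
    print("parse error:", e)

# Optimization: EXPLAIN QUERY PLAN shows the chosen execution plan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE dept = 'IT'"
).fetchall()
for row in plan:
    print(row)  # the plan reports an index search using idx_dept, not a full scan
```

Dropping the index and re-running the EXPLAIN shows the plan degrade to a full table scan, which is exactly the cost difference the optimizer weighs.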
Ensuring Data Integrity: Transaction Management
Atomicity
Ensures that a transaction is treated as a single, indivisible
unit. Either all operations within the transaction are
completed successfully, or none of them are.
Consistency
Guarantees that a transaction brings the database from
one valid state to another. All data integrity constraints
must be satisfied at the beginning and end of a
transaction.
Isolation
Ensures that concurrent transactions execute
independently without interfering with each other. The
intermediate state of a transaction is not visible to other
transactions.
Durability
Guarantees that once a transaction has been committed,
its changes are permanently stored in the database and
survive any subsequent system failures.
Transaction management is fundamental to maintaining the reliability and integrity of data in a multi-user environment. It relies on
the ACID properties (Atomicity, Consistency, Isolation, Durability) to ensure that operations are processed reliably, even in the face of
concurrent access and system failures. These properties are crucial for applications where data accuracy is paramount, such as
financial systems.
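Atomicity in particular is easy to demonstrate with sqlite3. In this hypothetical transfer, the account table and its CHECK constraint are made up for the example: the credit succeeds, the debit violates the constraint, and the rollback undoes both, leaving the database in its prior consistent state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance REAL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

# Atomicity: the transfer applies both updates or neither.
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance + 200 WHERE name = 'bob'")
        conn.execute("UPDATE account SET balance = balance - 200 WHERE name = 'alice'")
except sqlite3.IntegrityError:
    pass  # the CHECK constraint fired; the whole transaction was rolled back

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # both rows unchanged: {'alice': 100.0, 'bob': 50.0}
```

Note that bob's credit had already succeeded inside the transaction; it is the rollback, not the failed statement, that restores his balance.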
The Foundation: Storage Management
1. Buffer Management
Manages the flow of data between main memory and disk storage to optimize I/O operations.
2. File Organization
Determines how data records are physically stored on disk, impacting retrieval efficiency (e.g., heap, sequential, hashed files).
3. Indexing
Provides efficient data access paths by creating data structures (e.g., B-trees, hash tables) that map search keys to data locations.
4. Disk Space Management
Allocates and deallocates disk space for files and records, handling issues like fragmentation and free space tracking.
Storage management is the bedrock of any DBMS, responsible for how data is physically stored and retrieved from disk. It encompasses various techniques to
ensure data persistence, efficient access, and effective utilization of storage resources. Without robust storage management, even the most sophisticated query
processors and transaction managers would struggle to perform adequately.
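The last of those responsibilities, disk space management, can be sketched as a free-space bitmap with first-fit allocation. Everything here (block count, first-fit policy) is an illustrative toy, not any real DBMS's allocator:

```python
# Toy disk space manager: a bitmap marks each block free (0) or used (1).
NUM_BLOCKS = 16
bitmap = [0] * NUM_BLOCKS

def allocate(n):
    """Find n contiguous free blocks (first-fit) and mark them used."""
    run = 0
    for i, bit in enumerate(bitmap):
        run = run + 1 if bit == 0 else 0
        if run == n:
            start = i - n + 1
            for j in range(start, i + 1):
                bitmap[j] = 1
            return start
    return None  # no contiguous run large enough: external fragmentation

def free(start, n):
    """Return n blocks starting at `start` to the free pool."""
    for j in range(start, start + n):
        bitmap[j] = 0

a = allocate(4)   # blocks 0-3
b = allocate(4)   # blocks 4-7
free(a, 4)        # blocks 0-3 become free again
c = allocate(3)   # first-fit reuses the freed hole at block 0
print(a, b, c)    # 0 4 0
```

A request for 5 contiguous blocks after these operations would fail even though 8 blocks are free in total, which is the fragmentation problem the slide mentions.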
Interacting with Storage: Buffer Management
Role of the Buffer Pool
The buffer pool is a crucial component of main memory
used to cache data blocks frequently accessed from
disk. It minimizes disk I/O, which is significantly slower
than memory access, thereby boosting overall query
performance.
Replacement Policies
Effective buffer management employs various
replacement policies (e.g., LRU, FIFO, Clock) to decide
which pages to evict from the buffer pool when new
pages need to be loaded. The choice of policy
significantly impacts performance based on access
patterns.
Buffer management is a sophisticated caching mechanism that plays a vital role in bridging the speed gap between
CPU and disk. By intelligently predicting and caching frequently used data, it drastically reduces the number of
expensive disk reads, making database operations much faster and more responsive. Its efficiency is a major
determinant of database performance.
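A minimal LRU buffer pool can be written around an ordered dictionary; this is a single-user sketch with a simulated "disk", not a production design:

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool with LRU page replacement."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk             # maps page id -> page contents
        self.frames = OrderedDict()  # cached pages, least recently used first
        self.hits = self.misses = 0

    def read_page(self, page_id):
        if page_id in self.frames:
            self.hits += 1
            self.frames.move_to_end(page_id)     # mark as most recently used
        else:
            self.misses += 1                     # simulated disk I/O
            if len(self.frames) >= self.capacity:
                self.frames.popitem(last=False)  # evict the LRU page
            self.frames[page_id] = self.disk[page_id]
        return self.frames[page_id]

disk = {i: f"page-{i}" for i in range(10)}
pool = BufferPool(capacity=3, disk=disk)
for pid in [1, 2, 3, 1, 2, 4, 1]:  # page 3 is the LRU victim when 4 arrives
    pool.read_page(pid)
print(pool.hits, pool.misses)  # 3 4
```

Swapping the eviction line for a FIFO or Clock policy changes the hit rate on the same access pattern, which is why the choice of replacement policy matters.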
Organizing Data on Disk: File and Record
Management
Heap Files
Records are stored in no
particular order. Suitable for
small tables or when records are
frequently inserted and deleted.
Retrieval often requires scanning
the entire file.
Sequential Files
Records are stored in a specific
order based on a search key.
Ideal for batch processing and
range queries, but insertions can
be costly.
Hashed Files
Records are stored based on a
hash function applied to a search
key. Provides very fast direct
access for equality queries, but
range queries are inefficient.
File and record management dictates the physical layout of data on secondary storage. The chosen file organization
method significantly impacts the efficiency of various database operations, particularly data retrieval and insertion.
Each method has trade-offs in terms of performance for different types of queries and data modification patterns.
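The hashed-file trade-off is visible in a few lines. This sketch uses in-memory lists as stand-in buckets; the record shape and bucket count are invented for the example:

```python
# Hashed file: each record lands in the bucket chosen by hashing its key.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(record):
    buckets[hash(record["id"]) % NUM_BUCKETS].append(record)

def lookup(key):
    # Equality search probes exactly one bucket, never the whole file.
    return [r for r in buckets[hash(key) % NUM_BUCKETS] if r["id"] == key]

for i in range(20):
    insert({"id": i, "name": f"rec{i}"})

result = lookup(7)
print(result)  # [{'id': 7, 'name': 'rec7'}]
```

A range query such as "ids between 5 and 10" gains nothing here: the matching records are scattered across buckets, so every bucket must be scanned, which is exactly why hashed files suit equality lookups and sequential files suit ranges.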
Accelerating Data Access: Indexing Techniques
B+ Tree Indexes
B-trees and B+ trees are widely used. They provide efficient search, insertion, and deletion operations, especially for range queries.
Hash Indexes
Based on hashing techniques, these indexes provide extremely fast average-case performance for equality searches. Less suitable for range queries.
Bitmap Indexes
Used for columns with low cardinality. They represent data as bitmaps, which are efficient for complex queries involving multiple conditions.
Indexing is a crucial optimization technique that significantly speeds up data retrieval. By creating auxiliary data
structures that map search keys to the physical locations of records, indexes allow the DBMS to locate data without
scanning entire tables. Selecting the appropriate indexing strategy is vital for optimizing query performance in a
database.
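The range-query strength of a B+ tree comes from its sorted leaf level. A sorted key list searched with binary search (Python's bisect) behaves the same way at this level of abstraction; the table contents here are invented for illustration:

```python
import bisect

# The heap file: records in arbitrary physical positions.
table = [{"id": i * 3, "val": f"v{i}"} for i in range(100)]

# The "index": sorted search keys plus a map from key to record location,
# standing in for a B+ tree's sorted leaf level.
index_keys = sorted(row["id"] for row in table)
index_loc = {row["id"]: pos for pos, row in enumerate(table)}

def range_lookup(lo, hi):
    # Binary search to the first key >= lo, then walk the sorted
    # keys in order until one exceeds hi -- no full table scan.
    start = bisect.bisect_left(index_keys, lo)
    out = []
    for key in index_keys[start:]:
        if key > hi:
            break
        out.append(table[index_loc[key]])
    return out

matches = [r["id"] for r in range_lookup(10, 20)]
print(matches)  # [12, 15, 18]
```

A hash index cannot support range_lookup at all without probing every possible key, which is the trade-off the slide describes.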
Coordinating Concurrent Access:
Concurrency Control
Locking
Transactions acquire locks on data items to prevent other transactions from accessing
them concurrently, ensuring isolation.
Timestamping
Each transaction is assigned a unique timestamp, and operations are ordered based on
these timestamps to resolve conflicts.
Optimistic
Assumes conflicts are rare. Transactions execute without locking, validate at commit time,
and roll back if conflicts are detected.
Concurrency control mechanisms are essential in multi-user database systems to ensure that
simultaneous transactions do not interfere with each other, leading to inconsistent data. These
techniques maintain the Isolation property of ACID transactions, preventing issues like lost
updates, dirty reads, and unrepeatable reads. The choice of mechanism depends on the expected
transaction workload and conflict rates.
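The optimistic approach can be sketched with a version counter. This single-process toy (the store, begin, and commit helpers are all invented for the example) shows the read phase, the validation at commit, and a conflict forcing an abort:

```python
# Optimistic concurrency control: read without locks, validate at commit.
store = {"value": 0, "version": 0}

def begin():
    """Read phase: snapshot the value and the version it was read at."""
    return {"seen_version": store["version"], "value": store["value"]}

def commit(txn, new_value):
    """Validation phase: succeed only if no one else committed since."""
    if store["version"] != txn["seen_version"]:
        return False  # conflict detected: caller rolls back and retries
    store["value"] = new_value
    store["version"] += 1
    return True

t1 = begin()                       # T1 reads value 0
t2 = begin()                       # T2 reads value 0 concurrently
ok2 = commit(t2, t2["value"] + 5)  # T2 commits first: validation passes
ok1 = commit(t1, t1["value"] + 1)  # T1's snapshot is now stale: abort
print(ok2, ok1, store["value"])    # True False 5
```

T1's retry would re-read value 5 and commit cleanly, so under low contention no work is ever blocked; under high contention the repeated rollbacks make locking the better choice.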
Recovering from Failures: Database Recovery
Logging
Recording all changes made to the database in a log file. This
log is crucial for undoing or redoing operations during
recovery.
Checkpointing
Periodically saving the state of the database to disk, reducing
the amount of work required for recovery after a crash.
Rollback & Rollforward
Using the log, transactions can be undone (rolled back) to a
consistent state or redone (rolled forward) to apply committed
changes.
Database recovery ensures that the database remains consistent and durable even after system failures like power outages, software
bugs, or disk crashes. By meticulously logging all operations and periodically saving consistent states, the DBMS can restore the database
to its last known consistent state, minimizing data loss and ensuring continuous availability. This capability is vital for business continuity.
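The log-based redo/undo idea can be condensed into a toy write-ahead log. The record format and recovery pass here are a deliberate simplification (real systems log at page level and combine an analysis pass with checkpoints):

```python
# Toy write-ahead log: each entry records (op, txn, key, before, after).
log = []
db = {}

def write(txn, key, value):
    log.append(("UPDATE", txn, key, db.get(key), value))  # log before applying
    db[key] = value

def commit(txn):
    log.append(("COMMIT", txn, None, None, None))

write("T1", "x", 1)
commit("T1")
write("T2", "y", 2)   # T2 never commits before the simulated crash

def recover(log):
    """Redo committed transactions, undo uncommitted ones."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    state = {}
    for op, txn, key, before, after in log:
        if op == "UPDATE":
            state[key] = after if txn in committed else before  # redo / undo
    return {k: v for k, v in state.items() if v is not None}

print(recover(log))  # {'x': 1} -- T1's change survives, T2's is undone
```

Checkpointing fits in naturally: a checkpoint records a known-good state so that recovery only needs to replay the log written after it, rather than the entire history.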
Key Takeaways & Next Steps
Understanding the internal architecture of a DBMS, encompassing query processing, transaction management, and storage
management, provides a foundational insight into how databases truly work. These intricate components collaborate to
deliver the performance, reliability, and data integrity that modern applications demand.
For computer science students, further exploration of specific algorithms (e.g., query optimization algorithms, concurrency
control protocols like Two-Phase Locking) and practical implementation details in various DBMS products would be highly
beneficial. Database professionals can leverage this knowledge to optimize existing systems, troubleshoot performance
issues, and design more efficient database schemas. The journey into database internals is continuous, offering endless
opportunities for learning and innovation.
