Fragmentation in distributed databases
Fragmentation divides a database into smaller parts called fragments, which are stored at different locations (sites) in a network. The main goal is to improve performance, manageability, and availability by placing data close to where it is most often used.
There are three main types of fragmentation:
1. Horizontal Fragmentation: Splits a table by rows. Each fragment holds a subset of rows selected by some condition. Example: a Customers table might be split into "Customers from Europe" and "Customers from Asia" fragments.
2. Vertical Fragmentation: Splits a table by columns. Each fragment holds a subset of columns. Example: one fragment has "CustomerName" and "PhoneNumber", another has "CustomerID" and "BillingAddress". In practice every fragment also keeps the key (here CustomerID) so the original rows can be rebuilt with a join.
3. Hybrid (or Mixed) Fragmentation: A combination of horizontal and vertical fragmentation. For example, you first split by rows, then further split by columns.

Cost optimization
...ing and managing the system in a way that minimizes resource use and expenses. The idea is to balance storage, computation, and communication costs.
There are three major types of cost:
1. Storage Cost: Cost of storing data at different sites. Includes extra storage due to replication (keeping copies of data for fault tolerance).
2. Processing Cost: Cost of CPU time needed to execute queries, updates, and transactions. Depends on how data is fragmented, where it is placed, and how much computation happens at each site.
3. Communication Cost: Cost of transferring data between sites over a network. This is usually the dominant cost in distributed systems, and it matters most for queries that need data from multiple fragments/sites.
Cost optimization = Smart data fragmentation + Smart data placement + Smart query execution.

Triggers
Triggers are special programs or actions that run automatically in a database when certain events happen. Think of a trigger as a reaction: "When X happens, do Y automatically."
In technical terms: a trigger is a stored procedure that executes automatically in response to certain events on a table or view.
Common events:
INSERT → when a new record is added
UPDATE → when an existing record is changed
DELETE → when a record is removed
Example: Imagine you have a BankAccounts table. You can set a trigger to automatically log every balance change:
CREATE TRIGGER log_transaction
AFTER UPDATE ON BankAccounts
FOR EACH ROW
BEGIN
    INSERT INTO TransactionsLog (AccountID, OldBalance, NewBalance, TransactionDate)
    VALUES (OLD.AccountID, OLD.Balance, NEW.Balance, NOW());
END;
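The log_transaction trigger can be tried without a database server. The following is a minimal sketch using Python's built-in sqlite3 module, not the notes' own setup: the table layouts, the sample account, and the function name run_trigger_demo are assumptions made for illustration, and SQLite's datetime('now') stands in for NOW().

```python
import sqlite3


def run_trigger_demo():
    """Create the tables and trigger, change one balance, and return the logged rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Assumed table layouts; the notes only name the tables and columns.
        CREATE TABLE BankAccounts (AccountID INTEGER PRIMARY KEY, Balance REAL);
        CREATE TABLE TransactionsLog (
            AccountID INTEGER, OldBalance REAL, NewBalance REAL, TransactionDate TEXT
        );

        -- Same idea as the trigger in the notes, in SQLite syntax.
        CREATE TRIGGER log_transaction
        AFTER UPDATE ON BankAccounts
        FOR EACH ROW
        BEGIN
            INSERT INTO TransactionsLog (AccountID, OldBalance, NewBalance, TransactionDate)
            VALUES (OLD.AccountID, OLD.Balance, NEW.Balance, datetime('now'));
        END;

        INSERT INTO BankAccounts VALUES (1, 100.0);
    """)
    # The UPDATE is the event; the trigger writes the log row automatically.
    conn.execute("UPDATE BankAccounts SET Balance = 75.0 WHERE AccountID = 1")
    rows = conn.execute(
        "SELECT AccountID, OldBalance, NewBalance FROM TransactionsLog"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_trigger_demo())
```

The application code never inserts into TransactionsLog itself; the single UPDATE is enough to produce one log row holding the old and new balance.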
What are Variables in PL/SQL?
Variables are named storage locations in PL/SQL that temporarily hold data during the execution of a program.
Different Types of Variables
Type | Example | Notes
Scalar variables | NUMBER, VARCHAR2, DATE | Store single values
Composite variables | RECORD, TABLE | Store collections of values
Reference variables | CURSOR | Point to a result set
User-defined types | Custom OBJECTS | Advanced structures
Basic syntax:
DECLARE
    variable_name datatype [DEFAULT value];
BEGIN
    -- program logic
END;
Example:
DECLARE
    v_employee_name VARCHAR2(50);
    v_salary NUMBER(8,2) DEFAULT 5000;
BEGIN
    -- use the variables here
END;

What is Concurrency Control?
Concurrency control is about managing multiple users (or transactions) that try to access or modify the database at the same time, without corrupting the data.
Goal:
Keep the database accurate and consistent.
Prevent conflicts between operations happening at the same time.
Why is it needed?
Imagine two people trying to book the last ticket for a concert at the same time. Without concurrency control, both could get a confirmation, even though only one ticket exists! The system needs to control the order of operations so that one person's transaction finishes before the other one starts.
Example with Locking:
Suppose two transactions want to update the same row:
1. Transaction A locks the row.
2. Transaction B tries to access it but has to wait.
3. Transaction A finishes and commits.
4. Now Transaction B can proceed safely.
This prevents lost updates.

What is Query Processing?
When you write an SQL query (like SELECT * FROM Students), the database does a lot of work behind the scenes to understand, optimize, and execute it efficiently.
Query Processing = Turning your high-level SQL query into something the database engine can execute quickly.
Step | What happens | Key idea
1. Parsing | Check if the query is correct (syntax and semantics) | Is the SQL valid? Are all tables and columns correct?
2. Translation | Translate SQL into an internal query language | Database converts SQL into its internal form (like relational algebra)
3. Optimization | Find the best (fastest) way to run the query | Database analyzes different strategies and picks the best one
4. Evaluation/Execution | Actually run the query and fetch results | Execute the chosen plan and return data to the user
Example: Your SQL:
SELECT Name FROM Students WHERE Age > 18;
Parsing → Checks that Students and Age exist.
Translation → Converts to "Scan Students table, filter Age > 18, project Name".
Optimization → Notices there's an index on Age, decides to use an index scan instead of a full scan.
Execution → Actually reads the matching records using the index and returns the names.
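The optimization step in the Students example (choosing an index on Age over a full scan) can be observed directly. This is a small sketch using Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the sample rows, the index name idx_students_age, and the helper names are assumptions for illustration. The exact plan wording differs between SQLite versions, and other database engines expose their plans differently.

```python
import sqlite3

QUERY = "SELECT Name FROM Students WHERE Age > 18"


def build_students_db():
    """Create an in-memory Students table with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Students (Name TEXT, Age INTEGER)")
    conn.executemany(
        "INSERT INTO Students VALUES (?, ?)",
        [("Asha", 17), ("Ben", 19), ("Chen", 22)],
    )
    return conn


def plan_details(conn, sql):
    """Return the 'detail' text of each step in SQLite's plan for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    conn = build_students_db()
    # Without an index the only strategy is to scan the whole table.
    print("before index:", plan_details(conn, QUERY))
    conn.execute("CREATE INDEX idx_students_age ON Students (Age)")
    # With the index available the planner can search it instead.
    print("after index: ", plan_details(conn, QUERY))
    print("result:", conn.execute(QUERY + " ORDER BY Name").fetchall())
```

The query text is identical in both runs; only the plan chosen by the optimizer changes, which is the point of separating optimization from parsing and execution.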
What is a Parallel Database?
A Parallel Database is a system where multiple processors (CPUs or servers) work together to process data at the same time, instead of one processor doing all the work.
Goal:
Speed up query execution
Handle larger data (Big Data)
Improve scalability and reliability
Why use Parallel Databases?
Imagine you have to search through millions of records. If one CPU does it, it could take hours. But if 10 CPUs search 10% each at the same time, you can finish up to 10 times faster (in the ideal case, ignoring the cost of splitting the work and combining the results).
More processors = faster query response = better performance
How Parallel Databases Work:
1. The data is split across multiple disks/nodes (this is called partitioning).
2. A query is divided into tasks.
3. Tasks are assigned to different processors to execute in parallel.
4. The results from all processors are combined into the final answer.

Recovery and Transaction Management in Databases
In a database management system (DBMS), recovery management and transaction management are crucial to ensure the integrity, consistency, and availability of data.
Key Concepts:
1. Transactions: A transaction is a sequence of database operations (such as INSERT, UPDATE, DELETE) that are executed as a single logical unit of work. The key properties of a transaction are defined by the ACID properties.
2. Logs: To ensure atomicity and durability, the DBMS uses transaction logs (also known as write-ahead logs) to record all changes made to the database. The logs allow the system to redo or undo transactions in case of failure.
3. Checkpointing: A checkpoint is a process in which the DBMS saves the current state of the database and transaction logs to persistent storage. This minimizes the amount of work needed for recovery after a failure.

What are Constraints?
Constraints are rules that you apply to database tables to make sure the data stays correct and meaningful. They prevent invalid data from being stored in the database.
Why use Constraints?
To protect data integrity
To enforce business rules automatically
To avoid manual error checking in applications
Types of Constraints:
NOT NULL: Makes sure a column cannot be empty
UNIQUE: Makes sure all values in a column are different
PRIMARY KEY: Uniquely identifies each row (UNIQUE + NOT NULL)
FOREIGN KEY: Links rows between tables
CHECK: Makes sure values satisfy a specific condition
DEFAULT: Assigns a default value if none is provided
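Each constraint type listed in the notes can be seen rejecting bad data in a short experiment. The sketch below uses Python's built-in sqlite3 module; the Departments and Employees tables, their columns, and the helper names build_db and try_insert are assumptions made for illustration. Note that SQLite only enforces FOREIGN KEY when the foreign_keys pragma is switched on.

```python
import sqlite3


def build_db():
    """Create two small tables that use every constraint type from the list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FK checks are off by default
    conn.executescript("""
        CREATE TABLE Departments (
            DeptID INTEGER PRIMARY KEY,
            Name   TEXT NOT NULL UNIQUE
        );
        CREATE TABLE Employees (
            EmpID  INTEGER PRIMARY KEY,
            Name   TEXT NOT NULL,
            Salary REAL CHECK (Salary > 0),
            Status TEXT DEFAULT 'active',
            DeptID INTEGER REFERENCES Departments (DeptID)
        );
        INSERT INTO Departments VALUES (1, 'HR');
    """)
    return conn


def try_insert(conn, sql):
    """Run one INSERT; return 'ok' or the constraint error the database raised."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


if __name__ == "__main__":
    conn = build_db()
    cols = "INSERT INTO Employees (EmpID, Name, Salary, DeptID) VALUES "
    print(try_insert(conn, cols + "(1, 'Ana', 5000, 1)"))   # valid row
    print(try_insert(conn, cols + "(2, NULL, 5000, 1)"))    # violates NOT NULL
    print(try_insert(conn, cols + "(3, 'Bo', -5, 1)"))      # violates CHECK
    print(try_insert(conn, cols + "(4, 'Cy', 4000, 99)"))   # violates FOREIGN KEY
    print(try_insert(conn, cols + "(1, 'Di', 4000, 1)"))    # violates PRIMARY KEY
    print(try_insert(conn, "INSERT INTO Departments VALUES (2, 'HR')"))  # violates UNIQUE
    # DEFAULT filled in the Status column that the first INSERT left out.
    print(conn.execute("SELECT Status FROM Employees WHERE EmpID = 1").fetchone())
```

The application code contains no validation logic of its own; every rejection comes from the table definitions, which is what "enforce business rules automatically" means in practice.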
PL/SQL block structure
A PL/SQL block has up to three sections:
DECLARE   -- Declaration section (optional)
BEGIN     -- Executable section (mandatory)
EXCEPTION -- Exception handling section (optional)
END;
Example:
DECLARE
    v_emp_name VARCHAR2(50);
    v_salary NUMBER(8,2);
BEGIN
    -- Fetch employee name and salary from database
    SELECT name, salary INTO v_emp_name, v_salary
    FROM employees
    WHERE emp_id = 101;
    -- Display employee info
    DBMS_OUTPUT.PUT_LINE('Employee: ' || v_emp_name || ' Salary: ' || v_salary);
EXCEPTION
    WHEN NO_DATA_FOUND THEN
        DBMS_OUTPUT.PUT_LINE('Employee not found.');
    WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Error occurred: ' || SQLERRM);
END;

Query Optimization
Query Optimization is the process of improving the performance of a database query, making it execute faster and use fewer resources (like CPU, memory, and disk I/O). It involves selecting the most efficient execution plan for a given SQL query.
Steps in Query Optimization:
1. Parsing the Query: The database parses the SQL query to check its syntax and semantics.
2. Logical Plan Generation: The query tree is translated into a logical execution plan that outlines how the query will be executed.
3. Choosing the Best Execution Plan: The database considers multiple ways to execute the query and selects the most efficient plan.
4. Physical Plan Generation: After selecting a logical plan, the database creates a physical execution plan, which specifies the actual operations and indexes used.
5. Execution: The chosen physical plan is executed by the database, and the results are returned.
Example:
SELECT Name, Salary
FROM Employees
WHERE Department = 'HR' AND Salary > 50000;

What is IF-THEN in PL/SQL?
IF-THEN is used to make decisions in a PL/SQL program. It checks a condition, and if the condition is true, it executes some code.
Just like: "If it's raining, then take an umbrella!"
Basic Syntax:
IF condition THEN
    -- statements to execute if condition is true
END IF;
Example:
DECLARE
    v_salary NUMBER := 3000;
BEGIN
    IF v_salary < 5000 THEN
        DBMS_OUTPUT.PUT_LINE('Salary is less than 5000');
    END IF;
END;
Here, since v_salary is 3000, the message will be printed.
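For readers who know a general-purpose language better than PL/SQL, the DECLARE / BEGIN / EXCEPTION layout of the employee lookup block maps roughly onto a function with a try/except. The sketch below is only an analogy written in Python with the built-in sqlite3 module, not PL/SQL itself; the employees table contents, the name "Ravi", and the function names are assumptions for illustration.

```python
import sqlite3


def build_employees_db():
    """Create an in-memory employees table with one made-up row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
    conn.execute("INSERT INTO employees VALUES (101, 'Ravi', 4500)")
    return conn


def describe_employee(conn, emp_id):
    """Rough Python counterpart of the DECLARE / BEGIN / EXCEPTION block."""
    try:
        # BEGIN section: fetch one row, like SELECT ... INTO.
        row = conn.execute(
            "SELECT name, salary FROM employees WHERE emp_id = ?", (emp_id,)
        ).fetchone()
        if row is None:
            # Plays the role of NO_DATA_FOUND, which PL/SQL raises by itself.
            raise LookupError(f"no employee with id {emp_id}")
        emp_name, salary = row  # the DECLARE-section variables
        return f"Employee: {emp_name} Salary: {salary}"
    except LookupError:
        # WHEN NO_DATA_FOUND THEN ...
        return "Employee not found."
    except sqlite3.Error as exc:
        # Closest match to WHEN OTHERS THEN ... for database errors.
        return f"Error occurred: {exc}"


if __name__ == "__main__":
    conn = build_employees_db()
    print(describe_employee(conn, 101))
    print(describe_employee(conn, 999))
```

One difference worth keeping in mind: in PL/SQL a SELECT ... INTO that finds no row raises NO_DATA_FOUND automatically, whereas here the missing row has to be detected and raised by hand.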
Distributed Transaction Management
In a distributed database system, data is stored across multiple sites or servers that are geographically distributed. Distributed transaction management refers to the process of ensuring that transactions in a distributed system are executed correctly and consistently.
Key Components:
1. Distributed Transactions: A distributed transaction is a transaction that accesses and modifies data on multiple databases or systems that are located in different physical locations. These databases are part of a distributed database system, which may use different platforms or technologies.
2. Transaction Coordinator: A transaction coordinator is responsible for managing the distributed transaction. It ensures that the transaction is correctly executed across all participating systems and databases, and it also ensures the ACID properties (Atomicity, Consistency, Isolation, Durability).
3. Participants: Each individual database or system that is involved in the transaction is referred to as a participant. A participant is responsible for performing its part of the transaction, either committing or rolling back its changes based on the overall transaction's status.

Standard Architecture for Parallel Databases
A parallel database system is designed to handle large-scale data processing by utilizing multiple processors and storage devices to divide the work and increase efficiency. Parallelism can be introduced at different levels of a database system, such as the query execution level, the data storage level, and the data retrieval level.
Parallel database architectures are generally categorized into three main types based on how parallelism is implemented:
1. Shared Memory Architecture (SMP): In Shared Memory Systems, multiple processors share a single memory space. All processors have direct access to the same memory, so they can efficiently communicate and share data. Example: a database running on a single multiprocessor (SMP) server.
2. Shared Disk Architecture: In Shared Disk Systems, each processor has its own private memory but they all share a common disk storage. In other words, each processor has access to all the disks, but each has its own memory. Examples: Oracle Real Application Clusters (RAC), IBM Db2 pureScale.
3. Shared Nothing Architecture: In Shared Nothing Systems, each processor has its own private memory and private disk storage. Examples: Google Bigtable, Amazon DynamoDB.

Write a PL/SQL program to add three numbers
DECLARE
    -- Declare variables for the three numbers and the result
    num1 NUMBER := 10; -- First number
    num2 NUMBER := 20; -- Second number
    num3 NUMBER := 30; -- Third number
    total NUMBER;      -- Variable to store the sum (named total to avoid confusion with the SQL SUM function)
BEGIN
    -- Add the three numbers
    total := num1 + num2 + num3;
    -- Output the result
    DBMS_OUTPUT.PUT_LINE('The sum of ' || num1 || ', ' || num2 || ', and ' || num3 || ' is: ' || total);
END;
Output: The sum of 10, 20, and 30 is: 60
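The coordinator and participant roles described under Distributed Transaction Management can be sketched with a simplified two-phase commit. The notes do not name a specific protocol, so two-phase commit is used here as one common way a coordinator reaches an all-or-nothing decision. The class names, the can_commit switch, and the site names are hypothetical, and the sketch leaves out logging, timeouts, and crash recovery that a real system needs.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """One site taking part in the distributed transaction."""
    name: str
    can_commit: bool = True   # hypothetical switch to simulate a local failure
    state: str = "idle"

    def prepare(self) -> bool:
        """Phase 1: vote yes only if the local work can be made durable."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled back"


def run_transaction(participants):
    """Coordinator: collect votes, then tell every site the same outcome."""
    # Phase 1 (voting): every participant is asked to prepare.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2 (decision): commit everywhere only if every vote was yes.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.rollback()
    return "committed" if decision else "rolled back"


if __name__ == "__main__":
    sites = [Participant("site_a"), Participant("site_b")]
    print(run_transaction(sites), [p.state for p in sites])

    sites = [Participant("site_a"), Participant("site_b", can_commit=False)]
    print(run_transaction(sites), [p.state for p in sites])
```

A single "no" vote is enough to roll back every site, which is how the coordinator preserves atomicity across participants.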
Principle of DBMS