Fragmentation in distributed databases

The document discusses various aspects of distributed databases, including fragmentation, cost optimization, and triggers. It explains the types of fragmentation (horizontal, vertical, hybrid), the importance of concurrency control, and the structure of PL/SQL programs. Additionally, it covers parallel databases, transaction management, and the ACID properties that ensure data integrity and reliability.

Uploaded by gananjay
Copyright © All Rights Reserved

Fragmentation in distributed databases is the process of splitting a database into smaller parts called fragments, which are stored at different locations (sites) in a network. The main goal is to improve performance, manageability, and availability by placing data close to where it's most often used.
There are three main types of fragmentation:
1. Horizontal Fragmentation: Splits a table by rows. Each fragment has a subset of rows based on some condition. Example: A Customers table might be split into "Customers from Europe" and "Customers from Asia" fragments.
2. Vertical Fragmentation: Splits a table by columns. Each fragment has a subset of columns plus the primary key, so the original rows can be rebuilt by joining the fragments. Example: One fragment has "CustomerID", "CustomerName" and "PhoneNumber", another has "CustomerID" and "BillingAddress".
3. Hybrid (or Mixed) Fragmentation: A combination of horizontal and vertical fragmentation. You first split by rows, then further split by columns.

Cost optimization in distributed databases means designing and managing the system in a way that minimizes resource use and expenses. The idea is to balance storage, computation, and communication costs.
There are three major types of cost:
1. Storage Cost: Cost of storing data at different sites. Includes extra storage due to replication (having copies of data for fault tolerance).
2. Processing Cost: Cost of CPU time needed to execute queries, updates, and transactions. Depends on how data is fragmented, where it is placed, and how much computation happens at each site.
3. Communication Cost: The big one in distributed systems. Cost of transferring data between sites over a network. Especially important for queries that need data from multiple fragments/sites.
Cost optimization = Smart data fragmentation + Smart data placement + Smart query execution.

Triggers are special programs or actions that automatically run in a database when certain events happen. Think of a trigger like a reaction: "When X happens, do Y automatically."
In technical terms: A trigger is a stored procedure that executes automatically in response to certain events on a table or view.
Common events:
INSERT → when a new record is added
UPDATE → when an existing record is changed
DELETE → when a record is removed
Example: Imagine you have a BankAccounts table. You can set a trigger to automatically log every update to an account (MySQL-style syntax):
CREATE TRIGGER log_transaction
AFTER UPDATE ON BankAccounts
FOR EACH ROW
BEGIN
    INSERT INTO TransactionsLog (AccountID, OldBalance, NewBalance, TransactionDate)
    VALUES (OLD.AccountID, OLD.Balance, NEW.Balance, NOW());
END;
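To try the log_transaction trigger above without a database server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the SQLite dialect, so `datetime('now')` stands in for `NOW()`, and the column types are guesses, since the notes do not define the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BankAccounts (AccountID INTEGER PRIMARY KEY, Balance REAL);
CREATE TABLE TransactionsLog (
    AccountID INTEGER, OldBalance REAL, NewBalance REAL, TransactionDate TEXT
);
-- SQLite dialect: datetime('now') replaces NOW() (assumed table layout).
CREATE TRIGGER log_transaction
AFTER UPDATE ON BankAccounts
FOR EACH ROW
BEGIN
    INSERT INTO TransactionsLog (AccountID, OldBalance, NewBalance, TransactionDate)
    VALUES (OLD.AccountID, OLD.Balance, NEW.Balance, datetime('now'));
END;
INSERT INTO BankAccounts VALUES (1, 100.0);
""")

# The UPDATE is the event "X"; the log row is the automatic reaction "Y".
conn.execute("UPDATE BankAccounts SET Balance = 75.0 WHERE AccountID = 1")
log_rows = conn.execute(
    "SELECT AccountID, OldBalance, NewBalance FROM TransactionsLog"
).fetchall()
print(log_rows)  # [(1, 100.0, 75.0)]
```

Nothing in the application code writes to TransactionsLog directly; the row appears only because the trigger fired.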

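The three fragmentation types described earlier can be sketched with plain Python lists and dictionaries. The sample customers, the Region column, and the helper names are made up for illustration; a real distributed DBMS does this at the storage level.

```python
customers = [
    {"CustomerID": 1, "CustomerName": "Ana", "Region": "Europe", "BillingAddress": "Lisbon"},
    {"CustomerID": 2, "CustomerName": "Wei", "Region": "Asia", "BillingAddress": "Taipei"},
    {"CustomerID": 3, "CustomerName": "Lars", "Region": "Europe", "BillingAddress": "Oslo"},
]

def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def vertical_fragment(rows, key, columns):
    """Keep the key plus the chosen columns from every row."""
    return [{key: row[key], **{c: row[c] for c in columns}} for row in rows]

def rejoin(left, right, key):
    """Rebuild rows by joining two vertical fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

# Horizontal: split by rows.
europe = horizontal_fragment(customers, lambda r: r["Region"] == "Europe")
asia = horizontal_fragment(customers, lambda r: r["Region"] == "Asia")

# Vertical: split by columns, keeping CustomerID in both fragments.
names = vertical_fragment(customers, "CustomerID", ["CustomerName", "Region"])
billing = vertical_fragment(customers, "CustomerID", ["BillingAddress"])

# Hybrid: split by rows first, then by columns.
europe_billing = vertical_fragment(europe, "CustomerID", ["BillingAddress"])

# Horizontal fragments reunite by union; vertical ones by a join on the key.
assert sorted(europe + asia, key=lambda r: r["CustomerID"]) == customers
assert rejoin(names, billing, "CustomerID") == customers
```

The join only works because both vertical fragments carry CustomerID, which is why the key is repeated in every vertical fragment.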
What are Variables in PL/SQL?
Variables are named storage locations in PL/SQL that temporarily hold data during the execution of a program.
Different Types of Variables
Type | Example | Notes
Scalar variables | NUMBER, VARCHAR2, DATE | Store single values
Composite variables | RECORD, TABLE | Store collections of values
Reference variables | REF CURSOR | Point to a result set
User-defined types | Custom OBJECTS | Advanced structures
Basic syntax:
DECLARE
    variable_name datatype [DEFAULT value];
BEGIN
    -- program logic
END;
Example:
DECLARE
    v_employee_name VARCHAR2(50);
    v_salary NUMBER(8,2) DEFAULT 5000;
BEGIN
    -- use the variables here
    NULL; -- placeholder: a block needs at least one executable statement
END;

What is Concurrency Control?
Concurrency control is all about managing multiple users (or transactions) who are trying to access or modify the database at the same time, without messing things up.
Goal:
Keep the database accurate and consistent.
Prevent conflicts between operations happening at the same time.
Why is it needed?
Imagine two people trying to book the last ticket for a concert at the same time:
Without concurrency control, both could get a confirmation, even though only one ticket exists!
The system needs to control the order of conflicting operations so that the result is the same as if one person's transaction finished before the other one started.
Example with Locking:
Suppose two transactions want to update the same row:
1. Transaction A locks the row.
2. Transaction B tries to access but has to wait.
3. Transaction A finishes and commits.
4. Now Transaction B can proceed safely.
This prevents lost updates.

What is Query Processing?
When you write an SQL query (like SELECT * FROM Students), the database does a lot of work behind the scenes to understand, optimize, and execute it efficiently.
Query Processing = Turning your high-level SQL query into something the database engine can execute quickly.
Step | What happens | Key idea
1. Parsing | Check if the query is correct (syntax and semantics) | Is the SQL valid? Are all tables and columns correct?
2. Translation | Translate SQL into an internal query language | Database converts SQL into its internal form (like relational algebra)
3. Optimization | Find the best (fastest) way to run the query | Database analyzes different strategies and picks the best one
4. Evaluation/Execution | Actually run the query and fetch results | Execute the chosen plan and return data to the user
Example: Your SQL:
SELECT Name FROM Students WHERE Age > 18;
Parsing: → Checks that Students and Age exist.
Translation: → Converts to "Scan Students table, filter Age > 18, project Name".
Optimization: → Notices there's an index on Age, decides to use index scan instead of full scan.
Execution: → Actually reads the matching records using the index and returns the names.
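The query-processing steps above can be watched in a small database. This is a sketch using Python's built-in sqlite3 module; the sample rows, the index name idx_students_age, and the misspelled column Grade are made up, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [("Asha", 17), ("Ben", 19), ("Chen", 22)])

# Parsing: a query naming a column that does not exist is rejected up front.
try:
    conn.execute("SELECT Name FROM Students WHERE Grade > 18")
    parse_error = None
except sqlite3.OperationalError as exc:
    parse_error = str(exc)

query = "SELECT Name FROM Students WHERE Age > 18"

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Optimization: the chosen plan changes once an index on Age exists.
plan_without_index = plan(query)
conn.execute("CREATE INDEX idx_students_age ON Students (Age)")
plan_with_index = plan(query)

# Execution: run the query and fetch the results.
names = sorted(row[0] for row in conn.execute(query))

print(parse_error)
print(plan_without_index)
print(plan_with_index)
print(names)
```

Before the index is created the plan is a scan of the whole table; afterwards it mentions idx_students_age, which is the optimizer's decision made visible.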

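The last-ticket scenario from the concurrency control notes can be imitated with two threads and a lock. This is only an analogy: `threading.Lock` stands in for the database's row lock, and the `inventory` dictionary stands in for the ticket table.

```python
import threading

inventory = {"tickets_left": 1}
row_lock = threading.Lock()   # stands in for the database's row lock
confirmations = []

def book(user):
    """Check-then-update under a lock, so the two steps cannot interleave."""
    with row_lock:                          # the second caller waits here
        if inventory["tickets_left"] > 0:   # check availability
            inventory["tickets_left"] -= 1  # update
            confirmations.append(user)

threads = [threading.Thread(target=book, args=(name,)) for name in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(confirmations, inventory["tickets_left"])
```

Exactly one user ends up in `confirmations`, whichever thread gets the lock first. Without the lock, both threads could pass the availability check before either one decrements the counter.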
What is a Parallel Database?
A Parallel Database is a system where multiple processors (CPUs or servers) work together to process data at the same time, instead of one processor doing all the work.
Goal:
Speed up query execution
Handle larger data (Big Data)
Improve scalability and reliability
Why use Parallel Databases?
Imagine you have to search through millions of records. If one CPU does it, it could take hours.
But if 10 CPUs search 10% each at the same time, you can finish up to 10 times faster (a little less in practice, because splitting the work and combining the results also takes time).
More processors = faster query response = better performance
How Parallel Databases Work:
The data is split across multiple disks/nodes (this is called partitioning).
A query is divided into tasks.
Tasks are assigned to different processors to execute in parallel.
The results from all processors are combined into the final answer.

Recovery and Transaction Management in Databases
In a database management system (DBMS), recovery management and transaction management are crucial to ensure the integrity, consistency, and availability of data.
Key Concepts:
1. Transactions: A transaction is a sequence of database operations (such as INSERT, UPDATE, DELETE) that are executed as a single logical unit of work. The key properties of a transaction are defined by the ACID properties.
2. Logs: To ensure atomicity and durability, DBMS uses transaction logs (also known as write-ahead logs) to record all changes made to the database. The logs allow the system to redo or undo transactions in case of failure.
3. Checkpointing: A checkpoint is a process in which the DBMS saves the current state of the database and transaction logs to persistent storage. This minimizes the amount of work needed for recovery after a failure.

What are Constraints?
Constraints are rules that you apply to database tables to make sure the data stays correct and meaningful.
They prevent invalid data from being stored in the database.
Why use Constraints?
To protect data integrity
To enforce business rules automatically
To avoid manual error checking in applications
Types of Constraints:
NOT NULL: Makes sure a column cannot be empty (NULL)
UNIQUE: Makes sure all values in a column are different
PRIMARY KEY: Uniquely identifies each row + NOT NULL
FOREIGN KEY: Links rows between tables
CHECK: Makes sure values satisfy a specific condition
DEFAULT: Assigns a default value if none is provided
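Each constraint type in the notes can be seen rejecting bad data in a small sketch with Python's built-in sqlite3 module. The departments and employees tables and all their columns are made up for illustration, and SQLite only enforces FOREIGN KEY after the `PRAGMA` shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled
conn.executescript("""
CREATE TABLE departments (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    age     INTEGER CHECK (age >= 18),
    status  TEXT DEFAULT 'active',
    dept_id INTEGER REFERENCES departments (dept_id)
);
INSERT INTO departments VALUES (1, 'HR');
INSERT INTO employees (emp_id, name, age, dept_id) VALUES (1, 'John', 30, 1);
""")

# One deliberately invalid statement per constraint type.
bad_inserts = {
    "NOT NULL":    "INSERT INTO employees (emp_id, name, age) VALUES (2, NULL, 25)",
    "UNIQUE":      "INSERT INTO departments VALUES (2, 'HR')",
    "PRIMARY KEY": "INSERT INTO employees (emp_id, name, age) VALUES (1, 'Dup', 40)",
    "FOREIGN KEY": "INSERT INTO employees (emp_id, name, age, dept_id) VALUES (3, 'Eve', 28, 99)",
    "CHECK":       "INSERT INTO employees (emp_id, name, age) VALUES (4, 'Kid', 12)",
}

rejected = []
for constraint, sql in bad_inserts.items():
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        rejected.append(constraint)

# DEFAULT: John's row never mentioned status, so the default was filled in.
default_status = conn.execute(
    "SELECT status FROM employees WHERE emp_id = 1"
).fetchone()[0]

print(rejected)
print(default_status)  # active
```

All five invalid statements are refused by the database itself, so the application needs no checking code of its own.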

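The four steps under "How Parallel Databases Work" (partition, divide into tasks, run in parallel, combine) can be sketched with a thread pool. The function names and the round-robin partitioning scheme are assumptions made for illustration. The sketch shows the structure only: in standard Python, threads do not speed up CPU-bound work the way separate processors in a real parallel database do.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(records, n_parts):
    """Round-robin partitioning: record i goes to partition i % n_parts."""
    return [records[i::n_parts] for i in range(n_parts)]

def scan(part, predicate):
    """The task each worker runs on its own partition."""
    return [r for r in part if predicate(r)]

def parallel_select(records, predicate, n_workers=4):
    parts = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = pool.map(lambda p: scan(p, predicate), parts)
    # Combine the per-partition results into the final answer.
    return sorted(r for part in partial_results for r in part)

records = list(range(1, 1001))
matches = parallel_select(records, lambda r: r % 250 == 0)
print(matches)  # [250, 500, 750, 1000]
```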
Explain PL/SQL structure
DECLARE    -- Declaration section (optional)
BEGIN      -- Executable section (mandatory)
EXCEPTION  -- Exception handling section (optional)
END;
Example:
DECLARE
    v_emp_name VARCHAR2(50);
    v_salary NUMBER(8,2);
BEGIN
    -- Fetch employee name and salary from database
    SELECT name, salary INTO v_emp_name, v_salary
    FROM employees
    WHERE emp_id = 101;
    -- Display employee info
    DBMS_OUTPUT.PUT_LINE('Employee: ' || v_emp_name || ' Salary: ' || v_salary);
EXCEPTION
    WHEN NO_DATA_FOUND THEN
        DBMS_OUTPUT.PUT_LINE('Employee not found.');
    WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Error occurred: ' || SQLERRM);
END;

Query Optimization is the process of improving the performance of a database query, making it execute faster and use fewer resources (like CPU, memory, and disk I/O). It involves selecting the most efficient execution plan for a given SQL query.
Steps in Query Optimization:
1. Parsing the Query: The database parses the SQL query to check for syntax and semantics.
2. Logical Plan Generation: The query tree is translated into a logical execution plan that outlines how the query will be executed.
3. Choosing the Best Execution Plan: The database considers multiple ways to execute the query and selects the most efficient plan.
4. Physical Plan Generation: After selecting a logical plan, the database creates a physical execution plan, which specifies the actual operations and indexes used.
5. Execution: The chosen physical plan is executed by the database, and the results are returned.
Example:
SELECT Name, Salary
FROM Employees
WHERE Department = 'HR' AND Salary > 50000;

What is IF-THEN in PL/SQL?
IF-THEN is used to make decisions in a PL/SQL program.
It checks a condition, and if the condition is true, it executes some code.
Just like:
"If it's raining, then take an umbrella!"
Basic Syntax:
IF condition THEN
    -- statements to execute if condition is true
END IF;
Example:
DECLARE
    v_salary NUMBER := 3000;
BEGIN
    IF v_salary < 5000 THEN
        DBMS_OUTPUT.PUT_LINE('Salary is less than 5000');
    END IF;
END;
Here, since v_salary is 3000, the message will be printed.
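The step "Choosing the Best Execution Plan" for the Employees query above can be pictured with a toy cost model. Everything numeric here is invented for illustration: the table size, the share of rows in HR (5%), and the share of salaries over 50000 (40%). Real optimizers use much richer statistics and cost formulas.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    estimated_rows_read: float

def candidate_plans(table_rows, dept_selectivity, salary_selectivity):
    """Toy cost model: cost is just the estimated number of rows touched."""
    return [
        Plan("full table scan", table_rows),
        Plan("index on Department, then filter Salary", table_rows * dept_selectivity),
        Plan("index on Salary, then filter Department", table_rows * salary_selectivity),
    ]

def choose_plan(plans):
    """Pick the candidate with the lowest estimated cost."""
    return min(plans, key=lambda p: p.estimated_rows_read)

# Hypothetical statistics: 100,000 employees, 5% in HR, 40% earning over 50000.
plans = candidate_plans(table_rows=100_000, dept_selectivity=0.05, salary_selectivity=0.40)
best = choose_plan(plans)
print(best.name)  # index on Department, then filter Salary
```

With different statistics the winner changes, which is why the same SQL text can get different plans on different data.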

Distributed Transaction Management
In a distributed database system, data is stored across multiple sites or servers that are geographically distributed. Distributed transaction management refers to the process of ensuring that transactions in a distributed system are executed correctly and consistently.
Key Components:
1. Distributed Transactions: A distributed transaction is a transaction that accesses and modifies data on multiple databases or systems that are located in different physical locations. These databases are part of a distributed database system, which may use different platforms or technologies.
2. Transaction Coordinator: A transaction coordinator is responsible for managing the distributed transaction. It ensures that the transaction is correctly executed across all participating systems and databases, and it also ensures the ACID properties (Atomicity, Consistency, Isolation, Durability).
3. Participants: Each individual database or system that is involved in the transaction is referred to as a participant. A participant is responsible for performing its part of the transaction, either committing or rolling back its changes based on the overall transaction's status.

Standard Architecture for Parallel Databases
A parallel database system is designed to handle large-scale data processing by utilizing multiple processors and storage devices to divide the work and increase efficiency. Parallelism can be introduced at different levels of a database system, such as at the query execution level, data storage, and data retrieval level.
Parallel database architectures are generally categorized into three main types based on how parallelism is implemented:
1. Shared Memory Architecture (SMP): In Shared Memory Systems, multiple processors share a single memory space. All processors have direct access to the same memory, so they can efficiently communicate and share data. Example: a DBMS running on a single multiprocessor (SMP) server.
2. Shared Disk Architecture: In Shared Disk Systems, each processor has its own private memory but they all share a common disk storage. In other words, each processor has access to all the disks, but each has its own memory. Examples: Oracle Real Application Clusters (RAC), IBM Db2 pureScale.
3. Shared Nothing Architecture: In Shared Nothing Systems, each processor has its own private memory and private disk storage. Examples: Google Bigtable, Amazon DynamoDB.

Write a PL/SQL program to add three numbers
DECLARE
    -- Declare variables for the three numbers and the result
    num1 NUMBER := 10; -- First number
    num2 NUMBER := 20; -- Second number
    num3 NUMBER := 30; -- Third number
    total NUMBER;      -- Variable to store the sum (named total to avoid confusion with the SQL SUM function)
BEGIN
    -- Add the three numbers
    total := num1 + num2 + num3;
    -- Output the result
    DBMS_OUTPUT.PUT_LINE('The sum of ' || num1 || ', ' || num2 || ', and ' || num3 || ' is: ' || total);
END;
Output: The sum of 10, 20, and 30 is: 60
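The coordinator and participant roles described above are commonly implemented with a two-phase commit protocol. The notes do not name a protocol, so the following is a sketch of that technique under simplifying assumptions: the class and function names are hypothetical, there is no network, and failures are reduced to a `can_commit` flag.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: vote yes only if the local work can be made permanent."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"

def run_distributed_transaction(participants):
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    if all(votes):
        for p in participants:                    # phase 2: global commit
            p.commit()
        return "committed"
    for p in participants:                        # phase 2: global rollback
        p.rollback()
    return "rolled back"

ok = [Participant("site_a"), Participant("site_b")]
failing = [Participant("site_a"), Participant("site_b", can_commit=False)]

outcome_ok = run_distributed_transaction(ok)
outcome_failing = run_distributed_transaction(failing)
print(outcome_ok, outcome_failing)  # committed rolled back
```

A single "no" vote from site_b is enough to roll back site_a as well, which is how atomicity is kept across sites.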

ACID Properties in Database Management Systems
ACID is an acronym that stands for Atomicity, Consistency, Isolation, and Durability. These are the key properties that ensure the reliability and correctness of database transactions.
ACID Summary
Property | Description | Example
Atomicity | The transaction is an all-or-nothing unit of work. Either all operations succeed, or none do. | A bank transfer that either fully deducts money from one account and deposits it into another, or it doesn't happen at all.
Consistency | Ensures the database moves from one valid state to another, adhering to all rules. | A database rule prevents negative balances. A withdrawal that would result in a negative balance is rejected.
Isolation | Transactions do not interfere with each other. Intermediate results are invisible to other transactions. | A transaction reading an account balance will not see the partial effects of another transaction that is still in progress.
Durability | Once a transaction is committed, its results are permanent, even in the event of a system failure. | After a transaction that updates an account balance, the change is preserved even if the system crashes immediately afterward.

Basic SQL operations
Operation | Description | Example
SELECT | Retrieve data from one or more tables | SELECT * FROM employees;
INSERT | Insert new data into a table | INSERT INTO employees VALUES(1, 'John', 'Doe');
UPDATE | Modify existing data in a table | UPDATE employees SET salary = 5000 WHERE employee_id = 1;
DELETE | Remove data from a table | DELETE FROM employees WHERE employee_id = 1;
CREATE | Create a new table, view, or index | CREATE TABLE employees (...);
ALTER | Modify an existing table (e.g., add/drop columns) | ALTER TABLE employees ADD age INT;
DROP | Delete a table, view, or other database object | DROP TABLE employees;
JOIN | Combine data from multiple tables | SELECT * FROM employees JOIN departments ON ...
GROUP BY | Group rows for aggregate operations | SELECT department, COUNT(*) FROM employees GROUP BY department;
HAVING | Filter results after GROUP BY | SELECT department, COUNT(*) FROM employees GROUP BY department HAVING COUNT(*) > 5;

Principles of DBMS
Principle | Meaning | Example
Data Abstraction | Hide the complex details of how data is stored from users; show only necessary parts | You see a "Student" table, not the underlying files and indexing
Data Independence | Changes in storage structure don't affect how you access the data | Adding a new index doesn't break existing queries
Efficient Data Access | Use indexing, query optimization, and storage techniques to make accessing data fast | Searching a name using an index instead of scanning the whole table
Data Integrity and Security | Ensure that data is accurate, valid, and protected from unauthorized access | Enforcing "Age cannot be negative" rule and password-protecting access
Transaction Management | Ensure that database changes are reliable: either fully complete or not at all (atomicity) | If money is deducted from A and not credited to B, the whole transfer is canceled
Concurrency Control | Allow multiple users to work at the same time without corrupting data | Two users updating their profiles at the same time without issues
Backup and Recovery | Protect data against loss due to failure; allow restoring to a good state | If a system crash happens, you can recover yesterday's backup
Data Modeling and Relationships | Represent real-world entities and their relationships logically | Representing "Students" and "Courses" with foreign key relationships
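The atomicity and consistency examples from the ACID summary (a bank transfer, and a rule against negative balances) can be tried in a small sketch with Python's built-in sqlite3 module. The accounts table, the `transfer` helper, and the amounts are made up for illustration. The credit is done before the debit on purpose, so that a failed transfer has a partial change to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (
    id      INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)  -- consistency rule
);
INSERT INTO accounts VALUES (1, 100), (2, 50);
""")

def transfer(conn, src, dst, amount):
    """Move money as one unit of work; any failure undoes both updates."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))

first_ok = transfer(conn, 1, 2, 30)    # valid transfer
after_first = balances(conn)
second_ok = transfer(conn, 1, 2, 500)  # would make account 1 negative
after_second = balances(conn)

print(first_ok, after_first)    # True {1: 70, 2: 80}
print(second_ok, after_second)  # False {1: 70, 2: 80}
```

In the second transfer the credit to account 2 had already been applied when the debit violated the CHECK rule; the rollback removed it again, so no money was created.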

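The JOIN, GROUP BY, and HAVING rows of the basic SQL operations table can be run end to end with Python's built-in sqlite3 module. The two small tables and their rows are made up for illustration, and the HAVING threshold is lowered to 2 so that the tiny data set still returns a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, department TEXT);
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
INSERT INTO departments VALUES (1, 'HR'), (2, 'IT');
INSERT INTO employees VALUES
    (1, 'John', 1), (2, 'Mina', 2), (3, 'Omar', 2), (4, 'Lea', 2);
""")

# JOIN + GROUP BY: count employees per department.
grouped = conn.execute("""
    SELECT d.department, COUNT(*)
    FROM employees e JOIN departments d ON e.dept_id = d.dept_id
    GROUP BY d.department
    ORDER BY d.department
""").fetchall()

# HAVING: keep only the groups whose count passes the filter.
large_only = conn.execute("""
    SELECT d.department, COUNT(*)
    FROM employees e JOIN departments d ON e.dept_id = d.dept_id
    GROUP BY d.department
    HAVING COUNT(*) > 2
""").fetchall()

print(grouped)     # [('HR', 1), ('IT', 3)]
print(large_only)  # [('IT', 3)]
```

WHERE filters rows before grouping, while HAVING filters the groups afterwards, which is why the count can appear in the HAVING condition.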