DB Cheat Sheet Till Mid
B+ Trees: A tree with a root node, non-leaf nodes, and leaf nodes. Saved on disk, except for the root node, which stays in memory. Pointers in the leaves point to records in the table. Self-balancing through its insertion and deletion rules.
B+ Trees Rules: All leaves are at the same level. If a node has n keys, it has n+1 pointers. Each node has a minimum number of keys and pointers.
B+tree Insertion
Case 1: Available space in node.
1. Just add it.
Case 2: Leaf is full.
1. Create a new node.
2. Move to the new node whatever keys are needed to enable insertion.
3. Add a key in the above level and point to the new node.
4. Which key? Smallest value of right subtree.
Case 3: Non-leaf is full.
1. Create a new node.
2. Point to leaves.
3. Add key in new node.
4. Which key? Smallest value of right subtree.
5. Add a key in the above level and point to the new node.
6. Which key? Smallest value of right subtree of added key.
Case 4: New root.
1. Create a new root node.
2. Point at both old roots.
3. Add key: smallest value of right subtree.
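The four insertion cases can be sketched in a few lines of Python. This is a minimal in-memory sketch, not a disk-based implementation: `ORDER` (the maximum number of keys per node), the `Node` class and the helper names are assumptions made for illustration, and record pointers are left out.

```python
import bisect

ORDER = 3  # assumed maximum number of keys per node

class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # non-leaf only: n keys -> n+1 children

def _insert(node, key):
    """Insert key below node. Returns (separator, new_right_node) if node split, else None."""
    if node.leaf:
        bisect.insort(node.keys, key)            # Case 1: just add it
        if len(node.keys) <= ORDER:
            return None
        mid = (len(node.keys) + 1) // 2          # Case 2: leaf is full
        right = Node(leaf=True)
        right.keys = node.keys[mid:]
        node.keys = node.keys[:mid]
        return right.keys[0], right              # key = smallest value of right subtree
    idx = bisect.bisect_right(node.keys, key)
    split = _insert(node.children[idx], key)
    if split is None:
        return None
    sep, right_child = split
    node.keys.insert(idx, sep)                   # add key in this level
    node.children.insert(idx + 1, right_child)   # and point to the new node
    if len(node.keys) <= ORDER:
        return None
    mid = len(node.keys) // 2                    # Case 3: non-leaf is full
    up = node.keys[mid]                          # smallest value of the new right subtree
    right = Node(leaf=False)
    right.keys = node.keys[mid + 1:]
    right.children = node.children[mid + 1:]
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return up, right

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    new_root = Node(leaf=False)                  # Case 4: new root
    new_root.keys = [sep]
    new_root.children = [root, right]            # point at both old roots
    return new_root

def leaves(node):
    """Key lists of all leaves, left to right."""
    if node.leaf:
        return [node.keys]
    return [ks for child in node.children for ks in leaves(child)]

root = Node(leaf=True)
for k in range(10, 101, 10):
    root = insert(root, k)
print(root.keys, leaves(root))
```

In a leaf split the separator is copied up (it stays in the right leaf), while in a non-leaf split the middle key moves up. Both follow the "smallest value of right subtree" rule.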
B+Tree Deletion
Case 1: No underflow if deleted.
1. Just remove it.
Case 2: Redistribute keys.
1. Borrow the biggest key from the sibling.
2. Change the key in the above level.
Case 3: Merge with sibling.
1. Remove the key we want to delete from the leaf.
2. Remove the pointer and delete the key in the above level.
3. Merge with a sibling.
Case 4: Remove a level (done when we have an incorrect number of pointers).
1. Move the above node's keys to the node below.
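Deletion cases 1 to 3 can be illustrated at leaf level with a small Python sketch. It assumes the leaf and its left sibling are plain sorted lists and that `min_keys` is the minimum number of keys per leaf. The function name and the returned dictionary are hypothetical, and Case 4 (removing a level) is not covered.

```python
def delete_from_leaf(left_sibling, leaf, key, min_keys):
    """Delete key from leaf and report which case applied.

    'separator' is the new key for the above level, or None when the
    parent key is left unchanged (Case 1) or must be removed (Case 3).
    """
    left = list(left_sibling)
    leaf = list(leaf)
    leaf.remove(key)                      # raises ValueError if the key is absent
    if len(leaf) >= min_keys:             # Case 1: no underflow
        return {"case": "no underflow", "left": left, "leaf": leaf, "separator": None}
    if len(left) > min_keys:              # Case 2: redistribute
        leaf.insert(0, left.pop())        # borrow the biggest key from the left sibling
        return {"case": "redistribute", "left": left, "leaf": leaf, "separator": leaf[0]}
    # Case 3: merge; the parent loses one key and one pointer
    return {"case": "merge", "left": left + leaf, "leaf": None, "separator": None}

print(delete_from_leaf([10, 20, 30], [40, 50], 50, min_keys=2))
```

The sketch only looks at a left sibling. With a right sibling, the borrowed key would be its smallest one instead.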
TUT3
Static Hashing Index: An efficient index for searching in O(1). Passes the key to a hash function h() and stores the page number and row in the output cell. Pages are referred to as buckets. On insertion, if the bucket is full, we create an overflow page. If we exceed the limit, we double the hash size and rehash every key (reorganization).
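A minimal Python sketch of the static hashing behaviour described above. It assumes integer keys, h(key) = key mod (number of buckets), and a fixed page capacity. The class and method names are hypothetical, and the "limit" that triggers reorganization is left to the caller because the sheet does not define it.

```python
class StaticHashIndex:
    def __init__(self, n_buckets=4, page_capacity=2):
        self.n_buckets = n_buckets
        self.page_capacity = page_capacity
        # Each bucket is a list of pages: page 0 is the primary page,
        # any further pages are overflow pages.
        self.buckets = [[[]] for _ in range(n_buckets)]

    def _h(self, key):
        return key % self.n_buckets            # assumed hash function

    def insert(self, key, rid):
        """rid is a (page number, row) pair pointing into the table."""
        pages = self.buckets[self._h(key)]
        if len(pages[-1]) == self.page_capacity:
            pages.append([])                   # bucket is full: create an overflow page
        pages[-1].append((key, rid))

    def search(self, key):
        pages = self.buckets[self._h(key)]
        return [rid for page in pages for k, rid in page if k == key]

    def overflow_pages(self):
        return sum(len(pages) - 1 for pages in self.buckets)

    def reorganize(self):
        """Double the hash size and rehash every key."""
        entries = [e for pages in self.buckets for page in pages for e in page]
        self.n_buckets *= 2
        self.buckets = [[[]] for _ in range(self.n_buckets)]
        for key, rid in entries:
            self.insert(key, rid)

idx = StaticHashIndex(n_buckets=2, page_capacity=2)
for row, key in enumerate([0, 2, 4]):
    idx.insert(key, (0, row))
print(idx.overflow_pages())   # keys 0, 2, 4 all hash to bucket 0
idx.reorganize()
print(idx.overflow_pages())
```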
Extendible Hashing Index: Divided into a directory and index buckets. Converts the output of the hash function to binary. Controlled by two variables, i and j: the directory uses i and the index buckets use j. i: number of bits to read from the left. j: number of common bits from the left.
Extendible Hashing Index Insertion
Choose the first i bits from the left, follow the arrow in the directory.
Case 1: Space available in bucket.
1. Insert key.
Case 2: Key is a duplicate and there is no space.
1. Create overflow page.
2. Insert duplicate key.
Case 3: Key is not a duplicate and there is no space.
1. Create new bucket.
2. Increase j in old and new bucket by 1.
3. Insert key and group similar keys.
4. If j > i:
a. Increase i by 1.
b. Restructure directory arrows.
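The insertion cases above can be sketched as follows. Several things are assumptions for illustration only: the hash output is `BITS` = 4 bits wide, h(key) = key mod 16, a bucket that grows past its capacity stands in for an overflow page, and all class and method names are hypothetical.

```python
BITS = 4  # assumed width of the hash output

def h(key):
    """Hash a key to a fixed-width binary string, e.g. 5 -> '0101'."""
    return format(key % (1 << BITS), f"0{BITS}b")

class Bucket:
    def __init__(self, j):
        self.j = j          # number of common bits from the left
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity=2):
        self.i = 1          # number of bits the directory reads from the left
        self.capacity = capacity
        self.directory = [Bucket(1), Bucket(1)]   # entries for prefixes '0' and '1'

    def _slot(self, key):
        return int(h(key)[:self.i], 2)

    def insert(self, key):
        bucket = self.directory[self._slot(key)]
        if len(bucket.keys) < self.capacity:           # Case 1: space available
            bucket.keys.append(key)
            return
        if all(h(k) == h(key) for k in bucket.keys):   # Case 2: duplicate, no space
            bucket.keys.append(key)                    # stands in for an overflow page
            return
        # Case 3: create a new bucket and increase j in both buckets
        bucket.j += 1
        new = Bucket(bucket.j)
        if bucket.j > self.i:                          # j > i: increase i by 1
            self.i += 1
            # each old entry becomes two adjacent entries with the same arrow
            self.directory = [b for b in self.directory for _ in (0, 1)]
        # restructure arrows: prefixes whose j-th bit is 1 now point to the new bucket
        for slot, b in enumerate(self.directory):
            if b is bucket and format(slot, f"0{self.i}b")[bucket.j - 1] == "1":
                self.directory[slot] = new
        pending = bucket.keys + [key]
        bucket.keys = []
        for k in pending:                              # group similar keys
            self.insert(k)

eh = ExtendibleHash(capacity=2)
for k in [1, 9, 12, 10]:
    eh.insert(k)
print(eh.i, [b.keys for b in eh.directory])
```

Re-inserting the pending keys through `insert` lets a second split happen automatically when all keys still share the longer prefix.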
Linear Hashing Index: Converts the output of the hash function to binary. Buckets grow linearly, by 1 only. Controlled by two variables, i and m. i: number of bits to read from the right. m: last used block.
Linear Hashing Index Insertion
Choose the last i bits from the right, call them b.
If bucket b exists, go to bucket number b; else, go to bucket b - 2^(i-1). Indexing starts from 0, and if the index is negative then loop around.
Case 1: Space available in bucket.
1. Insert key.
Case 2: No space available.
1. Create overflow page.
Case 3: Calculate utilization U after insertion; if U > 0.8:
1. Increase m by 1.
2. If all buckets are used (cannot represent m with i bits):
3. Increase i by 1.
4. Check bucket number m - 2^(i-1).
5. Move any key to its correct bucket.
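A small Python sketch of the lookup rule and the three insertion cases. It assumes integer keys with h(key) = key, and it assumes U = (number of keys) / (number of buckets × bucket capacity), which the sheet does not spell out. A bucket list that grows past `capacity` stands in for an overflow page, and the class name is hypothetical.

```python
class LinearHash:
    def __init__(self, capacity=2, threshold=0.8):
        self.i = 1                     # number of bits to read from the right
        self.m = 1                     # last used bucket; buckets 0..m exist
        self.capacity = capacity
        self.threshold = threshold
        self.buckets = [[], []]
        self.count = 0

    def bucket_no(self, key):
        b = key % (1 << self.i)        # last i bits of the key
        if b > self.m:                 # bucket b does not exist yet
            b -= 1 << (self.i - 1)
        return b

    def insert(self, key):
        # Cases 1 and 2: append; entries beyond capacity model an overflow page
        self.buckets[self.bucket_no(key)].append(key)
        self.count += 1
        utilization = self.count / ((self.m + 1) * self.capacity)
        if utilization > self.threshold:       # Case 3
            self._grow()

    def _grow(self):
        self.m += 1                            # 1. increase m by 1
        self.buckets.append([])
        if self.m >= (1 << self.i):            # 2. m not representable with i bits
            self.i += 1                        # 3. increase i by 1
        source = self.m - (1 << (self.i - 1))  # 4. check bucket m - 2^(i-1)
        old = self.buckets[source]
        self.buckets[source] = []
        for k in old:                          # 5. move keys to their correct bucket
            self.buckets[self.bucket_no(k)].append(k)

lh = LinearHash(capacity=2)
for k in [0, 1, 2, 3]:
    lh.insert(k)
print(lh.i, lh.m, lh.buckets)
```

After the fourth insert U = 4/4 exceeds 0.8, so bucket 2 is added and the keys of bucket 0 are redistributed using two bits.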
TUT4
Multidimensional Indices: Used to build an index on multiple columns, thus helping in multi-key queries. Must also be efficient in partial queries: queries that don't use all of the indexed columns.
Partition Hash Index: Used to build an index on multiple columns. Hashes the keys from the different columns and concatenates the results to get the final hash value. Pros: O(1) access plus K checks inside the cell. Can answer partial queries. Cons: No nearest neighbor queries.
Grid Index: N-dimensional array or grid, each dimension being the range of values of an indexed column. Accessed like an array. Cons: Space and management overhead. Need to find correct, evenly distributed ranges.
Kd tree Index: Binary tree with alternating levels for each indexed column.
Kd tree index inserting: If inserting and the page is full: create a new page, create a new level with the next alternate column, and choose the middle value to put in the node.
Bitmap Index: Create a bitmap for each indexed column. A bitmap is a collection of bit vectors; each vector marks the occurrences of a specific value with 1. We can use bitwise operations like AND and OR. Pros: Bitwise operations are very fast. Can answer many types of queries (but not nearest neighbor). Cons: The length of a vector could get very big (= size of column), though it can be compressed.
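The bitmap idea can be shown with Python integers used as bit vectors, where bit r is set when row r holds the value. This is only a sketch: the column contents and the function names are made up for illustration, and no compression is applied.

```python
def build_bitmap(column):
    """Map each distinct value to an int whose bit r is 1 if row r has that value."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps

def rows(bits):
    """Row numbers whose bit is set."""
    return [r for r in range(bits.bit_length()) if (bits >> r) & 1]

# Hypothetical table with two indexed columns
city = ["Cairo", "Giza", "Cairo", "Alex"]
grade = ["A", "B", "A", "A"]
city_idx = build_bitmap(city)
grade_idx = build_bitmap(grade)

# WHERE city = 'Cairo' AND grade = 'A'
both = city_idx["Cairo"] & grade_idx["A"]
# WHERE city = 'Cairo' OR city = 'Alex'
either = city_idx["Cairo"] | city_idx["Alex"]
print(rows(both), rows(either))
```

Each vector needs one bit per table row, which is the "length of vector = size of column" cost noted above.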
TUT5
T(R): number of tuples in R. S(R): bytes in each tuple of R. B(R): blocks (pages) needed to hold all tuples of R. V(R,A): number of distinct values in R for attribute A.
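As a rough illustration of how these statistics are used, the sketch below computes B(R) from T(R) and S(R) and estimates the size of an equality selection as T(R)/V(R,A). Both formulas are common textbook estimates rather than something stated in this sheet. They assume tuples do not span pages and that values of A are uniformly distributed. The function names and numbers are made up.

```python
import math

def blocks_needed(t, s, page_size):
    """B(R) for T(R)=t tuples of S(R)=s bytes, assuming tuples never span pages."""
    tuples_per_page = page_size // s
    return math.ceil(t / tuples_per_page)

def estimate_equality_selection(t, v):
    """Estimated tuples returned by 'A = constant', assuming uniform values: T(R)/V(R,A)."""
    return t / v

# Hypothetical relation: 10,000 tuples of 200 bytes, 4096-byte pages, 50 distinct values of A
print(blocks_needed(10_000, 200, 4096))
print(estimate_equality_selection(10_000, 50))
```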
Takeaways
TA-1: Index: a data structure built on one or more columns to enhance the performance of SQL queries on those columns. It is continuously loaded into memory. Without an index, the database engine would do a linear search in the data, O(n), to answer a query. Using an index, the performance is typically enhanced to O(log n) or O(1).
TA-2: Primary index: built on the primary column, which is the column typically used to sort the table on the hard disk.
TA-3: Dense index has at least 1 instance from every value appearing in the column we are building the index on.
TA-4: Sparse index has a small percentage (typically 10%) of the values appearing in the column we are building the index on. It has a value for each page. Better in inserting or deleting. Smaller, so it has a better chance to fit in memory. Gets updated less often compared to dense indexes.
TA-5: Secondary index is an index built on a non-primary column. First level must be dense. Must be sorted.
TA-7: Tables and indices are paged. Page size is related to cylinder size.
TA-9: BRIN index: stores the minimum and maximum values in each table page. It is efficient only in two scenarios: analytical queries and queries involving narrow-range search on columns with a huge range of values.
TA-10: B+tree index should be used when you want to enhance the performance of: SQL queries on exact value, SQL queries with a range search, and SQL queries containing aggregate functions.
TA-11: The clustering (i.e. sorting) column does not have to be the key column. Any column could be used in clustering the data in the pages of the table.
TA-12: Spatial data: location and area information using geometric data types (lines, polygons, etc.).
TA-13: R-tree: a modified B+ tree for indexing spatial data. Enables fast execution of exact value (on polygons) and range queries, as well as proximity, adjacency and containment spatial queries.
TA-16: Partitioned hash mechanism enables the representation of several columns in the same hash table, hence executing multi-term exact-value queries in O(1). It is inefficient for partial exact-value queries run on the index.
TA-20: Bitmap index supports both single and multidimensional data. Its greatest advantage is the ability to use bitwise operations. The disadvantage is the long sequence of bits to be stored.
TA-21: Translations of a SQL query: parsing, logical and physical. Parsing is for syntax checking. Logical and physical are for optimization and execution purposes.
TA-22: For even a simple SQL query, there are several alternative relational algebra trees, all producing the same result set. One of them will be optimal.
TA-23: Three heuristics are applied to a relational algebra tree to try to reduce the number of tuples propagated upward:
1. push selections down,
2. push projections down,
3. perform the most restrictive joins and selections first.
TA-24: The database engine maintains statistics about each relation and also its columns to pick the relational algebra tree with the minimum estimated sum in its internal nodes.
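The BRIN idea from TA-9 (store the minimum and maximum of each table page, then skip pages that cannot match) can be sketched in Python. Pages are modelled as plain lists of column values, and the function names are hypothetical. Real engines summarise ranges of pages rather than single lists.

```python
def build_brin(pages):
    """One (min, max) summary per table page."""
    return [(min(page), max(page)) for page in pages]

def range_query(pages, brin, lo, hi):
    """Return matching values and the number of pages actually scanned."""
    hits, scanned = [], 0
    for page, (pmin, pmax) in zip(pages, brin):
        if pmax < lo or pmin > hi:
            continue                      # page cannot contain a match: skip it
        scanned += 1
        hits.extend(v for v in page if lo <= v <= hi)
    return hits, scanned

# Hypothetical table whose pages hold roughly increasing values
pages = [[1, 3, 5], [10, 12, 15], [20, 22, 29]]
brin = build_brin(pages)
print(range_query(pages, brin, 11, 14))
```

The narrow range touches a single page here because the page ranges do not overlap. On unsorted data most pages would overlap the query range and little would be skipped.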