When data size increases you will realize your database queries responds slowly. Lets talk how databases handles such situations using B-Trees.
Disk Drive:
Each data access call needs to read/write to Disk Drive.
Hard Disk Drive has circular disks divided in Tracks & Sectors. An area marked between a Track & a Sector is a Block. A Block is of usually 512 bytes and contains data in from of bits and is read via a Disk Arm .
To access any data you would need Block Address(based on Track & Sector) & Offset in the Block to identify the data/record address.
Database:
Enough about Disk Drives, lets talk about Database. Consider a database table Employee having columns EmpId, Name, Address. Each column taking 40 bytes, so each record totaling 120 bytes. With 1000 such records we would need 120x1000 = 120 000 Bytes = 120 KB space that would need 120,000/512 = 234 blocks on Hard Disk. A database query will require to search through all 234 Blocks to get the required record.
Indexing:
To speed up the record search lets bring in Indexing, for that we need a index table IndexTable1 that points to Record Address. Each index will then have a EmpId(40 Byte) and a Record Address Pointer(10Byte) totaling 50 Byte. 50Byte *1000Records = 50 000 Bytes which will need only 50000/512 = 100 Blocks thus reducing search time in database with a trade of extra space. Each block can store 10 Records.
Multi Level Indexing:
We need database to respond in no time, current approach still takes much time, lets bring in multi level indexing . Lets bring another index table IndexTable2 that keeps track of Disk block that keeps that index record. It tells that EmpId 1 to 10 are stored in Block 1 and so on. This reduces time of access by a lot of margin. Similarly more Index Tables can be brought in to further speed up the data access.
On further analysis you will identify indexes are just rotated trees.
M-Way Search Trees:
M-way Search Trees are just like Binary Search Trees with a difference that it can have more than 2 children. An M-way search tree will have utmost M children and M-1 keys. The keys are in sorted order such that k1 < k2 < k3<…<km-1 .
Here in root node ,keys are in sorted order and like BST all values less than 10 falls in its left children & values greater than 10 & less than 20( 10 < child node < 20 ) lies in child node between them , this goes on for all the nodes.
For our index tree will look something like this including one Record Pointer through each key.
2–3 tree is a M-way Search Tree with M=3 .
Though the M-way Search Tree looks fast but it has no rule for filling keys or filling child nodes first. This could degrade performance wildly.
B-Tree:
B-Tree is a M-way Search Tree & resolves issues faced with M-way Search Tree with certain rules. Rules are:
- M/2 keys must be added to a node before creating its child nodes
- Root Node must have at-least 2 children
- Leaf nodes are at the same level
- Bottom up creation of tree
Now try adding few records to our B-Tree with M=4 to see the rules in action.
[10, 20, 40, 50, 60 , 3, 70, 80, 15, 5, 12]
- Step 1: Create a root node & add key 10 to it
- Step 2: Since root node has available space lets add 20 to root Node in sorted order
- Step 3: Root Node has available space for 1 more key lets add 40 to it. Root Node is full now.
- Step 4: Since root is full lets break down root node into [10,20] & [50]. Sending mid element 40 up making it the root node.
- Step 5: Lookup where 60 will be placed, since 60 is greater than 50 & its node has available space lets add 60 besides 50
- Step 6: Lookup 3, since 3 is less than 10, it is placed before 10
- Step 7: Lookup 70, since 70 is greater than 60, placing it next to 60
- Step 8: Lookup 80, 80 is greater than 80, but 70’s node is full with [50,60,70] & 80 . Lets split into [50,60] & [80] & sending mid element 70 up in Root Node.
- Step 9: Lookup 15, 15 falls with [3,10,20]. Lets sort it to 3, 10,15,20 . Split into two [3,10] & [20]. Sending 15 mid element up to root node.
- Step 10: Lookup 5, 5 falls in [3.10] Node & added to it.
- Step 11; Lookup 12, it falls in [3,5,10] Node. Split it into [3,5]& [10,12]& sending mid element 10 up. Root node is full hence split into [10,15] & [70] & sending mid element 40 up creating it th root node.
B+Tree
This is same as B-Tree only difference is :
- Leaf nodes has records/data
- All keys are present in leaf node, duplicates may exist
- All leafs are connected as a LinkedList
BTree & B+Tree are widely used in optimizing database access optimization. CloudDb, Postgres, MySQL, SQLite & many.
References: