B-Tree and B+Tree


B-tree and B+tree are the most commonly used data structure for building index in database/file system. The reason is it has a higher branching factors over balanced BST(like red black tree) thus it has a lower height. The reason is nodes need to be accessed for BST and B-Tree(B+tree) are both logm(n) where m is the number of branches on each node. lower height means fewer access to the disk to retrieve the nodes, which is what matters for database performance. Database usually stores huge amounts of data and the index can be too large to fit into memory as well and instead stored on the disk. Access to the disk is much slower than memory access. Although higher branching factor means more time in finding the key in the node itself performed in memory, it’s almost ignorable comparing to the time in disk I/O.

Usuallly the database created a node with a size exactly equal to the size of one page so it can be read using one read I/O.
B+TREE’s advantages over B tree:

1. since its internal nodes don’t store data but only store keys, it can have more keys than B-tree within same size of space (one page normally). It means higher branching factor and lower height.
2. Faster full-scan of data using the linked list of all data items at the bottom: fewer cache misses comparing a full-tree traversal of B-tree.

Advantages of B-tree over B+tree: data is closer to the root. thus if some node is much more frequently accessed and we can gain some performance if it’s closer to the root.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s