Insert Operation in B-Tree

In the previous post, we introduced B-Trees and discussed the search() and traverse() functions. In this post, we will discuss the insert() operation. A new key is always inserted at the leaf node. Let the key to be inserted be k. Like in a Binary Search Tree (BST), we start from the root and traverse down until we reach a leaf node. Once we reach a leaf node, we insert the key there. Unlike BSTs, we have a predefined range on the number of keys that a node can contain. So before inserting a key into the node, we must ensure that there is extra space.

How do we ensure that a node has space available for a key before it is inserted? We use an operation called splitChild(), which is used to split a child of a node. See the following diagram to understand the split. In this diagram, child y of x is being split into two nodes y and z. Note that the splitChild operation moves a key up, which is the reason B-Trees grow up, unlike BSTs, which grow down.

BTreeSplit

As discussed, to insert a new key, we go down from the root to the leaf. Before traversing down to a node, we first check if the node is full. If the node is full, we split it to create space. The complete algorithm is as follows:

Insertion Algorithm

Initialize x as root.
While x is not a leaf, do the following:
- a) Find the child of x that will be traversed next. Let the child be y.
- b) If y is not full, change x to point to y.
- c) If y is full, split it and change x to point to one of the two parts of y. If k is smaller than the middle key in y, set x as the first part of y. Otherwise, set it as the second part of y. When we split y, we move a key from y to its parent x.
The loop in step 2 stops when x is a leaf. x must have space for 1 extra key, as we have been splitting all nodes in advance. So, simply insert k into x.

Note that the algorithm follows the Cormen book. This is a proactive insertion algorithm; before going down to a node, we split it if it is full. The advantage of this preemptive splitting is that we never traverse a node twice. If we were to only split a node after a new key is inserted (a reactive approach), we might end up traversing all nodes from the leaf to the root again, especially when all nodes on the path from the root to the leaf are full. This may cause a cascading effect where splitting occurs up to the root.

However, there is a disadvantage to this proactive insertion: we may perform unnecessary splits.

Example Insertion Sequence

Let us understand the algorithm with an example B-Tree of minimum degree t as 3 and a sequence of integers: 10, 20, 30, 40, 50, 60, 70, 80, and 90 in an initially empty B-Tree. Initially, the root is NULL. Let us first insert 10.

Btree1

Next, let us insert 20, 30, 40, and 50. They will all be inserted in the root because the maximum number of keys a node can accommodate is (2t - 1), which is 5.

BTree2Ins

Now, let us insert 60. Since the root node is full, it will first split into two, and then 60 will be inserted into the appropriate child.

BTreeIns3

Let us now insert 70 and 80. These new keys will be inserted into the appropriate leaf without any splits.

BTreeIns4

Finally, let us insert 90. This insertion will cause a split, with the middle key moving up to the parent.

BTreeIns6

Following are the implementations of the above proactive algorithm.

C++

#include<iostream>
using namespace std;

// A BTree node
class BTreeNode
{
    int *keys;  // An array of keys
    int t;      // Minimum degree (defines the range for number of keys)
    BTreeNode **C; // An array of child pointers
    int n;     // Current number of keys
    bool leaf; // Is true when node is leaf. Otherwise false
public:
    BTreeNode(int _t, bool _leaf);   // Constructor

    // A utility function to insert a new key in the subtree rooted with
    // this node. The assumption is, the node must be non-full when this
    // function is called
    void insertNonFull(int k);

    // A utility function to split the child y of this node. i is index of y in
    // child array C[].  The Child y must be full when this function is called
    void splitChild(int i, BTreeNode *y);

    // A function to traverse all nodes in a subtree rooted with this node
    void traverse();

    // A function to search a key in the subtree rooted with this node.
    BTreeNode *search(int k);   // returns NULL if k is not present.

// Make BTree friend of this so that we can access private members of this
// class in BTree functions
friend class BTree;
};

// A BTree
class BTree
{
    BTreeNode *root; // Pointer to root node
    int t;  // Minimum degree
public:
    // Constructor (Initializes tree as empty)
    BTree(int _t)
    {  root = NULL;  t = _t; }

    // function to traverse the tree
    void traverse()
    {  if (root != NULL) root->traverse(); }

    // function to search a key in this tree
    BTreeNode* search(int k)
    {  return (root == NULL)? NULL : root->search(k); }

    // The main function that inserts a new key in this B-Tree
    void insert(int k);
};

// Constructor for BTreeNode class
BTreeNode::BTreeNode(int t1, bool leaf1)
{
    // Copy the given minimum degree and leaf property
    t = t1;
    leaf = leaf1;

    // Allocate memory for maximum number of possible keys
    // and child pointers
    keys = new int[2*t-1];
    C = new BTreeNode *[2*t];

    // Initialize the number of keys as 0
    n = 0;
}

// Function to traverse all nodes in a subtree rooted with this node
void BTreeNode::traverse()
{
    // There are n keys and n+1 children, traverse through n keys
    // and first n children
    int i;
    for (i = 0; i < n; i++)
    {
        // If this is not leaf, then before printing key[i],
        // traverse the subtree rooted with child C[i].
        if (leaf == false)
            C[i]->traverse();
        cout << " " << keys[i];
    }

    // Print the subtree rooted with last child
    if (leaf == false)
        C[i]->traverse();
}

// Function to search key k in subtree rooted with this node
BTreeNode *BTreeNode::search(int k)
{
    // Find the first key greater than or equal to k
    int i = 0;
    while (i < n && k > keys[i])
        i++;

    // If the found key is equal to k, return this node
    if (keys[i] == k)
        return this;

    // If key is not found here and this is a leaf node
    if (leaf == true)
        return NULL;

    // Go to the appropriate child
    return C[i]->search(k);
}

// The main function that inserts a new key in this B-Tree
void BTree::insert(int k)
{
    // If tree is empty
    if (root == NULL)
    {
        // Allocate memory for root
        root = new BTreeNode(t, true);
        root->keys[0] = k;  // Insert key
        root->n = 1;  // Update number of keys in root
    }
    else // If tree is not empty
    {
        // If root is full, then tree grows in height
        if (root->n == 2*t-1)
        {
            // Allocate memory for new root
            BTreeNode *s = new BTreeNode(t, false);

            // Make old root as child of new root
            s->C[0] = root;

            // Split the old root and move 1 key to the new root
            s->splitChild(0, root);

            // New root has two children now.  Decide which of the
            // two children is going to have new key
            int i = 0;
            if (s->keys[0] < k)
                i++;
            s->C[i]->insertNonFull(k);

            // Change root
            root = s;
        }
        else  // If root is not full, call insertNonFull for root
            root->insertNonFull(k);
    }
}

// A utility function to insert a new key in this node
// The assumption is, the node must be non-full when this
// function is called
void BTreeNode::insertNonFull(int k)
{
    // Initialize index as index of rightmost element
    int i = n-1;

    // If this is a leaf node
    if (leaf == true)
    {
        // The following loop does two things
        // a) Finds the location of new key to be inserted
        // b) Moves all greater keys to one place ahead
        while (i >= 0 && keys[i] > k)
        {
            keys[i+1] = keys[i];
            i--;
        }

        // Insert the new key at found location
        keys[i+1] = k;
        n = n+1;
    }
    else // If this node is not leaf
    {
        // Find the child which is going to have the new key
        while (i >= 0 && keys[i] > k)
            i--;

        // See if the found child is full
        if (C[i+1]->n == 2*t-1)
        {
            // If the child is full, then split it
            splitChild(i+1, C[i+1]);

            // After split, the middle key of C[i] goes up and
            // C[i] is splitted into two.  See which of the two
            // is going to have the new key
            if (keys[i+1] < k)
                i++;
        }
        C[i+1]->insertNonFull(k);
    }
}

// A utility function to split the child y of this node
// Note that y must be full when this function is called
void BTreeNode::splitChild(int i, BTreeNode *y)
{
    // Create a new node which is going to store (t-1) keys
    // of y
    BTreeNode *z = new BTreeNode(y->t, y->leaf);
    z->n = t - 1;

    // Copy the last (t-1) keys of y to z
    for (int j = 0; j < t-1; j++)
        z->keys[j] = y->keys[j+t];

    // Copy the last t children of y to z
    if (y->leaf == false)
    {
        for (int j = 0; j < t; j++)
            z->C[j] = y->C[j+t];
    }

    // Reduce the number of keys in y
    y->n = t - 1;

    // Since this node is going to have a new child,
    // create space of new child
    for (int j = n; j >= i+1; j--)
        C[j+1] = C[j];

    // Link the new child to this node
    C[i+1] = z;

    // A key of y will move to this node. Find the location of
    // new key and move all greater keys one space ahead
    for (int j = n-1; j >= i; j--)
        keys[j+1] = keys[j];

    // Copy the middle key of y to this node
    keys[i] = y->keys[t-1];

    // Increment count of keys in this node
    n = n + 1;
}

// Driver program to test above functions
int main()
{
    BTree t(3); // A B-Tree with minimum degree 3
    t.insert(10);
    t.insert(20);
    t.insert(5);
    t.insert(6);
    t.insert(12);
    t.insert(30);
    t.insert(7);
    t.insert(17);

    cout << "Traversal of the constructed tree is ";
    t.traverse();

    int k = 6;
    (t.search(k) != NULL)? cout << "\nPresent" : cout << "\nNot Present";

    k = 15;
    (t.search(k) != NULL)? cout << "\nPresent" : cout << "\nNot Present";

    return 0;
}

JAVA

import java.util.Arrays;

class BTreeNode {
    int[] keys; // An array of keys
    int t;      // Minimum degree
    BTreeNode[] C; // An array of child pointers
    int n;     // Current number of keys
    boolean leaf; // True if leaf node, otherwise false

    // Constructor
    BTreeNode(int t, boolean leaf) {
        this.t = t;
        this.leaf = leaf;
        this.keys = new int[2 * t - 1];
        this.C = new BTreeNode[2 * t];
        this.n = 0;
    }

    // Function to traverse all nodes in a subtree rooted with this node
    public void traverse() {
        int i;
        for (i = 0; i < n; i++) {
            if (!leaf) {
                C[i].traverse();
            }
            System.out.print(keys[i] + " ");
        }
        if (!leaf) {
            C[i].traverse();
        }
    }

    // Function to search a key in subtree rooted with this node
    public BTreeNode search(int k) {
        int i = 0;
        while (i < n && k > keys[i]) {
            i++;
        }

        if (i < n && keys[i] == k) {
            return this;
        }

        if (leaf) {
            return null;
        }
        return C[i].search(k);
    }

    // Insert a new key in this node
    public void insertNonFull(int k) {
        int i = n - 1;

        if (leaf) {
            while (i >= 0 && keys[i] > k) {
                keys[i + 1] = keys[i];
                i--;
            }
            keys[i + 1] = k;
            n++;
        } else {
            while (i >= 0 && keys[i] > k) {
                i--;
            }
            if (C[i + 1].n == 2 * t - 1) {
                splitChild(i + 1, C[i + 1]);

                if (keys[i + 1] < k) {
                    i++;
                }
            }
            C[i + 1].insertNonFull(k);
        }
    }

    // Split the child y of this node
    public void splitChild(int i, BTreeNode y) {
        BTreeNode z = new BTreeNode(y.t, y.leaf);
        z.n = t - 1;

        for (int j = 0; j < t - 1; j++) {
            z.keys[j] = y.keys[j + t];
        }

        if (!y.leaf) {
            for (int j = 0; j < t; j++) {
                z.C[j] = y.C[j + t];
            }
        }

        y.n = t - 1;

        for (int j = n; j >= i + 1; j--) {
            C[j + 1] = C[j];
        }
        C[i + 1] = z;

        for (int j = n - 1; j >= i; j--) {
            keys[j + 1] = keys[j];
        }
        keys[i] = y.keys[t - 1];
        n++;
    }
}

class BTree {
    BTreeNode root;
    int t; // Minimum degree

    // Constructor
    public BTree(int t) {
        this.root = null;
        this.t = t;
    }

    // Function to traverse the tree
    public void traverse() {
        if (root != null) {
            root.traverse();
        }
    }

    // Function to search a key in this tree
    public BTreeNode search(int k) {
        return (root == null) ? null : root.search(k);
    }

    // Insert a new key in this B-Tree
    public void insert(int k) {
        if (root == null) {
            root = new BTreeNode(t, true);
            root.keys[0] = k;
            root.n = 1;
        } else {
            if (root.n == 2 * t - 1) {
                BTreeNode s = new BTreeNode(t, false);
                s.C[0] = root;
                s.splitChild(0, root);

                int i = 0;
                if (s.keys[0] < k) {
                    i++;
                }
                s.C[i].insertNonFull(k);
                root = s;
            } else {
                root.insertNonFull(k);
            }
        }
    }
}

// Driver program to test above functions
public class Main {
    public static void main(String[] args) {
        BTree t = new BTree(3); // A B-Tree with minimum degree 3
        t.insert(10);
        t.insert(20);
        t.insert(5);
        t.insert(6);
        t.insert(12);
        t.insert(30);
        t.insert(7);
        t.insert(17);

        System.out.print("Traversal of the constructed tree is: ");
        t.traverse();

        int k = 6;
        System.out.println("\nSearch for key " + k + ": " + (t.search(k) != null ? "Present" : "Not Present"));

        k = 15;
        System.out.println("Search for key " + k + ": " + (t.search(k) != null ? "Present" : "Not Present"));
    }
}

Output:
Traversal of the constructed tree is: 5 6 7 10 12 17 20 30
Present
Not Present

The B-tree is a data structure similar to a binary search tree but allows multiple keys per node and has a higher fanout. This enables the B-tree to store a large amount of data efficiently, making it commonly used in databases and file systems.

Characteristics of B-Tree

Balanced Tree: All paths from the root to a leaf have the same length.
Minimum Degree t:
- Each non-root node must contain at least (t - 1) keys and can contain at most (2t - 1) keys.
- The root can have at least one key and at most (2t - 1) keys.
- Each node can have at most (2t) children.

Supported Operations

Search(k): Searches for a key k in the tree.
Insert(k): Inserts a key k into the tree.
Delete(k): Deletes a key k from the tree.

Search Operation

The search operation is similar to that of a binary search tree, where you traverse down the tree based on the comparison of the key.