Bloom Filter Algorithm
A Bloom Filter is a probabilistic data structure that enables efficient membership testing. It is used when space is limited, and false positives are acceptable, but false negatives are not.
Inputsβ
- Size of Bit Array (
m): Number of bits in the array. - Number of Hash Functions (
k): Number of hash functions to use. - Bit Array (
bit_array): An array of sizem, initialized to 0. - Hash Functions: A set of
kindependent hash functions, each generating an index within the range[0, m-1].
Bloom Filter Operationsβ
1. Insertion (add(item))β
This operation inserts an item by marking specific positions in the bit array based on hashed values.
Steps:
- For each of the
khash functions:- Hash the
itemto get an index in the bit array. - Set the bit at this index in
bit_arrayto1.
- Hash the
Pseudocode:
function add(item):
for i in 0 to k-1:
index = hash_function[i](item) % m
bit_array[index] = 1
2. Membership Check (contains(item))β
This operation checks the itemβs membership by verifying specific positions in the bit array. If all relevant bits are set to 1, the item is likely in the set. If any bit is 0, the item is definitely not in the set.
Steps:
- For each of the
khash functions:- Hash the
itemto get an index in the bit array. - Check if the bit at this index is
1.
- Hash the
- If all checked bits are
1, return True (item is likely in the set). - If any bit is
0, return False (item is definitely not in the set).
Pseudocode:
function contains(item):
for i in 0 to k-1:
index = hash_function[i](item) % m
if bit_array[index] == 0:
return False
return True
Example Workflowβ
- Initialize a Bloom Filter with
m = 10bits andk = 3hash functions. - Insert an item,
"apple":- Hash
"apple"with each of the 3 hash functions. - Compute bit array indices for
"apple"and set those indices to1inbit_array.
- Hash
- Check Membership of
"apple":- Use the hash functions to compute indices for
"apple"again. - If all the indices are set to
1, then"apple"is likely in the set. If any index is0,"apple"is definitely not in the set.
- Use the hash functions to compute indices for