Bloom Filter Algorithm
A Bloom Filter is a probabilistic data structure that enables efficient membership testing. It is used when space is limited, and false positives are acceptable, but false negatives are not.
Inputsβ
- Size of Bit Array (
m
): Number of bits in the array. - Number of Hash Functions (
k
): Number of hash functions to use. - Bit Array (
bit_array
): An array of sizem
, initialized to 0. - Hash Functions: A set of
k
independent hash functions, each generating an index within the range[0, m-1]
.
Bloom Filter Operationsβ
1. Insertion (add(item)
)β
This operation inserts an item by marking specific positions in the bit array based on hashed values.
Steps:
- For each of the
k
hash functions:- Hash the
item
to get an index in the bit array. - Set the bit at this index in
bit_array
to1
.
- Hash the
Pseudocode:
function add(item):
for i in 0 to k-1:
index = hash_function[i](item) % m
bit_array[index] = 1
2. Membership Check (contains(item)
)β
This operation checks the itemβs membership by verifying specific positions in the bit array. If all relevant bits are set to 1
, the item is likely in the set. If any bit is 0
, the item is definitely not in the set.
Steps:
- For each of the
k
hash functions:- Hash the
item
to get an index in the bit array. - Check if the bit at this index is
1
.
- Hash the
- If all checked bits are
1
, return True (item is likely in the set). - If any bit is
0
, return False (item is definitely not in the set).
Pseudocode:
function contains(item):
for i in 0 to k-1:
index = hash_function[i](item) % m
if bit_array[index] == 0:
return False
return True
Example Workflowβ
- Initialize a Bloom Filter with
m = 10
bits andk = 3
hash functions. - Insert an item,
"apple"
:- Hash
"apple"
with each of the 3 hash functions. - Compute bit array indices for
"apple"
and set those indices to1
inbit_array
.
- Hash
- Check Membership of
"apple"
:- Use the hash functions to compute indices for
"apple"
again. - If all the indices are set to
1
, then"apple"
is likely in the set. If any index is0
,"apple"
is definitely not in the set.
- Use the hash functions to compute indices for