Skip to main content

Disjoint Set Union (DSU) / Union-Find

Disjoint Set Union (DSU), also called Union-Find, is a powerful data structure that efficiently tracks a collection of elements partitioned into non-overlapping (disjoint) sets. It answers two questions in nearly O(1) time:

  1. Find: Which set does this element belong to?
  2. Union: Merge two sets together.

๐Ÿง  Real-World Analogyโ€‹

Think of students in different friend groups:

Initially everyone is their own group:
{Alice} {Bob} {Charlie} {David} {Eve}

After Alice and Bob become friends โ†’ Union(Alice, Bob):
{Alice, Bob} {Charlie} {David} {Eve}

After Charlie and David become friends โ†’ Union(Charlie, David):
{Alice, Bob} {Charlie, David} {Eve}

After Bob and Charlie become friends โ†’ Union(Bob, Charlie):
{Alice, Bob, Charlie, David} {Eve}

Are Alice and David in the same group? โ†’ Find(Alice) == Find(David)? โ†’ YES โœ…
Are Alice and Eve in the same group? โ†’ Find(Alice) == Find(Eve)? โ†’ NO โŒ

๐Ÿข Naive Implementation (Without Optimizations)โ€‹

Each element points to a parent. The root of the tree is the representative of the set.

Initial state: parent[i] = i (everyone is their own parent)
Elements: 0 1 2 3 4
parent: 0 1 2 3 4

Union(0,1): parent[1] = 0
0 1 2 3 4
|
1

Union(1,2): parent[2] = 1
0
|
1
|
2

Union(2,3): parent[3] = 2
0
|
1
|
2
|
3

โš ๏ธ Problem: Tree becomes a chain โ†’ Find() takes O(n) per call!


โšก Optimization 1: Union by Rankโ€‹

Idea: Always attach the shorter tree under the taller tree to keep the tree flat.

Union by Rank โ€” always attach smaller rank under larger rank:

rank[0]=0 rank[1]=0 rank[2]=0 rank[3]=0

Union(0,1): rank equal โ†’ make 0 root of 1, rank[0]++
0(rank=1)
|
1(rank=0)

Union(2,3): rank equal โ†’ make 2 root of 3, rank[2]++
2(rank=1)
|
3(rank=0)

Union(0,2): rank equal โ†’ make 0 root of 2, rank[0]++
0(rank=2)
/ \
1 2(rank=1)
|
3

Tree stays BALANCED โ†’ max height = O(log n)

โšก Optimization 2: Path Compressionโ€‹

Idea: When we call Find(x), make every node on the path point directly to the root. This flattens the tree for all future calls.

Before Path Compression:
Find(3) traverses: 3 โ†’ 2 โ†’ 0 (root)

0
|
2
|
3

After Path Compression (Find(3)):
All nodes on path now point directly to root 0:

0
/|\
1 2 3 โ† 3 now points directly to 0!

Next Find(3) = O(1) โœ…

๐Ÿ”ฅ Combined: Path Compression + Union by Rankโ€‹

This gives nearly O(1) amortized time per operation โ€” the most efficient possible.

Time Complexity: O(ฮฑ(n)) per operation
where ฮฑ = inverse Ackermann function โ‰ˆ constant (โ‰ค 5) for all practical inputs

๐Ÿ’ป Full Implementationโ€‹

class DSU:
def __init__(self, n):
self.parent = list(range(n)) # parent[i] = i initially
self.rank = [0] * n # rank[i] = 0 initially

def find(self, x):
# Path Compression: point directly to root
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]

def union(self, x, y):
root_x = self.find(x)
root_y = self.find(y)

if root_x == root_y:
return False # Already in same set

# Union by Rank: attach smaller rank under larger rank
if self.rank[root_x] < self.rank[root_y]:
self.parent[root_x] = root_y
elif self.rank[root_x] > self.rank[root_y]:
self.parent[root_y] = root_x
else:
self.parent[root_y] = root_x
self.rank[root_x] += 1

return True # Successfully merged

def connected(self, x, y):
return self.find(x) == self.find(y)


# Example Usage
dsu = DSU(5)
dsu.union(0, 1)
dsu.union(2, 3)
dsu.union(1, 2)

print(dsu.connected(0, 3)) # True โ†’ 0-1-2-3 all connected
print(dsu.connected(0, 4)) # False โ†’ 4 is isolated

๐Ÿ” Step-by-Step Dry Runโ€‹

n=6, Elements: 0 1 2 3 4 5
Initial: parent=[0,1,2,3,4,5] rank=[0,0,0,0,0,0]

โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
Union(0, 1):
find(0)=0, find(1)=1
rank equal โ†’ parent[1]=0, rank[0]=1
parent=[0,0,2,3,4,5] rank=[1,0,0,0,0,0]

0(rank=1)
|
1

โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
Union(2, 3):
find(2)=2, find(3)=3
rank equal โ†’ parent[3]=2, rank[2]=1
parent=[0,0,2,2,4,5] rank=[1,0,1,0,0,0]

0(rank=1) 2(rank=1)
| |
1 3

โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
Union(4, 5):
find(4)=4, find(5)=5
rank equal โ†’ parent[5]=4, rank[4]=1
parent=[0,0,2,2,4,4] rank=[1,0,1,0,1,0]

0(rank=1) 2(rank=1) 4(rank=1)
| | |
1 3 5

โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
Union(0, 2):
find(0)=0, find(2)=2
rank[0]=1 == rank[2]=1 โ†’ parent[2]=0, rank[0]=2
parent=[0,0,0,2,4,4] rank=[2,0,1,0,1,0]

0(rank=2)
/ \
1 2(rank=1)
|
3

โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
Union(0, 4):
find(0)=0, find(4)=4
rank[0]=2 > rank[4]=1 โ†’ parent[4]=0
parent=[0,0,0,2,0,4] rank=[2,0,1,0,1,0]

0(rank=2)
/ | \
1 2 4(rank=1)
| |
3 5

โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
Find(5) with Path Compression:
5 โ†’ parent[5]=4 โ†’ parent[4]=0 (root!)
Path compress: parent[5] = 0 directly!
parent=[0,0,0,2,0,0]

0(rank=2)
/ | \ \
1 2 4 5 โ† 5 now points directly to 0!
|
3

โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
connected(3, 5)?
find(3): 3โ†’2โ†’0 (root), path compress โ†’ parent[3]=0
find(5): 5โ†’0 (root, already compressed)
find(3)==find(5) โ†’ 0==0 โ†’ TRUE โœ…

๐ŸŒ Classic Application: Cycle Detection in Graphโ€‹

Problem: Given edges [(0,1),(1,2),(2,0)], detect if there's a cycle.

DSU = {0,1,2} initially

Edge(0,1): find(0)=0, find(1)=1 โ†’ different โ†’ union โ†’ no cycle
Edge(1,2): find(1)=0, find(2)=2 โ†’ different โ†’ union โ†’ no cycle
Edge(2,0): find(2)=0, find(0)=0 โ†’ SAME ROOT โ†’ CYCLE DETECTED! ๐Ÿ”ด
def has_cycle(n, edges):
dsu = DSU(n)
for u, v in edges:
if not dsu.union(u, v):
return True # same component โ†’ cycle!
return False

print(has_cycle(3, [(0,1),(1,2),(2,0)])) # True
print(has_cycle(3, [(0,1),(1,2)])) # False

โšก Brute Force vs DSUโ€‹

Problem: Check if nodes A and B are connected after Q union operations

โŒ BRUTE FORCE (BFS/DFS each time):
Each query: O(V + E)
Q queries: O(Q ร— (V + E))
For Q=10โต, V=10โต โ†’ 10ยนโฐ operations โ†’ TLE โŒ

โœ… DSU with Path Compression + Union by Rank:
Each union/find: O(ฮฑ(n)) โ‰ˆ O(1)
Q queries: O(Q ร— ฮฑ(n)) โ‰ˆ O(Q)
For Q=10โต โ†’ 10โต operations โ†’ Fast โœ…

๐Ÿ“Š Complexity Summaryโ€‹

OperationNaive DSUUnion by Rank onlyPath Compression onlyBoth Optimizations
FindO(n)O(log n)O(log n) amortizedO(ฮฑ(n)) โ‰ˆ O(1)
UnionO(n)O(log n)O(log n) amortizedO(ฮฑ(n)) โ‰ˆ O(1)
SpaceO(n)O(n)O(n)O(n)

ฮฑ(n) = Inverse Ackermann function. For all practical values of n (up to 10^80), ฮฑ(n) โ‰ค 5.


โŒ Common Mistakesโ€‹

  1. Not using Path Compression โ€” Without it, Find() degrades to O(n) for skewed trees.
  2. Not using Union by Rank โ€” Leads to unbalanced trees even with path compression.
  3. Forgetting to check if already connected before union โ€” Always check find(x) == find(y) first to avoid redundant merges.
  4. Using 1-indexed elements with 0-indexed parent array โ€” Initialize parent of size n+1 if elements are 1-indexed.
  5. Modifying rank incorrectly โ€” Only increment rank when two trees of equal rank merge.

๐Ÿ‹๏ธ Practice Problemsโ€‹

#ProblemConceptDifficulty
1Number of Connected Components in GraphBasic DSU๐ŸŸข Easy
2Find if Path Exists in GraphBasic DSU๐ŸŸข Easy
3Redundant ConnectionCycle Detection๐ŸŸก Medium
4Number of ProvincesConnected Components๐ŸŸก Medium
5Accounts MergeDSU with strings๐ŸŸก Medium
6Most Stones Removed with Same Row or ColumnDSU on coordinates๐ŸŸก Medium
7Satisfiability of Equality EquationsDSU on characters๐ŸŸก Medium
8Kruskal's Minimum Spanning TreeDSU + Greedy๐Ÿ”ด Hard
9Swim in Rising WaterDSU + Binary Search๐Ÿ”ด Hard

๐ŸŒ Real-World Applicationsโ€‹

  • Network connectivity โ€” Check if two computers are in the same network
  • Kruskal's MST algorithm โ€” Build minimum spanning tree by adding edges without creating cycles
  • Image segmentation โ€” Group pixels belonging to the same region
  • Social networks โ€” Friend circle / community detection
  • Compiler optimization โ€” Pointer analysis / Alias analysis (e.g., Steensgaard's algorithm) to group aliased variables

๐Ÿ”— Referencesโ€‹