Aho-Corasick Algorithm
The Aho-Corasick Algorithm is a powerful string-searching algorithm that locates elements of a finite set of strings (the dictionary/patterns) within an input text. Unlike algorithms that search for one pattern at a time, Aho-Corasick searches for all patterns simultaneously in time, where is the length of the text, is the combined length of all patterns, and is the number of matching occurrences.
It accomplishes this by building a Trie with failure links and output links, converting the Trie into a directed acyclic graph/finite state machine.
Aho-Corasick Trie & Failure Links Visualizer
How Aho-Corasick Works
The algorithm consists of two main phases:
- Trie Construction: All dictionary patterns are inserted into a standard Trie prefix tree.
- Failure Links Construction: Broadly speaking, the failure link of a node represents the longest proper suffix of the prefix represented by that node which is also a prefix of some pattern in the Trie. This is built using a Breadth-First Search (BFS) traversal.
- Searching: The text is parsed character-by-character. If no child matches the current character, the search follows the failure link, avoiding backtracking.
Complexity
- Pre-processing Time: where is the sum of the lengths of all patterns.
- Matching/Search Time: where is the length of the text and is the number of matches.
- Space Complexity: where is the size of the alphabet.
Telemetry Integration
Completed working through this block? Sync progress to workspace.