Sunday Algorithm
Definition:​
The Sunday Algorithm is a string matching algorithm developed by Daniel M. Sunday in 1990. It extends the Boyer-Moore approach by examining the character immediately following the pattern's alignment window to determine shift distances. This modification often results in larger shifts, making it particularly efficient for short patterns and large alphabets.
Characteristics:​
-
Next Character Rule:
- Examines character after current window
- Determines maximum possible shift
- Enables larger jumps in text
-
Shift Table:
- Preprocesses pattern for shifts
- Character-based lookup
- Simple calculation method
-
Preprocessing Phase:
- Builds shift lookup table
- One-time computation
- Alphabet-dependent preprocessing
-
Left-to-Right Pattern Comparison:
- Pattern comparison from left
- Uses preprocessed shift values
- Efficient character skipping
Time Complexity:​
-
Preprocessing:
- Where m is pattern length
- σ is alphabet size
- One-time table construction
-
Searching:
- Where n is text length
- Best case: O(n/m)
- Average case: sublinear
Space Complexity:​
- Space Usage:
- Shift table storage
- Alphabet size dependent
- Constant pattern storage
C++ Implementation:​
#include <iostream>
#include <vector>
#include <string>
#include <unordered_map>
using namespace std;
class SundayAlgorithm {
private:
// Compute the shift table for the pattern
unordered_map<char, int> computeShiftTable(const string& pattern) {
unordered_map<char, int> shiftTable;
int m = pattern.length();
// Initialize all characters with pattern length + 1
for (int i = 0; i < 256; i++) {
shiftTable[i] = m + 1;
}
// Fill the actual shifts for pattern characters
for (int i = 0; i < m; i++) {
shiftTable[pattern[i]] = m - i;
}
return shiftTable;
}
public:
vector<int> search(const string& text, const string& pattern) {
vector<int> matches;
int n = text.length();
int m = pattern.length();
if (m == 0 || m > n) return matches;
// Preprocess pattern
unordered_map<char, int> shiftTable = computeShiftTable(pattern);
// Searching phase
int pos = 0;
while (pos <= n - m) {
int i = 0;
// Try to match pattern at current position
while (i < m && pattern[i] == text[pos + i]) {
i++;
}
if (i == m) {
// Pattern found
matches.push_back(pos);
}
// Calculate shift based on next character
if (pos + m < n) {
pos += shiftTable[text[pos + m]];
} else {
break;
}
}
return matches;
}
};
// Demonstration class
class SundayDemo {
public:
static void demonstrateSearch() {
SundayAlgorithm algo;
string text = "GCATCGCAGAGAGTATACAGTACG";
string pattern = "GCAGAGAG";
cout << "Text: " << text << endl;
cout << "Pattern: " << pattern << endl;
vector<int> matches = algo.search(text, pattern);
cout << "Pattern found at positions: ";
for (int pos : matches) {
cout << pos << " ";
}
cout << endl;
// Demonstrate with different pattern
pattern = "TACG";
cout << "\nSearching for pattern: " << pattern << endl;
matches = algo.search(text, pattern);
cout << "Pattern found at positions: ";
for (int pos : matches) {
cout << pos << " ";
}
cout << endl;
}
};
int main() {
SundayDemo::demonstrateSearch();
return 0;
}
Key Features:​
-
Next Character Rule:
- Examines character beyond window
- Maximum possible shifts
- Efficient skip calculation
-
Shift Table:
- Character-based lookup
- Simple preprocessing
- Efficient shift values
-
Optimization Techniques:
- Quick character lookups
- Efficient shift calculation
- Minimal comparisons
Applications:​
-
Text Processing:
- Document searching
- Text editors
- Word processors
-
Pattern Matching:
- String searching
- Content filtering
- Data analysis
-
Information Retrieval:
- Document scanning
- Content indexing
- Pattern detection
-
Bioinformatics:
- DNA sequence matching
- Protein pattern search
- Genomic analysis
Advanced Features:​
-
Algorithm Variants:
- Multiple pattern matching
- Case-insensitive matching
- Unicode support
-
Implementation Optimizations:
- Cache-friendly versions
- Memory-efficient variants
- Parallel implementations
Comparison with Other Algorithms:​
-
Advantages:
- Simpler implementation than Boyer-Moore
- Better average-case performance
- Larger shifts possible
-
Trade-offs:
- Additional memory for shift table
- Pattern length dependent
- Alphabet size dependency
Performance Characteristics:​
-
Best Case:
- O(n/m) comparisons
- Maximum character skips
- Optimal shifting
-
Average Case:
- Sublinear performance
- Efficient for short patterns
- Good practical speed
-
Worst Case:
- O(mn) theoretical bound
- Rare in practice
- Still maintains efficiency
Summary:​
The Sunday Algorithm represents a practical improvement over traditional string matching algorithms through its innovative use of the next character rule. Its simplicity in implementation combined with efficient shifting makes it particularly effective for real-world applications, especially when dealing with short patterns and large alphabets.
The algorithm's strength lies in its ability to determine shifts based on characters following the current window, often allowing it to skip more characters than algorithms like Boyer-Moore. While it maintains the same worst-case complexity as other string matching algorithms, its practical performance and ease of implementation make it a valuable choice for many text processing applications.
The implementation provides a straightforward yet efficient approach to string matching, with the flexibility to be optimized for specific use cases. Its balanced combination of simplicity and performance, particularly for short patterns, makes it a practical choice in many string matching scenarios, complementing the broader family of string searching algorithms.