π Knuth-Morris-Pratt (KMP) Algorithm β In-Depth Explanation
The **Knuth-Morris-Pratt (KMP) Algorithm** is an **efficient string-searching algorithm** that finds occurrences of a **pattern (substring)** within a **text (string)** in **O(N + M) time**. It does this by **precomputing** a **Longest Prefix Suffix (LPS) array**, which helps in avoiding redundant comparisons.
π Why Do We Need KMP?
The **NaΓ―ve (Brute Force) Pattern Matching Algorithm** has a worst-case complexity of **O(N Γ M)** because it **restarts** comparisons every time a mismatch occurs. The **KMP algorithm avoids unnecessary comparisons** by **preprocessing the pattern**.
π Working of KMP Algorithm
KMP uses **two main steps**:
- π **Preprocess the Pattern**: Compute the **LPS (Longest Prefix Suffix) array**.
- π **Search the Pattern in Text**: Use the **LPS array** to avoid redundant checks.
π Step 1: Understanding the LPS (Longest Prefix Suffix) Array
The **LPS array** is a **precomputed array** that stores the length of the **longest proper prefix** which is also a **suffix** for each prefix of the pattern.
πΉ **Definition of LPS:**
**LPS[i]** represents the **length of the longest proper prefix** of `pattern[0...i]` which is also a **suffix** of `pattern[0...i]`.
πΉ **Example of LPS Array Construction**
Letβs take the pattern: **"ABABCABAB"**
Index (i) | Pattern | LPS[i] | Explanation |
---|---|---|---|
0 | A | 0 | No proper prefix = suffix. |
1 | AB | 0 | No proper prefix = suffix. |
2 | ABA | 1 | Proper prefix = "A", suffix = "A" (length 1). |
3 | ABAB | 2 | Proper prefix = "AB", suffix = "AB" (length 2). |
4 | ABABC | 0 | No proper prefix = suffix. |
5 | ABABCA | 1 | Proper prefix = "A", suffix = "A" (length 1). |
6 | ABABCAB | 2 | Proper prefix = "AB", suffix = "AB" (length 2). |
7 | ABABCABA | 3 | Proper prefix = "ABA", suffix = "ABA" (length 3). |
8 | ABABCABAB | 4 | Proper prefix = "ABAB", suffix = "ABAB" (length 4). |
π **Step 2: Compute the LPS Array (O(M) Time Complexity)**
void computeLPS(string pattern, vector& lps) {
int M = pattern.size();
int len = 0;
lps[0] = 0; // First character has no prefix-suffix match
int i = 1;
while (i < M) {
if (pattern[i] == pattern[len]) {
len++;
lps[i] = len;
i++;
} else {
if (len != 0) {
len = lps[len - 1]; // Move to previous LPS value
} else {
lps[i] = 0;
i++;
}
}
}
}
π Step 3: Searching Pattern in Text (O(N) Time Complexity)
Once the LPS array is computed, we use it to efficiently search the pattern.
void KMPSearch(string text, string pattern) {
int N = text.size(), M = pattern.size();
vector lps(M);
computeLPS(pattern, lps);
int i = 0, j = 0; // i = text index, j = pattern index
while (i < N) {
if (text[i] == pattern[j]) {
i++;
j++;
}
if (j == M) {
cout << "Pattern found at index " << i - j << endl;
j = lps[j - 1]; // Move j based on LPS array
} else if (i < N && text[i] != pattern[j]) {
if (j != 0) {
j = lps[j - 1]; // Use LPS array to skip redundant comparisons
} else {
i++;
}
}
}
}
β³ Time Complexity Analysis
Step | Time Complexity |
---|---|
Computing LPS Array | O(M) |
Searching Pattern in Text | O(N) |
Total Complexity | O(N + M) |
π Real-World Applications of KMP Algorithm
- π‘ **Text Searching in Large Files** β Used in **search engines, text editors.**
- 𧬠**DNA Sequence Matching** β Bioinformatics applications.
- π **Spam Detection & Plagiarism Checking** β Comparing documents efficiently.
- π **Data Compression** β Finding repeated substrings in text files.