πŸ” Knuth-Morris-Pratt (KMP) Algorithm – In-Depth Explanation

The **Knuth-Morris-Pratt (KMP) Algorithm** is an **efficient string-searching algorithm** that finds occurrences of a **pattern (substring)** within a **text (string)** in **O(N + M) time**. It does this by **precomputing** a **Longest Prefix Suffix (LPS) array**, which helps in avoiding redundant comparisons.

πŸ“Œ Why Do We Need KMP?

The **NaΓ―ve (Brute Force) Pattern Matching Algorithm** has a worst-case complexity of **O(N Γ— M)** because it **restarts** comparisons every time a mismatch occurs. The **KMP algorithm avoids unnecessary comparisons** by **preprocessing the pattern**.

πŸ“Œ Working of KMP Algorithm

KMP uses **two main steps**:

  • πŸ›  **Preprocess the Pattern**: Compute the **LPS (Longest Prefix Suffix) array**.
  • πŸ” **Search the Pattern in Text**: Use the **LPS array** to avoid redundant checks.

πŸ›  Step 1: Understanding the LPS (Longest Prefix Suffix) Array

The **LPS array** is a **precomputed array** that stores the length of the **longest proper prefix** which is also a **suffix** for each prefix of the pattern.

πŸ”Ή **Definition of LPS:**

**LPS[i]** represents the **length of the longest proper prefix** of `pattern[0...i]` which is also a **suffix** of `pattern[0...i]`.

πŸ”Ή **Example of LPS Array Construction**

Let’s take the pattern: **"ABABCABAB"**

Index (i) Pattern LPS[i] Explanation
0 A 0 No proper prefix = suffix.
1 AB 0 No proper prefix = suffix.
2 ABA 1 Proper prefix = "A", suffix = "A" (length 1).
3 ABAB 2 Proper prefix = "AB", suffix = "AB" (length 2).
4 ABABC 0 No proper prefix = suffix.
5 ABABCA 1 Proper prefix = "A", suffix = "A" (length 1).
6 ABABCAB 2 Proper prefix = "AB", suffix = "AB" (length 2).
7 ABABCABA 3 Proper prefix = "ABA", suffix = "ABA" (length 3).
8 ABABCABAB 4 Proper prefix = "ABAB", suffix = "ABAB" (length 4).

πŸ“Œ **Step 2: Compute the LPS Array (O(M) Time Complexity)**


                void computeLPS(string pattern, vector& lps) {
                    int M = pattern.size();
                    int len = 0;
                    lps[0] = 0; // First character has no prefix-suffix match
            
                    int i = 1;
                    while (i < M) {
                        if (pattern[i] == pattern[len]) {
                            len++;
                            lps[i] = len;
                            i++;
                        } else {
                            if (len != 0) {
                                len = lps[len - 1]; // Move to previous LPS value
                            } else {
                                lps[i] = 0;
                                i++;
                            }
                        }
                    }
                }
                

πŸ” Step 3: Searching Pattern in Text (O(N) Time Complexity)

Once the LPS array is computed, we use it to efficiently search the pattern.


                void KMPSearch(string text, string pattern) {
                    int N = text.size(), M = pattern.size();
                    vector lps(M);
                    computeLPS(pattern, lps);
            
                    int i = 0, j = 0; // i = text index, j = pattern index
                    while (i < N) {
                        if (text[i] == pattern[j]) {
                            i++;
                            j++;
                        }
                        if (j == M) {
                            cout << "Pattern found at index " << i - j << endl;
                            j = lps[j - 1]; // Move j based on LPS array
                        } else if (i < N && text[i] != pattern[j]) {
                            if (j != 0) {
                                j = lps[j - 1]; // Use LPS array to skip redundant comparisons
                            } else {
                                i++;
                            }
                        }
                    }
                }
                

⏳ Time Complexity Analysis

Step Time Complexity
Computing LPS Array O(M)
Searching Pattern in Text O(N)
Total Complexity O(N + M)

🌍 Real-World Applications of KMP Algorithm

  • πŸ“‘ **Text Searching in Large Files** – Used in **search engines, text editors.**
  • 🧬 **DNA Sequence Matching** – Bioinformatics applications.
  • πŸ” **Spam Detection & Plagiarism Checking** – Comparing documents efficiently.
  • πŸ—‚ **Data Compression** – Finding repeated substrings in text files.

πŸ”— Next Topics