🔍 Knuth-Morris-Pratt (KMP) Algorithm – In-Depth Explanation

The **Knuth-Morris-Pratt (KMP) Algorithm** is an **efficient string-searching algorithm** that finds occurrences of a **pattern (substring)** within a **text (string)** in **O(N + M) time**. It does this by **precomputing** a **Longest Prefix Suffix (LPS) array**, which helps in avoiding redundant comparisons.

📌 Why Do We Need KMP?

The **Naïve (Brute Force) Pattern Matching Algorithm** has a worst-case complexity of **O(N × M)** because it **restarts** comparisons every time a mismatch occurs. The **KMP algorithm avoids unnecessary comparisons** by **preprocessing the pattern**.

📌 Working of KMP Algorithm

KMP uses **two main steps**:

🛠 **Preprocess the Pattern**: Compute the **LPS (Longest Prefix Suffix) array**.
🔍 **Search the Pattern in Text**: Use the **LPS array** to avoid redundant checks.

🛠 Step 1: Understanding the LPS (Longest Prefix Suffix) Array

The **LPS array** is a **precomputed array** that stores the length of the **longest proper prefix** which is also a **suffix** for each prefix of the pattern.

🔹 Definition of LPS:

**LPS[i]** represents the **length of the longest proper prefix** of `pattern[0...i]` which is also a **suffix** of `pattern[0...i]`.

🔹 Example of LPS Array Construction

Let’s take the pattern: **"ABABCABAB"**

Index (i)	Pattern	LPS[i]	Explanation
0	A	0	No proper prefix = suffix.
1	AB	0	No proper prefix = suffix.
2	ABA	1	Proper prefix = "A", suffix = "A" (length 1).
3	ABAB	2	Proper prefix = "AB", suffix = "AB" (length 2).
4	ABABC	0	No proper prefix = suffix.
5	ABABCA	1	Proper prefix = "A", suffix = "A" (length 1).
6	ABABCAB	2	Proper prefix = "AB", suffix = "AB" (length 2).
7	ABABCABA	3	Proper prefix = "ABA", suffix = "ABA" (length 3).
8	ABABCABAB	4	Proper prefix = "ABAB", suffix = "ABAB" (length 4).

📌 Step 2: Compute the LPS Array (O(M) Time Complexity)


                void computeLPS(string pattern, vector& lps) {
                    int M = pattern.size();
                    int len = 0;
                    lps[0] = 0; // First character has no prefix-suffix match
            
                    int i = 1;
                    while (i < M) {
                        if (pattern[i] == pattern[len]) {
                            len++;
                            lps[i] = len;
                            i++;
                        } else {
                            if (len != 0) {
                                len = lps[len - 1]; // Move to previous LPS value
                            } else {
                                lps[i] = 0;
                                i++;
                            }
                        }
                    }
                }

🔍 Step 3: Searching Pattern in Text (O(N) Time Complexity)

Once the LPS array is computed, we use it to efficiently search the pattern.


                void KMPSearch(string text, string pattern) {
                    int N = text.size(), M = pattern.size();
                    vector lps(M);
                    computeLPS(pattern, lps);
            
                    int i = 0, j = 0; // i = text index, j = pattern index
                    while (i < N) {
                        if (text[i] == pattern[j]) {
                            i++;
                            j++;
                        }
                        if (j == M) {
                            cout << "Pattern found at index " << i - j << endl;
                            j = lps[j - 1]; // Move j based on LPS array
                        } else if (i < N && text[i] != pattern[j]) {
                            if (j != 0) {
                                j = lps[j - 1]; // Use LPS array to skip redundant comparisons
                            } else {
                                i++;
                            }
                        }
                    }
                }

⏳ Time Complexity Analysis

Step	Time Complexity
Computing LPS Array	O(M)
Searching Pattern in Text	O(N)
Total Complexity	O(N + M)

🌍 Real-World Applications of KMP Algorithm

📡 **Text Searching in Large Files** – Used in **search engines, text editors.**
🧬 **DNA Sequence Matching** – Bioinformatics applications.
🔍 **Spam Detection & Plagiarism Checking** – Comparing documents efficiently.
🗂 **Data Compression** – Finding repeated substrings in text files.