🔎 Rabin-Karp String Matching Algorithm – Complete In-Depth Explanation

The Rabin-Karp algorithm is a string searching algorithm that uses hashing to find occurrences of a pattern within a text. It is particularly useful when searching for several patterns at once, because each window of the text is hashed only once no matter how many patterns are being sought.

📌 Why is Rabin-Karp Important?

  • 🚀 Efficient for multiple pattern searches.
  • 💡 Uses hashing for fast comparisons.
  • ⚙️ Foundation for other string searching algorithms.

📌 Understanding Rabin-Karp with an Example

Let's say we have the text "ABABDABACDABABCABAB" and we want to find the pattern "ABAB".

Rabin-Karp uses a rolling hash function. The basic idea is:

  1. Calculate the hash value of the pattern.
  2. Calculate the hash value of the first substring of the text with the same length as the pattern.
  3. If the hash values match, compare the pattern and the substring character by character to confirm a match (to avoid hash collisions).
  4. "Roll" the hash of the substring by removing the first character and adding the next character from the text.
  5. Repeat steps 3 and 4 until the end of the text.

Let's illustrate with our example (using a simple hash function for demonstration – in practice, more sophisticated hash functions are used):

Text: ABABDABACDABABCABAB
Pattern: ABAB

We calculate the hash of "ABAB". Let's assume our hash function simply sums the ASCII values of the characters. (This is a very basic example; real Rabin-Karp uses modular arithmetic and prime numbers.) Hash("ABAB") = 65 + 66 + 65 + 66 = 262.

Now, we calculate the hash of the first substring of the text "ABAB". Hash("ABAB") = 262. Since the hashes match, we compare "ABAB" with "ABAB" character by character. They match! We found an occurrence.

Next, we "roll" the hash. We remove the 'A' (ASCII 65) and add the next character 'D' (ASCII 68). The new substring is "BABD". The new hash is 262 - 65 + 68 = 265. Since 265 does not equal the pattern hash 262, this window cannot be a match, so we skip the character comparison and keep rolling the hash one position at a time. The hashes match again at indices 10 and 15, where the pattern does occur.
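The rolling step from the walkthrough can be sketched in a few lines. This is only an illustration using the same additive demonstration hash; the function name `rolling_sum_hashes` is made up for this example and does not come from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the additive hash (sum of character codes) of every window of
// length m in text. Each hash after the first is derived from the previous
// one in O(1) instead of being recomputed from scratch.
// The additive hash is the demonstration hash from the walkthrough,
// not a production-quality choice.
std::vector<int> rolling_sum_hashes(const std::string& text, std::size_t m) {
    assert(m > 0);
    std::vector<int> hashes;
    if (text.size() < m) return hashes;

    // Hash of the first window, computed the slow way once.
    int h = 0;
    for (std::size_t i = 0; i < m; ++i) {
        h += static_cast<unsigned char>(text[i]);
    }
    hashes.push_back(h);

    // Every later window: drop the outgoing character, add the incoming one.
    for (std::size_t i = m; i < text.size(); ++i) {
        h -= static_cast<unsigned char>(text[i - m]);
        h += static_cast<unsigned char>(text[i]);
        hashes.push_back(h);
    }
    return hashes;
}
```

Calling `rolling_sum_hashes("ABABDABACDABABCABAB", 4)` yields 16 window hashes, starting with 262 for "ABAB" and 265 for "BABD", as in the walkthrough.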

🔹 Key Points

  • Uses hashing to quickly identify potential matches.
  • Rolling hash allows efficient updating of hash values.
  • Handles hash collisions by character-by-character comparison.
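Because a window's hash is computed once regardless of how many patterns are being sought, the same idea extends to the multiple-pattern searches mentioned earlier. The sketch below is one possible way to do that, under several assumptions: all patterns have the same length, the hash is a polynomial hash with base 256 and modulus 1,000,000,007 (arbitrary illustrative choices), and the name `find_any` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Finds every occurrence of any pattern in `patterns` (all assumed to have the
// same nonzero length) and returns (text index, pattern index) pairs in order
// of text index.
std::vector<std::pair<std::size_t, std::size_t>>
find_any(const std::string& text, const std::vector<std::string>& patterns) {
    std::vector<std::pair<std::size_t, std::size_t>> hits;
    if (patterns.empty()) return hits;
    const std::size_t m = patterns[0].size(), n = text.size();
    assert(m > 0);
    for (const auto& p : patterns) assert(p.size() == m);
    if (n < m) return hits;

    const std::uint64_t base = 256, mod = 1000000007ULL;  // assumed parameters

    // Polynomial hash of the first m characters of s.
    auto hash_prefix = [&](const std::string& s) {
        std::uint64_t h = 0;
        for (std::size_t i = 0; i < m; ++i)
            h = (h * base + static_cast<unsigned char>(s[i])) % mod;
        return h;
    };

    // Map each pattern hash to the indices of the patterns that produce it.
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> by_hash;
    for (std::size_t k = 0; k < patterns.size(); ++k)
        by_hash[hash_prefix(patterns[k])].push_back(k);

    // high = base^(m-1) mod mod, the weight of a window's leading character.
    std::uint64_t high = 1;
    for (std::size_t i = 1; i < m; ++i) high = (high * base) % mod;

    std::uint64_t h = hash_prefix(text);
    for (std::size_t i = 0;; ++i) {
        auto it = by_hash.find(h);
        if (it != by_hash.end())
            for (std::size_t k : it->second)
                if (text.compare(i, m, patterns[k]) == 0)  // rule out collisions
                    hits.emplace_back(i, k);
        if (i + m >= n) break;
        // Roll: remove text[i], then shift and append text[i + m].
        h = (h + mod - (static_cast<unsigned char>(text[i]) * high) % mod) % mod;
        h = (h * base + static_cast<unsigned char>(text[i + m])) % mod;
    }
    return hits;
}
```

With the example text and the patterns "ABAB" and "DABA", a single pass reports "ABAB" at indices 0, 10 and 15 and "DABA" at indices 4 and 9.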

🛠 Solution Approach


    #include <iostream>
    #include <string>

    using namespace std;

    int main() {
        string text = "ABABDABACDABABCABAB";
        string pattern = "ABAB";
        int n = text.length();
        int m = pattern.length();
        if (m == 0 || m > n) return 0; // Nothing to search for

        // A simple additive hash function (for demonstration)
        auto hash_function = [](const string& s) {
            int hash_val = 0;
            for (unsigned char c : s) {
                hash_val += c; // In a real implementation, you would use a better hash function
            }
            return hash_val;
        };

        int pattern_hash = hash_function(pattern);
        int window_hash = hash_function(text.substr(0, m)); // Hash of the first window

        for (int i = 0; i <= n - m; ++i) {
            if (pattern_hash == window_hash) {
                if (text.compare(i, m, pattern) == 0) { // Check for actual match (handle collisions)
                    cout << "Pattern found at index " << i << endl;
                }
            }
            if (i < n - m) {
                // Roll the hash: drop text[i], add text[i + m]
                window_hash += static_cast<unsigned char>(text[i + m])
                             - static_cast<unsigned char>(text[i]);
            }
        }

        return 0;
    }
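The summing hash used in this walkthrough ignores character order, so any rearrangement of the pattern's characters collides with it. As noted earlier, real implementations use modular arithmetic and prime numbers. A minimal sketch of that variant follows, assuming a polynomial hash with base 256 and the prime modulus 1,000,000,007. Both values and the function name `rabin_karp` are illustrative choices, not prescribed by the algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns the starting index of every occurrence of pattern in text, using a
// polynomial rolling hash: hash(s) = sum of s[j] * base^(m-1-j), taken mod `mod`.
std::vector<std::size_t> rabin_karp(const std::string& text,
                                    const std::string& pattern,
                                    std::uint64_t base = 256,
                                    std::uint64_t mod = 1000000007ULL) {
    // These bounds keep every intermediate product below 2^64.
    assert(mod > 1 && mod <= (1ULL << 31) && base < mod);

    std::vector<std::size_t> matches;
    const std::size_t n = text.size(), m = pattern.size();
    if (m == 0 || n < m) return matches;

    // high = base^(m-1) mod mod, the weight of the window's leading character.
    std::uint64_t high = 1;
    for (std::size_t i = 1; i < m; ++i) high = (high * base) % mod;

    // Hash the pattern and the first window of the text.
    std::uint64_t hp = 0, ht = 0;
    for (std::size_t i = 0; i < m; ++i) {
        hp = (hp * base + static_cast<unsigned char>(pattern[i])) % mod;
        ht = (ht * base + static_cast<unsigned char>(text[i])) % mod;
    }

    for (std::size_t i = 0;; ++i) {
        // Equal hashes only suggest a match; compare characters to confirm.
        if (hp == ht && text.compare(i, m, pattern) == 0) matches.push_back(i);
        if (i + m >= n) break;
        // Roll: remove the leading character, shift, append the next one.
        std::uint64_t lead = (static_cast<unsigned char>(text[i]) * high) % mod;
        ht = (ht + mod - lead) % mod;
        ht = (ht * base + static_cast<unsigned char>(text[i + m])) % mod;
    }
    return matches;
}
```

For the example text and pattern, `rabin_karp("ABABDABACDABABCABAB", "ABAB")` returns the indices 0, 10 and 15. A smaller modulus still gives correct results, since every hash match is verified, but it produces more collisions and therefore more character comparisons.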
                

โณ Time Complexity Analysis

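The average-case time complexity of Rabin-Karp is O(n + m), where n is the length of the text and m is the length of the pattern: hashing the pattern and the first window costs O(m), and each of the remaining windows is updated in O(1). The worst-case complexity is O(nm), because every hash match triggers a character-by-character comparison of up to m characters. This happens when there are many hash collisions, which is rare with a good hash function, or when the pattern really does occur at almost every position.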
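To make the worst case concrete, the sketch below counts how many windows pass the hash test and how many of those are real matches, using the additive demonstration hash from the example. The function name `count_hits` is hypothetical. Every hash hit that is not a real match is a wasted comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// For the additive demonstration hash, returns {hash hits, true matches}:
// the number of windows whose hash equals the pattern's hash, and the number
// of those windows that actually equal the pattern.
std::pair<int, int> count_hits(const std::string& text, const std::string& pattern) {
    const std::size_t n = text.size(), m = pattern.size();
    assert(m > 0);
    if (n < m) return {0, 0};

    int hp = 0, ht = 0;
    for (std::size_t i = 0; i < m; ++i) {
        hp += static_cast<unsigned char>(pattern[i]);
        ht += static_cast<unsigned char>(text[i]);
    }

    int hash_hits = 0, true_matches = 0;
    for (std::size_t i = 0;; ++i) {
        if (hp == ht) {
            ++hash_hits;  // costs up to m character comparisons
            if (text.compare(i, m, pattern) == 0) ++true_matches;
        }
        if (i + m >= n) break;
        // Roll the additive hash by one position.
        ht += static_cast<unsigned char>(text[i + m]) -
              static_cast<unsigned char>(text[i]);
    }
    return {hash_hits, true_matches};
}
```

For the text "ABABABAB" and the pattern "BABA", all five windows have hash 262, so `count_hits` returns {5, 2}: three of the five comparisons are spent on collisions. On the example text with the pattern "ABAB" it returns {3, 3}, with no wasted comparisons.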

๐ŸŒ Real-World Applications of Rabin-Karp

  • ๐Ÿ” Text searching in editors and search engines.
  • ๐Ÿงฌ Finding DNA sequences.
  • ๐Ÿ•ต๏ธโ€โ™‚๏ธ Plagiarism detection.

🔗 Next Topics