Rabin Karp Algorithm in C

April 20, 2022

Rabin Karp Algorithm (Rolling Hash)

Rabin-Karp string searching algorithm calculates a numerical (hash) value for the pattern p, and for each m-character substring of text t. Then it compares the numerical values instead of comparing the actual symbols. If any match is found, it compares the pattern with the substring by naive approach.

Lets understand this with the help of an example:-

Input: aaaaab
Output: aab

First will calculate the hashcode using hashfunction and iterate throughout the input string and find if the same hashcode is present if its present then we will check for the actual pattern.

Algorithm

It’s based on rolling hash. Suppose we have a hash function that take string as input and it gives a unique hash.
First we need to create a hash using hashfunction.
Lets understand this using example:-
Main string – abcde
Pattern to find – abc
To calculate the hash of the next substring, subtract the last most value and add the hash value of the new charater.
hash(abc) = 97 * 30 + 98*31 + 99 * 3 2
Subtract the first ASCII character
hash(bcd) = 97 * 30 + 98*31 + 99 * 3 2 – 99

= 98*31 + 99 * 3 2

Divide the whole by the prime number

= (98*31 + 99 * 3 2) / 3

= 98*30 + 99 * 3 1

Multiply the ASCII of the last character with the power of prime raise to the power length of the substring -1 and then add it to the new hash.

= 98*30 + 99 * 3 1 + 100 * 3 2

And this will be continued to find all new hashes.

C Code for Rabin Karp Algorithm:-

Run

 
#include <stdio.h>
#include <string.h>

// d is the number of characters in the input alphabet
  #define d 256

/* pat -> pattern
txt -> text
q -> A prime number
*/
  void search(char pat[], char txt[], int q) {
  int M = strlen(pat);
  int N = strlen(txt);
  int i, j;

  int p = 0; // hash value for pattern
  int t = 0; // hash value for txt
  int h = 1;

  // The value of h would be "pow(d, M-1)%q"
  for (i = 0; i < M - 1; i++)
    h = (h * d) % q;
  // Calculate the hash value of pattern and first
  // window of text
  for (i = 0; i < M; i++) {
    p = (d * p + pat[i]) % q;
    t = (d * t + txt[i]) % q;
  }

  // Slide the pattern over text one by one
  for (i = 0; i <= N - M; i++) {
  // Check the hash values of current window of text
  // and pattern. If the hash values match then only
  // check for characters one by one
  if (p == t) {
  /* Check for characters one by one */
    for (j = 0; j < M; j++) {
      if (txt[i + j] != pat[j])
        break;
    }
  // if p == t and pat[0...M-1] = txt[i, i+1, ...i+M-1]
  if (j == M)
   printf("Pattern found at index %d \n", i);
 }

  // Calculate hash value for next window of text: Remove
  // leading digit, add trailing digit
  if (i < N - M) {
    t = (d * (t - txt[i] * h) + txt[i + M]) % q;
  // We might get negative value of t, converting it
  // to positive
  if (t < 0)
    t = (t + q);
   }
  }
 }

/* Driver Code */
int main() {
  char txt[] = "PREP INSTA PREPARE INSTANTLY";
  char pat[] = "PREP";

  // A prime number
  int q = 101;
  // function call
  search(pat, txt, q);
  return 0;
}

Pattern found at index 0 
Pattern found at index 11