Getting the next largest palindrome

Thursday 31 July 2014
For a given number n we need to find the smallest palindrome numerically larger than n. The simple approach, which does not scale to large numbers, is to increment n by 1 and test whether the result is a palindrome, repeating until we find one.

A smarter algorithm is as follows:
1. Split the string into the left and right halves.
2. Compare the last digit of the left half with the first digit of the right half.
     a. If the right digit is greater than the left, increment the left half and stop.
     b. If the right digit is less than the left, stop.
     c. If the digits are equal, repeat the comparison with the next pair of digits, moving outwards, and so on.
3. Take the left half and append its reverse to it.

Complications arise when the number has odd length and when the new palindrome must be longer than the current number. To handle these, we check whether our new palindrome is less than or equal to the given number. If so, we increase the middle digit by 1 and mirror the change to the other half. If the middle digit is a 9, we change it to zero and carry the 1 into the next digit outward, again turning 9s into zeros, until we reach a digit that is not 9. If every digit was a 9, we prepend a 1 to the number. The C++ code for the procedure is given below:

  string str, s;
  cin >> str;
  s = str;
  // mirror the left half onto the right half
  int inc = s.size() / 2;     // first index of the right half (or the middle)
  int dec = inc;              // last index of the left half (or the middle)
  if(s.size() % 2 == 0)
   dec--;
  for(int i = inc, j = dec; i < (int)s.size() && j >= 0; i++, j--)
   s[i] = s[j];
  // if the mirrored number is not larger, propagate a carry from the middle
  while(s <= str && s.size() <= str.size()){
   int i = dec, j = inc;
   // turn middle 9s into 0s and carry outward (bounds checked before access)
   while(i >= 0 && j < (int)s.size() && s[i] == '9'){
    s[i] = s[j] = '0';
    i--;
    j++;
   }
   if(i < 0){
    // every digit was a 9: prepend a 1 and fix the last digit
    s = '1' + s;
    s[s.size() - 1] = '1';
   }else{
    // add the carry symmetrically to both halves
    s[i] = s[j] = s[i] + 1;
   }
  }
  cout << s << endl;

Reverse Polish Notation : Expression Transformation

Tuesday 29 July 2014
Reverse Polish notation (RPN) is a mathematical notation in which every operator follows all of its operands, in contrast to Polish notation, which puts the operator in the prefix position. It is also known as postfix notation and is parenthesis-free as long as operator arities are fixed. The description "Polish" refers to the nationality of logician Jan Ɓukasiewicz, who invented (prefix) Polish notation in the 1920s. In computer science, postfix notation is often used in stack-based and concatenative programming languages. It is also common in dataflow and pipeline-based systems, including Unix pipelines.

The C++ code for converting an expression into RPN format, as required in SPOJ problem 4, uses one stack to store the operands and one integer to count the opening brackets. The function printRPN takes a fully parenthesized expression and prints the RPN equivalent.

void printRPN(string s){
 stack<string> members;          // partially built postfix sub-expressions
 int opening_brackets = 0;
 for(int i = 0; i < (int)s.length(); i++){
  if(s[i] == ' ') continue;
  else if(s[i] == '(')
   opening_brackets++;
  else if(s[i] == ')'){
   // a ')' closes a fully parenthesized (operand operator operand) group
   opening_brackets--;
   string rhs  = members.top(); members.pop();  // right operand
   string oper = members.top(); members.pop();  // operator
   string lhs  = members.top(); members.pop();  // left operand
   lhs.append(rhs);
   lhs.append(oper);                            // operand operand operator
   members.push(lhs);
  }else
   members.push(string(1, s[i]));
 }
 cout << members.top() << endl;
}
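The payoff of postfix notation is that it can be evaluated with a single stack and no parentheses at all. The sketch below is not from the original post; the function name and the restriction to single-digit operands are my own simplifying assumptions.

```cpp
#include <cctype>
#include <stack>
#include <string>

// Evaluate a postfix expression of single-digit operands,
// e.g. "23+4*" means (2+3)*4.
int evalRPN(const std::string &expr) {
    std::stack<int> operands;
    for (char c : expr) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            operands.push(c - '0');          // operand: push its value
        } else {
            int rhs = operands.top(); operands.pop();
            int lhs = operands.top(); operands.pop();
            switch (c) {                     // operator: combine top two
                case '+': operands.push(lhs + rhs); break;
                case '-': operands.push(lhs - rhs); break;
                case '*': operands.push(lhs * rhs); break;
                case '/': operands.push(lhs / rhs); break;
            }
        }
    }
    return operands.top();
}
```

Note that each operator consumes exactly the operands below it on the stack, which is why no parentheses are ever needed.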

Substring Matching Algorithms

Substring matching is an important part of many algorithmic problems. There are different factors to consider, such as the size of the text, the size of the pattern, and the running time. With these in mind, we list some of the approaches used to solve the substring matching problem.

Naive Brute Force approach :

In the naive approach we iterate over the positions of the text and, at each position, try to match the pattern character by character. If a character does not match, we advance the start of the match in the text by one character and check for the pattern again. Since for each position of the text we may compare up to the whole pattern, the worst-case time complexity, where n is the length of the pattern and m is the length of the text, is O(mn). The code is given below:

void naiveBruteForce(string a, string b){
 bool isMatched = false;
 int i = 0;
 for(i = 0; i + b.length() <= a.length(); i++){
  isMatched = true;
  for(int j = 0; j < (int)b.length(); j++){
   if(a[i + j] != b[j]){
    isMatched = false;
    break;               // mismatch: slide the window by one
   }
  }
  if(isMatched)
   break;                // first occurrence found at offset i
 }
 cout << isMatched << " offset :" << i << endl;
}

Rabin Karp Algorithm :

In the Rabin-Karp method we try to eliminate the inner-loop comparison of the naive algorithm, in which we match two strings character by character. Instead of comparing strings we assign each window a value using a hash of its characters. We precompute the hash of the pattern and only check whether the hash of the current substring of the text matches the pattern's hash value. A problem arises if the alphabet or the string is large, in which case we can use a modular hash function. But then two strings may have the same hash value even when they differ, so the modulus M should be a large number and hash hits should be verified by a direct comparison. A sample code, in which we take the sum of squares of the ASCII values of the characters as the hash, is given below:

void rabinKarp(string a, string b){
 int hashsub = 0, tmp, tmp1, hash = 0;
 // hash = sum of squared character codes of the current window
 for(int i = 0; i < (int)b.length(); i++){
  tmp = b[i];
  tmp1 = a[i];
  hashsub += tmp * tmp;
  hash += tmp1 * tmp1;
 }
 for(int i = b.length(); i < (int)a.length(); i++){
  if(hash == hashsub)
   cout << "offset :" << i - b.length() << endl;
  // slide the window: drop the leftmost character, add the new one
  tmp = a[i - b.length()];
  tmp1 = a[i];
  hash -= (tmp * tmp);
  hash += (tmp1 * tmp1);
 }
 if(hash == hashsub)      // don't forget the final window
  cout << "offset :" << a.length() - b.length() << endl;
}
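The sum-of-squares hash above ignores character order, so "ab" and "ba" collide. The usual refinement is a polynomial rolling hash taken modulo a large prime, with an explicit character comparison on every hash hit to rule out collisions. A sketch under my own naming (the base B and modulus M are conventional choices, not from the post):

```cpp
#include <string>
#include <vector>

// Polynomial rolling hash: h(s) = s[0]*B^(m-1) + ... + s[m-1], all mod M.
// On a hash match we still compare the strings, since collisions are possible.
std::vector<int> rabinKarpModular(const std::string &text, const std::string &pat) {
    const long long B = 256, M = 1000000007LL;
    int n = text.size(), m = pat.size();
    std::vector<int> offsets;
    if (m == 0 || n < m) return offsets;
    long long hp = 0, ht = 0, pow = 1;   // pow = B^(m-1) mod M
    for (int i = 0; i < m; i++) {
        hp = (hp * B + pat[i]) % M;
        ht = (ht * B + text[i]) % M;
        if (i < m - 1) pow = (pow * B) % M;
    }
    for (int i = 0; ; i++) {
        if (ht == hp && text.compare(i, m, pat) == 0)
            offsets.push_back(i);        // verified match at offset i
        if (i + m >= n) break;
        // slide the window: drop text[i], append text[i + m]
        long long drop = (text[i] * pow) % M;
        ht = ((ht - drop + M) % M * B + text[i + m]) % M;
    }
    return offsets;
}
```

With the verification step the algorithm always reports correct offsets; the hashing only decides when the O(m) comparison is worth running.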

Knuth-Morris-Pratt Algorithm:

In the KMP algorithm we use the fact that if a substring match fails, then by precomputing over the pattern we know exactly how far we can safely shift without skipping a possible match. The precomputation finds, for each prefix of the pattern, the length of the longest proper prefix that is also a suffix. This can be done in the following way. Consider the pattern b1b2...bk...bj and suppose the longest substring that is both a prefix and a suffix of b1...bj has length k, so F[j] = k. If the (k+1)th character equals the (j+1)th character then F[j+1] = F[j] + 1. Otherwise, any shorter substring that is both an initial and a terminal substring of b1...bj+1 must also be a border of b1...bk, so we next check the border of length F[F[j]], and this check continues until we find a border or there is no border at all, in which case we initialize F[j+1] to 0. The pseudocode is as follows:
Prefix-Compute-Function(Pattern):
F[1] = 0
for j from 1 to m-1 do:
    i = F[j]
    while(b[i+1] != b[j+1] and i > 0) i = F[i]  // reduce the search space
    if(i == 0 and b[i+1] != b[j+1]) F[j+1] = 0
    else F[j+1] = i + 1

The C++ implementation is given below:

int* KMP_Preprocess(string pattern){
 int *pre = (int *)malloc(sizeof(int) * pattern.length());
 memset(pre, 0, sizeof(int) * pattern.length());  // size in bytes, not elements
 for(int i = 1; i < (int)pattern.length(); i++){
  pre[i] = pre[i-1];
  // fall back to shorter borders until the next character matches
  while(pattern[i] != pattern[pre[i]]){
   if(pre[i] == 0){
    pre[i] = -1;       // no border at all
    break;
   }else{
    pre[i] = pre[pre[i] - 1];
   }
  }
  pre[i] = pre[i] + 1;
 }
 return pre;
}

void KMP_Match(string a, string b){
 int *pre = KMP_Preprocess(b);
 for(int i = 0, j = 0; i + (int)b.length() <= (int)a.length();){
  while(i + j < (int)a.length() && a[i+j] == b[j]){
   ++j;
   if(j == (int)b.length()){
    cout << "Match offset :" << i << endl;
    break;
   }
  }
  if(j > 0){
   // shift by the precomputed amount, but never less than 1
   i += (j - pre[j-1] > 1) ? (j - pre[j-1]) : 1;
   j = pre[j-1];
  }else{
   ++i;
  }
 }
 free(pre);
}

Floating Point Representation Basics

Sunday 27 July 2014
The details in this post are acquired from the IEEE 754 floating point standard. A real number in binary is represented in the format Im Im-1 ... I1 I0 . F1 F2 ... Fn, where each Ii and Fi is either 0 or 1, digits of the integer and fraction parts respectively.

A finite number can also be represented by four integer components: a sign (s), a base (b), a significand (m), and an exponent (e). The numerical value of the number is then evaluated as (-1)^s x m x b^e. In the binary32 format, floating point numbers are stored in single precision: there are 23 bits for the significand, 8 bits for the exponent and 1 bit for the sign.

For example, the rational number 9÷2 can be converted to single precision float format as follows:
9(10) ÷ 2(10) = 4.5(10) = 100.1(2)
The result is said to be normalized if it is represented with a leading 1 bit, i.e. 1.001(2) x 2^2. Omitting this implied 1 on the left extreme gives us the mantissa of the float number.

In other words, the above result can be written as (-1)^0 x 1.001(2) x 2^2, which yields the integer components s = 0, b = 2, significand (m) = 1.001, mantissa = 001 and e = 2. The corresponding single precision floating point number can be represented in binary as shown below,

0 10000001 00100000000000000000000

where the exponent field, although logically 2, is encoded as 129 (127 + 2), called the biased exponent.
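This encoding of 4.5 can be checked programmatically. The sketch below is my own (not part of the original post): it copies the bits of a float into an integer and extracts the three fields, confirming sign 0, biased exponent 129 and mantissa 001 followed by twenty zeros.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as an unsigned 32-bit integer.
uint32_t floatBits(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

uint32_t signBit(uint32_t bits)  { return bits >> 31; }          // 1 bit
uint32_t exponent(uint32_t bits) { return (bits >> 23) & 0xFF; } // 8 bits, biased
uint32_t mantissa(uint32_t bits) { return bits & 0x7FFFFF; }     // 23 bits
```

memcpy is used rather than a pointer cast so the inspection stays well-defined C++.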

One of the goals of the IEEE floating point standards was that you could treat the bits of a floating point number as a (signed) integer of the same size, and if you compared them that way, the values will sort into the same order as the floating point numbers they represented.
If you used a two's-complement representation for the exponent, a small positive number (i.e., one with a negative exponent) would look like a very large integer, because its second MSB would be set. The bias representation avoids this: a smaller exponent in the floating point number always looks like a smaller integer.

A bias of (2^(n-1) - 1), where n is the number of bits used in the exponent, is added to the exponent (e) to get the biased exponent (E). So the biased exponent (E) of a single precision number can be obtained as
E = e + 127
The range of exponent in single precision format is -126 to +127. Other values are used for special symbols.

Double Precision Format:

In the binary64 (double precision) format there are 52 bits for the significand, 11 bits for the exponent and 1 bit for the sign, and the exponent bias is 2^10 - 1 = 1023.

Precision:
The smallest change that can be represented in a floating point representation is called its precision. The fractional part of a single precision normalized number has exactly 23 bits of resolution (24 bits with the implied bit). This corresponds to log10(2^23) = 6.924, i.e. about 7 decimal digits of accuracy. Similarly, in the case of double precision numbers the precision is log10(2^52) = 15.654, about 16 decimal digits.
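These figures match what the standard library reports. A quick check of my own using std::numeric_limits (note that digits10 rounds down to the number of decimal digits that are always preserved, hence 6 and 15 rather than 7 and 16):

```cpp
#include <limits>

// float:  24 significand bits (23 stored + 1 implied) -> ~7 decimal digits
// double: 53 significand bits (52 stored + 1 implied) -> ~16 decimal digits
static_assert(std::numeric_limits<float>::digits == 24, "float significand bits");
static_assert(std::numeric_limits<double>::digits == 53, "double significand bits");

int floatDecimalDigits()  { return std::numeric_limits<float>::digits10; }
int doubleDecimalDigits() { return std::numeric_limits<double>::digits10; }
```
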

Accuracy:
Accuracy in floating point representation is governed by the number of significand bits, whereas range is limited by the exponent. Not all real numbers can be represented exactly in floating point format. For any number x which is not a floating point number, there are two candidates for a floating point approximation: the closest floating point number less than x, written x-, and the closest floating point number greater than x, written x+. A rounding operation is performed on the number of significant bits in the mantissa field, based on the selected mode. The round down mode sets x to x-, the round up mode sets x to x+, the round towards zero mode sets x to whichever of x- and x+ lies between zero and x, and the round to nearest mode sets x to whichever of x- and x+ is nearer to x. Round to nearest is the most commonly used mode. The closeness of the floating point representation to the actual value is called accuracy.

Special Bit Patterns:
The standard defines a few special floating point bit patterns. Zero cannot have a most significant 1 bit, hence cannot be normalized. The hidden bit representation therefore requires a special technique for storing zero, and we get two different bit patterns, +0 and -0, for the same numerical value zero. For single precision floating point representation, these patterns are given below,

0 00000000 00000000000000000000000 = +0
1 00000000 00000000000000000000000 = -0

Similarly, the standard defines two different bit patterns for +INF and -INF. They are given below,
0 11111111 00000000000000000000000 = +INF
1 11111111 00000000000000000000000 = -INF

All of these special values, as well as the NaNs below, are encoded through reserved bit patterns in the exponent field (all 0s or all 1s). This slightly reduces the exponent range, but that is quite acceptable since the range is so large.

An attempt to compute expressions like 0 x INF, INF - INF or 0 ÷ 0 makes no mathematical sense. The standard calls the result of such expressions Not a Number (NaN). Any subsequent expression involving NaN yields NaN. The representation of NaN has a non-zero significand and all 1s in the exponent field. For single precision (x denotes don't-care bits),
x 11111111 mxxxxxxxxxxxxxxxxxxxxxx
where the significand is non-zero and its leading bit m distinguishes the two kinds of NaN:
0 11111111 00000000000000000000001 _____________ Signaling NaN (SNaN)
0 11111111 10000000000000000000000 _____________ Quiet NaN (QNaN)
Usually QNaN and SNaN are used for error handling. QNaNs do not raise any exceptions as they propagate through most operations, whereas SNaNs raise an invalid exception when consumed by most operations.
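These NaN rules can be observed directly. In the sketch below (my own, not from the post), 0 x INF produces a NaN, and the NaN both propagates through arithmetic and compares unequal even to itself:

```cpp
#include <cmath>
#include <limits>

bool zeroTimesInfIsNaN() {
    double inf = std::numeric_limits<double>::infinity();
    return std::isnan(0.0 * inf);   // 0 x INF has no meaningful value: NaN
}

bool nanPropagates() {
    double nan = std::numeric_limits<double>::quiet_NaN();
    // NaN infects any further arithmetic and never equals itself
    return std::isnan(nan + 1.0) && (nan != nan);
}
```
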

Overflow and Underflow:
Overflow is said to occur when the true result of an arithmetic operation is finite but larger in magnitude than the largest floating point number which can be stored using the given precision. Underflow is said to occur when the true result of an arithmetic operation is smaller in magnitude (infinitesimal) than the smallest normalized floating point number which can be stored. Overflow can’t be ignored in calculations whereas underflow can effectively be replaced by zero.
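Both behaviours are easy to trigger; the sketch below is my own. Multiplying the largest float by 2 overflows to +INF, while squaring the smallest normalized float produces a value below even the subnormal range, which underflows to zero:

```cpp
#include <cfloat>
#include <cmath>

bool overflowsToInfinity() {
    float big = FLT_MAX * 2.0f;      // too large to represent -> +INF
    return std::isinf(big);
}

bool underflowsToZero() {
    float tiny = FLT_MIN * FLT_MIN;  // ~1e-76, below the subnormal range -> 0
    return tiny == 0.0f;
}
```
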

Segmented Sieve of Eratosthenes

If you have enough space to store all primes up to sqrt(b), then you can sieve for primes in the range a to b using O(b - a) additional space:

To see this, take SPOJ problem 2 (PRIME1), where the upper limit is given as 1000000000. That is far too big to store a flag for every number in a plain array, although we could compress the sieve so that each bit of an integer records whether the corresponding number is prime, i.e. 64 numbers need only two 32-bit integers.

Another method: if we have space to run the sieve of Eratosthenes up to the square root of the upper limit, then we can compute the sieve of [a, b] from that smaller sieve. The way to do it is described below:

1. First build the sieve up to the square root of the upper limit, which in this case is 1000000000. For 1000000000 the square root comes to about 31623, so we create an array pre_sieve of that size and fill it with the values:


const int LIMIT = 31623;              // ceil(sqrt(1000000000))
char pre_sieve[LIMIT + 1];
memset(pre_sieve, 1, sizeof(pre_sieve));
pre_sieve[0] = pre_sieve[1] = 0;
for(int i = 2; i * i <= LIMIT; i++)
{
    if(pre_sieve[i] == 1)
    {
         for(int j = i * i; j <= LIMIT; j += i)
             pre_sieve[j] = 0;
    }
}
vector<int> primes;
for(int i = 2; i <= LIMIT; i++)
    if(pre_sieve[i] == 1)
        primes.push_back(i);
 
Now, for a range [a, b] and each prime p, we need the smallest multiple of p inside the range. To find it we can divide the start of the range by p, take the floor, and multiply the result back by p; if that falls below a (or equals p itself) we move up to the next multiple. After crossing off these multiples we can iterate over the segment sieve to collect the primes.

char *arr = (char *)malloc(b - a + 1);
memset(arr, 1, b - a + 1);                 // assume everything is prime at first
for(int i = 0; i < (int)primes.size(); i++)
{
     long long prime = primes[i];
     long long low = (a / prime) * prime;  // largest multiple of prime <= a
     if(low < a) low += prime;             // first multiple of prime >= a
     if(low == prime) low += prime;        // don't cross off prime itself
     for(long long j = low; j <= b; j += prime)
             arr[j - a] = 0;               // index relative to a
}

Again, you can reduce the space further by sieving only the odd numbers.
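Putting the two steps together, a self-contained segmented sieve might look like the following sketch; the function name and the bounds handling are my own, and the pre-sieve limit is derived from sqrt(b) rather than hard-coded:

```cpp
#include <cmath>
#include <vector>

// Return the primes in [a, b] using O(sqrt(b) + (b - a)) space.
std::vector<long long> segmentedSieve(long long a, long long b) {
    long long limit = (long long)std::sqrt((double)b) + 1;
    // ordinary sieve of Eratosthenes up to sqrt(b)
    std::vector<char> small(limit + 1, 1);
    small[0] = small[1] = 0;
    for (long long i = 2; i * i <= limit; i++)
        if (small[i])
            for (long long j = i * i; j <= limit; j += i)
                small[j] = 0;
    // sieve the segment [a, b] using the small primes
    std::vector<char> seg(b - a + 1, 1);
    for (long long p = 2; p <= limit; p++) {
        if (!small[p]) continue;
        long long low = ((a + p - 1) / p) * p;  // first multiple of p >= a
        if (low < p * p) low = p * p;           // never cross off p itself
        for (long long j = low; j <= b; j += p)
            seg[j - a] = 0;
    }
    std::vector<long long> primes;
    for (long long x = (a < 2 ? 2 : a); x <= b; x++)
        if (seg[x - a]) primes.push_back(x);
    return primes;
}
```

For PRIME1-sized inputs the segment array is at most 100001 entries, regardless of how large b itself is.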