Wednesday, December 18, 2013

CAP theorem and NoSQL databases

I was talking to a friend yesterday who said, "RDBMS is going to go away, everyone uses NoSQL these days." That remark served as the motivation behind this post: no, I don't think that is the case by any stretch of the imagination. It did, however, get me reading more about NoSQL databases. Let's travel down this path to understand why NoSQL databases are so popular today and how they started.

To get started, let's first try to understand the CAP theorem. There are three ingredients in the CAP theorem, namely:

  1. Consistency - Having the same data across all the nodes in the cluster at any given instant of time.

  2. Availability - Being able to serve at all times, with no downtime and the lowest possible response time.

  3. Partition Tolerance - The system continues to serve even if some link is broken and your cluster is split into two or more parts. A message could be lost, or some node may crash, but you still want to be able to serve.


Now, the CAP theorem states that you can carry home only two out of these three. This is where the difference between RDBMS and NoSQL lies! Let's look at the three combinations we can form here[2]:

  1. CA - data is consistent between all nodes - as long as all nodes are online - and you can read/write from any node and be sure that the data is the same.

  2. CP - data is consistent between all nodes, and maintains partition tolerance by becoming unavailable when a node goes down.

  3. AP - nodes remain online even if they can't communicate with each other and will resync data once the partition is resolved, but you aren't guaranteed that all nodes will have the same data (either during or after the partition)
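
To make the 'AP' case a little more concrete, here is a tiny, purely illustrative Python sketch (the Replica class and the last-write-wins merge rule below are my own simplification for this post, not how any particular NoSQL store actually works): both replicas keep serving reads and writes while they cannot talk to each other, and they reconcile once the partition heals, so in between they may disagree.

[sourcecode language="python"]
# Toy model of an AP system: two replicas accept writes during a partition
# and reconcile afterwards with a last-write-wins (timestamp) rule.

class Replica(object):
    def __init__(self, name):
        self.name = name
        self.data = {}                      # key -> (value, timestamp)

    def write(self, key, value, ts):
        self.data[key] = (value, ts)

    def read(self, key):
        return self.data.get(key, (None, 0))[0]

    def merge(self, other):
        # Last-write-wins reconciliation once the partition heals.
        for key, (value, ts) in other.data.items():
            if ts > self.data.get(key, (None, 0))[1]:
                self.data[key] = (value, ts)

r1, r2 = Replica("r1"), Replica("r2")
r1.write("stock:item42", 1, ts=1)           # one item in stock, replicated everywhere
r2.merge(r1)

# Network partition: both sides keep serving (availability) but may now disagree (no consistency).
r1.write("stock:item42", 0, ts=2)           # item sold via replica r1
r2.write("stock:item42", 0, ts=3)           # ...and also sold via replica r2!

print("r1 sees %s, r2 sees %s" % (r1.read("stock:item42"), r2.read("stock:item42")))

# Partition heals: the replicas converge again; the business, not the database,
# has to deal with the item that was sold twice.
r1.merge(r2)
r2.merge(r1)
print(r1.read("stock:item42") == r2.read("stock:item42"))   # True
[/sourcecode]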


Now let's look at some popular NoSQL users and see why NoSQL is good and applicable for them, while RDBMS, in my opinion, will continue to co-exist.
Let's talk of amazon.com first. Their business model is such that they want to be available all the time. They wouldn't want their site to be down, or to have a higher response time, at any moment. So it is very essential for them to have the 'A' and 'P' attributes of the CAP theorem. They would rather give away the 'C' to an extent. Getting an apology from amazon.com saying "we don't have this item although we showed you it was available earlier" is not as bad as the site itself going down. So if there was one item and two people simultaneously put it into their carts, that could happen, but given their business model they can have alternatives to save their customers from this situation. For instance, they could always keep some extra items in stock.

Similarly, when you think of facebook.com, suppose you post a picture on your wall. It's not a big deal if one of your friends can see that picture right away and another only sees it a few moments later. Again, it doesn't care as much about consistency as it does about availability.

Let's now think about why a cluster, or a farm of servers, was needed after all. It's because everything you do on the internet is being stored in a database. Google, Facebook, Amazon, etc. keep all this data for providing personalized search, recommendations and so on. This huge amount of data, in the order of petabytes or zettabytes, cannot be stored on one disk. Trying to store all of it on one disk and replicate it to more such disks is a pain, and that is why Google chose to use a farm of several servers with smaller disks. A traditional RDBMS was built to serve best on a single disk, and that is why the people with this huge data came up with BigTable, DynamoDB, etc.

And as we near the end of this article, it's important to have a look at some NoSQL databases. There are many out there, which can be broadly divided into 4 categories:

  1. Column : HBase, Accumulo

  2. Document : MarkLogic, MongoDB, Couchbase

  3. Key-value : Dynamo, Riak, Redis, Cache, Project Voldemort

  4. Graph : Neo4J, Allegro, Virtuoso


Note that there isn't a concrete line between the 4 types. As an example, the document-oriented databases and the key-value databases can resemble each other to some extent at times, so the boundaries are a little fuzzy. To conclude, I would say NoSQL databases are popular and are good in certain circumstances, but when you come to something like, say, banking, you really need ACID compliance and therefore an RDBMS. So, in my opinion, they will co-exist as they do today.

References:

  1. http://en.wikipedia.org/wiki/CAP_theorem

  2. http://stackoverflow.com/questions/12346326/nosql-cap-theorem-availability-and-partition-tolerance

  3. http://stackoverflow.com/questions/16779348/does-the-cap-theorem-imply-that-acid-is-not-possible-for-distributed-databases

  4. http://en.wikipedia.org/wiki/NoSQL

  5. https://www.youtube.com/watch?v=qI_g07C_Q5I

Tuesday, December 10, 2013

A sneak peek into the Hadoop ecosystem

I jumped into the IT field for the love I had for it, and the biggest distraction for me has always been trying to know something I know nothing about. Ever since some of my colleagues started working with Hadoop, I have been wanting to read about it. Adding to that, the continuous drive has been a feeling that there has to be something really nice about big data, given how cool everyone talking about it seems to feel. And finally, the latest post on Facebook by a professor saying:
Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...

Well, this quote is pretty famous by now and I must acknowledge it was something which pushed me into studying more about what this actually is and why it is so cool! Hopefully the next time I bump into some cool people I will have something to talk about as well :D . I finally found some time and energy to study some of it this weekend. Here is a high level overview of the picture[1] I have in my mind right now:

[Figure: an overview of the Hadoop ecosystem]

HDFS

The base of this ecosystem is the Hadoop Distributed File System, derived from Google's whitepaper on the Google File System (GFS). Let's take a simple example[2] to understand HDFS. Say we have a record containing the phone numbers of all the people in a city, and we use a 5-machine Hadoop cluster to keep this data. Let's say we want a replication factor of 3, which means every chunk of data will have 2 extra backup copies stored on different machines. Further, let's assume that you divide the hard disks on your 5 machines into 78 pieces. You store the phone numbers of all the people whose names start with 'A' on one piece of a disk and keep its backups on two other machines. Similarly, do that for all people whose names start with the letters 'B'-'Z'. In this way you organize your data across the cluster, with every chunk present on 3 of the 5 machines.
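
As a toy illustration of that layout (the node names and the simple round-robin placement below are made up for this sketch; HDFS's real block-placement policy is more involved), note that 26 chunks with 3 copies each gives exactly the 78 pieces mentioned above, spread over the 5 machines:

[sourcecode language="python"]
import string

machines = ["node1", "node2", "node3", "node4", "node5"]
replication_factor = 3

# One chunk per starting letter: the 'A' chunk holds everyone whose name
# starts with 'A', and so on. Each chunk is placed on 3 of the 5 machines.
placement = {}
for i, letter in enumerate(string.ascii_uppercase):
    placement[letter] = [machines[(i + r) % len(machines)]
                         for r in range(replication_factor)]

print(placement["A"])     # ['node1', 'node2', 'node3']
print(placement["B"])     # ['node2', 'node3', 'node4']

# 26 chunks x 3 copies = 78 pieces in total, and if any one machine dies,
# every chunk it held still has 2 live copies on other machines.
print(sum(len(nodes) for nodes in placement.values()))   # 78
[/sourcecode]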

MapReduce: To generate a report from all the data, you would now need MapReduce code. The MapReduce API is available open source for use, but you will have to write some good Java code to run the map jobs in parallel on all those machines and then gather the results back (reduce) to generate the final report.
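
As an illustration of the idea (a single-process Python sketch only; a real job would implement mappers and reducers against the Hadoop MapReduce API, and the map tasks would run on the machines holding the chunks), here is a tiny report that counts how many phone-book entries fall under each starting letter:

[sourcecode language="python"]
from collections import defaultdict

# Each "chunk" is a list of (name, phone_number) records, as laid out on HDFS.
chunks = [
    [("Alice", "555-0101"), ("Aaron", "555-0102")],
    [("Bob", "555-0201")],
    [("Carol", "555-0301"), ("Charlie", "555-0302"), ("Chris", "555-0303")],
]

def map_phase(chunk):
    # Runs independently on every machine holding a chunk; emits (key, 1) pairs.
    return [(name[0].upper(), 1) for name, number in chunk]

def reduce_phase(pairs):
    # Gathers the mappers' output and aggregates per key to build the report.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

intermediate = []
for chunk in chunks:                  # in Hadoop these map tasks run in parallel
    intermediate.extend(map_phase(chunk))

print(reduce_phase(intermediate))     # {'A': 2, 'B': 1, 'C': 3}
[/sourcecode]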

Hive & Pig Frameworks: Writing MapReduce jobs isn't a piece of cake. So Facebook built the Hive framework to allow an easier way to do the same. Hive uses SQL-ish syntax to do things on the data lying on the Hadoop layer below. Pig is a similar framework built by Yahoo, but it is more of a data-flow language. In Hive a user can specify that data from two tables must be joined, with an easy syntax like SQL, but not what join implementation to use. Pig is procedural and, though you will have a little more to write there, it does allow you the flexibility to specify an implementation of how you would like to join different chunks of data.

HBase is a NoSQL database that allows you to work with the enormous amount of data stored on the Hadoop system. It is a column-oriented database management system, well suited for sparse data sets, which are common in many big data use cases. HBase does not support a structured query language like SQL; HBase applications are written in Java, much like a typical MapReduce application.

ZooKeeper is a centralized infrastructure that helps synchronize activity across the cluster. It maintains common objects needed in large cluster environments; examples of these objects include configuration information, hierarchical naming space, and so on. Applications can leverage these services to coordinate distributed processing across large clusters.

Sqoop can be used to import data from an RDBMS (say MySQL, Oracle or MS SQL Server) into HDFS, or vice versa.

Flume is the tool to gather data from various sources into your Hadoop system. It is mostly used for log data.

Closing Remarks:
To end with, I would like to state that I am by no means an expert on big data. In fact, I am a beginner just interested in knowing this uber-cool technology. With this post, all I aim to do is start a discussion so that we can together start learning it bit by bit! So, if you find a mistake herein, please do help me learn it better :)

References:
1) http://www.youtube.com/watch?v=R-qjyEn3bjs
2) http://www-01.ibm.com/software/data/infosphere/hadoop/

Wednesday, November 20, 2013

Backtracking-I (Primer)

Recursion has always fascinated me for its simplicity. Whenever I come across a recursive formulation I had not seen before, I feel happy about learning it. But there is a trade-off with this simplicity, for they say that, in general, recursive solutions are not as efficient as their iterative versions. Being a computer science geek attached to algorithmic programming, I am obsessed with the efficiency of the code I write, but recursion, for its simplicity, has always been allowed that exception. A road less traveled for me has been backtracking, because backtracking involves trying all possibilities, and that sounds awful in terms of complexity, to say the least, when you first hear it. But sometimes it is the only solution available, or at least I see some classic problems, like the 8-queens problem, which people solve using backtracking. It is for this reason that I thought it would be good to write a post on backtracking and throw in some much deserved respect for this genre of algorithms. Let's try and learn the backtracking approach through some simple problems. The first code we will write here generates all strings of n bits. For simplicity, we assume that we have a global array 'ar' available. Here is the code:
void generate_all(int n)
{
        if(n<1) printf("%s\n", ar);
        else{
                ar[n-1]='0';        //fix (n)th bit as '0'
                generate_all(n-1);  //generate all combinations for other n-1 positions.
                ar[n-1]='1';        //fix (n)th bit as '1'
                generate_all(n-1);  //generate all combinations for other n-1 positions.
        }
}
The output of this code is: 000 100 010 110 001 101 011 111

The code is exponential in complexity: O(2^n). Here n=3, so we have 8 strings in the output. Let's now generalize this code to print k-ary strings.

[sourcecode language="cpp"]
void generate_all(int n, int k)
{
        if(n<1) printf("%s\n", ar);
        else{
                for(int j=0;j<k;j++)         //iterate over all k elements
                {
                        ar[n-1]='0'+j;       //fix the (n)th position
                        generate_all(n-1,k); //generate all combinations for the other n-1 positions
                }
        }
}
[/sourcecode]

The output of this code is as follows:
000 100 200 300 010 110 210 310 020 120 220 320 030 130 230 330 001 101 201 301 011 111 211 311 021 121 221 321 031 131 231 331 002 102 202 302 012 112 212 312 022 122 222 322 032 132 232 332 003 103 203 303 013 113 213 313 023 123 223 323 033 133 233 333
This code is also exponential in nature: O(k^n). Here k=4 and n=3, so we have 64 strings in the output. Let's now add just a little more logic to that and write code to print all permutations of a given string. The input string can be anything, as opposed to the first k characters starting with '0' in the earlier example.

[sourcecode language="cpp"]
#include <cstdio>
#include <cstring>
#include <iostream>
using namespace std;

void generate_all(int depth, char* permutation, int *used, char *original)
{
        int length=strlen(original);
        if(depth==length){                      //base case
                permutation[depth]='\0';        //so that we have an end marker for the string
                printf("%s\n", permutation);
        }
        else{
                for(int i=0;i<length;i++){
                        if(!used[i]){
                                used[i]=1;                      //mark the (i)th character of the original string as used
                                permutation[depth]=original[i]; //fix the (depth)th position of the permutation
                                generate_all(depth+1, permutation, used, original); //recurse for the remaining (not yet used) positions
                                used[i]=0;                      //prepare to backtrack
                        }
                }
        }
}

int main()
{
        int used[]={0,0,0};
        char p[4];              //holds the current permutation: length of "abc" plus the '\0' terminator
        char original[]="abc";
        generate_all(0, p, used, original);
        return 0;
}
[/sourcecode]

The output of the above code is:

abc acb bac bca cab cba

If we instead make our input string: "aaa", we get the output:

aaa aaa aaa aaa aaa aaa

I found the same question (the last one) on geeksforgeeks, but they solve it differently in a way that is not very intuitive to me. To print only the unique strings in the output, you can follow this code; a small sketch of the idea also appears below. This post is a primer and hence let us stop here. Try not to learn the exact questions and their solutions, but the way recursion unfolds and the way it backtracks to look at the other parts of the solution. Since we stop here with pretty basic stuff, let me promise to cover more backtracking code in the coming posts. In the meantime, if you are interested, have a look at the classic 8-queens problem and how backtracking is used to solve it.
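
For what it is worth, here is a small Python sketch of one simple way to do it (my own illustration, not the geeksforgeeks solution mentioned above): at every depth we remember the characters already tried at that depth and skip repeats, so "aaa" is printed only once.

[sourcecode language="python"]
def unique_permutations(remaining, prefix=""):
    if not remaining:                    # nothing left to place: one permutation is done
        print(prefix)
        return
    tried = set()                        # characters already fixed at this depth
    for i, ch in enumerate(remaining):
        if ch in tried:
            continue                     # skip duplicate characters at the same depth
        tried.add(ch)
        # fix character i at this depth, recurse on the rest, then backtrack
        unique_permutations(remaining[:i] + remaining[i+1:], prefix + ch)

unique_permutations("aaa")   # prints "aaa" just once
unique_permutations("abc")   # prints all 6 permutations
[/sourcecode]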

Wednesday, November 13, 2013

Open Source India Conference: Multi-source Replication

This session was presented at Open Source India 2013. The presentation covers MySQL multi-source replication, which is released under MySQL Labs.

Friday, November 8, 2013

MySQL User Camp: MySQL Replication and Multi-threaded Slaves


The agenda of this talk was to introduce MySQL Replication and then follow it up with multi-threaded slave (MTS) support. The presentation introduces multi-threading of slaves by database, which is a part of MySQL 5.6, as well as the multi-threading policy introduced in MySQL 5.7.2. Finally, there is a brief coverage of the new replication monitoring tables used to monitor MySQL Replication; these tables are part of MySQL Performance Schema.

Tuesday, October 15, 2013

Primality testing- II (Sieve of Atkin)

In the previous post on primality testing, we discussed successive division (the brute force method) and the Sieve of Eratosthenes, for checking whether a number is prime and for generating all primes below a given number respectively. Let's move to the Sieve of Atkin today, which is an optimized version of the Sieve of Eratosthenes. The Sieve of Atkin is something I wouldn't suggest for a programming contest, for it is difficult to understand and remember. To be frank, there are parts of it whose correctness I don't understand, but it works pretty fast when coupled with a good implementation. Note that if implemented badly, this could perform worse! The purpose behind putting this article on the blog is to spread the word about the existence of such an algorithm and that the code can be used when needed ;) Let's check out the algorithm now. Some excerpts first:

  1. The algorithm treats 2, 3 and 5 as special cases and just adds them to the set of primes to start with.

  2. Like the Sieve of Eratosthenes, we start with a list of numbers we want to investigate. Suppose we want to find primes <= 100; then we make a list for [5, 100]. As explained in (1), 2, 3 and 5 are special cases and 4 is not a prime.

  3. The algorithm talks in terms of modulo-sixty remainders.

  4. All numbers with modulo-sixty remainder 1, 13, 17, 29, 37, 41, 49, or 53 have a modulo-four remainder of 1. These numbers are prime if and only if the number of solutions to 4x² + y² = n is odd and the number is squarefree. A squarefree integer is one which is not divisible by any perfect square other than 1.

  5. All numbers with modulo-sixty remainder 7, 19, 31, or 43 have a modulo-six remainder of 1. These numbers are prime if and only if the number of solutions to 3x² + y² = n is odd and the number is squarefree.

  6. All numbers with modulo-sixty remainder 11, 23, 47, or 59 have a modulo-twelve remainder of 11. These numbers are prime if and only if the number of solutions to 3x² − y² = n is odd and the number is squarefree.


The points (4-6) are not intuitive and hence this algorithm is a little difficult to understand. But, if you are interested, check out the proofs in Atkin's paper. With these points in mind, let's now look at the following pseudocode from the Wikipedia article:
// arbitrary search limit
limit ← 1000000

// initialize the sieve
for i in [5, limit]: is_prime(i) ← false

// put in candidate primes:
// integers which have an odd number of
// representations by certain quadratic forms
for (x, y) in [1, √limit] × [1, √limit]:
    n ← 4x²+y²
    if (n ≤ limit) and (n mod 12 = 1 or n mod 12 = 5):
        is_prime(n) ← ¬is_prime(n)
    n ← 3x²+y²
    if (n ≤ limit) and (n mod 12 = 7):
        is_prime(n) ← ¬is_prime(n)
    n ← 3x²-y²
    if (x > y) and (n ≤ limit) and (n mod 12 = 11):
        is_prime(n) ← ¬is_prime(n)

// eliminate composites by sieving
for n in [5, √limit]:
    if is_prime(n):
        // n is prime, omit multiples of its square; this is
        // sufficient because composites which managed to get
        // on the list cannot be square-free
        is_prime(k) ← false, k ∈ {n², 2n², 3n², ..., limit}

print 2, 3
for n in [5, limit]:
    if is_prime(n): print n

And finally, let's convert this easy-to-follow pseudocode into Python code:

[sourcecode language="Python"]
#!/usr/bin/python2.7 -tt

import sys
from math import sqrt, ceil, pow

def primes_below(limit):    # sieve of Atkin
    sqroot = int(ceil(sqrt(limit)))
    is_prime = [False]*(limit+1)    # limit+1 entries so that index 'limit' is valid
    primes = [2,3]

    for x in xrange(sqroot+1):
        for y in xrange(sqroot+1):
            # n = 4*x^2 + y^2
            n = 4*int(pow(x, 2)) + int(pow(y, 2))
            if n <= limit and (n % 12 == 1 or n % 12 == 5):
                is_prime[n] = not is_prime[n]
            # n = 3*x^2 + y^2
            n = 3*int(pow(x, 2)) + int(pow(y, 2))
            if n <= limit and n % 12 == 7:
                is_prime[n] = not is_prime[n]
            # n = 3*x^2 - y^2
            n = 3*int(pow(x, 2)) - int(pow(y, 2))
            if n <= limit and x > y and n % 12 == 11:
                is_prime[n] = not is_prime[n]

    for x in range(5, limit, 2):
        if is_prime[x]:
            primes.append(x)
            # mark multiples of this prime from x*x onwards as non-prime;
            # this removes the non-square-free candidates left by the
            # quadratic-form step and never strikes out a prime
            for y in xrange(x*x, limit, x):
                is_prime[y] = False
    print primes

def main():
    primes_below(100)

if __name__=='__main__':
    main()
[/sourcecode]

The C++ code for the algorithm can be found here. Enjoy!

Primality testing- I (brute force, Sieve of Eratosthenes)

If you are into algorithms, you will realize that primality testing is a very important topic. In simple terms, primality testing means finding whether a number is prime or not. Let's discuss a few tricks for the same. There is actually a lot of number theory based on primality testing and it is impossible to cover all of it here. Nevertheless, let's discuss some of it. This post starts with a simple brute-force algorithm and introduces the Sieve of Eratosthenes. With these simple things covered, we will try to discuss more of primality testing in the following posts. Let's get started with the brute force approach!

NAIVE ALGORITHM:

Let's say the given number is n. The simplest (brute-force) way to find whether a number is prime or not is to check if the number is divisible by any number in the range 2 to n-1. This follows from the definition of a prime number and should be easy to understand. At this moment, let's revisit a basic school-grade mathematics property which says:
If a number, say n, is divisible by an integer k, then either k or n/k has to be <= sqrt(n)

This means you don't need to check whether n is divisible by any of the numbers in [2, n-1]; rather, you can just check [2, sqrt(n)]. Well, give yourself a pat on the back, because that is a good enough optimization: if you were making 10000 checks in the former case, you have reduced it to just 100 iterations with the addition of this property. Let's have a look at a small piece of Python code to illustrate this (source: github):

[sourcecode language="python"]

from math import floor, sqrt

def isPrime(n):
    maxPossibleFactor = int( floor( sqrt(n) ) )
    possibleFactors = range( 2, maxPossibleFactor+1 )
    divisibleFactors = [ n % i == 0 for i in possibleFactors ]
    return not any( divisibleFactors )

[/sourcecode]

That was easy. Right?

SIEVE OF ERATOSTHENES:

Let's move ahead and look at a slightly different question: identify all the prime numbers below 1000. A naive way to do this is to call the above Python function for every number in [2, 1000]. Sounds bad... Can we do better? Yes, we can. It turns out there is a well-known algorithm for doing this: the Sieve of Eratosthenes. Let's see how this algorithm works (the explanation below is copied from here):
Make a list of all the integers less than or equal to n (and greater than one). Strike out the multiples of all primes less than or equal to the square root of n, then the numbers that are left are the primes.

For example, to find all the primes less than or equal to 30, first list the numbers from 2 to 30.
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

The first number 2 is prime, so keep it (we will color it green) and cross out its multiples (we will color them red), so the red numbers are not prime.
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

The first number left (still black) is 3, so it is the first odd prime. Keep it and cross out all of its multiples. We know that all multiples less than 9 (i.e. 6) will already have been crossed out, so we can start crossing out at 3²=9.
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Now the first number left (still black) is 5, the second odd prime. So keep it also and cross out all of its multiples (all multiples less than 5²=25 have already been crossed out, and in fact 25 is the only multiple not yet crossed out).
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

The next number left, 7, is larger than the square root of 30, so there are no multiples of 7 to cross off that haven't already been crossed off (14 and 28 by 2, and 21 by 3), and therefore the sieve is complete. Therefore all of the numbers left are primes: {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}. Notice we just found these primes without dividing.

Now, let's look at a fun animation illustrating the algorithm (source: wikipedia):

[Animation: Sieve of Eratosthenes]
With the algorithm clear now, let's look at some Python code (source: stackoverflow).

[sourcecode language="python"]
def primes_sieve2(limit):
    a = [True] * limit                      # initialize the primality list
    a[0] = a[1] = False

    for (i, isprime) in enumerate(a):
        if isprime:
            for n in xrange(i*i, limit, i): # mark multiples of i as non-prime
                a[n] = False
    return [i for (i, isprime) in enumerate(a) if isprime]
[/sourcecode]
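
For instance, calling the function above (Python 2, to match the snippet) for a small limit gives the familiar list:

[sourcecode language="python"]
print primes_sieve2(30)   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
[/sourcecode]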

If you have trouble understanding some part of this post, or have any other related question, please drop a comment. And by the time we have the next post ready, you can try some questions related to prime numbers at http://projecteuler.net/problems. Happy coding :)

Sunday, September 22, 2013

MySQL 5.7: Monitoring Replication with the Performance Schema tables

In the previous post, we had a look at the Performance Schema tables to monitor MySQL replication. We discussed why these were needed, what advantages they have to offer and what each table stands for. We also did a small exercise to see how one can select a particular table given the data that is needed. We will now dive deeper and look at the individual tables to understand what fields they are made of. We will also go through different scenarios to see how the different pieces of data can be used.

Note that these tables report the status/configuration of a replication slave, so they are empty on the master server or on a standalone server. All the SELECTs we will talk of are done at the slave. Let's take the tables one by one and go through them.

REPLICATION_CONNECTION_CONFIGURATION:

This table shows the configuration parameters used by the slave server for connecting to the master server. Let's set up replication with the following CHANGE MASTER TO command executed on a slave server. The CHANGE MASTER TO command adds/changes the parameters that the slave server uses for connecting to the master server. To see the definition of all the fields in this table, please visit our official documentation.

mysql> change master to master_host='127.0.0.1', master_port=13000 ,
master_user='rpluser', master_password='secret', master_connect_retry=40, 
master_retry_count=10, master_ssl=1, master_ssl_verify_server_cert=1, 
master_auto_position=1; 
Query OK, 0 rows affected, 2 warnings (0.51 sec)

We can now look up the connection parameters set in our table for replication connection configuration. Let's do a SELECT * on this table to see what we get:

mysql> select * from performance_schema.replication_connection_configuration\G
*************************** 1. row ***************************
                         HOST: 127.0.0.1
                         PORT: 13000
                         USER: rpluser
            NETWORK_INTERFACE:
                AUTO_POSITION: 1
                  SSL_ALLOWED: YES
                  SSL_CA_FILE:
                  SSL_CA_PATH:
              SSL_CERTIFICATE:
                   SSL_CIPHER:
                      SSL_KEY:
SSL_VERIFY_SERVER_CERTIFICATE: YES
                 SSL_CRL_FILE:
                 SSL_CRL_PATH:
    CONNECTION_RETRY_INTERVAL: 40
       CONNECTION_RETRY_COUNT: 10
1 row in set (0.00 sec)

You can see that the changes made by CHANGE MASTER TO command can be seen in this table. These values are preserved here until the next CHANGE MASTER TO or RESET SLAVE ALL command. A row in this table represents a connection between the slave and its master. Since we have only one master here, we see only one row. If MySQL supported multi-source, then this table would have information about multiple sources- one row per source.

Note that a START SLAVE has not been done yet. The CHANGE MASTER TO command makes changes to the internal data structures regarding the connection parameters (if you know the internals, the "active_mi" variable is what I am talking about) and this table just picks up values from this data structure. So, the values shown in the table just reflect the state of this data structure at the moment the SELECT was done.

Let's now move to the next table, "replication_connection_status".

REPLICATION_CONNECTION_STATUS:

This table shows the current status of the connection between the slave and its master. Technically, it talks about the IO thread's status. To view the definition of all the fields in this table, please visit our official documentation. Now, we need a START SLAVE because this command establishes the connection between the slave and its master. So, let's execute a START SLAVE / START SLAVE IO_THREAD command followed by a SELECT * from the "replication_connection_status" table.

mysql> start slave; Query OK, 0 rows affected (0.05 sec)

mysql> select * from performance_schema.replication_connection_status\G
*************************** 1. row ***************************
             SOURCE_UUID: 1766a057-10cd-11e3-b06a-0021cc72bfc3
               THREAD_ID: 2
           SERVICE_STATE: ON
RECEIVED_TRANSACTION_SET:
       LAST_ERROR_NUMBER: 0
      LAST_ERROR_MESSAGE:
    LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00

So, we can see from here that, at the moment SELECT was done:

  1. The slave was reading from a master with UUID= 1766a057-10cd-11e3-b06a-0021cc72bfc3,
  2. The thread that handles this connection (IO thread) has an id=2. The thread_id field is very useful if you want to join this table with other Performance Schema/Information Schema tables. As an example, you can join this table with the table performance_schema.threads to extract a lot more information about this connection (IO thread). Likewise, there are a lot more thread-wise statistics you can collect with the other existing Performance Schema tables. Note that there is a bug here, but thankfully there is an easy workaround.
  3. Further, we see that our connection (the IO thread) is ready for service (SERVICE_STATE: ON). 
Note that the service state shows one of (ON/OFF/CONNECTING):
  • ON: meaning ready to serve, i.e. if there is an event that is executed on the master, you can assume that it will be read by the slave via this connection.
  • OFF: meaning the connection (IO thread) is not serving, i.e. it is not ready to read from the master or has been killed.
  • CONNECTING: meaning the connection (IO thread) is not established yet but the slave is trying to connect.
It is important to note here that the names and semantics of thread_id and service_state are the same across all the status tables. Hence, going forward, let's skip explaining these two columns in the following tables.

Let's now execute a couple of queries on the master server and see if we can get the set of received transactions on our slave server.

mysql>create table test.t(a int primary key); Query OK, 0 rows affected (0.61 sec)
mysql>insert into test.t values(1); Query OK, 1 row affected (0.06 sec)

Let's turn back to our slave server and see what the field "received_transaction_set" has.

mysql> select received_transaction_set from performance_schema.replication_connection_status\G
*************************** 1. row ***************************
RECEIVED_TRANSACTION_SET: 1766a057-10cd-11e3-b06a-0021cc72bfc3:1-4
1 row in set (0.00 sec)

Let's now cause an error in the connection (IO thread) and look at the error-related fields in this table. To cause an error in the connection, we will provide a wrong password in our CHANGE MASTER TO command.


mysql> stop slave; Query OK, 0 rows affected (0.10 sec)
mysql> change master to master_password='abcd'; Query OK, 0 rows affected, 2 warnings (0.29 sec)
mysql> start slave; Query OK, 0 rows affected (0.05 sec)


mysql> select last_error_number, last_error_message, last_error_timestamp from 
performance_schema.replication_connection_status\G
*************************** 1. row ***************************                   
  LAST_ERROR_NUMBER: 1045                                                         
  LAST_ERROR_MESSAGE: error connecting to master 'rpluser@127.0.0.1:13000' - retry-time: 40 retries: 2                                                           
LAST_ERROR_TIMESTAMP: 0000-00-00  
1 row in set (0.00 sec)


The names and semantics of the error-related fields (number, message and timestamp) are again exactly the same across all the tables, wherever applicable. Hence, going ahead, let's skip monitoring the error-related fields in the following tables.

Let's now clean up the error we caused, to move ahead and work with the other tables.

mysql> stop slave; Query OK, 0 rows affected (0.06 sec)
mysql> change master to master_password='secret'; Query OK, 0 rows affected, 2 warnings (0.25 sec)
mysql> start slave; Query OK, 0 rows affected (0.08 sec)

mysql> select last_error_number, last_error_message, last_error_timestamp from 
performance_schema.replication_connection_status\G
*************************** 1. row ***************************                   
  LAST_ERROR_NUMBER: 0                                                         
  LAST_ERROR_MESSAGE:                                                           
LAST_ERROR_TIMESTAMP: 0000-00-00  
1 row in set (0.00 sec) 

Great! We are done with monitoring the connection-specific parameters between a slave and its master. Let's now move to the other half of the replication module: execution of the events received from the master on the slave server.

REPLICATION_EXECUTE_CONFIGURATION/STATUS:

This table shows the configuration parameters that affect the execution of transactions by the slave server. Parameters stored in the table can be changed with the CHANGE MASTER TO command. Let's set up delayed replication now and see how the same can be read from the replication_execute_configuration table.


mysql> stop slave; Query OK, 0 rows affected (0.11 sec)
mysql> change master to master_delay=150; Query OK, 0 rows affected (0.29 sec)
mysql> start slave; Query OK, 0 rows affected (0.06 sec)

mysql> select desired_delay from performance_schema.replication_execute_configuration\G
*************************** 1. row ***************************
DESIRED_DELAY: 150
1 row in set (0.00 sec)

Let's now execute an event on our master and see whether the event is executed at the slave or not.

mysql> insert into test.t values(2); Query OK, 1 row affected (0.06 sec)

Let's move to the slave and make sure the 150-second delay has not elapsed yet.

mysql> select remaining_delay from performance_schema.replication_execute_status\G
*************************** 1. row ***************************
REMAINING_DELAY: 147
1 row in set (0.00 sec)

Now we should be able to see that although the insert(2) has been received at the slave, it is not executed yet. The last time we saw our received_transaction_set (before insert(2)), it was 1766a057-10cd-11e3-b06a-0021cc72bfc3:1-4. So insert(2) should be assigned the GTID 1766a057-10cd-11e3-b06a-0021cc72bfc3:5.

mysql> select received_transaction_set from performance_schema.replication_connection_status\G
*************************** 1. row ***************************
RECEIVED_TRANSACTION_SET: 1766a057-10cd-11e3-b06a-0021cc72bfc3:5
1 row in set (0.00 sec)

OK, we have received the event. Now let's see if this insert(2) with GTID 1766a057-10cd-11e3-b06a-0021cc72bfc3:5 has been executed yet.

mysql> select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 1766a057-10cd-11e3-b06a-0021cc72bfc3:1-4
1 row in set (0.00 sec)

Let's now wait until our delay has elapsed.

mysql> select remaining_delay from performance_schema.replication_execute_status\G
*************************** 1. row ***************************
REMAINING_DELAY: 0
1 row in set (0.00 sec)

And check if the insert(2) has been executed.

mysql> select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 1766a057-10cd-11e3-b06a-0021cc72bfc3:1-5
1 row in set (0.00 sec)

Executed. Great!

Before going ahead to have a look at the other goodies in store, let's revise a couple of things:
  • SQL THREAD: Executes the events received from the master at slave.
  • MULTI-THREADED SLAVE (MTS): With only one thread (the SQL thread) reading and applying events that were applied by multiple clients concurrently on the master, the slave starts lagging behind the master. In this case, one can choose to opt for MTS. Simply put, we have a buffer (the relay log) on the slave server that stores the events executed on the master server. The coordinator thread reads the relay log and assigns the events to worker threads, which execute the events assigned to them in parallel on the slave. The coordinator thread is the scheduler and the worker threads are responsible for executing the events.

REPLICATION_EXECUTE_STATUS_BY_COORDINATOR:

This table is used in two ways depending on whether the slave is operating with one applier (MTS disabled) or with multiple appliers executing events in parallel (MTS enabled). When operating with one applier, this table reports the status of that applier. With multiple appliers, the same table reports the status of the scheduler (coordinator thread). Let's explore the single applier use case.

mysql> select * from performance_schema.replication_execute_status_by_coordinator\G
*************************** 1. row ***************************
           THREAD_ID: 13
       SERVICE_STATE: ON
   LAST_ERROR_NUMBER: 0
  LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
1 row in set (0.00 sec)

We see that the applier's (SQL thread's) id is 13 and it is alive (SERVICE_STATE: ON). For now there are only two states that we show: ON/OFF. The semantics of the error fields are exactly the same as explained earlier, hence we skip them here.

The multiple-applier case (MTS enabled) is similar, except that this table will show the scheduler's (coordinator's) thread id. Hence we will skip the explanations and move to the next table. To read more about this table, please visit our official documentation for this table.

REPLICATION_EXECUTE_STATUS_BY_WORKER (Single Applier Case):

When the slave server is operating in the single applier mode, let's see what the table replication_execute_status_by_worker shows us. Conceptually, we don't have a scheduler and multiple appliers, so the table should be empty.

mysql> select * from performance_schema.replication_execute_status_by_worker\G
Empty set (0.00 sec)

And it is empty :)

REPLICATION_EXECUTE_STATUS_BY_WORKER (Multiple Appliers Case):

Let's now set up our slave to use multiple appliers (MTS enabled) and then see how we can monitor the replication status concerned with our scheduler and appliers (coordinator and worker threads). Let's call our multiple appliers "workers".

mysql> stop slave; Query OK, 0 rows affected (0.10 sec)
mysql> SET GLOBAL slave_parallel_workers=2; Query OK, 0 rows affected (0.00 sec)

Note that a START SLAVE is necessary to have the workers ready for action. Just setting slave_parallel_workers=2 doesn't spawn the 2 workers.

mysql> start slave; Query OK, 0 rows affected, 1 warning (0.14 sec)

mysql> select * from performance_schema.replication_execute_status_by_worker\G
*************************** 1. row ***************************
            WORKER_ID: 0
            THREAD_ID: 16
        SERVICE_STATE: ON
LAST_SEEN_TRANSACTION:
    LAST_ERROR_NUMBER: 0
   LAST_ERROR_MESSAGE:
 LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
            WORKER_ID: 1
            THREAD_ID: 17
        SERVICE_STATE: ON
LAST_SEEN_TRANSACTION:
    LAST_ERROR_NUMBER: 0
   LAST_ERROR_MESSAGE:
 LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
2 rows in set (0.00 sec)

So, we see that we have two rows to represent the two workers. The worker_id field here is the same as the id field in the slave_worker_info table, so you can again join the two tables to know more about a worker thread. The service_state (ON/OFF) has nothing new to add.
The interesting parts here are:
  1. You get the per worker error. So, if two of your workers have errored out in a group, you can identify both the errors as opposed to only one through SHOW SLAVE STATUS.
  2. last_seen_transaction is the new field we have added to help detect MTS failures better. It helps you find the GTIDs of all the transactions being executed by all the workers at the moment the error happened. These values are preserved in case of an error, and in the idle state of the worker after executing the transactions.
To read more about this table and field-wise descriptions, please visit our official documentation for this table.

CONCLUDING REMARKS:

The goal of this post was to help you familiarize yourself with the Performance Schema tables relating to replication. Please try out this new interface for monitoring MySQL replication and let us know your feedback. To read more on these Performance Schema tables, you can visit the blog on multi-source replication by Rith; these Performance Schema tables look really powerful with multi-source replication. The motive behind this feature was to make it easier to monitor the health and performance of MySQL replication. We hope this new interface to monitor MySQL replication makes our users' lives easier.

Finally a "software developer" !

It was the summer of 2007. I was a class X student whose only interaction with a computer had been playing Mario once, around 5 years earlier. I was offered a choice between Biology and Computer Science, with Mathematics or Hindi Language being the other choice I had to make. Though I used to write poems and stories and give lectures in Hindi (believe me, I was quite famous in school for these), Mathematics was my darling (alas, this changed at college!) and it was easy to choose Mathematics. The choice between Biology and Computer Science was a difficult one, and it ended up in favor of Computer Science just because I was terrible at drawing a heart or a kidney. Some of my friends would remember that I took 3 hours to draw a heart in my biology class ;)

So I ended up being a computer science geek, and thankfully so, for I am thoroughly enjoying the experience every single day. For the next two years, the only interaction I had with a computer was the blue TurboC screen; yes, the horrible old compiler, but I have very fond memories of it, so I still don't call it stupid :) My grasp of OOP concepts two years later was better than it is today, thanks to Nancy Madam. Ma'am, if you are reading this, you deserve a big thanks. And the choice of branch when I got into engineering was a no-brainer.

Thanks to all those whose names appear in the acknowledgement section of this blog, and more, I am a proud open source contributor working on the most popular database on the web: MySQL. I am a part of the MySQL replication development team, and working with these guys here is a looooooot of fun. Having been a part of MySQL Replication for a year, I feel the best part has been the amount of learning that has come from talking to the MySQLers, not to forget the NCGs 2012 (well, we are a bunch of new college grads who joined MySQL in July 2012). NCGs, you rock \o/.

Let's now return to the title and the motivation for this post. It has been a dream since 2007, when I had no interaction with computers: to develop a lot of features and to contribute to this golden age of technology as much as I can. And the first tiny step into this world (well, tiny in terms of the years of contributing to software development which I aim at), my first design and implementation of a MySQL replication feature, is now released. And while people check out the blogs on this feature and browse the source I have written, I will sit and enjoy motivating myself to do a lot more in this huge IT world. I am a software developer by choice, nothing very very big as of now, but I am confident I will make it really really big one day. For now, for the first time in my life, I feel I have become a software developer, and it's time to start on the journey.

And finally some advertisement :D. Please check out:

  1. My blog about this feature

  2. The official documentation

  3. Download mysql-5.7.2, try it out & let us know your feedback and

  4. The source code too, if you feel interested enough :)


[UPDATE]:

My second blog on this feature- http://shivjijha.blogspot.com/2013/09/Monitoring-Replication-with-the-NEW-performance-schema-tables.html

Enjoy!

Saturday, September 21, 2013

MySQL 5.7: Introducing the Performance Schema tables to monitor Replication


MySQL-5.6 was our best release ever and we are happy to see people praising it. This motivates us to work even harder for our next release. MySQL-5.7.2 DMR is out and we have already got a number of MySQL replication features. Below is a list of these features:

    - mysqlbinlog idempotent mode
    - mysqlbinlog --rewrite-db option
    - Dump thread does not take binary log lock
    - Lossless semisync
    - intra-schema multi-threaded slave
    - Performance Schema tables for replication monitoring (this post :) )

We want to make MySQL-5.7 huge and these are tiny steps towards the same.

This post introduces the Performance Schema tables to monitor MySQL replication. You can find the official documentation for the same here. This post gives a brief overview of these Performance Schema tables in an attempt to prepare you to dive deeper into the details. Please follow my second blog post on this topic to know the details on the individual tables.

Why Performance Schema Tables:

Replication is a delicate module of MySQL and people want to know its status frequently. To monitor MySQL replication health, we have been using the SHOW SLAVE STATUS command for long. But replication is growing fast and this static command is not able to match up to people's expectations. SHOW SLAVE STATUS does not scale in the sense that there are multiple parts of the slave server: the receiver part, the applier part, intermediate queues, etc., and in some cases multiple instances of each part. It has 54 fields, as of now, interleaving different pieces of information. It has now reached a point where we foresee that not having an SQL interface to query exactly what is required from the replication status would make monitoring tedious. So, we decided to have tables to monitor replication, and we have put all our tables in the performance_schema database.

The motivation behind having tables is:

  • To make it easier to access exactly what is required, through an SQL interface,
  • Pieces of data can be assigned to variables and thus used in stored procedures etc,
  • Easier testing with SELECT item from Performance Schema tables,
  • Split the logically unrelated information into different tables,
  • Cross-reference monitoring data seamlessly by joining with other Performance Schema tables, Information_Schema tables etc,
  • Easier to extend and
  • More flexibility to accommodate a lot of replication information but still be organized and easy to use.

Additionally, we noted that SHOW SLAVE STATUS used a lot of technical jargon (words like IO, SQL, Relay_log, etc.). In order to make it easier to monitor replication, we decided to hide these implementation-specific details so that the names are easier to understand. We have tried our best to make it convenient for everyone to understand the names of the tables and the fields.

Why 6 tables? What does each stand for:

The idea is to come up with a better organized and an easy to extend interface. To start with, we have split the information under SHOW SLAVE STATUS into different parts based on:

  1. Connection information or execute information
  2. In each of (1), we further have configuration and status related fields put into different tables.

Based on the above classifications, we have got four tables:

    a)    replication_connection_configuration,
    b)    replication_connection_status,
    c)    replication_execute_configuration and
    d)    replication_execute_status

Note that all replication applier module status information under (d) would become confusing again and it makes more sense to have them split based on overall stats, coordinator's status or worker's status. So, we have two more tables namely,

    e)    replication_execute_status_by_coordinator and
    f)     replication_execute_status_by_worker.

Note that (d) is responsible for showing the overall status of the applier module in replication, (e) relates to the coordinator thread and (f) refers to the worker thread.

See the tree diagram below for the basis of division of data into different tables as discussed above:


To make it easier to select a table when you have a configuration/status parameter in mind, just ask yourself the following questions:

  • Is it relating to the  connection between the slave and its master?
  • If yes, you have narrowed down your scope to only (a) and (b).
  • If not, the others (c, d, e or f).
  • Suppose the data you are looking for relates to the connection. Now just ask yourself whether it is a configuration parameter or relates to the status of the connection between the slave and its master, and there you are: you know which table to look at.
  • If the data you are looking for relates to the execution of events received at your slave, (c, d, e, f) are the tables you have. Now ask yourself the same question: is the data you need a configuration parameter (c), or is it related to the status of execution of events at the slave (d, e or f)?

For those who know the internals and have been using SHOW SLAVE STATUS, the names might look new and it could take a little time to get used to them. So, let's now see how these tables relate to the different replication threads. If you are not familiar with the threads and buffers, you don't really need to understand them; this new replication monitoring interface is designed to make sure you don't have to, and hence you can skip this section. If you are familiar with the replication threads and want to understand the tables in terms of threads, you can use the table below to map these tables to the replication threads you want to monitor. But before we move ahead, let's revise a couple of things:

  • IO THREAD: Responsible for the connection between master and slave, also gets queries   executed on the master onto the slave.
  • SQL THREAD: Executes the events received from the master at slave.
  • MULTI-THREADED SLAVE (MTS): With only one thread (the SQL thread) reading and applying events that were applied by multiple clients concurrently on the master, the slave starts lagging behind the master. In this case, one can choose to opt for MTS. Simply put, we have a buffer (the relay log) on the slave server that stores the events executed on the master server. The coordinator thread reads the relay log and assigns the events to worker threads, which execute the events assigned to them in parallel on the slave. The coordinator thread is the scheduler and the worker threads are responsible for executing the events.

With the background set, let's now look at the table below to understand the mapping between the Performance Schema tables and the replication threads they monitor.

[Table: mapping between the Performance Schema replication tables and the replication threads they monitor]

To end with, it may interest you to note that these Performance Schema tables not only give you an easier way to monitor replication but also allow you to easily obtain all the information available about replication threads and coordinates through a join with the other Performance Schema tables, Information Schema tables, mysql.slave_worker_info etc.

Go check it out and let us know if it allows you to know more about the replication status. We want to have your valuable feedback to improve this and add a lot more information to this interface.

Want to read more and explore the fields, the how-tos, etc.? Here is the deep-dive in the next post :) Enjoy!

Tuesday, September 17, 2013

z-algorithm for pattern matching

String algorithms are a traditional area of study in computer science and there is a wide variety of standard algorithms available. The more algorithms you know, the easier it gets to work with them :) As the title of the post says, we will try to explore the z-algorithm today. Let's first formulate the problem statement and a couple of definitions. To start with, let's look at the question below (from interviewstreet):

QUESTION

For two strings A and B, we define the similarity of the strings to be the length of the longest prefix common to both strings. For example, the similarity of strings “abc” and “abd” is 2, while the similarity of strings “aaa” and “aaab” is 3.

Calculate the sum of similarities of a string S with each of its suffixes.

Sample Input:
2
ababaa
aa

Sample Output:
11
3

Explanation:
For the first case, the suffixes of the string are “ababaa”, “babaa”, “abaa”, “baa”, “aa” and “a”. The similarities of each of these strings with the string “ababaa” are 6,0,3,0,1,1 respectively. Thus the answer is 6 + 0 + 3 + 0 + 1 + 1 = 11.

For the second case, the answer is 2 + 1 = 3.

SOLUTION 1:

The most intuitive (brute force) way to solve this problem is simple: take every suffix of the given string and match it with the given string to find the number of matching characters. Now add up these numbers over all suffixes and you have it solved.

[sourcecode language="cpp"]
#include <string>
#include <iostream>
#include <algorithm>
using namespace std;

long long similarity(string s, int k){
        long long i;
        for(i=0;k+i<s.size();i++){
                if(s[i]!=s[k+i]) break;
        }
        return i;
}

int main() {
        string s; long long sum;
        getline(cin,s);
        sum=s.size();
        for(long long i=1;i<s.size();i++){
                sum+=similarity(s,i);
        }
        cout<<sum<<endl;
        return 0;
}

[/sourcecode]

But this is costly, as we are not using one result to calculate the next one. We are throwing away some valuable information and calculating things from scratch every time. If you try the above solution, it will get you a TLE (time limit exceeded). To get this accepted in time, we need to come up with a better algorithm: the Z-algorithm. The Z-algorithm gives you a Z-array which holds, for each suffix, its similarity with the prefix, which is exactly what we need in this question to sum up and print the answer. A tutorial for the Z-algorithm can be found in video form at:

  1. http://www.youtube.com/watch?v=MFK0WYeVEag   and

  2. http://www.youtube.com/watch?v=NVJ_ELSbbew


The algorithm is explained quite nicely in these videos.

SOLUTION 2:

Once you understand the Z-algorithm, it's easy to guess that the solution to this question is nothing but the sum of all the z-values. Let's see the code now:

[sourcecode language="cpp"]

#include <string>
#include <cstdio>
#include <vector>
#include <iostream>
#include <algorithm>
using namespace std;

long long compute_z(string s){
        long long sum=0;
        int n = s.length();
        vector<int> z(n,0);

        long long L = 0, R = 0;
        for (long long i = 1; i < n; i++) {
                if (i > R) {
                        L = R = i;
                        while (R < n && s[R-L] == s[R]) R++;
                        z[i] = R-L; R--;
                }
                else {
                        int k = i-L;
                        if (z[k] < R-i+1) z[i] = z[k];
                        else {
                                L = i;
                                while (R < n && s[R-L] == s[R]) R++;
                                z[i] = R-L; R--;
                        }
                }
                sum+=z[i];
        }
        return sum;
}

int main() {
        int t; string s; long long sum;
        cin>>t;getchar();
        while(t--){
                getline(cin,s);
                cout<<compute_z(s)+s.size()<<endl;
        }
        return 0;
}

[/sourcecode]

And yes this code is accepted :)

In case you didn't notice yet,

  • The Z-algorithm can be used to find a given pattern (P) in a given text (T) in linear time. To do the same, you need to pass the string P+T to the function compute_z(). If you have a z-value >= length(P) at an index 'i', the pattern exists in the text at that index. So, the Z-algorithm helps you find all occurrences of the pattern in the text (see the small sketch after this list).

  • Another advantage of the Z-algorithm is that, though KMP has the same time complexity as the Z-algorithm, the Z-algorithm is a lot easier to understand, remember and code :)
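
Here is a small self-contained Python sketch of that first point (my own illustration; the '$' separator is just any character assumed not to occur in the pattern or the text): compute the z-array of P + '$' + T and report every index in the text part where the z-value reaches the pattern length.

[sourcecode language="python"]
def z_array(s):
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    for i in range(1, n):
        if i <= r:
            z[i] = min(r - i + 1, z[i - l])     # reuse the previously computed window
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1                           # extend the match explicitly
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1              # remember the rightmost match window
    return z

def find_pattern(pattern, text):
    s = pattern + "$" + text                    # '$' must not occur in pattern or text
    z = z_array(s)
    m = len(pattern)
    # A z-value of at least m at position i means the pattern starts at i - (m + 1) in the text.
    return [i - (m + 1) for i in range(m + 1, len(s)) if z[i] >= m]

print(find_pattern("aba", "ababaa"))            # [0, 2]
[/sourcecode]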

Tuesday, September 10, 2013

Hello World...

Hello,

I am Shivji Kumar Jha (shiv), a graduate from BIT Mesra who fell in love with MySQL some time ago. I am one of those lucky ones who got to work at MySQL as well, and I thoroughly enjoy the experience every single day. I believe this is the golden age of technology and I aim to contribute my bit to it, affecting millions of people through an open source software – the most popular open source database on the web – MySQL! At MySQL I am currently working as a software developer with the replication team, which provides the high-availability solution built into the MySQL server. My area of expertise revolves around analyzing and enhancing the performance of database replicas and enhancing the replication monitoring tools. This blog is an attempt to reach out to people and share the tiny details I know about MySQL Replication. The views expressed on this blog are my own and do NOT necessarily reflect the views of Oracle Corporation.


I do not claim that all information available on this blog is perfectly correct. So, if you find something that is wrongly published, please let me know. Also, in case of doubts, please use the comments section of this blog. Together, let's talk MySQL Replication :)

Thursday, May 23, 2013

C/C++: Variable number of arguments to a function(2)

Summary: This post discusses va_copy and its uses. There is code below that shows its use.

This is my second post on the topic. If you don't know how these va_* macros work, you may want to read my previous post on this topic. When you learn the concept for the first time, you are tempted to believe that it's a great idea to have a lot of functions of this type. Won't that be a great flexibility to have all the time? Unfortunately, that's not true. The problems are:

  1. va_arg() assumes the elements in the va_list object to be of a given data type. So, you must know the types in advance to be able to extract the values from the va_list object.

  2. Once you have traversed a va_list object, you cannot get the first element again by resetting the pointers for the same object.


Let's talk about the possible problems with (2). Let's try out various ways of getting the pointers reset to the beginning of the va_list object. Assume we have the va_list "src" as the given list, which we have traversed, and "dest" as a copy of the same list. We want to be able to get the first element of the va_list after "src" has been traversed once. On that route, let's first think about what a possible implementation of va_list would do. The first thing that comes to mind is that a va_list object could be a pointer to the stack frame of the variadic function. In this situation, it seems perfectly alright to make an assignment, say,
va_list dest = src;

Unfortunately, that doesn't work all the time. The next possibility could be that the va_list object is an array of pointers pointing to the individual arguments in the variable argument list, so we can do a
va_list dest;
*dest = *src;

But that is also not foolproof because, on certain systems where arguments are passed in registers, it may be necessary for va_start() to allocate memory, store the arguments there, and also record which argument is next, so that va_arg() can step through the list; va_end() can then free the allocated memory again. So, there is no way you will be able to access that element again. But all is not gloomy, my friend, for the standard defines an interface by the name of va_copy(), so that the above assignments can be replaced by

va_list dest;
va_copy(dest, src);
...
va_end(dest);


And this is a foolproof way !!! Finally :)
Note that the compiler won't complain about the wrongdoings in the methods discussed above. Compilation and linking will pass just fine; only when you look at the output will things appear weird. The good news is that you can always use va_copy() to get a replica of a va_list object at that instant of time. So, if you want to walk the variable argument list a second time, just use this macro. The prototype is:
void va_copy (va_list destination, va_list source);

Just remember that each invocation of va_copy() must be matched by a corresponding invocation of va_end() in the same function. Let's now look at a small piece of code to see va_copy() in action.

[sourcecode language="cpp"]

#include <stdio.h>
#include <stdarg.h>

/* The sum() function can accept a variable number of arguments.
   In the function declaration, the "..." at the end means that the
   number and type of the arguments may vary. The marker ... can
   only appear at the end.
*/
int sum(int num_of_arguments, ...)
{
    /* A list to store the arguments. In this example, it holds
       the values 1, 2, 3 and 4 passed from main().
    */
    va_list args;
    int sum = 0, i;

    /* Direct the va_list args initialized above to start at the
       parameters following the first parameter 'num_of_arguments'.
    */
    va_start(args, num_of_arguments);

    va_list args_copy;          /* a copy of the va_list to be used */
    va_copy(args_copy, args);   /* copy args into args_copy */

    /* Add 5 to each element in the variable argument list and print it. */
    for (i = 0; i < num_of_arguments; i++)
        printf("%d ", va_arg(args_copy, int) + 5);

    va_end(args_copy);          /* done with args_copy */

    printf("\n");

    /**** Next, we use args as in the previous post. ****/

    /* Loop until all parameters are seen. */
    for (i = 0; i < num_of_arguments; i++)
        /* Extract the next value in the argument list and add it to sum. */
        sum += va_arg(args, int);

    /* Signal that we are done with our usage of the list. */
    va_end(args);

    return sum; /* Return the calculated sum. */
}

int main()
{
    int result;

    result = sum(4, 1, 2, 3, 4);
    printf("result of sum() with 5 arguments: %d \n", result);

    return 0;
}

[/sourcecode]
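
If you run the program as-is, the output should look like this: the first loop prints each argument plus 5, and then main() prints the sum.
6 7 8 9
result of sum() with 5 arguments: 10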

Play with the above code on ideone. Change a few things here and there and see how it works.

Happy coding !!!


Saturday, May 18, 2013

C/C++ - Variable number of arguments to a function

Ever wondered how printf() in C is implemented? If you are a C lover, I guess you have, for it's a function that looks pretty different from the other library functions, consumes tons of options and is very powerful. If you are not a C fan and do not know much about it, you may want to have a look here. Well, one of the great flexibilities of printf() comes from the fact that it can consume a variable number of arguments. If you are scratching your head about how it does that, hold on, you will know in a minute; that is what this post aims at :) . Let's look at some examples to understand what 'variable number of arguments' means.

[sourcecode language="c"]

#include <stdio.h>

int main()
{
    char *string = "with";

    printf("printf with 1 argument\n");
    printf("printf with %d arguments\n", 2);
    printf("printf %s %d arguments\n", string, 3);

    return 0;
}
[/sourcecode]

Note that the first call takes one argument, the second takes two and the third takes three. Have you ever written a function definition in C that can be invoked with any number of arguments, 1...N? If not, let's write one. But on the way we need to learn a couple of things as ingredients. We need to know four simple things. The definitions have been taken from cplusplus.com. If you don't understand what they mean, you may first want to see how they are used (see the code below) and, believe me, it's very simple to understand them from the usage.

  1. va_list: Type to hold information about variable arguments.
     Objects of this type can only be used as an argument to the va_start, va_arg, va_end and va_copy macros, or to functions that use them, like the variable argument functions in <cstdio> (vprintf, vscanf, vsnprintf, vsprintf and vsscanf).


  2. void va_start (va_list ap, paramN)

    Initializes a variable argument list. It makes ap point to the first unnamed argument, i.e. to the additional arguments after parameter paramN. A function that invokes va_start() should also invoke va_end() before it returns.


  3. type va_arg (va_list ap, type)

    Retrieves the next argument. This macro expands to an expression of type type with the value of the current argument in the variable arguments list identified by ap.


  4. void va_end (va_list ap);

    Performs the appropriate actions to facilitate a normal return by a function that has used the va_list object ap. Each invocation of va_start() must be matched by a corresponding invocation of va_end() in the same function.


Phew, that's a lot of technical jargon!! Let's revert to some simpler words now. In short, you first define a list of type va_list by saying something like va_list args. Next, invoke va_start() to indicate where the variable list of arguments starts from, extract each argument from the variable list (args in our case) using va_arg(), and finally don't forget to call va_end().

Note that apart from va_arg(), a va_list object (named ap in the declarations below) can also be consumed by the versions of printf() whose names are preceded by 'v', for example:

  1. int vprintf ( const char * format, va_list ap ); - Prints formatted data from a variable argument list to stdout.

  2. int vfprintf ( FILE * stream, const char * format, va_list ap ); - Writes formatted data from a variable argument list to stream.

  3. int vsprintf ( char * s, const char * format, va_list ap ); - Writes formatted data from a variable argument list to the string s.


Let's look at an example that uses va_arg(). For the printf versions mentioned above, you can find a C program for each of the 3 functions in the links in the "References" section (#2, #3, #4 respectively) at the end of this post.

[sourcecode language="c"]
#include <stdio.h>
#include <stdarg.h>

/* The sum() function can accept a variable number of arguments.
   In the function declaration, ... means that the number and
   type of the arguments may vary. The marker ... can only
   appear at the end.
*/
int sum(int num_of_arguments, ...)
{
    /* A list to store the arguments. In this example, it contains
       1, 2, 3 and 4 elements on successive calls from main().
    */
    va_list args;
    int sum = 0, i;

    /* Direct the va_list args initialized above to start at the
       parameters following the first parameter 'num_of_arguments'.
    */
    va_start(args, num_of_arguments);

    /* Loop until all parameters are seen. */
    for (i = 0; i < num_of_arguments; i++)
        /* Extract the next value in the argument list and add it to sum. */
        sum += va_arg(args, int);

    /* Signal that we are done with our usage of the list. */
    va_end(args);

    return sum; /* Return the calculated sum. */
}

int main()
{
    int result;

    result = sum(1, 1);
    printf("result of sum() with 2 arguments: %d \n", result);

    result = sum(2, 1, 2);
    printf("result of sum() with 3 arguments: %d \n", result);

    result = sum(3, 1, 2, 3);
    printf("result of sum() with 4 arguments: %d \n", result);

    result = sum(4, 1, 2, 3, 4);
    printf("result of sum() with 5 arguments: %d \n", result);

    return 0;
}
[/sourcecode]

Time to see what the function outputs. Here you go:
result of sum() with 2 arguments: 1
result of sum() with 3 arguments: 3
result of sum() with 4 arguments: 6
result of sum() with 5 arguments: 10

And as always, play with the above code on ideone. Change a few things here and there and see how it works. Done with this? Let's try and implement our own version of printf().
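
Here is a minimal sketch of one way to do that (the name my_printf() is purely for illustration): rather than parsing the format string ourselves, we collect the variable arguments into a va_list and forward it to vprintf() (#2 in the references below), which does the actual formatting.

[sourcecode language="c"]
#include <stdio.h>
#include <stdarg.h>

/* A toy printf() built on top of vprintf(): gather the variable
   arguments into a va_list and hand them over to vprintf(), which
   does the real work. A full printf() would parse the format string
   itself, but this shows how the v* functions let you write your
   own printf-like wrappers.
*/
int my_printf(const char *format, ...)
{
    va_list args;
    int chars_printed;

    va_start(args, format);                 /* args starts after 'format' */
    chars_printed = vprintf(format, args);  /* forward the whole list */
    va_end(args);

    return chars_printed;                   /* same return value as printf() */
}

int main()
{
    my_printf("my_printf with %d argument%s\n", 2, "s");
    my_printf("sum of %d and %d is %d\n", 2, 3, 2 + 3);
    return 0;
}
[/sourcecode]

The same pattern, with vfprintf() or vsnprintf() in place of vprintf(), is how printf-like wrappers such as logging functions are usually written.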

References:

  1. Linux man page: http://linux.die.net/man/3/va_arg

  2. http://www.cplusplus.com/reference/cstdio/vprintf/

  3. http://www.cplusplus.com/reference/cstdio/vfprintf/

  4. http://www.cplusplus.com/reference/cstdio/vsprintf/