Monday, December 17, 2018

Understanding Pointers - Part 1 - Basics

A pointer is a variable type in the C programming language that is used to store a memory address. Being able to store a memory address in a variable and then use it turns out to be a great asset when you are doing low-level programming.


Basics

A Pointer is a variable type that stores a memory address.

Note: When we say 'memory', we will assume we are talking about the RAM, where your program is loaded before execution.

On Memory and addresses.

Before going further, we need to get some minimal basic understanding of memory. This is crucial to successful understanding and use of pointers in C.

Let's say that you have 4 GB of RAM installed in your computer. RAM sizes are counted in powers of two, so that is 4 * 1024 * 1024 * 1024 = 4,294,967,296 bytes. Each byte stores some information. To access the information stored in a particular byte, or to store some information in that byte, the CPU needs to identify that particular byte somehow. For that, we have memory addresses: each byte is assigned a unique number, called its memory address.

Normally, from a programming perspective, we just need to remember that memory addresses are a set of sequential numbers, starting at some number and ending at another.

So, a pointer variable basically stores a number. A number representing a particular location in your computer memory.

Declaring a pointer

To declare a pointer variable in your code, you must first decide what kind of variable is going to be stored at that memory address.

For example, you could have an integer stored at that memory address. Or it could be a character, or a float, and so on.

So, to declare a pointer in your code, you must know what's going to be stored at that address in memory.

To start with, we will deal with integers.

Let's say I have an integer variable, var, and I have assigned it a value of 10. Like this:

int var = 10;

Now, var is stored somewhere in memory. To know the memory address where var is stored, we would declare a pointer variable that stores an address of an integer. Then we would assign the address of var to the pointer variable. Look at the code below:

int var = 10; // var declared and assigned a value of 10
int *ptr;    // a pointer variable 'ptr' declared
ptr = &var;   // Value of var's memory address stored in ptr

Let's understand, step by step, what we did here.
  • First, we declared an integer variable var and we assigned it a value of 10.
  • Then, we declared a pointer variable ptr using the * operator. This syntax tells the compiler that ptr is a pointer variable that will store the memory address of an integer.
  • Then, we assigned the address of var to ptr using the & operator. The & operator gives the memory address of an object in C.

Here's a complete program to print the address of var using a pointer variable.

#include <stdio.h>
int main()
{
    int var = 10;
    int *ptr;
    
    ptr = &var;
    
    printf("%p\n", (void *)ptr);

    return 0;
}


Here is the output that I get when I run this program:

0x48d51dac

Try running this program on your computer and you may get a completely different memory address printed, possibly with more digits. That's ok.

The exact memory address assigned to any variable in our program depends on a whole lot of factors and is out of scope of this lesson. As long as you were able to get some memory address printed, we are on the right track.

Note how, in the above program, I printed ptr using %p in printf. %p is the printf conversion meant for pointers. It expects a void *, which is why ptr is cast to (void *). Integer conversions such as %x or %d expect an integer argument, so passing a pointer to them is undefined behavior and can truncate the address on 64-bit systems.
The exact text that %p produces is implementation-defined, but most systems print the address in hexadecimal, which is the usual convention for memory addresses. In the rest of the lesson, I shall print memory addresses in hexadecimal as a convention.

Practicing declaring pointers

Now, remember I said that to declare a pointer we must know what kind of variable is going to be stored at the memory address represented by the pointer. To declare a pointer that stores the address of an integer, we declared it like this:

int *ptr;

If we read the above statement backwards, we shall get some idea of how to declare a pointer variable. Let's read this statement backwards like this:

Address stored in ptr points to 'int' or 'integer'. Meaning, if you go to the address stored in ptr, you will find an integer stored there.

Let's try another declaration:

int *newptr;

Reading the above declaration backwards -> Address stored in newptr points to 'integer'.

Another one:
char *ptr;

Reading this backwards -> Address stored in ptr points to a 'character'.

Now, let's declare a pointer that stores the address of a float.

float *ptr; // Address stored in ptr points to a float.

Let's write a program to declare different kinds of pointers:

#include <stdio.h>
int main()
{
    int a = 10;
    char b = 'A';
    float c = 3.14159;
    double d = 2.71828182;
    
    int * ptr1;        // address stored in ptr1 points to an integer
    char * ptr2;       // address stored in ptr2 points to a character
    float * ptr3;      // address stored in ptr3 points to a float
    double * ptr4;     // address stored in ptr4 points to a double
    
    ptr1 = &a;
    ptr2 = &b;
    ptr3 = &c;
    ptr4 = &d;
    
    // %p is the printf conversion for pointers; it expects a void *
    printf("%p \n", (void *)ptr1);
    printf("%p \n", (void *)ptr2);
    printf("%p \n", (void *)ptr3);
    printf("%p \n", (void *)ptr4);

    return 0; 

}


Here is the output that I get, when I run the above program:

0x215c78f0
0x215c78ef
0x215c78f4
0x215c78f8

I would encourage you to type this program in your favorite editor and compile and run it on your machine. This will make you more comfortable with pointer notation, and then we can start looking ahead at how we use pointers in C programming.

Printing the value stored at a memory address

We have understood that a pointer stores a memory address. We can use the * operator to read the value stored at the memory address held in a pointer, for example to print it.

Let's see how to do this:

int * ptr;
int a = 10;
ptr = &a;

printf("%d \n", *ptr);

As we can see, the * operator is used to access the value stored at the memory address in a pointer.

Once we apply the * operator to a pointer, we get the value at that memory address and can use it just like any other value in our program, for example by assigning it to another variable.

int * ptr;
int a = 10;
ptr = &a;     // ptr now stores the address of a
int b = *ptr; // b now gets the value stored at memory address of a, which is 10

Here's the full program:

#include <stdio.h>
int main()
{
    int *ptr;
    int a,b;
    
    a = 10;
    ptr = &a;
    
    b = *ptr;
    
    printf("a=%d b=%d \n",a,b);

    return 0; 

}


Here's the output that I get for the above program:

a=10 b=10


I would encourage you to try running each of the example programs shown in this lesson. That's it for this lesson.

In the next lesson, we shall see how pointers are used in the C language to accomplish various tasks and also learn about pointer arithmetic.

Saturday, December 15, 2018

String tokenization in C

Let's say you have to write a C program to tokenize a string that contains a list of tokens separated by some delimiter, say a comma.

You could figure out your own algorithm for doing that, but usually a saner approach is to find a library function that already does that.

And the C standard library does provide a function, called strtok, that does exactly that. You provide it with a string and a delimiter string, and it helps you in splitting the string based on that delimiter.

So instead of writing our own function, we shall go forward and try to use strtok for our task. In doing so, we shall learn how to use it and other such functions.


Here is the declaration of the strtok function:

char *strtok(char *str, const char *delim);


From the function declaration above, we can see that strtok takes the string as its first argument and the delimiter/separator string as its second argument.
The first call to strtok takes the string to be parsed and the delimiter string. It returns the first token: the text up to the first occurrence of a delimiter character. To get the next token, we pass NULL as the first argument and the delimiter string as the second. strtok maintains the context internally and knows that it has to continue parsing the string you provided earlier.

The following program shows an example.


#include <stdio.h>
#include <string.h>
int main()
{
    char str[50];
    char *token;

    strcpy(str,"abc,def,ghi");
    token = strtok(str,",");
    printf("%s \n",token);
    
    token = strtok(NULL,",");
    printf("%s \n",token);
    
    token = strtok(NULL,",");
    printf("%s \n",token);
    
    return 0;
}


In the above example, we have the string "abc,def,ghi" and we parse it into the tokens abc, def and ghi.
Note how we pass str as the first argument in the first call to strtok and NULL as the first argument in the subsequent calls. Here is what the output of this program looks like:

abc
def
ghi



strtok() returns NULL when there are no more tokens. We can use this property when there is an unknown number of tokens to be parsed: after the first call, we keep calling strtok() in a loop until it returns NULL.
#include <stdio.h>
#include <string.h>
int main()
{
    char str[50];
    char *token;

    strcpy(str,"abc,def,ghi,jkl,mno");
    token = strtok(str,",");
    
    while ( token != NULL ) 
    {
        printf("%s \n",token);
        token = strtok(NULL,",");
    }
    
    return 0;
}
Here is the output of the above program:
abc
def
ghi
jkl
mno


We can also provide multiple delimiters to strtok, as shown below.
#include <stdio.h>
#include <string.h>
int main()
{
    char str[50];
    char *token;

    strcpy(str,"abc,def:ghi;jkl,mno");
    
    token = strtok(str,",;:");
    
    while ( token != NULL ) 
    {
        printf("%s \n",token);
        token = strtok(NULL,",;:");
    }

    return 0;
}
Here we pass a delimiter string which contains a comma, a semicolon and a colon. strtok treats each character of the delimiter string as a separate delimiter and ends a token whenever it finds any of them.
Output of the above program:
abc
def
ghi
jkl
mno
Another thing about strtok that we need to know is that when it encounters more than one delimiter in succession, it treats them as a single delimiter. The program below shows this scenario:
#include <stdio.h>
#include <string.h>
int main()
{
    char str[50];
    char *token;

    strcpy(str,"abc,,,def,,,,,,ghi");
    
    token = strtok(str,",");

    while ( token != NULL ) 
    {
        printf("%s \n",token);
        token = strtok(NULL,",");
    }

    return 0;
}
Output of the above program:
abc
def
ghi 
Also, if delimiters are encountered at the start or end of a string, strtok() ignores them, as shown below.
#include <stdio.h>
#include <string.h>
int main()
{
    char str[50];
    char *token;

    strcpy(str,",,,abc,,,def,,,,,,,ghi,,,,,,");
    
    token = strtok(str,",");
    while ( token != NULL ) 
    {
        printf("%s \n",token);
        token = strtok(NULL,",");
    }

    return 0;
}
Output of the above program:
abc
def
ghi 

There are a few common pitfalls that we should avoid while using strtok().

Never pass a string literal, or a pointer to any other read-only string, as the first argument of strtok. That's because strtok modifies the string it is given, and modifying a string literal is undefined behavior in C. On most systems it results in a crash (unless you have a signal handler implemented).

Something like the program below will typically crash:
#include <stdio.h>
#include <string.h>
int main()
{
    char *str = "abc,def";
    char *token;

    token = strtok(str,",");
    printf("%s \n",token);

    return 0;
}

Also, if you want to keep the string that is to be parsed intact for further use, you should avoid passing it directly to strtok, because strtok modifies the string it is given. It's advisable to copy the string into a temporary buffer and pass that to strtok to get your tokens.

Next, strtok is not thread-safe. That's because it keeps its position in the string in internal static state that is shared by all callers. So, you should take care that only one thread in your program calls strtok at a time.

If you have a multithreaded program, then you should be using the strtok_r function instead of strtok. strtok_r is a reentrant version of strtok, specified by POSIX. Let's understand how we can use strtok_r.

Here is the function declaration:

char *strtok_r(char *str, const char *delim, char **saveptr);


strtok_r has an additional third argument, saveptr, a pointer to a char pointer that is provided by the caller. strtok_r uses saveptr to maintain context between subsequent calls for the same string. The caller should not modify the variable that saveptr points to between calls, otherwise strtok_r will not work correctly.


Here is an example program showing strtok_r usage:
#include <stdio.h>
#include <string.h>


int main()
{
    char str[] = "apple,orange,banana";
    char *saveptr;
    char *token;

    token = strtok_r(str,",",&saveptr);

    while ( token != NULL )
    {
        printf("%s \n",token);
        token = strtok_r(NULL,",",&saveptr);
    }

    return 0;
}
Output of the above program:
apple
orange
banana

Both strtok and strtok_r modify the string passed as their first argument in the first call: they overwrite delimiter characters with '\0' to terminate each token. As a programmer, you should be aware of this while using strtok or strtok_r in your programs.


Now, let's say that you would like to know if there was no content between successive delimiters in the input string. With strtok you wouldn't be able to get this information, because it treats successive delimiters as one. For this task, it's best to use the strsep() function instead of strtok/strtok_r. Note that strsep is not part of standard C; it comes from BSD and is also available in glibc.
Here is an example of strsep usage:
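#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TESTSTRING ("abc,,,def,,,ghi")
int main()
{
    char *buf;   // start of the allocated block, kept so it can be freed
    char *str;   // cursor that strsep advances through the block
    char *token;

    buf = (char *) malloc(sizeof(char) * (strlen(TESTSTRING)+1));
    if ( buf == NULL )
    {
        return 1;
    }

    strcpy(buf,TESTSTRING);
    str = buf;
    
    token = strsep(&str,",");

    while (token != NULL)
    {
        if ( strlen(token) == 0 )
        {
            printf("No Content\n");
        }
        else
        {
            printf("%s\n",token);
        }

        token = strsep(&str,",");
    }

    free(buf);   // str is NULL by now, so free the saved pointer
    return 0;
}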
Here is the output of the above program:
abc
No Content
No Content
def
No Content
No Content
ghi 

Again with strsep, we have to keep in mind that it changes the pointer whose address is passed to it as the first argument. If that pointer came from malloc, keep a separate copy of it so that the memory can still be freed later.

With strsep and strtok/strtok_r alike, it's best to copy the string to be split into a temporary buffer and then use these functions for tokenization. That way we don't have to worry about changing the original string: it could have been passed to you from another function, and you may not know exactly which memory it resides in or what the caller intends to do with it after you have returned.

So, as we can see, the C library does provide functions that help us with string tokenization, but we need to take care in how we use them.