Initializing Strings

There really are no strings in C, but there are character arrays. Okay, semantics aside, a C string is a character array. In your code, the string is written by enclosing it in double quotes. Internally, the string is capped with a null character, \0, at the end. (Strictly speaking, NULL is C's null pointer constant and \0 is the null character, sometimes written NUL, though the two names get mixed up all the time.) This is all information you'll find in my C language books. Common stuff.

Like pointers, strings must be initialized before they're used. This can happen in a number of ways:

• The string is created and initialized at the same time. For example:

char string[] = "Hello there, Nerdly.";

Or you can use the more risky pointer method:

char *string = "Hello there, Nerdly";

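But just don't modify that type of initialized string. The pointer refers to a string literal, which the compiler may store in read-only memory, and changing it is undefined behavior. Declaring the pointer as const char * lets the compiler catch the mistake.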

• The string is created using some function. I call this "building a string." For example, you use strcpy() to copy the contents of one string into another buffer, which must be large enough to hold the copy and its terminating \0.

• You manually assign characters to a string. For example, you first create the string buffer or array:

char buffer[48];

Then you use some code to pack the buffer with characters. But therein lies the trouble: A proper string must be capped with that null character, \0. When you forget the cap, you find yourself wandering down the path to trouble.

The Path To Trouble

The results you get when you use a string that hasn't been properly initialized are unpredictable. Take this code:

trouble.c

#include <stdio.h>

int main()
{
	char buffer[48];

	printf("The buffer contains \"%s\"\n",buffer);
	return 0;
}
	

The string buffer is used without being initialized. The results could be this output:

The buffer contains ""

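But just as likely the output is a string of garbage characters, data left over in memory that spews out all over the screen. Officially, reading an uninitialized buffer is undefined behavior: printf() keeps printing until it stumbles across a \0, even if that means running past the end of the buffer. That's probably not what you want.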

Capping The String

In my code, what I do is initialize an empty string variable the same way I initialize pointers. Here's how I'd fix the code for trouble.c:

notrouble.c

#include <stdio.h>

int main()
{
	char buffer[48];

	buffer[0] = '\0';
	printf("The buffer contains \"%s\"\n",buffer);
	return 0;
}
	

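The new line sets the first character of the buffer string to \0, the null character. By doing so, the string is fixed: it's a valid, empty string. The other 47 bytes still hold whatever was in memory, but that doesn't matter because string functions stop reading at the first \0. It's a real string. And it can be used like any other string.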

More Problems

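Initializing strings works well, but another problem occurs when you're doing your own string manipulations. In fact, even some of the string.h functions have this problem: the null character is not placed at the end of the string. The strncpy() function, for example, writes no terminator when the source string is as long as or longer than the count you give it. It's up to you to remember to add the cap, and forgetting is often a source of woe.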

One solution to the problem is to initialize the entire buffer, filling it with null characters. It's what I call zeroing out, or filling a buffer with the \0 character. This technique erases whatever was in memory previously and lets the buffer start empty. It can also be a security advantage, because any old data is wiped clean when you zero out the buffer.

You could use a loop to zero out a buffer, but the memset() function does the job in one call. It's part of the standard C library, so every compiler has it. Here's the man page format:

void *memset(void *b, int c, size_t len);

b is the address of the memory to fill, typically a string or character buffer, initialized or not.

c is the value that will pad the buffer. It's passed as an int but stored as an unsigned char.

len is the number of bytes to write, usually the buffer's size.

The memset() function returns b, a pointer to the buffer it just filled, and the function prototype is contained in the string.h header file.

Here is a modification of the ongoing program, but this time memset() is used to fully clear and initialize the buffer:

solution.c

#include <stdio.h>
#include <string.h>

#define BUF_LEN 48

int main()
{
	char buffer[BUF_LEN];

	memset(buffer,'\0',BUF_LEN);
	
	printf("The buffer contains \"%s\"\n",buffer);
	return 0;
}
	

Note that a few more modifications were made to the program in addition to adding memset(). Here's the rundown:

  • The string.h header file is included for prototyping the memset() function.
  • For convenience's sake, because the value 48 is now used twice, I've set it equal to BUF_LEN using a #define statement.
  • The memset() function fills the buffer with null characters.

The rest of the program proceeds as normal, though the buffer, by being fully packed with null characters, is initialized and zeroed, ready for data.

The Now He Tells Us, Dept.

The memset() function isn't just used to zero out values in a buffer. You can specify any value and fill a buffer with that value. The null character is just handy because it totally zaps out a potential string. But if you merely need to zero out a string, some C libraries offer another function: bzero(). It's an older BSD function, not part of standard C, and POSIX dropped it in its 2008 edition, so stick with memset() when portability matters. Here is the man page format:

void bzero(void *b, size_t len);

The function has the same arguments as memset(), but the argument c is not needed; the value zero is used automatically. Unlike memset(), bzero() returns nothing, and its prototype lives in the strings.h header file (note the extra s), not string.h. Some C libraries also expose it through string.h, but don't count on that.

Here is the modified solution.c code, which uses bzero():

solution2.c

#include <stdio.h>
#include <strings.h>

#define BUF_LEN 48

int main()
{
	char buffer[BUF_LEN];

	bzero(buffer,BUF_LEN);
	
	printf("The buffer contains \"%s\"\n",buffer);
	return 0;
}

Copyright © 2007 by Quantum Particle Bottling Co.
All rights reserved