Random Essays on Programming

Meaningless Variable Names Considered Useful

Although meaningful variable names can make reading code easier, they often lead to mistakes and can make debugging difficult. Consider three experiences I had:

In my first job interview at Microsoft, the programmer who later became my supervisor asked me to write a function to trim spaces from the ends of a string. I had written similar routines many times and quicky wrote down a function. I usually keep a variable called "length" to keep track of the length of the string so I don't have to recalculate it all the time. Because of the particular requirements of this function, though, it was more convenient to keep the "length" variable pointing to the last character in the string, one less than the length. Of course, the name of variable misled me and I had several bugs as a result.

A few weeks before writing this essay, I found a bug in a co-worker's code here at UNC. He had a C++ class that kept track of identifiers in an array with an associated variable called "NumElements". Again, he was using that variable to point to the last element in the list instead of the number of elements. One searching loop was written as "for (i = 0; i < NumElements; i++)", missing the last element.

A few summers ago I had to modify a 500-line assembly language routine that was doing some fancy memory allocation and hardware initialization. I was forced to step through the whole code and keep track of each register's contents. The meaninglessness of the register's names made this task more difficult, but they insured that the code was actually doing what was intended and not what was implied by the variable names.

In the first two cases, had the variables been called "a", the programmers would not have been misled. Of course, the correct solution to both is to either rename the variable or use it to hold the length. However, even if the variables are named correctly, the programmer immediately assumes that its name reflects its contents without checking. Meaningless variable names force the programmer to ask, "What's in this 'a' variable being used as a loop bound?"

It would be nice to have an editor that could switch into "meaningless" mode for debugging purposes, replacing all identifiers with random letters.

GOTOs Considered Useful

Sometimes, a "goto" is preferable to the extra clutter necessary to get the work done. For example, this code is hard to read:

    do {
	scanf ("%d", &sel);
	if (sel < 0) {
	    printf ("Invalid selection.\n");
	    done = 0;
	} else {
	    done = 1;
	}
    } while (!done);

It's clumsy because the while() loop will almost never loop. Loop constructs should be used when looping needs to be done. It's clearer to use a goto, as in:

askagain:
    scanf ("%d", &sel);
    if (sel < 0) {
	printf ("Invalid selection.\n");
	goto askagain;
    }

This is not only shorter, but it uses one less variable and there's no loop construct involved. Unindent labels for clarity, and always keep the label within half of a screen height from the goto so that the reader can find it easily. Here's another example of a place where a goto is useful:

    for (i = 0; i < 128; i++) {
	for (j = 0; j < 128; j++) {
	    if (a[i][j] == 5) {
		break;
	    }
	}
	if (j < 128) {
	    break;
	}
    }
    if (i < 128) {
	printf ("Found at %d, %d\n", i, j);
    } else {
	printf ("Not found.\n");
    }

The above code is strange to read because the two "if" statements don't make immediate sense. Here is another ugly solution to this common problem:

    found = 0;
    for (i = 0; i < 128 && !found; i++) {
	for (j = 0; j < 128 && !found; j++) {
	    if (a[i][j] == 5) {
		found = 1;
	    }
	}
    }
    if (found) {
	printf ("Found at %d, %d\n", i - 1, j - 1);
    } else {
	printf ("Not found.\n");
    }

This solution is also clumsy because the ending condition of the loops must be modified to break out early. Also, the loop indices will be incremented before the loops end, so they must be decremented at the end. This is awkward and it would take a few minutes for a reader to understand what's going on. Here is the same problem, solved with a goto statement:

    for (i = 0; i < 128; i++ ) {
	for (j = 0; j < 128; j++) {
	    if (a[i][j] == 5) {
		goto found;
	    }
	}
    }
    printf ("Not found.\n");
    return;
found:
    printf ("Found at %d, %d\n", i, j);

This is smaller, simpler, faster, and easier to read. Some people claim that a label should only come after its goto, and some people claim it should always come before. I don't think it matters as long as it's close and the label name implies where the goto is going ("found", "again", etc.). Another good use for a goto is to break out of functions that need some cleanup code. For example:

    f = open (...);
    read (f, &magic, sizeof (long));
    if (magic != 0xABCD) {
	fprintf (stderr, "Bad magic.\n");
	goto die;
    }
    ... /* Long routine to read from the file */

    return 0;
die:
    close (f);
    /* Whatever other cleanup */
    return -1;
}

This avoids two possible problems: to have the cleanup code at every possible exit point in the function, which is error-prone; or having a large "else" statement, which isn't scalable:

    if (magic != 0xABCD) {
	printf ("Bad magic.\n");
	error = -1;
    } else {
	... /* Long routine to read from the file */
    }

    close (f);
    return error;
}

In the last example, avoid having a "return -1" right after the failure because it will cause the function to have multiple exit points. This can be a problem if someone later wants to add some cleanup code and forgets to cleanup before every "return".

On Programming Testing

It is unacceptable for a programmer to release new code to his team without testing it. By "testing" I don't mean running one or two obvious cases through it; I mean being reasonably sure that there are no bugs. There are several ways to test software: (1) banging on it, (2) doing an operational proof, and (3) doing an assertional proof. The first is what's most commonly used to see if the program has any bugs. As Dijkstra says, it's a great way to verify the presence of bugs, but a lousy way to show their absence. When method (1) finds problems or if there's a particularly nasty piece of code, methods (2) or (3) should be used. Method (2) is easy: you walk through the code and "play computer", simulating the computer's move at every instruction. There are two problems with this technique: humans make lousy computers, and you will probably make assumptions about input values or the state at any given point that aren't true. Method (3) avoids these problems at a fairly large cost: the proofs are tedious and can take pages of work for even a few lines of code. An assertional proof keeps track of all the possible state between each program statement and verifies through formalisms that the state we wish the program to end in is actually reachable from the start.

Here are a few examples of where assertional verification would have been useful. A few years ago I wrote some code to convert a string to an integer. The top of the routine checked for invalid characters and a negative sign:

    if (!isdigit (s[0])) {
	return ERROR;
    }
    sign = 1;
    if (s[0] == '-') {
	sign = -1;
	s++;
    }

I tested the routine with method (1) fairly well, but forgot to give any negative numbers (my need for the routine at the time only involved positive numbers). A few days later my needs changed and I passed it a negative number; the routine failed. It took quite a bit of method (2) to figure out that !isdigit() was not really what I wanted to test for. An assertional proof would have immediately shown that the test for the negative sign would have never succeeded.

My officemate at the time wrote this snippet to free a linked list:

    while (p != NULL) {
	q = p->next;
	free (p);
	p = q;
    }
    free (p);

Even the simplest of assertional proofs would have shown that the second free() could only be reached if p was NULL.

That same summer (at Microsoft), I found two seperate routines written by two different people in two different programs to imitate the strstr() function. (strstr() searches a string for a particular substring.) Both had the same bug:

    char *search (char *s1, char *s2)
    {
	/*
	 * Looks for s2 in s1, returns location in s1 or NULL.
	 */

	char *s, *match;

	for (; *s1; s1++) {
	    match = s1;
	    for (s = s2; *s; s++) {
		if (*s1 == *s) {
		    s1++;
		} else {
		    break;
		}
	    }
	    if (!*s) {
		return match;
	    }
	}

	return NULL;
    }

(Spend a minute here finding the bug.) Method (1) is very unlikely to find this bug. Method (2) is also very unlikely, just because the reader will cheat in the mental simulation and not consider every possible combination of input strings.

It's not reasonable to expect programmers to constantly write out pages of formal proofs for every routine they write. Particularly complex or bug-prone code should be scrutinized more formally than methods (1) and (2) allow, though. Two tools can help in this respect:

I wrote a program a few years ago to do trivial assertional proofs of programs. It was very limited, but while writing the program I ran into a bit of tricky code where a particular setting of a debug switch should execute one printf inside a loop, and another setting should execute another after the loop. (This was more complicated than it sounds.) I fed this 10-line segment into the program with printfs replaced with setting boolean variables (printdebug1 and printdebug2 for inside and after the loop), and asked the program to prove that exactly one of the two would get printed: that printdebug1 != printdebug2 after the code segment was executed. The program returned "TRUE". Without any complicated testing or mental walk-through of code I was able to prove without a doubt (within the correctness of the proving program and my input to it) that my code was going to behave correctly. Programs are rarely, if ever, used to prove the correctness of critical sections of code, which is surprising considering how easy it is to do.

The second help is an intelligent editor or compiler. (The former would be harder to write but would catch more subtle bugs.) Either of these could do simple tracking of possible variable ranges. A fairly simple test would have detected my negative-value "if" above by tracking the range of s[0]. Some compilers already do this (range-tracking), although I haven't seen them warn users on this basis. For example, gcc will convert this code:

    i *= 2;
    i++;

into:

    i <<= 1;    /* Easy enough */
    i |= 1;     /* We know it's even, so just OR it in */

This is impressive and could lead to very helpful compile-time warnings.

In general, be wary of claims that code has been tested -- it is often not tested nearly enough and ends up costing much more wasted time later when bugs pop up and it's not obvious what module they're in. Get used to the formalisms that assertional proofs use. They even help when doing operational proofs.