September 13, 2008

[C/C++] - Tips for Better Coding Style

In this entry, I show you 4 tips that address frequently asked questions from C++ programmers of all levels of expertise. It's surprising how many experienced programmers are still unaware that the .h notation of standard header files is obsolete, how namespaces should be used, and what the rules are for binding references to temporary objects. These issues and others are discussed here.

We start by explaining the difference between the obsolete “xxx.h” header names and the modern, standard-compliant “xxx” header-naming notation. Next, we explore a few dark corners of C++ which, due to compilers' limitations and the somewhat recondite nature of the associated language rules, tend to confuse many programmers: the rules for binding references to rvalues and the notion of comma-separated expressions. Finally, we will learn how to invoke a function before main() starts.

Tip 1: “iostream.h” or “iostream”?

Many C++ programmers still use “iostream.h” instead of the standard-compliant “iostream” library. What are the differences between the two? First, “iostream.h” belongs to the pre-standard iostream library and was never part of ISO C++; since the 1998 standard, the .h names of standard headers (for instance, the C library's “stdio.h”, superseded by “cstdio”) have been deprecated, and some current compilers no longer ship “iostream.h” at all. Using obsolete or deprecated features in new code is never a good idea. Second, in terms of functionality, “iostream” contains a set of templated I/O classes that support both narrow (char) and wide (wchar_t) characters, whereas “iostream.h” supports only char-oriented streams. Third, the standard specification of the iostream interface changed in many subtle respects, so the interfaces and implementation of “iostream” differ from those of “iostream.h”. Finally, “iostream” components are declared in namespace std, whereas “iostream.h” components are global.

Because of these substantial differences, you should not mix the two libraries in one program. As a rule, use “iostream” unless you're maintaining legacy code that works only with “iostream.h”.

Tip 2: Binding a Reference to an Rvalue

Rvalues and lvalues are fundamental concepts of C++ programming. Roughly speaking, an rvalue is an expression, such as a literal or a temporary result, that cannot appear on the left-hand side of a built-in assignment expression. By contrast, an lvalue refers to an object (in its wider sense), that is, a region of memory with an identity to which, unless it is const, you can write a value. References can be bound to both rvalues and lvalues. However, because of the language's restrictions on rvalues, you also have to be aware of the restrictions on binding references to them.

Binding a reference to an rvalue is allowed only if the reference is a reference to const. The rationale behind this rule is straightforward: an rvalue is not meant to be changed, and only a reference to const ensures that the program doesn't modify an rvalue through its reference. In the following example, the function f() takes a reference to const int:

void f(const int & i)
{
    /* i may be read here, but not modified */
}

int main()
{
    f(2); /* OK */
}

The program passes the rvalue 2 as an argument to f(). At runtime, a temporary object of type int with the value 2 is created and bound to the reference i. The temporary exists from the moment f() is invoked until the end of the full expression containing the call; it is destroyed immediately afterwards. Note that had we declared the parameter i without the const qualifier, the call f(2) would not compile: a non-const reference cannot bind to an rvalue, because any change made through it would only modify a temporary that is about to disappear. For this reason, only references to const may be bound to rvalues.

The same rule applies to user-defined types. You may bind a reference to a temporary object only if it is a reference to const:

struct A {};

void f(const A& a)
{
    /* a may be read here, but not modified */
}

int main()
{
    f(A()); /* OK, binding a temporary A to a const reference */
}

Tip 3: Comma-Separated Expressions

Comma-separated expressions were inherited from C. You probably use such expressions in for- and while-loops rather often. Yet the language rules in this regard are far from intuitive. First, let's see what a comma-separated expression is.

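An expression may consist of one or more sub-expressions separated by the comma operator. (The commas that separate function arguments, or the variables in a declaration, are not comma operators.) For example: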

if(++x, --y, cin.good()) /*three expressions*/

The if condition contains three expressions separated by commas. C++ guarantees that they are evaluated from left to right and that the side effects of each are complete before the next one is evaluated. However, the value of the entire comma-separated expression is the value of the rightmost expression alone. Therefore, the if condition above evaluates as true only if cin.good() returns true. Here's another example of a comma expression:

int j=10;
int i=0;
while( ++i, --j)
{
    /*..repeat as long as j is not 0*/
}

Tip 4: Calling a Function Before the Program's Startup

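Certain applications need to invoke startup functions before main() starts. For example, polling, billing, and logging functions may have to run before the actual program begins. The easiest way to achieve this is to call these functions from the constructor of a global object. Global objects are initialized before main() is entered (on mainstream implementations; the standard allows some deferral of dynamic initialization), so these functions run before main() starts. Keep in mind that the order in which global objects defined in different source files are initialized is unspecified, so such a constructor should not rely on a global object from another file. For example: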

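#include <cstdio>

/* Hypothetical stand-ins for the application's logging facility */
struct record { const char * text; };
record last_entry = { "log activated" };
void activate_log() { std::puts("log activated"); }
record * read_log() { return &last_entry; }

class Logger
{
public:
    Logger()
    {
        activate_log(); /* runs during initialization, before main() */
    }
};

Logger app_log; /*global instance; not named "log" to avoid clashing with the C library's log()*/

int main()
{
    record * prec=read_log();
    //.. application code
}

The global object app_log is constructed before main() starts. During its construction, app_log invokes the function activate_log(). Thus, when main() starts, it can read data from the log file.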

2 comments:

Unknown said...

Re tip 3... while these uses of the comma operator are appealingly concise in folding what are conceptually distinct statements into one, they can also make the code less readable and maintainable. To help accurate "scan/reading" of a program, it's good to be able to accurately guess the significance of a section of code based on context.

Part of this is that stuff appearing in an expression is relevant to the expression's value (e.g. prefer the separation of assignment, test and increment in "for (int i = 0, j = 9; j; ++i, --j)" over "int j=10; int i=0; while( ++i, --j)", [assuming i and j are only needed within the loop, else move "int i,j" before but leave initialisation within the for loop]).


Another aspect is that short-circuit evaluation can be applied mentally (e.g. "if (known_true || don't even have to read this - just look at next statement)", which fails for "if (known_true || whatever, but_its_really_this_that_counts)").

Minh Hoa said...

For Tony:
This tip is only meant to make programmers aware of the effect of using commas in code. I do not recommend writing code this way, because it makes the application hard to read, modify, and maintain.
Thanks for your comments :)