Monday, November 13, 2006

C++ : What's Undefined behavior in C++ ?

Let me quote here the 2003 standard for C++

...
1.3.12 undefined behavior

behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements. Undefined behavior may also be expected when this International Standard omits the description of any explicit definition of behavior. [Note: permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. ]

1.3.13 unspecified behavior

behavior, for a well-formed program construct and correct data, that depends on the implementation. The implementation is not required to document which behavior occurs. [Note: usually, the range of possible behaviors is delineated by this International Standard. ]
...

One of the frequently asked questions to Bjarne is:

Why are some things left undefined in C++?

And here's his answer....

Because machines differ and because C left many things undefined. For details, including definitions of the terms "undefined", "unspecified", "implementation defined", and "well-formed"; see the ISO C++ standard. Note that the meaning of those terms differ from their definition of the ISO C standard and from some common usage. You can get wonderfully confused discussions when people don't realize that not everybody share definitions.

This is a correct, if unsatisfactory, answer. Like C, C++ is meant to exploit hardware directly and efficiently. This implies that C++ must deal with hardware entities such as bits, bytes, words, addresses, integer computations, and floating-point computations they way they are on a given machine, rather than how we might like them to be. Note that many "things" that people refer to as "undefined" are in fact "implementation defined", so that we can write perfectly specified code as long as we know which machine we are running on. Sizes of integers and the rounding behavior of floating-point computations fall into that category.

Consider what is probably the the best known and most infamous example of undefined behavior:

int a[10];
a[100] = 0; // range error
int* p = a;
// ...

// range error (unless we gave p a better
// value before that assignment)
p[100] = 0;



The C++ (and C) notion of array and pointer are direct representations of a machine's notion of memory and addresses, provided with no overhead. The primitive operations on pointers map directly onto machine instructions. In particular, no range checking is done. Doing range checking would impose a cost in terms of run time and code size. C was designed to outcompete assembly code for operating systems tasks, so that was a necessary decision. Also, C -- unlike C++ -- has no reasonable way of reporting a violation had a compiler decided to generate code to detect it: There are no exceptions in C. C++ followed C for reasons of compatibility and because it too compete directly with assembler (in OS, embedded systems, and some numeric computation areas). If you want range checking, use a suitable checked class (vector, smart pointer, string, etc.). A good compiler could catch the range error for a[100] at compile time, catching the one for p[100] is far more difficult, and in general it is impossible to catch every range error at compile time.

Other examples of undefined behavior stems from the compilation model. A compiler cannot detect an inconsistent definition of an object or a function in separately-compiled translation units. For example:

// file1.c:
struct S { int x,y; };
int f(struct S* p) { return p->x; }


// file2.c:
struct S { int y,x; };

int main(){
    struct S s;
    s.x = 1;
    int x = f(&s); // x!=s.x !!
    return 2;
}
Compiling file1.c and file2.c and linking the results into the same program is illegal in both C and C++. A linker could catch the inconsistent definition of S, but is not obliged to do so (and most don't). In many cases, it can be quite difficult to catch inconsistencies between separately compiled translation units. Consistent use of header files helps minimize such problems and there are some signs that linkers are improving. Note that C++ linkers do catch almost errors related to inconsistently declared functions.

Finally, we have the apparently unnecessary and rather annoying undefined behavior of individual expressions. For example:

void out1 { cout << 1 ; }
void out2 { cout << 2 ; }
int main(){
    int i = 10;
    int j = ++i + i++; // value of j unspecified
    f(out1(),out2()); // prints 12 or 21
}
The value of j is unspecified to allow compilers to produce optimal code. It is claimed that the difference between what can be produced giving the compiler this freedom and requiring "ordinary left-to-right evaluation" can be significant. I'm unconvinced, but with innumerable compilers "out there" taking advantage of the freedom and some people passionately defending that freedom, a change would be difficult and could take decades to penetrate to the distant corners of the C and C++ worlds. I am disappointed that not all compilers warn against code such as ++i+i++. Similarly, the order of evaluation of arguments is unspecified.

IMO far too many "things" are left undefined, unspecified, implementation-defined, etc. However, that's easy to say and even to give examples of, but hard to fix. It should also be noted that it is not all that difficult to avoid most of the problems and produce portable code.


This is Bjarne at his best.

3 comments:

Unknown said...

As main is afunction which is to be called by the system on the requirement of the exe..... it must return a value 0 to say program successful in any other case system takes the program to be unsuccessful

Zaman Bakshi said...

I have quoted Bjarne.

Zaman Bakshi said...

Although it is bad programming practice, in C++, with proper main() writing 'return' statement is optional (is implicit).