Get Your Computer Science Queries Answered

Sunday, December 3, 2006

C/C++ : Speed Variation

People believe that C code executes faster than C++ code. I argue otherwise. Many C++ gurus have spent their valuable time explaining that there's no difference. Now, I may lack credibility, but the creator of C++, Bjarne Stroustrup, certainly doesn't. So, would you believe his words? While browsing the Internet, I came across an email exchange in which Bjarne was asked to explain his views on this contentious topic. Here's the exchange (in full):

TITLE: Speed and size of C versus C++

PROBLEM: ???

I have also heard that C++ programs are generally bigger than C programs. I am a C programmer trying to learn C++. So, don't blame me if I have created any misconception about C++.

RESPONSE: ???

For one minor detail: including printf/scanf can include more code than is actually used. An intelligent C++ linker will only get those parts of the stream library that are REALLY needed.

RESPONSE: cshaver@informix.com (Craig Shaver @ Informix Software, Inc.)

Who implements an intelligent C++ linker??? I was under the impression that you get all the functionality in a class when you link, whether you use it or not.

RESPONSE: bs@alice.att.com (Bjarne Stroustrup), 12 Jan 93
AT&T Bell Laboratories, Murray Hill NJ

Let me try to clear up one or two points. Consider first a somewhat minimal C and C++ program x1.c:

main()
{
    int i;
    for (i = 0; i < 1000000; i++) printf("Hi,mom!\n");
}

and its more C++ looking cousin x2.c:

main()
{
    int i;
    for (i = 0; i < 1000000; i++) cout << "Hi,mom!\n";
}

I compiled and ran x1.c and x2.c:

c: cc x1.c
c: size a.out
 text | data |  bss |   dec |  hex
12288 | 6144 | 7608 | 26040 | 65b8

c: time a.out > /dev/null
25.5u 0.5s 37r a.out

c: PTCC x1.c
c: size a.out
 text | data |  bss |   dec |  hex
12288 | 6144 | 7620 | 26052 | 65c4

c: time a.out > /dev/null
25.7u 0.4s 33r a.out

PTCC is the driver for my standard off-the-shelf Cfront 3.0 (i.e. I'm not using any technology you couldn't buy half a year ago). Note that the size of the generated code is essentially the same. So is the speed. Running these examples a few times to eliminate random error in the timing mechanism shows that the run-time isn't biased one way or the other.

This is what you should expect for a program in the common subset of C and C++. There are fundamental reasons for that. You should expect identical code from a C compiler and a C++ compiler that use the same technology. The only possibly SYSTEMATIC difference I can think of is that a C++ compiler can use better function call sequences than a C compiler that doesn't apply global optimization, because in many cases a C compiler must guard against possible calls with differing numbers of arguments, where a C++ compiler doesn't need to because of C++'s stronger type checking. In most C and C++ compilers, this difference is theoretical, but I'm told that in Zortech C++ it is real (i.e. C++ programs are ever so slightly faster than their C equivalents). However, this is all noise; I doubt the difference between C and C++ in this kind of comparison matters to any real programmers. The difference is far smaller than the differences between different C compilers but, surprisingly, it is in C++'s favor.

Programs in the common subset of C and C++ result in equal-sized code that executes at equal speed.

If that conclusion doesn't appear to hold, check if your C and C++ compilers are of similar quality. If your C++ compiler appears to lose badly, you have the option of using a Cfront variant to get the benefits of your C compiler's code generation facilities. If your C compiler loses badly, switch to C++ even if you aren't ready to use the ``++ features.''

Now, a common argument is ``OK, so C++ can match C for C programs, but as soon as you use the REAL C++ features your programs get bigger and slower.'' Clearly you can write big and slow programs in any language (even C), but you don't necessarily take a performance hit when you start using C++. Consider x2.c. It uses the C++ stream I/O library, which is certainly bigger than C's stdio and is unlikely to be tuned to the same degree as stdio. It is also a library that uses a very large subset of C++'s features in its interface and implementation (operator overloading, multiple inheritance, virtual functions, etc.):

c: PTCC x2.c
c: size a.out
 text | data | bss |   dec |  hex
17408 | 2048 |   0 | 19456 | 4c00
c: time a.out > /dev/null
32.8u 1.0s 43r a.out

Surprisingly enough, the code generated for x2.c is noticeably smaller than the code generated for x1.c (75% of x1.o), though, as expected, it runs a bit slower (29% more user CPU time, 16% more elapsed time).

I claim, but cannot prove, that the run-time overhead is primarily a difference in tuning. Other programs that rely heavily on C++ features show improvements over their C counterparts - and others again show overhead. The differences do not appear systematic to me; that is, they are differences in design and effort, rather than inherent overhead in C or C++.

The space advantage of the C++ program is an advantage of the same kind; that is, it is there because a little extra care and thought was spent. Other implementations of stream I/O will show different space and time usage, as will different implementations of stdio. To do simple things, only the essential parts of the stream I/O library are brought in. You don't actually need a very ``intelligent'' linker; the dumb old Unix ld will do. Just manually split your implementation into several .c files. A simple example:

X.h:
class X {
    // details
public:
    void f(); // common function
    void g(); // uncommon function
    // more functions
};

X1.c:
// common functions:

void X::f() { ... }

X2.c:
// uncommon functions:

void X::g() { ... }

Now, any half-way decent archive program can bring in the object code for X1.c (only) for programs that use the common functions (only) and leave the expense of bringing in the object code for X2.c for the programs that actually use functions defined in X2.c.

There exist linkers that can do that without human help (mostly in the PC world); I just happen not to have one. I think it is important to note that this technique and the tools that support it carried over from C to C++. We weren't at the mercy of some ``smart'' and possibly expensive or unavailable technology. We don't have to forget or lose all of our effective techniques in moving from C to C++. We should - as ever - use them with a suitable amount of judgement.

C++ was designed not to leave room ``below'' for a lower level language, except assembler for machine specific operations.

I found this email at the following link:
http://nkari.uw.hu/Tutorials/CPPTips/split_impl

You can click on it to check for any inconsistencies. I take his word on this issue. Do you?

Saturday, November 18, 2006

C++ : All About Temporaries

Even the most trivial statements, like A = B, in a computer language may produce temporaries. Moreover, the generation of these temporaries has to be standardized to maintain a language's efficacy. The C++ language is no exception to that rule.

The following is a lightly embellished excerpt from 'The C++ Standard', taken from its section on temporary objects.

Temporary Objects

1 Temporaries of class type are created in various contexts: binding an rvalue to a reference, returning an rvalue, a conversion that creates an rvalue, throwing an exception, entering a handler, and in some initializations. Even when the creation of the temporary object is avoided, all the semantic restrictions must be respected as if the temporary object was created. [Example: even if the copy constructor is not called, all the semantic restrictions, such as accessibility, shall be satisfied. ]

2 [Example:

class X {
    // ...
public:
    // ...
    X(int);
    X(const X&);
    ~X();
};

X f(X);

void g()
{
    X a(1);
    X b = f(X(2));
    a = f(a);
}

Here, an implementation might use a temporary in which to construct X(2) before passing it to f() using X’s copy-constructor; alternatively, X(2) might be constructed in the space used to hold the argument. Also, a temporary might be used to hold the result of f(X(2)) before copying it to b using X’s copy-constructor; alternatively, f()’s result might be constructed (directly) in b. On the other hand, the expression a=f(a) requires a temporary for either the argument a or the result of f(a) to avoid undesired aliasing of a. ]

3 When an implementation introduces a temporary object of a class that has a non-trivial constructor, it shall ensure that a constructor is called for the temporary object. Similarly, the destructor shall be called for a temporary with a non-trivial destructor. Temporary objects are destroyed as the last step in evaluating the full-expression (a full-expression is an expression that is not a subexpression of another expression) that (lexically) contains the point where they were created. This is true even if that evaluation ends in throwing an exception.

4 There are two contexts in which temporaries are destroyed at a different point than the end of the full-expression. The first context is when an expression appears as an initializer for a declarator defining an object. In that context, the temporary that holds the result of the expression shall persist until the object’s initialization is complete. The object is initialized from a copy of the temporary; during this copying, an implementation can call the copy constructor many times; the temporary is destroyed after it has been copied, before or when the initialization completes. If many temporaries are created by the evaluation of the initializer, the temporaries are destroyed in reverse order of the completion of their construction.

5 The second context is when a reference is bound to a temporary. The temporary to which the reference is bound or the temporary that is the complete object to a subobject of which the temporary is bound persists for the lifetime of the reference except as specified below. A temporary bound to a reference member in a constructor’s ctor-initializer persists until the constructor exits. A temporary bound to a reference parameter in a function call persists until the completion of the full expression containing the call. A temporary bound to the returned value in a function return statement persists until the function exits. In all these cases, the temporaries created during the evaluation of the expression initializing the reference, except the temporary to which the reference is bound, are destroyed at the end of the full-expression in which they are created and in the reverse order of the completion of their construction. If the lifetime of two or more temporaries to which references are bound ends at the same point, these temporaries are destroyed at that point in the reverse order of the completion of their construction. In addition, the destruction of temporaries bound to references shall take into account the ordering of destruction of objects with static or automatic storage duration; that is, if obj1 is an object with static or automatic storage duration created before the temporary is created, the temporary shall be destroyed before obj1 is destroyed; if obj2 is an object with static or automatic storage duration created after the temporary is created, the temporary shall be destroyed after obj2 is destroyed. [Example:

class C {
    // ...
public:
    C();
    C(int);
    friend C operator+( const C&, const C& );
    ~C();
};

C obj1;
const C& cr = C(16)+C(23);
C obj2;

the expression C(16)+C(23) creates three temporaries. A first temporary T1 to hold the result of the expression C(16), a second temporary T2 to hold the result of the expression C(23), and a third temporary T3 to hold the result of the addition of these two expressions. The temporary T3 is then bound to the reference cr. It is unspecified whether T1 or T2 is created first. On an implementation where T1 is created before T2, it is guaranteed that T2 is destroyed before T1. The temporaries T1 and T2 are bound to the reference parameters of operator+; these temporaries are destroyed at the end of the full expression containing the call to operator+. The temporary T3 bound to the reference cr is destroyed at the end of cr’s lifetime, that is, at the end of the program. In addition, the order in which T3 is destroyed takes into account the destruction order of other objects with static storage duration. That is, because obj1 is constructed before T3, and T3 is constructed before obj2, it is guaranteed that obj2 is destroyed before T3, and that T3 is destroyed before obj1. ]

Friday, November 17, 2006

C++ : The Object Destruction Process

A user-defined destructor is augmented in much the same way as are the constructors, except in reverse order:

1. If the object contains a vptr, it is reset to the virtual table associated with the class.

2. The body of the destructor is then executed; that is, the vptr is reset prior to evaluating the user-supplied code.

3. If the class has member class objects with destructors, these are invoked in the reverse order of their declaration.

4. If there are any immediate non-virtual base classes with destructors, these are invoked in the reverse order of their declaration.

5. If there are any virtual base classes with destructors and this class represents the most-derived class, these are invoked in the reverse order of their original construction.
