Tuesday, November 7, 2006

C++: What are vptr and vtbl in C++ ?

Well, this is one of the most frequently asked questions about C++. Let me be clear first: these two data structures are not mentioned in the C++ Standard. Rather, they are implementation details that compilers commonly use to implement the language.

Prerequisite: you should have a clear understanding of polymorphism in C++.


Why do we need them?

The question that arises is 'Why do we need them?' The answer is simple: to support polymorphism and RTTI. Let's look at a piece of code in C++.

#include <iostream>
using std::cout;

class Parent {
    public:
        Parent():data(0) { }
        void doSomeThing ( ) {
            //.......
            showData( ) ; //equivalent to this->showData( );
        }
        virtual void someFunction( ) { /* ...... */ }
    private:
        int data;
        virtual void showData ( ) {
            cout << data ;
        }
};


class Child :     public Parent {
    public:
        Child():data(10) { }
    private:
        int data;
        void showData ( ) {
            cout << data ;
        }
};

int main ( ) {
    Parent *p = 0;
    p = new Child ( ); // p now points to a Child object
    p -> doSomeThing ( ); // prints 10
    return 0;
}

How to produce intermediate code capable of handling polymorphic behavior?

The crux of the problem for the compiler is that, just like main( ) shown here, many other functions in a large program would call doSomeThing( ) through a Parent class pointer. This pointer may actually (as in this case) point to a derived class object. Now, the compiler has to produce intermediate code that can decide, at run time, which showData( ) to call whenever doSomeThing( ) is called through a Parent class pointer. If the pointer points to a Parent object, cout should output 0; if it points to a Child object, it should output 10.

To make the decision, it employs the technique of vptr (virtual pointer) and vtbl (virtual table). Every object of a class that exhibits polymorphic behavior (that is, has at least one virtual function) embeds a hidden pointer. This pointer is the vptr.
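One way to observe the embedded pointer is to compare object sizes. The following is a minimal sketch, assuming two hypothetical types, Plain and Polymorphic, that are not part of the original example. Since the standard does not require a vptr, the exact size difference depends on the compiler.

```cpp
#include <cstddef>

// Hypothetical types, used only to illustrate the hidden pointer.
struct Plain {
    int data;
    void show() const {}            // non-virtual: adds nothing to the object
};

struct Polymorphic {
    int data;
    virtual void show() const {}    // virtual: implementations typically embed a vptr
};

// Extra bytes per object that the virtual function costs on this compiler.
// The value is implementation-specific; the standard does not promise a vptr.
std::size_t hiddenBytes() {
    return sizeof(Polymorphic) - sizeof(Plain);
}
```

On common 64-bit compilers hiddenBytes( ) is usually larger than a bare pointer, because padding is added after the int to keep the object aligned.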


What does vptr point to?

Now the question arises: what does this pointer (vptr) point to? To answer it, we need to understand where member functions are stored. Every member function of a class is stored once, statically, under a mangled name, rather than inside each object. Say doSomeThing( ) is stored as __Parent__d001( ), and the two virtual functions as __Parent__showData001( ) and __Child__showData001( ) (the actual mangled names vary from compiler to compiler). So, code like

Parent *p = new Parent ( );
p->doSomeThing( );

is changed to

//...
__Parent__d001(p);

which means: call __Parent__d001( ) with p passed as the 'this' pointer. Now, let's see what effect the keyword 'virtual' has before a function declaration. A derived class declaration can:

  1. Override the virtual function.
  2. Not override the virtual function.

To accommodate both situations, the compiler creates a data structure called the vtbl (for virtual table). Every polymorphic class has one or more vptrs and vtbls, always in equal numbers. In a simple scenario (as with the code above) there is one vptr and one vtbl, and that is the case considered in this explanation. The vtbl can be thought of as having the 'type_info' of the class as its first member, while the remaining members (a variable number, but at least one) are pointers to the virtual functions of the class. The structure of the vtbl is given below:



When a derived class decides not to override a virtual function, the address of Parent's virtual function is copied into the derived class's vtbl. But if the derived class decides to override the function, the address of the overriding function replaces the old address. This yields the layout shown in the figure. The position of each function remains fixed across the vtbls of the hierarchy. Here, &__XXX__showData001 always occupies the same slot in every vtbl.
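The two tables described above can be sketched by hand. This is only an illustration under stated assumptions: the Vtbl struct, the tag types and the free functions are hypothetical stand-ins for what a compiler might generate, and real vtbls cannot be accessed like this. The functions return a label so that the slots are easy to tell apart.

```cpp
#include <typeinfo>

// Hypothetical tag types standing in for class Parent and class Child.
struct ParentTag {};
struct ChildTag {};

// Hypothetical stand-ins for the statically stored member functions.
const char* Parent_showData(void*)     { return "Parent::showData"; }
const char* Parent_someFunction(void*) { return "Parent::someFunction"; }
const char* Child_showData(void*)      { return "Child::showData"; }

// One possible vtbl shape: type information first, then one slot per virtual function.
struct Vtbl {
    const std::type_info* type;
    const char* (*showData)(void* self);
    const char* (*someFunction)(void* self);
};

// Parent's table points at Parent's own implementations.
const Vtbl parentVtbl = { &typeid(ParentTag), &Parent_showData, &Parent_someFunction };

// Child overrides showData (new address) but not someFunction (Parent's address is reused).
const Vtbl childVtbl = { &typeid(ChildTag), &Child_showData, &Parent_someFunction };

// True when Child reuses Parent's someFunction but supplies its own showData.
bool childTableIsConsistent() {
    return childVtbl.someFunction == parentVtbl.someFunction
        && childVtbl.showData != parentVtbl.showData;
}
```

Because both tables share one struct type, every function sits at the same position in each of them, which is what lets a call site pick a slot without knowing the object's real class.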







Where in the object is the vptr stored?

Now, the second question to be answered on our way to finding out what the vptr points to is, 'Where in the object structure is the vptr stored?' The answer is: anywhere the compiler likes. Some compilers embed the vptr as the last member (to keep the layout compatible with C structs), whereas others, like the Microsoft compilers, make it the first member of the object structure. So, considering the former case, we arrive at the following object layout:

Now we are in a position to answer the question: the vptr of an object points to the vtbl of that object's class.
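Which placement a given compiler uses can be estimated by measuring where the first data member lands inside the object. A rough sketch follows, assuming a hypothetical Probe type that is not part of the original example. The result is only a hint about the layout, since nothing in the standard fixes it.

```cpp
#include <cstddef>

// Hypothetical probe type: one virtual function, one public data member.
struct Probe {
    int data = 0;
    virtual void touch() {}
};

// Byte offset of `data` from the start of the object it lives in.
std::size_t dataOffset(const Probe& p) {
    const char* objectStart = reinterpret_cast<const char*>(&p);
    const char* memberStart = reinterpret_cast<const char*>(&p.data);
    return static_cast<std::size_t>(memberStart - objectStart);
}

// Rough guess at where this compiler put the hidden pointer.
// An offset of 0 suggests the vptr follows the data members;
// anything larger suggests the vptr (plus any padding) comes first.
const char* guessVptrPlacement() {
    Probe p;
    return dataOffset(p) == 0 ? "after the data members"
                              : "before the data members";
}
```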















Rewriting the intermediate code...



So the polymorphic code:

void doSomeThing ( ) {
    //.......
    showData( ) ;
}

can be written as


void doSomeThing ( ) {
    //.......
    this->vptr[1](this);
    //[1] is the second entry of vtbl, which is showData( )

}

Now, depending on the dynamic type of 'this', i.e. the type of the object on which doSomeThing( ) was called, its respective virtual function is called.
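The rewritten call can be emulated with plain structs and function pointers. The sketch below is not what any particular compiler emits. All names (ParentObj, ChildObj, the vtbl arrays, runDemo) are hypothetical, slot 0 is left null as a stand-in for the type_info entry, and someFunction( ) is omitted to keep it short.

```cpp
#include <sstream>
#include <string>

// Emulation with plain structs; every name below is hypothetical.
struct ParentObj;
using VirtualFn = void (*)(ParentObj* self, std::ostream& out);

struct ParentObj {
    int data;                  // Parent::data
    const VirtualFn* vptr;     // hidden pointer to the class's vtbl
};

struct ChildObj {
    ParentObj base;            // Parent part comes first
    int data;                  // Child::data (hides Parent::data)
};

void Parent_showData(ParentObj* self, std::ostream& out) { out << self->data; }

void Child_showData(ParentObj* self, std::ostream& out) {
    // Only reachable through childVtbl, so self is the first member of a ChildObj.
    out << reinterpret_cast<ChildObj*>(self)->data;
}

// Slot 0 stands in for the type_info entry and is left null in this sketch.
const VirtualFn parentVtbl[] = { nullptr, &Parent_showData };
const VirtualFn childVtbl[]  = { nullptr, &Child_showData };

// The "rewritten" doSomeThing: one body serves both classes.
void Parent_doSomeThing(ParentObj* self, std::ostream& out) {
    self->vptr[1](self, out);  // slot 1 is showData in this model
}

// Builds one object of each kind and returns what doSomeThing wrote.
std::string runDemo() {
    ParentObj parent{0, parentVtbl};
    ChildObj child{{0, childVtbl}, 10};
    std::ostringstream out;
    Parent_doSomeThing(&parent, out);
    out << ' ';
    Parent_doSomeThing(&child.base, out);
    return out.str();          // "0 10"
}
```

Parent_doSomeThing( ) never checks which class it was given. The only thing that differs between the two calls is the table that the object's vptr points to.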

Hurrah! That solves the problem of generating proper intermediate code for polymorphic calls. In the same way, the type_info of every object can be checked to see whether it can be properly cast to another type, thus enabling Run-Time Type Information (RTTI).
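At the language level, this check is exposed through typeid and dynamic_cast. A short sketch follows, assuming a hypothetical hierarchy (Base, Derived, Other) that is separate from the Parent/Child example. How the type information is found at run time is again left to the implementation.

```cpp
#include <typeinfo>

// Hypothetical minimal hierarchy; the names are illustrative only.
struct Base { virtual ~Base() {} };   // at least one virtual function is needed for RTTI
struct Derived : Base {};
struct Other : Base {};

// True if the object behind `b` is exactly a Derived.
bool isExactlyDerived(const Base& b) {
    return typeid(b) == typeid(Derived);
}

// True if `b` can be safely downcast to Derived.
bool canCastToDerived(Base* b) {
    return dynamic_cast<Derived*>(b) != nullptr;
}
```

For example, canCastToDerived( ) returns true for a pointer to a Derived object and false for a pointer to an Other or a plain Base object.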
