Wednesday, February 28, 2007

C++ Objects Part 5: Virtual Base Classes

A virtual base class is a class that is included only once in your derived class no matter how many base classes refer to it. How does this happen?

The answer is two-fold:

First, the actual layout of a class with a virtual base depends on the full class. Consider this case:

class R { };
class B1 : public virtual R { };
class B2 : public virtual R { };
class D : public B1, public B2 { };

In this case R is a virtual base that D inherits along two paths (through B1 and through B2) but contains only once. The location of R will be different for B1, B2 and D. Each time one of these classes is laid out as a complete object, R is placed in it exactly once. Since R cannot be in D twice, the location of R relative to B1 and B2 can't be the same when B1 or B2 is the final class as it is when D is the final class. (If both B1 and B2 took R with them, we'd have two copies in D, which is not acceptable!)
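The point about R's location can be checked with a small sketch. The data members and the helper names `offset_to_r` and `r_is_shared` are added here purely for illustration; the offsets themselves are compiler-specific, so the code only compares them.

```cpp
#include <cstddef>

// Same hierarchy as above, with one data member each so no subobject is empty.
struct R  { int r  = 0; };
struct B1 : virtual R { int b1 = 0; };
struct B2 : virtual R { int b2 = 0; };
struct D  : B1, B2    { int d  = 0; };

// Byte distance from a subobject to the R it shares (may be negative).
template <class From>
std::ptrdiff_t offset_to_r(const From& from) {
    const R& r = from;  // implicit upcast to the virtual base, resolved at run time
    return reinterpret_cast<const char*>(&r) -
           reinterpret_cast<const char*>(&from);
}

// True when both paths (D -> B1 -> R and D -> B2 -> R) reach the same R.
bool r_is_shared(const D& d) {
    const B1& b1 = d;
    const B2& b2 = d;
    return static_cast<const R*>(&b1) == static_cast<const R*>(&b2);
}
```

For a `D d;`, `r_is_shared(d)` holds, and `offset_to_r<B1>(d)` differs from `offset_to_r<B2>(d)` because the two bases sit at different addresses but share one R. Comparing `offset_to_r<B1>(d)` with the value for a standalone `B1` shows, on typical compilers, that the distance from B1 to R also depends on the final class.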

To get around this, we hit the second part of our answer: classes contain pointers to their virtual bases. With non-virtual inheritance, bases are contained, i.e. a fixed chunk of the derived class's memory is used for the base class. With a virtual base, the derived class simply contains a pointer to the base class. This lets us move R around and adjust the pointer to R (in B1 or B2) at runtime, so that R can sit in a different place in each of the classes that contain B1 or B2.

It should be noted that while there is only one copy of R in D, there are multiple pointers to R: one in B1 and one in B2. This is what ensures that the memory layout of B1 and B2 doesn't change when they are used in a derived class. Only the values stored in those pointers change. The compiler generates code to set up these pointers for us.

For CodeWarrior, pointers to virtual bases come before the pointer to the vtable. This means that to find the vtable we have to know how many virtual bases a class has. Fortunately this information is dependent only on class info that is available at compile time - it never changes.

Here's a C version of what we have above:

struct R {
R_vtable * vtbl;
// R data
};

struct B1_partial {
R * vbase;
B1_vtable * vtbl;
// data for B1
};

struct B1_final {
B1_partial self;
R vbase;
};

struct B2_partial {
R * vbase;
B2_vtable * vtbl;
// data for B2
};

struct B2_final {
B2_partial self;
R vbase;
};

struct D {
R * vbase;
R vbase_data;
B1_partial base1;
B2_partial base2;
};

One of the interesting things to note here is that B1 and B2 as included in D are not as large as B1 and B2 when used by themselves. This is because B1 and B2 must have storage for the virtual base R at their end when used by themselves. But when used in D, that storage is provided by D.
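As a rough illustration of the pointer setup the compiler does for us, here is a hand-written C++ sketch. It is not CodeWarrior's actual layout: vtables are left out, R is placed at the end of the final object as an assumption, and all names (`RPart`, `B1Part`, `B1Final`, `DFinal`, `r_value`) are made up for the example.

```cpp
// Hand-written stand-ins for compiler-generated layout; vtables omitted.
struct RPart { int value; };

struct B1Part {            // B1 without its virtual base
    RPart* vbase;          // where this object's R lives
    int    b1_data;
};

struct B2Part {            // B2 without its virtual base
    RPart* vbase;
    int    b2_data;
};

struct B1Final {           // B1 as a complete object: carries its own R
    B1Part self;
    RPart  vbase_data;
    B1Final() : self{&vbase_data, 0}, vbase_data{0} {}
    B1Final(const B1Final&) = delete;  // a plain copy would leave vbase pointing at the source
};

struct DFinal {            // D as a complete object: one R for both bases
    B1Part base1;
    B2Part base2;
    RPart  vbase_data;
    DFinal() : base1{&vbase_data, 0}, base2{&vbase_data, 0}, vbase_data{0} {}
    DFinal(const DFinal&) = delete;    // same reason as above
};

// Code written against B1Part works in either container, because it
// reaches R through the pointer instead of through a fixed offset.
int& r_value(B1Part& b1) { return b1.vbase->value; }
```

Calling `r_value` on `B1Final::self` and on `DFinal::base1` reaches two differently placed R parts through the same code, and `sizeof(B1Final)` exceeds `sizeof(B1Part)` by the storage for R, which matches the size difference described above.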

Now we can see why dynamic cast first goes to the most derived object. Given a pointer to R we have no idea where the rest of the object is: the position of R varies with the actual final type. That is, the position of R relative to B1 is not the same when the real object is a B1 as when it is a D.

Fortunately the vtable of any object tells us where the final object's real start is. So when R is a virtual base, depending on its final type, it will refer to a vtable that takes into account the actual full class's layout, letting us recover the full type and figure out the true relationship between the virtual base and the whole object.
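Standard C++ exposes this recovery step directly, which gives a way to see it in action. The sketch below assumes R is polymorphic (a virtual destructor is added for that purpose), and the helper names `complete_object` and `is_exactly_d` are invented for the example.

```cpp
#include <typeinfo>

struct R  { virtual ~R() = default; int r = 0; };
struct B1 : virtual R { int b1 = 0; };
struct B2 : virtual R { int b2 = 0; };
struct D  : B1, B2    { int d  = 0; };

// Start of the complete object that contains *r, found at run time.
const void* complete_object(const R* r) {
    return dynamic_cast<const void*>(r);
}

// True when the object that r belongs to is a D and nothing more derived.
bool is_exactly_d(const R& r) {
    return typeid(r) == typeid(D);
}
```

Given `D d; R* r = &d;`, `complete_object(r)` returns the address of `d` itself even though `r` points somewhere inside it, and `is_exactly_d(*r)` is true. For a standalone `B1` the same calls report the `B1` object and false.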

Note that a static down-cast from a virtual base will cause a compiler error. This is the compiler telling us that it can't even speculate what the memory layout of the virtual base in the final class might be without inspecting the real type at runtime. But dynamic down-casts and even cross-casts are legal, since all dynamic casts are a maximum down-cast followed by a possible up-cast.
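A short sketch of the three cases, again assuming R is made polymorphic with a virtual destructor so that `dynamic_cast` is allowed. The function names `sibling` and `as_b1` are hypothetical.

```cpp
struct R  { virtual ~R() = default; };
struct B1 : virtual R {};
struct B2 : virtual R {};
struct D  : B1, B2 {};

// Cross-cast: from the B1 side of an object to its B2 side, if it has one.
B2* sibling(B1* b1) { return dynamic_cast<B2*>(b1); }

// Down-cast from the virtual base: only the dynamic form compiles.
B1* as_b1(R* r) {
    // return static_cast<B1*>(r);  // ill-formed: R is a virtual base of B1
    return dynamic_cast<B1*>(r);
}
```

For a `D` object, `sibling` returns its B2 part; for a standalone `B1` it returns a null pointer, since there is no B2 to cross over to.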

1 comment:

  1. good work. even though c# take out virtual base from c++, the other cost related to multiple inheritance remains.
    Rio
