Thursday, February 16, 2006

testing memory allocation failure

there is a difference in how memory allocation failure is handled in C++, depending on which operator new you link against.
int *pa = new int[10000];

in the article "The new and delete Operators" the MSDN says:
"Beginning in Visual C++ .NET 2002, the CRT's new function (in libc.lib, libcd.lib, libcmt.lib, libcmtd.lib, msvcrt.lib, and msvcrtd.lib) will continue to return NULL if memory allocation fails. However, the new function in the Standard C++ Library (in libcp.lib, libcpd.lib, libcpmt.lib, libcpmtd.lib, msvcprt.lib, and msvcprtd.lib) will support the behavior specified in the C++ standard, which is to throw a std::bad_alloc exception if the memory allocation fails."

hence it is vital to know which one you are using.
"Normally, if you #include one of the C++ standard headers, like <new>, you'll get a /defaultlib directive in your object that will reference the appropriate C++ Standard Library according to the CRT model you used (the /M* compiler options). Generally, that will cause the linker to use the throwing operator new from the C++ Standard Library instead of the nonthrowing one from the main CRT, because the order of defaultlib directives will cause libcp.lib to be searched before libc.lib (under /ML)."

hence if you have the following code:

int *pn = new int[1000];
if (pn != NULL)
{
    // ok, pn is valid
}

the NULL test is meaningful only if you use the CRT new, not when you use the Standard C++ Library new. with the standard new the test never fails: on failure it throws std::bad_alloc before pn is ever initialized, so the code after the allocation is not reached at all.
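To make the two behaviours concrete, here is a minimal sketch. AlwaysFails is a hypothetical type (not from MSDN) whose class-level operator new always fails, so both paths can be shown without really running out of memory. The standard form reports failure with std::bad_alloc, while the opt-in new (std::nothrow) form reports it with a null pointer, which is the only case where a NULL test makes sense.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical type whose allocation always fails, used only to simulate
// memory exhaustion in a deterministic way.
struct AlwaysFails {
    static void* operator new(std::size_t) { throw std::bad_alloc(); }
    static void* operator new(std::size_t, const std::nothrow_t&) noexcept { return nullptr; }
    static void operator delete(void*) noexcept {}
    static void operator delete(void*, const std::nothrow_t&) noexcept {}
};

// Standard behaviour: failure is reported by an exception, never by NULL.
bool throwing_new_reports_by_exception() {
    try {
        AlwaysFails* p = new AlwaysFails;  // throws, p is never initialized
        delete p;                          // not reached
        return false;
    } catch (const std::bad_alloc&) {
        return true;
    }
}

// Opt-in non-throwing form: failure is reported by a null pointer.
bool nothrow_new_reports_by_null() {
    AlwaysFails* p = new (std::nothrow) AlwaysFails;
    return p == nullptr;
}
```

If you really want the "test the pointer" style with a standard-conforming library, the nothrow form is the one to write explicitly.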

Wednesday, February 08, 2006

remember the rule

sometimes it's hard to figure out which method is called when looking at a hierarchy of classes which use the virtual specifier for their methods.

but, as always, if you know how things work internally, you will always find the right answer.
I always think that instead of remembering 10 specific cases (which might not be hard at all for many people, but I, for example, have a bad memory) it's much easier to remember only 1 thing: how it works.

this is the case for the virtual specifier.
the rule of thumb is this (this is from MSDN):
When calling a function using pointers or references, the following rules apply:
1. A call to a virtual function is resolved according to the underlying type of object for which it is called.
2. A call to a nonvirtual function is resolved according to the type of the pointer or reference.

but this rule might not always be simple to apply.
What is really good is to know why things happen as described in those 2 rules. Let's take each of them:
1. when you apply the virtual specifier to a function, at compile time, the compiler will do some things:
it will add an invisible vtable member (a pointer to a table of function addresses) to your class;
the vtable pointer is set in each constructor right after the base class constructors are called (look here for construction order: this is the reason why, when you call a virtual function in the constructor of a base class while creating an object of a derived class, the derived version is not called: at that point the vtable pointer still refers to the base class table, so the base class version runs)

also, it's good to remember that with single inheritance there is only one vtable pointer in the derived class, no matter how big the class hierarchy is and how many virtual functions there are;
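The constructor case above can be sketched like this. Base, Derived, name() and seen_in_ctor are made-up names for illustration only; the point is that the virtual call made while Base is being constructed resolves to the Base version, and only after construction finishes does the same call reach Derived.

```cpp
#include <cassert>
#include <string>

struct Base {
    std::string seen_in_ctor;
    // Virtual call made while the Base part of the object is being built.
    Base() { seen_in_ctor = name(); }
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// Returns what the Base constructor observed when a Derived was created.
std::string name_seen_during_construction() {
    Derived d;
    return d.seen_in_ctor;
}

// Returns what the same virtual call yields once construction is complete.
std::string name_seen_after_construction() {
    Derived d;
    const Base& b = d;
    return b.name();
}
```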

the term 'underlying type' is used because a pointer (or reference) to a base class can always refer to an object of a derived class.

B b;
A *pa = &b;

when you call pa->f() (and f() is declared virtual in class A) the vtable of the object points to the B::f() version, because the 'underlying' type is B (I found the name confusing)

when we have an A* we don't know if the object it points to is an A or a B.
void function(A *pa)

2. if it's not virtual, then the function called is the one from the class of the pointer or reference (the function is not in the vtable, so the call is bound at compile time)
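Both rules can be seen side by side in a small sketch. The classes reuse the names A and B from the text, but the member g() and the helper functions call_f() and call_g() are assumptions added for illustration.

```cpp
#include <cassert>
#include <string>

struct A {
    virtual ~A() {}
    virtual std::string f() const { return "A::f"; }  // virtual
    std::string g() const { return "A::g"; }          // non-virtual
};

struct B : A {
    std::string f() const override { return "B::f"; } // overrides A::f
    std::string g() const { return "B::g"; }          // merely hides A::g
};

// Rule 1: resolved by the underlying type of the object (through the vtable).
std::string call_f(const A* pa) { return pa->f(); }

// Rule 2: resolved by the type of the pointer, at compile time.
std::string call_g(const A* pa) { return pa->g(); }
```

Passing the address of a B object to both helpers gives "B::f" for the virtual call and "A::g" for the non-virtual one, even though it is the same object.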

Tuesday, February 07, 2006

Final classes

I came across a nice C++ question which shows the final class principle (as I read here: it seems that the final class specifier exists in Java)

1. A class that is not derived from a class nor is intended to have any class derived from it is an example of what type of class?
A. A concrete class
B. An abstract class
C. A base class
D. A virtual class
E. A final class

A final class is a class from which you don't derive (if you try to, objects of the derived classes can't be constructed, since the compiler will complain about it).
So how can you design such a class?

In C++ there is no keyword (final) to declare a class as non-inheritable as in Java.
But C++ has its own features which you may exploit to get the same behaviour.
Basically it uses the concepts of private constructor and friend class.

For example, let's try to create a class called CFinal.
class CFinal
{
    // member data ...
};

We want to be able to use this class, of course, to create objects from it, but what we don't want is to be able to use this class as a base for other classes. Or, in other words, if anybody creates a class derived from CFinal, they will not be able to create objects of this derived class.

The first idea would be to make the CFinal constructors private, but doing so would prevent us from creating objects of it. We need a way to make construction unavailable for the derived classes only.

The solution:
We use the fact that in C++ friendship is not inherited in any way: if a friend class has a derived class, the derived class is not automatically a friend of the initial class.

class Temp
{
private:
    ~Temp() { }
    friend class CFinal;
};

class CFinal : virtual public Temp
{. . .};

Again, if we have this:
class CDerived : CFinal
{. . .};
CDerived is NOT a friend of the Temp class.

So now if someone tries to inherit from CFinal, compilation gives an error when an object of the derived class is created: because Temp is a virtual base, the most derived class has to call the Temp constructor and destructor itself, and it has no access to the private members of Temp.
Yet, you can create CFinal objects because, as a friend of the Temp class, CFinal has access to its private members.
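Here is a compilable sketch of the trick. It is an assumption-laden variation, not the article's exact code: it makes the Temp constructor private (as the text describes) rather than the destructor, the member value is hypothetical, and instead of triggering the compile error it asks the compiler through std::is_default_constructible (C++11) whether each class can be instantiated. Since C++11 the language also has a final specifier that does this directly.

```cpp
#include <cassert>
#include <type_traits>

class Temp {
private:
    Temp() {}             // only friends may construct a Temp
    friend class CFinal;
};

class CFinal : virtual public Temp {
public:
    int value = 0;        // hypothetical member data
};

// Declaring the derived class is allowed; creating an object of it is not,
// because CDerived (not a friend of Temp) would have to construct the
// virtual base Temp itself.
class CDerived : public CFinal {};

constexpr bool final_is_constructible =
    std::is_default_constructible<CFinal>::value;
constexpr bool derived_is_constructible =
    std::is_default_constructible<CDerived>::value;
```

Writing `CDerived d;` anywhere in the program would be rejected by the compiler, which is exactly the effect we are after.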

I got the whole idea from this article:
"Working with the Final Class in C++" by Zeeshan Amjad

Friday, February 03, 2006

subtle difference when using const_cast

I've just read this article: which shows the subtle difference (when it is good \ bad) when using const_cast on objects.

thinking from the point of view of what a variable may contain (its value) and where this value is stored in computer memory, you can think of 2 kinds of constness:

a 'true const':

const int cn = 5; // true const variable; it might be stored in computer ROM

and a 'contractual const':

int num = 0;

const int * pci = &num; // *pci is a contractual const int

if you try to remove the constness of a variable in order to modify its value, it's good to know whether that variable is a 'true' or a 'contractual' const.

modifying a 'true' const variable is undefined behaviour (and not desirable to do).

the idea is that a pointer to a const variable is still (just) a pointer containing the address of a variable.

if you know that the variable it points to is not a true const (i.e. it was not declared like const int num = 0), you can modify it safely.

in other words, 'const' can be more than just 'don't change the value', more than a specifier. it might have implications for how the variable is kept in memory.
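A minimal sketch of the safe case, assuming the caller knows the pointed-to object is only 'contractually' const. The helper name set_through_const() is made up for this example.

```cpp
#include <cassert>

// Writes through a pointer-to-const. This is only legal when the object
// pointed to was not itself declared const ("contractual" const); calling
// it with the address of a true const object is undefined behaviour.
void set_through_const(const int* pci, int value) {
    int* p = const_cast<int*>(pci);  // drop the contractual const
    *p = value;
}
```

With `int num = 0; const int* pci = &num;` the call set_through_const(pci, 5) changes num to 5. Passing &cn for a `const int cn = 5;` would compile just as well, which is exactly why you have to know which kind of const you are dealing with.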

name hiding again

name hiding seems to be pretty compiler specific, unfortunately.
for example:
struct A
{ int x; };

struct B: A
{ int x; };

struct C: A, B
{ void f() { x = 0; } };

int main()
{ C i; }

the reference says that "The assignment x = 0 in function C::f() is not ambiguous because the declaration B::x has hidden A::x."

when I saw this I was confused, what the hell; indeed, B::x hides A::x inside B, but since C also derives directly from A, both B::x and A::x are visible in C.
I just put this code in Visual C++ 2003 and it gives (of course) the compile error:

e:\Projects\test2\test2\test2.cpp(15): error C2385: ambiguous access of 'x' in 'C' could be the 'x' in base 'A::x' or the 'x' in base 'B::x'

now, I bet that there are many things like this that are compiler specific (the C++ reference from the link is for the IBM compiler);
and this makes life harder.

at your next interview, you can be very smart and write that this code works OK; if you get a mocking smile, you can ask politely "do you have an IBM C compiler?"

the best C++ reference

I don't think there is a better reference on C++ than these:

Effective & More Effective C++
C++ Programming Language
C++ Faqs
Effective STL
and... MSDN C++ reference
MSDN, yes, it's really good; it covers many of the small C++ language details you forget over time. its descriptions are short, but if you have the time and patience it's good to read it from time to time to refresh the concepts, and to read it from start to end; it has many gritty details that I (and you) sometimes pass by and forget.
also, it has many things you find in C++ books (though they are explained better in the books).
btw, much of the stuff on my blog can be found in the MSDN C++ reference :D so why am I still writing here? because it's good practice to actually explain different things in writing, for myself


there is a thing called name dominance in C++. consider this:

class A
{ public: int f() { printf("\nA::f()"); return 1; } };

class B : virtual public A
{ public: void f() { printf("\nB::f()"); } }; // hides the int A::f()

class C : virtual public A { };

class D : public B, public C { };


the hierarchy is a diamond (A at the top, B and C deriving from it, D deriving from both), and D has only one copy of A in memory (because B and C inherit virtually from A)

D d;
d.f();

f() is not ambiguous because void B::f() hides int A::f().
d.f() will get B::f() called because D derives from B, which hides A::f().
Note that this is not ambiguous because A is inherited virtually in both B and C, which makes the d object contain only one instance of A. if, for example, B or C did not inherit virtually from A (either one of them or both), d would have 2 A objects within it and the call would be ambiguous.

now, consider this case:

C *d = new D;

d->f() will get A::f() called.
The reason lies in the mechanism of calling a function.
in the first case, when d is a D object:
D d;
when you call
d.f();
the function which gets called is determined at compile time; lookup starts in the scope of D and finds B::f(), which hides A::f()

in the 2nd case, when
C *d = new D;
when you call
d->f();
the object d points to is really a D object, but since the function called is not virtual (not taken from the vtable) it is looked up in the scope of C, which finds A::f(), because C inherits A::f().

hence, the rule is this:
if the function called is not virtual, the function called is the one from the static type of the pointer (or object) through which it is called.
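The two cases can be checked with a small sketch. To keep it testable, both versions of f() return a string instead of printing (an assumption that differs from the int / void signatures above), and the helpers call_via_d() and call_via_c() are made-up names.

```cpp
#include <cassert>
#include <string>

struct A { std::string f() const { return "A::f"; } };
struct B : virtual A { std::string f() const { return "B::f"; } };  // hides A::f
struct C : virtual A {};
struct D : B, C {};

// Lookup starts in D: B::f dominates A::f because A is a shared virtual base.
std::string call_via_d(const D& d) { return d.f(); }

// Lookup starts in C: the only f visible from there is the inherited A::f.
std::string call_via_c(const C* pc) { return pc->f(); }
```

The same D object gives "B::f" through the first helper and "A::f" through the second, because f is not virtual and only the static type decides.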

public inheritance broken

consider this:

class A {
public:
    void f();
    void f(int x);
};

class B : public A {
public:
    void f(); // hides all A::f() overloads
};

you have an object
B b;
and you want to call f(int x) on it

normally you would think that since B inherits from A, it inherits all the A functions, which is true, but the inherited overloads are not visible through B.
the name f declared in class B hides every declaration of that name in the base classes. this includes the following case:

class A1 {
public:
    void f();
};

class B1 : public A1 {
public:
    int f; // hides all A1 'f' names
};

if you try to write the call (for example f() on a B1 object):
you will get a compile error

because name lookup stops at the first class scope in which the name is found, so a name declared in a derived class hides every base class member with the same name;

if you want to preserve the is-a relationship (a derived class should give access to all the publicly inherited base class functionality) then you need to redeclare the base functions in the derived class with a 'using' declaration:

class B : public A {
public:
    void f(); // hides all A 'f' names
    using A::f; // OK, now we have A::f(int x) within B
};


Note that 'using' only makes the base class overloads visible in the derived class; it will not work for the 2nd case, where the derived class hides a base class function with a member variable (because it's not possible to have, within the same class, both a function f() and a variable int f)
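A small sketch of the overload case. The derived classes are renamed BHiding and BUsing (hypothetical names) so both variants can live in one file, and the functions return strings only to make the result easy to check.

```cpp
#include <cassert>
#include <string>

struct A {
    std::string f() const { return "A::f()"; }
    std::string f(int) const { return "A::f(int)"; }
};

// Without a using-declaration, B's f hides both A::f overloads:
// calling f(1) on a BHiding object does not compile.
struct BHiding : A {
    std::string f() const { return "B::f()"; }
};

// With the using-declaration, A::f(int) becomes visible again, while the
// f() declared in the derived class still wins over A::f().
struct BUsing : A {
    using A::f;
    std::string f() const { return "B::f()"; }
};
```

Even with BHiding the base function is not gone, only hidden: a qualified call such as h.A::f(1) still reaches it.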

this problem is described in Effective C++, Item 33: 'Avoid hiding inherited names'

Thursday, February 02, 2006

Data alignment

Data alignment is (very?) important; or at least it is very important to be aware of it when you write your code.
Consider this:
struct Broadcast
{
    char timezone; // 1 byte data
    int frequency; // 4 byte data
    short timeofday; // 2 byte data
};

applying the sizeof operator to Broadcast (sizeof can be applied to a class) gives 12 on a 32 bit operating system.
this is because compilers usually round up a data structure's size to make it divisible by 2, 4, or 8, depending on the hardware's memory alignment requirements.

Most CPUs require that objects and variables reside at particular offsets in the system's memory. For example, 32-bit processors require a 4-byte integer to reside at a memory address that is evenly divisible by 4. This requirement is called "memory alignment". Thus, a 4-byte int can be located at memory address 0x2000 or 0x2004, but not at 0x2001. On most Unix systems, an attempt to use misaligned data results in a bus error, which terminates the program altogether. On Intel processors, the use of misaligned data is supported but at a substantial performance penalty.

Secondly, compilers insert padding bytes between data members to ensure that each member's address is properly aligned. The problem is that when members are declared in a random order, the compiler may need to insert more padding bytes between them, thereby inflating the data structure's size.

The compiler adds this padding because each fundamental data type is normally placed at a memory address that is divisible by its own size.
for example:
an int object should always be at a memory address divisible by 4
a short object should always be at a memory address divisible by 2
a double object should always be at a memory address divisible by 8

Hence Broadcast after compilation will be laid out like this:
struct Broadcast
{
    char timezone; // 4 byte data (padded with 3 bytes)
    int frequency; // 4 byte data
    short timeofday; // 4 byte data (padded with 2 bytes)
};

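without padding the members would sit at byte offsets 0 (timezone), 1 (frequency) and 5 (timeofday), 7 bytes in total;
with padding they sit at byte offsets 0 (timezone), 4 (frequency) and 8 (timeofday), 12 bytes in total.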

this is taken from this:

also look here
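The padded layout can be checked directly with sizeof and offsetof. The numbers in the comments assume the usual 4-byte int and 2-byte short with natural alignment (true for the common 32 and 64 bit x86 compilers, but not guaranteed by the language); the helper broadcast_padding_bytes() is a made-up name.

```cpp
#include <cassert>
#include <cstddef>

struct Broadcast {
    char  timezone;   // offset 0, followed by 3 padding bytes
    int   frequency;  // offset 4
    short timeofday;  // offset 8, followed by 2 padding bytes
};

// Bytes the compiler added on top of the raw member sizes.
std::size_t broadcast_padding_bytes() {
    return sizeof(Broadcast) - (sizeof(char) + sizeof(int) + sizeof(short));
}
```

Under those assumptions the raw data is 7 bytes and the struct is 12, so 5 bytes of every Broadcast object are padding.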

you might think that even if you arrange the structure in this optimized way, the processor would still need extra time, for example for this:
char timezone; // 1 byte data, gets 1 padding byte
short timeofday; // 2 byte data -> these 2 members share one 4 byte memory location
int frequency; // 4 byte data

which is properly aligned (each member is on a good memory boundary)
you might think that, on a 32 bit processor for example, reading the short variable out of the 4 byte location (which contains the char, a padding byte and the 2 bytes of the short) takes extra time, but it doesn't.
the main point of alignment is this: the goal is to have a structure where no data member is badly aligned:
when a variable needs to be read, the CPU should not be forced (due to bad alignment) to read the value from 2 memory locations (on a 32 bit CPU).
for example, if an int (4 bytes) variable is not at a memory address divisible by 4, the variable straddles 2 memory words.
the CPU in this case needs to fetch both words, extract the relevant part of each and concatenate them in order to get the value.

the compiler by default (if you don't change the default packing with #pragma pack) will align struct members on 2, 4 or 8 byte boundaries in order to spare the CPU the overhead described above; what it does not do is rearrange the struct members for you so that they take less space (which you should do yourself).
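To see what reordering buys, here is a sketch comparing the two member orders. The struct names BroadcastAsDeclared and BroadcastReordered and the helper bytes_saved_by_reordering() are made up for the comparison, and the sizes again assume a 4-byte int and a 2-byte short with natural alignment.

```cpp
#include <cassert>
#include <cstddef>

// Original member order: char, int, short.
struct BroadcastAsDeclared {
    char  timezone;
    int   frequency;
    short timeofday;
};

// Reordered so the char and the short share one 4 byte slot.
struct BroadcastReordered {
    char  timezone;   // offset 0, then 1 padding byte
    short timeofday;  // offset 2
    int   frequency;  // offset 4
};

// How many bytes per object the reordering saves.
std::size_t bytes_saved_by_reordering() {
    return sizeof(BroadcastAsDeclared) - sizeof(BroadcastReordered);
}
```

Under those assumptions the reordered struct takes 8 bytes instead of 12, with every member still on its natural boundary, so nothing is lost on the access-speed side.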