Blig

July 2, 2008

GNU C++ POD madness

Filed under: Lowlevel — by suasol @ 10:47 am

(You may want to have a look at the previous article for a refresher on data and const data sections)

Constant non-POD objects cost more than you think. There are penalties in

* code size (executable size)
* working set size (“dirty” memory)
* program startup speed

What is a POD object? POD stands for “plain old data” and roughly means a basic type or a C-style struct. Once you add user-defined constructors, a user-defined assignment operator, or most of the other stuff that 10,000-page “guru” books tell you to have, the type becomes non-POD.
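If you want the compiler to tell you which side of the line a type falls on, something like the sketch below can help. It assumes a C++11 compiler (newer than the ones used in this post) and its `<type_traits>` header. `is_pod_like` is a made-up helper name, and the two structs are the same point types used in the listings in this post.

```cpp
#include <type_traits>

struct PodPoint { int x; int y; };

struct NonPodPoint
{
    int x;
    int y;
    NonPodPoint(int a, int b) : x(a), y(b) {}
};

// Hypothetical helper (not a standard name): in C++11 terms a POD type is
// one that is both trivial and standard-layout.
template <typename T>
constexpr bool is_pod_like()
{
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

// These checks run at compile time; a failure stops the build.
static_assert(is_pod_like<int>(), "basic types are POD");
static_assert(is_pod_like<PodPoint>(), "a plain C struct is POD");
static_assert(!is_pod_like<NonPodPoint>(), "a user-defined constructor makes the type non-POD");
```

The user-defined constructor is what tips `NonPodPoint` over: the type still has a C-compatible layout, but it is no longer trivial.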

First, some source and the asm output:

struct PodPoint { int x; int y; };
extern const PodPoint origin = { 5,7 };
// MSVC
PUBLIC    ?origin@@3UPodPoint@@B                ; origin
CONST    SEGMENT
?origin@@3UPodPoint@@B DD 05H                ; origin
    DD    07H
CONST    ENDS
// GCC
        .globl origin
        .section        .rodata
        .align 4
        .type   origin, @object
        .size   origin, 8
origin:
        .long   5
        .long   7

Perfect! If we were to disassemble this, we’d get 8 bytes exactly as we expect, 4 each for x and y.

Now, being good C++ citizens, surely we will add a constructor?

struct NonPodPoint
{
    int x;
    int y;
    NonPodPoint(int a,int b) : x(a), y(b) {}
};
extern const NonPodPoint origin(5,7);

// MSVC
PUBLIC	??0NonPodPoint@@QAE@HH@Z			; NonPodPoint::NonPodPoint
; Function compile flags: /Ogtpy
;	COMDAT ??0NonPodPoint@@QAE@HH@Z
_TEXT	SEGMENT
_a$ = 8							; size = 4
_b$ = 12						; size = 4
??0NonPodPoint@@QAE@HH@Z PROC				; NonPodPoint::NonPodPoint, COMDAT
; _this$ = ecx
; File c:\dev\a.cpp
; Line 5
	mov	edx, DWORD PTR _b$[esp-4]
	mov	eax, ecx
	mov	ecx, DWORD PTR _a$[esp-4]
	mov	DWORD PTR [eax], ecx
	mov	DWORD PTR [eax+4], edx
	ret	8
??0NonPodPoint@@QAE@HH@Z ENDP				; NonPodPoint::NonPodPoint
_TEXT	ENDS
PUBLIC	?origin@@3UNonPodPoint@@B			; origin
_DATA	SEGMENT
?origin@@3UNonPodPoint@@B DD 05H			; origin
	DD	07H
_DATA	ENDS

// GCC
        .section        .ctors,"aw",@progbits
        .align 4
        .long   _GLOBAL__I_origin
        .text
        .align 2
        .type   _Z41__static_initialization_and_destruction_0ii, @function
_Z41__static_initialization_and_destruction_0ii:
        pushl   %ebp
        decl    %eax
        movl    %esp, %ebp
        jne     .L5
        cmpl    $65535, %edx
        jne     .L5
        movl    $5, origin
        movl    $7, origin+4
.L5:
        popl    %ebp
        ret
        .size   _Z41__static_initialization_and_destruction_0ii, .-_Z41__static_initialization_and_destruction_0ii
        .align 2
        .type   _GLOBAL__I_origin, @function
_GLOBAL__I_origin:
        pushl   %ebp
        movl    $65535, %edx
        movl    %esp, %ebp
        movl    $1, %eax
        popl    %ebp
        jmp     _Z41__static_initialization_and_destruction_0ii
        .size   _GLOBAL__I_origin, .-_GLOBAL__I_origin
.globl origin
        .bss
        .align 4
        .type   origin, @object
        .size   origin, 8
origin:
        .zero   8

MSVC does really well here. The output is almost the same as the optimal case, with one important difference: the definition of “origin” has moved to the _DATA segment, so it is writable and no longer shared.

GCC (4.1.2 here) sadly gives up completely. It reserves some zeroed space in .bss and then arranges for the constructor to run during the static initialization phase (that magic twilight between program startup and the call to main()).
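To see that twilight phase in action, here is a minimal sketch. The names `TrackedPoint`, `tracked_origin`, `constructor_calls` and `calls_so_far` are invented for illustration. The constructor has a visible side effect, so the compiler has to run it at startup rather than bake the object into the image.

```cpp
// A plain int with a constant initializer is set up statically, so it is
// already valid when the dynamic initializers start running.
static int constructor_calls = 0;

struct TrackedPoint
{
    int x;
    int y;
    // The counter bump is a side effect that must happen at run time.
    TrackedPoint(int a, int b) : x(a), y(b) { ++constructor_calls; }
};

// Same shape as the non-POD constant in the listing: const, external linkage.
extern const TrackedPoint tracked_origin;
const TrackedPoint tracked_origin(5, 7);

// Reports how many constructors have run so far.
int calls_so_far() { return constructor_calls; }
```

Calling `calls_so_far()` as the first statement of main() should already report one constructor call, even though main() never constructed anything itself.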

Much of the cost is in the code for the constructors. Roughly speaking, we need one mov per initialized member. In the GCC output above, each of those movl instructions encodes to 10 bytes on 32-bit x86, which is 6 bytes more than the 4-byte value it stores. As you can imagine, this really adds up for arrays. Again, we lose the benefits of the .rodata section. Finally, if we have many objects requiring static initialization, startup can be slower.

The moral of the story: if you have constants, store them in POD types, not objects. If you’re careful (and use compile-time asserts), you could cast your POD data to the non-POD type.
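A rough sketch of that trick follows, with loud caveats. It assumes a C++11 compiler for `static_assert` (with the compilers in this post you would need a hand-rolled compile-time assert instead). `origin_data`, `as_point` and `sum` are invented names. The cast itself is not blessed by the C++ standard: it relies on the two structs having identical layout, which is exactly what the asserts try to pin down.

```cpp
#include <cstddef>  // offsetof

struct PodPoint { int x; int y; };

struct NonPodPoint
{
    int x;
    int y;
    NonPodPoint(int a, int b) : x(a), y(b) {}
    int sum() const { return x + y; }  // hypothetical convenience method
};

// Compile-time checks: refuse to build if the layouts ever drift apart.
static_assert(sizeof(PodPoint) == sizeof(NonPodPoint), "size mismatch");
static_assert(offsetof(PodPoint, x) == offsetof(NonPodPoint, x), "x is at a different offset");
static_assert(offsetof(PodPoint, y) == offsetof(NonPodPoint, y), "y is at a different offset");

// The constant itself stays POD, so it can be emitted as plain read-only data.
extern const PodPoint origin_data;
const PodPoint origin_data = { 5, 7 };

// Assumption: the compiler tolerates viewing the POD through the non-POD
// type. This is formally outside what the standard guarantees.
inline const NonPodPoint& as_point(const PodPoint& p)
{
    return reinterpret_cast<const NonPodPoint&>(p);
}
```

With this in place, `as_point(origin_data).sum()` uses the richer interface while the data itself needs no constructor call at startup. If the cast makes you nervous, constructing a temporary `NonPodPoint(p.x, p.y)` at the point of use is the fully defined alternative, at the cost of a copy.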


Data & Read-only sections

Filed under: Lowlevel — by suasol @ 9:30 am

First, let’s look at the code generated for some simple examples.

extern int global_writable_var = 9; // a.cpp
extern const int global_const_var = 9; // a.cpp

Generate the assembly source with

cl /c /Fa a.cpp # MSVC: /c means compile only, don't link. /Fa means output assembly
gcc -c -S a.cpp # GCC: -c means compile only, don't link. -S means output assembly

And we get:

// MSVC : a.asm
PUBLIC	?global_writable_var@@3HA	; global_writable_var
PUBLIC	?global_const_var@@3HB		; global_const_var
CONST	SEGMENT
?global_const_var@@3HB DD 09H		; global_const_var
CONST	ENDS
_DATA	SEGMENT
?global_writable_var@@3HA DD 09H		; global_writable_var
_DATA	ENDS

// GCC : a.s
.globl global_writable_var                  # attributes of the variable
        .data
        .align 4
        .type   global_writable_var, @object
        .size   global_writable_var, 4
global_writable_var:
        .long   9                           # variable definition
.globl global_const_var
        .section        .rodata
        .align 4
        .type   global_const_var, @object
        .size   global_const_var, 4
global_const_var:
        .long   9

The key thing to notice here is that the compiler puts constant data into a separate area (try adding more variables). These areas (the _DATA and CONST segments in the MSVC output, the .data and .rodata sections in the GCC output) will typically have different memory protection bits enabled, so a write to global_const_var would cause a hardware exception. Also, if you’re compiling a shared library or have several instances of an executable running, the data segment is duplicated each time the image is loaded, but the constant section may be shared between all instances.
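If you want to try adding more variables, a file like the sketch below is a reasonable starting point. The names are invented, and the section noted next to each definition is only what GCC typically does on ELF targets. Generate the assembly as shown above and check for yourself rather than trusting the comments.

```cpp
// Initialized constant with external linkage: typically .rodata.
extern const int lookup_table[3];
const int lookup_table[3] = { 1, 2, 3 };

// Initialized and writable: typically .data.
int hit_counter = 1;

// Writable with no initializer (zero-filled at load time): typically .bss.
int scratch_buffer[4];

// Touches all three so that none of them looks unused.
int sample()
{
    scratch_buffer[0] = lookup_table[1];
    return hit_counter + scratch_buffer[0];
}
```

Only the first of the three ends up in memory that can be shared and write-protected. The other two are per-process, writable data.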

Thus if you have a shared library (DLL) that is loaded by many processes (very common on Linux, for instance; try “ldd /path/to/executable” on some programs), the overhead can really add up.
