Anybody who writes #pragma pack(1) may as well just wear a sign on their forehead that says “I hate RISC”

This post has been republished via RSS; it originally appeared at: Microsoft Developer Blogs - Feed.

When you use #pragma pack(1), this changes the default structure packing to byte packing, removing all padding bytes normally inserted to preserve alignment.

Consider these two structures:

// no #pragma pack in effect.
struct S
{
    int32_t total;
    int32_t a, b;
};

#pragma pack(1)

struct P
{
    int32_t total;
    int32_t a, b;
};

Both structures have identical layouts because the members are already at their natural alignment, Therefore, you would expect these two structures to be equivalent.

But they're not.

Changing the default structure packing has another consequence: It changes the alignment of the structure itself. In this case, the #pragma pack(1) declares that the structure P can itself be placed at any byte boundary, instead of requiring it to be placed on a 4-byte boundary.

struct ExtraS
{
    char c;
    S s;
    char d;
};

struct ExtraP
{
    char c;
    P p;
    char d;
};

Even though the structures S and P have the same layout, the difference in alignment means that the structures ExtraS and ExtraP end up quite different.

The ExtraS structure starts with a char, then adds three bytes of padding, followed by the S structure, then another char, and three more bytes of padding to bring the entire structure back up to 4-byte alignment. This ensures that an array of ExtraS structures will properly align all of the embedded S objects.

By comparison, the ExtraP structure starts out the same way, with a single char, but this time, there is no padding before the P because the P is byte-aligned. Similarly, there is no trail padding at the end of the structure because the P is byte-aligned and therefore does not need to be kept at a particular alignment in the case of an array of ExtraP objects.

  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13
ExtraS c padding s.total s.a s.b d padding
ExtraP c p.total p.a p.b d

The effect is more noticeable if you have an array of these objects.

  00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25 26 27
ExtraS c padding s.total s.a s.b d padding c padding s.total s.a s.b d padding
ExtraP c p.total p.a p.b d c p.total p.a p.b d

Observe that in the array of ExtraS objects, the s.total, s.a, and s.b are always four-byte aligned. But in the array of ExtraP objects, there is no consistent alignment for the members of p.

The possibility that any P structure could be misaligned has significant consequences for code generation, because all accesses to members must handle the case that the address is not properly aligned.

void UpdateS(S* s)
{
 s->total = s->a + s->b;
}

void UpdateP(P* p)
{
 p->total = p->a + p->b;
}

Despite the structures S and P having exactly the same layout, the code generation is different because of the alignment.

UpdateS UpdateP
Intel Itanium
adds  r31 = r32, 4
adds  r30 = r32  8 ;;
ld4   r31 = [r31]
ld4   r30 = [r30] ;;












add   r31 = r30, r31 ;;
st4   [r32] = r31








br.ret.sptk.many rp
adds  r31 = r32, 4
adds  r30 = r32  8 ;;
ld1   r29 = [r31], 1
ld1   r28 = [r30], 1 ;;
ld1   r27 = [r31], 1
ld1   r26 = [r30], 1 ;;
dep   r29 = r27, r29, 8, 8
dep   r28 = r26, r28, 8, 8
ld1   r25 = [r31], 1
ld1   r24 = [r30], 1 ;;
dep   r29 = r25, r29, 16, 8
dep   r28 = r24, r28, 16, 8
ld1   r27 = [r31]
ld1   r26 = [r30] ;;
dep   r29 = r27, r29, 24, 8
dep   r28 = r26, r28, 24, 8 ;;
add   r31 = r28, r29 ;;
st1   [r32] = r31
adds  r30 = r32, 1
adds  r29 = r32, 2 
extr  r28 = r31, 8, 8
extr  r27 = r31, 16, 8 ;;
st1   [r30] = r28
st1   [r29] = r27, 1
extr  r26 = r31, 24, 8 ;;
st1   [r29] = r26
br.ret.sptk.many.rp
Alpha AXP
ldl   t1, 4(a0)




ldl   t2, 8(a0)




addl  t1, t1, t2
stl   t1, (a0)









ret   zero, (ra), 1
ldq_u t1, 4(a0)
ldq_u t3, 7(a0)
extll t1, a0, t1
extlh t3, a0, t3
bis   t1, t3, t1
ldq_u t2, 8(a0)
ldq_u t3, 11(a0)
extll t2, a0, t2
extlh t3, a0, t3
bis   t2, t3, t2
addl  t1, t1, t2
ldq_u t2, 3(a0)
ldq_u t5, (a0)
inslh t1, a0, t4
insll t1, a0, t3
msklh t2, a0, t2
mskll t5, a0, t5
bis   t2, t4, t2
bis   t5, t3, t5
stq_u t2, 3(a0)
stq_u t5, (a0)
ret   zero, (ra), 1
MIPS R4000
lw    t0, 4(a0)

lw    t1, 8(a0)

addu  t0, t0, t1

jr    ra
sw    t0, (a0)
lwl   t0, 7(a0)
lwr   t0, 4(a0)
lwl   t1, 11(a0)
lwr   t1, 8(a0)
addu  t0, t0, t1
swl   t0, 3(a0)
jr    ra
swr   t0, (a0)
PowerPC 600
lwz    r4, 4(r3)






lwz    r5, 8(r3)






addu   r4, r4, r5
stw    r4, (r3)






blr
lbz    r4, 4(r3)
lbz    r9, 5(r3)
rlwimi r4, r9, 8, 16, 23
lbz    r9, 6(r3)
rlwimi r4, r9, 16, 8, 15
lbz    r9, 7(r3)
rlwimi r4, r9, 24, 0, 7
lbz    r5, 8(r3)
lbz    r9, 9(r3)
rlwimi r5, r9, 8, 16, 23
lbz    r9, 10(r3)
rlwimi r5, r9, 16, 8, 15
lbz    r9, 11(r3)
rlwimi r5, r9, 24, 0, 7
addu   r4, r4, r5
stb    r4, (r3)
rlwimi r9, r4, 24, 0, 31
stb    r9, 1(r3)
rlwimi r9, r4, 16, 0, 31
stb    r9, 2(r3)
rlwimi r9, r4, 8, 0, 31
stb    r9, 3(r3)
blr
SuperH-3
mov.l @(4, r4), r2












mov.l @(8, r4), r3












add   r3, r2






rts
mov.l r2, @r4
mov.b  @(7, r4), r1
shll8  r1
mov.b  @(6, r4), r2
extu.b r2, r2
or     r2, r1
shll8  r1
mov.b  @(5, r4), r2
extu.b r2, r2
or     r2, r1
shll8  r1
mov.b  @r4, r2
extu.b r2, r2
or     r1, r2
mov.b  @(7, r4), r1
shll8  r1
mov.b  @(6, r4), r3
extu.b r3, r3
or     r3, r1
shll8  r1
mov.b  @(5, r4), r3
extu.b r3, r3
or     r3, r1
shll8  r1
mov.b  @r4, r3
extu.b r3, r3
or     r1, r3
add    r3, r2
mov.b  r2, @r4
shlr8  r2
mov.b  r2, @(1, r4)
shlr8  r2
mov.b  r2, @(2, r4)
shlr8  r2
rts
mov.b  r2, @(3, r4)

Observe that for some RISC processors, the code size explosion is quite significant. This may in turn affect inlining decisions.

Moral of the story: Don't apply #pragma pack(1) to structures unless absolutely necessary. It bloats your code and inhibits optimizations.

Bonus chatter: Once you make this mistake, you can't go back. You allowed the structure to be byte-aligned, and if you remove the spurious #pragma pack(1), you are making the structure more strictly aligned, which will be a breaking change for any clients which used the byte-packed version.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.