C Struct Padding — 3-Byte Pad Corrupted 50% Packets
A 3-byte padding mismatch between x86 and ARM silently corrupted 50% of network packets.
- A struct gives each member its own memory slot; a union makes all members share one block.
- Structs are for data that coexists; unions for data that is mutually exclusive.
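- Padding aligns members to CPU boundaries, so sizeof(struct) often exceeds the sum of its members' sizes.
- Reading a union member other than the one last written reinterprets the stored bytes in C (C99 and later) and is undefined behavior in C++; memcpy or unsigned char access is the portable way to inspect raw bytes.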
- Always pair a union with an enum tag to track the active member.
- Reorder struct fields largest-to-smallest to minimize padding and save memory.
Every real-world program deals with grouped data. A game needs to track a player's name, health, score, and position together. A network driver needs to interpret the same 4 bytes as either an IPv4 address, a 32-bit integer, or four individual octets depending on context. Trying to manage all of that with loose individual variables is like trying to run a hospital with sticky notes instead of patient records — technically possible, catastrophically unmanageable. Structures and unions are C's answer to that chaos.
The problem they solve is fundamentally about organisation and memory semantics. A struct gives you a custom data type that bundles related variables under one name, each with its own guaranteed memory slot. A union takes that idea and flips the memory model — all members share the same block of memory, which means you get type-reinterpretation and memory efficiency at the cost of only being able to use one member at a time. These aren't just syntax features; they're tools that let you model the real world accurately in code.
By the end of this article you'll understand exactly how struct and union memory layouts work, when each is the right tool, how to combine them for practical patterns like tagged unions, and the exact mistakes that trip up even experienced C developers. You'll also be able to confidently answer the interview questions that separate candidates who've read about C from those who've actually used it.
Structs: Grouping Related Data With Dedicated Memory
A struct (short for structure) lets you define a composite data type — a single named container that holds multiple members, each with its own type. The compiler allocates memory for every member independently, so all fields exist simultaneously and can be read or written in any order.
The real power isn't just convenience — it's that a struct becomes a first-class type. You can pass it to functions, return it, put it in arrays, and point to it. This lets you model domain concepts directly. A 'Player' struct isn't just three variables that happen to be related; it's a single coherent entity your code can reason about.
Under the hood, struct members are laid out sequentially in memory, but the compiler is allowed to insert padding bytes between members to satisfy alignment requirements of the target CPU. This means sizeof(struct Player) might be larger than you expect, and it's the first thing you need to internalise before you do anything serious with structs in systems programming or binary file I/O.
Use structs whenever you have data that naturally belongs together and needs all its fields present at the same time — think database records, configuration objects, game entities, or network packet headers.
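As a minimal sketch of the ideas above, the following defines a hypothetical `Player` type and two helper functions (`player_create` and `player_take_damage` are illustrative names, not from any library). It shows a struct being returned by value and modified through a pointer.

```c
#include <string.h>

/* Hypothetical game entity: every field exists at the same time. */
struct Player {
    char  name[16];
    int   health;
    int   score;
    float x, y;
};

/* Structs are first-class values, so a function can return one directly. */
struct Player player_create(const char *name, float x, float y)
{
    struct Player p = {0};                    /* all members start at zero */
    strncpy(p.name, name, sizeof p.name - 1); /* last byte stays '\0' */
    p.health = 100;
    p.x = x;
    p.y = y;
    return p;
}

/* Take a pointer so the caller's copy is the one that changes. */
void player_take_damage(struct Player *p, int amount)
{
    p->health -= amount;
    if (p->health < 0)
        p->health = 0;
}
```

A caller would write `struct Player hero = player_create("Ada", 1.0f, 2.0f);` followed by `player_take_damage(&hero, 30);`. Passing the struct by value instead would modify a copy and leave `hero` unchanged.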
Memory Layout and Padding — Why sizeof Surprises You
This is the section most tutorials skip, and it's the one that causes the most real-world bugs. CPUs are picky about alignment — a 4-byte int wants to live at a memory address that's divisible by 4. A double wants an address divisible by 8. When the compiler lays out struct members sequentially, it inserts invisible padding bytes to honour these constraints.
Consider a struct with a char (1 byte) followed by an int (4 bytes). The char sits at offset 0, but the int needs to start at offset 4 — so 3 bytes of padding are inserted silently. The struct's total size also gets padded at the end so that arrays of the struct keep every element aligned.
This matters enormously in three situations: serialising structs to binary files or network packets (padding bytes hold unspecified values, and the layout differs between platforms), computing offsets manually, and squeezing memory in embedded systems. Reordering your members largest-to-smallest often eliminates most padding naturally and is the right fix for memory footprint. For offsets, use the offsetof macro instead of adding member sizes by hand. For binary formats, serialise field-by-field, or use __attribute__((packed)) / #pragma pack, but only when you truly need it, because unaligned access is slower on many architectures and faults on some.
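The effect of member order can be measured directly. The sketch below uses two hypothetical structs, `Loose` and `Tight`, with identical members in different orders; the padding amounts in the comments assume a typical 64-bit ABI and will differ elsewhere.

```c
#include <stddef.h>

/* Poor ordering: each small member is followed by a larger one. */
struct Loose {
    char   tag;    /* 1 byte, then typically 7 bytes of padding */
    double value;  /* 8 bytes */
    char   flag;   /* 1 byte, then typically 3 bytes of padding */
    int    count;  /* 4 bytes */
};

/* Same members, largest first. */
struct Tight {
    double value;
    int    count;
    char   tag;
    char   flag;   /* typically 2 bytes of trailing padding follow */
};

/* Padding = total size minus the sum of the member sizes. */
size_t loose_padding(void)
{
    return sizeof(struct Loose)
         - (sizeof(char) + sizeof(double) + sizeof(char) + sizeof(int));
}

size_t tight_padding(void)
{
    return sizeof(struct Tight)
         - (sizeof(double) + sizeof(int) + sizeof(char) + sizeof(char));
}
```

Printing `sizeof(struct Loose)`, `sizeof(struct Tight)` and `offsetof(struct Loose, count)` with `%zu` shows the actual layout on your target. On a typical x86-64 build the two sizes come out as 24 and 16, but treat those numbers as ABI-dependent.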
Unions: One Memory Location, Many Interpretations
A union looks syntactically identical to a struct but operates on a completely different principle: all members share the same starting address and the same block of memory. The union's size is at least the size of its largest member, rounded up to satisfy the strictest alignment among its members. Writing to one member and reading from a different one reinterprets the raw bytes, which is either a powerful tool or a disaster, depending on whether you do it intentionally.
The classic legitimate use cases are: type-punning (reinterpreting the raw bytes of a float as a uint32_t, for example), memory-mapped hardware registers where the same address has different meanings, and building tagged unions (also called discriminated unions) where a type tag tells you which member is currently valid.
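The illegitimate use is writing member A and reading member B expecting a meaningful 'conversion'. No conversion happens: since C99 the read reinterprets the stored bytes as the new type, so writing 1.0f and reading an int yields the float's bit pattern, not 1. The result depends on the platform's representation and byte order, can be a trap representation for some types, and the same code is undefined behaviour in C++. Inspecting any object through unsigned char is always allowed.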
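The overlap and the sizing rule can be seen in a small sketch. `union Word`, `union U` and `float_bits` are illustrative names; the example assumes `float` is a 32-bit IEEE 754 type, which holds on mainstream platforms but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* All three members start at the same address. */
union Word {
    uint32_t      as_u32;
    float         as_float;
    unsigned char bytes[4];
};

/* Largest member is 10 bytes, but the size is rounded up so that the
   double stays aligned in arrays (typically 16 on 64-bit ABIs). */
union U {
    int    a;
    double b;
    char   c[10];
};

/* Portable bit inspection: copy the bytes instead of reading another
   union member. Assumes float and uint32_t have the same size. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    _Static_assert(sizeof bits == sizeof f, "float must be 32 bits");
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

With IEEE 754 floats, `float_bits(1.0f)` returns `0x3F800000`. The `memcpy` form compiles as both C and C++, which is why it is often preferred over reading `as_u32` after writing `as_float`.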
Combining Structs and Unions — Building Real Data Structures
In production C code, structs and unions almost always appear together. A pure union with no tag is hard to use safely. A struct with no unions is sometimes wasteful. Combine them and you get expressive, memory-efficient data models.
A common real-world pattern is a variant record — a struct that represents one of several possible entity types, where the correct interpretation depends on a discriminator field. This pattern powers everything from protocol buffer implementations to expression trees in compilers.
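A variant record along these lines might look like the following sketch. The `Shape` type, its `kind` discriminator and the `shape_area` function are hypothetical names chosen for illustration.

```c
enum ShapeKind { SHAPE_CIRCLE, SHAPE_RECT };

struct Shape {
    enum ShapeKind kind;                 /* says which union member is live */
    union {
        struct { double radius; }        circle;
        struct { double width, height; } rect;
    } as;
};

/* Read only the member that matches the tag. */
double shape_area(const struct Shape *s)
{
    switch (s->kind) {
    case SHAPE_CIRCLE:
        return 3.14159265358979323846 * s->as.circle.radius * s->as.circle.radius;
    case SHAPE_RECT:
        return s->as.rect.width * s->as.rect.height;
    }
    return 0.0; /* unknown tag: treat as an empty shape */
}
```

A rectangle can be created with designated initialisers, for example `struct Shape r = { .kind = SHAPE_RECT, .as.rect = { 3.0, 4.0 } };`, and `shape_area(&r)` then returns 12.0. Keeping the switch in one function means a new shape kind only needs a new enum value, a new union member and a new case.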
Another key pattern is bit fields inside structs, which let you pack boolean flags and small integers into individual bits rather than full bytes. This is critical in embedded systems where a microcontroller might have only 2KB of RAM.
Bit Fields and Packed Structs: Fine-Grained Control of Memory Layout
Bit fields let you specify the exact number of bits each member occupies. They're invaluable for hardware register maps, protocol flags, and any scenario where every byte counts. The syntax unsigned int flag : 1; declares a 1-bit field. Multiple bit fields can be packed into the same underlying storage unit.
However, bit fields are highly implementation-defined. The compiler decides whether fields are allocated from left to right or right to left, whether they span storage unit boundaries, and whether int bit fields are signed or unsigned. This makes them non-portable across compilers and even across compiler versions.
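A short sketch of the syntax, using a hypothetical `StatusFlags` layout. The field widths are the only thing the declaration fixes; where each field lands inside the storage unit is up to the compiler, so this should not be assumed to match any particular hardware register.

```c
/* Hypothetical status word: four fields sharing one storage unit. */
struct StatusFlags {
    unsigned int ready    : 1;  /* 0 or 1 */
    unsigned int error    : 1;  /* 0 or 1 */
    unsigned int mode     : 3;  /* 0..7 */
    unsigned int reserved : 3;
};

/* Mask explicitly so an out-of-range value is reduced on purpose
   rather than silently truncated by the assignment. */
void status_set_mode(struct StatusFlags *f, unsigned int mode)
{
    f->mode = mode & 0x7u;
}
```

Declaring the fields as `unsigned int` sidesteps the question of whether a plain `int` bit field is signed. When the exact bit positions matter, for example in a wire format, explicit shifts and masks on a fixed-width integer are the more portable choice.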
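Packed structs (__attribute__((packed)) in GCC and Clang, or #pragma pack(1)) tell the compiler to omit padding. They give a byte-exact layout, which is useful for wire protocols and binary file formats, although byte order still has to be handled separately. The cost: any member that ends up misaligned needs an unaligned memory access. On modern x86 the penalty is usually small; on ARM cores before ARMv6 and on other strict-alignment architectures an unaligned word access can fault or silently load the wrong bytes. Taking a pointer to a packed member and dereferencing it as an ordinary pointer is risky on any target. Always benchmark before deploying packed structs in hot paths.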
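The size difference is easy to check at compile time. This sketch assumes GCC or Clang, since `__attribute__((packed))` is a compiler extension; the two header types and `header_length` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler pads so 'length' is 4-byte aligned. */
struct WireHeaderNatural {
    uint8_t  type;
    uint32_t length;
    uint16_t checksum;
};

/* Packed layout (GCC/Clang extension): no padding at all. */
struct __attribute__((packed)) WireHeaderPacked {
    uint8_t  type;
    uint32_t length;    /* sits at offset 1, so it is misaligned */
    uint16_t checksum;
};

/* Fail the build if the packed layout is not the 7 bytes we expect. */
_Static_assert(sizeof(struct WireHeaderPacked) == 7, "unexpected packed size");

/* Access the member through the struct so the compiler knows it may be
   misaligned; avoid forming a plain uint32_t* to it. */
uint32_t header_length(const struct WireHeaderPacked *h)
{
    return h->length;
}
```

The natural version is typically 12 bytes against 7 for the packed one. Packing fixes the offsets only; a multi-byte field such as `length` still needs an agreed byte order before it goes on the wire.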
| Feature / Aspect | struct | union |
|---|---|---|
| Memory allocation | Each member gets its own dedicated memory slot | All members share a single memory block |
| Total size | Sum of all member sizes + padding bytes | Size of the largest member, rounded up to the strictest member alignment |
| Simultaneous members | All members are valid and accessible at all times | Only the last-written member is valid |
| Primary use case | Grouping related data that all needs to coexist | Type-punning, variant types, memory-mapped registers |
| Safety | Inherently safe — no conflicts between members | Unsafe unless paired with a type tag (discriminator) |
| Padding behaviour | Padding inserted between members and at the end for alignment | Trailing padding only, rounding the size up to the strictest member alignment |
| Array of elements | Common and straightforward; each element is independent | Possible but unusual; every element is sized for the largest member |
| Nested usage | Can contain unions as members (tagged union pattern) | Can contain structs as members (anonymous struct inside union) |
| Typical domains | Application data models, protocol headers, game entities | Embedded systems, compilers, network protocol parsers |
Key Takeaways
- A struct allocates independent memory for every member — all fields coexist. A union allocates memory for only its largest member — all fields overlap. This single difference defines every use case for each.
- The compiler inserts silent padding bytes between struct members for CPU alignment. Reordering fields largest-to-smallest typically reduces or eliminates padding, which matters at scale and in embedded systems.
- A bare union is almost always a bug waiting to happen. Always pair a union with an enum tag inside a struct — this creates a tagged union (discriminated union) that's the only safe pattern for using unions in application code.
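- Never memcpy or memcmp raw structs across a network boundary or to a binary file: padding bytes hold unspecified values, and layout and byte order differ between platforms. Serialise field-by-field. If raw bytes must be compared or hashed, memset the whole struct to zero before populating it, and keep in mind that the standard does not promise padding stays zero after later stores.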
- Packed structs and bit fields give you byte-exact control but at the cost of portability and speed. Use them only when the wire format or hardware forces it; otherwise, optimize alignment naturally.
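Field-by-field serialisation, as recommended in the takeaways above, can be sketched as follows. The `Message` layout, the 7-byte wire size and the big-endian byte order are assumptions made for the example, not part of any real protocol.

```c
#include <stddef.h>
#include <stdint.h>

struct Message {
    uint8_t  type;
    uint32_t length;
    uint16_t checksum;
};

enum { MESSAGE_WIRE_SIZE = 7 }; /* 1 + 4 + 2 bytes, no padding on the wire */

/* Write each field explicitly, most significant byte first. */
size_t message_encode(const struct Message *m,
                      unsigned char out[MESSAGE_WIRE_SIZE])
{
    out[0] = m->type;
    out[1] = (unsigned char)(m->length >> 24);
    out[2] = (unsigned char)(m->length >> 16);
    out[3] = (unsigned char)(m->length >> 8);
    out[4] = (unsigned char)(m->length);
    out[5] = (unsigned char)(m->checksum >> 8);
    out[6] = (unsigned char)(m->checksum);
    return MESSAGE_WIRE_SIZE;
}

/* Rebuild the struct from the wire bytes, independent of host layout. */
void message_decode(const unsigned char in[MESSAGE_WIRE_SIZE],
                    struct Message *m)
{
    m->type     = in[0];
    m->length   = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16)
                | ((uint32_t)in[3] << 8)  |  (uint32_t)in[4];
    m->checksum = (uint16_t)(((unsigned)in[5] << 8) | in[6]);
}
```

Because the code never copies the struct's own bytes, the padding and endianness of the host do not affect what is sent, and the in-memory struct can keep its natural, aligned layout.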
Common Mistakes to Avoid
- Reading a union member that wasn't the last written
Symptom: You write to union.float_value and read union.int_value expecting an implicit conversion. No conversion happens: you get the float's bit pattern reinterpreted as an integer (and undefined behavior if the same code is compiled as C++).
Fix: Always track the active union member with an enum tag. Only read the member that matches the current tag. For deliberate type-punning, memcpy the bytes into an object of the target type or into an unsigned char buffer instead.
- Using memcmp or memcpy on padded structs for equality or serialization
Symptom: Two structs with identical field values may fail memcmp because their padding bytes differ. Sending a raw struct over the network transmits those bytes too and bakes one platform's layout into the protocol.
Fix: Write field-by-field comparison and serialization functions that ignore padding. If raw bytes must be compared, memset the struct to zero before filling it in, keeping in mind that the standard does not promise padding stays zero after later stores.
- Assuming a pointer cast between struct types with the same first field is safe
Symptom: Casting between unrelated struct pointer types and reading through the wrong type violates the aliasing rules and leads to undefined behavior, even if they share a common first field.
Fix: Use a proper tagged union or a void* with an explicit type enum instead of relying on undefined pointer casting.
- Applying __attribute__((packed)) to every struct thinking it saves memory everywhere
Symptom: Unaligned accesses fault on strict-alignment cores (some ARM configurations, for example) or are trapped and emulated at a large cost. The struct shrinks but the code runs slower.
Fix: Only pack structs that need exact layout (network/disk protocols). For internal data, optimize by reordering fields largest-to-smallest instead. Profile before and after packing.
Interview Questions on This Topic
- Explain memory alignment and padding. Why might a struct containing a char and a double occupy 16 bytes instead of 9? (Mid-level)
- Implement a 'Tagged Union' to represent a generic Shape that can be either a Circle (radius) or a Rectangle (width, height). Write an area() function for it. (Senior)
- How do you minimize memory usage in a struct without using bit-fields or compiler-specific pragmas? (Mid-level)
- What is the difference between a 'packed' struct and a standard struct, and what are the performance trade-offs of using 'packed'? (Senior)
- What is the output of sizeof(U) if union U { int a; double b; char c[10]; }? Explain the logic involving alignment requirements. (Junior)
- When would you use a union instead of a struct, and what safety measures would you put in place? (Mid-level)
Frequently Asked Questions
What is the difference between a struct and a union in C?
A struct allocates separate memory for each member, so all fields exist simultaneously and can be read or written independently. A union allocates one shared block of memory sized for its largest member, meaning only one member holds a valid value at any given time. Structs model entities with multiple concurrent properties; unions model a single value that can be interpreted as different types.
Why is sizeof(struct) larger than the sum of its members?
The compiler inserts padding bytes between struct members to satisfy CPU alignment requirements — for example, a 4-byte int must start at an address divisible by 4. There may also be trailing padding at the end so that arrays of the struct keep each element correctly aligned. You can see exact offsets using the offsetof macro from stddef.h.
Can I use a union to convert between types, like writing a float and reading an int?
This is called type-punning and the rules are nuanced. In C (C99 and later), reading a union member other than the one last written reinterprets the stored bytes as the new type: you get the bit pattern, not a numeric conversion, and the result depends on the platform's representation and may be a trap representation for some types. In C++ the same read is undefined behaviour. Reading through an unsigned char array always gives you the raw bytes. For deliberate type-punning that is valid in both languages (like inspecting the bit pattern of a float), use memcpy into a same-sized integer or an unsigned char buffer; modern compilers typically optimise the copy away.
How do bit-fields work within a C struct?
Bit-fields allow you to specify the exact number of bits each member should occupy. For example, 'unsigned int flag : 1;' declares a 1-bit field that holds 0 or 1. A plain 'int flag : 1;' may be signed, in which case it holds 0 and -1, so prefer an explicitly unsigned type for flags. This is highly useful for mapping hardware registers or saving memory on boolean flags, though it can impact access speed due to the extra CPU instructions required to mask and shift bits.
What are anonymous structs and unions, and when would you use them?
C11 introduced anonymous struct and union members. They allow nested members to be accessed directly without a name. For example, if you have a struct containing an anonymous union, you can write data.i instead of data.u.i. This is useful for flattening a tagged union where the tag and union are at the same level, reducing verbosity. Use sparingly — it can make the layout less obvious.
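A minimal sketch of the flattened form, assuming a C11 compiler. `Value`, `ValueKind` and `value_from_int` are hypothetical names used only for this example.

```c
enum ValueKind { VALUE_INT, VALUE_FLOAT };

struct Value {
    enum ValueKind kind;
    union {          /* anonymous union: no member name */
        int   i;
        float f;
    };               /* members are reached as v.i and v.f */
};

/* Set the tag and the matching member together. */
struct Value value_from_int(int n)
{
    struct Value v;
    v.kind = VALUE_INT;
    v.i = n;         /* not v.u.i: the union has no name to go through */
    return v;
}
```

The memory layout is the same as with a named union; only the access path is shorter. The tag still has to be checked before reading `i` or `f`.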