C Strings — Null Terminator Forgotten: The 3 AM Pager
Forgotten null terminator after strncpy? Production crashes with longer names.
20+ years shipping performance-critical C and C++ systems. Everything here is grounded in real deployments.
- C strings are char arrays terminated by a null byte (\0).
- No built-in length field — strlen() walks the array O(n) each time.
- Buffer overflows happen when copying to an undersized destination.
- Use fgets() instead of gets() or scanf("%s") for safe input.
- sizeof gives total array bytes; strlen gives character count. For an array sized exactly to its string, they differ by 1.
Imagine you're writing letters on a long strip of paper, one letter per box, and at the very end you draw a big red STOP sign so whoever's reading knows the message is finished. That's exactly how C stores text — one character per memory slot, with a special invisible 'stop' character at the end. Without that stop sign, your program wouldn't know where your message ends and would keep reading random garbage off the paper.
Every program that talks to a human needs text. Whether it's a login prompt, an error message, a username, or a file path — text is everywhere. In languages like Python or JavaScript, strings are cosy, fully managed objects that do a lot of heavy lifting for you. C, on the other hand, hands you the raw tools and trusts you to build the house yourself. That might sound scary, but understanding how C handles text under the hood makes you a dramatically better programmer in any language.
The core problem C strings solve is deceptively simple: how do you store a sequence of characters in memory and then find where that sequence ends? Memory is just a giant numbered grid of bytes. There's no built-in concept of 'a word' or 'a sentence'. C's answer is a convention called the null-terminated string — store your characters in consecutive memory slots and place a special zero-value byte at the end as a sentinel. Every standard library function that works with strings relies on this single rule.
By the end of this article you'll know exactly how C strings are stored in memory, how to declare and initialise them correctly, how to manipulate them using the standard library, and — most importantly — how to avoid the buffer overflows and undefined behaviour that trip up even experienced developers. You'll be reading real code, seeing real output, and walking away with a mental model that actually sticks.
What a C String Actually Is in Memory
A C string is not a special type — it's just a pointer to a sequence of 'char' values stored in contiguous memory, where the last character is always '\0' (the null terminator, ASCII value 0). That's it. There's no hidden length field, no magic object — just raw bytes in a row.
Think of RAM as a long street of numbered houses. Each house holds one character. When C stores the word 'Hello', it rents five houses in a row — one for 'H', one for 'e', one for 'l', one for 'l', one for 'o' — and then immediately rents one more house where it places a STOP sign (the '\0'). So 'Hello' actually occupies 6 bytes, not 5.
This is why the length of a string and the memory it needs are different numbers. strlen() counts the characters before the stop sign. sizeof() tells you the total space including the stop sign. Confusing these two is one of the most common beginner mistakes, so burn that distinction into your memory right now.
Whenever a standard library function like printf or strcpy reads a C string, it starts at the first character and keeps going until it hits that '\0'. That's the contract every piece of C string code relies on. Break that contract — forget the null terminator — and your program wanders into memory it doesn't own.
Don't use sizeof() to get the number of characters in a string; use strlen(). sizeof gives you the byte size of the array variable (or of the pointer, if that is what you hand it), not the logical length. For an array sized by its initialiser, sizeof is always one more than strlen because of the terminator. This mix-up causes off-by-one bugs that are incredibly hard to track down.
Three Ways to Declare a String, and Which One to Use When
C gives you three different ways to create a string, and each one behaves differently in memory. Picking the wrong one at the wrong time is a classic source of bugs.
The first way is a character array initialised with a string literal: 'char name[] = "Alice";'. The compiler figures out the right size, copies the characters including the null terminator into stack memory, and gives you a mutable buffer you can change. This is the go-to choice when you need to modify the string later.
The second way is to give the array an explicit size: 'char name[50] = "Alice";'. Now you've got 50 bytes reserved, with 'Alice\0' at the start and the rest zeroed out. This is what you want when you're planning to read user input into the buffer — you're pre-allocating the space.
The third way is a pointer to a string literal: 'const char *message = "Hello";'. This does NOT copy the string into a regular variable. Instead, the string 'Hello\0' lives in a read-only section of your program's memory, and 'message' is just a pointer to it. Trying to modify this string causes undefined behaviour — the program might crash, might silently corrupt data, or might appear to work fine on your machine and explode on someone else's. Always mark these 'const'.
The Essential String Functions You'll Use Every Day
C's standard library ships with a set of string functions in <string.h> that cover the operations you'll need constantly — measuring length, copying, joining, comparing, and searching. They're thin, fast, and they all depend on that null terminator contract we talked about.
strlen(s) walks the string from the start until it hits '\0' and returns how many steps it took. O(n) — it actually loops through every character each time you call it, so don't call it inside a loop's condition if you can avoid it.
strcpy(destination, source) copies every character from source into destination, including the final '\0'. The danger: it blindly trusts that destination is big enough. If it isn't, you've just written past the end of your buffer, a classic buffer overflow. Prefer snprintf, or strncpy followed by an explicit null terminator, for safer copying.
strcmp(a, b) returns 0 if the strings are identical, a negative number if a comes before b alphabetically, and a positive number if a comes after b. Do NOT use == to compare strings in C — it compares pointer addresses, not content.
strstr(haystack, needle) finds the first occurrence of 'needle' in 'haystack' and returns a pointer to it, or NULL if not found. It's O(n*m) in the worst case, but fine for short strings.
Complete C String Functions Reference Table
Below is a comprehensive reference of the most commonly used functions from <string.h>. Each function operates on null-terminated strings unless noted. Remember: buffer sizes must include space for the terminating null byte.
| Function | Signature | Purpose |
|---|---|---|
| strlen | size_t strlen(const char *s) | Returns number of characters before '\0'. O(n). |
| strcpy | char *strcpy(char *dest, const char *src) | Copies src to dest including '\0'. Unsafe if dest smaller than src. |
| strncpy | char *strncpy(char *dest, const char *src, size_t n) | Copies at most n chars. Does NOT null-terminate if src length >= n. Use with manual termination. |
| strcat | char *strcat(char *dest, const char *src) | Appends src to end of dest. Unsafe if combined length exceeds buffer. |
| strncat | char *strncat(char *dest, const char *src, size_t n) | Appends at most n chars from src. Always null-terminates. |
| strcmp | int strcmp(const char *s1, const char *s2) | Lexicographic comparison. Returns 0 if equal, <0 if s1 < s2, >0 if s1 > s2. |
| strncmp | int strncmp(const char *s1, const char *s2, size_t n) | Compares at most n characters. |
| strchr | char *strchr(const char *s, int c) | Finds first occurrence of char c in s. Returns pointer or NULL. |
| strrchr | char *strrchr(const char *s, int c) | Finds last occurrence of char c. Use for path separators, file extensions. |
| strstr | char *strstr(const char *haystack, const char *needle) | Finds first occurrence of substring needle. |
| strspn | size_t strspn(const char *s, const char *accept) | Returns length of initial segment consisting only of chars in accept. |
| strcspn | size_t strcspn(const char *s, const char *reject) | Returns length of initial segment with no chars from reject. Use to strip trailing newline. |
| strtok | char *strtok(char *str, const char *delim) | Tokenizes string. Modifies original string. Not thread-safe; use strtok_r instead. |
| memset | void *memset(void *s, int c, size_t n) | Fills first n bytes of s with byte c. Use to reset buffers. |
| memcpy | void *memcpy(void *dest, const void *src, size_t n) | Copies n bytes regardless of null bytes. Faster than strcpy for binary data. |
| memmove | void *memmove(void *dest, const void *src, size_t n) | Like memcpy but handles overlapping regions safely. |
For formatted string operations, use sprintf, snprintf, sscanf (covered later in this article). For thread safety, prefer the _r variants of strtok and strerror.
String Tokenization with strtok()
Tokenization is the process of splitting a string into smaller pieces called tokens, based on a set of delimiter characters. In C, strtok() does this in a stateful, destructive way. It's part of <string.h> and is widely used for parsing CSV lines, command arguments, or any delimited data.
- First call: pass the string to be tokenized and a string of delimiters. strtok scans from the start, skipping leading delimiters, then returns a pointer to the first token. It replaces the first delimiter after the token with '\0', modifying the original string.
- Subsequent calls: pass NULL as the first argument. strtok continues from the saved position inside the library (static variable) and returns the next token.
- Returns NULL when no more tokens are found.
Because strtok uses internal static state, it's not thread-safe. In multi-threaded code, use strtok_r (POSIX) which takes an explicit save pointer. Also, because it modifies the input string, you must work on a mutable copy — never pass a string literal.
Common delimiters for CSV are "," but you can pass multiple: strtok(str, ", \t") treats comma, space, and tab as delimiters.
Passing a string literal, as in strtok("a,b,c", ","), causes undefined behaviour (typically a segfault). Always work on a mutable char array or a malloc'd copy. Also, strtok skips consecutive delimiters, so use strsep if you need empty token detection.
For a reentrant version, use strtok_r: char *saveptr; token = strtok_r(str, delim, &saveptr); This is safe in multithreaded contexts.
Formatted Strings with sprintf() and sscanf()
The printf/scanf family of functions aren't just for console I/O. sprintf() writes formatted output to a string buffer, and sscanf() reads formatted input from a string. They give you the power of printf-style formatting without touching stdout/stdin — perfect for building log messages, parsing configuration strings, or converting data between representations.
sprintf(dest, format, ...) works exactly like printf but writes into dest instead of stdout. It null-terminates the result automatically. The danger: if the formatted result exceeds the buffer size, you get a buffer overflow. Always use snprintf(dest, size, format, ...) which writes at most size-1 characters plus the null terminator, and returns the number of characters that would have been written if the buffer were large enough. Check that return value to detect truncation.
sscanf(src, format, ...) reads from the string src and parses values according to the format string. It returns the number of items successfully assigned. Use it to parse structured input like "id=42, name=alice" — but beware of format string mismatches causing parsing failures.
Both functions support all the format specifiers: %d, %f, %s, %c, %x, etc. For sscanf, width specifiers are critical: "%19s" reads at most 19 chars into a char[20] buffer. Always use width specifiers to prevent buffer overflows.
Reading Strings from the User Safely with fgets
This is where beginners cause the most damage. The classic first instinct is to use scanf("%s", buffer) to read a string from the keyboard. It works — until your user types more characters than your buffer holds, and now you've written past the end of your array into memory you don't own. That's a buffer overflow, and it's one of the most exploited classes of security vulnerabilities in the history of software.
fgets is the safe alternative. It takes three arguments: the buffer to write into, the maximum number of bytes to read (including the null terminator), and the stream to read from (stdin for keyboard input). It will never write more than that maximum, so your buffer stays intact.
One quirk: fgets keeps the newline character ('\n') if there is room for it. So if the user types "hello" and presses Enter, the buffer will contain "hello\n\0". You almost always want to strip that newline before processing. The idiomatic way is buffer[strcspn(buffer, "\n")] = 0; which replaces the first newline with a null terminator.

```c
/* safe_string_input.c */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char input_buffer[32];

    printf("Enter code tag: ");

    /* fgets never stores more than sizeof(input_buffer) bytes, terminator included */
    if (fgets(input_buffer, sizeof(input_buffer), stdin)) {
        /* Strip the trailing newline left by the Enter key */
        input_buffer[strcspn(input_buffer, "\n")] = 0;
        printf("Processing: [%s]\n", input_buffer);
    }

    return 0;
}
```

Output:

```
Enter code tag: feature-request
Processing: [feature-request]
```

Watch Out: Never Use gets()
gets() was removed from the C11 standard because it cannot be used safely. There is no way to tell it your buffer size, so any input longer than the buffer causes undefined behaviour. Every major OS lists gets-based code as a security vulnerability. Use fgets(buffer, sizeof(buffer), stdin) every single time.
Production insight: scanf("%s") is as dangerous as gets() if not constrained. It writes past the buffer without limit. Prefer fgets() for reading text. Rule: if you see an unbounded scanf("%s") in a code review, flag it immediately.
Input Reading Decision
- Reading a line of text from stdin: use fgets(buf, sizeof(buf), stdin) and strip the newline.
- Reading formatted values (ints, floats): use scanf() but with width specifiers on every string conversion, e.g., scanf("%31s", buf) for a char[32] buffer.
- Reading from a file: use fgets() for lines; fread() for raw data.
Key takeaway: fgets() is the safe default for text input. Always strip the newline after fgets. Never, ever use gets(). It exposes your users to exploits.
String Input Functions: scanf vs fgets vs gets, a Safety Comparison
Choosing the wrong input function can introduce a buffer overflow vulnerability. Here's a head-to-head comparison of the three common functions used to read C strings from stdin.
| Aspect | scanf("%s", buf) | fgets(buf, n, stdin) | gets(buf) (removed) |
|---|---|---|---|
| Buffer overflow protection | None: no size limit | Yes: reads at most n-1 chars | None: no size parameter |
| Handles spaces in input | No: stops at whitespace | Yes: reads until newline or EOF | Yes: reads until newline |
| Includes trailing newline | No | Yes (if space) | No (newline is discarded) |
| Null-terminates the result | Yes (adds \0) | Yes (adds \0) | Yes (adds \0) |
| Return value | Number of items assigned | Pointer to buffer or NULL | Pointer to buffer or NULL |
| Error handling on EOF | Returns EOF | Returns NULL | Returns NULL |
| ISO Standard compliance | Yes | Yes | Removed in C11 |
| Security for production | Avoid unless width specifier used (e.g., "%31s") | Recommended for lines | Never use |
| Thread safety | Yes | Yes | Yes (but still unsafe) |
| Typical use case | Simple single-word tokens | Full lines, config strings | Legacy code only (migrate) |
- gets() is banned. Never write it.
- scanf("%s") is equally dangerous without a width specifier. If you must use it, write scanf("%255s", buf) for a char[256] buffer.
- fgets is the only safe general-purpose line reader. Remember to strip the newline.
- For production, use fgets and then sscanf to parse out individual fields. That combination is safe and flexible.
Production insight: Code reviews should flag any use of gets() immediately (block the merge). scanf("%s") without a width should be a warning. Enforce clang-tidy checks or a custom regex in CI. Unbounded reads are a recurring source of CVEs in embedded systems, and fgets is your first layer of defense.
Key takeaway: fgets is the only safe function for reading lines. Use scanf with width specifiers only for single tokens. Never use gets(). Always validate input length before processing.
Common Pitfalls and Debugging Strategies for C Strings
Even experienced C developers hit string bugs. The most insidious ones involve off-by-one errors, improperly terminated buffers, and mixing array sizes with pointer sizes. Here's a breakdown of the patterns that cause production outages.
Off-by-one: You allocate char buf[10] for a 10-character string, but you need 11 (10 chars + null). This is the classic missing +1 mistake. Always allocate expected_length + 1.
Pointer decay: When you pass an array to a function, sizeof(arr) inside the function gives you the pointer size, not the array size. This breaks any code that uses sizeof to bound a copy. Solution: pass the array size as a separate parameter.
Uninitialized buffers: A local char buf[100]; contains garbage. If you don't null-terminate before using it with string functions, they'll read past the intended data. Always initialize with = {0} or buf[0] = '\0'.
Strcat without checking space: strcat appends to the destination. If the destination already contains data, the total must fit. Use strncat(dest, src, sizeof(dest) - strlen(dest) - 1) or better, snprintf. Note: strncat takes the number of characters to append, not the total buffer size, which is different from strncpy!

```c
/* debugging_patterns.c */
#include <stdio.h>
#include <string.h>

void io_thecodeforge_safe_concat(char *dest, size_t dest_size, const char *src)
{
    size_t dest_len = strlen(dest);
    if (dest_len + 1 >= dest_size) {
        return;  /* buffer already full: nothing can be appended */
    }
    /* room left for new characters, not counting the terminator */
    size_t available = dest_size - dest_len - 1;
    strncat(dest, src, available);
    dest[dest_size - 1] = '\0';  /* safety */
}

int main(void)
{
    char buf[64] = "Hello ";
    io_thecodeforge_safe_concat(buf, sizeof(buf), "World!");
    printf("%s\n", buf);
    return 0;
}
```

Output:

```
Hello World!
```

The +1 Rule
Every string buffer needs space for the null terminator, so always allocate one extra byte.
- If you need to store N characters, allocate N+1 bytes.
- strlen returns N; sizeof gives N+1 only for an array sized exactly to its string.
- fgets(buf, size, stream) reads at most size-1 characters, then adds \0.
- snprintf returns the number of bytes that would be written (excluding \0). If that value is >= the buffer size, the output was truncated.
Production insight: The most subtle string bug is using sizeof on a pointer passed to a function. Inside the function, sizeof(ptr) yields the pointer size (typically 8 bytes on 64-bit platforms), not the array size. Always pass the buffer size explicitly as a parameter. Rule: char *str, size_t str_size should be your default parameter pattern.
Key takeaway: Off-by-one, pointer decay, and uninitialised buffers are the top three killers. Always allocate +1 for null. Pass sizes explicitly to functions. Initialize all buffers to zero.
Practice Problems: Sharpen Your C String Skills
The best way to internalise null-terminated string semantics is through hands-on coding. Try these problems, which simulate real production scenarios and interview questions.
1. Safe String Reversal (In-Place). Write a function void reverse_str(char *s) that reverses a null-terminated string in place. Do not allocate additional buffers. Handle empty strings. Use only pointer arithmetic, no array indexing. Test with "hello" → "olleh". Hint: Find the end using strlen, then swap from both ends.
2. CSV Field Extractor. Write a function int get_field(const char *csv, int field_index, char *out, size_t out_size) that extracts the nth comma-separated field (counting from 0) from a line and copies it into out. Return 0 on success, -1 if the field index is out of range or truncation occurs. Use sscanf or manual parsing. Ensure null termination. Test with "name,age,city", 2, out → "city". Hint: Use strchr in a loop to skip fields.
3. Remove All Occurrences of a Character. Write void remove_char(char *str, char ch) that removes every occurrence of a given character from the string. Modify the string in place, with no extra buffer. Example: with char word[] = "banana", remove_char(word, 'a') leaves "bnn". Pass a mutable array, never a string literal. Hint: Use a read pointer and a write pointer.
4. Parse HTTP Header Line. Given a string like "Content-Length: 4096", extract the numeric value and return it as an int. Use sscanf with careful validation. Return -1 if the format is invalid. Test: parse_content_length("Content-Length: 1024\r\n") → 1024. Hint: Use sscanf with "%*s %d" or better, skip whitespace manually.
5. Custom strncpy with Guaranteed Null Termination. Implement a function char *safe_strncpy(char *dest, const char *src, size_t n) that copies at most n-1 characters and always null-terminates. It should behave like strncpy but guarantee termination. Return dest. Test with src longer than dest.
For each problem, write a main() that calls your function and prints results. Run under valgrind or with AddressSanitizer to catch memory errors.
Production insight: These problems model real-world tasks: string transformation, parsing, and safe copying. In production, you'll encounter CSV parsing, URL decoding, and configuration parsing daily. Practice these until they become second nature. They are the bread and butter of systems programming.
Key takeaway: Hands-on practice is the only way to master C strings. Focus on in-place modification, safe copying, and parsing with bounded buffers. Write tests that include edge cases like empty strings, long strings, and null inputs.
The Buffer Overflow You Just Wrote — And Why strncat Won't Save You
You already know strcat is dangerous. It doesn't check destination capacity. One wrong estimate and you're writing past allocated memory, corrupting adjacent variables or worse — rewriting the return address on the stack. That's not theory. That's a root-shell exploit waiting to happen.
So you switch to strncat. Problem solved? Wrong. strncat is subtle in the worst way: it appends up to n characters and then always writes a null terminator on top of that. If your destination buffer is 16 bytes and already holds a 12-character string, strncat(dest, src, 4) can write 5 bytes (4 chars + null) starting at index 12. The terminator lands at index 16, which is the 17th byte. Buffer overflow.
The fix is not more strn*. It's bounded-length copies with explicit size tracking. Use memmove or memcpy plus manual null termination. Maintain a running offset. Check it against the buffer size before every write. One function call. One check. No surprises.
Production trap: strncat's third argument is the maximum number of characters to append, not the total buffer size. Every junior gets this wrong at least once.
Why Your strcmp Broke in Production — Encoding and Locale
You wrote a perfectly ordinary login check: if(strcmp(input_password, stored_hash) == 0). Worked fine on your machine. Then the user in Munich typed 'ß' in their password, and the comparison silently failed. strcmp compares bytes, not characters. In UTF-8, 'ß' is two bytes (0xC3 0x9F). strcmp will treat it as two separate bytes. If your stored version was written by a function that normalizes differently, you get a mismatch.
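Worse: suppose someone reaches for strcoll instead of strcmp. The code deploys to a server with a French locale, and suddenly comparisons involving strings like 'côte' and 'cote' follow that locale's collation rules rather than raw byte order. That might be correct for sorting, but if you're checking passwords, auth tokens, or session IDs, a locale-dependent comparison is a liability. Different locales mean different comparison rules.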
For security-sensitive comparisons, use strcmp with fixed-byte encoding (e.g., hex or base64 strings). Or use memcmp for fixed-length buffers. And if you're hashing passwords — which you should be — you don't need locale-aware comparison. You're comparing hex digests. Pure bytes. No surprises.
If you must compare user-facing text with locale in play, document which locale and use strcoll explicitly. Only then. Never rely on the default locale changing silently between builds or deployments.
Looping Over C Strings: The for() and while() Patterns That Actually Matter
C strings are null-terminated arrays. Looping without iterators or abstractions is the only way to parse, transform, or validate them efficiently. The for loop with index is for when you need position-dependent logic, like reversing or rewriting in place. The while loop with pointer arithmetic is for scanning until null — used in production parsers, tokenizers, and custom strcpy implementations. Both patterns rely on the null terminator, not a separate length counter. The trap: forgetting to allocate space for the null terminator when building strings via loops, or accidentally running past it when the input is malformed. Always check for null before dereferencing. Loops over C strings are also the fastest path for operations like counting vowels, stripping whitespace, or implementing strstr manually when the standard library isn't an option. They're bare metal, explicit, and the foundation of every embedded or systems-level C program.
Parsing Strings with stringstream: From Delimiters to Type Conversion in C++
stringstream from <sstream> is C++'s answer to C's sscanf and strtok, but cleaner and safer. It wraps a string in a stream interface so you can extract formatted data, split on whitespace, and convert between types without manual pointer juggling. Use it for parsing CSV rows, reading config files line by line, or deserializing numeric fields. The getline(ss, token, delimiter) overload splits on any single character — perfect for comma-separated or tab-separated values. Unlike strtok, it preserves the original string and is reentrant by default. The cost is heap allocations for each extraction, so for hot loops you'd stick with C-string manual parsing, but for 99% of application-level work, stringstream is the safer, more readable choice. It also supports std::hex and std::boolalpha for non-decimal or boolean parsing without extra code.
Null Terminator Forgotten: The 3 AM Pager
When the terminator is missing, printf() or strcat() on dest read past the buffer into adjacent memory, corrupting stack frames.
The fix: set dest[sizeof(dest) - 1] = '\0'; after any strncpy or strlcpy call. Better yet, use snprintf(), which guarantees null termination as long as the buffer size is correct.
- strncpy does NOT null-terminate if the source fills the destination.
- After every bounded string copy, manually ensure the last byte is 0.
- Treat every string buffer as potentially not null-terminated until you prove otherwise.
The newline read by fgets() is included in the buffer. Strip it with buffer[strcspn(buffer, "\n")] = 0;
Don't call malloc(strlen(x)) without the +1 for the null terminator.
To catch these at run time, build and run with AddressSanitizer: gcc -fsanitize=address -g -o myprog myprog.c && ./myprog
To look for crashes after the fact: tail -100 /var/log/syslog | grep segfault