std::vector Reallocation — push_back Dangling References
Incorrect order matching and segfaults from dangling std::vector references after push_back.
- std::vector manages a contiguous dynamic array on the heap
- Capacity grows exponentially (typically 1.5x or 2x) when full
- Reallocation invalidates all iterators, pointers, and references
- Amortized O(1) push_back but O(n) worst-case per reallocation
- Use reserve() to pre-allocate and eliminate repeated reallocations
- Production pitfall: holding iterators across push_back leads to undefined behavior
Imagine you're packing books into a bookshelf. A regular array is like buying a fixed-size shelf — you decide upfront how many books fit and you're stuck with that. A vector is like a magical expanding shelf: it starts small, and whenever it runs out of room it quietly moves everything to a bigger shelf behind the scenes. You just keep adding books without ever worrying about the shelf size yourself.
Every non-trivial C++ program needs to store and manipulate collections of data. Whether you're building a game engine tracking active enemies, a web server queuing incoming requests, or a trading system buffering price ticks, you need a container that's fast, flexible, and predictable. std::vector is the default answer to that need — and for good reason. It's the most widely used container in the entire C++ Standard Library.
At its heart, the vector solves the 'fixed-size' problem of traditional arrays while maintaining their greatest strength: contiguous memory. In this guide, we'll peel back the abstraction to see how the buffer actually grows, why your iterators might suddenly 'die', and how to write high-performance code that the CPU's cache will love.
How Vectors Actually Work — The Dynamic Array Under the Hood
A std::vector is a wrapper around a dynamically allocated array. It tracks three things: a pointer to the heap-allocated buffer, the current number of elements (size), and the total allocated capacity. When you push_back() an element and size equals capacity, the vector allocates a brand-new buffer (typically 1.5x or 2x the old capacity depending on the implementation), copies or moves all existing elements into it, then destroys the old buffer. This is called a reallocation.
That reallocation is the most important thing to understand about vectors. It's O(n) work — every existing element must be moved. It's also the event that invalidates every iterator, pointer, and reference you held into the vector. This isn't a bug; it's the fundamental trade-off that lets vectors remain a contiguous block of memory, which is what makes them cache-friendly and blazing fast for iteration.
So the mental model is: a vector is a resizable contiguous array. 'Resizable' is the feature. 'Contiguous' is the performance guarantee. Both matter.
push_back() at O(1). If the vector grew by 1 slot each time, every insertion would be O(n) — a 10,000-element vector would do ~50 million copy operations total just to fill up.reserve() and emplace_back() — Writing Vector Code That Doesn't Waste Time
Now that you know reallocation is expensive, you can prevent it. If you know (or can estimate) how many elements you'll store, call reserve() before your loop. It pre-allocates the buffer without changing size, so no reallocations happen during insertion. This is not a micro-optimisation — on a vector of 1 million objects it's the difference between milliseconds and seconds.
The second upgrade is emplace_back() over push_back(). push_back() takes an already-constructed object and copies or moves it into the vector. emplace_back() takes constructor arguments and builds the object directly inside the vector's buffer — zero copies, zero temporaries. For types with expensive constructors (strings, custom objects), emplace_back() is the right default.
emplace_back() your muscle memory for vector insertion. The only time you'd prefer push_back() is when you already have a constructed object you want to move in — and even then, std::move() with push_back() is equivalent to emplace_back(). For new construction, emplace_back() is always the cleaner choice.reserve() is the top vector performance bug in production.Iterating, Modifying and Erasing — Patterns That Actually Appear in Production
Iterating a vector is straightforward, but modifying it while iterating is where most bugs are born. The erase() method removes an element by iterator and returns an iterator to the element that took its place. If you ignore that return value and keep using your old iterator, you're in undefined behaviour territory — the iterator is now invalid.
The erase-remove idiom is the idiomatic C++ pattern for removing elements that match a condition without hand-rolling an index-shuffling loop. std::remove() (from <algorithm>) shuffles all 'keep' elements to the front and returns an iterator pointing to the start of the 'trash' region. You then call erase() on that range. It's a single-pass O(n) operation and it's in every production codebase.
erase() inside a manual for-loop and increment your iterator unconditionally, you'll skip the element that slid into the erased position. Always reassign: it = vec.erase(it) and only increment it in the else branch. The erase-remove idiom avoids this trap entirely — prefer it whenever possible.erase(): it = vec.erase(it).Vectors of Objects vs Vectors of Pointers — Choosing the Right Layout
This is the decision that separates intermediate C++ developers from seniors. You have two options: store objects directly (std::vector<Player>) or store pointers to heap objects (std::vector<Player*> or std::vector<std::unique_ptr<Player>>). They have completely different performance profiles.
Storing objects directly keeps everything in a single contiguous block of memory. Iterating fires up the CPU's prefetcher and tears through the cache. Storing pointers enables polymorphism but results in 'pointer chasing' across the heap, which kills performance on large collections due to cache misses.
Advanced Vector Techniques: shrink_to_fit, data(), and Custom Allocators
Once you master the basics, you can fine-tune memory and performance. is a non-binding request to reduce capacity to fit size. It may or may not actually free memory — implementations differ, but it's essential after a large batch of removals if you need to return memory to the OS. shrink_to_fit() returns a raw pointer to the underlying buffer, enabling C-style API interop. Custom allocators let you control where the vector allocates memory (e.g., arena allocators for real-time systems).data()
Using `data()` is powerful but dangerous: as soon as the vector reallocates, that pointer is dangling. Reserve ahead or never grow after taking .data()
Iterator dangling after push_back: how a trading system corrupted its order book
reserve() to prevent any reallocation during normal operation.- Never hold iterators, pointers, or references into a std::vector across any operation that could grow the container.
- If you need stable references, consider std::deque or a node-based container like std::list.
- Always
reserve()upfront when the maximum size is known; this eliminates reallocation entirely.
reserve() too early or too late? Is the growth factor causing fragmentation? Monitor with valgrind or heaptrack.vec.data() to get raw pointer only if you guarantee no reallocation (e.g., after reserve()).Key takeaways
reserve() before filling a vector when you know the approximate sizeemplace_back() over push_back() for new element constructionpush_back() would otherwise create and move.erase()) for conditional deletionCommon mistakes to avoid
3 patternsStoring an iterator or pointer into a vector, then modifying the vector
push_back() or insert() that triggers reallocation, every iterator, pointer, and reference into that vector is dangling. Using them is undefined behaviour — your program may crash, silently corrupt data, or 'work' in debug and explode in release.reserve() upfront so no reallocation occurs during the insertions. Prefer integer indices to access elements.Using operator[] with an out-of-bounds index
at(), operator[] does zero bounds checking. Accessing scores[scores.size()] doesn't throw; it reads whatever memory happens to be there. This is a silent data corruption bug that's notoriously hard to track down.at() during development and testing. Switch to [] only in hot loops after you've verified correctness, or enable address sanitiser (compile with -fsanitize=address) to catch it automatically.Calling erase() in a range-for loop
Interview Questions on This Topic
Explain the 'Amortized O(1)' time complexity of push_back(). If reallocation is O(n), why is the overall operation considered constant time?
Frequently Asked Questions
That's STL. Mark it forged?
3 min read · try the examples if you haven't