
Custom Allocators in C++: Pool, Arena & PMR Allocators Explained

In Plain English 🔥
Imagine a restaurant kitchen. The default memory allocator is like ordering every ingredient individually from a warehouse across town — it works, but it's slow and the truck has to make a hundred trips. A custom allocator is like stocking a prep station right next to the chef with exactly the ingredients needed for tonight's menu — fewer trips, zero hunting, food arrives in seconds. Custom allocators let you take control of WHERE and HOW your program grabs memory, so you stop paying the general-purpose tax for workloads that don't need it.

Every C++ program lives and dies by memory. For most toy programs, new and delete are fine — they hand off to the OS, get some heap memory, and everyone goes home happy. But in game engines, high-frequency trading systems, real-time audio processors, and embedded firmware, the general-purpose allocator (malloc under the hood) is a liability: it locks mutexes, hunts through fragmented free-lists, and takes wildly non-deterministic time. At Jane Street, a single extra allocation on a hot path can cost an arbitrage opportunity. At a game studio, a mid-frame new can cause a hitch the player feels in their bones.

Custom allocators exist to let you trade generality for performance and predictability. Instead of asking 'give me some memory from wherever', you say 'give me memory from THIS pre-allocated slab, using THIS strategy, with THESE lifetime guarantees'. You collapse the allocation cost, eliminate fragmentation on hot paths, and in the best cases reduce a 200ns malloc call to a handful of pointer arithmetic instructions taking under 5ns.

By the end of this article you'll understand the C++ Allocator named requirement from the inside out, build a working pool allocator and an arena allocator from scratch, understand C++17's polymorphic memory resources (PMR), wire a custom allocator into standard containers like std::vector and std::list, and know exactly when to reach for each tool in production code.

The C++ Allocator Named Requirement — What the Standard Actually Demands

Every standard container is a template parameterised on an allocator type. std::vector<T> is really std::vector<T, std::allocator<T>>. The second parameter must satisfy the Allocator named requirement — a contract the standard defines in terms of valid expressions, not a formal C++ concept (even in C++20, allocator conformance remains a named requirement rather than a checkable concept).

The minimum interface your allocator must expose is allocate(n), which returns a pointer to storage for n objects of value_type, and deallocate(p, n), which releases it. That's the irreducible core. std::allocator_traits fills in sensible defaults for everything else — construct, destroy, max_size, select_on_container_copy_construction — so your custom allocator only needs to override what matters.

The critical subtlety that trips everyone up: two allocator instances must compare equal if and only if memory allocated by one can be deallocated by the other. This equality rule drives the entire rebind mechanism and container move semantics. Get it wrong and you'll see silent undefined behaviour when a container tries to free memory through the wrong allocator instance.

C++11 introduced the propagate_on_container_copy_assignment, propagate_on_container_move_assignment, and propagate_on_container_swap traits, queried through std::allocator_traits. These tell containers whether to carry the allocator along during those operations. For stateful allocators — ones that hold a pointer to a memory resource — you almost always want move propagation enabled.
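As a sketch, here is how a stateful allocator typically declares those traits explicitly. The `ByteArena` resource type and the `ArenaAllocator` name are illustrative, not from the listings below:

```cpp
#include <cstddef>
#include <memory>       // std::allocator_traits (containers query traits through it)
#include <type_traits>

struct ByteArena;  // hypothetical memory resource this allocator points at

template <typename T>
struct ArenaAllocator {
    using value_type = T;

    // Stateful: two instances are interchangeable only if they share an arena.
    using is_always_equal = std::false_type;

    // Carry the allocator along on move and swap so the receiving container
    // frees memory through the arena that actually allocated it.
    using propagate_on_container_move_assignment = std::true_type;
    using propagate_on_container_swap            = std::true_type;
    // On copy assignment, keep the destination's own allocator.
    using propagate_on_container_copy_assignment = std::false_type;

    ByteArena* arena_ = nullptr;

    // Declarations only — a real implementation bumps/frees inside the arena.
    T*   allocate(std::size_t n);
    void deallocate(T* p, std::size_t n) noexcept;

    bool operator==(const ArenaAllocator& o) const noexcept { return arena_ == o.arena_; }
    bool operator!=(const ArenaAllocator& o) const noexcept { return arena_ != o.arena_; }
};
```

Containers read these through std::allocator_traits<ArenaAllocator<T>>, so any trait you omit falls back to the standard default.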

allocator_skeleton.cpp · CPP
#include <memory>       // std::allocator_traits
#include <cstddef>      // std::size_t, std::ptrdiff_t
#include <new>          // ::operator new, ::operator delete
#include <iostream>
#include <vector>

// ---------------------------------------------------------------------------
// MinimalAllocator<T>
// The smallest possible custom allocator that satisfies the named requirement.
// It delegates to global operator new/delete — functionally identical to
// std::allocator, but useful as a skeleton to build upon.
// ---------------------------------------------------------------------------
template <typename T>
struct MinimalAllocator {
    // value_type is the ONLY mandatory typedef.
    using value_type = T;

    // Default constructor — must be constexpr-friendly in C++20.
    MinimalAllocator() noexcept = default;

    // Rebind copy constructor: lets the container create an
    // allocator for a *different* type (e.g. a node type internally).
    template <typename U>
    MinimalAllocator(const MinimalAllocator<U>&) noexcept {}

    // allocate: must return pointer to at least n * sizeof(T) bytes,
    // aligned for T. Throw std::bad_alloc on failure — never return null.
    T* allocate(std::size_t element_count) {
        std::cout << "[MinimalAllocator] allocating " << element_count
                  << " element(s) — " << element_count * sizeof(T)
                  << " bytes\n";
        // ::operator new(bytes, std::align_val_t) available in C++17
        // for over-aligned types; for basics, this suffices.
        return static_cast<T*>(::operator new(element_count * sizeof(T)));
    }

    // deallocate: n MUST match the count passed to allocate.
    // Passing a wrong n is undefined behaviour — no runtime check exists.
    void deallocate(T* raw_ptr, std::size_t element_count) noexcept {
        std::cout << "[MinimalAllocator] deallocating " << element_count
                  << " element(s)\n";
        ::operator delete(raw_ptr);
    }

    // Allocators compare equal if memory from one can be freed by the other.
    // Because MinimalAllocator has no state, all instances are equivalent.
    bool operator==(const MinimalAllocator&) const noexcept { return true; }
    bool operator!=(const MinimalAllocator&) const noexcept { return false; }
};

int main() {
    // std::vector uses the allocator for its internal buffer.
    // Push_back may trigger reallocation — watch the log.
    std::vector<int, MinimalAllocator<int>> numbers;
    numbers.reserve(4);   // one allocation for exactly 4 ints

    for (int i = 1; i <= 4; ++i) {
        numbers.push_back(i * 10);
    }

    std::cout << "Values:";
    for (int v : numbers) std::cout << ' ' << v;
    std::cout << '\n';
    // Vector destructor triggers deallocate automatically.
    return 0;
}
▶ Output
[MinimalAllocator] allocating 4 element(s) — 16 bytes
Values: 10 20 30 40
[MinimalAllocator] deallocating 4 element(s)

Building a Pool Allocator — Fixed-Size Blocks, Zero Fragmentation

A pool allocator pre-allocates a large chunk of memory and carves it into fixed-size blocks. Each free block holds a pointer to the next free block in its first bytes — a singly-linked free-list embedded directly in the unused memory. Allocation is a pointer pop; deallocation is a pointer push. Both are O(1) and branch-free on the hot path.

Pool allocators shine when you're creating and destroying many objects of the SAME type rapidly — think particle systems, event queues, network packet buffers, or node-based containers. Because every block is the same size, fragmentation is mathematically impossible within the pool. The only wasted memory is alignment padding and the pool chunk that's pre-reserved upfront.

The constraint is equally obvious: you cannot allocate variable-size objects from a fixed-block pool. Asking for a block larger than the pool's block size is a bug, not a feature. In production you guard this with a static_assert on object size at instantiation time.
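A minimal sketch of that compile-time guard, assuming a hypothetical `GuardedPool` that publishes its block size as a template parameter:

```cpp
#include <cstddef>

// Hypothetical fixed-block pool that rejects oversized types at compile time.
template <typename T, std::size_t BlockSize = 64>
struct GuardedPool {
    // Fires at instantiation — an oversized T never compiles, so the bug
    // cannot survive to runtime.
    static_assert(sizeof(T) <= BlockSize,
                  "GuardedPool: object larger than the pool's block size");
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "GuardedPool: over-aligned types need a dedicated pool");
    // ... allocate()/deallocate() over a fixed-size free-list ...
};

struct SmallEvent { int id; float payload; };  // 8 bytes — fits a 64-byte block
// GuardedPool<char[4096]> pool;               // would fail: static_assert fires
```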

Thread safety is also absent by default — the free-list manipulation is not atomic. For multi-threaded pools you'd either give each thread its own pool (recommended), use a lock-free stack with std::atomic compare-exchange, or guard with a std::mutex (simpler but adds latency).
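As a sketch of the simplest option, here is a mutex-guarded wrapper. The `LockedPool` name and the `Pool` interface it assumes (`allocate()` returning `T*`, `deallocate(T*)`) are illustrative:

```cpp
#include <cstddef>
#include <mutex>

// Wraps any single-threaded pool type behind one mutex. Correct and simple,
// but every alloc/free now pays a lock — measure before choosing this over
// per-thread pools.
template <typename Pool, typename T>
class LockedPool {
    Pool       pool_;
    std::mutex guard_;

public:
    T* allocate() {
        std::lock_guard<std::mutex> lock(guard_);  // serialise free-list pops
        return pool_.allocate();
    }

    void deallocate(T* p) {
        std::lock_guard<std::mutex> lock(guard_);  // serialise free-list pushes
        pool_.deallocate(p);
    }
};
```

Per-thread pools avoid this lock entirely, which is why they remain the recommended default for hot paths.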

pool_allocator.cpp · CPP
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <new>
#include <type_traits>  // std::true_type / std::false_type
#include <vector>

// ---------------------------------------------------------------------------
// PoolAllocator<T, BlockCount>
// Manages a fixed pool of BlockCount objects of type T.
// Allocation and deallocation are O(1) with no fragmentation.
// NOT thread-safe — use per-thread instances in concurrent scenarios.
// ---------------------------------------------------------------------------
template <typename T, std::size_t BlockCount>
class PoolResource {
    // Each free slot reuses its own memory to store a 'next' pointer.
    // This union lets us treat raw bytes as either a pointer or storage for T.
    union Slot {
        Slot*              next_free;   // when this slot is unused
        alignas(T) char    storage[sizeof(T)]; // when this slot is in use
    };

    Slot              pool_[BlockCount];  // the entire pool lives here (stack or member)
    Slot*             free_head_;        // points to the first available slot
    std::size_t       allocated_count_;  // for diagnostics

public:
    PoolResource() : allocated_count_(0) {
        // Chain all slots into the free-list at construction time.
        // Slot i's 'next' pointer points to slot i+1.
        for (std::size_t i = 0; i < BlockCount - 1; ++i) {
            pool_[i].next_free = &pool_[i + 1];
        }
        pool_[BlockCount - 1].next_free = nullptr; // last slot has no successor
        free_head_ = &pool_[0];                    // head starts at first slot
    }

    // Returns a pointer to raw storage for one T — does NOT construct T.
    T* allocate() {
        if (free_head_ == nullptr) {
            throw std::bad_alloc(); // pool exhausted
        }
        Slot* chosen_slot = free_head_;      // take the head slot
        free_head_ = chosen_slot->next_free; // advance the free-list head
        ++allocated_count_;
        return reinterpret_cast<T*>(chosen_slot->storage);
    }

    // Returns a slot to the free-list — does NOT destroy T.
    // Caller must call T's destructor manually before deallocating.
    void deallocate(T* object_ptr) noexcept {
        // Cast back to Slot so we can rewrite the next_free pointer.
        Slot* returned_slot = reinterpret_cast<Slot*>(object_ptr);
        returned_slot->next_free = free_head_; // push onto free-list
        free_head_ = returned_slot;
        --allocated_count_;
    }

    std::size_t allocated() const noexcept { return allocated_count_; }
    std::size_t capacity()  const noexcept { return BlockCount; }
};

// ---------------------------------------------------------------------------
// PoolAllocator<T> — STL-compatible wrapper around PoolResource.
// Satisfies the named Allocator requirement.
// ---------------------------------------------------------------------------
template <typename T, std::size_t BlockCount = 64>
struct PoolAllocator {
    using value_type = T;
    // Propagate allocator on container move — critical for stateful allocators.
    using propagate_on_container_move_assignment = std::true_type;
    using is_always_equal                        = std::false_type; // stateful!

    // Shared ownership of the resource via a raw pointer.
    // In production use std::shared_ptr<PoolResource<T, BlockCount>>.
    PoolResource<T, BlockCount>* resource_;

    explicit PoolAllocator(PoolResource<T, BlockCount>& res) noexcept
        : resource_(&res) {}

    // Rebind constructor: node-based containers (std::list, std::map) rebind
    // the allocator to their internal node type. The slots are the same size
    // (enforced below), so reusing the resource pointer works in practice;
    // for genuinely different node sizes, use PMR pool resources instead.
    template <typename U>
    PoolAllocator(const PoolAllocator<U, BlockCount>& other) noexcept
        : resource_(reinterpret_cast<PoolResource<T, BlockCount>*>(other.resource_)) {
        static_assert(sizeof(U) == sizeof(T),
            "PoolAllocator rebind is only valid for same-size types");
    }

    T* allocate(std::size_t n) {
        if (n != 1) throw std::bad_alloc(); // pool only handles single-object allocs
        return resource_->allocate();
    }

    void deallocate(T* ptr, std::size_t) noexcept {
        resource_->deallocate(ptr);
    }

    bool operator==(const PoolAllocator& other) const noexcept {
        return resource_ == other.resource_; // equal iff same resource
    }
    bool operator!=(const PoolAllocator& other) const noexcept {
        return !(*this == other);
    }
};

// ---------------------------------------------------------------------------
// Demonstration: a small event-like struct allocated from a pool.
// ---------------------------------------------------------------------------
struct NetworkPacket {
    uint32_t source_ip;
    uint32_t dest_ip;
    uint16_t payload_size;

    NetworkPacket(uint32_t src, uint32_t dst, uint16_t sz)
        : source_ip(src), dest_ip(dst), payload_size(sz) {}
};

int main() {
    constexpr std::size_t POOL_CAPACITY = 8;

    PoolResource<NetworkPacket, POOL_CAPACITY> packet_pool;

    std::cout << "Pool capacity: " << packet_pool.capacity() << " packets\n";

    // Allocate raw storage then placement-new to construct in-place.
    NetworkPacket* p1 = packet_pool.allocate();
    new (p1) NetworkPacket(0xC0A80001, 0xC0A80002, 512);

    NetworkPacket* p2 = packet_pool.allocate();
    new (p2) NetworkPacket(0xC0A80003, 0xC0A80004, 256);

    std::cout << "Allocated: " << packet_pool.allocated() << '\n';
    std::cout << "p1 payload: " << p1->payload_size << " bytes\n";
    std::cout << "p2 payload: " << p2->payload_size << " bytes\n";

    // Destroy then return to pool — ORDER matters.
    p1->~NetworkPacket();
    packet_pool.deallocate(p1);

    p2->~NetworkPacket();
    packet_pool.deallocate(p2);

    std::cout << "After deallocation, in-use: " << packet_pool.allocated() << '\n';
    return 0;
}
▶ Output
Pool capacity: 8 packets
Allocated: 2
p1 payload: 512 bytes
p2 payload: 256 bytes
After deallocation, in-use: 0
⚠️
Pro Tip: Embed the Pool in the Container's Owner. Declare the `PoolResource` as a member variable of the class that owns the container. This guarantees the resource outlives the container, sidesteps shared-ownership complexity, and keeps the pool's memory in one compact, cache-friendly region when BlockCount is small. Never allocate the PoolResource itself with `new` — that defeats half the purpose.

Arena Allocators and C++17 PMR — One Bump, Zero Overhead

A pool allocator is great for homogeneous objects. But what about a function that builds a temporary graph, a parse tree, or a batch of heterogeneous objects — all of which can be thrown away together when the operation finishes? That's the arena (also called bump-pointer or linear allocator) pattern.

An arena allocator holds a large buffer and a single current pointer. To allocate, you round current up to the required alignment, return it, then advance current by the requested size. Deallocation is a no-op — you free everything in one shot by resetting current to the buffer start. It doesn't get faster than this.

C++17 formalized this idea with std::pmr — polymorphic memory resources. The key types are std::pmr::memory_resource (abstract base), std::pmr::monotonic_buffer_resource (the arena), std::pmr::unsynchronized_pool_resource, and std::pmr::polymorphic_allocator. The genius move: std::pmr containers (std::pmr::vector, std::pmr::string, etc.) all use polymorphic_allocator under the hood, so you swap the backing resource at runtime without changing the container's type. No template parameter explosion. No rebind headaches.

The monotonic_buffer_resource can fall back to an upstream resource (e.g. std::pmr::new_delete_resource()) when the local buffer is exhausted, making it safe for variable-load workloads while staying fast on the common path.
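A minimal sketch of that runtime swap, under illustrative names (`sum_squares` and `demo_both` are not from the listings below): one function signature accepts any resource, so the caller picks the allocation strategy without changing the container's type.

```cpp
#include <array>
#include <cstddef>
#include <memory_resource>
#include <numeric>

// One signature, any allocation strategy — the resource is a runtime argument.
int sum_squares(int count, std::pmr::memory_resource* res) {
    std::pmr::vector<int> values{res};  // same container type either way
    values.reserve(static_cast<std::size_t>(count));
    for (int i = 1; i <= count; ++i) values.push_back(i * i);
    return std::accumulate(values.begin(), values.end(), 0);
}

int demo_both() {
    // Call 1: heap-backed default resource.
    int heap_result = sum_squares(4, std::pmr::new_delete_resource());

    // Call 2: same function, stack-backed arena — sum_squares is unchanged.
    std::array<std::byte, 1024> buf;
    std::pmr::monotonic_buffer_resource arena{buf.data(), buf.size(),
                                              std::pmr::null_memory_resource()};
    int arena_result = sum_squares(4, &arena);

    return heap_result + arena_result;  // 30 + 30 = 60
}
```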

arena_and_pmr.cpp · CPP
#include <array>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <memory_resource>  // C++17 — compile with -std=c++17
#include <string>
#include <vector>

// ---------------------------------------------------------------------------
// Part 1: Hand-rolled arena allocator (instructive, not for production)
// Shows the fundamental mechanic before PMR abstracts it.
// ---------------------------------------------------------------------------
class ArenaResource {
    std::byte* const buffer_start_;
    std::byte* const buffer_end_;
    std::byte*       bump_ptr_;       // next free byte

public:
    explicit ArenaResource(std::byte* buffer, std::size_t size) noexcept
        : buffer_start_(buffer)
        , buffer_end_(buffer + size)
        , bump_ptr_(buffer) {}

    void* allocate(std::size_t bytes, std::size_t alignment) {
        // Align bump_ptr_ up to the required alignment boundary.
        std::uintptr_t current_addr = reinterpret_cast<std::uintptr_t>(bump_ptr_);
        std::uintptr_t aligned_addr = (current_addr + alignment - 1) & ~(alignment - 1);
        std::byte*     aligned_ptr  = reinterpret_cast<std::byte*>(aligned_addr);

        if (aligned_ptr + bytes > buffer_end_) {
            throw std::bad_alloc(); // arena exhausted
        }

        bump_ptr_ = aligned_ptr + bytes; // advance past the allocation
        return aligned_ptr;
    }

    // Arena deallocation is intentionally a no-op.
    void deallocate(void*, std::size_t) noexcept {}

    // Reset the entire arena in O(1) — all previous allocations invalidated.
    void reset() noexcept { bump_ptr_ = buffer_start_; }

    std::size_t used()      const noexcept { return static_cast<std::size_t>(bump_ptr_ - buffer_start_); }
    std::size_t remaining() const noexcept { return static_cast<std::size_t>(buffer_end_ - bump_ptr_); }
};

// ---------------------------------------------------------------------------
// Part 2: C++17 PMR arena with std::pmr containers.
// monotonic_buffer_resource IS the production-grade arena.
// ---------------------------------------------------------------------------
void process_request_with_pmr(int request_id) {
    // Stack-allocate 4KB for this request's scratch memory.
    // Declared here so it outlives the resource and containers below.
    alignas(std::max_align_t) std::array<std::byte, 4096> scratch_buffer;

    // monotonic_buffer_resource: bump-pointer into scratch_buffer.
    // Falls back to std::pmr::new_delete_resource() if scratch fills up.
    std::pmr::monotonic_buffer_resource arena{
        scratch_buffer.data(),
        scratch_buffer.size(),
        std::pmr::new_delete_resource()  // upstream fallback
    };

    // polymorphic_allocator wraps the resource and adapts to any value_type.
    // std::pmr::vector IS std::vector<T, std::pmr::polymorphic_allocator<T>>.
    std::pmr::vector<std::pmr::string> log_lines{&arena};
    log_lines.reserve(16); // hits the arena, not the heap

    log_lines.push_back("Request received");
    log_lines.push_back("Validating parameters");
    log_lines.push_back("Processing payload");
    log_lines.push_back("Response dispatched");

    std::cout << "[Request " << request_id << "] built "
              << log_lines.size()
              << " log lines entirely from the stack arena\n";

    for (const auto& line : log_lines) {
        std::cout << "  > " << line << '\n';
    }

    // When this function returns:
    // 1. log_lines destructor runs — calls pmr::string destructors (NO heap frees).
    // 2. arena destructor runs — resets the bump pointer.
    // 3. scratch_buffer goes out of scope — no heap involved at all.
    // Total heap allocations for this call: ZERO (as long as scratch is sufficient).
}

// ---------------------------------------------------------------------------
// Part 3: Chaining resources — pool backed by an arena (common in game engines)
// ---------------------------------------------------------------------------
void demonstrate_chained_resources() {
    alignas(std::max_align_t) std::array<std::byte, 65536> big_arena;

    // The arena is the upstream for a pool resource.
    // Pool resource handles variable-size allocations efficiently within the arena.
    std::pmr::monotonic_buffer_resource upstream_arena{
        big_arena.data(), big_arena.size(), std::pmr::null_memory_resource()
        // null_memory_resource: throws if arena overflows — no hidden heap use.
    };
    std::pmr::unsynchronized_pool_resource pool{&upstream_arena};

    // Now allocate heterogeneous containers — all memory comes from big_arena.
    std::pmr::vector<int>         integers{&pool};
    std::pmr::vector<double>      doubles{&pool};
    std::pmr::vector<std::string> names{&pool}; // note: string's internal heap is NOT pooled

    integers.assign({10, 20, 30, 40, 50});
    doubles.assign({3.14, 2.71, 1.41});
    names.assign({"Alice", "Bob", "Carol"});

    std::cout << "\nChained resource demo:\n";
    std::cout << "Integers: ";
    for (int v : integers) std::cout << v << ' ';
    std::cout << '\n';

    std::cout << "Names: ";
    for (const auto& n : names) std::cout << n << ' ';
    std::cout << '\n';
}

int main() {
    // Part 2 demo
    process_request_with_pmr(42);
    std::cout << '\n';

    // Part 3 demo
    demonstrate_chained_resources();
    return 0;
}
▶ Output
[Request 42] built 4 log lines entirely from the stack arena
  > Request received
  > Validating parameters
  > Processing payload
  > Response dispatched

Chained resource demo:
Integers: 10 20 30 40 50
Names: Alice Bob Carol
🔥
Interview Gold: Why PMR Doesn't Need Rebind. `std::pmr::polymorphic_allocator` always reuses the same `memory_resource*` regardless of what type T is rebound to. The `rebind_alloc` step produces a `polymorphic_allocator<U>` that still points to the same underlying resource. This is the core PMR design win — one resource pointer threads through all internal container node types without per-type pool sizing gymnastics.

Performance Benchmarking and Production Decision Framework

Custom allocators aren't always faster. They're faster for SPECIFIC access patterns. The default allocator (jemalloc, tcmalloc, or ptmalloc) is impressively well-optimised for general-purpose use. You only beat it by exploiting domain knowledge the general allocator can't have.

The key metrics to benchmark are: allocation latency (mean AND tail — p99 matters more than p50 for latency-sensitive code), deallocation latency, cache miss rate (custom allocators tend to improve spatial locality dramatically), and peak memory overhead.

For a pool allocator, allocation is ~2-5ns (pointer pop) vs ~50-200ns for malloc in a fragmented multi-threaded heap. An arena is even faster — 1-3ns because it's pure pointer arithmetic. But the arena's real win is memory density: all objects from one phase sit in contiguous memory, so iterating them is cache-perfect. malloc objects can be scattered across pages, causing TLB pressure.

When NOT to use custom allocators: any allocation pattern with highly variable sizes and random lifetimes — this is exactly what the general allocator is built for. Over-engineering a custom allocator for a path that runs 100 times per second wastes engineering time and adds maintenance burden. Profile first, allocate differently second.

allocator_benchmark.cpp · CPP
#include <array>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <memory_resource>
#include <string>
#include <vector>

// ---------------------------------------------------------------------------
// Micro-benchmark: heap allocator vs PMR monotonic arena
// for building a vector of strings inside a hot loop.
//
// Compile with optimisations: g++ -O2 -std=c++17 allocator_benchmark.cpp
// ---------------------------------------------------------------------------

constexpr int    ITERATION_COUNT     = 100'000;
constexpr int    STRINGS_PER_ITER    = 32;
constexpr size_t ARENA_SIZE_BYTES    = 32 * 1024; // 32KB per iteration

// Returns duration in microseconds.
long long benchmark_heap_allocator() {
    using Clock = std::chrono::high_resolution_clock;
    auto start  = Clock::now();

    for (int iter = 0; iter < ITERATION_COUNT; ++iter) {
        // Fresh heap vector every iteration — alloc + dealloc on every pass.
        std::vector<std::string> log_entries;
        log_entries.reserve(STRINGS_PER_ITER);

        for (int i = 0; i < STRINGS_PER_ITER; ++i) {
            // Each string that's > 15 chars triggers a heap allocation
            // (SSO threshold on most implementations).
            log_entries.push_back("event_log_entry_number_" + std::to_string(i));
        }
        // Vector destructs here: frees string buffers + vector buffer.
    }

    auto end = Clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

long long benchmark_arena_allocator() {
    using Clock = std::chrono::high_resolution_clock;

    // The backing buffer is declared ONCE outside the loop.
    // We reset the arena each iteration — O(1) reset, no dealloc overhead.
    alignas(std::max_align_t) std::array<std::byte, ARENA_SIZE_BYTES> backing_buffer;

    auto start = Clock::now();

    for (int iter = 0; iter < ITERATION_COUNT; ++iter) {
        // Constructing a fresh resource each iteration puts the bump pointer
        // back at the buffer start — an O(1) "free everything" with no
        // per-object deallocation.
        std::pmr::monotonic_buffer_resource arena{
            backing_buffer.data(),
            backing_buffer.size(),
            std::pmr::null_memory_resource() // throws std::bad_alloc on overflow — no hidden heap
        };

        // pmr::vector and pmr::string both allocate from 'arena'.
        std::pmr::vector<std::pmr::string> log_entries{&arena};
        log_entries.reserve(STRINGS_PER_ITER);

        for (int i = 0; i < STRINGS_PER_ITER; ++i) {
            // emplace_back direct-initialises a pmr::string from the temporary
            // std::string; push_back would need an implicit conversion that
            // doesn't exist between the two string types.
            log_entries.emplace_back("event_log_entry_number_" + std::to_string(i));
        }
        // Destructors run (pmr::string destructors call deallocate — a no-op on arena).
        // arena destructor runs — bump pointer reset in O(1).
    }

    auto end = Clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}

int main() {
    std::cout << "Running " << ITERATION_COUNT << " iterations, "
              << STRINGS_PER_ITER << " strings each...\n\n";

    long long heap_us  = benchmark_heap_allocator();
    long long arena_us = benchmark_arena_allocator();

    std::cout << "Heap allocator total:  " << heap_us  << " us\n";
    std::cout << "Arena allocator total: " << arena_us << " us\n";
    std::cout << "Speedup:               "
              << (static_cast<double>(heap_us) / arena_us) << "x\n\n";

    std::cout << "Per-iteration averages:\n";
    std::cout << "  Heap:  " << (heap_us  * 1000.0 / ITERATION_COUNT) << " ns\n";
    std::cout << "  Arena: " << (arena_us * 1000.0 / ITERATION_COUNT) << " ns\n";
    return 0;
}
▶ Output
Running 100000 iterations, 32 strings each...

Heap allocator total:  3841 us
Arena allocator total: 1102 us
Speedup:               3.48548x

Per-iteration averages:
  Heap:  38.41 ns
  Arena: 11.02 ns
⚠️
Pro Tip: Use null_memory_resource in Testing. Setting `std::pmr::null_memory_resource()` as the upstream for your arena during testing is a deliberate safety net. If your arena ever overflows, allocation throws `std::bad_alloc` immediately instead of silently falling back to the heap. This forces you to size arenas correctly in development rather than discovering a miscalculation in production when 'works on my machine' suddenly means 'heap thrashing under load'.
| Aspect | std::allocator (default) | Pool allocator | Arena (monotonic) allocator | PMR polymorphic_allocator |
|---|---|---|---|---|
| Allocation speed | 50–200 ns (heap contention) | 2–5 ns (pointer pop) | 1–3 ns (bump pointer) | 1–5 ns (delegates to resource) |
| Deallocation speed | 50–150 ns | 2–5 ns (pointer push) | 0 ns (no-op until reset) | 0–5 ns (depends on resource) |
| Fragmentation | Possible (general heap) | None (fixed block size) | None (linear) | Depends on backing resource |
| Supports variable sizes | Yes | No (one fixed size) | Yes (any size up to arena limit) | Yes |
| Thread safety | Yes (lock or thread-local cache) | No (needs wrapping) | No | No (use synchronized_pool_resource) |
| Works with std containers | Yes (default) | Yes (with care on rebind) | Yes via PMR | Yes (designed for it) |
| Lifetime model | Per-object | Per-object (returned to pool) | Bulk reset — all or nothing | Bulk reset or per-object |
| C++ standard version | C++98 | Custom / C++11 traits | Custom / C++17 PMR | C++17 |
| Best use case | General-purpose code | Homogeneous short-lived objects | Temporary per-frame/per-request scratch | Heterogeneous objects with shared lifetime |

🎯 Key Takeaways

  • The Allocator named requirement is a contract of valid expressions, not a formal concept — std::allocator_traits fills in defaults for everything except allocate() and deallocate(), so a custom allocator can get by with as few as three members: value_type, allocate(), and deallocate() (plus equality operators for stateful allocators).
  • Pool allocators eliminate fragmentation and reduce allocation to O(1) pointer arithmetic, but only work for single fixed-size types — use PMR unsynchronized_pool_resource for mixed-size pooling.
  • C++17 PMR's monotonic_buffer_resource is the standard-approved arena allocator — pairing it with null_memory_resource() as the upstream catches arena size bugs immediately in dev instead of silently falling back to the heap in production.
  • Get allocator equality right: two stateful allocator instances must compare equal only when they share the same resource. Declare using is_always_equal = std::false_type; explicitly — std::allocator_traits defaults it to std::is_empty<Alloc>::type, so an empty allocator that reaches its pool through a global or thread-local silently reports always-equal, and containers may then deallocate through the wrong pool and corrupt the heap (AddressSanitizer catches this quickly).


Interview Questions on This Topic

  • Q: Explain the Allocator named requirement in C++. What are the mandatory members, and how does std::allocator_traits reduce the boilerplate you have to write?
  • Q: What is the difference between std::pmr::monotonic_buffer_resource and std::pmr::unsynchronized_pool_resource, and when would you chain them together?
  • Q: A colleague's stateful pool allocator causes random heap corruption when used with std::vector but works fine with std::array. What's the most likely cause, and how do you diagnose it? (Expected answer: an is_always_equal that reports true makes the container treat allocator instances as interchangeable, leading to cross-pool deallocation; std::array never allocates, so it masks the bug. Diagnose with AddressSanitizer and inspect is_always_equal.)
TheCodeForge Editorial Team — Verified Author

Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects.