Group Anagrams Using Hashing — The Smart O(n·k) Solution
Anagram grouping shows up more often in the real world than you'd think. Search engines use it to match synonyms and spelling variants. DNA analysis tools cluster gene sequences that are permutations of one another. Even autocorrect relies on letter-frequency fingerprints to suggest words. If you've ever searched 'best pasta recipe' and also gotten results for 'best tapas recipe', you've brushed up against anagram-aware text processing in the wild.
The naive approach — comparing every word against every other word — works, but it scales like a nightmare. With just 10,000 words you're looking at 100 million comparisons. The real problem is that we keep re-doing work we've already done. Hashing solves this by collapsing the expensive comparison into a single, cheap key lookup. Once you reduce 'LISTEN' and 'SILENT' to the same key, grouping them is O(1) — it's just putting a book on the right shelf.
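To make the contrast concrete, here is a minimal sketch of that naive pairwise approach (class and method names are illustrative, not from a library):

```java
import java.util.*;

public class NaiveGrouping {
    /** O(n^2 * k log k): compares every unvisited word against every later word. */
    public static List<List<String>> groupNaively(String[] words) {
        boolean[] grouped = new boolean[words.length];
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (grouped[i]) continue;
            List<String> group = new ArrayList<>();
            group.add(words[i]);
            for (int j = i + 1; j < words.length; j++) { // this nested loop is the bottleneck
                if (!grouped[j] && isAnagram(words[i], words[j])) {
                    group.add(words[j]);
                    grouped[j] = true;
                }
            }
            groups.add(group);
        }
        return groups;
    }

    /** Anagram check by comparing sorted character arrays. */
    private static boolean isAnagram(String a, String b) {
        if (a.length() != b.length()) return false;
        char[] ca = a.toCharArray(), cb = b.toCharArray();
        Arrays.sort(ca);
        Arrays.sort(cb);
        return Arrays.equals(ca, cb);
    }

    public static void main(String[] args) {
        System.out.println(groupNaively(new String[]{"eat", "tea", "tan", "ate", "nat", "bat"}));
        // prints [[eat, tea, ate], [tan, nat], [bat]]
    }
}
```

The inner loop is exactly the redundant work the hashing approach eliminates.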
By the end of this article you'll understand two distinct hashing strategies for this problem (sorted key vs. frequency array key), know exactly when to reach for each one, be able to implement both from scratch, and walk into an interview ready to discuss trade-offs, edge cases, and optimisations with confidence.
Building the Mental Model — What Makes a Good Anagram Key?
Before writing a single line of code, let's nail down the insight that makes this problem tractable.
Two words are anagrams if and only if they contain exactly the same characters with exactly the same frequencies. 'RACE' and 'CARE' both have one R, one A, one C, one E. That's it — that's the entire definition.
So a perfect key is any representation that captures character frequencies and nothing else. Order shouldn't matter. Two properties must hold: (1) anagrams must produce the identical key, and (2) non-anagrams must produce different keys. If your key satisfies both, you can use a HashMap to bucket words by key in a single linear pass.
There are two popular ways to build this key:
- Sort the letters alphabetically. 'RACE' → 'ACER'. 'CARE' → 'ACER'. Same key, same bucket.
- Build a 26-element frequency array, then serialise it into a string like '1#0#1#1#1#0#...' (one count per letter a–z).
Both are valid. Sorting costs O(k log k) per word where k is the word length. The frequency array costs O(k) per word. For short words the difference is negligible; for very long strings the frequency array wins. Understanding this trade-off is exactly what separates a good interview answer from a great one.
```java
import java.util.Arrays;

public class AnagramKeyDemo {

    /**
     * Strategy 1: sort the characters of the word to produce a canonical key.
     * "listen" -> "eilnst", "silent" -> "eilnst" (same key!)
     * Time per word: O(k log k) where k = word length
     */
    public static String sortedKey(String word) {
        char[] characters = word.toLowerCase().toCharArray(); // normalise case first
        Arrays.sort(characters);                              // sort the chars alphabetically
        return new String(characters);                        // turn the sorted array back into a string key
    }

    /**
     * Strategy 2: count character frequencies and serialise the counts.
     * "listen" -> "0#0#0#0#1#0#0#0#1#0#0#1#0#1#0#0#0#0#1#1#0#0#0#0#0#0#"
     * Time per word: O(k) — faster for long strings
     */
    public static String frequencyKey(String word) {
        int[] letterCounts = new int[26]; // one slot per letter: index 0 = 'a', index 25 = 'z'
        for (char ch : word.toLowerCase().toCharArray()) {
            letterCounts[ch - 'a']++; // subtract 'a' to convert char to 0-based index
        }
        // Serialise the array into a string using '#' as a delimiter.
        // Without a delimiter, counts like [12, 3] and [1, 23] would both become "123" — a collision!
        StringBuilder keyBuilder = new StringBuilder();
        for (int count : letterCounts) {
            keyBuilder.append(count).append('#');
        }
        return keyBuilder.toString();
    }

    public static void main(String[] args) {
        String[] testWords = {"listen", "silent", "enlist", "race", "care", "hello"};
        System.out.println("Word       | Sorted Key  | Frequency Key (truncated)");
        System.out.println("-----------|-------------|---------------------------");
        for (String word : testWords) {
            String sorted = sortedKey(word);
            String freq = frequencyKey(word).substring(0, 25) + "..."; // trim for display
            System.out.printf("%-10s | %-11s | %s%n", word, sorted, freq);
        }
    }
}
```
```text
Word       | Sorted Key  | Frequency Key (truncated)
-----------|-------------|---------------------------
listen     | eilnst      | 0#0#0#0#1#0#0#0#1#0#0#1#0...
silent     | eilnst      | 0#0#0#0#1#0#0#0#1#0#0#1#0...
enlist     | eilnst      | 0#0#0#0#1#0#0#0#1#0#0#1#0...
race       | acer        | 1#0#1#0#1#0#0#0#0#0#0#0#0...
care       | acer        | 1#0#1#0#1#0#0#0#0#0#0#0#0...
hello      | ehllo       | 0#0#0#0#1#0#0#1#0#0#0#2#0...
```
The Full Solution — Grouping Words With a HashMap in One Pass
Now that we have a reliable key function, the grouping algorithm itself is almost embarrassingly simple. One HashMap, one loop, done.
The map's keys are our anagram fingerprints. The map's values are lists of words that share that fingerprint. We iterate through every word once, compute its key, and append the word to the correct list. At the end, we return all the lists.
Total time complexity: O(n · k log k) using sorted keys, or O(n · k) using frequency keys, where n is the number of words and k is the average word length. Space complexity: O(n · k) to store all words in the map.
Notice we never compare two words against each other. We never nest loops. That's the power of hashing: it transforms a pairwise comparison problem into a single-pass bucketing problem.
One practical thing worth noting — Java's computeIfAbsent on a HashMap is your friend here. It creates the bucket list the first time a key appears and returns it, so you avoid an explicit null-check (or a getOrDefault-plus-put dance) before adding to the list, which keeps the core logic clean and readable. Production code is read far more often than it's written, so clarity matters.
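For comparison, here is a minimal sketch of the two common bucketing idioms side by side (getOrDefault plus an explicit put, versus computeIfAbsent):

```java
import java.util.*;

public class BucketIdioms {
    public static void main(String[] args) {
        Map<String, List<String>> buckets = new HashMap<>();

        // Idiom 1: getOrDefault — fetch (or create) the list, add to it, then put it back.
        List<String> list = buckets.getOrDefault("acer", new ArrayList<>());
        list.add("race");
        buckets.put("acer", list);

        // Idiom 2: computeIfAbsent — creates the list only when the key is missing
        // and returns it, so the whole operation chains into one expression.
        buckets.computeIfAbsent("acer", k -> new ArrayList<>()).add("care");

        System.out.println(buckets.get("acer")); // prints [race, care]
    }
}
```

Both are correct; computeIfAbsent is one call shorter and never does a redundant put.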
```java
import java.util.*;

public class GroupAnagrams {

    /**
     * Groups a list of words into sublists where every word in a sublist
     * is an anagram of the others.
     *
     * Approach: use a sorted-character string as a HashMap key.
     * Time:  O(n * k log k) — n words, each sorted in O(k log k)
     * Space: O(n * k) — storing all words in the map
     *
     * @param words the input array of lowercase strings
     * @return a list of groups, each group containing anagram siblings
     */
    public static List<List<String>> groupAnagrams(String[] words) {
        // Each unique sorted key maps to a list of words that share it
        Map<String, List<String>> anagramBuckets = new HashMap<>();

        for (String word : words) {
            // Step 1: compute the canonical key for this word
            String canonicalKey = buildSortedKey(word);

            // Step 2: if no bucket exists for this key yet, create one on the fly.
            // computeIfAbsent avoids a manual null-check — clean and readable
            anagramBuckets
                .computeIfAbsent(canonicalKey, k -> new ArrayList<>())
                .add(word);
        }

        // Return all bucket values as a list of lists
        return new ArrayList<>(anagramBuckets.values());
    }

    /** Sorts the characters of a word to produce a stable canonical key. */
    private static String buildSortedKey(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // ---------------------------------------------------------------
    // Bonus: O(n*k) frequency-array version — better for long strings
    // ---------------------------------------------------------------
    public static List<List<String>> groupAnagramsFast(String[] words) {
        Map<String, List<String>> anagramBuckets = new HashMap<>();
        for (String word : words) {
            String canonicalKey = buildFrequencyKey(word);
            anagramBuckets
                .computeIfAbsent(canonicalKey, k -> new ArrayList<>())
                .add(word);
        }
        return new ArrayList<>(anagramBuckets.values());
    }

    /** Produces a frequency-count key — O(k), no sorting needed. */
    private static String buildFrequencyKey(String word) {
        int[] charFrequencies = new int[26];
        for (char ch : word.toCharArray()) {
            charFrequencies[ch - 'a']++; // map 'a'->0, 'b'->1, ... 'z'->25
        }
        // Serialise with '#' delimiter to avoid count-boundary collisions
        StringBuilder key = new StringBuilder();
        for (int frequency : charFrequencies) {
            key.append(frequency).append('#');
        }
        return key.toString();
    }

    public static void main(String[] args) {
        String[] words = {"eat", "tea", "tan", "ate", "nat", "bat"};
        System.out.println("Input words: " + Arrays.toString(words));
        System.out.println();

        List<List<String>> groupedBySortedKey = groupAnagrams(words);
        System.out.println("Grouped (sorted-key approach):");
        for (List<String> group : groupedBySortedKey) {
            System.out.println("  " + group);
        }
        System.out.println();

        List<List<String>> groupedByFrequency = groupAnagramsFast(words);
        System.out.println("Grouped (frequency-key approach):");
        for (List<String> group : groupedByFrequency) {
            System.out.println("  " + group);
        }

        // Edge case: single word, no anagram siblings
        String[] singleWord = {"hello"};
        List<List<String>> singleResult = groupAnagrams(singleWord);
        System.out.println();
        System.out.println("Single word input: " + singleResult);

        // Edge case: all words are anagrams of each other
        String[] allAnagrams = {"abc", "bca", "cab", "bac"};
        List<List<String>> allSameGroup = groupAnagrams(allAnagrams);
        System.out.println("All anagrams input: " + allSameGroup);
    }
}
```
```text
Input words: [eat, tea, tan, ate, nat, bat]

Grouped (sorted-key approach):
  [eat, tea, ate]
  [tan, nat]
  [bat]

Grouped (frequency-key approach):
  [eat, tea, ate]
  [tan, nat]
  [bat]

Single word input: [[hello]]
All anagrams input: [[abc, bca, cab, bac]]
```
Real-World Patterns — When Grouping Anagrams Actually Comes Up
This problem isn't just a coding exercise — the underlying pattern (hash by canonical form, group by hash) appears constantly in production software.
Search query normalisation: search engines often cluster queries that are anagram variants or phonetic equivalents so that a single indexed result can serve multiple query forms. The sorted-key trick is a lightweight first-pass filter before heavier NLP kicks in.
Duplicate detection in data pipelines: imagine you're ingesting CSV files from multiple sources. Column headers might arrive in different orders — 'name, age, city' vs 'age, city, name'. Sorting the headers and hashing them is a fast way to detect that two files share the same schema before attempting a merge.
Bioinformatics: DNA subsequences that are permutations of each other sometimes bind to the same protein receptor. Grouping them by character frequency is a preprocessing step in similarity analysis.
The deeper lesson: any time you need to group items that are 'equivalent under reordering', the canonical-key-plus-HashMap pattern is your go-to. You're not solving 'group anagrams' — you're solving 'equivalence class partitioning', which is one of the most reusable ideas in computer science.
```java
import java.util.*;

/**
 * Real-world example: detect which CSV file schemas are equivalent
 * (same columns, different order) using the anagram-grouping pattern.
 *
 * This is the same algorithm as group anagrams — just applied to
 * comma-separated column headers instead of single words.
 */
public class SchemaGrouper {

    /**
     * Groups file schemas that contain the same columns (regardless of order).
     * Each schema is a comma-separated string of column names.
     *
     * @param schemas array of schema strings, e.g. "name,age,city"
     * @return groups of schema strings that are column-equivalent
     */
    public static Map<String, List<String>> groupEquivalentSchemas(String[] schemas) {
        Map<String, List<String>> equivalenceGroups = new HashMap<>();

        for (String schema : schemas) {
            // Split on comma, sort the column names, rejoin — same idea as the sorted char key
            String[] columns = schema.split(",");
            Arrays.sort(columns);                               // sort column names alphabetically
            String canonicalSchema = String.join(",", columns); // canonical fingerprint

            equivalenceGroups
                .computeIfAbsent(canonicalSchema, k -> new ArrayList<>())
                .add(schema); // bucket the original (unsorted) schema
        }
        return equivalenceGroups;
    }

    public static void main(String[] args) {
        // Simulating schemas arriving from different data sources
        String[] incomingSchemas = {
            "name,age,city",   // source A
            "age,city,name",   // source B — same columns, different order
            "city,name,age",   // source C — same again
            "id,email,phone",  // source D — different schema entirely
            "phone,id,email",  // source E — same as D, reordered
            "name,salary"      // source F — unique schema
        };

        Map<String, List<String>> groups = groupEquivalentSchemas(incomingSchemas);

        System.out.println("Equivalent schema groups:");
        System.out.println("=========================");
        int groupNumber = 1;
        for (Map.Entry<String, List<String>> entry : groups.entrySet()) {
            System.out.println("Group " + groupNumber + " (canonical: " + entry.getKey() + ")");
            for (String schema : entry.getValue()) {
                System.out.println("  -> " + schema);
            }
            groupNumber++;
        }
    }
}
```
```text
Equivalent schema groups:
=========================
Group 1 (canonical: age,city,name)
  -> name,age,city
  -> age,city,name
  -> city,name,age
Group 2 (canonical: email,id,phone)
  -> id,email,phone
  -> phone,id,email
Group 3 (canonical: name,salary)
  -> name,salary
```
| Aspect | Sorted-Key Approach | Frequency-Array Key Approach |
|---|---|---|
| Time per word | O(k log k) — sort dominates | O(k) — single linear scan |
| Key format | Sorted string e.g. 'aert' | Serialised counts e.g. '1#0#1#0#1#...' |
| Key length | Exactly k characters | At least 52 characters (26 counts + 26 '#'); longer when any count exceeds 9 |
| Collision risk | None — sort is deterministic | None — if delimiter used correctly |
| Readability | High — key is human-readable | Low — key is opaque to humans |
| Best for | Short words, interview clarity | Long strings, performance-critical paths |
| Unicode support | Works with any sortable chars | Must extend array size beyond 26 for Unicode |
| Implementation complexity | Very low — Arrays.sort + new String | Low — count loop + StringBuilder |
🎯 Key Takeaways
- The core insight is canonical form: two anagrams must produce the same key — any key strategy that guarantees this turns a pairwise O(n²) problem into a single-pass O(n·k) problem.
- Sorted-key is cleaner and safer for interviews; frequency-array key is faster for long strings — know both and be able to justify which you'd pick given constraints.
- Always use a delimiter when serialising a numeric array into a string key — omitting it causes silent hash collisions that produce wrong answers with no error message.
- The 'hash by canonical form, group by hash' pattern generalises far beyond anagrams — it solves any equivalence-class partitioning problem, including schema matching, duplicate detection, and bioinformatics clustering.
⚠ Common Mistakes to Avoid
- ✕ Mistake 1: Skipping case normalisation — 'Listen' and 'listen' produce different sorted keys ('Leinst' vs 'eilnst') so they land in different buckets instead of the same group. Fix it by calling word.toLowerCase() before computing any key, and document that your function assumes lowercase input.
- ✕ Mistake 2: No delimiter in the frequency-array key — concatenating raw counts like [1,2,3] produces '123', which is identical to [12,3] and [1,23], silently merging non-anagram groups. Fix it by appending a separator character — e.g., '#' — after each count: '1#2#3#' vs '12#3#' vs '1#23#'.
- ✕ Mistake 3: Using a nested loop to compare every word pair — this O(n² · k) approach fails catastrophically on large inputs (LeetCode's 10,000-word test cases will time out). The moment you find yourself writing 'for each word, compare against all other words', stop and ask 'can I hash this instead?'
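Mistake 2 is easy to demonstrate in a few lines (a minimal sketch; the class and helper names are illustrative):

```java
public class DelimiterCollision {
    /** BUGGY: concatenates raw counts with no separator. */
    static String noDelimiter(int[] counts) {
        StringBuilder sb = new StringBuilder();
        for (int c : counts) sb.append(c);
        return sb.toString();
    }

    /** CORRECT: a '#' after each count keeps the boundaries unambiguous. */
    static String withDelimiter(int[] counts) {
        StringBuilder sb = new StringBuilder();
        for (int c : counts) sb.append(c).append('#');
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] a = {12, 3};
        int[] b = {1, 23};
        System.out.println(noDelimiter(a).equals(noDelimiter(b)));     // true  — silent collision!
        System.out.println(withDelimiter(a).equals(withDelimiter(b))); // false — distinct keys
    }
}
```

The buggy version maps both arrays to "123"; the delimited version produces "12#3#" and "1#23#".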
Interview Questions on This Topic
- Q: Walk me through the time and space complexity of your solution — and how would it change if the input words could be up to 10,000 characters long instead of 20?
- Q: Your current solution uses sorted characters as the key. Can you think of an alternative key that avoids sorting entirely and achieves O(n·k) overall? What are the trade-offs?
- Q: What happens to your algorithm if the input contains Unicode characters like accented letters or emojis? How would you adapt the frequency-array approach to handle them?
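For the Unicode question, one possible adaptation (a sketch, not the only answer; the method name is illustrative) replaces the fixed 26-slot array with a sorted map keyed by code point. Note that Unicode normalisation (e.g. NFC vs NFD for accented letters) is a separate concern this sketch does not address:

```java
import java.util.*;

public class UnicodeKey {
    /**
     * Frequency key that works for arbitrary Unicode code points.
     * A TreeMap keeps entries sorted by code point, so equal multisets
     * of characters always serialise to the same string.
     */
    public static String unicodeFrequencyKey(String word) {
        Map<Integer, Integer> counts = new TreeMap<>();
        word.codePoints().forEach(cp -> counts.merge(cp, 1, Integer::sum));

        // Serialise as codePoint:count pairs, '#'-delimited to avoid boundary collisions
        StringBuilder key = new StringBuilder();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            key.append(e.getKey()).append(':').append(e.getValue()).append('#');
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(unicodeFrequencyKey("héllo").equals(unicodeFrequencyKey("olléh"))); // true
        System.out.println(unicodeFrequencyKey("héllo").equals(unicodeFrequencyKey("hello"))); // false
    }
}
```

Using codePoints() rather than toCharArray() also handles characters outside the Basic Multilingual Plane (such as emoji) correctly, since those occupy two chars each.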
Frequently Asked Questions
What is the most efficient way to group anagrams in Java?
The most efficient approach is to use a HashMap where each key is a character-frequency fingerprint of the word (built in O(k) time) and each value is a list of words sharing that fingerprint. This gives O(n·k) total time complexity and O(n·k) space, which beats the sorted-key approach's O(n·k log k) on long strings.
Why do we sort the characters to create the anagram key?
Sorting removes the order of characters, leaving only their identity — which is exactly what two anagrams share. When you sort 'listen' and 'silent' you get 'eilnst' both times. That shared sorted form becomes the HashMap key, so both words land in the same bucket automatically.
Does the group anagrams algorithm work for words with repeated characters?
Yes, and both key strategies handle repeats correctly. The sorted-key approach keeps all duplicates in place — 'aab' sorts to 'aab', never 'ab'. The frequency-array approach explicitly counts each character, so two 'a's give a count of 2 in slot 0. There's no special-casing needed for repeated characters.
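You can verify the repeated-character behaviour directly (a minimal sketch; the class name is illustrative):

```java
import java.util.Arrays;

public class RepeatedChars {
    /** Same sorted-key strategy as in the article. */
    static String sortedKey(String word) {
        char[] chars = word.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(sortedKey("aab"));                          // aab — duplicates preserved
        System.out.println(sortedKey("aba").equals(sortedKey("aab"))); // true  — same multiset
        System.out.println(sortedKey("ab").equals(sortedKey("aab")));  // false — counts differ
    }
}
```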
Written and reviewed by senior developers with real-world experience across enterprise, startup and open-source projects. Every article on TheCodeForge is written to be clear, accurate and genuinely useful — not just SEO filler.