SQL Injection — GBK Encoding Bypass Exposed 340k Records
340k user records exposed via GBK encoding bypass of addslashes().
20+ years shipping production PHP systems at scale. Notes here come from systems that actually shipped.
- SQL injection works by breaking out of a string value and injecting SQL syntax — raw string concatenation is the root cause
- The only real fix: prepared statements, which separate SQL structure from data at the protocol level
- PDO default has ATTR_EMULATE_PREPARES = true — this still concatenates in PHP and reintroduces injection risk; set it to false
- MySQLi bind_param() type string: s=string, i=integer, d=decimal, b=blob — wrong type does not prevent injection but causes data corruption
- Defence in depth: prepared statements + input validation + htmlspecialchars() on output + least-privilege DB user
- Biggest mistake: using PDO::quote() and thinking it is as safe as prepared statements — it is not
Imagine you run a library. A visitor fills out a slip asking for 'Harry Potter'. Simple — you find the book. Now imagine a sneaky visitor writes: 'Harry Potter. Also, delete every book in the library.' If your librarian follows those instructions literally, disaster strikes. SQL injection is exactly that: an attacker sneaks extra database commands into a normal input field, and a naive PHP script obeys them blindly. Prepared statements are like hiring a smarter librarian who separates the request form from the instructions — so no matter what someone writes on the slip, it's always treated as a book title, never a command.
Every year, SQL injection ranks in the OWASP Top 10 most critical web security risks — and it has for over two decades. It's not a niche academic threat. It's the vulnerability behind some of the most embarrassing data breaches in history: stolen user credentials, exposed credit card numbers, entire databases wiped overnight. The scary part? A single unguarded PHP script connecting to MySQL is all it takes. One $_GET variable dropped raw into a query, and an attacker owns your database.
The core problem is trust. Early PHP database code treated user input as part of the SQL command itself. When you concatenate a username from a form directly into a query string, you're handing the attacker a microphone on your database server. They can close your query early, append their own commands, comment out the rest of your SQL, and walk away with whatever they want. The fix isn't complicated — but it requires understanding why the old approach was broken, not just memorising a new syntax.
By the end of this article you'll understand exactly how SQL injection works under the hood, why prepared statements eliminate the vulnerability at the protocol level (not just cosmetically), and how to write production-grade PHP database code using both PDO and MySQLi. You'll also know the three mistakes that catch even experienced developers off guard — and how interviewers probe for this knowledge.
Why Parameterized Queries Are Not Optional for SQL Injection Prevention
SQL injection prevention in PHP is the practice of ensuring user-supplied input cannot alter the intended structure of SQL queries. The core mechanic is separating SQL logic from data: using prepared statements with bound parameters (via PDO or MySQLi) so the database engine treats input as literal values, never as executable code. Without this separation, any input—like a single quote or a GBK-encoded multibyte character—can break out of its string context and rewrite the query.
Prepared statements work by sending the query template to the database first, then sending the parameters separately. This means the database already knows the query's shape before it sees any user data. Even if an attacker passes 1 OR 1=1 or a GBK byte sequence that tricks mysql_real_escape_string() into producing a valid escape character, the database never interprets that data as SQL syntax. The key property: escaping functions are context-dependent and can fail with certain character sets (e.g., GBK), but prepared statements are immune to encoding tricks.
Use prepared statements for every database query that includes user input—GET/POST parameters, headers, cookies, file names, anything. In production, a single raw query concatenated with user data is a breach waiting to happen. The 340k-record leak referenced in the title occurred because the application used with a GBK connection, allowing an attacker to craft a multibyte sequence that consumed the escape character and injected a quote. Prepared statements would have prevented that entirely.mysql_real_escape_string()
mysql_real_escape_string() with SET NAMES GBK, thinking it was safe. An attacker sent a crafted byte sequence (0xbf27) that consumed the backslash and injected a single quote, leading to full table extraction.How SQL Injection Actually Works — And Why Raw Queries Are Dangerous
To prevent something you have to understand it first. Let's look at what actually happens when PHP builds a query by gluing strings together.
Suppose you have a login form. A user types their email and password, and your PHP script checks the database. The vulnerable version looks harmless until you realise that $_POST['email'] is completely user-controlled. An attacker doesn't type their email — they type a carefully crafted string that changes the meaning of your SQL.
The input ' OR '1'='1 turns a normal login check into a query that returns every user in the table. The attacker is now logged in as the first user — likely an admin. Even worse, the input '; DROP TABLE users; -- could wipe your entire users table if your database user has that permission. The double-dash -- is a SQL comment that silences the rest of your original query so there's no syntax error to trip the attacker up.
This works because MySQL receives one big string and has no idea where your intended SQL ends and the attacker's injected SQL begins. It's all just text to the parser. That's the fundamental flaw — and the fix is separating the query structure from the data, permanently, at the driver level.
mysqli_real_escape_string() is not a complete fix. It escapes quotes but fails against certain character encodings and doesn't protect against all injection vectors. Prepared statements are the only reliable defence — they make injection structurally impossible, not just harder.The Right Fix: PDO Prepared Statements That Make Injection Structurally Impossible
Prepared statements fix SQL injection at the protocol level. Here's what that means in practice: when you use a prepared statement, your PHP code sends the SQL template to MySQL first — with placeholders where data will go. MySQL parses and compiles that template into an execution plan. Then, separately, your data values are sent. By the time the data arrives, the query structure is already locked in. MySQL treats everything in those data slots as a literal value, never as SQL syntax — no matter what characters it contains.
PDO (PHP Data Objects) is the recommended approach because it's database-agnostic. You can switch from MySQL to PostgreSQL or SQLite by changing one line. MySQLi works too, but it's MySQL-only. Both use the same prepared statement concept under the hood.
There are two placeholder styles in PDO: named placeholders (:email) and positional placeholders (?). Named placeholders are more readable, especially when you have many parameters. Positional are slightly more concise. Pick one style and stay consistent within a project.
Notice in the code below that receives an array of actual values. PDO sends those values over a separate channel from the SQL text. Even if the value contains execute()'; DROP TABLE users; --, MySQL sees it as a string value for the email column — not SQL commands. The injection attempt is completely inert.
false forces PDO to use MySQL's native prepared statement protocol, where query and data travel separately at the network level. Always set it.MySQLi Prepared Statements and Binding — When You're Working With MySQLi
PDO is the modern default, but you'll encounter MySQLi — especially in WordPress plugins, legacy codebases, and older hosting environments. MySQLi's prepared statement API is slightly more verbose, using bind_param() to explicitly declare the data types of each parameter.
The type string in is critical and often misunderstood. Each character represents a parameter: bind_param()s for string, i for integer, d for double (float), b for blob. If you bind an integer as a string, MySQL coerces it — usually harmless, but it's sloppy. More importantly, getting this right proves you understand what's actually happening at the driver level.
MySQLi also supports which returns a familiar result object, but it requires the get_result()mysqlnd driver (which is standard on PHP 7+). The alternative works without bind_result()mysqlnd but is clunkier for multi-column results.
The security guarantee is identical to PDO: the SQL template and the data are sent to MySQL separately. Injection is structurally impossible regardless of what the user submits.
% wildcards to your PHP variable before binding — not inside the SQL template. Writing LIKE '%?%' is invalid syntax. The correct pattern is $pattern = '%' . $userInput . '%' then bind $pattern. The wildcards become part of the data value, which is exactly right.bind_param() type string (ssid) must match your actual column types — wrong types cause silent data corruption even if injection is prevented.bind_param() type strings must match column types exactly — s/i/d/b mismatch causes silent data corruption.Beyond Queries: Input Validation, Output Encoding, and Least Privilege
Prepared statements are your primary defence — but a real production application uses defence in depth. Think of it as layers: if one layer fails, the next one catches it.
Input Validation means rejecting data that doesn't fit your expected format before it touches the database. If a field expects a UK postcode, validate it with a regex. If it expects an integer ID, cast it with (int) immediately after reading from $_GET or $_POST. This isn't a substitute for prepared statements — it's an additional filter that catches garbage data early and makes your code's intent explicit.
Output Encoding is separate but critical. converts htmlspecialchars()<, >, &, and quotes into HTML entities. This prevents stored XSS — where an attacker injects JavaScript into your database that then executes in other users' browsers. Always encode output when rendering to HTML. Never echo database values raw.
Database Least Privilege is the most overlooked layer. Your PHP application's database user should only have the permissions it actually needs. A read-heavy app needs SELECT only. A typical CRUD app needs SELECT, INSERT, UPDATE, DELETE — but never DROP, CREATE, or GRANT. If an attacker somehow bypasses everything else, a restricted DB user stops them from deleting tables or creating backdoor accounts.
htmlspecialchars() prevents XSS, input validation rejects malformed data, and a least-privilege DB user limits blast radius if something does get through.htmlspecialchars() prevents XSS — these are separate layers, both required.The Hidden Danger: Dynamic Queries in ORDER BY, LIMIT, and Table Names
You think prepared statements cover everything? They don't. Parameters only work for data values — not for SQL structural elements like column names, table names, or ORDER BY directions. When you inject user input into these positions, prepared statements can't help you. The attacker owns your query structure.
The real problem: developers assume parameterization is a magic bullet. It's not. That LIMIT and OFFSET in your pagination? Still vulnerable. That sortable table column? Also vulnerable. The fix is strict whitelisting. Map user input to known valid values before it ever touches SQL. If the user wants to sort by 'email', you check: is 'email' in your approved list of column names? Yes — use it. No — default to 'id'. Same for LIMIT. Cast it to an integer. If it fails, use a sensible default.
This isn't about being clever. It's about accepting that parameterized queries have sharp edges. Know where they end and build hard checks for what's left.
Defense in Depth: Why Input Validation Is Your Second Wall, Not Your First
I hear this every week: 'But we validate all inputs, so SQL injection is impossible.' Wrong. Input validation is a mitigation, not a cure. You validate for business rules, not security. The real cure is structural separation of code and data — prepared statements. Validation stops garbage. Prepared statements stop injection.
Here's the order I teach juniors: First, parameterize everything that can be parameterized. Second, validate what can't — column names, sort orders, file paths. Third, encode output for its context (HTML entities for HTML, hex for URLs). Fourth, apply least privilege to your database user. That application user does not need DROP TABLE or ALTER privileges.
Why this order? Because validation fails sometimes. A regex that allows 'Robert'); DROP TABLE Students;--' through? It happens. But if that string never becomes SQL code because you used a parameter, it's just a weird name in the database. Validation is your backup parachute, not your primary flight system. Build your layers in the right order.
A Login Bypass Exposed 340,000 User Records via Classic SQL Injection
addslashes() and a length check — the developer assumed this was sufficient.- addslashes() and
mysqli_real_escape_string()are not injection-proof — they are workarounds, not fixes - Prepared statements are the only solution that makes injection structurally impossible
- Set PDO::ATTR_EMULATE_PREPARES to false — the default still builds SQL strings in PHP
query() or exec() calls with string concatenation alongside your prepared statement code.Key takeaways
htmlspecialchars() (XSS), input validation (business logic), least-privilege DB user (blast radius reduction) — all four layers are required.Common mistakes to avoid
3 patternsSetting PDO::ATTR_EMULATE_PREPARES to true (or leaving it as the default)
Forgetting htmlspecialchars() when rendering database output to HTML
Building dynamic column names or ORDER BY clauses from user input without whitelisting
Interview Questions on This Topic
What is the difference between parameterised queries and input sanitisation, and why is sanitisation alone insufficient?
Frequently Asked Questions
20+ years shipping production PHP systems at scale. Notes here come from systems that actually shipped.
That's PHP & MySQL. Mark it forged?
7 min read · try the examples if you haven't