PKI Explained: How Certificates Actually Work in Production
- A certificate is a signed claim, not a secret. Its value comes entirely from the trustworthiness of the CA that signed it, which is why a misconfigured trust store is a far bigger security risk than a weak password on the keystore file itself.
- A missing Intermediate CA certificate in your server config is one of the most common PKI incidents in production. It can pass all your own tests because desktop browsers fetch the missing Intermediate via AIA (or already have it cached), then break clients that do neither, such as Android apps and JVM services.
- Reach for short-lived certificates (24-hour TTL) with automated rotation the moment you're managing more than a handful of internal service identities. OCSP and CRL are the right answer to the wrong question at that scale.
A fintech startup I consulted for lost six hours of payment processing because a certificate issued by their internal CA expired at 2:47 AM on a Tuesday. Their monitoring caught nothing. The service didn't crash; it just silently rejected every TLS handshake with the error PKIX path building failed: unable to find valid certification path to requested target. Six engineers stared at perfectly healthy application logs while $340k in transactions queued. PKI didn't fail loudly. It failed quietly, at the edges, in a way nobody had written a runbook for.
PKI (Public Key Infrastructure) is the trust plumbing underneath every HTTPS connection, every signed JWT, every mTLS service mesh, and every code-signing pipeline you've ever touched. It answers one deceptively hard question: how do two strangers on a network prove to each other that they are who they claim to be, without having met before? Symmetric keys don't scale: you can't pre-share a secret with every website on the internet. PKI solves this with asymmetric cryptography layered over a hierarchy of trusted authorities. Get it right and it's invisible. Get it wrong and you're the 3 AM war room.
After this you'll be able to read a certificate chain and understand exactly what each field means and why it matters, trace a TLS handshake step by step and know where it can break, build and rotate certificates in a real service without downtime, debug the six most common certificate errors without guessing, and design an internal PKI for a microservices environment that won't bite you six months later.
Asymmetric Cryptography: The Math That Makes Trust Possible
Before PKI existed, encrypting traffic between two servers meant pre-sharing a secret key out-of-band: email it, phone it in, bake it into a config file checked into git (yes, this still happens). That doesn't scale, and it means anyone who intercepts the key exchange owns all past and future traffic. The entire premise of PKI is that you can publish a key openly, and doing so doesn't compromise you.
Asymmetric cryptography gives you a key pair: a public key you broadcast freely and a private key you guard with your life. Anything encrypted with the public key can only be decrypted by the matching private key. More importantly for PKI, anything signed with the private key can be verified by anyone holding the public key, without the verifier ever touching the private key. That second property is what makes certificates work.
A certificate is just a structured document that says: 'This public key belongs to example.com, and I, DigiCert, am signing this claim with my own private key.' Your browser holds DigiCert's public key (via the trust store), verifies DigiCert's signature, and concludes the public key genuinely belongs to example.com. The private key for example.com never travels over the wire. Not once. That's the whole trick.
RSA-2048 was the default for years. Today you should default to ECDSA P-256: smaller keys, faster handshakes, equivalent or better security. RSA-4096 is not meaningfully more secure than RSA-2048 against current threats, but it's noticeably slower. Don't reach for it unless a compliance checkbox forces your hand.
package io.thecodeforge.dsa;

import java.nio.charset.StandardCharsets;
import java.security.*;
import java.security.spec.ECGenParameterSpec;
import java.util.Base64;

/**
 * Demonstrates the asymmetric signing primitive that underlies every
 * certificate verification in PKI. This is NOT a full PKI implementation;
 * it's the cryptographic foundation you need to understand before certificates
 * make sense.
 *
 * Production context: a payment gateway signing webhook payloads so the
 * receiving merchant can verify the payload wasn't tampered with in transit.
 */
public class AsymmetricSigningDemo {

    public static void main(String[] args) throws Exception {
        // --- KEY GENERATION ---
        // ECDSA with P-256 curve: the modern default. Prefer this over RSA-2048
        // for new systems. Smaller key, faster ops, same effective security.
        KeyPairGenerator keyPairGenerator = KeyPairGenerator.getInstance("EC");
        keyPairGenerator.initialize(new ECGenParameterSpec("secp256r1"), new SecureRandom());
        KeyPair gatewayKeyPair = keyPairGenerator.generateKeyPair();
        PublicKey gatewayPublicKey = gatewayKeyPair.getPublic();
        PrivateKey gatewayPrivateKey = gatewayKeyPair.getPrivate();

        // Simulate: gateway publishes its public key to merchants during onboarding.
        // This key is not secret; it's meant to be distributed.
        System.out.println("Gateway Public Key (share this openly):");
        System.out.println(Base64.getEncoder().encodeToString(gatewayPublicKey.getEncoded()));
        System.out.println();

        // --- SIGNING (happens inside the gateway before dispatching webhook) ---
        String webhookPayload = "{\"event\":\"payment.captured\",\"amount\":4999,\"currency\":\"GBP\"}";
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(gatewayPrivateKey); // private key NEVER leaves this service
        // Always name the charset: signer and verifier must hash identical bytes.
        signer.update(webhookPayload.getBytes(StandardCharsets.UTF_8));
        byte[] signature = signer.sign();
        String encodedSignature = Base64.getEncoder().encodeToString(signature);
        System.out.println("Webhook payload : " + webhookPayload);
        System.out.println("Signature (send in X-Gateway-Signature header): " + encodedSignature);
        System.out.println();

        // --- VERIFICATION (happens inside the merchant's webhook handler) ---
        // The merchant only needs the public key and never touches the private key.
        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(gatewayPublicKey); // public key used for verification
        verifier.update(webhookPayload.getBytes(StandardCharsets.UTF_8));
        boolean isAuthentic = verifier.verify(Base64.getDecoder().decode(encodedSignature));
        System.out.println("Signature valid? " + isAuthentic); // true: payload is genuine

        // --- TAMPER DETECTION ---
        // Simulate a man-in-the-middle modifying the amount
        String tamperedPayload = "{\"event\":\"payment.captured\",\"amount\":1,\"currency\":\"GBP\"}";
        Signature tamperedVerifier = Signature.getInstance("SHA256withECDSA");
        tamperedVerifier.initVerify(gatewayPublicKey);
        tamperedVerifier.update(tamperedPayload.getBytes(StandardCharsets.UTF_8)); // different bytes -> different hash
        boolean tamperedResult = tamperedVerifier.verify(Base64.getDecoder().decode(encodedSignature));
        System.out.println("Tampered payload valid? " + tamperedResult); // false: tampering detected
    }
}
Gateway Public Key (share this openly):
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE[...base64 encoded public key...]
Webhook payload : {"event":"payment.captured","amount":4999,"currency":"GBP"}
Signature (send in X-Gateway-Signature header): MEYCIQDn[...base64 encoded signature...]
Signature valid? true
Tampered payload valid? false
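To make the "smaller keys" claim about ECDSA P-256 versus RSA-2048 concrete, here is a minimal sketch that generates one key pair of each type and compares the size of the encoded public keys. The class and method names (KeySizeComparison, encodedPublicKeyLength) are made up for illustration, and encoded length is only a rough proxy for handshake cost, not a benchmark.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.spec.ECGenParameterSpec;

// Hypothetical helper for illustration; not part of the article's code base.
class KeySizeComparison {

    // Returns the length in bytes of the X.509 (SubjectPublicKeyInfo) encoding
    // of a freshly generated public key: "EC" -> P-256, anything else -> RSA-2048.
    static int encodedPublicKeyLength(String algorithm) {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance(algorithm);
            if ("EC".equals(algorithm)) {
                generator.initialize(new ECGenParameterSpec("secp256r1"));
            } else {
                generator.initialize(2048);
            }
            KeyPair pair = generator.generateKeyPair();
            return pair.getPublic().getEncoded().length;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Key generation failed for " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("RSA-2048 public key bytes: " + encodedPublicKeyLength("RSA"));
        System.out.println("EC P-256 public key bytes: " + encodedPublicKeyLength("EC"));
    }
}
```

The same gap shows up in the certificate itself and in every handshake that carries it, which is part of why P-256 is the cheaper default for new services.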
Certificate Chains and Trust Stores: Why Your Browser Trusts Anything at All
Here's what most explanations skip: a certificate by itself proves nothing. I could generate a certificate right now that says 'This is google.com'. It takes about 30 seconds with OpenSSL. The certificate is cryptographically valid. But it's meaningless unless someone your browser already trusts has signed it.
This is the chain of trust. Every certificate is signed by a Certificate Authority (CA). That CA's certificate is signed by a Root CA. Root CA certificates are self-signed (they vouch for themselves), and they're valuable precisely because your OS vendor (Microsoft, Apple, Linux distro maintainers) manually vetted them and baked them into the trust store. On a JVM, that's the cacerts file inside your JRE. On Linux it's /etc/ssl/certs. On macOS it's the Keychain. These are the hardcoded trust anchors.
In practice, Root CAs don't sign end-entity certificates directly. The Root CA private key is kept offline in a literal hardware vault. Instead they sign Intermediate CA certificates, which do the day-to-day signing. This is deliberate: if an Intermediate gets compromised, you revoke that Intermediate without touching the Root. The chain looks like: Root CA → Intermediate CA → Your Certificate.
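The claim that roots are self-signed entries sitting in the trust store can be checked directly against the JVM. The sketch below lists the default trust anchors and counts how many verify against their own public key. TrustAnchorLister and its method names are hypothetical, and it assumes the JVM ships with a populated default trust store (cacerts).

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper for illustration; assumes a populated default cacerts.
class TrustAnchorLister {

    // Returns the certificates the JVM's default trust store accepts as issuers.
    static List<X509Certificate> defaultTrustAnchors() {
        try {
            TrustManagerFactory factory =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            factory.init((KeyStore) null); // null means "use the JVM default trust store"
            List<X509Certificate> anchors = new ArrayList<>();
            for (TrustManager manager : factory.getTrustManagers()) {
                if (manager instanceof X509TrustManager) {
                    anchors.addAll(Arrays.asList(((X509TrustManager) manager).getAcceptedIssuers()));
                }
            }
            return anchors;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not load the default trust store", e);
        }
    }

    // A certificate is self-signed when subject equals issuer AND the signature
    // verifies under the certificate's own public key.
    static boolean isSelfSigned(X509Certificate cert) {
        if (!cert.getSubjectX500Principal().equals(cert.getIssuerX500Principal())) {
            return false;
        }
        try {
            cert.verify(cert.getPublicKey());
            return true;
        } catch (GeneralSecurityException e) {
            return false; // signature mismatch or unsupported algorithm
        }
    }

    public static void main(String[] args) {
        List<X509Certificate> anchors = defaultTrustAnchors();
        long selfSigned = anchors.stream().filter(TrustAnchorLister::isSelfSigned).count();
        System.out.println("Trust anchors: " + anchors.size() + ", self-signed: " + selfSigned);
    }
}
```

Nothing cryptographic makes these certificates special. They are trusted only because they are in the store, which is why controlling what goes into a truststore matters so much.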
When your service presents a certificate during a TLS handshake, it must send the full chain: its own certificate plus every Intermediate. The Root is omitted because the client already has it in the trust store. Miss an Intermediate and clients that can't fill the gap fail with errors like OpenSSL's 'unable to get local issuer certificate' or Java's 'PKIX path building failed', which burns junior engineers for half a day. The leaf certificate itself is fine; the client simply can't build a path from it to a root it trusts.
package io.thecodeforge.dsa;

import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;
import java.security.cert.Certificate;
import java.security.cert.CertificateParsingException;
import java.security.cert.X509Certificate;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/**
 * Production utility: inspect the certificate chain returned by a live TLS
 * endpoint. Run this against your own services during CI to catch chain
 * issues before they reach production.
 *
 * Real-world use: added to a fintech deployment pipeline after an Intermediate
 * CA cert was accidentally omitted from the Nginx config, causing Android clients
 * (which don't do AIA fetching) to fail while desktop browsers succeeded.
 */
public class CertificateChainInspector {

    public static void main(String[] args) throws Exception {
        String targetHost = "api.example.com"; // replace with your service host
        int targetPort = 443;

        // Build an SSL context backed by the default JVM trust store (cacerts).
        // This reflects exactly what your service-to-service calls will see.
        SSLContext sslContext = SSLContext.getDefault();
        SSLSocketFactory socketFactory = sslContext.getSocketFactory();

        // Open the TLS connection and capture the certificate chain presented
        // by the server. This is what the handshake actually receives.
        try (SSLSocket sslSocket = (SSLSocket) socketFactory.createSocket(targetHost, targetPort)) {
            sslSocket.startHandshake(); // triggers the full TLS handshake
            SSLSession session = sslSocket.getSession();
            Certificate[] peerCertificates = session.getPeerCertificates();

            System.out.printf("Chain length: %d (should be 2 or 3; if 1, Intermediate is missing)%n",
                    peerCertificates.length);
            System.out.println();

            // Index 0 is always the leaf (end-entity) certificate.
            // Index 1 is the Intermediate CA.
            // Index 2 (if present) is a second Intermediate or the Root.
            for (int i = 0; i < peerCertificates.length; i++) {
                X509Certificate cert = (X509Certificate) peerCertificates[i];
                System.out.printf("=== Certificate [%d] ===%n", i);
                System.out.println("Subject : " + cert.getSubjectX500Principal().getName());
                System.out.println("Issuer : " + cert.getIssuerX500Principal().getName());
                System.out.println("Valid from : " + cert.getNotBefore());
                System.out.println("Expires : " + cert.getNotAfter()); // track this in alerting

                // Key usage tells you what this certificate is authorised to do.
                // End-entity certs should NOT have keyCertSign set; only CAs should.
                boolean[] keyUsage = cert.getKeyUsage();
                if (keyUsage != null) {
                    System.out.println("Key Usage : " + Arrays.toString(keyUsage));
                    // keyUsage[5] == true means keyCertSign: expected on CA certs,
                    // a red flag on a leaf cert (index 0).
                    if (keyUsage[5]) {
                        System.out.println("  WARNING: keyCertSign is set (this cert can sign other certs)");
                    }
                }

                // SAN (Subject Alternative Names) is what modern TLS checks for hostname matching.
                // CN matching was deprecated in RFC 2818. Chrome 58+ ignores CN entirely, and
                // Java clients can fail with the 'No subject alternative names present' message.
                try {
                    Collection<List<?>> sans = cert.getSubjectAlternativeNames();
                    if (sans != null) {
                        System.out.println("SANs :");
                        for (List<?> san : sans) {
                            // Type 2 = dNSName, Type 7 = iPAddress
                            System.out.println("  type=" + san.get(0) + " value=" + san.get(1));
                        }
                    } else {
                        System.out.println("  WARNING: No SANs, hostname validation will fail in modern clients");
                    }
                } catch (CertificateParsingException e) {
                    System.out.println("  Could not parse SANs: " + e.getMessage());
                }
                System.out.println();
            }
        }
    }
}
=== Certificate [0] ===
Subject : CN=api.example.com
Issuer : CN=DigiCert TLS RSA SHA256 2020 CA1, O=DigiCert Inc, C=US
Valid from : Mon Jan 15 00:00:00 UTC 2024
Expires : Wed Feb 12 23:59:59 UTC 2025
Key Usage : [true, false, false, false, false, false, false, false, false]
SANs :
type=2 value=api.example.com
type=2 value=www.api.example.com
=== Certificate [1] ===
Subject : CN=DigiCert TLS RSA SHA256 2020 CA1, O=DigiCert Inc, C=US
Issuer : CN=DigiCert Global Root CA, OU=www.digicert.com, O=DigiCert Inc, C=US
Valid from : Wed Sep 23 00:00:00 UTC 2020
Expires : Mon Sep 22 23:59:59 UTC 2030
Key Usage : [true, false, true, false, false, true, false, false, false]
  WARNING: keyCertSign is set (this cert can sign other certs)
=== Certificate [2] ===
Subject : CN=DigiCert Global Root CA, OU=www.digicert.com, O=DigiCert Inc, C=US
Issuer : CN=DigiCert Global Root CA, OU=www.digicert.com, O=DigiCert Inc, C=US
Valid from : Fri Nov 10 00:00:00 UTC 2006
Expires : Mon Nov 10 00:00:00 UTC 2031
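The SAN entries in the output above are what a client compares the requested hostname against. As a rough illustration of that comparison, here is a deliberately simplified matcher for dNSName entries that supports a single leftmost wildcard label. SanMatcher is a hypothetical name, and this sketch ignores details real verifiers handle (internationalized names, IP address SANs, public-suffix rules), so keep relying on the TLS library's built-in hostname verification in real code.

```java
import java.util.List;
import java.util.Locale;

// Simplified, illustrative matcher; NOT a replacement for the built-in verifier.
class SanMatcher {

    // True when the hostname matches one dNSName SAN entry. A leading "*."
    // matches exactly one leftmost label, so "*.example.com" covers
    // "api.example.com" but neither "example.com" nor "a.b.example.com".
    static boolean matches(String hostname, String sanDnsName) {
        String host = hostname.toLowerCase(Locale.ROOT);
        String pattern = sanDnsName.toLowerCase(Locale.ROOT);
        if (!pattern.startsWith("*.")) {
            return host.equals(pattern);
        }
        int firstDot = host.indexOf('.');
        if (firstDot <= 0) {
            return false; // no leftmost label to stand in for the wildcard
        }
        // Compare everything after the first label, including the dot.
        return host.substring(firstDot).equals(pattern.substring(1));
    }

    // True when any SAN entry in the certificate matches the hostname.
    static boolean matchesAny(String hostname, List<String> sanDnsNames) {
        return sanDnsNames.stream().anyMatch(san -> matches(hostname, san));
    }

    public static void main(String[] args) {
        List<String> sans = List.of("api.example.com", "www.api.example.com");
        System.out.println(matchesAny("api.example.com", sans));
        System.out.println(matchesAny("payments.example.com", sans));
    }
}
```

Note that the Subject CN never enters the comparison, which is why a certificate with the right CN but no SAN still fails in modern clients.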
mTLS and Internal PKI: Certificate Management in a Real Microservices Mesh
Server-side TLS proves the server is who it claims to be. Mutual TLS (mTLS) goes both ways: the client also presents a certificate, and the server validates it. This is the right authentication model for service-to-service communication inside a microservices architecture. No shared secrets, no API keys rotated manually, no JWT signing keys sitting in environment variables.
The problem is that mTLS at scale requires issuing, rotating, and revoking potentially thousands of short-lived certificates, one per service identity. Managing this by hand is how you end up with a spreadsheet of certificate expiry dates that someone stops updating six months in. The answer is an internal CA, and the operational answer to the scale problem is automating issuance using something like HashiCorp Vault PKI Secrets Engine or cert-manager on Kubernetes.
Short-lived certificates are the modern answer to revocation. Traditional CRL (Certificate Revocation List) and OCSP (Online Certificate Status Protocol) are both operationally painful: CRLs are downloaded periodically and can be stale, and OCSP requires a real-time round-trip that adds latency and creates a liveness dependency. If you issue certificates with a 24-hour TTL and rotate them automatically, a compromised certificate expires within a day, often before you've finished your incident response. Revocation becomes a much smaller problem.
I've seen teams fight for months trying to get OCSP stapling right on Nginx inside a Kubernetes cluster, only to eventually migrate to 24-hour cert rotation via Vault and throw away the entire OCSP infrastructure. Don't start with OCSP for internal PKI. Start with short TTLs and automated rotation.
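With 24-hour certificates, the practical question becomes when to renew. One common convention is to renew once a fixed fraction of the lifetime has elapsed, so a failed renewal still leaves time to retry. The sketch below assumes a two-thirds threshold; that fraction, and the RenewalSchedule name, are illustrative assumptions rather than anything the tools above mandate.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative scheduler; the two-thirds threshold is an assumption.
class RenewalSchedule {

    // Renew after 2/3 of the validity period, expressed as an integer ratio
    // to avoid floating-point rounding on the millisecond arithmetic.
    private static final long NUMERATOR = 2;
    private static final long DENOMINATOR = 3;

    // Instant at which rotation should start for a certificate valid
    // from notBefore to notAfter.
    static Instant renewAt(Instant notBefore, Instant notAfter) {
        if (!notAfter.isAfter(notBefore)) {
            throw new IllegalArgumentException("notAfter must be later than notBefore");
        }
        long lifetimeMillis = Duration.between(notBefore, notAfter).toMillis();
        return notBefore.plusMillis(lifetimeMillis * NUMERATOR / DENOMINATOR);
    }

    // True once the renewal point has been reached.
    static boolean shouldRenew(Instant now, Instant notBefore, Instant notAfter) {
        return !now.isBefore(renewAt(notBefore, notAfter));
    }

    public static void main(String[] args) {
        Instant issued = Instant.now();
        Instant expires = issued.plus(Duration.ofHours(24));
        System.out.println("Issued  : " + issued);
        System.out.println("Renew at: " + renewAt(issued, expires));
        System.out.println("Expires : " + expires);
    }
}
```

For a 24-hour certificate this starts rotation 16 hours after issuance, leaving an 8-hour window in which the old certificate still works if the issuer is briefly unavailable.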
package io.thecodeforge.dsa;

import javax.net.ssl.*;
import java.io.FileInputStream;
import java.net.URI;
import java.net.http.*;
import java.security.KeyStore;
import java.time.Duration;

/**
 * Production pattern: an order-processing service calling an inventory service
 * using mTLS. Both services have certificates issued by the same internal CA.
 * The calling service presents its own certificate; the server validates it
 * against the internal CA trust store, not a public CA.
 *
 * Prerequisite: generate keystores with:
 *   keytool -genkeypair -alias order-service -keyalg EC -groupname secp256r1
 *     -keystore order-service-keystore.p12 -storetype PKCS12 -validity 1
 * (validity 1 day: short-lived certificates, rotated via Vault or cert-manager)
 * Note: -genkeypair produces a self-signed placeholder. The certificate must
 * still be signed by the internal CA before the server will accept it.
 *
 * The internal CA cert goes into the truststore:
 *   keytool -import -alias internal-ca -file internal-ca.crt
 *     -keystore internal-ca-truststore.p12 -storetype PKCS12
 */
public class MutualTlsServiceClient {

    // Paths to keystores. In production these come from Vault agent injection
    // or a Kubernetes secret mounted as a volume, NOT from environment variables.
    private static final String KEYSTORE_PATH = "/etc/certs/order-service-keystore.p12";
    private static final String TRUSTSTORE_PATH = "/etc/certs/internal-ca-truststore.p12";

    // In production: read from Vault or a secrets manager, never hardcoded.
    private static final char[] KEYSTORE_PASSWORD = "changeit".toCharArray();
    private static final char[] TRUSTSTORE_PASSWORD = "changeit".toCharArray();

    public static void main(String[] args) throws Exception {
        HttpClient mtlsClient = buildMtlsHttpClient();
        HttpRequest inventoryRequest = HttpRequest.newBuilder()
                .uri(URI.create("https://inventory-service.internal:8443/api/v1/stock/SKU-99821"))
                .timeout(Duration.ofSeconds(5))
                .header("Accept", "application/json")
                .GET()
                .build();

        HttpResponse<String> response = mtlsClient.send(
                inventoryRequest, HttpResponse.BodyHandlers.ofString());
        System.out.println("Status : " + response.statusCode());
        System.out.println("Body : " + response.body());
    }

    private static HttpClient buildMtlsHttpClient() throws Exception {
        // --- KEYSTORE: our identity ---
        // Contains the order-service's private key and its certificate.
        // Presented to the inventory service during the TLS handshake.
        KeyStore identityKeystore = KeyStore.getInstance("PKCS12");
        try (FileInputStream keystoreStream = new FileInputStream(KEYSTORE_PATH)) {
            identityKeystore.load(keystoreStream, KEYSTORE_PASSWORD);
        }
        KeyManagerFactory keyManagerFactory = KeyManagerFactory.getInstance(
                KeyManagerFactory.getDefaultAlgorithm()); // SunX509
        keyManagerFactory.init(identityKeystore, KEYSTORE_PASSWORD);

        // --- TRUSTSTORE: who we trust ---
        // Contains only the internal CA certificate, NOT the public CA bundle.
        // This prevents any publicly-trusted cert from impersonating an internal service.
        KeyStore internalTruststore = KeyStore.getInstance("PKCS12");
        try (FileInputStream truststoreStream = new FileInputStream(TRUSTSTORE_PATH)) {
            internalTruststore.load(truststoreStream, TRUSTSTORE_PASSWORD);
        }
        TrustManagerFactory trustManagerFactory = TrustManagerFactory.getInstance(
                TrustManagerFactory.getDefaultAlgorithm()); // PKIX
        trustManagerFactory.init(internalTruststore); // scoped to internal CA only

        // --- SSL CONTEXT ---
        // Drop TLS 1.0 and 1.1: they're deprecated and broken.
        // TLS 1.2 is acceptable if you have legacy services that don't support 1.3 yet.
        SSLContext sslContext = SSLContext.getInstance("TLSv1.3");
        sslContext.init(
                keyManagerFactory.getKeyManagers(),     // our certificate presented to server
                trustManagerFactory.getTrustManagers(), // CAs we accept server certs from
                null // SecureRandom: null means JVM default (acceptable for most cases)
        );

        // The context name alone does not forbid older protocol versions;
        // restrict the enabled protocols explicitly to get TLS 1.3 only.
        SSLParameters tlsParameters = new SSLParameters();
        tlsParameters.setProtocols(new String[] {"TLSv1.3"});

        return HttpClient.newBuilder()
                .sslContext(sslContext)
                .sslParameters(tlsParameters)
                .connectTimeout(Duration.ofSeconds(3))
                .version(HttpClient.Version.HTTP_2) // prefer HTTP/2 over TLS 1.3
                .build();
    }
}
Body : {"sku":"SKU-99821","available":142,"reserved":18,"warehouse":"LHR-1"}
Certificate Rotation Without Downtime: The Operational Part Nobody Teaches
Getting PKI conceptually right is the easy part. Operating it without 3 AM incidents is where engineers earn their keep. Certificates expire. That's not a bug; it's the security model working as intended. Your job is to make rotation invisible to traffic.
The biggest operational mistake I see is treating certificate rotation as a one-time manual task. It is not. In a production system with dozens of services, certificate rotation must be automated and observable. Every certificate in your infrastructure needs an expiry date in your monitoring system, with alerts at 30 days, 14 days, and 7 days. The cert that killed that fintech's payments? Nobody had an alert. They found out from Stripe's webhook delivery failure logs.
The secret to zero-downtime rotation is an overlap window: issue and deploy the new certificate while the old one is still valid. A certificate is only checked during the TLS handshake, so connections that are already established keep running when the load balancer or web server gracefully reloads its configuration, and every new handshake picks up the new certificate. Traffic gradually cuts over as clients reconnect. Keep the old certificate and key around for rollback until the new one has served traffic cleanly, then pull them before the old certificate expires.
For services using mTLS client certificates, rotation is trickier because both sides must keep trusting each other throughout the window. If the server's truststore holds the issuing CA (as it should), a new client certificate from the same CA needs no truststore change. When the CA itself is being replaced, the truststore must contain both the old and the new CA certificates until every client has switched. Automate this with cert-manager's renewBefore setting or Vault's PKI Secrets Engine, and build a /health/cert endpoint into every service that returns expiry dates; make it part of your readiness probe.
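When the issuing CA changes, the transition truststore has to carry both CA certificates. A minimal sketch of building one in memory by merging two existing truststores is below. TruststoreMerger, singleCaStore, and the "old-"/"new-" alias prefixes are hypothetical names, and the demo in main borrows two certificates from the JVM default trust store purely as stand-ins for an old and a new internal CA.

```java
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.Collections;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Illustrative helper for a CA rotation window; names are hypothetical.
class TruststoreMerger {

    // Copies every certificate entry from both stores into a fresh in-memory
    // store, prefixing aliases so identical alias names cannot collide.
    static KeyStore merge(KeyStore oldStore, KeyStore newStore) throws Exception {
        KeyStore merged = KeyStore.getInstance("PKCS12");
        merged.load(null, null); // initialise an empty store
        copyCertificates(oldStore, "old-", merged);
        copyCertificates(newStore, "new-", merged);
        return merged;
    }

    private static void copyCertificates(KeyStore source, String prefix, KeyStore target)
            throws Exception {
        for (String alias : Collections.list(source.aliases())) {
            if (source.isCertificateEntry(alias)) { // skip private key entries
                target.setCertificateEntry(prefix + alias, source.getCertificate(alias));
            }
        }
    }

    // Builds a one-entry truststore holding a single CA certificate.
    static KeyStore singleCaStore(String alias, X509Certificate caCertificate) throws Exception {
        KeyStore store = KeyStore.getInstance("PKCS12");
        store.load(null, null);
        store.setCertificateEntry(alias, caCertificate);
        return store;
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins only: two public roots play the "old" and "new" internal CA.
        TrustManagerFactory factory =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        factory.init((KeyStore) null);
        X509Certificate[] anchors =
                ((X509TrustManager) factory.getTrustManagers()[0]).getAcceptedIssuers();
        KeyStore merged = merge(singleCaStore("ca", anchors[0]), singleCaStore("ca", anchors[1]));
        System.out.println("Entries in rotation truststore: " + merged.size());
    }
}
```

The merged store can be passed to TrustManagerFactory.init like any other truststore. Once every peer presents certificates from the new CA, rebuild the store without the old entries.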
package io.thecodeforge.dsa;

import java.io.FileInputStream;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.time.Duration;
import java.time.Instant;
import java.util.*;

/**
 * Production health check: expose certificate expiry for every cert in the
 * service's keystore as a structured health endpoint.
 *
 * Wire this into Spring Boot Actuator, Dropwizard HealthCheck, or your custom
 * /health endpoint. Feed the output into your APM (Datadog, New Relic, etc.)
 * as a custom metric: cert.days_until_expiry, tagged by alias.
 *
 * Rule: if any cert expires within CRITICAL_THRESHOLD_DAYS, the health check
 * reports CRITICAL and PagerDuty fires. This is non-negotiable.
 */
public class CertificateExpiryHealthIndicator {

    // Alert at 30 days warning, page at 7 days critical.
    // These numbers came from a post-incident review: 14 days wasn't enough
    // buffer for the procurement cycle at one enterprise client.
    // They suit long-lived certificates; for 24-hour certificates, scale the
    // thresholds down to a fraction of the lifetime instead.
    private static final long WARNING_THRESHOLD_DAYS = 30;
    private static final long CRITICAL_THRESHOLD_DAYS = 7;

    public static void main(String[] args) throws Exception {
        String keystorePath = "/etc/certs/order-service-keystore.p12";
        char[] keystorePassword = "changeit".toCharArray();

        List<CertificateExpiryReport> report = inspectKeystore(keystorePath, keystorePassword);
        HealthStatus overallStatus = HealthStatus.HEALTHY;

        for (CertificateExpiryReport entry : report) {
            System.out.printf("Alias: %-30s | Expires: %s | Days remaining: %d | Status: %s%n",
                    entry.alias(), entry.expiresAt(), entry.daysRemaining(), entry.status());
            // Escalate overall status to the worst individual status.
            // The enum is declared in order of severity, so ordinals compare directly.
            if (entry.status().ordinal() > overallStatus.ordinal()) {
                overallStatus = entry.status();
            }
        }

        System.out.println();
        System.out.println("Overall certificate health: " + overallStatus);

        // In production: if CRITICAL or EXPIRED, throw an exception to fail the readiness probe.
        // This prevents Kubernetes from sending traffic to a service with an expired cert
        // that will cause every downstream mTLS handshake to fail.
        if (overallStatus == HealthStatus.CRITICAL || overallStatus == HealthStatus.EXPIRED) {
            throw new CertificateCriticalException(
                    "One or more certificates expire within " + CRITICAL_THRESHOLD_DAYS
                            + " days. Rotation required immediately.");
        }
    }

    private static List<CertificateExpiryReport> inspectKeystore(
            String keystorePath, char[] password) throws Exception {
        KeyStore keystore = KeyStore.getInstance("PKCS12");
        try (FileInputStream inputStream = new FileInputStream(keystorePath)) {
            keystore.load(inputStream, password);
        }

        List<CertificateExpiryReport> reports = new ArrayList<>();
        Instant now = Instant.now();
        Enumeration<String> aliases = keystore.aliases();
        while (aliases.hasMoreElements()) {
            String alias = aliases.nextElement();
            // Inspect trusted-certificate entries and key entries; skip anything else
            if (!keystore.isCertificateEntry(alias) && !keystore.isKeyEntry(alias)) continue;
            X509Certificate cert = (X509Certificate) keystore.getCertificate(alias);
            if (cert == null) continue;

            Instant expiresAt = cert.getNotAfter().toInstant();
            long daysRemaining = Duration.between(now, expiresAt).toDays();

            HealthStatus status;
            if (!expiresAt.isAfter(now)) {
                status = HealthStatus.EXPIRED; // already dead
            } else if (daysRemaining <= CRITICAL_THRESHOLD_DAYS) {
                status = HealthStatus.CRITICAL; // includes certs with under a day left
            } else if (daysRemaining <= WARNING_THRESHOLD_DAYS) {
                status = HealthStatus.WARNING;
            } else {
                status = HealthStatus.HEALTHY;
            }
            reports.add(new CertificateExpiryReport(alias, expiresAt, daysRemaining, status));
        }
        return reports;
    }

    // Structured report per certificate entry in the keystore
    record CertificateExpiryReport(
            String alias, Instant expiresAt, long daysRemaining, HealthStatus status) {}

    // Declared from least to most severe
    enum HealthStatus { HEALTHY, WARNING, CRITICAL, EXPIRED }

    static class CertificateCriticalException extends RuntimeException {
        CertificateCriticalException(String message) {
            super(message);
        }
    }
}
Alias: internal-ca | Expires: 2030-09-22T23:59:59Z | Days remaining: 2057 | Status: HEALTHY
Overall certificate health: CRITICAL
Exception in thread "main" io.thecodeforge.dsa.CertificateExpiryHealthIndicator$CertificateCriticalException: One or more certificates expire within 7 days. Rotation required immediately.
| Aspect | Public CA (DigiCert, Let's Encrypt) | Internal CA (Vault, cert-manager) |
|---|---|---|
| Trust scope | Trusted by all browsers and OS trust stores globally | Trusted only by systems you configure: internal services only |
| Certificate cost | Let's Encrypt: free; DigiCert OV/EV: $100-$1000+/year | Infrastructure cost only; issuance itself is free at scale |
| Issuance speed | Let's Encrypt: seconds (ACME); DV/OV: minutes to days | Milliseconds via Vault API or cert-manager CertificateRequest |
| Revocation mechanism | CRL + OCSP, publicly accessible, some lag | Short TTL; a compromised certificate ages out within the TTL |
| Wildcard certificates | Supported (DNS-01 ACME challenge required for Let's Encrypt) | Supported, but short-lived per-service certs are safer |
| Suitable for mTLS client auth | Technically yes, but operationally painful at scale | Yes, the correct tool for service-to-service identity |
| Certificate lifetime | 90 days (Let's Encrypt); at most 398 days under CA/Browser Forum rules | Configurable; recommended 24 hours for internal services |
| Key compromise response | Submit revocation request, wait for CRL/OCSP propagation | Revoke via the Vault PKI API if needed; the certificate expires on its own within the TTL |
| Compliance (PCI, SOC2) | Required for public-facing HTTPS; auditors expect public CA | Acceptable for internal traffic; document your CA controls |
| Observability tooling | SSL Labs, crt.sh, browser DevTools | Vault audit log, cert-manager Prometheus metrics, custom health checks |
Key Takeaways
- A certificate is a signed claim, not a secret. Its value comes entirely from the trustworthiness of the CA that signed it, which is why a misconfigured trust store is a far bigger security risk than a weak password on the keystore file itself.
- A missing Intermediate CA certificate in your server config is one of the most common PKI incidents in production. It can pass all your own tests because desktop browsers fetch the missing Intermediate via AIA (or already have it cached), then break clients that do neither, such as Android apps and JVM services.
- Reach for short-lived certificates (24-hour TTL) with automated rotation the moment you're managing more than a handful of internal service identities. OCSP and CRL are the right answer to the wrong question at that scale.
- The JVM's cacerts trust store is not static infrastructure. It gets updated with JRE releases, which means upgrading your JRE can silently remove trust for a Root CA your services depend on. Pin your internal CA explicitly in a separate truststore and never rely on it being in cacerts.
Common Mistakes to Avoid
- Mistake 1: Configuring the JVM HttpClient or OkHttp with the default system trust store for internal mTLS calls. Symptom: PKIX path building failed: unable to find valid certification path to requested target, even though the internal CA cert is clearly present on disk. Fix: create a custom SSLContext whose TrustManagerFactory is initialized with your internal CA truststore, rather than with a null KeyStore, which falls back to the default cacerts.
- Mistake 2: Omitting the Intermediate CA certificate when configuring Nginx or Tomcat. Symptom: works fine in desktop Chrome (which fetches the Intermediate via AIA or already has it cached), fails in clients that do neither. Android, for example, reports javax.net.ssl.SSLHandshakeException: java.security.cert.CertPathValidatorException: Trust anchor for certification path not found. Fix: concatenate leaf cert + intermediate cert in your ssl_certificate PEM file, in that order; verify chain completeness with openssl s_client -connect host:443 -showcerts.
- Mistake 3: Using CN (Common Name) instead of SAN (Subject Alternative Name) for hostname binding in new certificates. Symptom: ERR_CERT_COMMON_NAME_INVALID in Chrome 58+, and hostname verification failures in Java clients such as No subject alternative names present. Fix: always include the hostname in the SAN extension (dNSName) when generating CSRs; CN alone has been deprecated since RFC 2818 in 2000 but tools still let you do it.
- Mistake 4: Storing keystore passwords in environment variables or application.properties. Symptom: no immediate error, but credentials surface in /proc/<pid>/environ, docker inspect, and log aggregation when the app prints its config at startup. Fix: use Vault Agent sidecar injection or Kubernetes External Secrets to mount the password as a file, read it once at startup, then zero out the char[] as soon as the keystore and key managers are initialized.
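The fix for Mistake 4 (read the password from a mounted file, then zero it) can be sketched as follows. PasswordFileReader is a hypothetical helper, and the sketch assumes the file is UTF-8 and holds nothing but the password plus an optional trailing newline.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Illustrative helper; assumes a UTF-8 file containing only the password.
class PasswordFileReader {

    // Reads the password into a char[] without creating an immutable String,
    // strips trailing line breaks, and wipes the intermediate buffers.
    static char[] readPassword(Path passwordFile) throws IOException {
        byte[] raw = Files.readAllBytes(passwordFile);
        CharBuffer decoded = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(raw));
        int length = decoded.remaining();
        while (length > 0 && (decoded.get(length - 1) == '\n' || decoded.get(length - 1) == '\r')) {
            length--;
        }
        char[] password = new char[length];
        decoded.get(password, 0, length);
        Arrays.fill(raw, (byte) 0);
        if (decoded.hasArray()) {
            Arrays.fill(decoded.array(), '\0');
        }
        return password;
    }

    // Overwrites the secret in place once it is no longer needed.
    static void wipe(char[] secret) {
        Arrays.fill(secret, '\0');
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("keystore-password", ".txt");
        try {
            Files.write(file, "demo-only\n".getBytes(StandardCharsets.UTF_8));
            char[] password = readPassword(file);
            System.out.println("Password length: " + password.length);
            wipe(password);
        } finally {
            Files.delete(file);
        }
    }
}
```

One ordering detail: KeyManagerFactory.init also takes the password to unlock the private key, so call wipe only after both KeyStore.load and KeyManagerFactory.init have run.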
Interview Questions on This Topic
- Q: Walk me through what happens, step by step, when a Java service makes an HTTPS call to an external API and the TLS handshake fails with PKIX path building failed. What are the three most likely root causes in production, and how do you diagnose each one without restarting the service?
- Q: Your new microservices architecture has 80 services doing service-to-service calls. You need to choose between API key authentication and mTLS for service identity. The security team wants mTLS. The platform team says the operational overhead is too high. How do you resolve this, what does your certificate lifecycle automation look like, and what certificate TTL do you choose and why?
- Q: A developer on your team says: 'I configured OCSP stapling in Nginx for our internal services so we can revoke certificates immediately if a key is compromised.' What's wrong with this approach for an internal microservices mesh, and what would you replace it with?
- Q: Your internal CA root certificate is valid for 10 years. One of your engineers suggests making it 20 years to reduce operational burden. What are the security and operational arguments for and against a long-lived Root CA certificate, and how does the offline key storage model affect your answer?
Frequently Asked Questions
Why does my HTTPS connection work in the browser but fail in my Java application with PKIX path building failed?
Your browser has its own or the OS trust store and can usually recover from a missing Intermediate CA by AIA fetching or caching. Your JVM does neither by default. The JVM checks only its own cacerts trust store (or whatever SSLContext you configured), and if the server's certificate chain is incomplete or the Root CA isn't in cacerts, it hard-fails. Fix it by either importing the missing CA into your JVM's cacerts with keytool -import, or building a custom SSLContext backed by a truststore that contains the CA. For internal CAs, always use a custom truststore and never import internal CAs into the global cacerts.
What's the difference between a keystore and a truststore in Java TLS?
A keystore holds your own private key and certificate: your identity. A truststore holds certificates of CAs you're willing to trust: your list of acceptable issuers. In a standard HTTPS client you only need a truststore. In mTLS you need both: the truststore to validate the server's cert, and the keystore to present your own cert to the server. The JVM uses the same KeyStore class for both. The distinction is purely how you wire it into KeyManagerFactory versus TrustManagerFactory.
How do I rotate a TLS certificate in production without dropping active connections?
Deploy the new certificate to your load balancer or reverse proxy before the old one expires, then trigger a graceful reload. Tools like Nginx and HAProxy keep established connections running through a reload, because the certificate is only checked during the handshake, and every new handshake gets the new certificate. Keep the old certificate available for rollback for at least one full connection timeout window (typically 15-30 minutes), then remove it. For mTLS client certs, the server's truststore must contain both old and new CA certs during the rotation window if the issuing CA is changing. Automate the whole sequence with cert-manager or Vault's PKI Secrets Engine, because manual rotation is how you create 3 AM incidents.
What actually happens when a Root CA certificate expires, and how do you plan for it?
When a Root CA expires, every certificate it signed, and every Intermediate CA it signed, becomes immediately invalid from the perspective of clients that enforce expiry on the trust anchor. This is not theoretical: in May 2020, the AddTrust External CA Root expired and broke thousands of services whose clients built the chain up to the expired root, even though a valid alternative path existed through a newer cross-signed root. The fix isn't just renewing the root. You must update every client's trust store to include the new root before the old one expires. For internal PKI, start planning Root CA rotation at least 12 months before expiry: new root issuance, cross-signing with the old root, rolling out the new root to all truststores, then cutting over. Don't wait for the 30-day alert.
Developer and founder of TheCodeForge. I built this site because I was tired of tutorials that explain what to type without explaining why it works. Every article here is written to make concepts actually click.