Visual reference for microservices architecture — API gateway, services, message bus, per-service databases, service discovery and observability.

| Pattern | Purpose | Key Rule |
|---|---|---|
| API Gateway | Single entry point for all clients | Handles auth, rate limiting, routing, SSL termination |
| Database per Service | Each service owns its data store | No service directly queries another's DB |
| Event-Driven / Message Bus | Async communication between services | Producer emits events; consumers react independently |
| Service Discovery | Services find each other dynamically | Client-side (Eureka) or server-side (AWS ALB) lookup |
| Circuit Breaker | Prevent cascading failures | Open the circuit after N consecutive failures; after a cooldown, go half-open and allow a trial request |
| Saga Pattern | Distributed transactions across services | Choreography (events) or Orchestration (central coordinator) |
| Strangler Fig | Migrate monolith incrementally | Route traffic to new services one feature at a time |
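
The circuit breaker row above can be sketched as a small state machine. This is a minimal illustration, not any particular library's API: the names `CircuitBreaker`, `CircuitOpenError`, `max_failures` and `reset_timeout` are assumptions made for the example, and the clock is injectable so the behaviour can be checked without sleeping.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    # Hypothetical helper: opens after `max_failures` consecutive failures,
    # then allows a trial call once `reset_timeout` seconds have passed.
    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # Fail fast instead of piling load onto a struggling dependency.
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call (half-open) or too many failures (closed)
            # opens the circuit and restarts the cooldown.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In practice a service would typically keep one breaker per downstream dependency and wrap each outbound call in `breaker.call(...)`; production libraries add concerns this sketch leaves out, such as thread safety and sliding failure windows.
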

| Type | Protocol | Use When |
|---|---|---|
| Synchronous | REST / gRPC / GraphQL | Immediate response required (queries, user-facing requests) |
| Asynchronous | Kafka / RabbitMQ / SQS | Fire-and-forget, eventual consistency acceptable |
| Streaming | Kafka / gRPC streaming | Continuous data (logs, events, real-time pipelines) |
| Service Mesh | Istio / Linkerd (sidecar) | mTLS, retries, observability without app code changes |

| Strategy | What It Gives You | How It Works |
|---|---|---|
| Database per service | No shared state | Each service picks its own DB type (SQL, NoSQL, cache) |
| CQRS | Separate read/write models | Write to command store; project to read-optimised view |
| Event Sourcing | Full audit trail, replayable | Store events not current state; rebuild from log |
| Shared cache | Cross-service read speed | Redis/Memcached for session, rate limits, hot data |
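
The CQRS and Event Sourcing rows can be illustrated together: writes append events to a log, and a read-optimised view is rebuilt by replaying that log. The following is a minimal in-memory sketch; `Event`, `EventStore`, `project_balances` and the bank-account domain are hypothetical names chosen for the example, and a real system would use a durable store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account_id: str
    kind: str    # "deposited" or "withdrawn" (assumed event types)
    amount: int  # minor units, e.g. cents


class EventStore:
    """Append-only log: the write side stores events, never current state."""

    def __init__(self):
        self._log = []

    def append(self, event):
        self._log.append(event)

    def replay(self):
        return iter(self._log)


def project_balances(events):
    """Read side: fold the event log into a balance-per-account view."""
    balances = {}
    for event in events:
        sign = 1 if event.kind == "deposited" else -1
        balances[event.account_id] = balances.get(event.account_id, 0) + sign * event.amount
    return balances
```

Because the view is derived entirely from the log, it can be dropped and rebuilt at any time, or projected into a different shape, without touching the write side.
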

| Pillar | Tool Examples | What to Track |
|---|---|---|
| Metrics | Prometheus + Grafana | Request rate, errors, duration (RED) per service; utilisation, saturation, errors (USE) per resource such as CPU and memory |
| Logs | ELK Stack / Loki | Structured JSON logs with correlation ID per request |
| Traces | Jaeger / Zipkin / OpenTelemetry | Full request path across services with span timings |
| Health checks | /health, /ready endpoints | Liveness (is it alive?) vs Readiness (can it serve traffic?) |
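
The "structured JSON logs with correlation ID per request" entry can be sketched with only the Python standard library. This is one possible arrangement, not a prescribed setup: `JsonFormatter`, `build_logger`, `handle_request` and the `correlation_id` field name are assumptions made for the example.

```python
import contextvars
import json
import logging
import uuid

# Holds the current request's ID; "-" means "no request in progress".
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def build_logger(name, stream):
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


def handle_request(logger, incoming_id=None):
    # Reuse the ID forwarded by the caller; otherwise start a new one.
    token = correlation_id.set(incoming_id or str(uuid.uuid4()))
    try:
        logger.info("order received")
    finally:
        correlation_id.reset(token)
```

If every service forwards the same ID on its outbound calls, a single search on `correlation_id` in the log backend returns the whole request path.
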

| Pattern | Risk | Rollback |
|---|---|---|
| Blue-Green | Low — instant switch | Swap traffic back to old environment |
| Canary | Low — gradual rollout | Reduce canary weight to 0% |
| Rolling | Medium — mixed versions briefly | Pause rollout, re-deploy old version |
| Feature Flags | Low — code ships dark | Toggle flag off instantly |
| A/B Testing | Low — intentional split | Route all to control variant |
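
The canary row's "reduce canary weight to 0%" rollback can be made concrete with a weighted router. The sketch below assumes traffic is split by hashing a user ID into one of 100 buckets, which is one common approach rather than the only one; `bucket` and `route` are hypothetical helper names.

```python
import hashlib


def bucket(user_id, buckets=100):
    """Map a user ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id, canary_percent):
    """Send roughly `canary_percent` of users to the canary release."""
    return "canary" if bucket(user_id) < canary_percent else "stable"
```

Hashing keeps each user on the same version between requests, and raising the weight only ever adds users to the canary. Setting `canary_percent` to 0 sends everyone back to the stable version.
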

| Pattern | What It Does | Tool / Standard |
|---|---|---|
| mTLS (mutual TLS) | Both client and server authenticate with certificates | Istio, Linkerd — automatic in service mesh |
| JWT / OAuth2 | Stateless auth token passed in Authorization header | Keycloak, Auth0, AWS Cognito |
| API Gateway Auth | Validate token once at gateway, forward identity | Kong, AWS API Gateway, Nginx |
| Secrets Management | Inject secrets at runtime, never bake into images | HashiCorp Vault, AWS Secrets Manager, K8s Secrets |
| Zero-Trust Networking | Never trust, always verify — even internal traffic | Network policies + mTLS + RBAC |
| Rate Limiting | Protect services from abuse and DDoS | API Gateway, Redis token bucket, Envoy |
| Input Validation | Validate at each service boundary, not just gateway | Never assume upstream has sanitised input |
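
The rate limiting row mentions a token bucket. A minimal single-process sketch of that algorithm follows; the table's Redis variant would keep the same two values (token count and last-update time) in a shared store instead of in memory. `TokenBucket`, `capacity` and `refill_rate` are names assumed for the example.

```python
import time


class TokenBucket:
    # Hypothetical in-memory limiter: allows bursts of up to `capacity`
    # requests and a sustained rate of `refill_rate` requests per second.
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)
        self.updated_at = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        elapsed = now - self.updated_at
        # Refill lazily, based on time since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A gateway would usually keep one bucket per client key (API key, user ID, or IP address) and answer with HTTP 429 when `allow()` returns `False`.
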

| Level | What to Test | Tool Examples |
|---|---|---|
| Unit | Business logic in isolation, no I/O | JUnit, pytest, Jest — mock all dependencies |
| Integration | Service + its own DB/cache/queue | Testcontainers — spin up real dependencies |
| Contract (Consumer-Driven) | API contract between consumer and provider | Pact — catch breaking changes before deploy |
| Component | Single service end-to-end with mocked downstream | WireMock, MockServer |
| E2E | Full user journey across services | Playwright, Cypress, Postman/Newman — keep suite small |
| Chaos / Resilience | Service behaviour under failure conditions | Chaos Monkey, Gremlin, Toxiproxy |
| Performance / Load | Throughput, latency, autoscaling behaviour | k6, Gatling, Locust |
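
The idea behind the consumer-driven contract row can be shown without any tooling: the consumer declares the fields and types it relies on, and the provider's response is checked against that declaration. This hand-rolled sketch is not Pact's API; `contract_violations` and the order fields are hypothetical.

```python
def contract_violations(contract, response):
    """Return a list of ways `response` breaks the consumer's `contract`.

    `contract` maps each field the consumer reads to its expected type.
    Extra fields in the response are allowed, so providers can add data
    without breaking existing consumers.
    """
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            actual = type(response[field]).__name__
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, got {actual}"
            )
    return problems
```

Run in the provider's build against each consumer's declared contract, a non-empty result flags a breaking change before deployment.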