Interview Prompt
Design an API Gateway (like Kong or AWS API Gateway).
Clarifying Questions (ask before designing)
| Question | Why it matters |
|---|---|
| Are we designing the data plane or the control plane? | The data plane must be ultra-low latency; the control plane is a standard CRUD system. |
| What scale of traffic must it handle? | Determines if we need a highly optimized proxy core (like Nginx/Envoy) or if a Node.js/Go proxy is sufficient. |
| What cross-cutting features are required? | Rate limiting, auth, and logging require different integrations with external services (Redis, IAM, ELK). |
Scope
In scope
- Data plane (Request routing and proxying)
- Control plane (Configuration management)
- Plugin execution model (Auth, Rate Limiting)
- High availability and scaling
Out of scope (state explicitly)
- Detailed implementation of the backend microservices
- Billing systems for monetization
Assumptions
- Must add minimal latency (single-digit milliseconds) to requests
- Must support dynamic configuration without dropping connections
- Will sit at the edge of the network handling millions of RPS
These foundational concepts underpin the patterns used in this problem. Review them before deep-diving into component-level trade-offs.
- Routing: Route incoming HTTP requests to the correct backend microservice based on URL paths and headers.
- Authentication & Authorization: Validate tokens (e.g., JWT) before forwarding requests.
- Rate Limiting: Prevent abuse by limiting the number of requests per user/IP over a time window.
- Load Balancing & Retry: Distribute traffic across healthy backend instances and retry failed requests.
- Dynamic Configuration: Routes and limits must be updatable without restarting the gateway (zero downtime).
- Ultra-Low Latency: The gateway sits in the critical path of every request, so its added overhead must stay in the low single-digit milliseconds (ideally under 2 ms).
- High Throughput: Must handle tens of thousands of concurrent connections efficiently.
- Fault Tolerance: The gateway must not crash if backend services fail (Circuit Breaking).
- Scalability: Easily horizontally scalable by adding more gateway nodes behind an L4 Load Balancer.
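The routing requirement above can be illustrated with a longest-prefix match. This is a minimal sketch, not how any particular gateway implements it: the `Route` type, the `match_route` helper, and the example hostnames are all hypothetical, and header-based matching is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    path_prefix: str  # e.g. "/api/v1/payments"
    backend: str      # e.g. "http://payment-service.internal:8080"


def _matches(prefix: str, path: str) -> bool:
    # Match on whole path segments, so "/api/v1/payments" does not
    # accidentally match "/api/v1/paymentsX".
    base = prefix.rstrip("/")
    return path == base or path.startswith(base + "/")


def match_route(routes: list[Route], path: str) -> Optional[Route]:
    """Return the route with the longest matching prefix, or None."""
    candidates = [r for r in routes if _matches(r.path_prefix, path)]
    return max(candidates, key=lambda r: len(r.path_prefix), default=None)
```

A linear scan is fine for a handful of routes; with thousands of routes a trie or radix tree keyed on path segments would likely be the better structure.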
Modern API Gateways separate the Control Plane (configuration and management UI) from the Data Plane (the highly optimized proxy servers actually handling the traffic).
1. The Data Plane (Asynchronous I/O)
To achieve high throughput (10,000+ requests/sec per node), gateways like Nginx (which Kong is built on) or Envoy avoid the thread-per-connection model of older Apache or Tomcat servers, where each connection's thread stack consumes RAM and context switching between thousands of threads burns CPU. Instead, they use an event-loop architecture (epoll/kqueue) with non-blocking I/O: a single worker thread handles thousands of concurrent connections, dramatically reducing memory overhead.
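The event-loop idea can be demonstrated in a few lines with Python's `asyncio`. This is only an illustration of the concurrency model, not a proxy: `handle_connection` and `serve` are hypothetical names, and the `asyncio.sleep` call stands in for waiting on a backend response.

```python
import asyncio
import threading
import time


async def handle_connection(conn_id: int, thread_ids: set[int]) -> int:
    # Record which OS thread is running this handler.
    thread_ids.add(threading.get_ident())
    # Simulated backend wait. Awaiting yields control to the event loop
    # instead of blocking the thread, so other connections make progress.
    await asyncio.sleep(0.05)
    return conn_id


async def serve(n_connections: int) -> tuple[int, int, float]:
    """Handle n simulated connections; return (handled, threads used, seconds)."""
    thread_ids: set[int] = set()
    start = time.monotonic()
    results = await asyncio.gather(
        *(handle_connection(i, thread_ids) for i in range(n_connections))
    )
    return len(results), len(thread_ids), time.monotonic() - start
```

Running `asyncio.run(serve(1000))` handles all 1,000 simulated connections on one thread, in roughly the time of a single wait rather than 1,000 sequential ones.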
2. Dynamic Configuration (Control Plane) ⭐
If an engineering team deploys a new microservice, the gateway needs to know the new route. If we have 50 gateway nodes, restarting them to update a YAML config file drops active connections and can cause an outage. Modern gateways solve this by separating the Control Plane from the Data Plane. Gateway nodes hold a watch on a highly available configuration source: a long-lived gRPC stream (as in xDS or etcd), a long-polling query (as in Consul), or periodic polling of a database such as PostgreSQL.
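On the data-plane side, the reload step can be sketched as an atomic swap of an in-memory table. This is a simplified illustration under stated assumptions: `RoutingTable` and its methods are hypothetical names, the watch transport is omitted, and the config source is assumed to deliver full snapshots with a monotonically increasing version.

```python
import threading
from typing import Optional


class RoutingTable:
    """In-memory route table that is replaced wholesale on each config update."""

    def __init__(self) -> None:
        self._routes: dict[str, str] = {}
        self._version = 0
        self._lock = threading.Lock()  # serializes writers only

    def apply_snapshot(self, version: int, routes: dict[str, str]) -> bool:
        """Install a new snapshot; ignore stale or duplicate versions."""
        with self._lock:
            if version <= self._version:
                return False
            # Build the new table fully, then rebind the reference in one step,
            # so request handlers never observe a half-updated table.
            self._routes = dict(routes)
            self._version = version
            return True

    def lookup(self, path_prefix: str) -> Optional[str]:
        # Readers take no lock; they see either the old or the new table.
        return self._routes.get(path_prefix)

    @property
    def version(self) -> int:
        return self._version
```

Because in-flight requests keep using the table they already looked up, no connection needs to be dropped when a new snapshot arrives.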
// Control Plane saves the new route to the configuration store (Consul-style KV path shown):
PUT /v1/kv/gateway/routes/payment-service
{
"path": "/api/v1/payments/*",
"backend": "http://payment-service.internal:8080",
"plugins": ["rate-limit", "jwt-auth"]
}
// All Gateway Data Plane nodes hold a long-polling watch on /gateway/routes/
// They receive the update instantly and reload their routing tables in RAM
// WITHOUT restarting the process (Zero downtime).
3. Distributed Rate Limiting
Rate limiting mitigates API abuse and application-layer request floods. It must be enforced globally across all gateway nodes; otherwise an attacker could bypass the limit by spreading requests across different nodes. A centralized Redis cluster holds the shared counters.
Optimization (Local Caching vs Accuracy): Doing a synchronous Redis round-trip for every request adds 1-2ms of latency. To optimize, gateways often use asynchronous batching: they maintain local counters in RAM and flush them to Redis every second. This trades absolute rate-limit accuracy for much lower latency.
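The batching trade-off can be sketched as follows. This is a toy model under stated assumptions: a plain `dict` stands in for Redis, `BatchedRateLimiter` is a hypothetical name, window expiry is omitted, and `flush()` is called by hand where a real node would run it on a timer.

```python
from collections import defaultdict


class BatchedRateLimiter:
    """Counts requests locally and syncs with a shared store on flush()."""

    def __init__(self, limit: int, shared_store: dict[str, int]) -> None:
        self.limit = limit
        self.shared = shared_store  # stand-in for Redis
        self.pending: defaultdict[str, int] = defaultdict(int)       # not yet flushed
        self.known_global: defaultdict[str, int] = defaultdict(int)  # seen at last flush

    def allow(self, key: str) -> bool:
        # The decision uses only local memory: no network round-trip per request.
        if self.known_global[key] + self.pending[key] >= self.limit:
            return False
        self.pending[key] += 1
        return True

    def flush(self) -> None:
        # Push local increments (an INCRBY in real Redis), then refresh our view.
        for key, delta in self.pending.items():
            self.shared[key] = self.shared.get(key, 0) + delta
        self.pending.clear()
        for key in list(self.known_global):
            self.known_global[key] = self.shared.get(key, 0)
```

With a limit of 5 and two nodes, each node can admit 3 requests for the same user before either flushes, so 6 get through. That overshoot, bounded by the flush interval and node count, is the accuracy being traded for latency.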
4. Authentication (Stateless JWT)
Making a database call to the User Service to validate a session token on every single API request would put the User Service database on the hot path of all traffic and quickly overload it. Instead, the gateway uses stateless JSON Web Tokens (JWT).
Because the JWT is signed with an asymmetric key pair (e.g., RS256), the gateway can verify the token's signature and expiry locally using a cached public key, with no call to the issuer. It then trusts the claims (like user_id and roles) and injects them as HTTP headers (e.g., X-User-Id) for downstream microservices, after stripping any client-supplied copies of those headers, so internal services do not have to re-validate the token themselves.
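The header-injection step deserves care, because a client can send its own X-User-Id header. The sketch below assumes signature verification has already been done by a JWT library and covers only what follows: checking expiry, dropping spoofed identity headers, and injecting trusted ones. The function name, the `X-User-Roles` header, and the `x-user-` prefix convention are assumptions for illustration.

```python
import time
from typing import Any, Mapping, Optional

IDENTITY_HEADER_PREFIX = "x-user-"  # hypothetical naming convention


def build_upstream_headers(
    client_headers: Mapping[str, str],
    claims: Mapping[str, Any],
    now: Optional[float] = None,
) -> dict[str, str]:
    """Build headers for the backend from already-verified JWT claims."""
    now = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise PermissionError("token expired or missing exp claim")

    # Drop identity headers sent by the client so they cannot be spoofed.
    headers = {
        name: value
        for name, value in client_headers.items()
        if not name.lower().startswith(IDENTITY_HEADER_PREFIX)
    }
    # Inject identity taken only from the verified claims.
    headers["X-User-Id"] = str(claims["user_id"])
    headers["X-User-Roles"] = ",".join(claims.get("roles", []))
    return headers
```

Backends can rely on these headers only if they are reachable exclusively through the gateway; otherwise anything on the internal network could forge them.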
5. Circuit Breaker Pattern
If the "Order Service" goes down, the gateway might queue up thousands of requests waiting for it to respond, eventually exhausting the gateway's own connection pool and causing a catastrophic cascading failure across the entire API platform.
- The Circuit Breaker tracks failure rates (e.g., 50% failures in 10 seconds).
- If the threshold is crossed, the circuit Opens. The gateway instantly rejects new requests to the Order Service with 503 Service Unavailable without even trying the network call, giving the backend time to recover.
- After a timeout, it shifts to Half-Open, allowing a small percentage of test requests through. If they succeed, it Closes the circuit.
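The three states above map onto a small state machine. This is a simplified sketch with hypothetical names and defaults: it counts failures since the circuit last closed rather than over a sliding 10-second window, and in Half-Open it admits requests until the first result is recorded rather than sampling a percentage.

```python
import time
from typing import Callable


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: float = 0.5,
        min_requests: int = 10,
        open_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.min_requests = min_requests
        self.open_seconds = open_seconds
        self.clock = clock
        self.state = "closed"
        self.successes = 0
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        """False means: reject immediately with 503, skip the network call."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_seconds:
                return False
            self.state = "half_open"  # timeout elapsed: let probe traffic through
        return True

    def record(self, success: bool) -> None:
        if self.state == "open":
            return  # late result from a request sent before the circuit opened
        if self.state == "half_open":
            # The probe result decides: close on success, re-open on failure.
            self._close() if success else self._open()
            return
        if success:
            self.successes += 1
        else:
            self.failures += 1
        total = self.successes + self.failures
        if total >= self.min_requests and self.failures / total >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()

    def _close(self) -> None:
        self.state = "closed"
        self.successes = 0
        self.failures = 0
```

A gateway would keep one breaker per upstream service (or per replica), call `allow_request()` before proxying, and call `record()` once the response or timeout is known.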
The Gateway itself exposes two entirely separate network interfaces: the Data Plane (port 8000) for routing user traffic, and the Control Plane Admin API (port 8001) for internal configuration.
// Control Plane Admin API: Add a new route
POST /admin/api/routes
{
"name": "payment-service-route",
"paths": ["/api/v1/payments"],
"methods": ["GET", "POST"],
"service": {
"host": "payment.internal",
"port": 8080
},
"plugins": [
{ "name": "rate-limiting", "config": { "minute": 100 } },
{ "name": "jwt" }
]
}
The Gateway requires two completely different data storage systems: a persistent store for routing configuration, and a high-throughput, volatile store for rate limiting state.
// 1. Control Plane Metadata (PostgreSQL / etcd)
// Stores the declarative state of the Gateway
Table routes {
id UUID,
path_prefix VARCHAR,
upstream_service_id UUID
}
Table plugins {
id UUID,
route_id UUID,
plugin_type VARCHAR, // "rate_limit", "auth"
config JSONB // e.g., {"limit": 100, "window": "1m"}
}
// 2. Data Plane Rate Limiting State (Redis)
// Uses highly efficient Lua scripts for atomic increment + expiry
Key: "rate_limit:user_123:2023-10-01T12:05" (String or Hash)
Value: 42 (Current request count for this minute)
TTL: 60 seconds
| Failure Case | System Solution Design |
|---|---|
| Redis Rate Limiter Crash | The Gateway must be configured to 'Fail Open'. If Redis is unreachable, allow the request through rather than taking down the entire API. |
| Control Plane Crash | Data Plane nodes cache their routing rules heavily in local memory. If the Control Plane goes down, traffic keeps flowing normally; we just temporarily lose the ability to add new routes. |
| Backend Service Timeout | Gateway enforces strict timeouts. If a backend replica times out, the Gateway can automatically retry idempotent requests (like HTTP GET) against a different healthy backend replica. |
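The retry rule in the last row can be sketched as below. The names are hypothetical, `send` is a stand-in for the actual proxy call, and `replicas` is assumed to be pre-filtered to healthy instances. The method set follows the HTTP definition of idempotent methods; TRACE is left out because gateways rarely forward it.

```python
from typing import Callable, Optional, Sequence

# Methods that HTTP semantics define as idempotent (safe to repeat).
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "PUT", "DELETE"})


def forward_with_retry(
    method: str,
    replicas: Sequence[str],
    send: Callable[[str], str],
    max_attempts: int = 2,
) -> str:
    """Try replicas in order; only idempotent methods get more than one attempt."""
    if not replicas:
        raise RuntimeError("no healthy replicas available")
    attempts = max(1, max_attempts) if method.upper() in IDEMPOTENT_METHODS else 1
    last_error: Optional[Exception] = None
    # Each attempt goes to a different replica, never the one that just timed out.
    for replica in replicas[:attempts]:
        try:
            return send(replica)
        except TimeoutError as exc:
            last_error = exc
    assert last_error is not None
    raise last_error
```

A POST is never retried here because the first attempt may have been applied before the timeout fired; retrying it safely needs something extra, such as an idempotency key that the backend deduplicates on.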
API Gateway vs Service Mesh (Istio)
An API Gateway handles "North-South" traffic (external clients entering the datacenter). A Service Mesh handles "East-West" traffic (internal microservices talking to each other). While they share technologies (Envoy is often used for both), they serve different purposes. Gateways focus on Edge security (WAF, OAuth), while Service Meshes focus on internal mTLS, distributed tracing, and complex routing between hundreds of internal pods.
Staff interviews expect you to articulate how the system evolves under real growth — not jump straight to the final architecture.
Phase 1: Simple Reverse Proxy
A monolithic Nginx or HAProxy instance with a static configuration file.
Key components: Nginx · Static Config File
Move to next phase when: Restarting Nginx to add a route causes downtime; lacks advanced rate limiting.
Phase 2: Dynamic Gateway
Introduce a database to hold configurations and a control plane API. The gateway dynamically pulls routes from the DB.
Key components: Control Plane API · PostgreSQL DB · Redis (Rate Limiting) · Dynamic Data Plane
Move to next phase when: The database becomes a bottleneck; custom business logic is needed at the edge.
Phase 3: Extensible & Distributed
Adopt a robust proxy core (Envoy/Kong), separate control/data planes via gRPC streams (xDS), and support Wasm/Lua plugins.
Key components: Envoy / Kong Data Plane · xDS Control Plane · Plugin Engine · Global Anycast Edge
Move to next phase when: Global multi-region deployments require decentralized data planes.
SLOs & Error Budgets
| Metric | Target | Rationale |
|---|---|---|
| Added Latency | 99% < 5ms | The gateway is pure overhead; it must be blazing fast. |
| Availability | 99.999% | It is the front door. If it is down, the entire platform is down. |
| Configuration Reload Time | 99% < 100ms | Hot reloads must propagate near-instantly so that nodes do not keep routing on stale configuration, and must never drop connections. |
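It helps to turn the availability target into a concrete error budget. The helper below is a small illustrative calculation; the function name is hypothetical and the 30-day and 365-day periods are just conventional choices.

```python
def downtime_budget_seconds(availability: float, period_days: float) -> float:
    """Seconds of allowed downtime for a given availability over a period."""
    return (1.0 - availability) * period_days * 24 * 60 * 60
```

At 99.999%, the budget is about 26 seconds per 30 days, or about 5.3 minutes per year. A single bad config push that takes a minute to roll back would spend more than two months of budget, which is why canarying and instant rollback matter so much for this component.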
Incident Scenarios (2am reality)
| Scenario | How you detect | Mitigation |
|---|---|---|
| Bad Configuration Deployed | Global spike in 5xx errors originating from the gateway. | The control plane must support versioned configs and instant rollbacks. Implement canary deployments for gateway configs. |
| Redis goes down (Rate Limiting Dependency) | Gateway latency spikes as Redis connections time out. | Fail open. The gateway should bypass rate limiting rather than blocking traffic, ensuring core routing remains available. |
| DDoS Attack | Massive spike in ingress traffic from suspicious IPs. | Drop traffic at the edge (CDN/WAF level). Use IP-based rate limiting plugins. Implement CAPTCHA challenges for suspect traffic. |
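For the bad-configuration scenario, the "versioned configs and instant rollbacks" mitigation can be sketched as an append-only history with a movable pointer. `VersionedConfigStore` is a hypothetical in-memory stand-in; a real control plane would persist the history and push the active version to data plane nodes.

```python
from typing import Any


class VersionedConfigStore:
    """Keeps every published config so the previous one can be restored."""

    def __init__(self, initial: dict[str, Any]) -> None:
        self._history: list[dict[str, Any]] = [dict(initial)]
        self._active = 0  # index of the version currently served

    def publish(self, config: dict[str, Any]) -> int:
        """Append a new version and make it active; returns its version index."""
        self._history.append(dict(config))
        self._active = len(self._history) - 1
        return self._active

    def rollback(self) -> int:
        """Re-activate the version before the current one, without deleting anything."""
        if self._active == 0:
            raise RuntimeError("no earlier version to roll back to")
        self._active -= 1
        return self._active

    @property
    def active(self) -> dict[str, Any]:
        return self._history[self._active]
```

Keeping the bad version in the history, rather than deleting it, preserves the evidence needed for the post-incident review.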
Cost Drivers (Staff lens)
- Compute instances (Gateway must be over-provisioned to absorb spikes)
- Network Egress/Ingress bandwidth
- Redis cluster for rate limiting and session management
Multi-Region & DR
API Gateways are deployed regionally: you run a cluster of gateways in every region, and a global load balancer (for example Anycast or latency-based DNS) routes each user to the nearest one. The Control Plane can be centralized or regionally replicated.