Design an API Gateway (Kong / Envoy)

This problem appears in multiple sheets. Depth expectations increase as you progress:

Track	What to demonstrate
Arch 50	Focus on reverse proxying, routing, and basic cross-cutting concerns like rate limiting.
Arch 75	Staff angles: Data plane vs Control plane separation, plugin architectures, and zero-downtime config reloads.

Interview Prompt

Design an API Gateway (like Kong or AWS API Gateway).

Clarifying Questions (ask before designing)

Question	Why it matters
Are we designing the data plane or the control plane?	The data plane must be ultra-low latency; the control plane is a standard CRUD system.
What scale of traffic must it handle?	Determines if we need a highly optimized proxy core (like Nginx/Envoy) or if a Node.js/Go proxy is sufficient.
What cross-cutting features are required?	Rate limiting, auth, and logging require different integrations with external services (Redis, IAM, ELK).

Scope

In scope

Data plane (Request routing and proxying)
Control plane (Configuration management)
Plugin execution model (Auth, Rate Limiting)
High availability and scaling

Out of scope (state explicitly)

Detailed implementation of the backend microservices
Billing systems for monetization

Assumptions

Must add minimal latency (single-digit milliseconds) to requests
Must support dynamic configuration without dropping connections
Will sit at the edge of the network handling millions of RPS

Routing: Route incoming HTTP requests to the correct backend microservice based on URL paths and headers.
Authentication & Authorization: Validate tokens (e.g., JWT) before forwarding requests.
Rate Limiting: Prevent abuse by limiting the number of requests per user/IP over a time window.
Load Balancing & Retry: Distribute traffic across healthy backend instances and retry failed requests.
Dynamic Configuration: Routes and limits must be updatable without restarting the gateway (zero downtime).

Modern API Gateways separate the Control Plane (configuration and management UI) from the Data Plane (the highly optimized proxy servers actually handling the traffic).

1. The Data Plane (Asynchronous I/O)

To achieve high throughput (10,000+ requests/sec per node), gateways built on Nginx (the basis of Kong) or Envoy avoid the thread-per-connection model of older Apache or Tomcat deployments, where every idle connection pins a thread stack and context-switching cost grows with concurrency. Instead, they run an event loop over non-blocking sockets (epoll on Linux, kqueue on BSD/macOS). A small, fixed number of workers (roughly one per CPU core) each multiplex thousands of concurrent connections, which keeps memory and scheduling overhead low.
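
To make the event-loop idea concrete, here is a minimal sketch using Python's asyncio, whose default loop on Linux sits on top of epoll. It is an illustration, not how Nginx or Envoy are implemented: `handle_connection` and `serve` are hypothetical names, and `asyncio.sleep` stands in for a non-blocking upstream call.

```python
import asyncio
import threading
import time


async def handle_connection(conn_id: int, upstream_latency_s: float, threads_seen: set) -> int:
    # Record which OS thread runs this connection's handler.
    threads_seen.add(threading.get_ident())
    # Stand-in for a non-blocking upstream call: while this connection
    # waits, the loop is free to make progress on other connections.
    await asyncio.sleep(upstream_latency_s)
    return conn_id


async def serve(n_connections: int, upstream_latency_s: float):
    threads_seen: set = set()
    started = time.perf_counter()
    results = await asyncio.gather(
        *(handle_connection(i, upstream_latency_s, threads_seen)
          for i in range(n_connections))
    )
    elapsed = time.perf_counter() - started
    # (connections handled, distinct threads used, wall-clock seconds)
    return len(results), len(threads_seen), elapsed


if __name__ == "__main__":
    handled, threads, elapsed = asyncio.run(serve(2000, 0.05))
    print(f"{handled} connections on {threads} thread(s) in {elapsed:.2f}s")
```

All 2,000 simulated connections share one thread and overlap their waits. Handling them one after another on a single blocking thread would take about 2000 x 0.05s = 100s.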

2. Dynamic Configuration (Control Plane) ⭐

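If an engineering team deploys a new microservice, the gateway needs to learn the new route. With 50 gateway nodes, restarting each one to pick up a changed YAML file drops active connections and risks an outage. Modern gateways avoid this by separating the Control Plane from the Data Plane: data plane nodes subscribe to changes in a highly available configuration store (like etcd, Consul, or PostgreSQL) and apply them in memory. The subscription mechanism depends on the stack: Envoy receives updates over gRPC streams (xDS), etcd exposes a streaming watch API, Consul supports long-polling (blocking) queries, and a PostgreSQL-backed gateway typically polls the database or has updates pushed by its control plane.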

JSON

// Control Plane writes the new route to the config store (the /v1/kv/ path shown is Consul-style):
PUT /v1/kv/gateway/routes/payment-service
{
  "path": "/api/v1/payments/*",
  "backend": "http://payment-service.internal:8080",
  "plugins": ["rate-limit", "jwt-auth"]
}

// All Gateway Data Plane nodes watch the /gateway/routes/ prefix.
// They receive the update shortly after the write and swap their in-memory routing tables
// WITHOUT restarting the process (zero downtime).
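
One way a data plane node could apply such an update without a restart is to build a new immutable routing table and swap a single reference. The sketch below assumes prefix matching with longest-prefix-wins; `Route`, `Router`, and `apply_update` are hypothetical names, and the trailing `*` from the path pattern is assumed to be stripped before the route is stored.

```python
import threading
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    path_prefix: str
    backend: str
    plugins: Tuple[str, ...] = ()


class Router:
    def __init__(self) -> None:
        self._routes: Tuple[Route, ...] = ()   # immutable snapshot
        self._write_lock = threading.Lock()    # serializes config updates only

    def match(self, path: str) -> Optional[Route]:
        routes = self._routes  # take a snapshot; a concurrent swap cannot affect it
        for route in routes:   # kept sorted longest prefix first
            if path.startswith(route.path_prefix):
                return route
        return None

    def apply_update(self, route: Route) -> None:
        """Called by the config watcher when the control plane publishes a route."""
        with self._write_lock:
            kept = [r for r in self._routes if r.path_prefix != route.path_prefix]
            kept.append(route)
            kept.sort(key=lambda r: len(r.path_prefix), reverse=True)
            self._routes = tuple(kept)  # one reference swap; no restart needed


if __name__ == "__main__":
    router = Router()
    router.apply_update(Route("/api/v1/payments/",
                              "http://payment-service.internal:8080",
                              ("rate-limit", "jwt-auth")))
    print(router.match("/api/v1/payments/42"))
```

Requests already in flight keep using the snapshot they started with, so a reload never invalidates a table that a request is halfway through reading.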

3. Distributed Rate Limiting

Rate limiting curbs API abuse and blunts request floods (large volumetric DDoS attacks still need to be absorbed upstream, at the CDN/WAF). It must be enforced globally across all gateway nodes; otherwise an attacker can bypass the limit by spreading requests across nodes. A shared Redis cluster holds the counters.

Optimization (Local Caching vs Accuracy): A synchronous Redis round-trip on every request adds roughly 1-2ms of latency. To avoid it, gateways often batch asynchronously: each node keeps local counters in RAM and flushes them to Redis about once per second. This trades strict accuracy for lower latency, because between flushes each node sees only its own traffic and the global limit can be overshot by whatever the other nodes admitted in that interval.
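
The trade-off can be seen in a small sketch. Everything here is an assumption for illustration: `FakeRedis` is an in-memory stand-in for the shared counter, `BatchedLimiter` is a hypothetical per-node limiter, and window expiry is left out to keep the example short.

```python
class FakeRedis:
    """In-memory stand-in for the shared Redis counter (not a real client)."""

    def __init__(self) -> None:
        self.counts = {}

    def incrby(self, key: str, amount: int) -> int:
        self.counts[key] = self.counts.get(key, 0) + amount
        return self.counts[key]


class BatchedLimiter:
    """One instance per gateway node; decisions are made from local state only."""

    def __init__(self, store: FakeRedis, limit: int) -> None:
        self.store = store
        self.limit = limit
        self.pending = {}      # local increments not yet flushed
        self.last_global = {}  # global count as of this node's last flush

    def allow(self, key: str) -> bool:
        estimate = self.last_global.get(key, 0) + self.pending.get(key, 0)
        if estimate >= self.limit:
            return False
        self.pending[key] = self.pending.get(key, 0) + 1
        return True

    def flush(self) -> None:
        # Intended to run on a timer (e.g. once per second), off the request path.
        for key, delta in self.pending.items():
            self.last_global[key] = self.store.incrby(key, delta)
        self.pending.clear()


if __name__ == "__main__":
    store = FakeRedis()
    node_a, node_b = BatchedLimiter(store, 5), BatchedLimiter(store, 5)
    admitted = sum(node_a.allow("user_123") for _ in range(4))
    admitted += sum(node_b.allow("user_123") for _ in range(4))
    node_a.flush()
    node_b.flush()
    print(admitted, store.counts["user_123"], node_b.allow("user_123"))
```

With a limit of 5, the two nodes together admit 8 requests before their first flush. That overshoot is the accuracy being traded away, and it is bounded by the flush interval and the number of nodes.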

4. Authentication (Stateless JWT)

Making a database call to the User Service to validate a session token on every API request would put the User Service database on the hot path of all traffic, turning it into both a bottleneck and a single point of failure. Instead, the gateway uses stateless JSON Web Tokens (JWT).

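Because the JWT is signed with an asymmetric key pair (e.g., RS256, which is RSA with SHA-256), the gateway can verify the token's authenticity locally using only the issuer's public key, with no call to the User Service. After checking the signature and expiry, it trusts the claims (like user_id and roles) and injects them as HTTP headers (e.g., X-User-Id) for downstream microservices. The gateway must first strip any client-supplied copies of those headers, otherwise a caller could spoof an identity. Internal services then need no token-parsing logic, provided they are reachable only through the gateway.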
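
The header-injection step can be sketched as follows. It assumes the signature has already been verified by a JWT library and that `claims` is the decoded payload. `build_upstream_headers` and the `X-User-Roles` header are hypothetical; `exp` is the standard JWT expiry claim.

```python
import time
from typing import Dict, Mapping, Optional

# Headers the gateway owns. Client-supplied values must never reach a backend.
IDENTITY_HEADERS = ("x-user-id", "x-user-roles")


def build_upstream_headers(
    client_headers: Mapping[str, str],
    claims: Mapping[str, object],
    now: Optional[float] = None,
) -> Dict[str, str]:
    """Turn verified JWT claims into trusted headers for the backend."""
    now = time.time() if now is None else now
    if float(claims.get("exp", 0)) <= now:
        raise PermissionError("token expired")

    # Drop any identity headers sent by the client to prevent spoofing.
    forwarded = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in IDENTITY_HEADERS
    }
    forwarded["X-User-Id"] = str(claims["user_id"])
    forwarded["X-User-Roles"] = ",".join(claims.get("roles", []))
    return forwarded


if __name__ == "__main__":
    claims = {"user_id": 123, "roles": ["admin"], "exp": time.time() + 300}
    print(build_upstream_headers({"Accept": "*/*", "x-user-id": "999"}, claims))
```

The spoofed `x-user-id: 999` sent by the client is discarded and replaced by the value from the verified token.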

5. Circuit Breaker Pattern

If the "Order Service" goes down, the gateway might queue up thousands of requests waiting for it to respond, eventually exhausting the gateway's own connection pool and causing a catastrophic cascading failure across the entire API platform.

The Circuit Breaker tracks failure rates (e.g., 50% failures in 10 seconds).
If the threshold is crossed, the circuit Opens. The gateway instantly rejects new requests to the Order Service with 503 Service Unavailable without even trying the network call, giving the backend time to recover.
After a timeout, it shifts to Half-Open, allowing a small percentage of test requests through. If they succeed, it Closes the circuit; if they fail, it Opens again and the timeout restarts.
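
The three states above can be sketched as a small state machine. This is a simplified illustration with hypothetical names and defaults: it counts calls since the last reset rather than over a sliding 10-second window, and Half-Open admits a single probe request rather than a percentage.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_ratio=0.5, min_calls=10, open_seconds=30.0,
                 clock=time.monotonic):
        self.failure_ratio = failure_ratio
        self.min_calls = min_calls        # avoid tripping on tiny samples
        self.open_seconds = open_seconds  # how long to stay Open
        self.clock = clock
        self.state = "closed"
        self.successes = 0
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "closed":
            return True
        if self.state == "open":
            if self.clock() - self.opened_at >= self.open_seconds:
                self.state = "half_open"
                return True   # admit one probe request
            return False      # caller should answer 503 without a network call
        return False          # half_open: a probe is already in flight

    def record(self, success: bool) -> None:
        if self.state == "open":
            return            # late result from before the trip; ignore
        if self.state == "half_open":
            if success:
                self._close()
            else:
                self._trip()
            return
        if success:
            self.successes += 1
        else:
            self.failures += 1
        total = self.successes + self.failures
        if total >= self.min_calls and self.failures / total >= self.failure_ratio:
            self._trip()

    def _trip(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.successes = self.failures = 0

    def _close(self) -> None:
        self.state = "closed"
        self.successes = self.failures = 0
```

The gateway would call `allow_request()` before proxying to the Order Service and `record(...)` once the upstream call finishes or times out.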

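The Gateway itself exposes two entirely separate network interfaces: the Data Plane (port 8000, Kong's default proxy port) for routing user traffic, and the Control Plane Admin API (port 8001, Kong's default admin port) for internal configuration. The Admin API should be reachable only from the internal network.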

HTTP

// Control Plane Admin API: Add a new route
POST /admin/api/routes
{
  "name": "payment-service-route",
  "paths": ["/api/v1/payments"],
  "methods": ["GET", "POST"],
  "service": {
    "host": "payment.internal",
    "port": 8080
  },
  "plugins": [
    { "name": "rate-limiting", "config": { "minute": 100 } },
    { "name": "jwt" }
  ]
}

The Gateway requires two completely different data storage systems: a persistent store for routing configuration, and a high-throughput, volatile store for rate limiting state.

TYPESCRIPT

// 1. Control Plane Metadata (PostgreSQL / etcd)
// Stores the declarative state of the Gateway
Table routes {
  id UUID,
  path_prefix VARCHAR,
  upstream_service_id UUID
}

Table plugins {
  id UUID,
  route_id UUID,
  plugin_type VARCHAR, // "rate_limit", "auth"
  config JSONB         // e.g., {"limit": 100, "window": "1m"}
}

// 2. Data Plane Rate Limiting State (Redis)
// Uses highly efficient Lua scripts for atomic increment + expiry
Key: "rate_limit:user_123:2023-10-01T12:05" (String or Hash)
Value: 42 (Current request count for this minute)
TTL: 60 seconds
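
As a sketch of the Redis side, the fixed-window key can be derived from the user and the current UTC minute, and a short Lua script can make "increment, and set the TTL on first hit" atomic. The names `rate_limit_key` and `INCR_WITH_EXPIRY` are hypothetical. The Lua script is shown as a string that would be sent with Redis's EVAL command and is not executed here.

```python
from datetime import datetime, timezone

WINDOW_SECONDS = 60

# Redis runs a Lua script atomically, so no other command can interleave
# between the INCR and the EXPIRE. KEYS[1] is the counter key and ARGV[1]
# is the TTL in seconds. The script returns the count after incrementing.
INCR_WITH_EXPIRY = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""


def rate_limit_key(user_id: str, now: datetime) -> str:
    """Build the per-user, per-minute counter key for a fixed window."""
    minute = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M")
    return f"rate_limit:{user_id}:{minute}"


if __name__ == "__main__":
    print(rate_limit_key("user_123", datetime.now(timezone.utc)))
```

Every request in the same UTC minute maps to the same key. The gateway would reject a request when the count returned by the script exceeds the configured limit.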

Failure Case	System Solution Design
Redis Rate Limiter Crash	The Gateway must be configured to 'Fail Open'. If Redis is unreachable, allow the request through rather than taking down the entire API.
Control Plane Crash	Data Plane nodes cache their routing rules heavily in local memory. If the Control Plane goes down, traffic keeps flowing normally; we just temporarily lose the ability to add new routes.
Backend Service Timeout	Gateway enforces strict timeouts. If a backend replica times out, the Gateway can automatically retry idempotent requests (like HTTP GET) against a different healthy backend replica.
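
The retry rule in the last row can be sketched like this. `forward` and its `send` callback are hypothetical; the method set follows the HTTP definition of idempotent methods, and a real gateway would also apply a retry budget and backoff.

```python
from typing import Callable, Optional, Sequence

# Methods defined as idempotent by the HTTP specification (RFC 9110).
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}


def forward(
    method: str,
    replicas: Sequence[str],
    send: Callable[[str], str],
    max_attempts: int = 2,
) -> str:
    """Try replicas in order; only idempotent requests get a second attempt."""
    attempts = max_attempts if method.upper() in IDEMPOTENT_METHODS else 1
    last_error: Optional[Exception] = None
    for replica in list(replicas)[:attempts]:
        try:
            return send(replica)   # send() is expected to enforce the timeout
        except TimeoutError as exc:
            last_error = exc       # move on to a different replica
    if last_error is None:
        raise ValueError("no healthy replicas available")
    raise last_error
```

A timed-out POST is not retried, because the first attempt may already have taken effect on the backend.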

SLOs & Error Budgets

Metric	Target	Rationale
Added Latency	99% < 5ms	The gateway is pure overhead; it must be blazing fast.
Availability	99.999%	It is the front door. If it is down, the entire platform is down.
Configuration Reload Time	99% < 100ms	Route changes must reach every node quickly so that no node keeps serving stale routes; the reload itself must not drop in-flight connections.
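
To see what the availability target implies as an error budget, convert it into allowed downtime. The helper name `downtime_budget_minutes` is hypothetical and the arithmetic is plain: (1 - availability) x minutes in the period.

```python
def downtime_budget_minutes(availability: float, period_days: int = 30) -> float:
    """Minutes of total unavailability the SLO permits over the period."""
    minutes_in_period = period_days * 24 * 60
    return (1.0 - availability) * minutes_in_period


if __name__ == "__main__":
    for days in (30, 365):
        budget = downtime_budget_minutes(0.99999, days)
        print(f"99.999% over {days} days: {budget:.3f} min ({budget * 60:.0f} s)")
```

At 99.999%, the budget is about 26 seconds per 30 days, or roughly 5.3 minutes per year. That is less than a typical human response time, so recovery has to be automated.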

Incident Scenarios (2am reality)

Scenario	How you detect	Mitigation
Bad Configuration Deployed	Global spike in 5xx errors originating from the gateway.	The control plane must support versioned configs and instant rollbacks. Implement canary deployments for gateway configs.
Redis goes down (Rate Limiting Dependency)	Gateway latency spikes as Redis connections time out.	Fail open with a short Redis timeout. The gateway should bypass rate limiting rather than block traffic, so core routing stays available; alert on the bypass, since limits are unenforced while it lasts.
DDoS Attack	Massive spike in ingress traffic from suspicious IPs.	Drop traffic at the edge (CDN/WAF level). Use IP-based rate limiting plugins. Implement CAPTCHA challenges for suspect traffic.
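
The "versioned configs and instant rollbacks" mitigation from the first row can be sketched as an append-only history with a pointer to the active version. `VersionedConfig` is a hypothetical name, and a real control plane would persist the history in its config store rather than in memory.

```python
import copy
from typing import Any, Dict, List, Optional


class VersionedConfig:
    def __init__(self, initial: Dict[str, Any]) -> None:
        self._history: List[Dict[str, Any]] = [copy.deepcopy(initial)]
        self._active = 0  # index into history is the version number

    @property
    def active_version(self) -> int:
        return self._active

    @property
    def active(self) -> Dict[str, Any]:
        return self._history[self._active]

    def publish(self, config: Dict[str, Any]) -> int:
        """Append a new version and make it active. Old versions are kept."""
        self._history.append(copy.deepcopy(config))
        self._active = len(self._history) - 1
        return self._active

    def rollback(self, to_version: Optional[int] = None) -> int:
        """Re-activate an earlier version (default: the one before the active)."""
        target = self._active - 1 if to_version is None else to_version
        if not 0 <= target < len(self._history):
            raise ValueError(f"no such config version: {target}")
        self._active = target
        return target
```

Because a rollback only moves a pointer, it can propagate to data plane nodes through the same watch mechanism as any other config change.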

Cost Drivers (Staff lens)

Compute instances (Gateway must be over-provisioned to absorb spikes)
Network Egress/Ingress bandwidth
Redis cluster for rate limiting and session management

Multi-Region & DR

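API Gateways are deployed per region: you run a cluster of gateways in every region, and global traffic steering (Anycast or latency-based DNS) sends each user to the nearest healthy cluster. The Control Plane can be centralized or regionally replicated; because data plane nodes cache their configuration, a region keeps serving traffic even if it loses contact with the Control Plane.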
