System Design Problem

Design a Code Hosting Platform (GitHub)

Commonly Asked By: GitHub, GitLab, Atlassian, Microsoft

  • Git Operations: Support git push, git pull, and git clone over SSH and HTTPS.
  • Web Interface: Users can browse the repository file tree, view commit history, and read file contents.
  • Pull Requests: Users can create PRs, diff code, and leave line-by-line comments.
  • Issues & Collaboration: Issue tracking, starring, and forking repositories.

GitHub separates its Web/API tier (which handles UI, PRs, Issues, and DB metadata) from its Git Storage Tier. We can structure this into three distinct layers:

  • API / Gateway Layer: The HAProxy / Nginx load balancers and SSH entry nodes that terminate connections and route HTTP/SSH Git commands to the appropriate backend routers.
  • Service Layer: Includes the Web Fleet (Ruby on Rails apps handling the UI, Pull Requests, and Issues), the Git Router (maps repo URLs to internal storage nodes), and background CI/CD Worker fleets.
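  • Data / Storage Layer: The tier that holds all durable state. It consists of PostgreSQL for relational metadata (users, PRs), Redis for caching and Sidekiq job queues, and Git storage nodes running Gitaly, which serve the actual Git objects (commits, trees, blobs) over RPC and are replicated for high availability. Note that PostgreSQL, Sidekiq, and Gitaly are the components GitLab uses; GitHub itself keeps metadata in MySQL and replicates repositories with its own system, Spokes.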
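
To make the Git Router's job concrete, here is a minimal Python sketch of mapping a repo URL to a storage node. It is an illustration under stated assumptions, not how GitHub or GitLab actually implement routing: the names `parse_remote`, `Placement`, and `GitRouter`, the node names, and the in-memory placement table are all hypothetical. A real router would read placements from the metadata database and would also handle health checks and failover.

```python
import re
import zlib
from dataclasses import dataclass

# scp-style or ssh:// remotes (git@host:owner/repo.git) and HTTPS remotes.
_SSH_RE = re.compile(
    r"^(?:ssh://)?git@[^:/]+[:/](?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)
_HTTPS_RE = re.compile(
    r"^https://[^/]+/(?P<owner>[^/]+)/(?P<repo>[^/]+?)(?:\.git)?/?$"
)


def parse_remote(url: str) -> str:
    """Return the canonical 'owner/repo' key for an SSH or HTTPS remote URL."""
    for pattern in (_SSH_RE, _HTTPS_RE):
        match = pattern.match(url)
        if match:
            return f"{match['owner']}/{match['repo']}"
    raise ValueError(f"unsupported remote URL: {url}")


@dataclass(frozen=True)
class Placement:
    """Hypothetical placement record: where one repository lives."""
    primary: str                # node that accepts writes (git push)
    replicas: tuple[str, ...]   # nodes that may serve reads (clone, pull)


class GitRouter:
    """Maps a repo URL to an internal storage node (illustrative only)."""

    def __init__(self, placements: dict[str, Placement]):
        # Assumption: an in-memory table stands in for a metadata DB lookup.
        self._placements = placements

    def route(self, url: str, write: bool) -> str:
        key = parse_remote(url)
        placement = self._placements.get(key)
        if placement is None:
            raise KeyError(f"unknown repository: {key}")
        # Writes always go to the primary; so do reads when no replica exists.
        if write or not placement.replicas:
            return placement.primary
        # Pick a replica deterministically from a checksum of the repo key.
        index = zlib.crc32(key.encode("utf-8")) % len(placement.replicas)
        return placement.replicas[index]


router = GitRouter(
    {"octo/demo": Placement("git-node-1", ("git-node-2", "git-node-3"))}
)
print(router.route("git@example.com:octo/demo.git", write=True))
print(router.route("https://example.com/octo/demo.git", write=False))
```

Both URL forms resolve to the same `owner/repo` key, so SSH and HTTPS traffic share one routing path. Sending every write to a single primary keeps ref updates ordered, while reads can be served from replicas.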