SQL vs NoSQL Databases – System Design Core Concept

What:

Structured Relational databases (PostgreSQL, MySQL) mapping strict tables vs unstructured/semi-structured Non-Relational stores (Cassandra, DynamoDB, MongoDB).

Primary purpose:

Selecting the optimal data persistence paradigm based on transaction safety requirements, scalability limits, and query dynamics.

Usually used for:

Transactional records, distributed session storage, real-time message streams, and heavy full-text search indexes.

How should I think about this inside system architectures?

💍 Relational Normalization

Eliminate duplication. Normalize records into distinct primary tables and join them on-the-fly using foreign key references.

🌾 Denormalized Wide-Column

Duplicate data aggressively. In NoSQL (Cassandra), write your schemas exactly to match your read query patterns (one table per query).

🏗️ Polyglot Persistence

Never select 'one DB for everything'. Deploy PostgreSQL for financials, Redis for sessions, and Cassandra for telemetry history.

Storage Paradigm Classifications: Structuring persistence bounds:

Storage Style	Schema Model	Transaction Bound	Scaling Paradigm
Relational (SQL)	Strictly typed tabular schemas	Full ACID transaction support	Vertical scaling (rely on read replicas for reads)
Key-Value NoSQL	Schemaless key-value mappings	Atomic single-key updates	Horizontal scaling (hash partitioned arrays)
Wide-Column NoSQL	Column family groupings	Row-level atomicity	Horizontal masterless ring arrays

Benefit	Cost
Strict ACID Guarantees (relational consistency checks ensure zero double-spending or orphan references)	Horizontal Scaling Walls (splitting tables across multiple nodes requires distributed joints and 2PC coordination)
Scalable Write Concurrency (wide-column systems bypass locking completely using append-only LSM-trees, scaling linearly)	Limited Query Flexibility (unable to join tables on-the-fly; must pre-denormalize data schemas per read pattern)

Production System	Selected Engine	Architectural Rationale
Banking Ledger	PostgreSQL (SQL)	Financial updates demand strict ACID guarantees (Atomic money moves, Isolated records, and durable WAL locks).
Uber Driver Tracker	Cassandra / Redis (NoSQL)	High-velocity coordinate telemetry updates require high-write scaling, eventually matching eventual consistency.
Amazon Shopping Cart	DynamoDB (NoSQL)	Predictable latency, decentralized write availability, and absolute resilience override transaction relationships.

ACID vs BASE Paradigms

Databases represent two opposing design philosophies on the CAP theorem spectrum:

1. ACID (SQL Baseline)

Atomicity: Entire transaction succeeds or rolls back.
Consistency: Database transition matches all schema constraints.
Isolation: Concurrent transactions do not bleed into each other.
Durability: Committed transactions survive power crashes (via WAL log flush).

2. BASE (NoSQL Baseline)

Basically Available: The database prioritizes accepting write commands over immediate synchronization.
Soft State: Node replica values can drift over time without active lock enforcement.
Eventual Consistency: Replicas eventually sync to store identical states after update streams stop.

Select ACID when consistency is absolute; choose BASE when write availability is the scaling priority.

Access-Pattern-First Data Modeling

In SQL interviews, candidates often start with entity diagrams (User, Post, Comment) and hope JOINs will work at scale. In NoSQL and high-scale SQL designs, flip the process: **list every read and write query first**, then design tables around those access paths.

1. Start from Queries, Not Entities

Write down the hot paths before drawing schemas: "fetch user timeline by user_id sorted by created_at DESC", "lookup order by order_id", "list products in category X under $50". Each distinct query pattern may need its own physical storage layout — one normalized table cannot serve all of them efficiently at billion-row scale.

2. Denormalize for Read Paths

If a feed query joins Users + Posts + LikeCounts on every page load, precompute and embed like_count and author_name directly into the Post row (or a feed-specific table). You pay extra write amplification on each like, but eliminate a three-table JOIN on every read — the classic read-heavy trade-off.

3. Composite Keys for Wide-Column Stores

In Cassandra or DynamoDB, the partition key determines which node holds the data; the sort key orders rows within that partition. Design the composite key to match your query: (user_id, created_at) for a per-user timeline, (category_id, price) for category-sorted product browse. A query that does not include the partition key triggers a full cluster scan — a design failure, not a tuning problem.

# Timeline table — one query, one table
PRIMARY KEY ((user_id), created_at, post_id)
→ SELECT * FROM timeline WHERE user_id = ? ORDER BY created_at DESC LIMIT 20

# Separate table for the inverse query (posts by hashtag)
PRIMARY KEY ((hashtag), created_at, post_id)

4. When to Duplicate Data Across Tables

Different partition keys for the same entity: Store a user's posts under user_id for their profile and under hashtag for search — two write paths, zero JOINs at read time.
Read replicas of hot fields: Copy display_name into every comment row so rendering a thread never hits the Users table.
Materialized counters: Maintain a denormalized follower_count column updated asynchronously rather than COUNT(*) on every profile view.

Rule of thumb: duplicate when the read QPS on a JOIN path exceeds what a single node can serve, and the duplicated fields change infrequently or can tolerate brief staleness.