Day 09 · Week 2 · Web + Data

SQL essentials.

Spring Data JPA hides SQL behind methods like findByEmail. But the moment performance gets weird or a query returns wrong data, JPA dissolves and you are left looking at SQL. Today's job: SQL fluency you can defend in any interview.

~26 min readjoins, indexes coreinterview essential

You don't need to be a DBA. You need to read SQL, write joins without panic, recognize when an index will help, and explain a query plan well enough that an interviewer nods. That's today.

The relational model — why SQL exists

In 1970, Edgar F. Codd at IBM published a paper proposing data should be stored as relations — tables with rows and columns, each row uniquely identifiable. The novel idea: queries should be declarative ("give me users older than 30") rather than procedural ("walk this list, check each one…"). The system figures out how.

SQL (Structured Query Language) is the standardized language built on this idea. Forty-five years later, it still dominates because the model is correct: most business data is relations.

📌 The core mental model

A table is a relation. A row is a tuple of values. A column is a typed attribute. Every row has a primary key — something that uniquely identifies it. Tables relate to other tables via foreign keys. That's the whole model.

Tables, keys, and relationships

Two related tables. The user_id in orders is a foreign key referencing users.id.

Primary key (PK) — the unique identifier of a row. Almost always an auto-increment id or a UUID.
Foreign key (FK) — a column that points to another table's PK, building the relationship.
Composite key — a PK made of multiple columns together (rare but real).

The four cardinalities

Relationship	Modeled as	Example
One-to-one	FK in either table (often unique)	User ↔ UserProfile
One-to-many	FK on the "many" side	User → Orders
Many-to-many	Junction table with two FKs	User ↔ Roles via user_roles
Self-referential	FK pointing to same table	Employee.manager_id → Employee.id

The four operations · SELECT, INSERT, UPDATE, DELETE

-- SELECT — read
SELECT id, email, name FROM users
WHERE created_at > '2025-01-01'
ORDER BY created_at DESC
LIMIT 10;

-- INSERT — create
INSERT INTO users (email, name) VALUES ('a@b.com', 'Alice');

-- UPDATE — modify
UPDATE users SET name = 'Alicia' WHERE id = 1;

-- DELETE — remove
DELETE FROM users WHERE id = 1;

⚠️ The career-ending mistake

An UPDATE or DELETE without a WHERE clause modifies every row in the table. There is no undo. Always test your WHERE first with a SELECT, especially in production.

Joins — the core skill

Joins combine rows from multiple tables based on a related column. Four kinds:

The four common joins as Venn diagrams.

-- INNER JOIN — only matched rows
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id;

-- LEFT JOIN — all users, even those without orders (NULL on right)
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

-- A common report: count orders per user (including 0)
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

🧠 The 90% rule

In real codebases you'll write INNER JOIN and LEFT JOIN daily. RIGHT JOIN is rare (just flip the order of tables and use LEFT). FULL JOIN is rarer still. Master the first two, recognize the others.

Aggregation · GROUP BY and HAVING

Aggregation collapses many rows into summary rows. The functions: COUNT, SUM, AVG, MIN, MAX.

-- Total revenue per user
SELECT user_id, SUM(total) AS revenue
FROM orders
GROUP BY user_id;

-- Only show users who spent more than $100
SELECT user_id, SUM(total) AS revenue
FROM orders
GROUP BY user_id
HAVING SUM(total) > 100;

WHERE vs HAVING — the rule

WHERE filters rows before aggregation
HAVING filters groups after aggregation

Indexes — the performance lever

Without an index, the database scans every row to answer a query. With an index, it goes straight to matching rows. The cost is space + slower writes (each insert must also update the index).

A B-tree index. Queries on indexed columns descend the tree in O(log n).

When to add an index

Columns you frequently WHERE on (especially =, <, >, BETWEEN)
Columns you frequently JOIN on (often FKs)
Columns you frequently ORDER BY

When NOT to add an index

Tiny tables (the database scans them in microseconds anyway)
Columns rarely queried
Columns that change very frequently (write-heavy tables)
Low-cardinality columns (e.g. boolean) — partial indexes are sometimes better

-- Add an index
CREATE INDEX idx_users_email ON users(email);

-- Composite index — order matters!
-- This helps WHERE user_id = ? AND created_at > ?
-- It does NOT help WHERE created_at > ? alone
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

📌 The composite-index rule

A composite index (a, b, c) can be used by queries that filter on a, or a + b, or a + b + c — but not on b alone, or b + c. The leftmost prefix must always be present. This rule alone resolves 80% of "why isn't my index being used" debugging.

EXPLAIN — see what the database is doing

Every database can show you its query plan — the steps it'll take to answer your query. In Postgres:

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;

The output shows whether it used an index ("Index Scan") or scanned everything ("Seq Scan") — and how long each step took. This is the single most important tool for diagnosing slow queries in production.

NULL — the surprising one

NULL means "unknown" — not zero, not empty string. It poisons comparisons:

SELECT * FROM users WHERE deleted_at = NULL; -- 💥 returns nothing, ever
SELECT * FROM users WHERE deleted_at IS NULL; -- ✓ correct

Same with !=, <>, >, < — comparing anything to NULL yields NULL (treated as false). Always use IS NULL or IS NOT NULL.

The handful of patterns you'll write daily

-- Find duplicates
SELECT email, COUNT(*) FROM users
GROUP BY email HAVING COUNT(*) > 1;

-- Top N per group (window function — modern SQL)
SELECT * FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC) AS rn
  FROM orders
) t WHERE rn <= 3; -- top 3 orders per user

-- Pagination
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 40;

⏸ Pause & reflect

Lock in today's learning

Answer in your own words. The fuzzy ones get re-read.

What's the difference between a primary key and a foreign key?
When do you use INNER JOIN vs LEFT JOIN? Give a concrete example.
What's the rule about the leftmost prefix in composite indexes?
Difference between WHERE and HAVING?
Why does WHERE col = NULL never match?
Why is adding an index not always a good idea?

End of Day 9. Tomorrow: SQL deeper — transactions, isolation levels, the four anomalies.

← previousDay 8 · HTTP, deeply tomorrow →Day 10 · SQL deeper