Day 09 · Week 2 · Web + Data

SQL essentials.

Spring Data JPA hides SQL behind methods like findByEmail. But the moment performance gets weird or a query returns wrong data, JPA dissolves and you are left looking at SQL. Today's job: SQL fluency you can defend in any interview.

~26 min readjoins, indexes coreinterview essential
You don't need to be a DBA. You need to read SQL, write joins without panic, recognize when an index will help, and explain a query plan well enough that an interviewer nods. That's today.

The relational model — why SQL exists

In 1970, Edgar F. Codd at IBM published a paper proposing data should be stored as relations — tables with rows and columns, each row uniquely identifiable. The novel idea: queries should be declarative ("give me users older than 30") rather than procedural ("walk this list, check each one…"). The system figures out how.

SQL (Structured Query Language) is the standardized language built on this idea. Forty-five years later, it still dominates because the model is correct: most business data is relations.

📌 The core mental model

A table is a relation. A row is a tuple of values. A column is a typed attribute. Every row has a primary key — something that uniquely identifies it. Tables relate to other tables via foreign keys. That's the whole model.

Tables, keys, and relationships

usersid (PK)emailnamecreated_at1 · alice@... · Alice · ...2 · bob@... · Bob · ...3 · carol@... · Carol · ...ordersid (PK)user_id (FK)totalcreated_at10 · 1 · 89.00 · ...11 · 1 · 12.50 · ...12 · 2 · 45.00 · ...references
Two related tables. The user_id in orders is a foreign key referencing users.id.

The four cardinalities

RelationshipModeled asExample
One-to-oneFK in either table (often unique)User ↔ UserProfile
One-to-manyFK on the "many" sideUser → Orders
Many-to-manyJunction table with two FKsUser ↔ Roles via user_roles
Self-referentialFK pointing to same tableEmployee.manager_id → Employee.id

The four operations · SELECT, INSERT, UPDATE, DELETE

-- SELECT — read
SELECT id, email, name FROM users
WHERE created_at > '2025-01-01'
ORDER BY created_at DESC
LIMIT 10;

-- INSERT — create
INSERT INTO users (email, name) VALUES ('a@b.com', 'Alice');

-- UPDATE — modify
UPDATE users SET name = 'Alicia' WHERE id = 1;

-- DELETE — remove
DELETE FROM users WHERE id = 1;
⚠️ The career-ending mistake

An UPDATE or DELETE without a WHERE clause modifies every row in the table. There is no undo. Always test your WHERE first with a SELECT, especially in production.

Joins — the core skill

Joins combine rows from multiple tables based on a related column. Four kinds:

INNER JOINrows in bothLEFT JOINall from leftRIGHT JOINall from rightFULL JOINall from both
The four common joins as Venn diagrams.
-- INNER JOIN — only matched rows
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id;

-- LEFT JOIN — all users, even those without orders (NULL on right)
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;

-- A common report: count orders per user (including 0)
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
🧠 The 90% rule

In real codebases you'll write INNER JOIN and LEFT JOIN daily. RIGHT JOIN is rare (just flip the order of tables and use LEFT). FULL JOIN is rarer still. Master the first two, recognize the others.

Aggregation · GROUP BY and HAVING

Aggregation collapses many rows into summary rows. The functions: COUNT, SUM, AVG, MIN, MAX.

-- Total revenue per user
SELECT user_id, SUM(total) AS revenue
FROM orders
GROUP BY user_id;

-- Only show users who spent more than $100
SELECT user_id, SUM(total) AS revenue
FROM orders
GROUP BY user_id
HAVING SUM(total) > 100;

WHERE vs HAVING — the rule

Indexes — the performance lever

Without an index, the database scans every row to answer a query. With an index, it goes straight to matching rows. The cost is space + slower writes (each insert must also update the index).

MD · HP · TA→CD→GH→LM→OP→ST→VW→Zlooking up "K"? root → "M" left, "D · H" → "H→L" leaf, scan within.
A B-tree index. Queries on indexed columns descend the tree in O(log n).

When to add an index

When NOT to add an index

-- Add an index
CREATE INDEX idx_users_email ON users(email);

-- Composite index — order matters!
-- This helps WHERE user_id = ? AND created_at > ?
-- It does NOT help WHERE created_at > ? alone
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);
📌 The composite-index rule

A composite index (a, b, c) can be used by queries that filter on a, or a + b, or a + b + cbut not on b alone, or b + c. The leftmost prefix must always be present. This rule alone resolves 80% of "why isn't my index being used" debugging.

EXPLAIN — see what the database is doing

Every database can show you its query plan — the steps it'll take to answer your query. In Postgres:

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;

The output shows whether it used an index ("Index Scan") or scanned everything ("Seq Scan") — and how long each step took. This is the single most important tool for diagnosing slow queries in production.

NULL — the surprising one

NULL means "unknown" — not zero, not empty string. It poisons comparisons:

SELECT * FROM users WHERE deleted_at = NULL; -- 💥 returns nothing, ever
SELECT * FROM users WHERE deleted_at IS NULL; -- ✓ correct

Same with !=, <>, >, < — comparing anything to NULL yields NULL (treated as false). Always use IS NULL or IS NOT NULL.

The handful of patterns you'll write daily

-- Find duplicates
SELECT email, COUNT(*) FROM users
GROUP BY email HAVING COUNT(*) > 1;

-- Top N per group (window function — modern SQL)
SELECT * FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC) AS rn
  FROM orders
) t WHERE rn <= 3; -- top 3 orders per user

-- Pagination
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 40;
Pause & reflect

Lock in today's learning

Answer in your own words. The fuzzy ones get re-read.

  1. What's the difference between a primary key and a foreign key?
  2. When do you use INNER JOIN vs LEFT JOIN? Give a concrete example.
  3. What's the rule about the leftmost prefix in composite indexes?
  4. Difference between WHERE and HAVING?
  5. Why does WHERE col = NULL never match?
  6. Why is adding an index not always a good idea?

End of Day 9. Tomorrow: SQL deeper — transactions, isolation levels, the four anomalies.