SQL essentials.
Spring Data JPA hides SQL behind methods like findByEmail. But the moment performance gets weird or a query returns wrong data, JPA dissolves and you are left looking at SQL. Today's job: SQL fluency you can defend in any interview.
The relational model — why SQL exists
In 1970, Edgar F. Codd at IBM published a paper proposing data should be stored as relations — tables with rows and columns, each row uniquely identifiable. The novel idea: queries should be declarative ("give me users older than 30") rather than procedural ("walk this list, check each one…"). The system figures out how.
SQL (Structured Query Language) is the standardized language built on this idea. Forty-five years later, it still dominates because the model is correct: most business data is relations.
A table is a relation. A row is a tuple of values. A column is a typed attribute. Every row has a primary key — something that uniquely identifies it. Tables relate to other tables via foreign keys. That's the whole model.
Tables, keys, and relationships
- Primary key (PK) — the unique identifier of a row. Almost always an auto-increment
idor a UUID. - Foreign key (FK) — a column that points to another table's PK, building the relationship.
- Composite key — a PK made of multiple columns together (rare but real).
The four cardinalities
| Relationship | Modeled as | Example |
|---|---|---|
| One-to-one | FK in either table (often unique) | User ↔ UserProfile |
| One-to-many | FK on the "many" side | User → Orders |
| Many-to-many | Junction table with two FKs | User ↔ Roles via user_roles |
| Self-referential | FK pointing to same table | Employee.manager_id → Employee.id |
The four operations · SELECT, INSERT, UPDATE, DELETE
-- SELECT — read
SELECT id, email, name FROM users
WHERE created_at > '2025-01-01'
ORDER BY created_at DESC
LIMIT 10;
-- INSERT — create
INSERT INTO users (email, name) VALUES ('a@b.com', 'Alice');
-- UPDATE — modify
UPDATE users SET name = 'Alicia' WHERE id = 1;
-- DELETE — remove
DELETE FROM users WHERE id = 1;An UPDATE or DELETE without a WHERE clause modifies every row in the table. There is no undo. Always test your WHERE first with a SELECT, especially in production.
Joins — the core skill
Joins combine rows from multiple tables based on a related column. Four kinds:
-- INNER JOIN — only matched rows
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id;
-- LEFT JOIN — all users, even those without orders (NULL on right)
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
-- A common report: count orders per user (including 0)
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;In real codebases you'll write INNER JOIN and LEFT JOIN daily. RIGHT JOIN is rare (just flip the order of tables and use LEFT). FULL JOIN is rarer still. Master the first two, recognize the others.
Aggregation · GROUP BY and HAVING
Aggregation collapses many rows into summary rows. The functions: COUNT, SUM, AVG, MIN, MAX.
-- Total revenue per user
SELECT user_id, SUM(total) AS revenue
FROM orders
GROUP BY user_id;
-- Only show users who spent more than $100
SELECT user_id, SUM(total) AS revenue
FROM orders
GROUP BY user_id
HAVING SUM(total) > 100;WHERE vs HAVING — the rule
- WHERE filters rows before aggregation
- HAVING filters groups after aggregation
Indexes — the performance lever
Without an index, the database scans every row to answer a query. With an index, it goes straight to matching rows. The cost is space + slower writes (each insert must also update the index).
When to add an index
- Columns you frequently
WHEREon (especially=,<,>,BETWEEN) - Columns you frequently
JOINon (often FKs) - Columns you frequently
ORDER BY
When NOT to add an index
- Tiny tables (the database scans them in microseconds anyway)
- Columns rarely queried
- Columns that change very frequently (write-heavy tables)
- Low-cardinality columns (e.g. boolean) — partial indexes are sometimes better
-- Add an index
CREATE INDEX idx_users_email ON users(email);
-- Composite index — order matters!
-- This helps WHERE user_id = ? AND created_at > ?
-- It does NOT help WHERE created_at > ? alone
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);A composite index (a, b, c) can be used by queries that filter on a, or a + b, or a + b + c — but not on b alone, or b + c. The leftmost prefix must always be present. This rule alone resolves 80% of "why isn't my index being used" debugging.
EXPLAIN — see what the database is doing
Every database can show you its query plan — the steps it'll take to answer your query. In Postgres:
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;The output shows whether it used an index ("Index Scan") or scanned everything ("Seq Scan") — and how long each step took. This is the single most important tool for diagnosing slow queries in production.
NULL — the surprising one
NULL means "unknown" — not zero, not empty string. It poisons comparisons:
SELECT * FROM users WHERE deleted_at = NULL; -- 💥 returns nothing, ever
SELECT * FROM users WHERE deleted_at IS NULL; -- ✓ correctSame with !=, <>, >, < — comparing anything to NULL yields NULL (treated as false). Always use IS NULL or IS NOT NULL.
The handful of patterns you'll write daily
-- Find duplicates
SELECT email, COUNT(*) FROM users
GROUP BY email HAVING COUNT(*) > 1;
-- Top N per group (window function — modern SQL)
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY total DESC) AS rn
FROM orders
) t WHERE rn <= 3; -- top 3 orders per user
-- Pagination
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 40;Lock in today's learning
Answer in your own words. The fuzzy ones get re-read.
- What's the difference between a primary key and a foreign key?
- When do you use INNER JOIN vs LEFT JOIN? Give a concrete example.
- What's the rule about the leftmost prefix in composite indexes?
- Difference between
WHEREandHAVING? - Why does
WHERE col = NULLnever match? - Why is adding an index not always a good idea?
End of Day 9. Tomorrow: SQL deeper — transactions, isolation levels, the four anomalies.