Learning SQL (Again)

Published on July 8, 2025

I use SQL for 16 years now. It is such a central tool, that it is useful to learn it again, fix patches in knowledge and habits and explore new ways of doing thing. I use this as an opportunity to experiment with a LLM in in human+machine curriculum and content creation. The resulting artifact will be a new deck in my Anki Decs.

This is still a work in progress. The table of content defines the journey. I first want to bring together everything I want to learn. The content is generated by Gemini 2.5 Pro and driven by me. LLMs are databases and I query them.

Part 1: Core Operations & Database Fundamentals

This part covers the foundational elements of interacting with and building a PostgreSQL database, focusing on advanced patterns and performance implications relevant to a senior developer.

Chapter 1: Advanced Data Retrieval
- 1.1. Using Common Table Expressions (WITH) for Readability and Recursion
- 1.2. Advanced Filtering: EXISTS, ANY, ALL, and Subquery Strategies
- 1.3. Sophisticated Sorting and Pagination (NULLS FIRST/LAST, OFFSET-FETCH)
- 1.4. Set Operations: UNION, INTERSECT, and EXCEPT for Complex Selections
Chapter 2: Mastering Functions and Expressions
- 2.1. Leveraging the Built-in Function Library (String, Numeric, Date/Time)
- 2.2. Conditional Logic with CASE, COALESCE, and FILTER
- 2.3. Creating Custom Functions (SQL and PL/pgSQL)
- 2.4. Understanding Function Volatility (VOLATILE, STABLE, IMMUTABLE) and Its Performance Impact
Chapter 3: Sophisticated Data Combination
- 3.1. A-Z of Join Types (INNER, LEFT, RIGHT, FULL OUTER, CROSS)
- 3.2. Advanced Joins: LATERAL Joins for Complex Per-Row Calculations
- 3.3. Understanding Join Algorithms (Nested Loop, Hash, Merge) and Forcing Plans
- 3.4. Self-Joins for Hierarchical and Sequential Data Analysis
Chapter 4: Advanced Aggregation and Window Functions
- 4.1. Beyond GROUP BY: Using GROUPING SETS, CUBE, and ROLLUP
- 4.2. Introduction to Window Functions for Analytic Queries
- 4.3. Advanced Window Functions: Framing, Exclusion Clauses, and Specialized Functions (LAG, LEAD, NTILE)
- 4.4. Combining Window Functions with Conditional Aggregation
Chapter 5: High-Performance Data Manipulation
- 5.1. INSERT, UPDATE, and DELETE Best Practices
- 5.2. Using the RETURNING Clause to Get Data Back from Modifications
- 5.3. Efficient UPSERT Operations with INSERT ... ON CONFLICT
- 5.4. Bulk Data Modification Using CTEs and Joins

Part 2: Database Architecture and Design

This section focuses on the non-procedural aspects of database management: creating robust structures, ensuring performance through indexing, and using abstractions.

Chapter 6: Designing a Robust Database Structure
- 6.1. Choosing the Right Data Types (and when to use custom DOMAINs)
- 6.2. Enforcing Data Integrity with Advanced Constraints (CHECK, DEFERRABLE)
- 6.3. Table Partitioning Strategies (Range, List, Hash) for Very Large Tables
- 6.4. Using Schemas and Search Paths to Organize a Large Project
Chapter 7: Indexing for Peak Performance
- 7.1. Indexing Fundamentals: B-Tree and When to Index
- 7.2. Deep Dive into Index Types: Hash, GIN, GiST, BRIN, and SP-GiST
- 7.3. Advanced Indexing: Partial, Expression-based, and Covering Indexes
- 7.4. Maintaining Indexes: REINDEX, VACUUM, and Monitoring Index Usage/Bloat
Chapter 8: Views and Data Abstraction
- 8.1. Simplifying Complex Queries with Standard Views
- 8.2. Performance Boosting with Materialized Views
- 8.3. Strategies for Managing Materialized View Refreshes
- 8.4. Creating Updatable Views with INSTEAD OF Triggers
Chapter 9: Querying and Using Database Metadata
- 9.1. Using information_schema for Standard SQL Metadata
- 9.2. Accessing PostgreSQL’s System Catalogs (pg_catalog) for Deeper Insights
- 9.3. Practical Recipes: Finding Dependencies, Table Sizes, and Unused Indexes
- 9.4. Real-time Diagnostics with System Views (pg_stat_activity, pg_locks)

Part 3: Advanced Features and Performance Tuning

Here, we dive into specialized and powerful PostgreSQL features that are critical for modern application development and performance optimization.

Chapter 10: Working with Semi-Structured Data: JSONB
- 10.1. Storing, Querying, and Modifying JSONB Data
- 10.2. Mastering JSONB Operators and Functions
- 10.3. Indexing JSONB for High Performance (GIN with jsonb_path_ops)
- 10.4. Best Practices for JSONB Schema Design and Validation
Chapter 11: Managing Transactions and Concurrency
- 11.1. The ACID Properties in Practice
- 11.2. Deep Dive into Transaction Isolation Levels (READ COMMITTED, REPEATABLE READ, SERIALIZABLE)
- 11.3. Explicit Locking: Advisory Locks vs. Row/Table Locks (SELECT ... FOR UPDATE)
- 11.4. Debugging and Preventing Deadlocks
Chapter 12: Debugging and Performance Tuning
- 12.1. Reading and Understanding EXPLAIN ANALYZE Plans
- 12.2. Identifying and Fixing Common Query Anti-Patterns
- 12.3. Using pg_stat_statements to Find and Optimize Costly Queries
- 12.4. Key Server Configuration Tuning for Developers (work_mem, shared_buffers, JIT)

Part 4: Data Analysis from Start to Finish

This part provides a project-based guide to common data analysis patterns, from initial cleaning to advanced modeling, all within PostgreSQL.

Chapter 13: Preparing Data for Analysis
- 13.1. Data Cleaning: Handling Duplicates, Whitespace, and Errors
- 13.2. Imputing and Handling NULL Values Strategically
- 13.3. Standardizing and Normalizing Data for Modeling
- 13.4. Binning and Discretizing Continuous Variables
Chapter 14: Creating Complex Datasets for Analysis
- 14.1. Pivoting Data: From Rows to Columns using crosstab and CASE
- 14.2. Creating Rollup and Cube Tables for BI Dashboards
- 14.3. Assembling Analysis-Specific Tables from Multiple Sources
- 14.4. Sessionization: Grouping User Events into Sessions with Window Functions
Chapter 15: Time-Series Analysis
- 15.1. Generating Complete Date and Time Series with generate_series
- 15.2. Calculating Running Totals and Moving Averages
- 15.3. Comparing Data Across Time Periods (WoW, MoM, YoY)
- 15.4. Filling Gaps and Interpolating in Time-Series Data
Chapter 16: User Behavior and Cohort Analysis
- 16.1. Defining and Creating User Cohorts
- 16.2. Building Retention and Churn Tables
- 16.3. Funnel Analysis: Tracking Users Through Conversion Steps
- 16.4. Analyzing Behavioral Changes Across Cohorts
Chapter 17: Advanced Text Analysis
- 17.1. Advanced Pattern Matching and Extraction with Regular Expressions
- 17.2. Implementing Full-Text Search for Analysis
- 17.3. Fuzzy String Matching with pg_trgm and Levenshtein
- 17.4. Parsing Unstructured Text into Structured Data
Chapter 18: Anomaly and Outlier Detection
- 18.1. Identifying Outliers Using Standard Deviations and Z-Scores
- 18.2. Using Percentiles and Interquartile Ranges for Robust Detection
- 18.3. Finding Anomalies in Time-Series Data
- 18.4. Profiling Data to Investigate Distributions
Chapter 19: Experiment Analysis
- 19.1. Structuring Data for A/B and Multivariate Test Analysis
- 19.2. Calculating Key Metrics (Conversion Rates, Averages) per Group
- 19.3. Checking for Statistical Significance within SQL

Part 5: Data Engineering and Extensibility

The final part covers topics for building robust data pipelines and extending PostgreSQL’s core functionality.

Chapter 20: Data Engineering Pipelines
- 20.1. Understanding Logical Replication vs. Trigger-Based CDC
- 20.2. Setting Up Publications and Subscriptions for Change Data Capture
- 20.3. Using Foreign Data Wrappers (FDW) to Integrate External Data
- 20.4. Monitoring and Managing Replication Slots and FDW connections
Chapter 21: Leveraging the Extension Ecosystem
- 21.1. Finding, Installing, and Managing Extensions
- 21.2. Geospatial Data at Scale with PostGIS
- 21.3. High-Performance Time-Series Data with TimescaleDB
- 21.4. Other Key Extensions: pg_cron, pg_stat_monitor, pgvector for AI
Chapter 22: Procedural Languages and Extensibility
- 22.1. Beyond PL/pgSQL: Using PL/Python and PL/V8
- 22.2. Setting Up the Build Environment (PGXS) for C Extensions
- 22.3. Creating a Custom Function in C
- 22.4. Packaging and Distributing a Custom Extension

Part 7: Advanced Topics

This section covers critical topics for operating data-intensive applications at scale and explores next-generation analytical techniques available directly within PostgreSQL.

Chapter 23: Managing the Application-Database Boundary
- 23.1. Understanding Connection Overhead and Management
- 23.2. External Connection Pooling with pgBouncer (Transaction vs. Session Mode)
- 23.3. Interacting with ORMs: Debugging Generated Queries and Knowing When to Use Raw SQL
- 23.4. Asynchronous Task Queues in the Database
Chapter 24: Security Beyond the Basics
- 24.1. Implementing Row-Level Security (RLS) for Multi-Tenant Systems
- 24.2. Using Column-Level Privileges for Fine-Grained Access Control
- 24.3. Data Masking and Anonymization Techniques with Views and Functions
- 24.4. Best Practices for Application Service Roles vs. Administrative Roles
Chapter 25: Event-Driven Architectures
- 25.1. Real-time In-App Notifications with LISTEN/NOTIFY
- 25.2. Building Robust, Decoupled Audit Trails with Triggers
- 25.3. Comparing LISTEN/NOTIFY with Logical Replication (CDC) for Event Sourcing
- 25.4. Preventing Common Pitfalls in Event-Driven Database Design
Chapter 26: Schema Evolution & Migrations
- 26.1. Principles of Zero-Downtime Migration
- 26.2. Using Migration Tools: Flyway, Liquibase, and Alembic
- 26.3. Strategies for Non-Blocking Changes (e.g., adding columns, changing types)
- 26.4. Understanding and Managing Transactional DDL (Data Definition Language)
Chapter 27: High Availability and Disaster Recovery Concepts
- 27.1. Understanding Streaming Replication (Physical vs. Logical)
- 27.2. The Basics of Failover, Switchover, and Read Replicas
- 27.3. Point-in-Time Recovery (PITR): Your Ultimate Safety Net
- 27.4. Key Considerations for High Availability in the Cloud (e.g., RDS, Cloud SQL)
Chapter 28: Data Ingestion at Scale
- 28.1. Mastering the COPY Command for Maximum Bulk Load Performance
- 28.2. Performance Tuning for Ingestion (e.g., Disabling Constraints/Indexes)
- 28.3. Using UNLOGGED Tables for High-Throughput Transient Staging
- 28.4. Parallel Data Ingestion Patterns
Chapter 29: Testing in the Database
- 29.1. Introduction to Database Unit Testing with pgTAP
- 29.2. Writing Tests for Functions, Views, and Security Rules
- 29.3. Integrating Database Tests into a CI/CD Pipeline
- 29.4. Data Validation and Quality Assurance Strategies
Chapter 30: Next-Generation Analytics in PostgreSQL
- 30.1. Graph Analysis: Advanced Recursive CTEs for Network Traversal and Pathfinding
- 30.2. Probabilistic Structures: Using HyperLogLog and Bloom Filters for Large-Scale Estimation
- 30.3. In-Database ML: Introduction to PostgresML for In-Database Model Training
- 30.4. In-Database ML: Using PL/Python with scikit-learn for Custom Analytics

Part 6: Just Use Postgres

This part explores how to leverage PostgreSQL’s advanced, built-in features to reduce architectural complexity. Each chapter provides a practical guide to replacing a specialized, external system with a robust, transactionally-consistent solution inside the database you already have.

Chapter 31: Postgres as a High-Performance Job Queue
- 31.1. Designing a Transactional jobs Table
- 31.2. The SELECT ... FOR UPDATE SKIP LOCKED Pattern for Concurrent Workers
- 31.3. Implementing Job Priorities and Scheduled Execution with pg_cron
- 31.4. Strategies for Handling Failed Jobs, Retries, and Dead-Letter Queues
Chapter 32: Postgres as a Resilient Cache
- 32.1. Simple Key-Value Caching and Expiration Policies
- 32.2. High-Throughput Caching with UNLOGGED Tables
- 32.3. Automated Query Caching with Materialized Views
- 32.4. Transactional Cache Invalidation Strategies
Chapter 33: Postgres as a Real-time Pub/Sub Broker
- 33.1. Asynchronous Messaging with LISTEN and NOTIFY
- 33.2. Designing Effective Notification Channels and Payloads
- 33.3. Integrating with Application Backends for Live Updates
- 33.4. Understanding Limitations and Use Cases vs. Kafka or RabbitMQ
Chapter 34: Postgres as a Full-Text Search Engine
- 34.1. Mastering tsvector for Document Indexing and tsquery for Searching
- 34.2. Building Performant Search Indexes with GIN
- 34.4. Combining Full-Text Search with Fuzzy String Matching (pg_trgm)
Chapter 35: Postgres as an Analytics Data Warehouse
- 35.1. Structuring Tables for Analytical Workloads (Partitioning)
- 35.2. Leveraging BRIN Indexes for Large, Sequentially Ordered Data
- 35.3. Forcing and Analyzing Parallel Query Execution Plans
- 35.4. Patterns for Hybrid Transactional/Analytical Processing (HTAP)
Chapter 36: Postgres as a NoSQL Document Database
- 36.1. Designing Schemas with the JSONB Data Type
- 36.2. Advanced Querying with JSONB Operators and Path Expressions
- 36.3. High-Performance Indexing of Document Data with GIN
- 36.4. Combining Relational and Document Models in a Hybrid Approach
Chapter 37: Postgres as a Time-Series Database
- 37.1. Native Time-Series: Partitioning by Time Range and Using BRIN Indexes
- 37.2. Analytical Queries with Window Functions for Time-Based Analysis
- 37.3. Supercharging Time-Series with the TimescaleDB Extension
- 37.4. Implementing Data Retention Policies and Continuous Aggregates
Chapter 38: Postgres as a Key-Value Store
- 38.1. Simple Schemas for Key-Value, Configuration, and Feature Flags
- 38.2. Leveraging B-Tree Indexes for Fast Point Lookups
- 38.3. Using the HSTORE Extension for a Dedicated Key-Value Type
- 38.4. When to Use a Table vs. JSONB for Key-Value Data
Chapter 39: Postgres as a Geospatial Database with PostGIS
- 39.1. Core Concepts: The GEOMETRY vs. GEOGRAPHY Types
- 39.2. Spatial Indexing for Lightning-Fast Proximity Queries (GiST)
- 39.3. Common Geospatial Analysis Patterns (Nearest Neighbor, Overlays, Buffering)
- 39.4. Integrating Geospatial Data with Business Analytics
Chapter 40: Postgres as a Graph Database
- 40.1. Modeling Graph Relationships (Nodes and Edges) in Relational Tables
- 40.2. Traversing Networks with Recursive Common Table Expressions (WITH RECURSIVE)
- 40.3. Solving Common Problems like Pathfinding and Hierarchical Queries
- 40.4. When to Consider Graph-Specific Extensions like apache-age
Chapter 41: Postgres as a Cron Scheduler
- 41.1. Introduction to the pg_cron Extension
- 41.2. Scheduling Periodic Jobs (Cleanups, Report Generation) Directly in SQL
- 41.3. Monitoring Job History and Failures
- 41.4. Comparing pg_cron with System-Level Cron and Application-Side Schedulers
Chapter 42: Postgres as a State Machine Engine
- 42.1. Modeling States and Valid Transitions with CHECK Constraints
- 42.2. Using Triggers to Enforce Complex Business Rules on State Changes
- 42.3. Creating an Immutable Audit Log for State Transitions
- 42.4. Guaranteeing State Consistency with ACID Transactions
Chapter 43: Postgres as a Feature Flag & Configuration Service
- 43.1. Designing a Centralized Table for Feature Flags and App Configuration
- 43.2. Implementing Percentage-Based Rollouts and User-Specific Targeting
- 43.3. Reloading Configuration in Your Application Without a Restart
- 43.4. Versioning and Auditing Configuration Changes
Chapter 44: Postgres as a Rate Limiter
- 44.1. Implementing the Token Bucket Algorithm in SQL
- 44.2. Using Window Functions to Count Events in a Sliding Time Window
- 44.3. The Power of Transactional Counters for Atomic Operations
- 44.4. Trade-offs: Consistency and Durability vs. an In-Memory Solution like Redis
Chapter 45: Postgres as a Distributed Lock Manager
- 45.1. Introduction to Advisory Locks (Session-level vs. Transaction-level)
- 45.2. Ensuring Exclusive Execution of a Task Across Multiple Servers
- 45.3. Preventing Race Conditions and the “Thundering Herd” Problem
- 45.4. Practical Use Cases: Singleton jobs, leader election, and resource locking

Table of Contents