Learning SQL (Again)
Published on
I use SQL for 16 years now. It is such a central tool, that it is useful to learn it again, fix patches in knowledge and habits and explore new ways of doing thing. I use this as an opportunity to experiment with a LLM in in human+machine curriculum and content creation. The resulting artifact will be a new deck in my Anki Decs.
Table of Contents
This is still a work in progress. The table of content defines the journey. I first want to bring together everything I want to learn. The content is generated by Gemini 2.5 Pro and driven by me. LLMs are databases and I query them.
Part 1: Core Operations & Database Fundamentals
This part covers the foundational elements of interacting with and building a PostgreSQL database, focusing on advanced patterns and performance implications relevant to a senior developer.
- Chapter 1: Advanced Data Retrieval
- 1.1. Using Common Table Expressions (
WITH) for Readability and Recursion - 1.2. Advanced Filtering:
EXISTS,ANY,ALL, and Subquery Strategies - 1.3. Sophisticated Sorting and Pagination (
NULLS FIRST/LAST,OFFSET-FETCH) - 1.4. Set Operations:
UNION,INTERSECT, andEXCEPTfor Complex Selections
- 1.1. Using Common Table Expressions (
- Chapter 2: Mastering Functions and Expressions
- 2.1. Leveraging the Built-in Function Library (String, Numeric, Date/Time)
- 2.2. Conditional Logic with
CASE,COALESCE, andFILTER - 2.3. Creating Custom Functions (SQL and PL/pgSQL)
- 2.4. Understanding Function Volatility (
VOLATILE,STABLE,IMMUTABLE) and Its Performance Impact
- Chapter 3: Sophisticated Data Combination
- 3.1. A-Z of Join Types (
INNER,LEFT,RIGHT,FULL OUTER,CROSS) - 3.2. Advanced Joins:
LATERALJoins for Complex Per-Row Calculations - 3.3. Understanding Join Algorithms (Nested Loop, Hash, Merge) and Forcing Plans
- 3.4. Self-Joins for Hierarchical and Sequential Data Analysis
- 3.1. A-Z of Join Types (
- Chapter 4: Advanced Aggregation and Window Functions
- 4.1. Beyond
GROUP BY: UsingGROUPING SETS,CUBE, andROLLUP - 4.2. Introduction to Window Functions for Analytic Queries
- 4.3. Advanced Window Functions: Framing, Exclusion Clauses, and Specialized Functions (
LAG,LEAD,NTILE) - 4.4. Combining Window Functions with Conditional Aggregation
- 4.1. Beyond
- Chapter 5: High-Performance Data Manipulation
- 5.1.
INSERT,UPDATE, andDELETEBest Practices - 5.2. Using the
RETURNINGClause to Get Data Back from Modifications - 5.3. Efficient
UPSERTOperations withINSERT ... ON CONFLICT - 5.4. Bulk Data Modification Using CTEs and Joins
- 5.1.
Part 2: Database Architecture and Design
This section focuses on the non-procedural aspects of database management: creating robust structures, ensuring performance through indexing, and using abstractions.
- Chapter 6: Designing a Robust Database Structure
- 6.1. Choosing the Right Data Types (and when to use custom
DOMAINs) - 6.2. Enforcing Data Integrity with Advanced Constraints (
CHECK,DEFERRABLE) - 6.3. Table Partitioning Strategies (Range, List, Hash) for Very Large Tables
- 6.4. Using Schemas and Search Paths to Organize a Large Project
- 6.1. Choosing the Right Data Types (and when to use custom
- Chapter 7: Indexing for Peak Performance
- 7.1. Indexing Fundamentals: B-Tree and When to Index
- 7.2. Deep Dive into Index Types:
Hash,GIN,GiST,BRIN, andSP-GiST - 7.3. Advanced Indexing: Partial, Expression-based, and Covering Indexes
- 7.4. Maintaining Indexes:
REINDEX,VACUUM, and Monitoring Index Usage/Bloat
- Chapter 8: Views and Data Abstraction
- 8.1. Simplifying Complex Queries with Standard Views
- 8.2. Performance Boosting with Materialized Views
- 8.3. Strategies for Managing Materialized View Refreshes
- 8.4. Creating Updatable Views with
INSTEAD OFTriggers
- Chapter 9: Querying and Using Database Metadata
- 9.1. Using
information_schemafor Standard SQL Metadata - 9.2. Accessing PostgreSQL’s System Catalogs (
pg_catalog) for Deeper Insights - 9.3. Practical Recipes: Finding Dependencies, Table Sizes, and Unused Indexes
- 9.4. Real-time Diagnostics with System Views (
pg_stat_activity,pg_locks)
- 9.1. Using
Part 3: Advanced Features and Performance Tuning
Here, we dive into specialized and powerful PostgreSQL features that are critical for modern application development and performance optimization.
- Chapter 10: Working with Semi-Structured Data:
JSONB- 10.1. Storing, Querying, and Modifying
JSONBData - 10.2. Mastering
JSONBOperators and Functions - 10.3. Indexing
JSONBfor High Performance (GINwithjsonb_path_ops) - 10.4. Best Practices for
JSONBSchema Design and Validation
- 10.1. Storing, Querying, and Modifying
- Chapter 11: Managing Transactions and Concurrency
- 11.1. The ACID Properties in Practice
- 11.2. Deep Dive into Transaction Isolation Levels (
READ COMMITTED,REPEATABLE READ,SERIALIZABLE) - 11.3. Explicit Locking: Advisory Locks vs. Row/Table Locks (
SELECT ... FOR UPDATE) - 11.4. Debugging and Preventing Deadlocks
- Chapter 12: Debugging and Performance Tuning
- 12.1. Reading and Understanding
EXPLAIN ANALYZEPlans - 12.2. Identifying and Fixing Common Query Anti-Patterns
- 12.3. Using
pg_stat_statementsto Find and Optimize Costly Queries - 12.4. Key Server Configuration Tuning for Developers (
work_mem,shared_buffers, JIT)
- 12.1. Reading and Understanding
Part 4: Data Analysis from Start to Finish
This part provides a project-based guide to common data analysis patterns, from initial cleaning to advanced modeling, all within PostgreSQL.
- Chapter 13: Preparing Data for Analysis
- 13.1. Data Cleaning: Handling Duplicates, Whitespace, and Errors
- 13.2. Imputing and Handling
NULLValues Strategically - 13.3. Standardizing and Normalizing Data for Modeling
- 13.4. Binning and Discretizing Continuous Variables
- Chapter 14: Creating Complex Datasets for Analysis
- 14.1. Pivoting Data: From Rows to Columns using
crosstabandCASE - 14.2. Creating Rollup and Cube Tables for BI Dashboards
- 14.3. Assembling Analysis-Specific Tables from Multiple Sources
- 14.4. Sessionization: Grouping User Events into Sessions with Window Functions
- 14.1. Pivoting Data: From Rows to Columns using
- Chapter 15: Time-Series Analysis
- 15.1. Generating Complete Date and Time Series with
generate_series - 15.2. Calculating Running Totals and Moving Averages
- 15.3. Comparing Data Across Time Periods (WoW, MoM, YoY)
- 15.4. Filling Gaps and Interpolating in Time-Series Data
- 15.1. Generating Complete Date and Time Series with
- Chapter 16: User Behavior and Cohort Analysis
- 16.1. Defining and Creating User Cohorts
- 16.2. Building Retention and Churn Tables
- 16.3. Funnel Analysis: Tracking Users Through Conversion Steps
- 16.4. Analyzing Behavioral Changes Across Cohorts
- Chapter 17: Advanced Text Analysis
- 17.1. Advanced Pattern Matching and Extraction with Regular Expressions
- 17.2. Implementing Full-Text Search for Analysis
- 17.3. Fuzzy String Matching with
pg_trgmand Levenshtein - 17.4. Parsing Unstructured Text into Structured Data
- Chapter 18: Anomaly and Outlier Detection
- 18.1. Identifying Outliers Using Standard Deviations and Z-Scores
- 18.2. Using Percentiles and Interquartile Ranges for Robust Detection
- 18.3. Finding Anomalies in Time-Series Data
- 18.4. Profiling Data to Investigate Distributions
- Chapter 19: Experiment Analysis
- 19.1. Structuring Data for A/B and Multivariate Test Analysis
- 19.2. Calculating Key Metrics (Conversion Rates, Averages) per Group
- 19.3. Checking for Statistical Significance within SQL
Part 5: Data Engineering and Extensibility
The final part covers topics for building robust data pipelines and extending PostgreSQL’s core functionality.
- Chapter 20: Data Engineering Pipelines
- 20.1. Understanding Logical Replication vs. Trigger-Based CDC
- 20.2. Setting Up Publications and Subscriptions for Change Data Capture
- 20.3. Using Foreign Data Wrappers (
FDW) to Integrate External Data - 20.4. Monitoring and Managing Replication Slots and FDW connections
- Chapter 21: Leveraging the Extension Ecosystem
- 21.1. Finding, Installing, and Managing Extensions
- 21.2. Geospatial Data at Scale with PostGIS
- 21.3. High-Performance Time-Series Data with TimescaleDB
- 21.4. Other Key Extensions:
pg_cron,pg_stat_monitor,pgvectorfor AI
- Chapter 22: Procedural Languages and Extensibility
- 22.1. Beyond PL/pgSQL: Using PL/Python and PL/V8
- 22.2. Setting Up the Build Environment (PGXS) for C Extensions
- 22.3. Creating a Custom Function in C
- 22.4. Packaging and Distributing a Custom Extension
Part 7: Advanced Topics
This section covers critical topics for operating data-intensive applications at scale and explores next-generation analytical techniques available directly within PostgreSQL.
- Chapter 23: Managing the Application-Database Boundary
- 23.1. Understanding Connection Overhead and Management
- 23.2. External Connection Pooling with
pgBouncer(Transaction vs. Session Mode) - 23.3. Interacting with ORMs: Debugging Generated Queries and Knowing When to Use Raw SQL
- 23.4. Asynchronous Task Queues in the Database
- Chapter 24: Security Beyond the Basics
- 24.1. Implementing Row-Level Security (RLS) for Multi-Tenant Systems
- 24.2. Using Column-Level Privileges for Fine-Grained Access Control
- 24.3. Data Masking and Anonymization Techniques with Views and Functions
- 24.4. Best Practices for Application Service Roles vs. Administrative Roles
- Chapter 25: Event-Driven Architectures
- 25.1. Real-time In-App Notifications with
LISTEN/NOTIFY - 25.2. Building Robust, Decoupled Audit Trails with Triggers
- 25.3. Comparing
LISTEN/NOTIFYwith Logical Replication (CDC) for Event Sourcing - 25.4. Preventing Common Pitfalls in Event-Driven Database Design
- 25.1. Real-time In-App Notifications with
- Chapter 26: Schema Evolution & Migrations
- 26.1. Principles of Zero-Downtime Migration
- 26.2. Using Migration Tools: Flyway, Liquibase, and Alembic
- 26.3. Strategies for Non-Blocking Changes (e.g., adding columns, changing types)
- 26.4. Understanding and Managing Transactional DDL (Data Definition Language)
- Chapter 27: High Availability and Disaster Recovery Concepts
- 27.1. Understanding Streaming Replication (Physical vs. Logical)
- 27.2. The Basics of Failover, Switchover, and Read Replicas
- 27.3. Point-in-Time Recovery (PITR): Your Ultimate Safety Net
- 27.4. Key Considerations for High Availability in the Cloud (e.g., RDS, Cloud SQL)
- Chapter 28: Data Ingestion at Scale
- 28.1. Mastering the
COPYCommand for Maximum Bulk Load Performance - 28.2. Performance Tuning for Ingestion (e.g., Disabling Constraints/Indexes)
- 28.3. Using
UNLOGGEDTables for High-Throughput Transient Staging - 28.4. Parallel Data Ingestion Patterns
- 28.1. Mastering the
- Chapter 29: Testing in the Database
- 29.1. Introduction to Database Unit Testing with
pgTAP - 29.2. Writing Tests for Functions, Views, and Security Rules
- 29.3. Integrating Database Tests into a CI/CD Pipeline
- 29.4. Data Validation and Quality Assurance Strategies
- 29.1. Introduction to Database Unit Testing with
- Chapter 30: Next-Generation Analytics in PostgreSQL
- 30.1. Graph Analysis: Advanced Recursive CTEs for Network Traversal and Pathfinding
- 30.2. Probabilistic Structures: Using HyperLogLog and Bloom Filters for Large-Scale Estimation
- 30.3. In-Database ML: Introduction to
PostgresMLfor In-Database Model Training - 30.4. In-Database ML: Using PL/Python with
scikit-learnfor Custom Analytics
Part 6: Just Use Postgres
This part explores how to leverage PostgreSQL’s advanced, built-in features to reduce architectural complexity. Each chapter provides a practical guide to replacing a specialized, external system with a robust, transactionally-consistent solution inside the database you already have.
- Chapter 31: Postgres as a High-Performance Job Queue
- 31.1. Designing a Transactional
jobsTable - 31.2. The
SELECT ... FOR UPDATE SKIP LOCKEDPattern for Concurrent Workers - 31.3. Implementing Job Priorities and Scheduled Execution with
pg_cron - 31.4. Strategies for Handling Failed Jobs, Retries, and Dead-Letter Queues
- 31.1. Designing a Transactional
- Chapter 32: Postgres as a Resilient Cache
- 32.1. Simple Key-Value Caching and Expiration Policies
- 32.2. High-Throughput Caching with
UNLOGGEDTables - 32.3. Automated Query Caching with Materialized Views
- 32.4. Transactional Cache Invalidation Strategies
- Chapter 33: Postgres as a Real-time Pub/Sub Broker
- 33.1. Asynchronous Messaging with
LISTENandNOTIFY - 33.2. Designing Effective Notification Channels and Payloads
- 33.3. Integrating with Application Backends for Live Updates
- 33.4. Understanding Limitations and Use Cases vs. Kafka or RabbitMQ
- 33.1. Asynchronous Messaging with
- Chapter 34: Postgres as a Full-Text Search Engine
- 34.1. Mastering
tsvectorfor Document Indexing andtsqueryfor Searching - 34.2. Building Performant Search Indexes with
GIN - 34.4. Combining Full-Text Search with Fuzzy String Matching (
pg_trgm)
- 34.1. Mastering
- Chapter 35: Postgres as an Analytics Data Warehouse
- 35.1. Structuring Tables for Analytical Workloads (Partitioning)
- 35.2. Leveraging
BRINIndexes for Large, Sequentially Ordered Data - 35.3. Forcing and Analyzing Parallel Query Execution Plans
- 35.4. Patterns for Hybrid Transactional/Analytical Processing (HTAP)
- Chapter 36: Postgres as a NoSQL Document Database
- 36.1. Designing Schemas with the
JSONBData Type - 36.2. Advanced Querying with
JSONBOperators and Path Expressions - 36.3. High-Performance Indexing of Document Data with
GIN - 36.4. Combining Relational and Document Models in a Hybrid Approach
- 36.1. Designing Schemas with the
- Chapter 37: Postgres as a Time-Series Database
- 37.1. Native Time-Series: Partitioning by Time Range and Using
BRINIndexes - 37.2. Analytical Queries with Window Functions for Time-Based Analysis
- 37.3. Supercharging Time-Series with the TimescaleDB Extension
- 37.4. Implementing Data Retention Policies and Continuous Aggregates
- 37.1. Native Time-Series: Partitioning by Time Range and Using
- Chapter 38: Postgres as a Key-Value Store
- 38.1. Simple Schemas for Key-Value, Configuration, and Feature Flags
- 38.2. Leveraging B-Tree Indexes for Fast Point Lookups
- 38.3. Using the
HSTOREExtension for a Dedicated Key-Value Type - 38.4. When to Use a Table vs.
JSONBfor Key-Value Data
- Chapter 39: Postgres as a Geospatial Database with PostGIS
- 39.1. Core Concepts: The
GEOMETRYvs.GEOGRAPHYTypes - 39.2. Spatial Indexing for Lightning-Fast Proximity Queries (
GiST) - 39.3. Common Geospatial Analysis Patterns (Nearest Neighbor, Overlays, Buffering)
- 39.4. Integrating Geospatial Data with Business Analytics
- 39.1. Core Concepts: The
- Chapter 40: Postgres as a Graph Database
- 40.1. Modeling Graph Relationships (Nodes and Edges) in Relational Tables
- 40.2. Traversing Networks with Recursive Common Table Expressions (
WITH RECURSIVE) - 40.3. Solving Common Problems like Pathfinding and Hierarchical Queries
- 40.4. When to Consider Graph-Specific Extensions like
apache-age
- Chapter 41: Postgres as a Cron Scheduler
- 41.1. Introduction to the
pg_cronExtension - 41.2. Scheduling Periodic Jobs (Cleanups, Report Generation) Directly in SQL
- 41.3. Monitoring Job History and Failures
- 41.4. Comparing
pg_cronwith System-Level Cron and Application-Side Schedulers
- 41.1. Introduction to the
- Chapter 42: Postgres as a State Machine Engine
- 42.1. Modeling States and Valid Transitions with
CHECKConstraints - 42.2. Using Triggers to Enforce Complex Business Rules on State Changes
- 42.3. Creating an Immutable Audit Log for State Transitions
- 42.4. Guaranteeing State Consistency with ACID Transactions
- 42.1. Modeling States and Valid Transitions with
- Chapter 43: Postgres as a Feature Flag & Configuration Service
- 43.1. Designing a Centralized Table for Feature Flags and App Configuration
- 43.2. Implementing Percentage-Based Rollouts and User-Specific Targeting
- 43.3. Reloading Configuration in Your Application Without a Restart
- 43.4. Versioning and Auditing Configuration Changes
- Chapter 44: Postgres as a Rate Limiter
- 44.1. Implementing the Token Bucket Algorithm in SQL
- 44.2. Using Window Functions to Count Events in a Sliding Time Window
- 44.3. The Power of Transactional Counters for Atomic Operations
- 44.4. Trade-offs: Consistency and Durability vs. an In-Memory Solution like Redis
- Chapter 45: Postgres as a Distributed Lock Manager
- 45.1. Introduction to Advisory Locks (Session-level vs. Transaction-level)
- 45.2. Ensuring Exclusive Execution of a Task Across Multiple Servers
- 45.3. Preventing Race Conditions and the “Thundering Herd” Problem
- 45.4. Practical Use Cases: Singleton jobs, leader election, and resource locking