Learning PostgreSQL with NASA's Cassini Data: A Curious Moon

I recently worked through A Curious Moon by Rob Conery, a PostgreSQL tutorial with a twist: instead of contrived examples, you analyse real data from NASA’s Cassini spacecraft mission to Saturn’s moon Enceladus. Here are the features and concepts that stood out.

The Setup

Enceladus is one of the most scientifically interesting moons in our solar system – water geysers erupt from its south pole, hinting at a subsurface ocean. The Cassini spacecraft flew past it 23 times between 2005 and 2015, collecting gigabytes of instrument data. That data is publicly available from NASA’s Planetary Data System, and this book has you import and analyse it with PostgreSQL.

The datasets include:

Master Plan – the full Cassini mission timeline of events and observations
INMS – Ion and Neutral Mass Spectrometer readings (1.3GB, 3.3 million rows)
CDA – Cosmic Dust Analyzer impact events (57,000 dust particle detections)

What Makes It Interesting

Real Data, Real Problems

Scientific data is messy. Columns use sentinel values like -99.99 for missing readings. Some rows are truncated. Headers repeat mid-file. The book doesn’t shy away from this – you learn to handle it because you have to.

Full Text Search for Science

One of the more satisfying chapters uses PostgreSQL’s built-in full text search (tsvector, tsquery) to search through Cassini’s mission plan descriptions. Instead of chaining LIKE and ILIKE queries, you build a proper search index and query it:

select * from events
where to_tsvector(description) @@ to_tsquery('enceladus & plume');

Window Functions with Purpose

Window functions can feel abstract in a textbook. Here, they answer real questions – calculating running counts of dust impacts, partitioning flyby data by time windows, and computing altitude changes during closest approach. The context makes the syntax stick.

CTEs That Build on Each Other

The later chapters chain multiple CTEs together to solve complex analytical problems. For example, finding the exact moment of closest approach during each flyby requires finding the minimum altitude per week, joining back to get timestamps, then building time windows around those moments:

with mins as (
  select min(altitude) as nadir, year, week
  from time_altitudes
  group by year, week
), min_times as (
  select mins.*,
    min(time_stamp) as low_time,
    min(time_stamp) + make_interval(secs => 20) as window_end,
    min(time_stamp) - make_interval(secs => 20) as window_start
  from mins
  inner join time_altitudes ta
  on mins.year = ta.year and mins.week = ta.week
  and mins.nadir = ta.altitude
  group by mins.week, mins.year, mins.nadir
)
select * from min_times;

Progressive Database Design

The book mirrors how database design actually evolves. You start with a flat CSV import into text columns, normalise into proper types with foreign keys, create views for convenience, add indexes when queries get slow, and refactor tables as your understanding deepens. It’s not a clean top-down design – it’s iterative, which feels honest.

Chapter Progression

Database setup – schemas, COPY imports, basic type casting
Normalisation – lookup tables, foreign keys, data integrity
Finding flybys – pattern matching, regex, joins across tables
Full text search – views, tsvector/tsquery, date ranges
INMS analysis – date/time functions, CTEs, aggregations
Dust analysis – window functions, materialized views, custom functions
Gravity assist – complex multi-stage CTEs, interval arithmetic
Under the ice – chemical analysis, time-window joins, index optimisation

Worth It?

If you already know basic SQL and want to get comfortable with PostgreSQL’s more powerful features – CTEs, window functions, full text search, custom functions – this is a compelling way to learn them. The scientific context keeps things interesting, and working with a genuinely large dataset (3.3 million INMS rows) forces you to care about performance in a way that toy examples never do.

Devan

A repository of thoughts and notes