The Catalog Learns to Listen

The day Casset stopped storing audio and started understanding it, and why a single migration mattered more than it looked.

For a long time Casset could tell you exactly when a song did something and almost nothing about what it was doing. It knew the beat grid down to the millisecond. It could not have told you the key if you held it upside down.

That changed with a migration that looked, on the surface, like a few extra columns.

model TrackMusicAnalysis {
  // timing layer (existing beat-sync DSP, unchanged)
  bpm                    Float?
  beatMapJson            Json?
  transientMapJson       Json?
  energyEnvelopeJson     Json?
  analysisVersion        String
  sourceAudioUrlHash     String?
  status                 MusicAnalysisStatus
  generatedAt            DateTime?

  // musical layer (NEW: the analyzer populates this)
  musicalKey             String?   @db.VarChar(40)
  detectedInstruments    String[]
  musicalConfidence      Float?
  musicalProvider        String?   @db.VarChar(40)   // "openai"
  musicalAnalysisVersion String?   @db.VarChar(40)   // "openai-audio-v1"
  musicalSourceAudioHash String?   @db.VarChar(80)
  musicalAnalyzedAt      DateTime?
  musicalAnalysisError   String?   @db.VarChar(500)
}

Two kinds of knowing

Look at the comment in the middle. There are two layers in one table, and they know things in completely different ways.

The timing layer is old, deterministic DSP. Beats, transients, an energy envelope. Run it twice on the same file and you get the same answer, forever. It is a measurement. It does not have opinions.

The musical layer is new, and it is a different epistemology entirely. Key, instruments, a confidence score. A model listens to the audio and tells you what it hears. It can be wrong. Which is exactly why musicalConfidence, musicalProvider, and musicalAnalysisError are first-class columns. If a thing can be wrong, the schema has to admit it, version it, and let you re-run it later with a better listener.

The first principle of the migration was to never let the second kind of knowing contaminate the first. The DSP stays untouched. The model's hearing sits beside it, labeled, dated, and reversible.
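What "version it and re-run it" can mean in practice is easier to see in code. The following is a minimal TypeScript sketch, not Casset's actual pipeline: the `MusicalLayer` and `ListenResult` types, and the `reanalysisReason` and `musicalUpdate` helpers, are hypothetical names written for illustration, with field names copied from the schema above. It assumes the caller knows the current analyzer version string and a hash of the audio. Two choices belong to the sketch rather than the source: the order of the staleness checks, and clearing the previous hearing when a run fails so a new version label never sits next to an old key.

```typescript
// Hypothetical, hand-written mirror of the musical-layer columns (not generated by Prisma).
type MusicalLayer = {
  musicalKey: string | null;
  detectedInstruments: string[];
  musicalConfidence: number | null;
  musicalProvider: string | null;
  musicalAnalysisVersion: string | null;
  musicalSourceAudioHash: string | null;
  musicalAnalyzedAt: Date | null;
  musicalAnalysisError: string | null;
};

// Hypothetical shape of what a listener returns: a hearing or a failure.
type ListenResult =
  | { ok: true; key: string | null; instruments: string[]; confidence: number }
  | { ok: false; error: string };

// Returns why a row's musical layer should be (re)computed, or null if it is current.
function reanalysisReason(
  row: MusicalLayer,
  currentVersion: string,
  audioHash: string,
): string | null {
  if (row.musicalAnalyzedAt === null) return "never-analyzed";
  if (row.musicalAnalysisError !== null) return "previous-error";
  if (row.musicalAnalysisVersion !== currentVersion) return "stale-version";
  if (row.musicalSourceAudioHash !== audioHash) return "audio-changed";
  return null;
}

// Builds an update containing only musical-layer columns. Timing columns
// (bpm, beatMapJson, ...) never appear here, so a bad hearing cannot overwrite them.
function musicalUpdate(
  result: ListenResult,
  provider: string,
  version: string,
  audioHash: string,
  now: Date,
): MusicalLayer {
  const provenance = {
    musicalProvider: provider,
    musicalAnalysisVersion: version,
    musicalSourceAudioHash: audioHash,
    musicalAnalyzedAt: now,
  };
  if (!result.ok) {
    return {
      ...provenance,
      musicalKey: null,
      detectedInstruments: [],
      musicalConfidence: null,
      // Truncate so the message fits the VarChar(500) column.
      musicalAnalysisError: result.error.slice(0, 500),
    };
  }
  return {
    ...provenance,
    musicalKey: result.key,
    detectedInstruments: result.instruments,
    musicalConfidence: result.confidence,
    musicalAnalysisError: null,
  };
}
```

Bumping the version string (say, from "openai-audio-v1" to a hypothetical "v2") is then enough to mark every row stale without touching a single beat map.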

Why a few columns is a turning point

Before this, a track in Casset was a file with timing attached. After this, a track can describe itself. It can say: "I am in F minor, there is a Rhodes and a brushed snare, and I am fairly sure about the first part and not the second." That sentence is small. The consequence is not.

A catalog that can describe itself is a catalog you can ask questions of. Which unreleased songs are in the same key? Which ones share an instrument palette? What disappeared from the work over five years, and what kept coming back? None of those are features I had to build one at a time. They fall out of the catalog understanding itself a little better, and every interface on top of it gets smarter for free.
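Once the columns exist, the first two questions reduce to ordinary code. A minimal in-memory TypeScript sketch, assuming a hypothetical `TrackSummary` row with a `title` plus the schema's `musicalKey`, `detectedInstruments`, and `musicalConfidence`; the helper names and the 0.6 confidence cutoff are illustrative choices, not values from Casset. Release status is not among the columns shown, so narrowing to unreleased songs is left to the caller. Palette similarity here uses Jaccard overlap (shared instruments divided by all distinct instruments), a stand-in measure picked for the example.

```typescript
// Hypothetical per-track summary; musical field names mirror the schema.
type TrackSummary = {
  title: string;
  musicalKey: string | null;
  detectedInstruments: string[];
  musicalConfidence: number | null;
};

// Groups titles by detected key, skipping tracks the listener was unsure about.
// Assumes key labels are already normalized (always "F minor", never "Fm").
function groupByKey(
  tracks: TrackSummary[],
  minConfidence = 0.6,
): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const t of tracks) {
    const key = t.musicalKey;
    if (key === null) continue;
    if ((t.musicalConfidence ?? 0) < minConfidence) continue;
    const bucket = groups[key] || [];
    bucket.push(t.title);
    groups[key] = bucket;
  }
  return groups;
}

// Lowercases and de-duplicates instrument labels.
function uniqueLower(xs: string[]): string[] {
  const out: string[] = [];
  for (const x of xs) {
    const k = x.toLowerCase();
    if (out.indexOf(k) === -1) out.push(k);
  }
  return out;
}

// Jaccard similarity of two instrument palettes: 0 = disjoint, 1 = identical.
function paletteOverlap(a: string[], b: string[]): number {
  const setA = uniqueLower(a);
  const setB = uniqueLower(b);
  if (setA.length === 0 && setB.length === 0) return 0;
  const shared = setA.filter((x) => setB.indexOf(x) !== -1).length;
  return shared / (setA.length + setB.length - shared);
}
```

For example, a palette of Rhodes and brushed snare against one of Rhodes and upright bass shares one of three distinct instruments, an overlap of 1/3. Against a real database the grouping would more likely be a query filtered on musicalKey; the in-memory version just keeps the logic visible.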

That is the whole bet, stated in a schema. Casset is not trying to be a smarter chat box bolted onto your files. It is trying to make the underlying understanding of a body of work richer, so that the model of the creator compounds. musicalProvider is a string on purpose. The listener will be replaced; the listening should accrue.

The quiet part

The thing I keep returning to is that the catalog did not get bigger that day. It got more legible to itself. Same songs, same files, same beats. But now the library can read a little, and a library that can read is the start of a library that remembers.

We added eight columns. We were really adding a sense.