more bang for your bitrate

posted by tom / February 18, 2005 /

Yesterday I promised in comments to figure out how lossless compression algorithms work. Lossless compression schemes are those that let you store a signal without losing any information -- think of ZIP files, for example. Turns out my top-of-my-head theoretical explanation wasn't bad -- a bunch of lossless compression algorithms do work similarly, and it's a set-in-stone fact that a lossless codec will make some worst-case files larger than they started out.

But I was specifically asked about FLAC, a lossless codec designed for audio. FLAC can take PCM data (a digital audio signal) and compress it to anywhere from 30-70% of its original size without losing any data. Neat. But how does it work?

Well, virtually all of this comes from wikipedia and its FLAC and FLAC-related entries -- if anything sounds wrong, go have a look there. But here's the nutshell-ready version: FLAC operates similarly to other lossless audio codecs by doing two things.

First, it uses run-length encoding. The idea is really simple: if we have a bunch of identical samples occurring in a row -- say, a long string of zeros representing a patch of silence -- it makes more sense to store one number that counts the total number of zeros rather than using up space for a different number representing each individual zero. It probably goes something like this: an "escape sequence" to tell the decompressor to pay attention, then the sample to copy out, then how many times to copy it. The exact implementation might differ, but that's the general idea.

Second, FLAC uses a statistical technique called linear prediction to store the rest of the signal. I'm having a hell of a time finding a clear online explanation of LP for math-averse people like myself. But the basic idea is that for each member of a set of discrete data points, the previous data points are plugged into a function. The coefficients to that function are determined during the compression portion and saved, although I'm not clear on the exact means by which this is accomplished. The idea is that you can store relatively few parameters to generate a linear function that looks like your data, then presumably store a little extra data that tweaks the function where it doesn't match the input data closely enough.

Near as I can tell this all works because samples in PCM data are correlated from one to another by virtue of how sound waves work, and how frequently they're sampled in digital recording. For instance, if air pressure has been increasing for the last dozen forty-thousandths of a second, there's a good chance it'll keep going up in the next one. To make this work even better, a lot of lossless codecs flatten the curve of the sound wave by taking its first derivative prior to generating the linear prediction data (I'm unclear on whether FLAC specifically does this, though). It's that first derivative curve that's regenerated at the decompression phase, then integrated, and presto, you've got your original signal.

Make sense? Yeah, I don't know either. Any statisticians or engineers out there, feel free to correct me. The worst part of doing this kind of research is reading over and over again how elementary an operation linear prediction is. Why don't you say it to my face, you goddamn nerds!

Post A Comment

Name


Email Address


URL


Comments


Remember info?



Google Analytics