settling our hash

posted by tom / April 19, 2005 /

Not the food; not the drug; not the terrifying drunken running hobby. No no no, a FAR more exciting type of hash: the algorithmic kind! Party!

See, these things have been in the news lately because a couple of popular hashing techniques have started to look susceptible to attack. This might seem mind-numbingly boring, but it's important -- in a broad sense, because of security and encryption. In practice, because defeating hashing could in theory allow record companies bring P2P to a halt.

Hashing is a one-way mathematical operation that generates a short key from a larger file. There are different hashing algorithms, but as an example, here's an MD5 hash for the wma file on my desktop of Pat O'Brien's explicit voicemail messages:

74db2e2eeae803e9865c35a4e0542289
Hashes can generally be considered distinct -- they're commonly compared to fingerprints. Like fingerprints, they're primarily useful for comparison purposes. If you've got a fingerprint record of a wanted criminal and a man in custody, you can compare the suspect's fingerprints to the ones on record and tell if he's the guy you're looking for. It's pretty easy to do, too -- rather than shipping the guy's whole body over to your place of comparison, you can just send a card with his prints through the mail. Also important is the fact that small changes in the suspect/file produce large changes in the fingerprint/hash. Round up the guy's brother and, even though they may be similar in lots of ways, the fingerprints will be completely different. Similarly, if you flip one bit in a file it'll have a completely different hash.

It's important to note, though, that two files can have the same hash. It's easy to see why: an MD5 hash is only 16 bytes long, meaning that it can have 2(16*8) possible values. That's a lot, but not much compared to the size of pat_obrien_coked_n_horny.wma -- that file is 2,645,957 bytes, a length that could host 2(2,645,957*8) different values. MD5 is now considered a weak hashing algorithm, but the principle applies to all hashing techniques: so long as a hash is shorter than the message from which it's generated, there will always be multiple possible messages with the same hash. When this happens it's called a collision.

So why is this important? Well, you can use a hash to make sure a downloaded message hasn't been tampered with. If it has, its hash will be different from the one "on record". P2P applications use hashes a lot -- that's how they can tell when differently named files are actually identical. They also use them to identify the individual segments into which a file is commonly broken up during downloading. Collisions represent a case where this system breaks down.

So let's say you're the RIAA. The downloading? You're not such a big fan of it. What can you do to poison the P2P networks? Well, the obvious answer is to flood them with junk, making it harder to download real music. But you've already done that -- Kazaa has lots of files on it that contain a few seconds of music, then ear-torturing static. But that just puts another copy of a song on a network. There's nothing stopping the real copy from being there too, and presumably people will hang on to more good-sounding copies than static-y ones. It'll consequently be faster to download the real one, so more people will, and the poisoned copy will die out.

But what if you could also ruin the good copy? If you could generate a file, or pieces of a file, with the same hash as the good copy or its pieces, you could upload them to the P2P networks and produce horribly screwed-up music files. Well, a Finnish firm claims to be able to do exactly that.

Nobody seems to be clear on whether they actually can or not. Hashing has been having a rough time lately -- like I said, MD5 is considered weak, but the industry-standard SHA-1 has been having problems, too -- it was broken earlier this year (on my birthday! hey!). So far these attacks are just collision attacks -- techniques to generate two files with the same hash from scratch. That wouldn't cut it for poisoning the P2P networks, though. Instead you'd have to take a given file with a given hash and create another file with the same hash. That's much harder -- it's called a preimage attack and, so far as we know, nobody can do it yet. But if they could, it'd be a problem.

Not an unsolvable one, though. Hard as a preimage attack is, a preimage attack that produces an identical-hash file of the same size as the original is even harder, and that, too, would probably be necessary to fool BitTorrent and its ilk. And there's nothing stopping future P2P apps from applying two different hashing techniques to the same file -- finding a preimage solution to two different algorithms is probably mathematically impossible. Plus the geeks on high seem to think that SHA-256, the hashing vancomycin, is still safe.

But in the short term, the breaking of these hashing algorithms might deal a significant blow to P2P -- particularly for Kazaa, which has one of the weaker hashing schemes around.

Comments

But the internet is so large and I'm just one man. What can I possibly do to help?

Posted by: jeff on April 19, 2005 02:16 PM

personally, I'd suggest buying as much ammunition and clean water as you can get your hands on.

Posted by: tom on April 19, 2005 02:21 PM

Post A Comment

Name


Email Address


URL


Comments


Remember info?



Google Analytics