It all began on a Monday afternoon. My computer was broken, the friend I wanted to play World of Warcraft with was at work, and I was bored. So, I decided I'd pop the Mega Man Anniversary Collection into my Playstation 2 and play some Mega Man 4 on our new 35" TV (the one that weighs 190 pounds). One of the very first things that struck me (I had the anniversary collection for a while now, but this was the first time I'd played one of the NES games on it) was the music... it was different!
After listening to it for a bit, I realized it was remixed versions of the original music in something resembling MIDI quality. As I used to collect video game music, I wanted it in my (MP3) collection. However, I'm a lot lazier than I used to be (back when I used to record music directly from the consoles), and I wanted an easier way. Particularly because of the fact that I'd heard the music, after several minutes, fade out and restart from the beginning; this was a strong indication that the game was using digital (and thus fairly easy to rip) music. So began the hunt.
Sticking the DVD into my friend's computer (who was preoccupied playing Dragon Quest 8 for about a week before and a couple days after this) turned up the basic PS2 configuration files (SYSTEM, SLUS, etc.), a number of IRX modules, and 13 AIF files. No XA files (for those who aren't familiar with them, the XA extension refers to CDXA format files, a multi-stream digital ADPCM compressed format popular with Playstation games), which I originally expected to find. Okay, now what?
The absence of anything else, and the fact that the AIF files consumed 3.5 gigs of the DVD, made it probable that they were archives. A quick look at the files in a hex editor seemed to agree. As you can see in the picture, there's a table of 16-byte structures with 5 apparent fields in little endian order (three 32-bit fields followed by two 16-bit fields). As well, the first 4 bytes of the file listed the offset of the end of the table; this was probably a file table.
Given the fact that the second 32-bit field of the file table entry was generally always equal to the second 32-bit field of the previous entry added to the third 32-bit field of the previous entry agreed with this; it seemed as though the second field was the file offset, and the third the file size. This was confirmed by following some of the file offsets and finding what appeared to be file headers; in addition, this also made it apparent that the files in the archives were neither encrypted nor compressed. The lack of any type of pattern in the first 32-bit field, and the complete absence of any file names in the archive, made me think this was a file name hash.
My first thought was to scan the archives looking for XA files. So I did a text search for 'CDXA', the format tag of the CDXA format. No dice. I then tried searching for 'RIFF', the tag of RIFF format files (of which CDXA files were). This turned up two matches: one apparently in an executable, and the other of an RIFF/WAVE file (standard issue .WAV file). I followed the offset back to the file table, and cut and pasted the WAV file, and played it. From the sound, it seemed to be background music for the title screen, nothing more.
As well, in the process of various text searches, I'd found strings referring to various files with names such as BGM.XA. This made me wonder if the files were stored outside the DVD file system. I don't really know why you would do that, but I've seen this done in other games, before. So, I whipped out Visual Studio 2003 and MSDN library, and started coding a text searcher. This one would open the DVD drive as a volume (look at CreateFile for information about this), then search for the text. In the process I amused myself by writing my first ever complete program using asynchronous I/O, which used dual read buffers to read a block while searching the other. But in the end, it was futile. No occurrences of 'CDXA' or 'RIFF' were found outside those found in the archive files.
Okay, now what?