Saturday, September 02, 2006

The Burning Crusade MoPaQs

A few days ago, BZ made me aware that the World of Warcraft: The Burning Crusade friends and family beta was available for download on the WoW site. As one of my areas of expertise is the MoPaQ archive format (used by all Blizzard games since Diablo), I immediately wanted to know whether there had been any additions to the format with this new release.

I walked him through all the places he needed to look for additions, as he already had it downloaded and I did not. No new flags in the file table, no new extended attributes. MPQDump reported that there were no new compression methods in use, nor any unusual "system" files. There were, however, 12 new bytes in the MPQ header; unfortunately, they were all 0 in all of the game archives.

To make a long story short, I spent several hours over Thursday and Friday looking at the disassembly and running the thing (the installer, to be specific) with a debugger; I couldn't actually watch the code that used the new fields execute, but I did watch the code around those areas, and tried to put the pieces together in my head.

Finally, I'd completed my analysis and was ready to update my specs. But I couldn't help wanting to verify that everything I'd figured out was correct. How do you study something when that thing doesn't exist? Well, you make it, and see if it works. And thus began the experiment to create a recombinant MPQ.

I made a list of all the new features in BC, so that I could be sure I tried all of them.
- Pointer to the extended file table
- Large archive support for the hash table pointer
- Large archive support for the file table pointer
- Large archive support for the file pointers
- The shunting system

How could I test all of these with minimal effort while eliminating false negatives and positives? Well, to me, the path of least resistance was fairly obvious: I spliced 4294967296 bytes (2^32, exactly 4 GiB) of garbage directly after the MPQ header. This ensured that every file pointer in the archive would have to be altered, shifting it above the 32-bit file pointer limit present in older MPQs. Because the splice was exactly 4294967296 bytes, the low 32 bits of the existing pointers in the file would not have to change; only the upper bits had to be inserted, and they would always be 1. Thus, by simply splicing data there and setting the new fields of the header (two of the three just needed to be set to 1), I'd knocked the first three items off the checklist. However, I still needed to add the high bits to all of the file pointers. This was accomplished simply by appending the proper number of bytes to the end of the archive (2 bytes per file) with the hex pattern 01 00.
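The arithmetic behind that trick can be checked with a small sketch. This is not recMPQ itself; the helper names (`split_offset`, `build_high_table`) and the sample offsets are hypothetical, and the only assumption taken from the description above is that each file gets a 2-byte high word stored as the bytes 01 00 (i.e. little-endian).

```python
import struct

# Size of the garbage splice described in the post: exactly 2**32 bytes.
SHIFT = 1 << 32

def split_offset(offset64):
    """Split a 64-bit offset into (low 32 bits, high 16 bits)."""
    return offset64 & 0xFFFFFFFF, (offset64 >> 32) & 0xFFFF

def build_high_table(file_count, high=1):
    """Pack one little-endian 16-bit high word per file (hypothetical helper)."""
    return struct.pack("<%dH" % file_count, *([high] * file_count))

# Hypothetical 32-bit file offsets from an old-style archive.
old_offsets = [0x2C, 0x1000, 0xFFFFFFF0]

for old in old_offsets:
    low, high = split_offset(old + SHIFT)
    # The stored low 32 bits are untouched; only a high word of 1 is added.
    assert low == old
    assert high == 1

# Two bytes per file, each pair being 01 00.
table = build_high_table(len(old_offsets))
print(table.hex(" "))
```

Shifting by any amount other than an exact multiple of 2^32 would change the low 32 bits as well, which is why every existing pointer would then have needed rewriting.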

But the real clincher would be the shunt. I believed I had figured out enough about the shunt to get it to do its thing. However, the MPQ API saved two values from the shunt header in its archive data structure, and I couldn't tell where they were used, meaning I couldn't tell HOW they were used. So, all I could do was set the value whose purpose I knew to what it should be, set the value whose purpose I didn't know to 0, and hope for the best.

After writing recMPQ, a program to perform the recombination on an archive, I ran the program on all three of the installer tomes (installation archives). What better way to verify my understanding than to use the recombinant archives as vectors and attempt a transfection?

I observed the experiment from WinDbg. As the archive was opened, I placed watches on the fields that the unknown portions of the shunt header were saved to, with the hope of being able to find the location of the code that was accessing them. Unfortunately, this failed; the fields were never observed to be accessed.

However, the recombinant MPQs worked perfectly - they were uptaken and their payload delivered without difficulty. Thus, the experiment was a success, and I updated my specs with (most of) the information I'd learned.

1 comment:

Anonymous said...

Hi Justin!

My paper is not related to machine translation. I'm going to post my machine translation results in an Iranian conference, because it didn't have much contribution other than applying a method (Template based machine translation) to English2Farsi!

The paper that I submitted was about integration of a corporate blogging system with Customer Relationship Management (CRM) solutions. And I submitted it to Organizational Engineering track. It is a pure information technology topic, and it was the result of my research paper for my CRM course at univ...