
Sunday, December 09, 2007

Random Fact of the Day

Q is officially starting to panic.

Stuff to do this week (last week of school):
- Finish the class version of E Terra
- Start/finish video of something for school
- Start/finish term paper
- Study for and take 4 finals


Olivier Ragain said...

Seems like a lot on your plate, but I was wondering if you would mind adding to it a bit.
I am currently reverse engineering an archive file format which, for now, seems a bit simpler than MPQ.
So here is my question:
- How did you work out the algorithm that maps an entry in the file table to a filename (or a filename to a file table entry)? Did you disassemble the exe, or use some other method?


Justin Olbrantz (Quantam) said...

If you're lucky, you can eyeball it just from looking around the archive with a hex editor (I've been able to do this with some simple archive formats).
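For anyone without a hex editor handy, here is a minimal sketch of the offset / hex / ASCII view one gives you. Runs of readable text (such as plain filenames) show up in the right-hand column. The function name `hexdump` is just an illustrative choice.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes like a hex editor: offset, hex bytes, then printable ASCII."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Printable ASCII shows through; every other byte becomes '.'
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short final rows
        lines.append(f"{off:08x}  {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hexdump(b"ARCHIVE\x00\x01\x02data/readme.txt\x00"))
```

Pointing it at the first few hundred bytes of an archive (e.g. `hexdump(open(path, "rb").read(256))`) is usually enough to spot magic numbers, counts and fixed-size records.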

If you're less lucky, you'll have to get out a good disassembler and debugger and watch the code as it loads the major archive structures (from CreateFile), and try to make sense of it. If you can find something you know is a filename, that's even better, because you can breakpoint on access and watch how it gets used.

I've amazed BahamutZero on several occasions as I was telling him info about a format I was looking at in real time, just from looking at the hex (I seem to recall a reference to a crystal ball). Though that only works when the format isn't encrypted.

For example, I was able to figure out a lot of the Guild Wars archive format in a fairly short amount of time just from looking at the hex, but I haven't yet determined how it stores filenames (which correlates with the fact that I've never looked at it with a disassembler and debugger, and the filenames are obviously either hashed, encrypted, or compressed).

Olivier Ragain said...


Thanks for the quick answer. It took me 2 days (my first time using it) to find the hash method in OllyDbg: it hashes the filename and then compares the result to the hashes of the files stored in the archive. Another thing about the archive is that it was designed so that it could be larger than 4GB...

So, I hope my English won't be too bad from here on...

I'll give a quick explanation of the archive format. A file entry is 34 bytes long, including 12 bytes that I couldn't even partly decipher until a few hours ago. The file entry points to the file in the archive, which starts with a header that is either 136 or 137 bytes long (I still don't know why there is a 1-byte difference).

The annoying thing is that, of the 12 bytes in the file table entries that I couldn't understand, only 8 are produced by that hash method, so I still have 4 to go.
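The layout described above (34-byte entries, 12 of whose bytes are 8 bytes of hash plus 4 still-unknown bytes) could be split apart roughly as follows. This is only a sketch: where the 12-byte block sits inside an entry isn't stated, so the caller passes `offset`, and it is a hypothetical assumption that the hash bytes come first and are a little-endian 64-bit integer (typical for x86 executables).

```python
import struct

ENTRY_SIZE = 34  # bytes per file-table entry, as described above


def split_entries(table: bytes) -> list[bytes]:
    """Cut a raw file table into fixed-size 34-byte entries."""
    if len(table) % ENTRY_SIZE:
        raise ValueError(f"table length {len(table)} is not a multiple of {ENTRY_SIZE}")
    return [table[i:i + ENTRY_SIZE] for i in range(0, len(table), ENTRY_SIZE)]


def split_unknown_block(entry: bytes, offset: int) -> tuple[int, bytes]:
    """Split the 12 mystery bytes at `offset` into the 8 hash bytes and 4 unknown bytes.

    Hypothetical layout: hash first, stored as a little-endian unsigned 64-bit value.
    """
    block = entry[offset:offset + 12]
    if len(block) != 12:
        raise ValueError("entry too short for a 12-byte block at this offset")
    (name_hash,) = struct.unpack_from("<Q", block, 0)
    return name_hash, block[8:]
```

Comparing `name_hash` against the output of the hash routine found in the debugger is a quick way to confirm both the offset and the byte order.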

The second weird thing is that the exe I am disassembling doesn't use the 128 bytes I don't understand (out of the 136/137). I can't make heads or tails of them for now... Since the exe is a patcher, I hope to learn more about that data with the next patch, because right now the patcher isn't writing to those parts of the archive file.

With OllyDbg I tried REC, but it crashes on about half of the methods I'd like to read in a C-like language instead of assembly. Boomerang crashes too. Would you happen to know of any other free software that does this properly?

Enjoy New Year's Eve,


Justin Olbrantz (Quantam) said...

I've never worked with anything to convert from assembly to C/something. I've always used disassembly directly, and done the conversion back to high level language manually, when necessary (a tiny fraction of the time).

The process of compilation (high level language to assembly) is fundamentally degenerate: there are an almost infinite number of ways to compile a given piece of high level code into assembly, some of which are extremely difficult for a reverse-compiler program to figure out; as well, there are multiple pieces of high level code that could be compiled into the same assembly. So at best you'd get source that is ugly; at worst it would be as difficult to understand as the assembly itself (or more so).

What you should look into, though, is an intelligent disassembler, such as Interactive Disassembler. It's smart enough to identify many stack variables and function parameters automatically (obviously it doesn't know what they mean, but it finds where they exist, and translates references to them), and you can then manually refactor the names of variables, functions, and parameters. That's way, way easier than having to mentally translate all the ESP+x and EBP+x stuff.

Olivier Ragain said...

So the archive has files, files have metadata (encrypted in some way), and files that have the same metadata have the same CRC32 hash... the cryptic metadata is 128 bytes long...

The metadata is getting weirder by the second...

I tried calculating the CRC32 of the metadata with and without its header, but didn't get the same value as the one stored in the file entry, so I guess the CRC32 is computed over the clean (decrypted) metadata.

I'm totally lost...
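The check described above (CRC32 of the metadata with and without its header, compared against the value in the file entry) can be scripted so it is easy to rerun on many files. A sketch under stated assumptions: the 128-byte metadata is assumed (hypothetically) to sit at the end of the 136/137-byte header, leaving 8 or 9 leading header bytes, and the stored CRC is assumed to have been read out of the entry already.

```python
import zlib

METADATA_SIZE = 128  # size of the opaque metadata block, per the comments above


def crc_candidates(file_header: bytes) -> dict[str, int]:
    """CRC32 over the whole header and over just its trailing 128-byte metadata.

    Hypothetical layout: metadata is the last 128 bytes of the 136/137-byte header.
    """
    if len(file_header) < METADATA_SIZE:
        raise ValueError("header is shorter than the metadata block")
    metadata = file_header[-METADATA_SIZE:]
    return {
        "with_header": zlib.crc32(file_header),
        "metadata_only": zlib.crc32(metadata),
    }


def matching_variants(file_header: bytes, stored_crc: int) -> list[str]:
    """Names of the candidate byte ranges whose CRC32 equals the stored value."""
    return [name for name, crc in crc_candidates(file_header).items() if crc == stored_crc]
```

An empty result for every file is what the comment above reports, which is consistent with the CRC being taken over data that never appears in the archive in that form.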

Olivier Ragain said...

Okay, the metadata contains something that can't be the filename; I guess it's derived from the file's content, since two files with the same content have the same CRC32 and the same metadata:

file1: meta1 & crc1
file2: meta2 & crc2
if (content of file1 == content of file2) then meta1 == meta2 and crc1 == crc2;
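The relationship above is easy to check across a whole set of extracted files by grouping them by the CRC32 of their contents. A minimal sketch (the `files` mapping of name to bytes is an assumed input): identical contents always land in the same group, but the reverse does not strictly hold, since different contents can collide on the same 32-bit CRC.

```python
import zlib
from collections import defaultdict


def group_by_crc(files: dict[str, bytes]) -> dict[int, list[str]]:
    """Group file names by the CRC32 of their contents."""
    groups: dict[int, list[str]] = defaultdict(list)
    for name, data in files.items():
        groups[zlib.crc32(data)].append(name)
    return dict(groups)
```

Comparing each group against the metadata stored for its members would show whether "same content" really implies "same metadata" everywhere, or only in the cases looked at so far.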

Now I just need to find the method that produces the metadata ^^

Justin Olbrantz (Quantam) said...

If you know where the memory for a new entry is allocated before the entry is filled, breakpoint on write would probably work, and would be easy.

Olivier Ragain said...

Just found out that the metadata is actually downloaded from the net when files need patching.
Since the archive format has changed in the last 6 months (before that, it stored the filenames in the clear), I hope it will change again in the future :)

Thanks for all the help

Olivier Ragain said...


Just a final note, we got a patcher working without the need to change the metadata which is great :)

See you some other time :)