Sunday, January 10, 2010

Bibliography - Programming

This post is a partial directory of the various programming and reverse-engineering I've done over the last 12 years, especially those that produced some manner of result (program, specification, or processed data). This list may omit projects that are too old, too obscure, too uninteresting, or that I just don't remember off the top of my head; it also omits most projects that were done as assignments or term projects in school. With a handful of exceptions, all of these are entirely my own work. With the exception of the few most significant projects being moved to the top, the list is in alphabetical order.

This list may undergo updates in the future.

(CHK) Starcraft Map File Format [reverse-engineering]
Reverse-engineering of most of the Starcraft map file format. This was my first modding community project, and it was for this that I was invited to join Campaign Creations in 1998. This consisted of repeated edits of a map in the Starcraft map editor (StarEdit) and then observing the changes in the map file it generates. Perhaps one of the most impressive parts of this accomplishment was that the vast majority of this was done using nothing more than the DOS edit.com utility and Windows calculator; no true hex editor, no disassembler or debugger (at that time I wasn't even familiar with reverse-engineering), etc.
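The edit-and-compare approach described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the method actually used (which relied on edit.com and a calculator); the function name `diff_bytes` and the sample data are hypothetical, and the sketch assumes the edit changed bytes in place rather than inserting or removing data in the middle of the file.

```python
def diff_bytes(before: bytes, after: bytes):
    """Return (offset, old_bytes, new_bytes) runs where two file images differ.

    Assumes in-place changes: an insertion in the middle of the file shifts
    everything after it and would show up as one long differing run.
    """
    runs = []
    start = None
    common = min(len(before), len(after))
    for i in range(common):
        if before[i] != after[i]:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            runs.append((start, before[start:i], after[start:i]))
            start = None
    if start is not None:
        # the last run extends to the end of the shared length
        runs.append((start, before[start:common], after[start:common]))
    if len(before) != len(after):
        # whatever one file has beyond the shared length
        runs.append((common, before[common:], after[common:]))
    return runs


# Hypothetical example: one field changed after an edit in the map editor.
old_map = b"HEAD\x01\x00\x00\x00TAIL"
new_map = b"HEAD\x02\x00\x00\x00TAIL"
for offset, old, new in diff_bytes(old_map, new_map):
    print(f"offset {offset:#06x}: {old.hex()} -> {new.hex()}")
```

Changing one setting at a time in the editor keeps the list of differing runs short, which is what makes each run attributable to a specific field.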

It looks like this spec has become difficult to come by these days. Campaign Creations appears to no longer have it (I'll have to get that fixed), and only a few modified and/or butchered copies of it can be found on Google, some lacking credits for who wrote it. So, here's the version I still have on my computer.

MoPaQ 2000 [programming]
This (a.k.a. MPQ2K) was the project that really put the name Quantam on the Blizzard game modding map: the first stable, full-featured MPQ editor. Prior to this there was already a full-featured MPQ editor (I think the name was MPQEdit), however the extreme instability and bugginess meant that it was widely considered worse than nothing, and rarely used.

MPQ2K was a command-line user interface for the Lelik MPQ API, a library that hijacked the Starcraft Campaign Editor (StarEdit) and called the MPQ-writing functions it contained (as Starcraft maps are packaged into MPQ archives), without ever knowing the details of the MPQ format, itself (it's worth noting that this is where I learned about breaking and entering into processes and hooking functions). At the same time as the library, Lelik released a sample editor (I believe the name was MPQ Archiver) that the MPQ2K interface design used as a starting point; it was very clumsy and minimalistic, however, and lacked the major features that would make MPQ2K popular, such as scripting. MPQ2K remained the MPQ editor for several years, until cumulative speed improvements finally allowed WinMPQ, a graphical MPQ editor, to seize the title of most popular MPQ editor.

Understanding the significance of a practical MPQ editor requires some knowledge of the context. Prior to the use of MPQs, the only option to modify Starcraft game data, apart from what could be done in the map editor alone, was StarDraft, the predecessor to MPQDraft. StarDraft allowed in-memory patching of most Starcraft data files, allowing much more than was possible with the map editor alone (e.g. modification of sound effects, graphics, gameplay data such as unit statistics, etc.). While this was a hugely important development that essentially founded the Starcraft modding community (apart from custom maps), StarDraft suffered from several severe shortcomings that drastically limited its practicality: it was version-specific (meaning that every time a Starcraft patch was released StarDraft had to be manually updated by the coder before it could work with the new version), it could not replace music and certain other files, and, perhaps most significantly, it tended to actually slow down the game, degrading gameplay noticeably.

Thus, StarDraft was the first revolution in Starcraft modding, and use of MPQ files was the second. The replacement of StarDraft with custom MPQs as the basis for community mods solved many of these problems that plagued StarDraft. Music files could now be patched (although a handful of relatively minor files remained unpatchable until the coming of MPQDraft); as well, as MPQs were used natively by Starcraft and other Blizzard games, rather than requiring hacking into the game via StarDraft, this approach meant no gameplay degradation; it also meant this method could be applied to other Blizzard games that used MPQs, not merely Starcraft. This made StarDraft obsolete overnight.

MoPaQ File Format [reverse-engineering]
The complete reverse-engineering of the MPQ archive format and publishing of what remains the authoritative modder's specification of the format. Most of this was performed during development of MoPaQ 2000 2.0. While Lelik reverse-engineered StarEdit sufficiently to allow LMPQAPI to call functions in the Blizzard MPQ API in StarEdit, he never actually looked at the MPQ format itself. Thus the work fell to me to provide functions that didn't exist in StarEdit, by writing code that edited the MPQs directly (code which was then integrated into LMPQAPI 2.0). This was also where I first learned how to reverse-engineer a program via disassembly and debugging. While I gave out a lot of information about the format to a number of people on forums and other media, I didn't actually release a full spec document until some time later (the official version of the spec is here, though as it's a wiki it's been known to be vandalized from time to time).

MPQDraft [reverse-engineering, programming]
The third revolution in Blizzard game modding, MPQDraft perfected the MPQ file technique. Like StarDraft, MPQDraft consisted of a loader that actively invaded the game being patched by a mod; however, MPQDraft suffered from none of the problems of StarDraft: it was version (and game) independent, it finally solved the problem of unpatchable files once and for all, and it did not produce any gameplay degradation.

Like StarDraft, MPQDraft allowed the creation of "self-executing MPQs", which allow authors of mods to conveniently distribute large, complex mods in single executable files. However, MPQDraft added support for a plugin system, which allowed developers to add on new functionality to mods, while still making use of the basic functionality of MPQDraft. A couple examples of the most noteworthy plugins developed for MPQDraft are MemGraft (now obsolete), which allowed modification of data stored in the Starcraft executable itself (as opposed to in the Starcraft data files), and ThunderGraft (described below).

MPQDraft is now open-source, on SourceForge. Surprisingly, it still gets more than 600 downloads each month (this is only counting downloads through the SourceForge page; mirrors such as the one on Campaign Creations would not be counted), despite the fact that the most recent game it works on, Warcraft III, is about 8 years old, now.

ThunderGraft [reverse-engineering, programming]
The third of my major modding programs, ThunderGraft added the ability to play modern audio compression formats (e.g. MP3 and Ogg Vorbis) to older Blizzard games, especially Starcraft. Prior to Warcraft III, Blizzard games used either raw PCM (Diablo) or compressed ADPCM (Starcraft, Diablo II, and Warcraft II) audio formats to store music and sound effects; both formats were undesirable for mods because they were either very bulky (in the case of PCM) or resulted in noticeable loss of audio quality with sub-optimal file sizes (ADPCM). ThunderGraft replaced the entire music streaming system of the games with its own, using the FMod Ex audio library to perform audio decoding of any of the multitude of compression formats the library supports. To my knowledge, it is, six years later, still the only modding tool to attempt such a feat. I finally got around to cleaning up and releasing ThunderGraft as open source a couple months ago on SourceForge, though given that the most recent game it supports, Diablo II, is about 10 years old, and the fact that broadband has made small mod sizes less critical than they were 10 years ago, it sees little use today (only about 30 downloads through SourceForge per month).

Allocation Benchmark [reverse-engineering, programming]
A project to examine the relative performance of several memory allocators - the Windows heap, Windows low-fragmentation heap, Hoard allocator, and the Blizzard Storm allocator (SMem, a custom buddy allocator). Allocation, free, and resize operations were logged from an actual game of Warcraft III via function hooking, then fed in order to each allocator while the duration of each operation was measured, and the results were compared. Covered some in various posts.
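The replay step can be sketched roughly as follows. This is a toy Python illustration of feeding one logged trace to interchangeable allocators and timing each operation, not the actual benchmark code; the trace format, the two stand-in allocator classes, and the `replay` function are all hypothetical, and the real project measured native allocators rather than Python objects.

```python
import time
from collections import defaultdict

# Hypothetical trace format: (operation, block id, size in bytes).
TRACE = [
    ("alloc", 1, 64),
    ("alloc", 2, 256),
    ("resize", 1, 128),
    ("free", 2, 0),
    ("alloc", 3, 32),
    ("free", 1, 0),
    ("free", 3, 0),
]


class PlainAllocator:
    """Toy stand-in: every request creates a fresh bytearray."""

    def alloc(self, size):
        return bytearray(size)

    def resize(self, block, size):
        new = bytearray(size)
        keep = min(len(block), size)
        new[:keep] = block[:keep]  # preserve contents, like realloc
        return new

    def free(self, block):
        pass


class PooledAllocator(PlainAllocator):
    """Toy stand-in: recycles freed blocks of exactly the same size."""

    def __init__(self):
        self.pool = defaultdict(list)

    def alloc(self, size):
        bucket = self.pool[size]
        return bucket.pop() if bucket else bytearray(size)

    def free(self, block):
        self.pool[len(block)].append(block)


def replay(trace, allocator):
    """Feed the trace to one allocator in order, timing each operation.

    The timed region also covers the bookkeeping dictionary lookups, so the
    numbers are only meaningful relative to each other.
    """
    live = {}                     # block id -> currently allocated block
    totals = defaultdict(float)   # operation -> accumulated seconds
    counts = defaultdict(int)     # operation -> number of calls
    for op, block_id, size in trace:
        if op not in ("alloc", "resize", "free"):
            raise ValueError(f"unknown operation: {op}")
        start = time.perf_counter()
        if op == "alloc":
            live[block_id] = allocator.alloc(size)
        elif op == "resize":
            live[block_id] = allocator.resize(live[block_id], size)
        else:
            allocator.free(live.pop(block_id))
        totals[op] += time.perf_counter() - start
        counts[op] += 1
    return dict(totals), dict(counts), live


for candidate in (PlainAllocator(), PooledAllocator()):
    totals, counts, _ = replay(TRACE, candidate)
    print(type(candidate).__name__, counts, totals)
```

Because every allocator sees exactly the same sequence of requests, differences in the totals reflect the allocator rather than the workload.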

AUS Decode [reverse-engineering, programming]
This project consisted of reverse-engineering the archive (AIF) and digital audio (AUS) file formats in the Mega Man Anniversary Collection to create a decoder that could extract the music directly from the game. Both AIF and AUS file formats were previously unexamined and undocumented, although the audio compression algorithm turned out to be the documented Sony ADPCM variant VAG. Reverse-engineering of both formats was accomplished with nothing more than a hex editor (no disassembler or debugger). I talked a tiny bit about this, but never said much about the technical side of it or released the code.

(BIN) Diablo II Binary Data Format [reverse-engineering]
Reverse-engineering of a number of major Diablo II binary data formats, e.g. armor.bin, itemtypes.bin, monstats.bin, treasureclassex.bin, etc. These files are compiled binary versions of the SLYK spreadsheet files containing game global data. This was originally done with the intent of making a data editor tentatively called HellForge, but I never got around to that. I don't believe I ever released the specs of the BIN files; if I did, it would have been on Phrozen Keep.

DxWnd [reverse-engineering, programming]
DxWnd allows full-screen DirectDraw games that do not support windowed-mode DirectDraw to run in a window. I can't remember whether the initial idea was mine or Skywing's, but I did the initial research and reverse-engineering for the program and came up with the basic method. Skywing then wrote the code and debugged the cases that required modifications to my basic method (e.g. programs that combine Windows GDI and DirectDraw for graphics, such as the Diablo/Starcraft/Warcraft 2 Battle.net interface). I later added code to support window resizing via Direct3D, rather than the standard drawing method using GDI, though I can't remember if that code ever got released.

E Terra [programming]
A real-time strategy game based on ecology and evolution. Because this game focuses on nature and the organic, rather than humanity and society, the gameplay is substantially different from any existing RTS I've heard of, though it bears some resemblance to Populous. E.g. there are no unit classes - unit "classes" are created by the player through gameplay/evolution; units are not built/trained - they are born automatically; etc. Current status: awaiting the motivation to work on it more.

Engram [design]
The name is a portmanteau of "English program"; Engram is a programming language based on a subset of natural English. This was something I started working on early in my compilers class, when I heard we were each going to be writing a compiler as a term project. Work on it came to an abrupt end when I learned that we didn't get to choose the language for our compilers. Other than what I communicated to a friend in IRC and IM, I never really wrote down a specification or anything.

LibQ [programming]
A small, minimalistic cross-platform library (designed for at least Windows and POSIX support) of highly platform-dependent features, such as multithreading (more specifically, synchronization, inter-thread/process communication, and atomic operations and structures), endian-conversion, and high-performance file I/O. Talked about extensively on this blog, though that was before I started using tags or anything to make the posts easy to find.
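As a small illustration of what endian conversion involves, here is a Python sketch using only the standard library. It is not LibQ's API; the function names are hypothetical and chosen just for this example.

```python
import struct
import sys


def swap32(value: int) -> int:
    """Reverse the byte order of an unsigned 32-bit integer."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


def to_little_endian32(value: int) -> bytes:
    """Serialize an unsigned 32-bit integer as little-endian bytes."""
    return struct.pack("<I", value)


def from_big_endian32(data: bytes) -> int:
    """Read an unsigned 32-bit integer stored as big-endian bytes."""
    return struct.unpack(">I", data)[0]


print(f"host byte order: {sys.byteorder}")
print(f"{swap32(0x11223344):#010x}")
```

Spelling out the byte order at the point of serialization keeps file and network formats identical regardless of the host CPU.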

NWNScript Compiler [programming]
A compiler capable of compiling NWScript, the world scripting language used in Neverwinter Nights, NWN2, and Dragon Age, to native machine code for use in Skywing's NWN2 standalone server. This is the first attempt at such a thing that I'm familiar with; both the original games and Skywing's standalone server previously implemented interpreters to execute the scripts.

In addition to the analyzer, which parses the NWScript bytecode files and generates intermediate representation code, two backend code generators were created to produce native code by proxy. Skywing's backend generates .NET bytecode, which uses the .NET just-in-time compiler to generate partially optimized native code; my backend generates Low-Level Virtual Machine code, which uses the LLVM optimizer and JIT compiler to produce optimized native code.

Q1 [design, programming]
A fairly simple and elegant RISC processor design and emulator. The Q1 began as an idea for a term project in computer architecture class to create a simple CPU and emulator; like Engram, the Q1 was conceived and work began prior to learning that we couldn't design the instruction set (the whole purpose of the Q1). However, I stuck with this one long enough to finish the basic instruction set architecture and create an emulator and sample programs (though not an assembler, which made it a huge pain to program). A number of the design decisions were discussed on the blog.

Ray Tracer [programming]
As with just about everybody who has ever taken a graphics course, I had to make a simple ray tracer. As this was only one assignment (and not a term project), the emphasis was kind of on the simple (there was no requirement for support of meshes, reflection/refraction, etc.). However, I really ended up getting into this one, and added a substantial number of additional features beyond those required in the assignment. Thanks to all my additions, it easily grabbed the top spot in the class.

While I intended to, I never got around to writing much about the ray tracer and the various features I chose to implement (and why), though the project page on the school wiki summarizes the features and shows them off with various screen shots.

recMPQ [reverse-engineering, programming]
recMPQ was a simple program to test some reverse-engineering work on the MPQ format. The World of Warcraft Burning Crusade expansion added several new MPQ format features related to support for large archives (greater than 4 gigabytes). However, at the time no actual archives used these features. Consequently, I was forced to rely on pure disassembly, without being able to look at properly-formed archives or even watch the game load an archive using these features. recMPQ was a program to make a large MPQ archive based on my disassembly findings, which I could then replace a BC archive with to observe whether the game was able to successfully load the archive (indicating I had properly used the new features). This information was ultimately added to my MoPaQ format specification.

TextBreaker [programming]
A system to identify the language of a block of text using artificial neural networks. This was my term project (and accompanying term paper) in artificial intelligence class, which integrated AI, my interest in linguistics, and my interest in biology all into one. Unfortunately, identifying languages isn't as easy as it sounds, and the project ended up being more educational - in the sense of revealing difficulties and what does and doesn't work - than remarkably successful at actually identifying text. Nevertheless, it did manage to impress the (undergraduate) class, the teacher (specializing in AI), and my grandpa (a professional linguist), and my teacher and grandpa both suggested that I continue work on it and submit my results to AI and linguistics journals.

I never got around to cleaning up the source to release it (I did the whole term project in a week and a half, which made for very hackish code), but I did post the accompanying Orthographic Language Identification Using Artificial Neural Networks paper back when I turned it in.

StormDump [reverse-engineering, programming]
A memory tracker and browser used to examine the memory use of Blizzard games. Unlike the benchmarking project mentioned above, which merely logged calls to the allocation functions, this project consisted of reverse-engineering of the memory allocation system itself, and the associated data structures. It logged each memory allocation made, and tracked how many allocations were made of given sizes and types, both at a particular time and in total, and allowed you to dump all memory allocated at a particular time via hotkeys. All of this was done by directly traversing the memory structures for the allocation system. I don't believe I ever gave this or the source out to anyone.

Watcher [programming]
A simple program that logs all calls to API functions in various libraries that you specify using a configuration file. Version 1 of this hooked only the specified functions in the specified DLLs, as listed in INI files; some functions, such as the MPQ APIs, had hard-coded parameter-logging functions. Version 2 was capable of hooking all functions in all libraries and was intended to be scriptable to allow function-specific parameter logging for logged functions, though scriptability was never actually implemented.

ZIC2RAW [reverse-engineering, programming]
A project/program to extract the digital music files from the 3DO version of Mega Race. The format actually ended up being quite trivial to reverse-engineer, though the files turned out to not be entirely self-contained - part of the music data was embedded in the game executable itself, and not stored in the separate music files. Apart from a tool to work with 3DO discs, all of the reverse-engineering for this was done with a hex editor.

1 comment:

Necrolis said...

It may have been a while, but is it possible you could post/share the source for StormDump?

I'm curious to see how well it stacks up to the memory system employed by D2 (fog.dll's version).