Friday, January 27, 2012
R.I.P. Dorkess
Dorkess (AKA Poguita): March 27, 1992-January 27, 2012
The twins, Poguita and Ping Pong (as well as two others we didn't keep, which seemed to be from different fathers) were born at our house on March 27, 1992. Poguita (yes, that is an English/Spanish pun) was named for the distinctive white stripe on her nose.
As with Slinky, I later started calling her Dorkess based on her distinctive personality. She was always extremely high-strung, even for a cat. The slightest sudden noise would trigger her characteristic "vertical takeoff and landing" - shooting a couple feet straight into the air - followed by a terrified flight, either out of the house or into a closet to hide. I used to say that she never walked anywhere; she just sprinted.
Her dorkiness led to a wide array of memorable incidents. In one case my mom, one of my friends, and I were watching something scary on TV. When my mom screamed at a frightening moment, the entire clothes hamper exploded, throwing various pieces of clothing into the air. As it turned out, Dorkess had been sleeping in it, and completely freaked out at the scream.
She was also an extremely finicky cat; nothing was acceptable to her unless you did it exactly how she wanted it. Perhaps she was narcissistic, as she did, after all, love staring at herself in shiny surfaces. Though she lacked the congenital metabolic problems Slinky had - and as a result was not so thin and slinky - she seemed to have some kind of a deformity on her back feet such that she was never able to fully retract her claws, resulting in a distinctive clicking as she walked on hard surfaces.
As she got older, she gained another distinctive attribute. For a variety of reasons - wanting something, announcing that she was entering the house, and some that were just mysterious - she would use just about every muscle in her body to howl at an unbelievable volume, in a way that vaguely resembled a baby crying; as one friend put it the first time he heard it, "What the heck was that?". On a couple occasions she even howled so loudly that she got dogs howling along with her. One particularly interesting feature was that her howling meows were neither identical nor random, but followed a well-established pitch contour (e.g. there was always a large drop in pitch during the third meow).
Apart from a couple of serious but temporary illnesses (either of which could conceivably have killed her), she was in fairly good health until just recently (though she certainly showed her age). She did end up going completely deaf a few years ago, but this was probably for the best, as afterwards she was much calmer and less high-strung.
At one point a couple years ago, she began accumulating fluid around her lungs, making it difficult to breathe. This probably would have killed her, but with regular doses of the drug Lasix the problem essentially went away (though the Lasix did seem to be doing some harm to her kidneys and liver). We never did learn the root cause, and it's possible it was ultimately related to why she died. Eventually, she seemed to decide that she wanted to die, and simply stopped eating. This, on top of her already weakened state due to old age, resulted in her death a week later. The reason for this choice remains unknown.
One of Dorkess' common places to sleep
The twins: Dorkess (left) and Slinky (right)
The twins sleeping, Zen-style; best guess is that Slinky is in front, Dorkess is in back
Me and Dorkess, 5-10 years ago
Rest in peace, Dorkess.
Friday, December 30, 2011
Anti-Piracy for Dummies
Are you a content producer, or the holder of copyrights on content you didn't actually create? Are filthy thieving pirates downloading and uploading your content all across the web, driving your gross revenue into the negative? Fear not! Immigration and Customs Enforcement is here to help!
Several years ago, the US Immigration and Customs Enforcement agency, in conjunction with the Department of Justice and the Department of Homeland Security, began handling copyright and other intellectual property enforcement through Operation In Our Sites. Through this program ICE has the authority to forcibly seize web sites that infringe your copyrights et al., providing critical relief for those, such as you, besieged by piracy. Once seized, these sites prominently display the ICE et al. logos and a brief explanation that the site has been seized. As an example, observe the fate that befell torrent-finder.com for the rampant theft of copyrighted works it harbored.
It's a common misconception that you need to have special government connections, secret backroom deals, or incriminating evidence on politicians to request such a seizure. In fact, this is not true. Any copyright, patent, or trademark holder with a Windows computer may request such a seizure for any legitimate reason, and ICE will rapidly respond, taking the offending site offline with all due haste.
To request such a seizure, simply follow these easy steps:
- Log onto your Windows computer with an account with administrator privileges (e.g. the Administrator account)
- Open the Run dialog, either through the Start menu or by pressing WINKEY+R
- Type notepad "C:\Windows\System32\drivers\etc\hosts" and press Enter. If your Windows directory is not "C:\Windows", you will need to substitute the correct path.
- For each pirate site you wish to have seized, add the line "74.81.170.110 [site domain name]" to the bottom of the file. For example, to seize TechDirt, a notorious site dedicated to the promotion of piracy and the suppression of the free speech of congressmen and businesses, you would add "74.81.170.110 www.techdirt.com"
- Save the file and close Notepad
- Wait for your request to be processed by ICE. This typically varies from a few seconds to a day or so. If the sites are still online after several days, you should probably go back and make sure you did not make any mistakes in performing the above procedure.
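For instance, after requesting the seizure of TechDirt and Torrent-Finder, the bottom of your completed request form (that is, your hosts file) might look something like this (entries purely illustrative):
74.81.170.110 www.techdirt.com
74.81.170.110 torrent-finder.com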
That's all! Fast. Easy.
With you and ICE working together, we can all look forward to a piracy-free tomorrow!
This post inspired by tsavory.
Monday, October 18, 2010
Public Service Announcement
While solar panels are dark blue, almost black, they are also reflective. This means that they will likely appear white or some other bright color, which may be a big shock (and a big aesthetic problem) if you have a dark roof and the panels are clearly visible.
My parents are NOT amused.
Update: Actually this is only the case when the sky is cloudy, which it was the day my parents had the panels installed. When the sky is clear, they look as dark as you'd expect, without the bright reflection.
Wednesday, September 08, 2010
Public Service Announcement
When you call Discover card customer service, the very first thing you have to do is enter the last 4 digits of your Discover card number, the last 4 digits of "your social security number", and "your ZIP code". It's important to note here that "your social security number" and "your ZIP code" do NOT mean your social security number and your ZIP code; they mean the social security number and ZIP code of the primary cardholder, which may or may not be you.
Discover card apologizes for the inconvenience of refusing to even let you talk to a real person without providing something it specifically did not ask for.
Wednesday, August 04, 2010
NWScript Stack System
Note: SW has written a new post, talking about NWScript in general, akin to my post yesterday (though possibly in more detail than mine; I haven't actually read it all yet).
So, now we know where we are (the NWScript system), and where we need to go (LLVM assembly). Now, how do we get there?
Well, a lot of things are simple enough, requiring only trivial use of LLVM and a parser. But after investigating the NWScript system for this purpose, I identified several areas that would be nontrivial and would require some thought and effort, which I'll discuss.
The first challenge - that is, a nontrivial design issue - was the stack machine. Stack machines are easy to interpret, but harder to convert to something like LLVM assembly that uses registers (like real assembly languages) and strongly declared and typed variables (unlike real assembly languages). Stack machines use a huge number of stack locations during execution, the vast majority of them (more than 96%, in one sample I looked at) temporary copies that last only until the next math (or other) instruction. From the opposite perspective, it's necessary that all accesses of a single variable be mapped to the same memory location, regardless of where the variable sits relative to the top of the stack. It's even possible for multiple separate locations in the code to create a stack location that nevertheless represents a single variable.
To illustrate these problems, a few examples:
// var1 * 0x1E
CPTOPSP FFFFFFEC, 0004
CONSTI 0000001E
MULII
// + var2
CPTOPSP FFFFFFEC, 0004
ADDII
// var3 =
CPDOWNSP FFFFFFF0, 0004
MOVSP FFFFFFFC
Here, we can see two stack locations are created that do not represent variables at all, but rather temporary parameters to the MUL instruction. A third temporary that is also not a real variable is created as the return value of the MUL instruction, which is then used as a parameter to the ADD instruction along with a fourth temporary copied from a variable. Finally, a fifth temporary is created for the return value of the ADD instruction, which is copied down to a result variable and destroyed. The C code for this would be
var3 = (var1 * 0x1E) + var2;
// if (3 == 0)
00000045 CONSTI 00000003
0000004B CONSTI 00000000
00000051 EQUALII
00000053 JZ off_00000065
// true: var = 1
00000059 CONSTI 00000001
0000005F JMP off_0000006B
// false: var = 2
00000065 CONSTI 00000002
0000006B
In this second case, we have an optimized variable initialization where a stack location is created in two separate locations (depending on whether a condition is true), but is in reality only a single variable. This is from a script that has int g_i = (3 == 0 ? 1 : 2);
It may also be possible to use the stack as, well, a stack, building up a variable-length list of things as they're processed in one loop, then popping them in a second loop. I can't think of a practical use for this off the top of my head (something that is both useful and couldn't be accomplished any other way), but it might be possible.
So, the stack system poses challenges related to identifying variables. However, a second, less obvious problem exists, although it's not as specific to stack machines. In this case, the problem relates to functions. In NWScript, a call consists of:
- Reserve a temporary slot on the stack for the return value
- Push the parameters in reverse order
- Call the function
- Copy and pop the return value
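Assembled into bytecode, a call might look roughly like this (a hedged sketch using opcodes that appear elsewhere in this post; the offsets and function name are invented for illustration):
; int result = SomeFunc(a, b) - hypothetical
RSADDI ; reserve a stack slot for the return value
CPTOPSP FFFFFFF8, 0004 ; push b (parameters go in reverse order)
CPTOPSP FFFFFFF8, 0004 ; push a
JSR fn_SomeFunc ; call; the function pops its own parameters
CPDOWNSP FFFFFFF4, 0004 ; copy the return value down into result
MOVSP FFFFFFFC ; pop the temporary return value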
The solution I came up with to this problem (as well as other problems, which I'll talk about in the future) was a two-pass system that performs analysis and intermediate code generation (intermediate code is an expanded form that deals with variable references rather than the stack, making it much easier to work with; this intermediate code is then converted to the output code type in a third pass). The two passes could have been merged into one, but that would have been quite messy and needlessly memory-hungry compared to the more elegant and efficient two-pass design.
First Pass
The first pass through the script file performs a couple tasks. It identifies functions and code flow blocks: the largest groups of instructions that execute as an atomic unit and have only a single execution path from the first to the last instruction in the block. It also determines the number of parameters and return values for each function (in NWScript, functions only have a single return value, but when the bytecode is generated, structs and other complex types are flattened to primitive types, making it look like there are multiple return values).
This pass is quite simple. It proceeds by scanning through the code, beginning with the main function and parsing each instruction as it goes. The current stack pointer (but not an actual stack) is maintained from instruction to instruction. In addition, a code flow graph is created by watching for branch instructions, creating new flow blocks and splitting old ones (if a branch jumps into the middle of an existing flow, that flow must be split, as it's not an atomic unit). Finally, entries for functions are created as they're encountered.
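To make the flow-splitting concrete, here's a minimal C++ sketch (hypothetical names and container choice; the real analyzer tracks considerably more state):
#include <cstdint>
#include <iterator>
#include <map>

// One code flow block: a straight-line run of instructions with a single
// execution path from its first instruction to its last.
struct Flow {
    uint32_t start;     // address of the first instruction
    uint32_t end;       // one past the last instruction; 0 while still open
    int32_t  spAtStart; // stack-pointer displacement on entry
};

std::map<uint32_t, Flow> flows; // keyed by start address

// Get (or create) the flow beginning at addr. If addr lands inside an
// existing flow, that flow is no longer atomic and must be split.
Flow& FlowAt(uint32_t addr, int32_t spAtStart) {
    auto it = flows.upper_bound(addr);
    if (it != flows.begin()) {
        Flow& prev = std::prev(it)->second;
        if (prev.start == addr)
            return prev;
        if (prev.start < addr && (prev.end == 0 || addr < prev.end)) {
            uint32_t oldEnd = prev.end;
            prev.end = addr; // truncate the old flow...
            // ...and open a new one at the branch target
            return flows.emplace(addr, Flow{addr, oldEnd, spAtStart}).first->second;
        }
    }
    return flows.emplace(addr, Flow{addr, 0, spAtStart}).first->second;
}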
The biggest trick here is dealing with function calls. The effect of other operations on the stack can be trivially calculated by decoding the instruction, but we can't step past a call instruction until we know the function's stack displacement. Calls to API functions pose no such problem: those are a closed set, and we already know exactly what each of them does to the stack.
What I decided to do was to use a task queue. When a new, previously unencountered function is encountered in a call, a new function entry and task queue entry corresponding to the function are created, and the current analysis task is marked as blocking on that function. Conditional branches are handled more or less the same way: a new task is created for one branch, though not blocking on anything, and analysis continues at the other branch. Whenever the current task blocks (e.g. it calls an unanalyzed function), analysis switches to the first task in the queue that is not blocking on anything, and execution continues from there.
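In skeleton form, the task scheduling might look like this (a C++ sketch with hypothetical names, not the actual implementation):
#include <algorithm>
#include <cstdint>
#include <deque>

struct Function; // one entry per script function, created on first call

struct AnalysisTask {
    uint32_t  resumeAddr; // where decoding continues
    int32_t   sp;         // stack displacement at that point
    Function* blockedOn;  // unanalyzed function we're waiting on, or null
};

// Decodes instructions until the task blocks on a call or hits a return;
// may push new tasks, and unblocks waiters when a function completes.
void AnalyzeUntilBlockedOrDone(AnalysisTask& task, std::deque<AnalysisTask>& tasks);

void RunQueue(std::deque<AnalysisTask>& tasks) {
    while (!tasks.empty()) {
        // Pick the first task that isn't blocking on anything.
        auto it = std::find_if(tasks.begin(), tasks.end(),
            [](const AnalysisTask& t) { return t.blockedOn == nullptr; });
        if (it == tasks.end())
            break; // e.g. recursion with no returning path, or an exitless loop
        AnalysisTask task = *it;
        tasks.erase(it);
        AnalyzeUntilBlockedOrDone(task, tasks);
    }
}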
Tasks ultimately complete when a return is encountered. When that occurs, the number of parameters for the function is calculated by the difference in stack pointer from the beginning of the function to the end. The function is marked as analyzed, and any tasks blocking on the function are unblocked and may potentially continue analysis. The number of return values is determined by observing the lowest stack address written to within the function. This requires that all paths through a function be evaluated, but as the return value does not affect overall stack displacement, tasks blocking on the function may resume prior to determining the exact number of return values.
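As a concrete illustration of the arithmetic (hypothetical numbers; stack cells are 4 bytes, and displacements are relative to the stack pointer at function entry):
int cellSize = 4;
int spAtEntry = 0;    // displacement on entry, parameters already pushed
int spAtReturn = -8;  // at the return, the function has popped its parameter cells
int paramCells = (spAtEntry - spAtReturn) / cellSize; // = 2 parameters
int lowestWrite = -12; // deepest write: the return slot, below the parameters
int returnCells = (spAtEntry - lowestWrite) / cellSize - paramCells; // = 1 return value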
While this can potentially generate a large number of blocked tasks in the queue, so long as the script does not contain a loop that has no exit path or a function that calls itself recursively without a path that returns, this is guaranteed to ultimately locate a function that can return. While functions may have multiple paths from beginning to end, and all must eventually be traversed, we need only a single path to return to know the function's displacement, resulting in unblocking of tasks waiting on that function. This discovery process cascades, and analysis of all routes through the script eventually completes.
Second Pass
After the completion of the first pass, we now know all functions, code flows, and the number of parameters and return values for each function. The second pass performs several functions:
- Determine the types and locations of local and global variables and create variable objects for them
- Determine the types of function parameters and return values
- Generate the intermediate code
The code for this pass resembles a mini-interpreter (well, I suppose it is), including a stack filled with references to the variable objects at that location. A new variable object is created each time some manner of instruction pushes a variable on the stack. To make things simpler during processing and final code generation, variables are explicitly created and destroyed with dedicated instructions in the intermediate code, unlike the implicit generation/destruction in the bytecode.
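In C++-ish sketch form (hypothetical; the real pass handles every opcode and type):
#include <memory>
#include <vector>

enum class VarType { Int, Float, String, Object };

struct Variable {
    VarType type;
    bool    isTemporary; // a pure stack temporary vs. a real local/global
};

void EmitCreate(const std::shared_ptr<Variable>& v); // intermediate-code CREATE
void EmitDelete(const std::shared_ptr<Variable>& v); // intermediate-code DELETE

// The "stack" holds references to variable objects, not values.
std::vector<std::shared_ptr<Variable>> varStack;

// e.g. CONSTI pushes a new temporary and emits an explicit CREATE...
void OnConstI() {
    auto v = std::make_shared<Variable>(Variable{VarType::Int, true});
    EmitCreate(v);
    varStack.push_back(v);
}

// ...while MOVSP -4 pops one cell and emits an explicit DELETE.
void OnMovSpPopOne() {
    EmitDelete(varStack.back());
    varStack.pop_back();
}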
Because the stack is necessary to determine what variables individual instructions reference, functions must, as in the first pass, be executed in control flow order, rather than simply evaluating each flow in the list in random order (as we do with functions). The stack at the end of each flow is saved until analysis of the function completes, so that successor flows evaluated later can pick up the stack where the previous flow left off (and one other reason I'll explain in a second).
Most of the details of the second phase relate not to the topic of this post, but to later ones. One detail worth mentioning, however, is that after all flows for a function have been evaluated, stack merging occurs. Because it's possible that a stack location might be created in two separate locations that together form a single variable (as in the example above), when multiple flows converge in a single successor flow, the stacks from each of the predecessor flows must be merged: for each offset in the stacks, the variables in each corresponding position are merged, eliminating all but one copy. Because the intermediate code representation is flexible, variable merging is trivial, and requires no modification of the intermediate code itself.
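Continuing the earlier sketch, the merge might look like this (hypothetical; RedirectAllReferences rewrites every use of the absorbed variable object):
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Variable; // as in the earlier sketch

// Rewrite all intermediate-code references from 'from' to 'to'.
void RedirectAllReferences(const std::shared_ptr<Variable>& from,
                           const std::shared_ptr<Variable>& to);

// Merge the saved stack of one predecessor flow into another at a
// convergence point; position by position, the two entries are really
// the same variable.
void MergeStacks(std::vector<std::shared_ptr<Variable>>& into,
                 const std::vector<std::shared_ptr<Variable>>& other) {
    assert(into.size() == other.size());
    for (std::size_t i = 0; i < into.size(); i++) {
        if (into[i] != other[i])
            RedirectAllReferences(other[i], into[i]); // other[i] is absorbed
    }
}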
However, for this and other reasons, this compiler does not currently support variable-length use of the stack - the case mentioned earlier, where a loop (or some such) extends the stack to a length not fixed at compile time. There is no way to generate such a structure from NWScript (the scripting language), and hand-written bytecode files are anywhere between incredibly rare and non-existent, so we thought that was a reasonable omission.
One really cool thing about this system, however, is that the host program can be a hybrid of interpretation and compilation. E.g. the compilation of all scripts can be offloaded to a worker thread when a module is loaded to compile in the background, and the host program can use the interpreter for a script until compilation of that script finishes. The interpreter could also be used as a fall-back for scripts the compiler is unable to compile for one reason or another, such as those with evil stack usage.
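A hedged sketch of what that hybrid might look like on the host side (hypothetical types; the real client/server is more involved):
#include <chrono>
#include <future>

struct Bytecode { /* ... the script's bytecode image ... */ };
using CompiledFn = void (*)(); // native entry point produced by the compiler

void Interpret(const Bytecode& bc); // the existing VM

struct Script {
    Bytecode bytecode;
    std::shared_future<CompiledFn> compiled; // set by the background worker
};

void Run(Script& s) {
    // Use native code once (and if) the background compile succeeded;
    // otherwise fall back to the interpreter.
    if (s.compiled.valid() &&
        s.compiled.wait_for(std::chrono::seconds(0)) == std::future_status::ready &&
        s.compiled.get() != nullptr) {
        s.compiled.get()();
    } else {
        Interpret(s.bytecode);
    }
}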
Below is an example of a random block (a flow, to be precise) of bytecode and its corresponding intermediate code. Note that the addresses differ by 0xD because the bytecode block is generated by a disassembler that includes the file header in the address:
00005282 ACTION GetTimeHour(0010), 00
00005287 CPTOPSP FFFFFF34, 0004
0000528F ACTION StringToInt(00E8), 01
00005294 ACTION GetTimeHour(0010), 00
00005299 SUBII
0000529B ADDII
0000529D CPDOWNSP FFFFFF00, 0004
000052A5 MOVSP FFFFFFFC
000052AB CONSTI 00000000
000052B1 CONSTI 00000000
000052B7 CONSTI 00000000
000052BD CPTOPSP FFFFFEF8, 0004
000052C5 ACTION SetTime(000C), 04
000052CA CPTOPSP FFFFFF04, 0004
000052D2 CONSTI 00000000
000052D8 LTII
000052DA JZ off_00005304
00005275: CREATE 01F36000 ( temp SSA int )
00005275: ACTION 0010 (0) 01F36000 ( temp SSA int )
0000527A: CREATE 01F36030 ( temp SSA string )
0000527A: ASSIGN 01F36030 ( temp SSA string ), 02102C10 ( string )
00005282: CREATE 01F36060 ( temp SSA int )
00005282: ACTION 00E8 (1) 01F36060 ( temp SSA int ), 01F36030 ( temp SSA string )
00005282: DELETE 01F36030 ( temp SSA string )
00005287: CREATE 01F36090 ( temp SSA int )
00005287: ACTION 0010 (0) 01F36090 ( temp SSA int )
0000528C: CREATE 01F360C0 ( temp SSA int )
0000528C: SUB 01F360C0 ( temp SSA int ), 01F36060 ( temp SSA int ), 01F36090 ( temp SSA int )
0000528C: DELETE 01F36090 ( temp SSA int )
0000528C: DELETE 01F36060 ( temp SSA int )
0000528E: CREATE 01F360F0 ( temp SSA int )
0000528E: ADD 01F360F0 ( temp SSA int ), 01F36000 ( temp SSA int ), 01F360C0 ( temp SSA int )
0000528E: DELETE 01F360C0 ( temp SSA int )
0000528E: DELETE 01F36000 ( temp SSA int )
00005290: ASSIGN 02102910 ( int ), 01F360F0 ( temp SSA int )
00005298: DELETE 01F360F0 ( temp SSA int )
0000529E: CREATE 01F36150 ( temp SSA int )
0000529E: ASSIGN 01F36150 ( temp SSA int ), 01F36120 ( const int )
000052A4: CREATE 01F361B0 ( temp SSA int )
000052A4: ASSIGN 01F361B0 ( temp SSA int ), 01F36180 ( const int )
000052AA: CREATE 01F36210 ( temp SSA int )
000052AA: ASSIGN 01F36210 ( temp SSA int ), 01F361E0 ( const int )
000052B0: CREATE 01F36240 ( temp SSA int )
000052B0: ASSIGN 01F36240 ( temp SSA int ), 02102910 ( int )
000052B8: ACTION 000C (4) 01F36240 ( temp SSA int ), 01F36210 ( temp SSA int ), 01F361B0 ( temp SSA int ), 01F36150 ( temp SSA int )
000052B8: DELETE 01F36240 ( temp SSA int )
000052B8: DELETE 01F36210 ( temp SSA int )
000052B8: DELETE 01F361B0 ( temp SSA int )
000052B8: DELETE 01F36150 ( temp SSA int )
000052BD: CREATE 01F36270 ( temp SSA int )
000052BD: ASSIGN 01F36270 ( temp SSA int ), 02102910 ( int )
000052C5: CREATE 01F362D0 ( temp SSA int )
000052C5: ASSIGN 01F362D0 ( temp SSA int ), 01F362A0 ( const int )
000052CB: CREATE 01F36300 ( temp SSA int )
000052CB: LT 01F36300 ( temp SSA int ), 01F36270 ( temp SSA int ), 01F362D0 ( temp SSA int )
000052CB: DELETE 01F362D0 ( temp SSA int )
000052CB: DELETE 01F36270 ( temp SSA int )
000052CD: TEST 01F36300 ( temp SSA int )
000052CD: DELETE 01F36300 ( temp SSA int )
000052CD: JZ 000052F7
And lastly, perhaps getting ahead of myself, the same segment after running it through my variable optimization pass:
00005275: CREATE 01F36000 ( temp SSA int )
00005275: ACTION 0010 (0) 01F36000 ( temp SSA int )
00005282: CREATE 01F36060 ( temp SSA int )
00005282: ACTION 00E8 (1) 01F36060 ( temp SSA int ), 02102C10 ( merged string )
00005287: CREATE 01F36090 ( temp SSA int )
00005287: ACTION 0010 (0) 01F36090 ( temp SSA int )
0000528C: CREATE 01F360C0 ( temp SSA int )
0000528C: SUB 01F360C0 ( temp SSA int ), 01F36060 ( temp SSA int ), 01F36090 ( temp SSA int )
0000528C: DELETE 01F36090 ( temp SSA int )
0000528C: DELETE 01F36060 ( temp SSA int )
0000528E: ADD 02102910 ( merged int ), 01F36000 ( temp SSA int ), 01F360C0 ( temp SSA int )
0000528E: DELETE 01F360C0 ( temp SSA int )
0000528E: DELETE 01F36000 ( temp SSA int )
000052B8: ACTION 000C (4) 02102910 ( merged int ), 01F361E0 ( merged const int ), 01F36180 ( merged const int ), 01F36120 ( merged const int )
000052CB: CREATE 01F36300 ( temp SSA int )
000052CB: LT 01F36300 ( temp SSA int ), 02102910 ( merged int ), 01F362A0 ( merged const int )
000052CD: TEST 01F36300 ( temp SSA int )
000052CD: DELETE 01F36300 ( temp SSA int )
000052CD: JZ 000052F7
For those keeping score, that's 17 instructions in the original bytecode, 43 in the raw intermediate code (26 that are create or destroy, 9 that are assigns to temporary variables), and 19 in the optimized intermediate code (10 that are create or destroy; if that still sounds like too many, note that all of those variables will be converted to registers in LLVM).
Monday, August 02, 2010
LLVM & NWScript
The attentive might have wondered why I bothered to spend a moderate amount of time writing about a topic I'd already moved past, which is kind of uncharacteristic of me. As I said before, I had looked at LLVM for a couple days a couple weeks prior to the corresponding blog post, then moved on (I can't recall if there was anything in between LLVM and cache-oblivious algorithms).
Well, part of the reason I didn't spend too much time on LLVM was that, while it's neat, I didn't have an immediate use for it. However, a few days before I wrote said post, Skywing randomly came up to me and asked if I wanted to write a native code compiler for NWScript - the scripting language in Neverwinter Nights 1 and 2 - for the client and server he's been writing. Suddenly I had an immediate and interesting use for LLVM.
NWScript (for more information see this) is a language based on C, and supports a subset of the features. It supports only a small number of data types: int, float, string, and a number of opaque types that are more or less handles to game structures accessed solely via APIs. It also supports user-defined structures, including the built-in type vector (consisting of 3 floats), though it does not support arrays, pointers, etc. All variables and functions are strongly typed, although some game objects - those referred to with handles, such as the "object" type - may not be fully described by their scripting type. NWScript has a special kind of callback pointer called an action (a term which, confusingly, has at least two additional meanings in NWScript), which is more or less a deferred function call that may be invoked in the future - specifically, by the engine - and consists of a function pointer and the set of arguments to pass to that function (a call to an action takes no parameters from the caller itself).
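To give a feel for the syntax, here's a tiny NWScript fragment touching the types described above (a hedged sketch; the variable names are invented, and DelayCommand is one of the engine APIs that takes an action parameter):
int g_count = 0; // a global variable
void Announce(object oPC, string sMsg)
{
    SendMessageToPC(oPC, sMsg);
}
void main()
{
    float fDelay = 2.5f;
    vector vPos = Vector(1.0f, 2.0f, 3.0f); // built-in struct of 3 floats
    DelayCommand(fDelay, Announce(OBJECT_SELF, IntToString(g_count)));
}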
NWScript is compiled by the level editor into a bytecode that is interpreted by a virtual machine within the games. Like the Java and .NET VMs, the VM for NWScript is a stack system, where a strongly-typed stack is used for opcode and function parameters and return values. Parameters to assembly instructions, as with function calls, are pushed on the top of the stack, and instructions and function calls pop the parameters and, if there is one, push the result/return value onto the top of the stack. Data down the stack is copied to the top of the stack with load operations, and copied from the top of the stack to locations further down by store operations.
Unlike those VMs, however, NWScript uses a pure, integrated stack system, implementing global and local variables through the stack, as well. Global variables are accessed through instructions that load/store values at indices relative to the global variable pointer BP, which is an index into the stack. Normally, global variables are at the very bottom of the stack, pushed in a wrapper function called #globals before the script's main function is called; there are opcodes to alter BP, though this is not known to ever be used. Local variables are accessed by loads and stores relative to the top of the stack, just as with the x86 (ignoring the CPU registers). Unlike the x86, however, return addresses are not kept on the stack, but like Itanium are maintained in a separate, program-inaccessible stack.
Oh, and for those wondering, the Java and .NET VMs have separate mechanisms for global and local variables that do not use the stack. This is actually rather surprising - you would expect them to use the stack at least for local variables, as actual processor architectures do (when they run out of registers to assign).
Stack systems - especially this one, where everything is held on the stack - make for VMs that are very simple and easy to implement, and Merlin told me in the past that they're easy to compile to native code, as well. So it's not surprising that the developers used this architecture for NWScript.
The following shows a couple random pieces of NWScript bytecode with some comments:
; Allocate int and object local variables on the stack
0000007F 02 03 RSADDI
00000081 02 06 RSADDO
; Push parameter 0
00000083 04 06 00000000 CONSTO 00000000
; Call game API function
00000089 05 00 0360 01 ACTION GetControlledCharacter(0360), 01
; Copy return value to the object local variable
0000008E 01 01 FFFFFFF8 0004 CPDOWNSP FFFFFFF8, 0004
; Pop the return value
00000096 1B 00 FFFFFFFC MOVSP FFFFFFFC
; Allocate float on stack
00000015 02 04 RSADDF
; Incredibly stupid way of setting the float to 3.0f
00000017 04 04 40400000 CONSTF 3.000000
0000001D 01 01 FFFFFFF8 0004 CPDOWNSP FFFFFFF8, 0004
00000025 1B 00 FFFFFFFC MOVSP FFFFFFFC
; Assign the current stack frame to BP
0000002B 2A 00 SAVEBP
; Call main()
0000002D 1E 00 00000010 JSR fn_0000003D
; Restore BP to what it was before (not that it was anything)
00000033 2B 00 RESTOREBP
; Clean up the global variables
00000035 1B 00 FFFFFFFC MOVSP FFFFFFFC
0000003B 20 00 RETN
; Push last two parameters to FloatToString
0000043C 04 03 00000009 CONSTI 00000009
00000442 04 03 00000012 CONSTI 00000012
; Divide 5.0f by 2.0f
00000448 04 04 40A00000 CONSTF 5.000000
0000044E 04 04 40000000 CONSTF 2.000000
00000454 17 21 DIVFF
; Return value of DIVFF is the first parameter to FloatToString
00000456 05 00 0003 03 ACTION FloatToString(0003), 03
; Return value is second parameter to SendMessageToPC. Push first parameter.
0000045B 04 06 00000000 CONSTO 00000000
00000461 05 00 0176 02 ACTION SendMessageToPC(0176), 02
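For reference, plausible NWScript source behind the last two fragments (a best-guess reconstruction - the variable name is invented, and I'm assuming CONSTO 00000000 encodes OBJECT_SELF):
// second fragment: a script with one initialized global, set up by #globals
float g_f = 3.0f;
// third fragment: 18 and 9 match FloatToString's width and decimals parameters
SendMessageToPC(OBJECT_SELF, FloatToString(5.0f / 2.0f, 18, 9));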
You might notice that Skywing has started a series of his own on the project, today. I actually wrote this post several weeks ago, but was waiting until the project was done to post about it. But now that it looks like he's gonna cover it, I need to start posting or I'll end up posting after him despite starting before him.