Sunday, December 31, 2006

Moshikashite Kore Sugoku Warui

Be afraid. Be very afraid. Q has just learned about Starfish. Even if it's old as dirt, this can't possibly be a good thing. Some of the stuff it generated:

Oh, and for the curious, that phrase is from Azumanga Daioh, one of my favorite mangas. Naturally translated, it means "This could be really bad": もしかして [possibly] これ [this] 凄く [terribly] 悪い [is bad].

Memory Barriers 3 - The LibQ Way

Okay, so those are the constraints within which LibQ must work. This brings me to my first critical design decision regarding memory barriers: LibQ will use hand-coded assembly functions, which will be called through function pointers (this is transparent to the client program), allowing the fastest option to be chosen at run time based on the system.
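A rough sketch of the pointer-dispatch idea, in portable C++ rather than hand-coded assembly. Every name here is made up for illustration and is not LibQ's real interface; the two barrier bodies are only stand-ins for the per-CPU assembly routines, and the CPU probe is left to the caller.

```cpp
#include <atomic>

// Hypothetical names throughout; this is not LibQ's real interface.
using BarrierFn = void (*)();

// Fast path: stands in for an MFENCE-style fence instruction.
static void full_barrier_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Slow path: stands in for a LOCKed or serializing instruction on older CPUs.
static void full_barrier_locked_op() {
    static std::atomic<int> scratch{0};
    scratch.fetch_add(0, std::memory_order_seq_cst);
}

// Choose an implementation once, at startup. How the CPU is actually probed
// (CPUID feature bits, for example) is omitted; the caller passes the answer in.
static BarrierFn select_full_barrier(bool cpu_has_fence_instruction) {
    return cpu_has_fence_instruction ? full_barrier_fence : full_barrier_locked_op;
}

// The pointer the library calls through; defaults to the safe slow path.
static BarrierFn g_full_barrier = full_barrier_locked_op;

// What the client sees: an ordinary function. The pointer stays hidden.
inline void q_full_barrier() {
    g_full_barrier();
}
```

The client only ever calls q_full_barrier(); the library sets g_full_barrier once during initialization, so the cost of choosing is paid a single time rather than on every call.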

Okay, that was fairly easy. This second decision took a lot more thinking through to reach a conclusion: what to do with respect to atomic functions? There are at least three options that are fairly obvious. The first is to provide the weakest possible guarantee: that no atomic function is a memory barrier. This forces users to add memory barriers manually, so the correct behavior is achieved even on architectures where atomic instructions are not memory barriers.

However, evaluation of this possibility suggests that it is not a good option. While non-barrier atomic operations will run at maximum speed, particularly on weak architectures like PowerPC, adding memory barriers on architectures like the x86 quickly kills performance. Consider the worst case scenario: an atomic operation that must have a full sandwich barrier, on a Pentium 3 (which has no full memory barrier instruction, and so must use a serializing instruction) - perhaps 100 cycles for the LOCK, another maybe 150 cycles to flush buffer contents, and two CPUID instructions at about 65 cycles each; all together, that's 380 cycles. This could be reduced by about 1/3 by leaving out the CPUIDs, as LOCK instructions are full barriers on the Pentium 3.

Alternately, we could provide the strongest possible guarantee: that all atomic instructions are full sandwich barriers. This means that users don't need to worry about putting memory barriers around atomic instructions at all, and ensures barrier safety.

However, this also is not a particularly good solution for a cross-platform library like LibQ. This strategy is optimal for the Pentium 3, where that's exactly how atomic instructions work. On Pentium 4 it's only slightly worse, as two MFENCE (or LFENCE) instructions would add only a couple dozen cycles to the atomic operation and memory flush. However, this could be bad on weaker architectures like PowerPC. Memory barrier instructions there are comparable to the cost of a CPUID, and at least one is unnecessary most of the time (not often do you need a full sandwich barrier around an atomic operation).

The third option is to make special versions of LibQ atomic functions for each variation. While this has the advantage of providing maximum speed on all architectures (and removing function call overhead wherever possible), it would be a huge pain in the ass for me and anyone else who writes all the functions. Let's think about this: 3 types of memory barriers, 3 barrier configurations, times how many atomic functions? That's like 50 or 100 different functions for each platform. Mr. Flibbles doesn't much care for that.

So, what should I do? Fortunately, there's a variation on one of these (or perhaps a hybrid of two of them) that is both easy for me to code and efficient with respect to CPU cycles (not 100% optimal, but tolerably close): option number one. Atomic instructions are not guaranteed to be memory barriers of any kind. HOWEVER, there are special memory barrier functions used to promote atomic instructions to memory barriers.

This requires only 6 functions - 3 different types of barriers, 2 configurations (sandwich is just top + bottom barriers). Using this system, each function can do real work where the platform needs it and be no-oped where it doesn't, to provide (almost) maximum speed. The only waste is the overhead of one or two function calls; my benchmarks suggest that this is a minimal increase, not taking more than 10 cycles per function - 4 to 12% waste for two barriers, when neither is necessary.
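To make the shape of this concrete, here is a sketch of the six promotion functions in portable C++. The names are invented for illustration, and the C++ fences are only approximations (each is at least as strong as the read or write barrier it stands in for); on a platform where the atomic instruction already implies the barrier, the matching function body would simply be empty.

```cpp
#include <atomic>

namespace sketch {

// "Top" barriers go before the atomic operation, "bottom" barriers after it.
// Hypothetical names; the fences are portable stand-ins for per-CPU assembly.
inline void read_barrier_top()     { std::atomic_thread_fence(std::memory_order_acquire); }
inline void read_barrier_bottom()  { std::atomic_thread_fence(std::memory_order_acquire); }
inline void write_barrier_top()    { std::atomic_thread_fence(std::memory_order_release); }
inline void write_barrier_bottom() { std::atomic_thread_fence(std::memory_order_release); }
inline void full_barrier_top()     { std::atomic_thread_fence(std::memory_order_seq_cst); }
inline void full_barrier_bottom()  { std::atomic_thread_fence(std::memory_order_seq_cst); }

// The atomic operation itself promises no ordering at all (option number one).
inline int atomic_increment(std::atomic<int>& value) {
    return value.fetch_add(1, std::memory_order_relaxed) + 1;
}

// A sandwich is just a top barrier plus a bottom barrier around the operation.
inline int sandwiched_increment(std::atomic<int>& value) {
    full_barrier_top();
    int result = atomic_increment(value);
    full_barrier_bottom();
    return result;
}

}  // namespace sketch
```

A caller that only needs release semantics would use a single top barrier and skip the bottom one, which is where the savings over an always-sandwiched atomic come from.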

Saturday, December 30, 2006

Another Exercise for the Reader

Something I'm currently coding in LibQ (*shock and awe*) is the capacity for timeouts in waits on a condition variable, a feature that was left out of LibQ originally, because of an insurmountable race condition which would be created by the way LibQ emulated condition variables on Windows. This challenge is very succinct: how is it possible to create a fast (remember the definition of fast, with regard to synchronization objects) condition variable that supports timeout, and what constraint(s) would exist on the semantics of this condition variable? Ready, set, go!

Oh, one more thing: if you find that challenge too easy, you can try to do it without the use of atomic operations (the way LibQ does it).

Wednesday, December 13, 2006

Memory Barriers 2

Subtitle: Speeding Up, Slowing Up, Maybe Even Blowing Up

It has always been my intention to fully support memory barriers in LibQ. However, as LibQ is mainly something I work on when I'm feeling inspired, it shouldn't be too surprising that some features take a while to get implemented (especially at the times when I don't feel like coding anything for a few months). But while lack of interest on my part may have been the biggest factor, there was also the fact that I hadn't, until recently, decided on the semantics of LibQ memory barriers.

Stand-alone memory barriers (memory barriers in the middle of a sequence of normal code) are (mostly) straightforward, both in concept and in implementation. However, the exact method of achieving a memory barrier differs drastically by architecture. For example, the x86 guarantees that stores will be performed in order. As such, write barriers are not needed at all (or you could use an SFENCE instruction, if you want to play it safe and assume that this guarantee of write ordering won't be around forever). However, the x86 has no read barrier or full barrier instructions prior to the Pentium 4 (LFENCE and MFENCE instructions, respectively); the conventional wisdom has been to use a serializing instruction whenever these functions are needed (CPUID seems like a fairly good candidate on Pentium 4, while LOCK CMPXCHG seems faster on other CPUs).

The Alpha deserves special mention, as it does something that, as best I and the Linux people know, is truly novel (thank God). For reasons that appear to be a consequence of the split cache model (having multiple cache banks per core, which may be locked separately to avoid memory stalls), the Alpha does not have true memory barriers. While its memory barriers will order instructions for that CPU, there are no guarantees that the access order will hold on other processors. Amazingly (and not in a good way), this even applies to pointer dereferences; that is, if you set a shared pointer after initializing the structure, it's possible that another CPU could get the new pointer, yet pull uninitialized data through the pointer. This means that not only must you have a write barrier between two writes if they must appear in order, but you must also have a read barrier between the reads of those two variables; the read barrier will ensure that the caches will be brought into sync on the reading CPU.
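The pointer case can be sketched in portable C++ as follows. The Widget struct and function names are made up for illustration, and the C++ fences stand in for the architecture's own write and read barrier instructions. The point is that the barrier on the reading side is not optional: on an Alpha-like machine, leaving it out is exactly what lets the reader see the new pointer but stale contents.

```cpp
#include <atomic>

// Hypothetical shared structure.
struct Widget {
    int width;
    int height;
};

static std::atomic<Widget*> g_shared{nullptr};

// Writer: initialize the structure, then a write barrier, then set the pointer.
void publish(Widget* w, int width, int height) {
    w->width = width;
    w->height = height;
    std::atomic_thread_fence(std::memory_order_release);  // write barrier
    g_shared.store(w, std::memory_order_relaxed);
}

// Reader: read the pointer, then a read barrier, then read through the pointer.
// Returns -1 if nothing has been published yet.
int read_area() {
    Widget* w = g_shared.load(std::memory_order_relaxed);
    if (w == nullptr) {
        return -1;
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // read barrier
    return w->width * w->height;
}
```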

Moving tangentially, different architectures have different semantics for their atomic functions. On a Pentium 3 (and other Intel CPUs before Pentium 4), LOCKed atomic operations (required to be truly atomic with multiple cores and/or CPUs) are synchronizing instructions (effectively full barriers). However, on the Pentium 4 they have been demoted to simply being write barriers. As best as I can tell, on the PowerPC atomic instructions aren't memory barriers at all. Nor are atomic x86 instructions that do not use LOCK (not using LOCK is much faster, but only atomic with respect to a single core/CPU).

This is made even more complicated by the fact that there are three possible configurations for a memory barrier to be in, with respect to an atomic operation. In the acquire configuration (think of acquiring a mutex), the memory barrier appears to be just after the atomic operation - the atomic operation can be executed earlier than where it appears, but no I/O after it (inside the mutex) may be executed before it. In the release configuration, the barrier appears to be just before the atomic instruction - I/O after the atomic operation may be executed before the atomic operation, but the atomic operation cannot occur before any I/O in front of it (inside the mutex). Finally, there's the sandwich configuration, where instructions are serialized around the atomic operation, as if there were one barrier before the operation, and one after - the atomic operation cannot be executed before any preceding I/O, and no subsequent I/O may be executed before the atomic operation.
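The mutex analogy maps fairly directly onto a toy spinlock. This is only a sketch in portable C++ (the class and function names are invented for illustration), using the standard library's memory orderings as stand-ins for the three configurations; seq_cst is used for the sandwich because it is the strongest ordering available, not because it is an exact match.

```cpp
#include <atomic>

// Hypothetical toy lock, for illustration only.
class SpinLock {
public:
    // Acquire configuration: the barrier sits just after the atomic operation,
    // so nothing inside the critical section can move above the lock.
    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void lock() {
        while (!try_lock()) {
            // spin until the lock is free
        }
    }

    // Release configuration: the barrier sits just before the atomic operation,
    // so nothing inside the critical section can move below the unlock.
    void unlock() {
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Sandwich configuration: ordered against accesses on both sides.
inline int sandwich_increment(std::atomic<int>& value) {
    return value.fetch_add(1, std::memory_order_seq_cst) + 1;
}
```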

Saturday, December 09, 2006

Exercise for the Reader

So, in one class I'm writing implementations of the dining philosophers problem (note that a variation uses chopsticks and rice) using both semaphores (it is possible to prevent deadlock if you're careful) and mutex/conditions. Running tests on it, I noticed something odd - a semi-deadlock state in the semaphore version; that is, a single philosopher was taking his time eating, while none of the others could eat (they couldn't get a hold of a pair of chopsticks). It took me a little bit to figure out what was going on (because I knew that if the teacher saw that he'd ask me if it was broken).

So, here are the questions for you:
1. How do you implement a deadlock-free dining philosophers system using semaphores?
2. How could a single philosopher block all four others all by himself?

Tuesday, December 05, 2006

Memory Barriers

Well, I felt like writing something; however, this post might not turn out that great, as I've got a lot of material to cover, which I may or may not be able to elegantly separate into multiple topics (as it's too much to cover in one post).

Conceptually, computers execute instructions in single file. An instruction and its parameters (if applicable) are read from memory, the instruction is executed, and the results are written back to memory (if applicable). For some CPUs (that's all I can say, due to my limited knowledge of various CPUs), such as the x86, that's exactly how they originally worked. However, the x86 has gotten dramatically more complex over the last ten years or so. The 486 (I'm giving these milestones to the best of my memory, but I can't say for certain they're all correct) added support for multiple x86 microprocessors to run in parallel in the same system. With the Pentium, a second arithmetic logic unit was added, allowing the processor to execute two instructions in parallel. The Pentium Pro added support for speculative memory reads (prefetching of data used by instructions not yet executed), and added a memory write buffer that stores memory writes before they even make it to the processor's internal cache. The Pentium 4 allows two threads to be executed on a single processor, by means of instruction stream interleaving. With the Pentium D, CPUs began to contain two cores in a single chip, and Intel is about to launch a Core 2 CPU containing four cores on a single chip.

What all this comes down to is that it is no longer possible for a programmer to know the exact order instructions will be executed in a program. Thanks to speculative reads and the uncertainty of exactly what is in the cache at any point, execution is nondeterministic, and it is even impossible for a nerd with a calculator and an x86 optimization manual to calculate exactly what order a set of assembly language instructions would be executed in (at least not in the general case; in highly serialized code it might be possible). Moreover, even different implementations of x86, such as the Pentium 4, Core 2, and Athlon 64, differ in implementation details, such as execution time of particular instructions.

However, as the processor (actually core) is always self-consistent, this is normally completely transparent to the programmer. The result of a calculation will always be deterministic, and strictly dictated by instruction order, even if the actual order of events inside the processor to arrive at the end result differs wildly. The only time when a programmer must be concerned with such details is when processor self-consistency is not sufficient - that is, when they are writing a program that must synchronize execution with something outside the core, such as a piece of hardware, another chip or processor on the motherboard, or even another core. While largely irrelevant for everyone else, writers of hardware-interface code (drivers) and of core operating system parts must be able to ensure that the internal state of the processor remains consistent with the world outside the processor. At least, those that don't work for Creative Labs.

There are many ways of accomplishing different aspects of this requirement, and the methods often vary by processor. Main memory is one of the things most commonly shared by the processor and other hardware, so it is necessary that the hardware hear exactly what the processor is trying to tell it. The x86 processor orders its memory accesses relatively strongly. The Pentium 4 guarantees that writes will be performed in the order they appear in the program; however, writes may be buffered before being committed to the processor's cache, and there may be further delays before the data is written to main memory. Furthermore, it makes no guarantees about the order of reads from memory (and, remember, reads can be performed even before the instruction to perform the read is executed). It could be disastrous for another processor (including processors on hardware devices) to mix in its own data with what one processor is writing, or read data that one processor is still in the process of writing.

Memory barriers (also called fences) are used to prevent this. They instruct the processor to create a memory bottleneck at the memory barrier instruction, which some class of memory accesses may not cross. A read barrier in between two reads will ensure that the second read (or any later reads) may not be executed before the first; the same goes for write barriers and full barriers.
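As a small illustration, here is the classic "data plus ready flag" pattern in portable C++. The names are invented, and the standard library fences are only stand-ins for whatever write and read barrier instructions a given CPU provides. One side needs a write barrier between its two writes, and the other side needs a read barrier between its two reads; either one alone is not enough.

```cpp
#include <atomic>

// Hypothetical shared variables.
static std::atomic<int>  g_payload{0};
static std::atomic<bool> g_ready{false};

// Writer: store the data, then a write barrier, then raise the flag.
void produce(int value) {
    g_payload.store(value, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);  // write barrier
    g_ready.store(true, std::memory_order_relaxed);
}

// Reader: check the flag, then a read barrier, then read the data.
// Returns false if the data is not ready yet.
bool consume(int& out) {
    if (!g_ready.load(std::memory_order_relaxed)) {
        return false;
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // read barrier
    out = g_payload.load(std::memory_order_relaxed);
    return true;
}
```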

Serializing instructions take this one step further, not only guaranteeing the order of memory access with relation to the serializing instructions, but also preventing ALL execution of subsequent instructions until the preceding instructions have entirely finished executing and the results have been written to memory. This is something you want to avoid whenever possible, as it's a major performance killer, with the potential to soak up hundreds of cycles in dead time (although, to be fair, fences are also a performance hazard, though not as large of one, as other instructions and some types of memory accesses may still execute across the fence).

Friday, November 17, 2006

Interim

While you're waiting for my next big post (whenever that'll be, although I do have one I'm "working on"), you need to go watch Death Note. Really. Even if you don't like anime. No ecchi (breasts, panties, etc.), no giant robots, no DBZish fighting scenes, no ridiculous comedy, no shooting machine guns out of Ferraris at 150 MPH; just pure brainpower and sometimes adrenaline-pumping suspense. Unless the quality drops off steeply before the end of the series (less than one third of the series has aired so far), it'll definitely make my list of best animes ever.

Go! Watch! Now!

Oh yeah, and the manga (graphic novel) version is out in English (well, about 2/3 of it, so far), if you prefer to read (Amazon carries it, and local book stores probably do, as well). I'll probably buy that around Christmas.

Thursday, November 09, 2006

& Bad Teacher Tricks

Warning: Rant ahead.

So, I just got my grade on my first test in operating systems class, and it was a three-fold disappointment (for three different reasons). The score itself was a 69% - an abysmal score for somebody whose straight-A record wasn't broken until the third semester of college. However, next to the score was a letter grade - an A. That's right, an A. To add insult to double-injury, the teacher showed the grade distribution. Out of the whole class, there were two As. That means that my grade of 69 was either the best or the second best grade in the entire class.

It shouldn't take much effort at all to realize that something is very, very wrong with this situation. How could a 69% possibly be even the second highest score in the class? There appears to be a certain mentality to it - the teacher seems to believe that curving the scores makes up for abysmal test writing.

Another example of a test I had with this teacher was a problem involving B+ trees, using variable length name fields. I can't remember what the exact problem was, but after looking at it for a couple minutes I called the teacher over to ask about it. The conversation went something like this:
Me: Is it just me, or is it theoretically impossible to solve this problem?
Teacher: It's possible
Me: How? This buffer is too small to store this much data
Teacher: Just assume that the average length will be less than [some number I don't remember] [note that this could - in theory - work because if the average name length was small enough it would fit]
Me: That assumption isn't realistic at all, and is never suggested anywhere in the problem. In fact the problem kind of suggests that it wouldn't be true...

A similar incident (actually more than one) occurred on this test, only it was after the test had already been graded. He had built an implementation-specific algorithm assumption into the answer he counted as correct (without ever mentioning this assumption in the problem). If, like I did, you gave a more general answer that would be true regardless of implementation, you got a 0. His excuse for this was that the typical implementation allows this assumption, never mind the fact that we also discussed in class an alternate implementation which violates this assumption. This annoyed me all the more because I got a later problem wrong for making an "invalid" assumption about the problem, based on the fact that that's typically how it's done in practice.

Then there are the enjoyable classes where the teacher thinks that they can get away with bad tests without curving, simply by making homework weigh more in the class total, relying on the homework grade to bring up the overall scores to acceptable levels.

Let me state it very clearly: if your first or second highest score in the class is a 69%, there is something wrong with your tests. If you consider the ability to ask the teacher questions to be a good alternative to making your questions clear, there is something wrong with your tests. If you have to make non-test things have a very high weight to compensate for very low test scores, there is something wrong with your tests.

Wednesday, November 08, 2006

& Politics

So, after that last post, a lot of people (or at least as many as read this blog) might be wondering what my take on politics and differences in opinion is; how I can believe that people (on occasion) are totally wrong, without believing that there's something wrong with them. From my experience on political blogs and debates, ranging from far left to moderately far right, it seems to me that people like to pick up a single vice (or a couple) that explains the opposite opinion and run with it.

My take, however, is quite a bit different. While I certainly have run across people so caustic and annoying that I wish I could smash their skulls in with a ratchet, I believe that most of the time differences in political views are due to two specific factors:

1. Differences in assumptions. Assumptions usually arise when something cannot be proved to a satisfactory degree one way or another (or the person is simply too lazy to learn about the evidence). There are lots of examples of this in politics, because you usually can't tell a person's motive for certain; so some assume it's bad, some assume it's good.

Other times the cause of assumptions is more complex. People create a network of assumptions and conclusions based on evidence over their lives, which can color their input of new data, leading to new assumptions that would not have been made by a more objective analysis of the evidence. I wouldn't call this a vice per se, as it's something everybody does, and is more based on chance (what assumptions the person already has) than being a good or bad person.

In either case, assumptions powerfully affect our logic. As assumptions are believed to be true, they are counted as facts in the reasoning process. Thus, two people might look at exactly the same evidence, and employ the same logic, yet come to two different conclusions simply because they hold differing assumptions.

This reminds me of a comment I made on Juan Cole's blog. He couldn't understand why people persisted in taking the president of Iran's comment about "Israel's occupation of Palestine must fade from the pages of time" (his translation) "out of context", and thinking it meant Israel should be annihilated. He believed that the difference in perception was due to people reading too much into the "deceptive" "wiped off the map" translation. I pointed out that even his translation sounds threatening to anybody who doesn't, like him, hold the assumption that Iran's intentions are honorable (especially considering that the alternate assumption is usually that Iran has sinister intentions); a misunderstanding was not required to come to the opposite conclusion. His assumption that Iran's intentions were honorable prevented him from understanding the opposition, which was based on the opposite assumption.

2. Differing priorities. At least when I'm not emotionally involved (it's quite difficult for anyone to think totally rationally when you're emotionally involved), I always consider choices to be the evaluation of benefits and drawbacks. I actually consider it to be a weighted average - a math problem. Each consideration in making the choice is a variable, and each has an assigned weight. Different people weigh different things differently. Two people who evaluate exactly the same benefits and drawbacks may come to different conclusions simply because they have weighed one or more variables differently.

The example that most readily comes to mind is abortion. Just to list a few of the benefits (actually, they more take the form of arguments in favor of, but you should be able to get an idea of how you would construct a weighted average based on these):
- Difficult or impossible to get kids to always abstain or have "safe" sex
- The cost of making a girl have the baby even if she was irresponsible is too great
- The girl may have been raped, and not have been able to choose at all
On the drawbacks list:
- Killing of innocents is wrong
- Allowing abortion on demand encourages irresponsible behavior
- Blame rests with the parents (particularly the mother, for the purposes of this argument), not the child, and so we should not allow the blame and punishment to be transferred to the child (in the form of the abortion)

This is by no means an exhaustive list. I'm just giving you some ideas of the types of variables that appear in the evaluation of abortion. You alter the weight of one or more of those, and you can easily change the conclusion.

This is one thing that I do respect Neo-neocon for. Although I don't always agree with her conclusions (and, at varying times, her assumptions and logic), I do respect the way that she often attempts to come up with explanations for the other side that aren't simply assuming vices. Even if her analyses aren't always right, she makes a good effort (far above the average on political sites), and that requires respect.

& Partisan Politics and Logic 101

I love this class of "logic" employed heavily in... well, just about anywhere where there's an "us" vs. "them" mentality, particularly in politics:
Fact: Person X [on the other side] did Y
Assertion (assumed to be true without proof): Person X did Y for reason/objective Z [invariably bad]
Fact: There's no way Y could accomplish Z
Conclusion: Person X is bad, and an idiot, too

Uh huh. Maybe when you can come up with a reasonable explanation for your opponent's behavior other than a set of vices (greed, stupidity, cluelessness, just plain being evil, etc.) I'll take your politics seriously (note the subtle implication that I take few people's politics seriously).

Tuesday, November 07, 2006

Election Day Jitters?

Something is very unusual at Slashdot today. As of my writing of this post, 23 of the 24 articles featured on the main page have been tagged "itsatrap", and seven "itsnotatrap". Usually only articles that say that MS is doing something good (as rare as that is) get "itsatrap"; and I've never seen "itsnotatrap" before.

Saturday, November 04, 2006

Creepy

So, that's one more thing I can scratch off the list of "things I need to see/do before I die". In a debate on another blog I ran across a guy that appears to be part of an atheist cult. By this I am using the term "cult" to refer to any group that dogmatically defines themselves such that they are the few, the proud, the enlightened, and everybody else that doesn't strictly adhere to their every word isn't a true believer.

As hard as that is to imagine, it actually happened. This guy refers to his as "true atheism", and "insults" others that don't adhere to his exquisite wording (which strangely tends to contradict the definitions found in dictionaries...) by saying they're not "true atheists". Is there such a thing as a false atheist? Is that like a false Aryan or false messiah? Do false atheists go to hell?

Excuse me while I go cry in the corner.

Wednesday, November 01, 2006

Late Night Psychosis

So, I'm sitting here thinking about Japanese and Korean, and a random thought strikes me. It's been hypothesized that Korean and Japanese are sister languages (though the divergence is fairly old). From what I've seen, I can imagine that. The basic mechanics of the two show similarity, although I've also seen some differences.

Korean has 14 consonants, roughly corresponding to h, n, k/g, l/r, b/p, m, ng, s, d/t, ch/j, kh, t, cha, p. It has 8 vowels, corresponding to ah, ou, oh, oh (don't ask me the difference, I don't know), oo, ee, ea, e. Note that there are additional characters (28 total, IIRC), but they appear to not be discrete sounds. Korean syllables can be very complex, like ours in English, having as many as 3 or 4 consonants and a couple vowels in a single syllable.

Now, Japanese is unusual (at least for us westerners) for having a remarkably simple syllable structure. A syllable is formed one of three ways: a vowel alone, a consonant followed by a vowel, and a consonant followed by two vowels (a diphthong). There are 16 consonants, roughly corresponding to k, s, t, n, m, y, r/l (something of a cross between those two; can also sound similar to 'd'), w, g, z, d, b, p, ch, j, sh; n is the only consonant that need not be followed by a vowel (although the vowel can sometimes be slurred/silent in some syllables). There are 5 vowels, which sound something like ah, eh, ee, oh, oo.

Native Japanese speakers have an inherent deficiency in the ability to pronounce more complex syllables. This is due to the fact that the brain develops based on the language spoken in very early childhood. Japanese children use very simple syllables (due to the nature of the Japanese language), and eventually their brains mature, and their old tongues lose the potential to learn new tricks (thus forming the stereotypical Japanese accent; actually, this is what forms basically all accents). You could call this a form of epigenetic inheritance (although that's not exactly what that term is typically used for) - in this case it's something which is inherited by culture, rather than by biology.

Now, a bit of evolutionary biology. The founder effect is a process of evolution where a small number of animals (I'm being general here; don't send me hate speech allegations) separate from the population and move to a new area where there isn't free exchange (breeding) with the original population. They thus form a new population that can evolve separately from the original.

My hypothesis is that this is what happened between Korean and Japanese (actually, it could also have been due to a bottleneck effect); but there's a twist. I think that the population where Japanese arose was in fact founded by members that broke off from the previously common population. Now, what if one of these founders had an unnatural (specifically, nongenetic) speech defect? A defect either in the language center of the brain, or the physical components of the voice system, such that they were unable to pronounce the more complex syllables? In theory, if this was taught to the children, it could produce the drastic simplification in syllable form seen in Japanese today.

Now that I think about it, there was actually a movie that had this idea in it - an isolated mother with an unnatural speech defect gave birth to two daughters and raised them alone. The daughters "inherited" the same slurred speech, even though they did not have the same physical defect.

Sunday, October 29, 2006

Dictionary.com FTL

No results found for user's guide.

Did you mean sea squirt?

Suggestions:
sea squirt
Esguard
Escouade
ice skate
ice-skate
upsurged
isosorbide
Arsacid
Esocidae
Szeged
escudo
Enschede
Assized
assuaged
exiguity
assessed
overskirt
I find my life is becoming increasingly surreal...

UPDATE: Dark_Brood pointed out that M-W also fails, although not in quite as humorous of a manner:
The word you've entered isn't in the dictionary. Click on a spelling suggestion below or try again using the search box to the right.

Suggestions for user's guide:
1. assurors
2. as regards
3. assorted
4. assessors
5. exorcised
6. exorcized
7. accessorized
8. ogresses
9. ozonized
10. isomerized

Screen Shots

Just got these two within a couple minutes of each other.


Note the Google targeted ad (text) directly to the right of "Compose Mail".


...oops.

Friday, October 27, 2006

& Random Topic of the Day #2

The conversation with Dark_Brood continued on other topics, and eventually made it to the anime I'm currently watching: Fate/Stay Night. Amusing, with an interesting take on some mythology, if somewhat cliched (including the typical anime inconsistency). Watching this brought up a serious discussion about Japan. Now, we both realize that anime isn't real (heck, this anime has a female King Arthur, Heracles, Medea, and Gilgamesh all in present day Japan); but we both believe that consistent themes in anime writing can be used to deduce the perspectives of the writers and the target audience.

FusionReactorII says (5:13 PM):
This is not the most politically correct anime
FusionReactorII says (5:14 PM):
There seems to be a "good" progression of Arthur [that's King Arthur; a female, in this anime] to a more traditional female, over the course of the series
FusionReactorII says (5:15 PM):
Seems like it's making a few political/social statements
FusionReactorII says (5:15 PM):
Though I suppose it IS anime. And Japan is such a weird mix of sexual liberalism and conservativism :P
FusionReactorII says (5:16 PM):
In some ways they're much more liberal about sexuality than in the US, in other ways much more conservative
dark_brood says (5:17 PM):
They're much more liberal, except that males are like a superior race to females.
FusionReactorII says (5:17 PM):
Heh
dark_brood says (5:18 PM):
A man can pretty much do what he wants (sexually) a woman, uh oh.
FusionReactorII says (5:20 PM):
Well, to some extent. Women may be more free to do what they want sexually in Japan, but it seems like traditional gender roles are stronger than in the US. Stuff that I see in anime just looks old fashioned to me, with regards to gender roles
FusionReactorII says (5:20 PM):
Like how people might have thought like 50 years ago in the US :P
FusionReactorII says (5:22 PM):
Then again, there are a large number of people in more rural areas in the US, which tend to be more conservative, and with which I'm not particularly familiar
FusionReactorII says (5:22 PM):
Maybe I'm just more acquainted with the more liberal parts of the US :P
dark_brood says (5:23 PM):
I was partly also referring to the even bigger underreportedness (that's not a word I think) of rape in Japan than most other countries.
FusionReactorII says (5:25 PM):
Heh. Did you see Melancholy?
dark_brood says (5:25 PM):
Ye
dark_brood says (5:25 PM):
Not all but alot
FusionReactorII says (5:25 PM):
I was just remembering the one girl saying "If I'm ruined for marriage, will you take me?" When I saw that I was thinking "What... the heck..?"
dark_brood says (5:29 PM):
"In a United States study of women college students, Koss et al. (1988) found that about 21 per cent of stranger rapes were reported and only 2 per cent of acquaintance rapes were reported." wow
FusionReactorII says (5:30 PM):
Wow
dark_brood says (5:31 PM):
I'm not quite clear if that study was done in Japan or in the US, could it be possible that it was done by the US in Japan?
dark_brood says (5:31 PM):
It's from a paper on rape in japan
FusionReactorII says (5:32 PM):
Huh
dark_brood says (5:33 PM):
"This notion implies that acquaintances cannot be true rapists. In Japan, this idea is reinforced by the way police handle rape. They tend to accept only rape reports that resemble ‘classic rapes’, sexual intercourse with physical force, committed by a stranger in a secluded public place at night."
FusionReactorII says (5:34 PM):
Yeah, I've definitely seen indication of that in anime
dark_brood says (5:34 PM):
"In the United States the police frequently rule these type cases as ‘unfounded’." (the opposite)
FusionReactorII says (5:35 PM):
In fact I'd noticed that before. Like you stick a guy and a girl in a room alone [in anime], and everyone assumes they're just GOING to have sex, regardless of the relationship between them
FusionReactorII says (5:35 PM):
Seems archaic, to me
dark_brood says (5:35 PM):
Ye

& Random Topic of the Day #1

So, today I talked to Dark_Brood for the first time in a few months. For those not familiar with him, he's a moderately long-time friend who is into both computer programming (what he's majoring in) and biology, and we've had many interesting discussions about both over the years (although I really haven't talked to him much in the last one or two years, since he started college).

I can't really remember how this particular topic got started, but I started telling him about a very interesting recent Nova episode I saw a couple days ago. As luck would have it, there's an online transcript of that very episode, as well as part two (which I didn't see, and I read the transcript during the conversation). You should go read them. The beginning of the first episode should serve as a decent teaser:
1500 years ago something extreme happened to the world's climate-something that must have terrified those who witnessed it.

The sun began to go dark.

Rain poured red, as if tinted by blood.

Clouds of dust enveloped the earth.

Cold gripped the land for two years.

Then came drought,

Famine,

Plague,

Death.

Whole cities were wiped out - civilisations crumbled.

There is evidence of a catastrophe-a catastrophe whose consequences affected the entire world-and may have changed the course of human history.

Thursday, October 26, 2006

& The Irrelevant Name

As a very minor portion of the grant proposal, we needed to come up with a name for our "company". As we would likely be making our game for teaching Japanese, I knew just what to use. I never got around to writing a post on the topic, but you might have noticed that there's a general tendency for English speakers (and probably speakers of other non-Asian languages) to think that anything written in Chinese/Japanese is cool, regardless of what it actually says (this works the other way around, too; in Japan, anything written in English is automatically cool, regardless of what it says or even whether it makes sense. This is taken to a grotesque extreme in Madlax, where the bad guy was named Monday Friday). To make a play on this, I decided to have the name be Japanese for "irrelevant name". I decided to try translating it myself, having the Japanese speaker in our group verify it was correct (which turned out to be a good thing, as the first try was rather disastrous).

Here's the first e-mail I sent to him (part of a larger e-mail, actually). I'm annotating all of these here for those who don't know Japanese:
Incidentally, does 不関連名 [fukanrenmei - this is supposed to be a Chinese-style compound, meaning something like "no relation name"] sound okay, or does it sound like something that someone who knows only a little Japanese would write? :P

Masaya:
Well, we do not have a word 不関, but it sounds to me like "no relation".
and by 連名 you mean something like "joint names"?
不関連名 does not really make sense to me.
What are you trying to say?
Maybe I can translate into Japanese Kanji if you give me English expression.

Me:
That bad, huh? Well, what I was trying to say was "irrelevant name". That ended up being harder to translate than I expected, due to my rudimentary knowledge of Japanese. I was using 関連 [kanren] for "relevant" (although I also saw 関係 [kankei]), combined with 不 [fu] for "irrelevant" (though Google's translator prefers the prefix 無 [mu]), and 名 [na when used alone, mei when in a compound] for "name" (also saw 名前 [namae]). I wasn't really sure how the grammar for this would go. The one I sent you was just shoving it all together into a single compound word (I was thinking like 灼眼 [shakugan - "burning/eyes"] - from an anime name - or 聖剣 [seiken - "holy/sword"] - from a game name) - fukanrenmei. Some other constructions I considered (and didn't know which, if any, would be correct):
- 不関連な名 or 不関連の名 ["fukanren na na" and "fukanren no na" - something like "name of no relation"] (I'm thinking な is correct in this instance, but I'm not sure [na is used for abstract modifiers, no for concrete])
- 関連がない名 [kanren ga nai na - something like "name for which there isn't a relation"] (clausal form)

Are any of those correct?

Masaya:
Now I see what you mean.

"relevant" means 関連 [kanren] or 関係 [kankei] like you said.
I think 無 [mu] is better for 関連 and 関係 than 不 [fu], as we have words 無関係 and 無関連.
名 [na] for "name" is also correct, and if you want to say "irrelevant name", 無関連な名 [mukanren na na] makes more sense to me than 無関連名 [mukanrenmei].

What you wrote was all correct.
関連がない名 [kanren ga nai na] means the same as 無(不)関連な名 [mukanren na na].

It is difficult even for me to combine something that means "irrelevant" and
something that means "name" without HIRAGANA.
無関連な名 [mukanren na na] is easier to understand than 無関連名 [mukanrenmei], but a word without HIRAGANA actually looks and sounds better.

So I guess we'll go with either 無関連名 [mukanrenmei] or 無関連な名 [mukanren na na].

Tuesday, October 24, 2006

& Las Vegas

So, we (me, my dad, and his parents) just got back from two days at Las Vegas. The first night (when we arrived) we hit the casinos. I suspect my grandparents and my dad's aunt (who met us there) together sunk at least a grand into those slots (I guess you could call my grandparents the Idle Upper-Middle Class). I budgeted $120 for the slots (actually I had planned less than that, but my grandma gave me some extra cash at the beginning of the night - she tends to do that). After 15 or 20 minutes, I decided that was enough for me, and left with $330 cash.

The next day me and my dad went out (together) to do our own stuff. This amounted to burning rubber in a Corvette; driving a Hummer up a 16" curb, on a 45 degree incline (the incline going the width of the Hummer), and through other off-road type stunts; going indoor skydiving (can you say gigantic wind turbine?); and getting nicely bruised up while going 30+ MPH off sand dunes, over rocks and bushes, and through turns (apparently braking when turning or running over some change in elevation is a foreign concept to the dune buggy world). I'd probably have cracked my skull after hitting my head on the "roof" of the buggy so many times, were it not for the nicely padded helmet (we lost two bottles of water on that ride, one of which I was holding in my lap at the time, while my dad took a turn driving).

But today was perhaps the most eventful. After sleeping in later than we had planned, we got to the car to find that it had been mistaken for a cave wall. From the hood going counter-clockwise, the following words had been inscribed on it:
Rapist
Murder
Eddie
Terry (note that we're not sure whether the first letter was a T or an F, or whether the last letter was a Y or an X)
Idaho
Boners

So, after 4 1/2 hours of talking to people from the hotel security department (actually two departments from two different hotels, which shared the same parking garage), the local branch of the car rental place, and the police (amusingly, we're apparently the second car to have the exact same thing written in the same garage in the last week) we hit the road. Unfortunately, it was only about half an hour before we also hit a road crew repaving I-15. So we ended up spending two full hours to get 5-10 miles (I'm just guessing; we forgot to look at the odometer at the beginning).

While we were sitting around almost unmoving, we took some time to look at the cars and other vehicles around us. Of unusual interest was one Jeep SUV riding on a car transport carrier. Over the more than half an hour we were able to look at it, we found it had a number of very odd features. It had a large assembly mounted right above the windshield, which we believed to be a light array. It had a large unit with exhaust pipes that appeared to be an external air conditioning unit on top, right at the back of the vehicle. It also had a number of odd color-coded ports on the side and back, by the gas cap and rear light.

Fortunately, it had two distinctive markings: the name Axion Racing, and the number 23. So, when we got back, my dad went searching online for information about this peculiar vehicle. As it turns out, the vehicle's name is Spirit; it's an entirely autonomous robotic vehicle that participated in the 2004 and 2005 DARPA grand challenge, and earlier this year became the first autonomous robotic vehicle to make it to the top of Pikes Peak (it appears that the "light" assembly extends; it wasn't near that far forward when we saw it). Sorry we didn't take pictures (despite having a digital camera and plenty of time); we didn't realize it was a celebrity at the time :P

Wednesday, October 18, 2006

Multiplicative vs. Additive

I don't recall if I ever used the terms on here, but in a recent e-mail I sent to my grandpa (a professional linguist) telling him about Caia, I mentioned that I specifically wanted it to have additive complexity rather than multiplicative. I cited a couple example languages that fall into these categories (the same ones I'm going to talk about here), but I didn't really explain what I meant by them. His response did not particularly indicate or suggest whether he understood what I meant.

My knowledge of world languages is nowhere near sufficient to know whether there exist any purely additive or purely multiplicative natural languages, so I'll use some particular instances from two languages as examples. Latin is kind of the classic multiplicative-complexity language (although some parts of it have additive complexity); we are going to talk about the Latin close demonstrative pronoun ("this/these" in English).

As I mentioned quite some time ago, Latin inflects nouns based on gender (masculine, feminine, and neuter), number (singular and plural), and case (nominative, genitive, dative, accusative, and ablative). This results in there being 30 different "this/these" pronouns (!), as shown (I'm gonna do my best to make this look okay without FrontPage - what I usually use when doing complex formatting):
            Nom     Gen     Dat     Acc     Abl
Singular
  Masc.     hic     huius   huic    hunc    hoc
  Fem.      haec    huius   huic    hanc    hac
  Neut.     hoc     huius   huic    hoc     hoc
Plural
  Masc.     hi      horum   his     hos     his
  Fem.      hae     harum   his     has     his
  Neut.     haec    horum   his     haec    his

Ugh. That's disgusting. As you can see, there is some degree of regularity, but enough exceptions that you really need to memorize about 18 of the 30. This is what I mean by multiplicative: the number of different forms of a given thing that must be memorized is roughly equal to the product of all the different ways in which the thing can be inflected (3 * 2 * 5 - gender, number, case - in this example).

English is a bit better than Latin, although that's partly due to the fact that, compared to Latin, English inflects very few of its words, and those it does inflect have few variations. Japanese, however, has an example of additive complexity that's just beautiful - its demonstrative and interrogative pronoun system. Words are inflected by class (specifically, how far away the thing being referred to is) and form (whether it's a noun form, adjective form, etc.) as shown:
            Near    Far     Further Int.
Noun        kore    sore    are     dore
Adj.        kono    sono    ano     dono
Example     konna   sonna   anna    donna
Manner      koo     soo     aa      doo
Place       koko    soko    asoko   doko

In case it's not obvious, the English translation of the first column would be: "this thing" (noun), "this" (adjective), "such as this" (example), "like this" (manner), and "here" (place).

Thus, knowing only the four distance class prefixes and five word form suffixes, we can form all 20 combinations while only having to remember a single irregularity - asoko ("way over there" or some such). This is taken even further with indefinite pronouns, in which further suffixes are added to the interrogative forms above (we'll use "doko" - "where" - as an example), for forms such as "somewhere", "nowhere", "anywhere", "everywhere".
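To make the additive idea concrete, here is a minimal Python sketch that rebuilds the table above from its parts. The dictionary names and the `form` helper are made up for illustration, and treating the "manner" row as "lengthen the prefix's vowel" is my own modeling assumption (it is what makes "aa" come out regular); the only stored exception is asoko.

```python
# Hypothetical names throughout; this is a sketch, not a linguistic reference.
PREFIXES = {"near": "ko", "far": "so", "further": "a", "int": "do"}

# None marks the "manner" row, modeled here (an assumption) as
# lengthening the prefix's final vowel: ko -> koo, a -> aa.
SUFFIXES = {"noun": "re", "adj": "no", "example": "nna", "manner": None, "place": "ko"}

# The single irregular form mentioned in the text.
IRREGULAR = {("place", "further"): "asoko"}


def form(kind, distance):
    """Build one pronoun from its word form and distance class."""
    if (kind, distance) in IRREGULAR:
        return IRREGULAR[(kind, distance)]
    prefix = PREFIXES[distance]
    suffix = SUFFIXES[kind]
    if suffix is None:
        return prefix + prefix[-1]
    return prefix + suffix


# All 20 forms, one row per word form, columns in PREFIXES order.
table = {kind: [form(kind, d) for d in PREFIXES] for kind in SUFFIXES}

for kind, row in table.items():
    print(f"{kind:<8}", *(f"{word:<6}" for word in row))

# Things to memorize: 4 prefixes + 5 suffixes + 1 exception,
# versus 4 * 5 forms if each had to be learned separately.
memorized = len(PREFIXES) + len(SUFFIXES) + len(IRREGULAR)
total_forms = len(PREFIXES) * len(SUFFIXES)
print(memorized, "pieces generate", total_forms, "forms")
```

Under those assumptions, ten memorized pieces cover all twenty cells; the Latin table would need something close to one entry per cell.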

Additive >> multiplicative. Especially in Caia, where I intend to encode a LOT of information into pronouns and conjugation islands.

Tuesday, October 17, 2006

& Bootlegging - UPDATED

So, Q was doing some price comparison on eBay, Amazon, and other places, with the intent of obtaining the aforementioned music CDs. While I was doing so, I happened upon an eBay listing of Blanc Dans Noir (the third Noir soundtrack) for $5.50. That quickly set off my bullshit detector, but it was still possible that this guy was simply selling these CDs at or below wholesale cost (improbable, but not impossible).

So, I needed proof that this was bootlegged. I first started by comparing the scan of the CD with the one on Amazon. The covers themselves looked okay, but the blisters didn't match; however, this also was not conclusive proof.

I did, however, notice something distinctive on the eBay copy - the number KO-88241. This looked suspiciously like a product ID number - but whose? Some searching around revealed that it belonged to a company called K-O Records Ltd. Not the JVC Victor listed on Amazon, but still not quite enough.

The last piece of the puzzle was found by searching Google for listings of bootleg CD manufacturers. K-O wasn't listed on Anime Digital's FAQ (the first one I looked at), but Chudah's Corner identified K-O Records as a known bootleg company.

So, we've got a bootlegged copy of Blanc Dans Noir on eBay. Is that all? Nope. Looking at the person's eBay store, we can see a wide variety of music CDs for the same price. While I'm certainly not going to check all of them, a random sampling reveals they all bear the same KO- product ID on the blister. Looking at the number of comments this guy has received, it appears that he's been running a massive bootleg operation for quite some time.

As a humorous aside, while I was examining the covers, I noticed this from the scan included with the Blanc Dans Noir that I downloaded:


If you've been following my trail of investigation, you should be able to recognize that as an Ever Anime product ID - another bootleg.

UPDATE: While I was still conducting my investigation (and before I was convinced that the one on eBay was a bootleg), I sent this question to the seller:
I'm sorry there's no elegant way to ask this: this is a legal
(non-bootlegged) copy produced by JVC Victor (not Son May or some such),
right? The price just seems hard to believe, given that the retail price is
3045 yen.
Before I'd formed my conclusion I was expecting a reply; however, that expectation ended abruptly when I proved that it was a bootleg through other means. Much to my surprise, I found this waiting for me when I got home from technical writing class:
Hi, sorry this is the sonmay version. ~Jenny
Holy crap, did I hear that right? I wonder if she knows she's selling bootlegged copies; I can't imagine she'd have responded if she did.

Errata

While I was doing price comparison (and investigating soundtrack counterfeiting), I found out that I was totally mistaken about the $30 price on Blanc Dans Noir. As a matter of fact, they do only charge you for one CD's worth. See, in Japan, anime soundtracks (not sure about popular music) go for about $30 for a single CD. Maybe that has something to do with so few soundtracks making it to the US (as they sell for $15 retail here)...

Now, if you're pretty sharp, you might have wondered something: how did Q not know this (he has imported anime music from Japan, after all)? The answer is both sad and humorous: I got ripped off. I assumed the price of $14 was fair for my Mai-HiME and Mai-Otome soundtracks, as that's a bit less than what CDs retail for here (and what my two US versions of Madlax cost legally), and didn't bother to check what the Japanese price was - sure enough, it's $30 each (the ones I got were Miya Records bootlegs, by the way). Note that the first two Noir soundtracks I listed a couple posts ago were really $15, as they were the US versions (the third one was never brought to the US).

I'd report the guy I bought them from to Amazon (what I bought them through), but as far as I can tell he's no longer around, so he probably got busted by somebody else.

Sunday, October 15, 2006

Marketing Math

So, having acquired the Mai-HiME soundtracks (2), Mai-Otome soundtracks (2), and Xenosaga III soundtracks for my birthday, I was looking into acquiring some of the others on the "todo" list - the Xenosaga I and II soundtracks, and the Noir soundtracks (3). Looking at the prices, the Xenosaga ones were none-too-cheap, both being in the $35-40 range. This was a bit more than I was hoping for (well okay, almost 50% more than I was hoping for), but at least understandable for 2-CD sets. The first two Noir soundtracks were, as is usual for anime soundtracks, one CD each (do the Japanese know how to maximize profits, or what?); the price was also typical: $15 each.

Thus, I was surprised to find the third Noir soundtrack selling for $30. A little searching for info confirmed what my mental math had suggested: the soundtrack had two CDs. This came as a moderate surprise to me, as, to my knowledge, it was not significantly longer than the first two.

A look in WinAmp revealed that I was half right: all of the other CDs mentioned (individual CDs) were in the range of 50-60 minutes. The entire third Noir soundtrack, however, totaled 73 minutes - about 1/3 longer than the other CDs, but not more than would have fit on one CD. Looks like I've stumbled upon an evil marketing plot; I imagine the executives' meeting went something like this (although in Japanese):

Executive 1: Alright, people, these Noir soundtracks 1 and 2 are selling like crazy. But I hear we've still got some unpublished music from that series, and that means more money for us. Is that true?
Executive 2: Yes sir, we've still got some unpublished music left, and I hear Kajiura has been playing around with some of the tracks from the other soundtracks, so there might be some remixes we could squeeze out of her, too.
Executive 1: So what's that total, exactly?
Executive 2: I dunno, maybe 73 minutes?
Executive 1: 73 minutes?? That's too much to give people for $15. Split it into two soundtracks.
Executive 3: But sir, a CD can hold 74 minutes...
Executive 1: You go sit in the corner!
Executive 4: But really, people won't pay $15 each for two 36 minute CDs...
Executive 1: You're fired!
Executive 5: I have an idea that might work...
Executive 1: Well, spit it out!
Executive 5: What if we made a single two CD soundtrack, and sold it for $30? With two CDs, people will get the impression that they're getting two CDs worth, and never bother to actually check the play time on either one. As an added bonus, we save money by not having to produce two manuals.
Executive 1: Not bad! Does anybody else have any better ideas, or should we go with that?
*silence*
Executive 1: Alright, ship it!

And So...

Q discovered ActionFonts, and his font list will never be the same again.

In other news, the maker of this one has entirely too much time on their hands:

Aduzings I (from Azumanga Daioh). And yes, that is a good series.

Thursday, October 12, 2006

SWEEEEEEEEEEEET

So, this semester I'm taking a technical writing class required of all computer science majors - feasibility studies, user's manuals, grant proposals, that kind of stuff; it sucks. But at least one part of it looks promising - the grant proposal.

Last class we each made a request for proposals on some topic we think up. Seeing yet another opportunity to be a smart-ass (who else would put an asymptote in a budget graph in a homework assignment?), I wrote a request for proposals on the following:
"Wanted: Teacher-less Japanese foreign language curriculum based on immersion in anime."

So, today the teacher read all (25) of the ideas to the class, and had everybody choose which topic they want to do their (group) grant proposals (with in-class group presentations) on. As best I can remember, the most popular were, in order:
- virtual textbooks
- aggressive anti-popup software (that is, it launches attacks on anyone that attempts to make a popup appear on your computer)
- grammar/writing learning software (can't remember the details of that)
- foreign language curriculum based on anime and/or video games (my topic got merged with another person's)

That's right, my smart-ass idea made the charts; so we've got a group to do it. Since somebody else mentioned it, I actually think the video game version is more viable (less licensing costs and all that). This is gonna be godly. Just think: we can demo a real prototype (maybe something with Neverwinter Nights) in the presentation!

Monday, October 09, 2006

Disambiguation in Caia

Ambiguities are fun (and troublesome) in any language, but when you're designing a language from scratch, they tend to be even more fun, as now it's YOUR fault that they exist. Lately I've been spending a moderate amount of thought trying to solve a problem.

Let us take the following classic phrase from IRC and syntactically translate it into Caia:
'the holy castrating sledgehammer'

That seems relatively unambiguous in English, thanks to the intuitive meaning of the word order used. In Caia, however, there are almost no true adjectives, as Caia instead forms pseudo-adjectives by relating one noun to another with an attributive particle ("of"); true adjectives are things which do not have an acceptable noun form (things like "more"). Thus, in Caia it would look something like this:
'sledgehammer of holiness of castration'

Now that looks a bit frightening even in English, although it's even worse in Caia. The most intuitive interpretation of this phrase, in Caia, would be 'sledgehammer of (holiness of castration)' (parentheses added as an indicator of how the syntax tree would be formed), suggesting that it is the holiness that is being castrated - something along the lines of "sledgehammer of castrated holiness".

'sledgehammer of castration of holiness' is no better. This implies that holiness is a property of castration. More freely translated, it would sound something like "sledgehammer of holy castration", which is also not what we are trying to say.

Caia has something called delimiting particles, which act like the parentheses used earlier to illustrate word grouping - they indicate that everything in between the delimiters should be treated as a single syntactic unit, with regard to the rest of the sentence. While this can be handy for longer things such as relative (noun) clauses ("the house that Jack built with his own hands"), the fact that they must be paired makes them annoyingly cumbersome to use for simpler relations.

Another possible "solution" would be to make a list of the pseudo-adjectives like so:
'sledgehammer of holiness and castration'

This, however, presents a similar problem in a different place: is castration in a list along with holiness ('sledgehammer of (holiness and castration)' - what we're actually looking for) or with the sledgehammer ('(sledgehammer of holiness) and castration')?

That problem made me think quite a bit, and I believe I've come up with a solution: the disjoining particle. The disjoining particle does exactly the opposite of what the delimiting particles do: rather than indicating that a block of words go together syntactically, the disjoining particle indicates that a group of words do NOT go together. Speaking with regards to the syntactic tree, the disjoining particle indicates that the attachment for the following words is not the previous word (as we would intuitively assume), but rather the syntactic parent of the previous word. In this case, the syntactic parent of "holiness" is "sledgehammer". Thus, the following unambiguously represents the phrase we were trying to translate (and, in fact, you could switch holiness and castration and have the same meaning):
'sledgehammer of holiness DISJOINING_PARTICLE of castration'

This is not limited to single word shifts. Suppose, for illustration's sake (this wouldn't actually happen in Caia, as the particles for attribution and ownership are different) we tried to translate the following:
'Justin's holy castrating sledgehammer'

The correct and unambiguous translation would be the following (actually, you could put the relations in any order and retain the same meaning):
'sledgehammer of Justin DISJOINING_PARTICLE of holiness DISJOINING_PARTICLE of castration'

Of course, now that we've added two disjoining particles and two duplicated attributive particles, it might be more elegant (and would require no more words; in fact, if we had one more attribute to attach to sledgehammer it would actually come out to be fewer words) to just use the delimiters and conjunctions as follows:
'sledgehammer of (Justin and holiness and castration)'
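For the curious, the attachment rule for the disjoining particle can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: English words stand in for Caia, "of" is the only attributive particle, every word in a phrase is unique, and delimiters and conjunctions are not handled. The `parse` function and its parent-map output are hypothetical, not part of any Caia tooling.

```python
DISJOIN = "DISJOINING_PARTICLE"


def parse(phrase):
    """Return a {word: syntactic parent} map for an 'X of Y of Z' phrase.

    By default each new word attaches to the previous word. A disjoining
    particle moves the attachment point up to the previous word's parent.
    """
    tokens = phrase.split()
    head = tokens[0]
    parent = {head: None}
    attach = head  # the word the next attribute will attach to
    for tok in tokens[1:]:
        if tok == "of":
            continue  # attributive particle: just introduces the next word
        if tok == DISJOIN:
            if parent[attach] is None:
                raise ValueError("cannot disjoin above the head of the phrase")
            attach = parent[attach]
            continue
        parent[tok] = attach
        attach = tok
    return parent


# Without the particle, each attribute chains onto the one before it.
chained = parse("sledgehammer of holiness of castration")

# With the particle, both attributes hang directly off "sledgehammer".
flat = parse("sledgehammer of holiness DISJOINING_PARTICLE of castration")

three = parse(
    "sledgehammer of Justin DISJOINING_PARTICLE of holiness "
    "DISJOINING_PARTICLE of castration"
)

print(chained)
print(flat)
print(three)
```

Because each particle only climbs one level, reordering the attributes in the disjoined versions leaves every word attached to "sledgehammer", which matches the claim that the order doesn't change the meaning.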

Friday, October 06, 2006

Script Fun

So I was making my daily blog rounds and came upon this on Narges' blog. Seeing the opportunity to make a quip, I pounced on the reply button. And by the time I realized I'd entered into something (not having looked at the reply count before posting), I'd won this:



That's "Justin" (or some odd-sounding accented version of it) written in Persian script. Persian for the most part uses the same alphabet as Arabic, but it adds a couple of letters and changes a couple pronunciations.

I don't know if I've ever mentioned it on the blog, but I've mentioned it in IMs and forums: I think Arabic script is the prettiest writing system I've seen; Tengwar takes second (even though it's not a real-world script). Din dabireh and Devanāgarī get honorable mentions.

I know I've mentioned this to a few different people on IM, though I'm not sure if I've mentioned it on the blog: I've experimented with making a few scripts myself. Some are experimental (just to try out a theme); others are intended to actually be used.



These are called S-Runes. I was originally going for something like Chinese characters, but based on mathematical formulas. This was an experimental character set. Originally they were written at about a 30 degree slant, but that didn't work at all with a computer monitor, so I made them straight horizontal and vertical.



Some early sketches of the S-Runes, back when they were still slanted:



Next is another experimental script. This one was based on simple multiplicative complexity: using combinations of 1/3 top and 2/3 bottom glyphs to form each character, occasionally forming things that don't look exactly like the combination would lead you to expect.

There was a third experimental script based on a tic-tac-toe board (no, you're probably not correctly imagining what it looks like), but I'm not sure where the paper I had that written down on is.



Next is a real script, called something along the lines of Caia hieroglyphics. This was originally to be the official Caia writing system, but I ultimately decided it was too cumbersome to use for a language that is designed for efficiency. It kind of resembles Aramaic script.



Lastly, the current official Caia script. This is fairly heavily derived from the Caia hieroglyphics, as it was supposed to be kind of like cursive is to printing. Many of the Caia hieroglyphs can be found in some form in Caia script, but a number of them couldn't be adapted well to script form.

Monday, September 25, 2006

Dude, Where's My Blogging?

As those of you keeping count have probably noticed, there hasn't been a whole lot of blogging from me, lately. There's a pretty intuitive reason for that. As I mentioned previously, Squid and I went to Kansas to work for the same company as Skywing over the summer. Skywing's boss (actually the chief technical officer) had invited me to work there out of the blue, while Squid was just bored, and thought out of state travel might amuse him. It came as a surprise, then, that when we arrived, the CTO told Squid that if he was bored and wanted to work for the company as a Q/A person (since his programming abilities aren't really enough to get by at an actual job) he could; only, they didn't have budget to pay him.

As Squid was in fact bored (that and the fact that, for technical reasons, he could only use the internet from the hotel room for the first couple of weeks while I was there), he decided to try working (a relatively new concept for him, although I don't suppose I'm one to talk). So, I worked on my (programming) project for the ten weeks we were there, and he worked for their Q/A "department" (actually only like 3 people, including him).

Well, it turned out that Squid was actually pretty good at Q/A, and at the end of the summer both of us were invited to stay and work full-time for the company, once they managed to find budgets for us (during the summer I was the only one getting paid, and only making an intern's pay). I declined, as I wanted to finish up my last 3 semesters of college (double majors, remember), though I expected to go back over the winter and summer breaks, and work remotely during the school semesters (maybe).

So, we came home, and I started school on August 22. A couple weeks after, however, we got a call from them, regarding some new funding. They wanted me to work remotely to finish the project I'd started during the summer (and wanted it done in three weeks!), and they now had budget for Squid.

So, there are a couple points to this story. First, I'm going to school full-time (taking the same number of units I was in the previous semesters; I also have two projects and a term paper due within three weeks from now) and still working part-time, meaning I'm busier than I used to be (and Gord help me if I ever start playing WoW again). Though I'm not sure if it'll last; my project has to be done by next Monday, and I don't know whether they'll want me to work on anything else remotely after it's done (from what I've heard they don't usually let people work remotely).

Second, Squid has accepted the job, and is leaving (for good) on Thursday. I guess if his luggage doesn't explode this time (last time his carry-on suitcase set off the bomb detector), I'll see him around Christmas, if I go back to work there. Amusingly (and surprisingly), it looks like his sister may move into his room in the house here, after he moves out. Unfortunately, I don't think she downloads every single anime episode that comes out, like he does (so that I could always just get whatever I wanted from him); oh well :P

Friday, September 22, 2006

Slashdot Go Boom





Initiating SYN Stealth Scan against slashdot.org (66.35.250.150) [1 port] at 22:29
Running: Linux 2.4.X|2.6.X
OS details: Linux 2.4.21 (Suse, X86), Linux 2.4.6 - 2.4.21, Linux 2.6.8 (Debian)

Now that's ammunition.

Wednesday, September 20, 2006

Q's Fact of the Day

Blast Processing was a marketing term coined by Sega to advertise the fact that the Sega Mega Drive/Genesis could calculate faster motion than the Super Nintendo Entertainment System and was generally taken by the public to refer to the main system processors. Strictly the term refers to a technical feature of the Genesis that wasn't replicated on the SNES - the ability for the CPU to be working on one visible section of map while the graphics processor displays another. Since only the visible part of the map is uploaded at any one time, this feature greatly increases the distance that the map can scroll from one frame to the next, but few if any people will have been able to discern that meaning from the advertising.
http://en.wikipedia.org/wiki/Blast_Processing
I always wondered what that term really meant.

Monday, September 18, 2006

Public Service Announcement

This is Q's public service announcement of the... well, since whenever the last one was. Today I'm writing to warn you about possibly the most idiotic, incompetent bank in the whole world: Wells Fargo. This bank is so special that it's been charging me a monthly service fee on my free savings account (which I only opened because I needed one to get their student credit card, at the time) for several years, now. I've been over there four times to yell at them and tell them to fix it. The first three times it was "I'm so sorry, sir, I'll fix it right now." Lo and behold: bam, monthly service charge on the next month's statement.

The fourth time, however, was a little different (note that it's exactly the same amount of money in the account as there was the last several years, minus their deductions). This time it's "You have less than the minimum balance in your account, sir. Didn't anybody tell you there was a minimum balance fee?" Uh, no. Okay, so I add another $1,400 to the account (several times the minimum balance). Four weeks later: bam, monthly service fee on the account statement.

Now, I'm not positive about what to make of the fact that the first three times I went there nobody mentioned a minimum balance. It could be that their bankers (the ones that have their own desks) are just absolutely clueless about how their bank policies work, and Gord knows what it was they "fixed" when they said they had done so; or, it could be that the minimum balance is a new policy (this is consistent with the fact that I have no recollection of any service charge, and I read all the account information before opening it, but I'm not absolutely positive that I haven't forgotten), in which case it wouldn't explain the first couple years of deductions.

In either case, go put your money into Nigerian banks, people; you'll get ass-raped less than with Wells Fargo. And do make sure to spread the word.

Saturday, September 02, 2006

The Burning Crusade MoPaQs

A few days ago, BZ made me aware of the fact that the World of Warcraft: The Burning Crusade friends and family beta was available for download on the WoW site. As one of the areas of my expertise is the MoPaQ archive format (used by all Blizzard games since Diablo), I immediately wanted to know whether there had been any additions to the format with this new release.

I walked him through all the places he needed to look for additions, as he already had it downloaded and I did not. No new flags in the file table, no new extended attributes. MPQDump reported that there were no new compression methods in use, nor unusual "system" files. There were, however, 12 new bytes in the MPQ header; unfortunately, they were all 0 in all of the game archives.

To make a long story short, I spent several hours over Thursday and Friday looking at the disassembly and running the thing (the installer, to be specific) with a debugger; I couldn't actually watch the code that used the new fields execute, but I did watch the code around those areas, and tried to put the pieces together in my head.

Finally, I'd completed my analysis, and was ready to update my specs. But I couldn't help wanting to verify that everything I'd figured out was correct. How do you study something when that thing doesn't exist? Well, you make it, and see if it works. And thus began the experiment to create a recombinant MPQ.

I made a list of all the new features in BC, so that I could be sure I tried all of them.
- Pointer to the extended file table
- Large archive support for the hash table pointer
- Large archive support for the file table pointer
- Large archive support for the file pointers
- The shunting system

How to test all of these with minimal effort, while eliminating false negatives and positives? Well, to me, the path of least resistance was fairly obvious: I spliced 4294967296 bytes of garbage directly after the MPQ header. This ensured that every file pointer in the archive would have to be altered, and shifted above the 32-bit file pointer limit present in older MPQs. Because it was exactly 4294967296 bytes (2^32), no existing pointers in the file (that is, the low 32 bits of the pointers) would have to be altered; the upper bits just had to be inserted, and they would always be 1. Thus, by simply splicing data there and setting the new fields of the header (two of three of which just needed to be set to 1), I'd knocked the first three items off the checklist. However, now I needed to add the high bits to all of the file pointers. This was accomplished simply by appending the proper number of bytes at the end of the archive (2 bytes per file) with the hex pattern 01 00.
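The arithmetic behind the splice can be checked with a short Python sketch. The function names are made up for illustration, and the little-endian 16-bit layout of the high part is inferred from the 01 00 pattern above; none of this is code from recMPQ.

```python
import struct

SPLICE_SIZE = 1 << 32  # 4294967296 bytes of garbage inserted after the header


def split_offset(offset):
    """Split a file offset into (low 32 bits, high bits above bit 31)."""
    return offset & 0xFFFFFFFF, offset >> 32


def high_part_bytes(offset):
    """Encode the high bits as a little-endian 2-byte value (assumed layout)."""
    _, high = split_offset(offset)
    return struct.pack("<H", high)


# Any pointer that fit in 32 bits before the splice keeps its low half
# unchanged and gains a high half of exactly 1 afterwards.
old_offset = 0x0012AB40  # hypothetical pre-splice file pointer
low, high = split_offset(old_offset + SPLICE_SIZE)
```

Here low comes out equal to old_offset and high equal to 1, which is why the existing 32-bit fields could be left alone and only the 01 00 entries had to be appended.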

But the real clincher would be the shunt. I had, I believed, figured out enough about the shunt to get it to do its thing. However, the MPQ API saved two values from the shunt header in its archive data structure, and I couldn't tell where they were used, meaning I couldn't tell HOW they were used. So, all I could do was set the value whose purpose I knew to what it should be, set the value I didn't understand to 0, and hope for the best.

After writing recMPQ, a program to perform the recombination on an archive, I ran the program on all three of the installer tomes (installation archives). What better way to verify my understanding than to use the recombinant archives as vectors and attempt a transfection?

I observed the experiment from WinDbg. As the archive was opened, I placed watches on the fields that the unknown portions of the shunt header were saved to, with the hope of being able to find the location of the code that was accessing them. Unfortunately, this failed; the fields were never observed to be accessed.

However, the recombinant MPQs worked perfectly - they were uptaken and their payload delivered without difficulty. Thus, the experiment was a success, and I updated my specs with (most of) the information I'd learned.

Friday, September 01, 2006

MoPaQ File Format Spec Updated

I updated my MPQ file format spec today, after pretty much completing my reverse-engineering of the Burning Crusade modifications to the format (which started on Thursday, after I found out that the BC beta was out). Also, the spec has a new home, now that CC has bit the dust (or at least the old site did) - on BZ's wiki.

Tuesday, August 22, 2006

Forgeted to Mention

I be back in California, now. After ten weeks of me being in Kansas, working for SW's company, Dorkess and Bigg'ns (my dog) sure beed happy to see me. It beed fun. I geted to work on a tool that attempt to auto-configur their server; I doed not quite hav enough time to finish it, but I finished all the major stuff, and all that be left be a bunch of odds and ends. I be hoping that I canill go back to work there during the coming winter and summer break, between school semesters.

Which bring up the fact that I be back in school for the fall semester. Three courses that probably beill boring, and one (operating systems concepts) that possibly might be interesting, if I don't already know most of the stuff the course beill teaching. Stuff like scheduling, synchronization, I/O, memory management, real-time systems, etc.

While I beed in Kansas, I picked up watching a few new TV series. Monk beed mildly amusing, although it beed more of a time-filler than something I would rather watch than do other things. Psych, on the other hand, I might continu to watch, even though I be back home.

House be just godly. It be probably one of my favorite shows (along with Law and Order). It be about a doctor (House) and his medical students who get some of the more confusing cases, either due to extremely obscure illnesses, or symptoms that don't seem consistent with the illness, due to some unique circumstance. To quot one person on Star Alliance, House be Sherlock Holmes, only with medicine instead of crime. Oh, and doed I mention that House be a complete (and highly amusing) ass-hole? Arrogant, rude, anti-social, immature, rebellious, unprofessional, mean, etc. Since I discovered House, I hav learnen that a good number of my friends also watch it. And now you be going to watch it, too! You can probably find it for download somewhere. Or you could just watch it on Fox (tonight, I believ).

Other than that, I hav been slowly watching Stellvia of the Universe, as per SW's suggestion, and playing Neverwinter Nights and Hordes of the Underdark. Speaking of which, it hav been a good five or six months since I last played WoW. I wonder if I canill mak it all the way to the expansion (which beill November, at the earliest) without playing it again. Also, NWN2 beill coming out in October, and I be planning to get it, along with a new computer, using the money from this summer's work.

I wonder if I workill on LibQ again any time soon. I hav not worked on it in several months, and now I be rather engrossed in my evil plans that I be researching. Oh, and as a last note, this post be only tangentially related to what I be researching English grammar for.

Saturday, August 19, 2006

English Verbs - The 1' Verbs

Yeah, I realize I still have some uncompleted series to finish (like the Japanese grammar series), but at the moment I'm feeling most inspired in something else, so bear with me.

From time to time the last couple months, I've been looking at English grammar (yes, as you might imagine, it was for some greater purpose; no, you can't know what it is). Today I've been going through the English verbs and classifying them according to how they are conjugated. English verbs are broadly divided into two groups: the regular verbs and the irregular verbs. However, I prefer a bit more specific classification than that, and have thus created my own classification system.

Regular verbs are verbs which follow a very specific conjugation pattern, illustrated below:
Infinitive: to _ (ex: to mark)
Non-past tense: _ (mark), _s (marks; third person singular)
Non-past participle/gerund: _ing (marking)
Past tense: _ed (marked)
Past participle: _ed (marked)
Notice that there are 4 distinct conjugations: the infinitive/non-past tense, non-past participle/gerund, non-past third person singular tense, and past tense/participle. But all 4 of these conjugations can be derived directly from the present tense (for regular verbs ending in a vowel, the final vowel is removed before appending suffixes). Most English verbs are of this class (at least by number of verbs; by frequency of use or number of commonly used verbs it's a totally different story).

I call this a 1' conjugation because it has a single root: either the present tense directly, or the present tense with the final vowel removed.

There's one more type of 1' verb, which, as process of elimination would dictate, is part of the irregular verb superclass. This is the Single Present/Past/Participle conjugation class. As the name implies, this class is distinct in that the present tense (non-third-person singular), past tense, and past participle conjugations are all identical. For example:
Infinitive: to _ (to cast)
Non-past tense: _ (cast), _s (casts; third-person singular)
Non-past participle/gerund: _ing (casting)
Past tense: _ (cast)
Past participle: _ (cast)
In this case, there are three distinct conjugations, all derived directly from a single conjugation - the present tense.
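As a rough illustration of the two 1' classes, here is a minimal Python sketch. The function names are invented for this example, and it makes one simplifying assumption that differs slightly from the rule as stated above: only a final "e" is dropped, and only before -ing and -ed. It ignores other spelling rules (consonant doubling as in "cutting", -es after sibilants, and so on).

```python
def conjugate_regular(verb):
    """Regular (1') verb: every form is built from the present tense."""
    # Assumption: only a final "e" is dropped, and only before -ing and -ed
    # ("bake" -> "baking", "baked", but "bakes").
    stem = verb[:-1] if verb.endswith("e") else verb
    return {
        "infinitive": "to " + verb,
        "non-past": verb,
        "non-past 3rd singular": verb + "s",
        "non-past participle": stem + "ing",
        "past": stem + "ed",
        "past participle": stem + "ed",
    }


def conjugate_invariant(verb):
    """Single Present/Past/Participle verb: both past forms equal the present."""
    forms = conjugate_regular(verb)
    forms["past"] = verb
    forms["past participle"] = verb
    return forms
```

Calling conjugate_regular("mark") and conjugate_invariant("cast") reproduces the two tables above from a single root each.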

The full list of non-archaic verbs in this class: bet, bid, cast, cost, cut, hit, hurt, knit, let, put, read (this one only belongs in this class if you're talking about how it's written; its pronunciation differs from the rest of this class), rid, set, shed, shred, shut, slit, spit, split, spread, and thrust (and writing this list has made me decide that I will not be doing any more all-inclusive lists). Note that a couple of those are verbs that are in the middle of a transition to regular verbs (for example, knit) - both the Single Present/Past/Participle conjugation and regular verb conjugation patterns are considered grammatically correct for them.

I probably haven't given you enough data to see this last point, yet, so I'll just say it. Save for a handful of highly irregular verbs (ones that do not fit any of the conjugation patterns I have discussed or will discuss in this series), there are only 5 possible distinct conjugations for a given verb, from which all other grammatical conjugations draw - the non-past tense, the non-past third-person singular tense, the non-past participle, the past tense, and the past participle. Save for the truly irregular verbs, all the other tenses and gender/number combinations are drawn from those 5 in highly regular ways (for example, all perfect tenses are formed by adding a helper verb to the past participle).

As well, in all but the highly irregular verbs, there are no more than 3 roots for any given verb (usually 1 or 2), which may be derived to form the additional distinct conjugations (for example, the third-person singular non-past conjugation is always formed by adding -s to the present tense; as well, the non-past participle is always formed by adding -ing to the non-past tense).

Oh, and for the trivia value, as far as I know, be/being/am/are/is/was/were/been is the most complicated and highly irregular word in the English language, having 8 distinct conjugations and 6 roots.

Monday, August 07, 2006

Pop IT Quiz!

So today I wrote an e-mail address parser/validator based on RFC 3696. These are three e-mail addresses I threw at it to see if it was working correctly:

$q{_}i$|-|y\\ "f4t"@corp.goat.net
dorky.cat@corp.goat.net.
slinky-kitty@.corp.goat.net

Which one(s) of these is/are incorrect, without looking at an e-mail format reference?

Thursday, July 20, 2006

& Lessons from the Morning Meeting

Thou shalt not play with a screwdriver like a sword without verifying that the bit is locked, lest ye nail your boss on the other side of the room in the face with a 4", extra-long screwdriver bit.

Sunday, July 16, 2006

The Animation Control Incident

So, I'm going about my business at work, working on my project. What I'm working on, generally, is a configuration wizard for the product we make. It tries to detect as much information as possible, and then lets the user review/change the settings before configuring/activating the program. One of the neat features it has (which I may blog about more in depth) is a mechanism of asynchronous data prefetch. As some of the detection methods can take several seconds (or even as long as 15 seconds, if a computer it's trying to reach on the network is offline), this hides that time by allowing the user to use the wizard while info is gathered in the background, ideally preventing the user from ever knowing how much time was spent gathering information.
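The general idea of asynchronous prefetch can be sketched in a few lines of Python. This is only an illustration of the technique, not the wizard's actual code (which is a Windows program); the Prefetcher class, the page names, and the detector callables are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class Prefetcher:
    """Start slow detection jobs early; block only when a page needs the result."""

    def __init__(self, detectors):
        # detectors: mapping of page name -> zero-argument callable (hypothetical)
        self._pool = ThreadPoolExecutor(max_workers=max(1, len(detectors)))
        self._futures = {
            name: self._pool.submit(fn) for name, fn in detectors.items()
        }

    def is_ready(self, name):
        """True if the page can be shown without a "please wait" dialog."""
        return self._futures[name].done()

    def get(self, name, timeout=None):
        """Return the detected value, waiting for it if it has not arrived yet."""
        return self._futures[name].result(timeout=timeout)

    def close(self):
        """Wait for outstanding jobs and release the worker threads."""
        self._pool.shutdown(wait=True)
```

A page handler would check is_ready() when the user flips to it, and put up the "please wait" dialog around get() only if the data is still in flight.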

Anyway, I needed a "please wait" dialog for when the user flips to a page whose data isn't loaded yet (and also to use for times when it has to spend time validating the user's input). The company has a nice little AVI to use for such occasions, so I threw it on a dialog with an animation control (the standard Windows common control). However, testing this dialog revealed something odd - once you told it to play with Animate_Play/ACM_PLAY, it would show the first frame for several seconds, until it finally began to play the animation.

A ridiculous amount of fidgeting with the parameters (and the source to the company's Animation control wrapper class) later, I was still unable to find anything I was doing wrong to cause this (although I did manage to observe that it wasn't just the first time - any time the window was completely obscured, this lag occurred). I asked Skywing about it, and about another weird thing I'd seen in the animation control class the company has. He said he'd seen the delay before, too, and had looked into it a bit, but was unable to locate the cause (and he said I should try and figure it out). He said that it seemed to be completely hiding the first iteration of the AVI.

So I looked at the control. It was creating a decoding thread, which looked like this:

.text:5D0AEC6D ; DWORD __stdcall PlayThread(LPVOID)
.text:5D0AEC6D _PlayThread@4 proc near
.text:5D0AEC6D
.text:5D0AEC6D arg_4 = dword ptr 8
.text:5D0AEC6D
.text:5D0AEC6D mov edi, edi
.text:5D0AEC6F push ebp
.text:5D0AEC70 mov ebp, esp
.text:5D0AEC72 push esi
.text:5D0AEC73 mov esi, [ebp+arg_4]
.text:5D0AEC76 push 1
.text:5D0AEC78 push esi
.text:5D0AEC79 call _DoNotify@8 ; DoNotify(x,x)
.text:5D0AEC7E push esi
.text:5D0AEC7F call _HandleTick@4 ; HandleTick(x)
.text:5D0AEC84 test eax, eax
.text:5D0AEC86 jz short loc_5D0AECB7
.text:5D0AEC88 push edi
.text:5D0AEC89 mov edi, 0FA0h
.text:5D0AEC8E mov ecx, [esi+5Ch]
.text:5D0AEC91 test ecx, ecx
.text:5D0AEC93 jz loc_5D0B8C48
.text:5D0AEC99 test eax, eax
.text:5D0AEC9B mov eax, [esi+44h]
.text:5D0AEC9E jl loc_5D0AF8BA
.text:5D0AECA4 push eax ; dwMilliseconds
.text:5D0AECA5 push ecx ; hHandle
.text:5D0AECA6 call ds:__imp__WaitForSingleObject@8 ; __declspec(dllimport) WaitForSingleObject(x,x)
.text:5D0AECAC push esi
.text:5D0AECAD call _HandleTick@4 ; HandleTick(x)
.text:5D0AECB2 test eax, eax
.text:5D0AECB4 jnz short loc_5D0AEC8E
.text:5D0AECB6 pop edi
.text:5D0AECB7 push 2
.text:5D0AECB9 push esi
.text:5D0AECBA call _DoNotify@8 ; DoNotify(x,x)
.text:5D0AECBF xor eax, eax
.text:5D0AECC1 pop esi
.text:5D0AECC2 pop ebp
.text:5D0AECC3 retn 4
.text:5D0AECC3 _PlayThread@4 endp

.text:5D0AF8BA add eax, edi
.text:5D0AF8BC jmp loc_5D0AECA4


HandleTick is what draws each frame (we'll get back to this in a minute). It was trivial to determine that the problem was due to WaitForSingleObject spending several seconds waiting before timing out. But that didn't explain why, or what it was waiting on.

By stepping through the loop I observed that the 5D0AEC9E-5D0AF8BA-5D0AF8BC-5D0AECA4 route was being taken. The value loaded from the struct was 100 ms, and it was getting 4000 ms added to it. I got out my cell phone (with built-in stopwatch) and verified that the drawing delay was exactly 4.1 seconds - not the 2.8 seconds Skywing predicted (the length of the animation). However, as best I could tell, it was doing this every single iteration. So why was there only a single gap in the animation?
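Read as a loop, the PlayThread disassembly above amounts to roughly the following. This is a hand-written Python model for illustration only: handle_tick and wait are stand-ins for HandleTick and WaitForSingleObject, and the DoNotify calls and the null-handle branch (whose target is not in the listing) are left out.

```python
BACKOFF_MS = 0x0FA0  # 4000, the constant loaded into edi


def play_loop(handle_tick, wait, frame_delay_ms):
    """Model of the PlayThread loop body.

    handle_tick() stands in for HandleTick: 0 ends the loop, a negative
    value selects the slow path. wait(ms) stands in for the
    WaitForSingleObject call on the handle at [esi+5Ch].
    """
    tick = handle_tick()
    while tick != 0:                  # jz / jnz on HandleTick's result
        timeout = frame_delay_ms      # mov eax, [esi+44h]
        if tick < 0:                  # jl loc_5D0AF8BA
            timeout += BACKOFF_MS     # add eax, edi
        wait(timeout)
        tick = handle_tick()


# Hypothetical run: one negative tick, two normal ticks, then stop.
results = iter([-1, 1, 1, 0])
waits = []
play_loop(lambda: next(results), waits.append, 100)
# waits is now [4100, 100, 100]
```

With a 100 ms frame delay, the single negative result produces the 4100 ms wait measured with the stopwatch, and the later frames go back to 100 ms.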

As you can see in the disassembly, the 4 s delay path is only taken when HandleTick returns a negative number (which it was, in this case, returning -1). Looking at this function revealed the following of interest (I'm not pasting the whole function, here):

.text:5D0AECE7 push eax ; lpCriticalSection
.text:5D0AECE8 mov [ebp+lpCriticalSection], eax
.text:5D0AECEB call ds:__imp__EnterCriticalSection@4 ; __declspec(dllimport) EnterCriticalSection(x)
.text:5D0AECF1 push dword ptr [esi] ; hWnd
.text:5D0AECF3 call ds:__imp__GetDC@4 ; __declspec(dllimport) GetDC(x)
.text:5D0AECF9 mov ebx, eax
.text:5D0AECFB lea eax, [ebp+var_10]
.text:5D0AECFE push eax ; LPRECT
.text:5D0AECFF push ebx ; HDC
.text:5D0AED00 call ds:__imp__GetClipBox@8 ; __declspec(dllimport) GetClipBox(x,x)
.text:5D0AED06 cmp eax, 1
.text:5D0AED09 jz loc_5D0AF8A5

.text:5D0AED43 push ebx ; hDC
.text:5D0AED44 push dword ptr [esi] ; hWnd
.text:5D0AED46 mov edi, eax
.text:5D0AED48 call ds:__imp__ReleaseDC@8 ; __declspec(dllimport) ReleaseDC(x,x)
.text:5D0AED4E push [ebp+lpCriticalSection] ; lpCriticalSection
.text:5D0AED51 call ds:__imp__LeaveCriticalSection@4 ; __declspec(dllimport) LeaveCriticalSection(x)
.text:5D0AED57 pop ebx

.text:5D0AF8A5 mov eax, [esi+50h]
.text:5D0AF8A8 mov [esi+48h], eax
.text:5D0AF8AB xor eax, eax
.text:5D0AF8AD cmp [esi+4Ch], edi
.text:5D0AF8B0 setnz al
.text:5D0AF8B3 neg eax
.text:5D0AF8B5 jmp loc_5D0AED43

The jump at 5D0AED09 was being taken, resulting in eax getting set to -1 ([esi+4Ch]/-1 != edi/0). That one REALLY threw me. At first I thought that GetClipBox was succeeding, and as a result HandleTick was failing. But in fact GetClipBox has the following return values:
#define ERROR 0
#define NULLREGION 1
#define SIMPLEREGION 2
#define COMPLEXREGION 3

And everything becomes clear. If the animation's window is completely covered by another window, GetClipBox returns NULLREGION. If GetClipBox returns NULLREGION, HandleTick returns -1. If HandleTick returns -1, the timeout duration gets 4 s added to it.
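The xor/cmp/setnz/neg sequence at 5D0AF8AB is a common idiom for turning a comparison into 0 or -1. A small Python model of just that idiom, using the values observed above ([esi+4Ch] = -1, edi = 0); the function name is invented for this sketch:

```python
ERROR, NULLREGION, SIMPLEREGION, COMPLEXREGION = 0, 1, 2, 3


def null_region_result(field_4c, edi=0):
    """Model of 5D0AF8AB..5D0AF8B3: xor eax,eax / cmp / setnz al / neg eax."""
    eax = 1 if field_4c != edi else 0   # setnz al: 1 when the values differ
    return -eax                         # neg eax: 1 becomes -1, 0 stays 0
```

So on the NULLREGION path the result is -1 whenever the two values differ, which is the negative return that selects the 4 s path in PlayThread.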

Actually, there are two more pieces of information we still require to completely crack this case. The first was provided by Skywing and WinDbg - the event being waited on is actually a termination signal. When it's time for the animation control to stop playing, this event is set (among other things), short-circuiting the loop. This means that the value of [esi+44h] at 5D0AEC9B determines the rate at which frames are drawn. This is a bit of a deviation from the most common use of events, where the event being set - not the timeout expiring - is the expected result.
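The same pattern, where the timeout is the normal case and the event only signals "stop", looks like this in Python. It is a generic sketch of the technique using threading.Event, not a translation of the control's code (which calls HandleTick after each wait rather than checking the wait result); frame_loop and draw_frame are hypothetical names.

```python
import threading


def frame_loop(stop_event, interval_s, draw_frame):
    """Draw a frame every interval_s seconds until stop_event is set."""
    # Event.wait returns False when the timeout expires (the normal case:
    # draw the next frame) and True once the event has been set (stop).
    while not stop_event.wait(interval_s):
        draw_frame()
```

The timeout doubles as the frame timer, and setting the event interrupts the wait immediately instead of letting the loop sleep out the rest of the interval.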

It seems likely that the 4s addition is a backoff case. If the animation control window becomes completely obscured, there's no need to draw the frame, and the rendering thread stalls itself (excessively, if you ask me; I probably wouldn't have given more than a 1 second timeout).

So, now we know why the delay. That just leaves the question of why this timeout executes every time at the very beginning. After thinking about it a moment, the answer came to me: it's a consequence of how (or where) we're playing the animation - in the WM_CREATE (for windows) or WM_INITDIALOG (for dialogs) message handler. This is exactly where initialization that requires the window to already be created is supposed to go. Now, here's the trick: at this point, the window has been created - but it is still not visible (obviously - you want to do initialization BEFORE the window appears on screen). Since the rendering is done in a separate thread, this thread can execute concurrently with the UI thread. If the rendering thread gets to execute before the WM_CREATE/WM_INITDIALOG handler has returned and the window has been shown, GetClipBox returns NULLREGION and the rendering thread goes into its 4 s timeout.