Search This Blog

Thursday, May 29, 2008

Absolutely Amazing

Since it would take quite a patchwork of quotes to summarize this story, I'll just give a few bullet-points of my own as a summary.
- Revision3 uses BitTorrent to distribute its own content (legal distribution, in other words)
- Everybody's second-favorite company MediaDefender decided to play with R3's tracker. Once they found a hole that allowed the tracker to serve torrents not by R3, they began using the tracker to track their own files.
- R3 discovered that somebody was using their tracker for external content and banned MD's torrents
- MD's servers (the ones actually uploading the files that they were using R3's tracker to track) responded by DOSing R3's tracker (according to one person on Slashdot, MD has a 9 Gbps internet connection for this purpose), taking R3's tracker and other systems completely offline
- The FBI is currently investigating the incident. Some have suggested and are praying that the PATRIOT Act could be used to charge MD with cyber-terrorism, as defined by law.

Various coverage:
Inside the Attack that Crippled Revision3 (mirror)
MediaDefender, Revision3, screw-up
Revision3 Sends FBI after MediaDefender

Thursday, May 22, 2008

More on kd-tree Splitting

I was just reading the paper Analysis of Approximate Nearest Neighbor Searching with Clustered Point Sets, which, as the title indicates, performs performance analysis of 3 different kd-tree splitting policies. The policies used are the standard policy, the sliding-midpoint policy, and the minimum-ambiguity policy, which is new in this paper.

The minimum ambiguity method takes into account not only the full set of data points, like all the others, but also takes into account a set of training regions which represent the distribution of regions to find points within - that is, the future searches themselves. As with the other methods, the goal of the algorithm is to minimize the average number of nodes in the tree that overlap each search region; however, when both the searches and data points are known, the minimum-ambiguity method can do it better than the others.

Two different scenarios analyzed are of particular interest. In all cases the data points were clustered; the two correspond to the distribution of the training regions: the same clustered distribution as the data points, or uniform distribution. In the case of both using the same clustered distribution, the minimum-ambiguity policy > the standard policy > sliding-midpoint policy (here using the internet use of '>' as "is superior to"). In the case of searches distributed uniformly, the sliding-midpoint policy > minimum-ambiguity policy, with both far superior to the standard policy.

So, what's it mean for writing a kd-tree for a game? Well, it provides some pretty interesting information, though it doesn't change the bottom line. As mentioned in my paper, more complex splitting policies like sliding-midpoint and minimum-ambiguity are only viable for data sets that are essentially fixed. In a game, this corresponds to immobile objects that are either unchanging (e.g. cannot die) or change extremely infrequently; in E Terra, this corresponds to doodads - objects which take up space but do not have any function - and immobile game objects such as grass (which is edible but not consumable).

As also mentioned previously, the distribution of points is not expected to be uniform - it's expected that there will be clusters of things at various focal points on the map. Furthermore, in the case of mobile objects, the search distribution will roughly equal the distribution of the data points themselves.

Unfortunately, neither of these facts is useful to us. Despite the mostly known distribution of searches, we cannot use the minimum-ambiguity policy in any of our trees because the set of search regions - corresponding mostly to the mobile game objects - is dynamic. Furthermore, it wouldn't be of any particular benefit to use the data points in the static trees as the search region distribution, as the majority of searches will be from the mobile objects, for things like collision detection and sight radii.

Thursday, May 15, 2008

Detox

Well, I've finally finished my last final of the semester, and the last homework assignment/term project was turned in last week. Now comes the much more pleasant task of purging myself of any knowledge acquired this semester. A list of some of the things I ought to work on this summer:
- Clean up and post kdTrieDemo
- Clean up and post TextBreaker
- Clean up and post the GUI code from MPQDraft
- Work on E Terra
- Do follow-up experiments to my AI term project
- Work on Secret Project V (no, that doesn't stand for "vendetta")
- Work more on my various languages

Though somehow I'm betting the only one I'll do in any great amount is:
- Play World of Warcraft

Saturday, May 10, 2008

The Story of Comcast

As companies like Comcast are increasingly in the news on technology-related news sources, some might wonder how the entire situation with Comcast came to be. Well, this abridged version of the actual shareholder delegate meetings in two acts might shed some light on the topic.

Act 1

Shareholder Delegate #1: Gentlemen, I propose that we advertise higher bandwidth, increase prices, and sign up more customers. Yay or nay?
Delegate #4: What is the business impact of this proposal?
Delegate #1: More money for us.
Delegate #4: Yay.
Delegate #3: Yay.
Delegate #5: Yay.
Delegate #2: Yay!
Delegate #5: I propose we increase network capacity to accommodate those additional customers.
Delegate #2: What is the business impact of this proposal?
Delegate #5: We'll need to spend some money in the short term to...
Delegate #2: Nay!
Delegate #4: Nay!
Delegate #3: Yay.
Delegate #1: Nay.

Repeat 37 times

Act 2

Delegate #2: Holy shit! We've got way more network traffic than the network can handle! It's strangling our network to death!
Delegate #4: This is a disaster! Quick, somebody find a scapegoat!
Network Technician: Well, it looks like BitTorrent is using up a fair amount of bandwidth.
Delegate #4: Kill it, quick!
Network Technician: BitTorrent traffic blocked. Network performance has returned to mediocre levels.
Delegate #2: Whew. Crisis averted.
Delegate #2: Now then, I propose we increase advertised bandwidth and sign up more users. Oh, and we can't forget to increase the price; we are selling a limited resource, after all.
Delegate #3: What is the business impact of this proposal?
Delegate #2: More money for us.
Delegate #2: Yay.
Delegate #5: Yay.
Delegate #4: Yay!
Delegate #3: Yay.
Delegate #5: I propose we upgrade our network to...
Delegate #2: I thought we voted you out months ago. Security!
*delegate #5 is dragged out of the room by security guards*

*cut to next business meeting*

Delegate #4: Gentlemen, I propose that we raise advertised bandwidth, increase prices, and sign up more customers.

*curtain*

This post inspired by Mac Chaos' various parodies over the years, which are probably better than mine.

Friday, May 09, 2008

Spatial Sorting with kd-trees - Part 2

Here's the paper itself. The teacher required that I use the ACM format, but I cut corners where possible (e.g. he only said we needed to have those four sections) :P I should note that the sections themselves are from specifications given by the teacher. I think if it were an ACM paper, or even if I was coming up with the structure myself, it would have been organized significantly differently.

Thursday, May 08, 2008

Spatial Sorting with kd-trees - Part 1

So, today I turned in my graphics programming term project... what's done of it, anyway. Spatial Sorting with kd-trees is the name of the paper/presentation; not quite as impressive sounding as my AI project, but then, this project as a whole is less impressive than my AI project (and my third term project even less) :P

Well, I was supposed to give the presentation today, on the last day of class. Unfortunately, due to gross lack of time, that didn't actually happen. If I recall correctly, 12 people were scheduled to go today, in 75 minutes of class time; that's about 6 minutes per person. Problem is, everybody made their presentations with at least 10 minutes in mind (not unreasonable), and it took everybody a couple minutes to get their Powerpoint presentation running on the overhead projector. By the end we'd gone from allowing the first 3 or 4 people to give their full presentations, to "7 minutes each", to "5 minutes each", to "3 minutes each", to "run your demo program and sit down"; and at least one person (me) never did get a chance to go (I'm not sure if there were others). So, I wrote the URL of my blog on the whiteboard and told people I'd post the presentation there.

The presentation itself is here. I'd recommend that anyone who hasn't seen it before also look at my previous description of kd-trees and kd-tries, as I'm not sure the slides alone, without the spoken part of the presentation, will give you a clear explanation.

In addition, the demo program is also here, compiled with XNA 1.0 Refresh and XNA 2.0. The space bar toggles pause (initially paused); the H key toggles display of the kd-tree divisions, simulated view frustum, and statistics about the search for objects in the view frustum (the statistics are, in order: number of objects in the view frustum, number of objects evaluated, the percentage of objects evaluated that are in the view frustum, the number of tree nodes navigated, and the ratio of objects in the frustum to combined nodes and objects visited); the arrow keys move the view frustum. I'll probably post a rant about XNA in the near future; just make sure you download the version of my demo for the version of XNA you have installed. I'll probably post the written portion of the project tomorrow, after I finish it.

I'm still hoping to post the source to TextBreaker and kdTrieDemo, although that's gonna be at least a week from now (at the earliest), as finals are next week.

Sunday, May 04, 2008

Orthographic Language Identification Using Artificial Neural Networks

Finally finished adding in the figures from the presentation that weren't in the original version of the paper. I have no idea why those three Visio figures are so ugly; they don't look that way in Visio, Word, and Powerpoint, but magically turn ugly when printed with PDF reDirect (which has worked well before those three figures).

Orthographic Language Identification Using Artificial Neural Networks

I'd like to post the full source to TextBreaker, but a lot of it was written in a hurry and needs cleaning up and commenting. Combine my infamous laziness (and past experience with MPQDraft) and time will tell if I ever get around to it :P

Saturday, May 03, 2008

The Grand Unification

The assumptions you start with can often limit the set of conclusions you are able to arrive at through logical reasoning. Computer science people were having a heck of a time attempting to adapt the binary tree - the standard for in-memory search structures - to media with large seek time, particularly disk drives. Progress was slow and fairly unproductive while the basic definition of binary tree held. Finally, somebody questioned the assumption and thought: what if we built the tree from the bottom up? And so the B-tree was born, and remains the standard in index structures to this day, with incremental improvement.

Other times, when creating, the assumptions themselves are fun to play with and observe the results. In Caia, I initially envisioned all of the major words - nouns, adjectives, and verbs - to be nouns. Nouns became adjectives when used with an attributive particle analogous to "having" (e.g. "woman having beauty" vs. "beautiful woman"). Taking an idea from Japanese, nouns became verbs by an auxiliary verb meaning more or less "do" or "make" (e.g. "make cut" vs. "cut"). Unfortunately, I ultimately concluded that verbs had to be separate, due to both practical concerns (specifically, concerns about making thoughts too long) and theoretical concerns (different nouns require different semantics, which would produce inconsistent theoretical behavior).

With another one of my languages, I took a different route, with some very interesting results. As this language is synthetic (unlike Caia, which is strongly analytic and isolating), I had quite a bit more flexibility. This language was actually modeled on the Altaic languages - Japanese, Korean, Mongolian, Turkish, etc.; as such, I suppose I can't claim that I invented this (what I'm getting to), but merely took what existed in Altaic languages and perfected it to a degree that doesn't exist in nature - at least to my knowledge.

The result is a language is which nouns, adjectives, and verbs are, in fact, all verbs; they are conjugated exactly the same, and play the same grammatical role. Even though this particular language is fictional, and I'm not expecting anybody to actually speak it, this idea might show up in other languages of mine (possibly a real one), as I find it exceptionally elegant. However, I do like to refer to "nouns" as substantives, "adjectives" as attributes, and "verbs" as actions; this is done because there are slightly differences in meaning between the noun bases in the three cases.

I explained previously that Japanese verbs had several base forms - usually distinguished by one vowel - which were then agglutinated with other things to form complex verb forms. I won't describe them again, as I use different names for the ones in my language, which might result in confusion. In my language, there are at least five different base forms of each verb: neutral, conclusive, attributive, participial, instantiative, and conjunctive (note that these names are not final, and I'm open to suggestions).

The conclusive form is the same as the Japanese conclusive: it's the main verb of the last clause in a sentence; it indicates the end of the sentence, and often has a number of suffixes indicating various details about the sentence. The attributive form is the same as one of the two uses of the Japanese attributive: it's the verb of a relative clause; adjectives are attached to nouns by forming relative clauses (e.g. "cat that is fat" vs. "fat cat"). The participial form resembles the second use of the Japanese attributive form: it is a noun referring to the act named by the verb (essentially an English gerund or participle, as in "watching FLCL makes me want to kill people"). The instantiative is unique to my language, and refers to an instance of the verb's action; for example, "a run" would be an instance of the verb "run". The conjunctive is similar to the Japanese -te form (and also includes the Japanese conjunctive base); specifically, it is used in verbs not in the last clause of a sentence (I'll come back to that). Finally, the neutral form is used in agglutination, and has something of a flexible, context-dependent meaning.

So, how exactly does this framework allow the grand unification? Let's look at an example of the specific meanings of the different bases for each word type (substantive, attribute, and action). Although I should note that not every base is necessary to unify the three; some are simply part of the bigger picture for the language.

For the substantive "human":
Conclusive form: "is [a] human"
Attributive form: "who is [a] human"
Participial: "being human"
Instantiative: "human"
Conjunctive: "is [a] human, and..."

For the attribute "fat":
Conclusive form: "is fat"
Attributive form: "who is fat"/"fat"
Participial form: "being fat"
Instantiative form: "fat thing/person"
Conjunctive form: "is fat, and..."

For the action "travel":
Conclusive form: "travel"
Attributive form: "who travels"
Participial form: "traveling"
Instantiative form: "journey"
Conjunctive form: "travel, and..."

Thus we are able to use identical conjugation for each type of word, treating the first two as stative verbs and the last as an active verb, in an elegant unified system. The real key to this, I think, was the separation of participial and instantiative forms. Note that not all actions have an instantiative form; it only exists where it makes sense: where something is produced or performed.

Now that I've explained how the unification works, there's just one more loose end to tie up: the meaning of the conjunctive form. This form is somewhat foreign to English speakers, because English only works this way in one of the circumstances this language uses it for (specifically, conjunction - e.g. "Murasaki is seven years old and [is] surprisingly well-spoken").

In English, when we have multiple clauses in a sentence that are related in a particular way, they are generally joined by some linker word that carries information about the relationship between the clauses; furthermore, the verbs in all clauses are conjugated normally. In Japanese, all non-main clauses are simply joined, often without any indication of what the relationship is (for matters of time, this is not that unusual; many languages lack such words); as well, the verbs in all but the main clause are deficient - they lack various things like tense, mood, politeness auxiliaries, etc. This is a matter of economy; all that stuff they stick on the verb at the end of the sentence can be rather lengthy. Thus it uses a generic verb form which in some ways resembles the -ing form of English verbs; this form indicates a conjunctive relationship between sentences (note that the literal conjunctive, as in the example a bit above, is actually indicated with a separate form in Japanese, appropriately known as the "conjunctive form/base"; what I've done is merged the two uses).

Here are some examples of things that would use this conjunctive form in Japanese. The first version is how it would be said in Japanese (note that I'm conjugating all verbs here, even though only the last one would be conjugated in Japanese); the second sentences shows how we would typically say the same thing in English.

Simultaneity: "I looked at manga and she looked at novels"/"I looked at manga while/as she looked at novels"
Coincidence: "I went shopping and ran into a friend"/"I ran into a friend when I went shopping"
Sequence: "I got a haircut, [and] went to the bank, and went to the supermarket"/"I got a haircut, went to the bank, [and] then went to the supermarket"
Consequence: "I overslept and was late for class"/"I was late for class because I overslept"

So that's all the conjunctive form is. On an interesting random note, you might notice that in none of those examples does the first version sound unnatural, and might very well be used by native English speakers in addition to the more precise second versions (though of course it would have sounded extremely strange if I had only conjugated the last verb, like Japanese does). This indicates that even in English this kind of vagueness is used; and for that matter, there are ways of indicating some of those relationships explicitly in Japanese, as well - they just aren't always used.

Monday, April 28, 2008

Random Linguistic Fact of the Day

An affix is something in linguistics which attaches to the word it modifies. For example, the English plural suffix 's'/'es' is an affix, as shown in "books on the shelf". The suffix attaches to the word made plural - 'book'.

A clitic is something that attaches to something other than the word it modifies. An example of this in English is the possessive 's. Adding that to the previous example gives "books on the shelf's". Here, the possessive clitic is attached to 'shelf', even though the word it's actually modifying (the possessor) is 'books'.

This is the technical term for what Trique uses frequently (and I didn't know the name of until now). In Trique, personal and possessive pronouns often become clitics when following certain other types of words. Trique also uses clitic doubling, in some cases.

Friday, April 25, 2008

More Random Thoughts

Well, I've had two more random thoughts about English evolution (so far) today.

First, I realized that there is already a mechanism in common use for English to lose the past tense entirely. Figure it out, yet? I just used it. English has come to like to move auxiliary verbs ("do"/"did") before the subject in questions, e.g. "Did you figure it out, yet?" However, it's becoming common use to drop auxiliary verbs in English. This is one such case. As the auxiliary verb, not the main verb, carries the tense, this change leads to a loss of explicitly stated tense.

In the case of questions, the main verb is kept in the infinitive, which is identical in form to the present tense. Languages have a way of evolving based on analogy. It's not impossible that this could be applied to verbs in general, losing the past tense entirely (or at least a distinct form for the past tense; it's possible a periphrastic form, like "do"/"did" would then take it's place in all cases).

The third thought of the day came from me pondering the second one. This modern tendency to drop auxiliary verbs and sometimes the subject is not unique to questions. It's also applied to the perfect (e.g. "I've been" -> "Been") and the progressive (e.g. "I'm thinking" -> "Thinking"). If the latter case became the standard, the effect would basically be the same as with dropping "do"/"did". However, if the former occurred, it could result in the past participle replacing other forms, such as the perfect or past tense.

Here's where my random thought came in: the possibility of replacing the past tense is especially interesting, because it may have happened before in English. If you look back at Old English, you'll notice there are two distinct forms - past tense and past participle - for both strong verbs, which retain this distinction (verbs like "write"/"wrote"/"written"), as well as weak verbs (what became our modern regular verbs like "poke"/"poked"). The past tense for weak verbs, in Old English, was -de. Care to guess what the past participle was? It's -ed, the modern past tense suffix. In summary, the Old English part participle has become the modern English past tense form.

Random Thought of the Day

So, I was chatting with friends via instant messages, while doing some other things. I wrote the emote
*catches up on massive backlog of anime*
You might have noticed before how emotes tend to avoid the use of pronouns referring to the subject. If I had to take a guess, I'd say the reason for this is the fact that the sentence is actually first person (the speaker is also the subject), but the verb forms used refer to the third person; thus any pronoun referring to the speaker would seem out of place.

This got me thinking. While obviously omitting the subject for first-person sentences in spoken English would introduce ambiguity (as unlike in emotes, there isn't any indication that the speaker is referring to themself), in some cases, such as where the subject is the possessor of the direct object, this would not create an appreciable amount of ambiguity. Once convention takes over, you could say "He catches up on massive backlog of anime" (which is currently ungrammatical), and it would be understood that said backlog belonged to the subject. I wonder if we'll see this happen in the future.

For the skeptics, I note that this (not anything having to do with emotes, but rather the omission of the possessive pronoun) has already occurred for some nouns (especially body parts) in Spanish and Italian. For example, taken directly from one of my Spanish books, you would not say "El estudiante levantó su mano"(literally "The student raised their hand"), but simply "El estudiante levantó la mano" (literally "The student raised the hand", which is understood as belonging to the student).

Completely unrelated fact: In Indo-European languages that still have a male/female distinction for nouns, the word for "hand" is female.

Friday, April 18, 2008

Novel Method of Attack

Looks like the RIAA has just undertaken a novel new campaign against P2P.
"Are MP3s doing permanent damage to your ears?"

- sound bite in a commercial for the news, on an upcoming story. I had to stop what I was doing for a moment to convince myself that I hadn't misheard it.

Here Comes the Clue Train!

Poached from Slashdot Firehose:
Speaking at a Westminster eForum on Web 2.0 this week in London, Jim Cicconi, vice president of legislative affairs for AT&T, warned that the current systems that constitute the Internet will not be able to cope with the increasing amounts of video and user-generated content being uploaded.

"The surge in online content is at the center of the most dramatic changes affecting the Internet today," he said. "In three years' time, 20 typical households will generate more traffic than the entire Internet today."
...
"We are going to be butting up against the physical capacity of the Internet by 2010," he said.
Clue train says: Change happens; growth happens; why the hell have you waited this long to upgrade your network infrastructure?

Of course he then goes on to straight-out lie:
...the Internet only exists thanks to the infrastructure provided by a group of mostly private companies. "There is nothing magic or ethereal about the Internet--it is no more ethereal than the highway system. It is not created by an act of God but upgraded and maintained by private investors," he said.
The internet is paid for and maintained by the money of customers of those companies, who are the real "private investors", not the shareholders or executives like him. In some cases the internet infrastructure was even paid for directly by the government with taxpayer funds, then entrusted to private companies, who then go on to offer minimal service at maximum price by claiming they paid for what they're selling.

Q's pet peeve #5 (approximate number; lower numbers indicate higher hate): Companies who think that they can cope with the inevitable rise in demand for the internet simply by increasing overselling ratios (Comcast, etc.), blocking some types of traffic (Comcast, etc.), or charging for traffic (Rogers, etc.).

Oh, and while we're on the subject of greedy ISPs, I should note that this previous story has been recalled. The problem was found to be with a router device, not Comcast itself (this time).

Wednesday, April 16, 2008

Synthesis

One way of summarizing the characteristics of a given language, as well as to compare one language to another, is the index of synthesis: a measure of the degree of synthesis - the process of combining multiple units of meaning into smaller numbers of words - in a language. Thus, in other words, the index of synthesis is a measure of how much information is contained in each word.

The index of synthesis ranges from isolating to synthetic. Isolating languages contain exactly one unit of meaning per word, and adding additional meaning to some word requires adding other words that modify it. On the other end, a purely synthetic language (not known to exist), would have one word for each entire sentence.

English has been changing in favor of isolation for the last couple thousand years, and Modern English is close to the isolating end of the spectrum, with many words containing single units of meaning. Off the top of my head, I'm going to say English words other than nouns, pronouns, and verbs are isolating; for example, adjectives and prepositions contain only a single unit of meaning in each word. Verbs are becoming increasingly isolating, adding auxiliary verbs to indicate additional meaning to the base verb (e.g. 'should not have seen'). Nouns are also becoming less synthetic, as there are now only two flavors of most nouns: singular objective (e.g. 'cat'), and plural objective/possessive ('cats', 'cat's', and 'cats'', which are all pronounced the same when spoken).*

The romance languages are a little more synthetic than English. Spanish, for example, maintains distinctions between singular and plural, male and female (four forms total) in both nouns and adjectives; for example, 'gordo', 'gorda', 'gordos', and 'gordas' all mean the adjective 'fat', but they indicate singular male, singular female, plural male, and plural female, respectively. Spanish also contains more information in its verbs, chiefly the animacy level and number (singular/plural) of the subject.

Japanese is much further toward the synthetic end. In Japanese, some word types commonly undergo agglutination (agglutination and fusion are the two types of synthesis): the combining of multiple "independent" words into single words, when the context calls for it. For example, the pronoun 'watashi' ('me') can be fused with the suffix '-tachi' to form 'watashitachi' ('us'). Verbs are also fused with auxiliary verbs, commonly meaning things like the passive voice, causation, negation, or politeness; for example, the nine-syllable monstrosity 'kokoruminiawasezu' is formed from either 'kokorumi' ('trial') or 'kokorumiru' ('to test'; I'm not sure if this is the noun or verb, as they're both identical when agglutinated), 'niau' ('to suit; to match; to become; to be like'), '[sa]seru' ('cause to'), and 'zu' ('not'), from the Lord's Prayer meaning 'lead us not into temptation' (that's the popular archaic version of that line, but a more accurate Modern English version would be 'do not test us/our faith', which is closer to the Japanese version). As well, multiple types of words may be fused together, as in 'shakugan', formed from 'shaku' (this appears to be an abbreviation of 'shakuretsu', meaning 'burning') and 'gan' ('eye[s]'), from an anime called Shakugan no Shana ("Shana of the Burning Eyes").

A Turkish example from Wikipedia illustrates extreme levels of synthesis: 'Çekoslovakyalılaştıramadıklarımızdanmışsınız' (18 syllables, if I counted correctly), meaning "You are said to be one of those that we couldn't manage to convert to a Czechoslovak". Some people believe that Turkish is related to Japanese (the Altaic hypothesis), but this has not been conclusively proven.

* While English may have [sometimes large] agglutinated words such as 'antidisestablishmentarianism' (an agglutination of 'anti-', 'dis-', 'establish', '-ment', '-ary', '-an', and '-ism'), these are generally considered separate words, and so are not considered to be a synthesis of smaller words; this classification is supported by the fact that such agglutinations cannot be made freely, but must be words established through common use. In contrast, Japanese verbs may be freely agglutinated with other verbs/words that make sense (e.g. auxiliary verbs), without creating entirely new words.

Sunday, April 13, 2008

Random Linguistic Fact of the Day

Indo-European languages characteristically have distinct forms of verbs such that the verbs agree with some basic properties of the subject. English has almost entirely lost this property for most verbs (third-person singular being the only form that still agrees with the subject), although 'be', the most irregular verb in English, gives you a little bit of an idea of how things used to work:

I am
You are
He/she/it is
We/y'all/they are

In Spanish, the original (more complicated) Indo-European agreement system still exists. As is tradition for Indo-European languages, Spanish verbs agree in number and person (first, second, third) with the subject. At least, that's what most speakers and Spanish teachers will tell you. Here's the full list of forms for the indicative present:

Yo [I] creo [believe]
Nosotros [we] creemos
Tú [you, familiar] crees
Vosotros [y'all, familiar] creéis
Él [he]/ella [she]/usted [you, polite] cree (Spanish does not have a neuter gender, and inanimate objects are either male or female)
Ellos [male they]/ellas [female they]/ustedes [you, polite plural] creen

Now, there's something really weird about that system; did you see it? Usted/ustedes are second person pronouns, but verb agreement indicates that they're third person. How do we explain that?

This was something that mystified me until a couple years ago, when I bought the linguistics book I use as a reference, and learned about animacy/empathy hierarchies, which I've touched on before, and had one of those 'aha!' moments. The answer is that verbs don't actually agree with the person of the subject, but rather by the empathy level of the subject. It's very common in empathy hierarchies to see "first person > second person > third person" (which makes logical sense), so it isn't surprising that Indo-European verb agreement approximates the three persons.

What's out of place - that a second person pronoun is placed in the same level as third person pronouns - can be explained by noting that you would use 'tú' with friends, while you would typically use 'usted' with people you are less familiar with. You would clearly have greater empathy for your friends than random people you meet on the street. Thus it is not surprising that the familiar pronoun is higher on the empathy hierarchy.

Thus the actual hierarchy looks like this. There are five logical divisions, and they're listed from highest empathy to lowest. These five are then grouped into three discrete empathy levels, indicated by the numbers:

1. First person
2. Second person familiar
3. Second person polite
3. Third person
3. 'Fourth person' (this is a generic pronoun where we would use 'they' or the passive voice in English, not referring to anybody specific; e.g. "Se habla Español" - "Spanish is spoken")

Friday, April 11, 2008

Exercise for the Reader

Is it wrong to laugh at your own jokes?

So, tonight a friend asked me some hackingish-related questions, and what he was trying to do reminded me of a blog post I'd written a ways back. Looking back on the blog post, I thought it was amusing how many biology references/puns I used. Can you catch them all?

Thursday, April 10, 2008

E Terra Tree / Term Project Roundup

I already discussed some about my AI class term project: a language identifier. In this post I'll describe my other two term projects some.

In game programming class, we are making a tower defense game. This was decided by vote. We're still pretty early into development (still discussing gameplay design), so there isn't much to tell right now. It will be 2.5D, meaning that the gameplay will be two-dimensional, but the graphics will be three-dimensional. We decided to write it in C#/XNA, because XNA is a very nice self-contained platform for amateur game development (if you're looking to make a small-scale game, I'd definitely recommend XNA), and most of us have used it before in the introduction to game programming class.

In graphics programming class, I'm going to be making a commercial-grade space-partitioning tree for E Terra. Games have to do a lot involving spatial searches. Collision detection, range-finding, path-finding, and view frustum culling (only drawing objects that are actually visible on screen) are some of the things E Terra needs to do.

In the very beginning, I used a simple set to hold all objects on the playing field. This was just a temporary method, to allow me to work on other stuff before creating a space-partitioning structure. This worked okay when there were only a couple dozen things on the map, but as spatial search for anything in the set is O(n), obviously this would become a bottleneck as the number of units on the map increases. That was expected, and happened after not too long.

As I still didn't want to take the time to make a commercial-grade structure, I came up with something else that was much faster, yet still didn't take very long to code: a spatial hash table. The X and Y coordinates of each object on the map were quantized, and the objects were put into buckets corresponding to regular, fixed-size squares of the map. While this was still O(n), the actual time was much smaller, as the space which was searched in each case was much smaller than the entire map (with the maps I was using, things like collision detection were quite a few orders of magnitude faster). The problem is that this structure is only optimal if objects on the map are evenly distributed, which is extremely unlikely in a game (or anything, for that matter).

I'm still not certain which type of structure I'm going to use in the end, although I have it down to two: a kd-tree and a quadtree. Both are trees that partition space, but they do not partition space in fixed-size regions like my spatial hash table does. This allows them to maintain a roughly even distribution of objects per partition even when there isn't an even distribution of objects on the map, by creating more, smaller partitions in areas where there is a high object density. Search for the object nearest a given point/object is O(log n) for both; other algorithms are harder to give a definite complexity of.

A quadtree (diagram) is the simpler of the two. It's a two-dimensional space partitioning structure that separates a given region into four equal-sized squares, which may further be split. Thus it partitions the space by recursively splitting it in both dimensions at once. Partitions that have few units can be left large, while partitions that contain many objects can be further split into smaller partitions. The three-dimensional version of a quadtree is an octree, which partitions each region into eight equal-sized cubes.

A kd-tree (diagram from here) is a binary space-partitioning (BSP) tree. It also recursively divides the space in a region (this time into two parts), but the two partitions need not be equal in size. Furthermore, each partition may be along any axis (a kd-tree is a general structure that may represent spaces of any numbers of dimensions, and the logic is the same for any number of dimensions). The standard way of building a kd-tree is to partition each region such that half of the objects in it fall in one of the sub-partitions, and half in the other, producing uniform object density.

A quadtree has the benefit of using fixed-size partitions, making it very fast to add or remove partitions on the fly; this is beneficial for a game because objects often move frequently. As well, for maximum benefit of a kd-tree, the entire set of objects must be known in advance, to produce optimal partitioning. However, there is a fairly easy solution to these issues: for kd-trees that hold a dynamic set of objects, the tree behaves similar to a quad/octree, splitting each region in half as needed. Note also that because I require support for changes to the tree, I would use a kd-trie (diagram), rather than a true kd-tree (diagram; though I've been mostly using the terms interchangeably).

I'm actually leaning toward the kd-tree for several reasons. First, the same code can be used in any number of dimensions, making it highly reusable. Second, only minor modifications are necessary to make the kd-tree implementation work optimally for a non-changing set of objects yet still perform well for sets that change frequently (about as well as a quad/octree). Of course, a kd-tree will be more complicated to code, but I don't see that as a major deterrent.

Sunday, April 06, 2008

One Year Anniversary

Dear Gord, has it already been a year since I started this project? That's even lazier than I usually am. No, the project isn't abandoned (not quite); though I guess I'm lucky the project didn't get usurped while I've been off in my own little world.

I've finished cleaning up all of the code for the patcher DLL (the core of MPQDraft) and the SEMPQ stub code, and both are now in the repository. Unfortunately, as the GUI code isn't cleaned up yet, those parts aren't particularly useful apart from simply looking at how it works. I may have to start considering uploading the rest of the code immediately, and suffering through various laughing and insults for the icky code...

I've also added MPQDraft builds made with the code in the repository now and the code that hasn't yet been added, for testing. I did a substantial amount of code cleanup, and some portions were rewritten from scratch. I did some basic testing to make sure that it seemed to be working more or less, but it wasn't exhaustive, and I've only tested it with a couple of my own plugins.

Open-Source MPQDraft

For the Love of Kaity...

Recently, it has been observed that Comcast is disrupting TCP connections using forged TCP reset (RST) packets [1]. These reset packets were originally targeted at TCP connections associated with the BitTorrent file-sharing protocol. However, Comcast has stated that they are transitioning to a more "protocol neutral" traffic shaping approach [2]. We have recently observed this shift in policy, and have collected network traffic traces to demonstrate the behavior of their traffic shaping. In particular, we are able (during peak usage times) to synthetically generate a relatively large number of TCP reset packets aimed at any new TCP connection regardless of the application-level protocol. Surprisingly, this traffic shaping even disrupts normal web browsing and e-mail applications.

New traffic shaping can disrupt a Comcast Internet connection

I think I hear the entire Comcast tech support department committing seppuku.

From a technical standpoint, there are few options that are more idiotic than sending reset packets to kill BitTorrent connections, as Comcast was doing previously. They just did one: killing ALL connections in that manner. This option is so bad, in fact, that it leads me to seriously consider the possibility that Comcast is doing this intentionally to teach the FCC a lesson: that its resetting BT connections wasn't so bad. What could/should Comcast have done differently? Let's look at a few possibilities.

The best option (excluding improving their infrastructure) would be to monitor the amount of traffic going through each modem, and if a disproportionately large amount is coming from one modem, tell the modem to limit traffic rate for that one modem. Unfortunately, I'm told Comcast modems do not have the ability to change maximum speed without a reboot, so this isn't really possible.

Failing that, there is an extremely simple yet efficient method: simply start randomly dropping packets when there's congestion. While this may sound like a sarcastic suggestion, it's not. The TCP protocol uses two pieces of information to determine how fast to send data: the amount of buffer space the receiver has (used to prevent the sender on a fast connection from flooding a receiver on a slow connection), and dropped packets. If a dropped packet is detected, TCP lowers its send rate. And if BT is a large proportion of traffic, randomly dropping traffic will result is a larger number of BT connections getting throttled than non-BT connections.

Thus this is a simple and easy way to effectively lower the amount of data being sent in a way biased against heavy bandwidth users, without interrupting any of the connections. This works even better if their Sandvine hardware can detect BT traffic and selectively drop packets only from BT traffic. As long as they weren't dropping so many packets that BT slowed to a crawl, I wouldn't mind my ISP doing this.

Thursday, April 03, 2008

Sidebar Addition

For those who actually watch anime, I've added a list of the anime I'm currently watching to the sidebar on the right. I'll have to remember to update it as the info changes.