
Sunday, July 22, 2007

Middle East Loses What Little Was Left of Its Sanity

A few weeks ago, 14 squirrels equipped with espionage systems of foreign intelligence services were captured by [Iranian] intelligence forces along the country's borders. These trained squirrels, each of which weighed just over 700 grams, were released on the borders of the country for intelligence and espionage purposes. According to the announcement made by Iranian intelligence officials, alert police officials caught these squirrels before they could carry out any task.
- Iranian newspaper

Word spread among the populace that UK troops had introduced strange man-eating, bear-like beasts into the area to sow panic.

But several of the creatures, caught and killed by local farmers, have been identified by experts as honey badgers.
...
UK military spokesman Major Mike Shearer said: "We can categorically state that we have not released man-eating badgers into the area."
- Iraqi rumor

"We can categorically state that we have not released man-eating badgers into the area."
That is just the most godly quote ever.


Aww, aren't they cute? You'd never know they're man-eating badgers of death!

Saturday, July 14, 2007

Trique

Among various other things, the last few days I've been looking a bit more at the Trique language. I even started to write a mini-dictionary based on what I can make out from this Bible, which currently has several dozen words; this is more to help me remember the meaning of words I've figured out (and to remember meanings I've assigned them in case a different passage contradicts that meaning and I have to reconsider it) than a benefit for anyone else. Between what I've learned on my own and an article my grandpa sent me (I've actually tried to keep the information I acquire from my grandpa to a minimum, as I want to figure out as much of the language as I can on my own, just by reading this Trique Bible) on Trique morphology (the way words change form in different circumstances) which covers some of the most complicated grammar, I thought I'd write a blog post on some of the interesting features of Trique - those that are notably different from other languages, especially common Indo-European ones like English. At some point I might try to expand this post (and have it reviewed for correctness by my grandpa, a linguist fluent in Trique) into a Wikipedia entry on Trique.

First, my grandpa believes that Trique was initially an isolating language, though through various fusions it has become somewhat inflected (words change form in different circumstances). An isolating language is one in which each word carries exactly one piece of information. For example, while English indicates the plural of a noun with the suffix -(e)s, an isolating language would either have a separate word that indicates that the noun is plural, or only express plurals through numbers (Japanese is like this, although it isn't an isolating language; in Japanese, the number of nouns is usually not mentioned at all, and when present is usually simply specified by a number word). Likewise, while English specifies tense with various suffixes like -ed, isolating languages, if they specify time at all (some languages don't have tense at all), will add an additional word that indicates that the verb is a specific tense; Chinese is like this, though I don't have a specific example.

Trique still resembles an isolating language, though it has acquired some degree of inflection. For example, plurals are indicated by adding extra words; I'm not sure if these are particles, adjectives, or something else. Take the third person pronoun in Trique, which (in its base form, anyway) is "si3" (numbers indicate tones). To indicate the third-person dual (two people/things), you would say "ngüej5 si3". To indicate the third person plural (I'm not sure exactly when you start using the plural; I've only seen 'one', 'two', and 'many', though it's possible I just haven't seen 'three' or 'four', etc.), you would say "nej3 si3". See how that works?

Unfortunately, Trique has become more complicated over time, in a fairly systematic way. There is evidence to suggest that prefixes and suffixes have developed through fusion - what were at one time two distinct words have become fused together, with the modifier becoming a prefix/suffix (both of which are seen in Trique). While some affixes are more obvious than others, in general this has severely degraded the "purity" of the language. For example, the verb 'ask' is, in its simplest indicative form, "achin21". The same verb in the simplest anticipatory form (don't ask me what the anticipatory form is; I don't know, yet - that's just what my grandpa called it) is "ga5chinj5". Even more interesting, you could say "I ask" (indicative mood) using the fused form "achin23" (there is no longer a separate word for 'I' in the fused form). As you can see, in this case, whatever the original suffix was, the sounds are completely gone, leaving only a change in tone. This should give you an idea of what I meant by saying this has made Trique morphology rather complicated.

Even more interesting, however, is a system of anticipatory inflection (I'm not sure what the technical term for it is; perhaps some strange kind of assimilation?). That is, where a word changes based on the word that follows it, even though the two words do not fuse like they did in the previous paragraph. For example, recall the dual marker mentioned earlier. Most of the time it takes the form shown previously; however, if you combine it with "re'5" to form the second person dual pronoun ('you two'), you get "ngüej5e3 re'5" - the form of the number has changed, even though it hasn't fused with any prefixes or suffixes. As far as I can tell, this is merely interaction between two particular words in a particular order; it doesn't appear to have any inherent meaning.

Modern Trique appears to have two cases: a standard case and a possessed case. The former is used for words that stand alone (or are the possessor of something else), with the various roles this case can play (subject, object, etc.) indicated by word order. The latter is used for nouns which are possessed by something else (the noun immediately following them, in all the cases I've seen). While the latter seems to be unusual in its own regard (Wikipedia lists only Tlingit as having a possessed case), what's even more unusual is the fact that some words (namely family relation nouns) in Trique have both inclusive and exclusive possessed forms (I wonder if you'd call those separate cases or the same). The difference is made obvious by comparison of two examples: "nej3 dinï1 [brothers of] si3 [him/her]" ('his brothers') and "nej3 dinïj5 Judá" ('Judah and his brothers').

My first "major" discovery, when I was just starting to look at this Trique Bible (and still didn't know any of the language, nor had I talked to my grandpa about it, yet), was learning how quotes - sequences of direct text one person says - are handled. A quote from Mark 6:37 illustrates this structure:

English: But Jesus answered, "You give them something to eat."
Trique: Sani4 gataj34 Jesús gunï3 nej3 si3, Ga'ui'5 nej3e3 re'5 si3 xa4 nej3 si3. Daj4 gataj34 so'2 gunï3 nej3 si3.

With a quote this short (and the Trique info I've already given you), you should be able to make out the structure, though probably not the entire meaning. You can see that the quote is both preceded and followed by the phrase "gataj34 x gunï3 y"; this means "x said to y" (you could see that if you had more examples available to you). That is, Trique begins quotes by first saying who's saying what [to whom], and then concludes quotes by reiterating that fact (though one of the two may be omitted in short quotes; I actually did some looking to find an example that was both short and followed the typical customs); you can't see it in a quote this short, but usually the former will use pronouns, while the latter uses the full names of the speaker and listener. This was also how I learned the two basic third-person pronouns (the first and second person pronouns were discovered somewhere else) and how plurals were constructed.

However, "x said to y" is not a literal translation. I knew from the beginning that "gataj34" meant 'said', but at first I thought that "gunï3" was simply a marker that indicated who was the target, experiencer, or some other role. This turned out to be incorrect. In fact, "gunï3" is a verb - to hear. What this is literally saying is "x said, y heard". This illustrates one of the interesting features of Trique: using short clauses with (preferably) intransitive verbs in series to form more complex thoughts and sentences.

This serial clause construction is also often used in places where languages like English would use relative clauses (I believe Trique does have relative clauses; they just aren't used as often), as is illustrated in the first line of the Lord's Prayer, which I posted previously. Another example (John 1:42) is also given, to show a slightly different structure.

English: Our Father in heaven
Trique: Drej3 [father of] nej3 yunj2 [us] huuin2 [are] re'5 [you] nne2 [live in] re'5 [you] xata'4a [heaven].

English: You are Simon, son of John. But you will be called Cephas (Cephas means Peter).
Trique: Hue2 re'5 [you] huuin3 [are] Simón. Ni4 da'ni1 [son of] Juan huuin2 [are] re'5 [you] nej3. Sani4 hue2 re'5 dugu'na23 Cefas. Ni4 Cefas ruhuaj3 gata3 [means] huuej3'e [I'm guessing this is 'rock'].

Monday, July 02, 2007

Types of Multiprocessors

Following the Standard Operating Procedure for deciding what to write about for my blog posts (namely, whatever I have a whim to write at a particular moment; explains a lot, doesn't it?), today I'm gonna talk about the different types of multiprocessors. Multiprocessing refers to the parallel execution of one or more programs in multiple instances; more specifically, I'm going to talk about architectures where you explicitly code for parallelism, as opposed to implicit things, like the multiple arithmetic units present in most modern CPUs. There are lots of different types of multiprocessors (speaking from an architecture standpoint), but traditionally (before EDGE came along) they fall into one of these categories I'm going to describe. These categories form a neat gradient from single processing to processor clusters.

The class of processors most resembling single processors is the Single Instruction Multiple Data (SIMD) processors - processors which operate on multiple data items using the same sequence of instructions. Vector processors, the simplest SIMD processors, allow you to store multiple data items in large registers (vector registers), then perform a mathematical operation on all data items in a single instruction. I gave an example of this a while ago, when I made an MMX function to convert a string to uppercase in blocks of eight characters per instruction (MMX registers are 64 bits). The eight characters were loaded from (contiguous) memory in one instruction, several mathematical operations were performed, each in one instruction, and ultimately the block of upper-case characters was written back to (contiguous) memory in one more instruction.
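To make the "eight characters per operation" idea concrete, here's a rough sketch in plain Python. It is not the MMX routine mentioned above; it just packs eight bytes into one 64-bit integer and does the case conversion with whole-word arithmetic, which is the same trick in spirit. The names (upper_block, to_upper) are made up for this illustration.

```python
LANES = 8                          # eight 8-bit characters per 64-bit word
ONES = 0x0101010101010101          # 0x01 in every byte lane
HIGH_BITS = 0x80 * ONES            # 0x80 in every byte lane
LOW_7 = 0x7F * ONES                # 0x7F in every byte lane


def upper_block(word: int) -> int:
    """Uppercase eight ASCII bytes packed into one 64-bit integer."""
    low7 = word & LOW_7
    # Adding (0x80 - 'a') sets a lane's high bit exactly when its byte >= 'a'.
    ge_a = low7 + (0x80 - ord("a")) * ONES
    # Adding (0x80 - 'z' - 1) sets a lane's high bit exactly when its byte > 'z'.
    gt_z = low7 + (0x80 - ord("z") - 1) * ONES
    # Lanes whose original byte was plain ASCII (high bit clear).
    is_ascii = ~word & HIGH_BITS
    # 0x80 in every lane that holds a lowercase letter, 0 elsewhere.
    is_lower = ge_a & ~gt_z & is_ascii
    # 0x80 >> 2 == 0x20, the bit that separates 'a' from 'A'.
    return word ^ (is_lower >> 2)


def to_upper(data: bytes) -> bytes:
    """Uppercase a byte string one 8-byte block at a time."""
    padded = data + b"\0" * (-len(data) % LANES)
    out = bytearray()
    for i in range(0, len(padded), LANES):
        word = int.from_bytes(padded[i:i + LANES], "little")
        out += upper_block(word).to_bytes(LANES, "little")
    return bytes(out[:len(data)])
```

Calling to_upper(b"Hello, world!") handles the string in two blocks of eight, with a handful of arithmetic operations per block instead of a test and branch per character.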

The problem with vector processors is that they can only read vectors from memory at regular intervals - either contiguous memory, like MMX/SSE, or every X bytes, as I'm told Cray computers can do. If you don't like it, you can manually load each value into a general register, then push it into a vector. Obviously this will be slow, as it requires a couple instructions per data item, as opposed to loading a single vector in one instruction. Though you may not have much choice, if all you've got is a Core 2 with SSE3.
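As a toy model of that restriction (plain Python lists standing in for memory, with hypothetical function names), a regular-interval load can be expressed as a single slice, while irregular addresses force one scalar load per element:

```python
def load_strided(memory, start, stride, lanes):
    """Model of one vector load: `lanes` elements at a regular interval.

    stride == 1 is the contiguous case; larger strides model "every X items".
    """
    return memory[start:start + stride * lanes:stride]


def load_gather(memory, indices):
    """Fallback for irregular addresses: one scalar load per element."""
    vector = []
    for index in indices:
        vector.append(memory[index])  # one load plus one insert per item
    return vector
```

With memory = list(range(100, 132)), load_strided(memory, 2, 4, 4) fetches four evenly spaced items in one go, whereas load_gather(memory, [7, 0, 19, 3]) has to walk the indices one at a time.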

Stream processors, taking this SIMD idea one step further, answer this problem. While vector processors are like a single CPU operating on an array of data items, stream processors are like an array of processors, each with individual word-sized registers, working on the same sequence of instructions. While both operate on the same sequence of instructions, each pipe (for lack of a better word) in a stream processor has its own set of registers. If, for example, a particular register holds a pointer, and each pipe has a different value in that register, each pipe will load a different value from memory and operate on it; what's more, the reads and writes to memory need not be contiguous. This is how Graphics Processing Unit shaders work.
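A minimal sketch of that model, assuming a made-up "kernel" that doubles whatever its pointer register points at: every pipe runs the same code, but each has its own register set, so the memory accesses can land anywhere. The pipes run one after another here; real hardware would run them side by side. All names are hypothetical.

```python
def run_kernel(kernel, pipe_registers, memory):
    """Run the same kernel once per pipe; each pipe has private registers."""
    for registers in pipe_registers:
        kernel(registers, memory)


def double_in_place(registers, memory):
    """The single shared program: double the item that 'ptr' points at."""
    address = registers["ptr"]
    memory[address] = memory[address] * 2


def stream_demo():
    memory = [1, 2, 3, 4, 5, 6, 7, 8]
    # Three pipes, each with a different value in its pointer register.
    pipes = [{"ptr": 6}, {"ptr": 0}, {"ptr": 3}]
    run_kernel(double_in_place, pipes, memory)
    return memory
```

stream_demo() touches items 6, 0, and 3, which are neither contiguous nor evenly spaced, without any per-item shuffling into a vector first.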

Of course, stream processors still have the limitation that there's only a single program, even if it's being executed in parallel. To that end, we next have Multiple Instruction Multiple Data (MIMD) processors, or, more specifically, Symmetric MultiProcessing (SMP). SMP systems consist of two or more relatively independent processors or cores sharing a common bus and memory, each executing their own program. 'Symmetric' refers to the fact that all CPUs have equal access to the shared hardware; (with some exceptions) a request to something like memory will be processed exactly the same, regardless of which processor made the request. This is what current x86 CPUs are, though I've heard that AMD's next x86 architecture will be NUMA.
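From the programmer's side, SMP looks roughly like the sketch below: two different programs running at once against the same memory. Python threads are only a stand-in here (in standard CPython they don't truly execute in parallel), and the function names are invented for the example.

```python
import threading


def run_smp_demo():
    """Two different 'programs' sharing one memory, as on an SMP machine."""
    shared = {"total": 0, "log": []}   # memory visible to every core
    memory_lock = threading.Lock()     # keeps updates to shared memory consistent

    def adder():
        # Program 1: accumulate a sum into shared memory.
        for value in range(1, 101):
            with memory_lock:
                shared["total"] += value

    def logger():
        # Program 2: something entirely different, using the same memory.
        for step in range(5):
            with memory_lock:
                shared["log"].append(f"step {step}")

    cores = [threading.Thread(target=adder), threading.Thread(target=logger)]
    for core in cores:
        core.start()
    for core in cores:
        core.join()
    return shared
```

Neither program needs to know which "core" it runs on; both see the same shared dictionary in exactly the same way.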

The problem with this is that there's an obvious bottleneck: the shared bus (or, for more innovative systems like the Opteron, which lack a shared bus, the memory itself). As the number of processors and memory or other hardware accesses increases, this becomes a big problem for performance.

Consequently, systems with more than just a few processors are often Non-Uniform Memory Architectures (NUMA). On a NUMA, there are two or more banks of memory, located at different places on the system. It's possible that there may be one bank per X CPUs (perhaps those X CPUs and memory bank reside on an expansion card), or one bank per CPU. In either case, each processor has access to all memory in the system, just like with SMP; however, the time to access a particular location in memory varies, based on which bank that address resides in. If each CPU has the data for its program relatively self-contained in its local memory bank, so that requests to other memory banks are rare, this allows significantly better performance in parallel systems.
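A back-of-the-envelope model shows why locality matters. The costs below are invented relative numbers, not measurements of any real machine, and the helper names are hypothetical.

```python
LOCAL_COST = 1    # assumed relative cost of hitting the CPU's own bank
REMOTE_COST = 4   # assumed relative cost of reaching another bank


def bank_of(address, bank_size):
    """Which memory bank an address lives in."""
    return address // bank_size


def total_cost(cpu_bank, addresses, bank_size):
    """Sum the access costs for one CPU touching the given addresses."""
    cost = 0
    for address in addresses:
        if bank_of(address, bank_size) == cpu_bank:
            cost += LOCAL_COST
        else:
            cost += REMOTE_COST
    return cost
```

With 1000-item banks, a CPU attached to bank 0 pays total_cost(0, range(0, 100), 1000) for a hundred local accesses, but noticeably more for the same number of accesses scattered over four banks, as in total_cost(0, range(0, 4000, 40), 1000). Every address is still reachable either way; only the price differs.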

The last bottleneck remaining in NUMA is the fact that any processor can request reads or writes from any memory bank. While this is good from a programming perspective, as the size of the system continues to grow, there comes a point where you can't afford to make remote memory requests - it's more efficient to perform explicit communication between processors (either synchronously or asynchronously) and keep a copy of all the data one processor needs in its own memory bank. This setup is called a cluster, and is now very common in distributed systems. Each processor (which is an entire computer, in the case of distributed computing) runs independently, but uses explicit communication to optimize costly transfers of data between systems. This is perhaps the hardest to program when dealing with a program that can't be neatly decomposed by domain (having each computer process a portion of a common amount of data), but is becoming increasingly necessary.
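Here's a small sketch of the cluster style for the easy case, a problem that does decompose by domain. Threads and a queue stand in for separate machines and a network, which is an assumption made purely to keep the example self-contained; the point is that each node works only on its own copy of the data and talks to the others only through explicit messages. The names are hypothetical.

```python
import queue
import threading


def node(node_id, local_data, outbox):
    """One 'computer': works on its private data, then sends one message."""
    partial = sum(x * x for x in local_data)
    outbox.put((node_id, partial))


def cluster_sum_of_squares(data, node_count):
    """Split the data across nodes and combine their messages."""
    outbox = queue.Queue()  # stands in for the network
    # Domain decomposition: each node receives its own copy of one slice.
    chunks = [list(data[i::node_count]) for i in range(node_count)]
    workers = [
        threading.Thread(target=node, args=(i, chunk, outbox))
        for i, chunk in enumerate(chunks)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    # The coordinator sees only messages, never another node's memory.
    return sum(outbox.get()[1] for _ in range(node_count))
```

cluster_sum_of_squares(list(range(10)), 3) moves only three small messages between "machines", no matter how large each node's slice is.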

However, while clusters are often composed of whole computers, that isn't a requirement. For a rather peculiar example of a cluster-like system on a single chip, we need only look at the Cell - the CPU of the Playstation 3. The Cell is composed of one main core, which has access to main memory, and 8 mini-cores, each with (and limited to) its own local memory bank; blocks of memory are copied between the local banks and main memory by explicit asynchronous Direct Memory Access (DMA), analogous to asynchronous network I/O in network clusters. The combination of the 8 mini-core memory banks actually residing on the chip itself (cache, in other words) and memory copying done asynchronously allow the Cell to process at blazing speed, with all memory accesses taking only as long as a cache hit (by definition). But between the architecture itself, and the fact that game data tends to be interdependent, making breaking games down into that many threads very difficult, the Cell is incredibly (and notoriously) hard to make full use of. This makes the Cell a peculiar choice for a game system, and it would probably be better suited to scientific applications, with simpler computations in large quantities.
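The copy-in, compute, copy-out pattern can be sketched as below. This does not use the Cell's real programming interface; the "DMA engine" is just a background thread, the local store size is an arbitrary small number, and every name is made up. What it does show is the usual trick of requesting the next block while the current one is being processed.

```python
from concurrent.futures import ThreadPoolExecutor

LOCAL_STORE_ITEMS = 4  # assumed (tiny) capacity of a mini-core's local bank


def dma_get(main_memory, start, count):
    """Copy a block from main memory into a fresh local buffer."""
    return main_memory[start:start + count]


def dma_put(main_memory, start, block):
    """Copy a processed block from the local buffer back to main memory."""
    main_memory[start:start + len(block)] = block


def process_on_mini_core(main_memory, compute):
    """Apply `compute` to every item, touching only local copies."""
    with ThreadPoolExecutor(max_workers=1) as dma_engine:
        # Kick off the first transfer before any computing starts.
        pending = dma_engine.submit(dma_get, main_memory, 0, LOCAL_STORE_ITEMS)
        for start in range(0, len(main_memory), LOCAL_STORE_ITEMS):
            local = pending.result()  # wait for the block requested earlier
            next_start = start + LOCAL_STORE_ITEMS
            if next_start < len(main_memory):
                # Request the next block now so it arrives while we compute.
                pending = dma_engine.submit(
                    dma_get, main_memory, next_start, LOCAL_STORE_ITEMS
                )
            local = [compute(x) for x in local]  # works on local memory only
            dma_put(main_memory, start, local)
```

For instance, with mem = list(range(10)), calling process_on_mini_core(mem, lambda x: x + 100) rewrites mem in place, three blocks at most four items long, and the compute step never reads main memory directly.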