
Sunday, August 26, 2007

Writing Systems and...

As with most of my posts, this post is written on a whim. Specifically, I just happened to be talking with someone on an IM about this topic earlier today, and I had the whim to write a post about it.

There are four major types of writing systems: alphabets, abjads, syllabaries, and ideographic/logographic systems (those two are not the same thing, but for our purposes they're in the same category). You should be very familiar with alphabets. Alphabets are writing systems in which, more or less, each character corresponds to a single spoken sound, though the nature of language change makes it impossible for there to be a 1:1 relationship between sounds and letters indefinitely (some alphabets originally did have a 1:1 relationship). Some common examples are the Roman and Greek alphabets, Roman being perhaps the most widely used writing system in the world.

Next are the abjads. I believe I mentioned these a few posts ago. These are incomplete alphabets - that is, only some sounds (typically consonants) are written, and the rest are omitted. Sometimes the omitted sounds may be represented by diacritic marks. Arabic, Hebrew, and Tengwar are examples of this. Arabic may, depending on the writer and the application, either omit short vowels (long vowels have characters just like consonants), or represent short vowels with diacritic marks (these marks are what give Arabic its distinctive glittery appearance, in cases where short vowels are written). Some writing systems may have an implied vowel after each consonant, with diacritics written only when the vowel differs from the implied vowel; some may even have characters for all sounds, but omit some at the discretion of the writer.

Syllabaries, also mentioned previously, consist of one character for each syllable possible in the language (though they are also subject to loss of 1:1 correspondence due to language change). The syllabaries most familiar to me are the Japanese hiragana and katakana, although there are others.

Finally, we have a class that I don't know of a single name for, which includes systems of ideographs and logographs. In a logographic system, a single character represents an entire word - that is, there is a (near) 1:1 relationship between words and characters; the most well known logographic system, and the other contender for most used writing system in the world, is the Chinese writing system. Ideographs are similar, but in this case each character represents some idea or abstract concept. Contrary to common misconception, Chinese is not an ideographic system; however, the Japanese use of Chinese characters bears some resemblance to an ideographic system, where words frequently use multiple kanji (which is part of what makes the Japanese writing system harder than the Chinese system, as I mentioned in the distant past). For example, the Japanese word for goat (山羊) is written mountain (山) + sheep (羊).

That said, there are complications/impurities in all of these. For example, some writing systems, such as Devanagari (an abjad with implied vowels), have one character for each consonant/syllable, but also have some characters which represent combinations of consonants/syllables; in other words, some characters may be combined to form entirely new compound characters in nontrivial ways.

Next post, resolve willing, I'll get to where I'm going with this (and why I've written this post so briefly and hastily - it's only background for the main topic).

Saturday, August 25, 2007

Thanks (not)

Got my monthly notice that my monthly tuition payment was due, a couple days ago.

Statement Date [this is the date printed on it, not the date it was actually received]: 08/17/07
Payment Due Date: 08/14/07

Talk about helpful. Makes you wonder why they bothered to send it at all; could have saved some on postage.

Tuesday, August 21, 2007

Awesome!

Blizzard negotiating with researchers for virtual epidemic study

Around this time last year, a strange phenomenon struck the virtual inhabitants of World of Warcraft. A disease designed to be limited to areas accessed by high-level characters managed to make it back to the cities of that virtual world, where it devastated their populations. At the time, Ars' Jeremy Reimer noted, "it would be even more interesting if epidemiologists in the real world found that this event was worthy of studying as a kind of controlled experiment in disease propagation." The epidemiologists have noticed, and there may be more of these events on the way for WoW players.
...
On balance, the analysis in Epidemiology felt that virtual worlds might provide a useful supplement to traditional models of disease spread, and suggested working with game programmers to test a variety of disease conditions. "Multiplayer online role-playing games may even be useful as a testing ground for hypotheses about infectious disease dissemination," the author said, "Game programmers could allow characters to be inflicted by various infectious diseases, some of which may not be visible to the player, and track the dissemination patterns of the disease in specific subpopulations." It looks like something of the sort is in the works. A report from the Agence France-Presse indicates that Nina Fefferman, a researcher from Tufts University, is currently negotiating with Blizzard about running epidemiological tests in WoW.
Maybe I should go apply at Blizzard, now :P

Thursday, August 16, 2007

Everything Coming Up Roses

So, I've arrived back at home, after spending the summer at my job from last summer (the place Skywing works, and invited me to). Fortunately, the trip home was very uneventful (no jet engines falling off, getting flown into major landmarks, or anything). So, one of the first things I have to do is catch up on the stuff I've neglected over the summer, for the reason that it would have been difficult to do them away from home.

First, I checked what manga came out over the summer. But did that surprise me (and in a good way). It appears that two series I really liked (Gunslinger Girl and Yotsuba&!), and I thought had been discontinued (the last releases were over two years ago), have been resumed. Also, House season 3 is coming out very shortly, and a few other things that were expected also came out, as well. I guess my birthday came exactly 1 month early, this year :P All together, I'm ordering the following (and I'd recommend all of these series, with the possible exception of GTO: Early Years):
- Crest of the Stars novel (the manga based on the novel) part 3
- Death Note volume 12
- Gunslinger Girl volume 4
- GTO: The Early Years volume 4
- School Rumble volume 6
- Yotsuba&! volume 4

Also, as I've got more money (not that I didn't have money before; I was just too cheap to spend it), I'm gonna get some soundtracks I'd been putting off for various reasons. Lo and behold, it appears that there's a new Wild Arms 1 soundtrack from last year. The original "OST" of Wild Arms 1 was released with the game, in 1997 (that's actually more recent than I was thinking). However, there were a number of problems with it. If memory serves, it used different arrangements and synthesis than the original music from the game; as well, it only included about half the tracks. This new one has all tracks and is at least close enough to the game version that I can't tell it apart from memory (I had no problem telling the "OST" from the game). So, I'm buying:
- Final Fantasy XII OST. Had downloaded this and really liked it, but put off buying it due to laziness.
- Wild Arms Complete Tracks
- Xenogears OST. I never bought this because I ripped all the music straight from the console; but I suppose I should buy it as a token gesture at some point.
- Xenosaga I OST. Part of the same series as Xenogears, and by the same composer. Downloaded this to try it; while I'm not as fond of it as some others, I suppose it's worth the money, at least.

Lastly, I should list the various anime I've been watching/manga I've been reading over the summer. Describing each one would be too involved for my laziness, so I'll just let Anime News Network do the talking.
- Bleach
- Bokurano
- Busou Renkin
- Code Geass. Only got this one from Squid; haven't watched it, yet.
- D.Gray-Man. Don't ask me what's up with the format of the name.
- Death Note
- Lucky Star
- Negima
- Pumpkin Scissors. Strangely endearing.
- Skip Beat. *shrug*
- So Long, Mr. Despair

And now I need to go send in my two mice for warranty replacement. One has a tendency to fall asleep at inappropriate times, and the other tends to wander.

Sunday, August 12, 2007

& Debates - Quantum Physics

So, I had a random thought that started a debate thread on Star Alliance (remember that from way, way back?). And boy is it a whopper. Registration on the forum is required to participate, so I'll copy some of the bigger posts in the debate here.

The opening post:
So, I had a random thought; that's, of course, rarely a good thing. Now, let's see if this can turn into a full debate. *ahem* Have you ever considered that some of the most puzzling aspects of quantum physics could be logically explained by the universe being a computer simulation? Let's go over a couple examples.

- As best we can tell, mass, distance, and time all appear to be quantized; that is, they're integer values. Any computer constructed by a physics system remotely like ours is only capable of representing quantized values.
- One of the hardest to grasp concepts in quantum physics is that variables associated with things, particularly subatomic particles, don't appear to have values assigned until that variable is actually used, and values can even be lost once they are assigned. When those variables do not have values assigned, they are represented simply by probability distributions, with the actual value chosen randomly when it is needed. There's a saying in computer science (one of those things that you should be careful not to take too absolutely) - never store anything you can recalculate later; quantum physics appears to take this one step further, not bothering to store anything that you don't need at the moment. The point, of course, is to massively reduce the amount of memory required for the simulation by only storing essential values.

Input?
My most recent post:
I myself have considered that the Planck length and Planck time could be a limit to the universe's "resolution".
Of course. As far as we know at the moment, the Planck constants are the resolution of the universe.
If the universe was a huge simulation, would everyone else be part of the simulation, or separate entities within the simulation, or are we all part of the simulation without any free will?
Unknown. There does appear to be full simulation of individual units (unknown exactly what that is; protons/neutrons/electrons/photons, quarks, etc.) in some cases, but it's possible there are optimizations to process clusters of those as a group. Perhaps the fact that particles also behave like waves is a trick to allow the behavior of particles to be calculated in bulk at once, using simpler computations. There are other examples, as well. For instance, some things cells do seem odd, in that it doesn't entirely seem like all the atoms in a cell are working independently; in some things the cell or part of a cell seems like it's acting as a single unit.

Of course there are counter-examples, as well. If the simulation was abstracting as much as possible, it's strange that Brownian motion would exist, as it indicates individual atoms are being processed in a case where it would be logical to abstract them into a group.
Quote
Are we all bots on a counterstrike server, is just one of us a bot, or are there no bots?
There would be at least three basic possibilities, which could be combined:
- We are being controlled by players, either directly (e.g. an FPS or RPG), or indirectly (the player is able to manipulate things like basic predispositions, though actual actions are a result of physical processes using those basic predispositions; think like Populous). Possible, but seems less likely to me than the other two.
- We are entirely naturalistic constructions. That is, there is nothing different about us than anything else in the simulated universe; we are nothing more than a result of the laws of physics being simulated. This is the atheist/naturalist world view.
- We are programmed constructs - AIs. We have programs which operate independently but within the constraints of the laws of physics.

But holy crap. Now THAT is an interesting idea. Obviously you could call the programmer(s) God (in the monotheistic, omnipotent sense). But it's also possible that different programmers and/or players (if such a thing exists) could form an entire pantheon. In that case, it's entirely possible that every god that has ever been worshiped throughout history actually does exist (or existed at one time; it's possible gods "die off" as players/programmers lose interest in playing them).

Thursday, August 09, 2007

Of Codes and Languages - Trans-Roman Alpha

For those who haven't played it, World of Warcraft contains 10 different playable races organized into 2 factions. Each race has its own distinct language, with 1 on each faction known by all races of that faction. Most text spoken by players or other characters in game is in a particular language. If you know that language, it appears in English (or whatever language) exactly as it was typed/said. If you don't know that language, it gets translated into the language of the character that said it, making it unintelligible to you. Besides lending a touch of realism to the game, this also serves as a language barrier to prevent communication between opposing factions.

For example, the following items are spoken by some of the enemies in one instance, in Common - the native language of humans, and, as the name suggests, the common language known by all races on the Alliance:

Common: Andovis ras waldir
English: Release the hounds!

Common: Ras garde hamerung nud nud valesh noth. Hir bur dana bor.
English: The light condemns all who harbor evil. Now you will die.

If you carefully draw a line through a handful of data points, you can get an idea of how this works just from these two examples. Blizzard has created a small (several dozen words) vocabulary for each language - approximately 6 words of each length. When translating text, each word is processed individually; the word is hashed and used to choose a translated word of the same size, in a lossy, many-to-one relationship. An elegant, simple but effective algorithm.
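To make that guess concrete, here is a minimal sketch of such a hash-and-pick translator. Everything specific in it is an assumption for illustration: the `VOCAB` table is a tiny made-up word list (seeded with the words from the two examples above plus a few invented fillers), the use of MD5 is arbitrary, and the names `translate_word` and `translate` are hypothetical. It is not Blizzard's actual code or vocabulary.

```python
import hashlib
import string

# Hypothetical vocabulary, keyed by word length. The real game's word
# lists are larger; these entries are only for illustration.
VOCAB = {
    1: ["a", "o", "y"],
    2: ["lo", "ko", "an"],
    3: ["ras", "nud", "hir", "bur", "bor"],
    4: ["noth", "dana", "veld"],
    5: ["garde", "thorn"],
    6: ["waldir", "valesh"],
    7: ["andovis", "landowa"],
    8: ["hamerung", "thorniss"],
}


def translate_word(word: str) -> str:
    """Map a word to a vocabulary word of the same length (lossy, many-to-one)."""
    # Words longer than the longest vocabulary entry are clamped to it,
    # an assumption made here to keep the sketch short.
    length = min(len(word), max(VOCAB))
    options = VOCAB[length]
    # A stable hash, so the same input word always yields the same output.
    digest = hashlib.md5(word.lower().encode("utf-8")).digest()
    return options[digest[0] % len(options)]


def translate(text: str) -> str:
    """Translate each whitespace-separated word independently."""
    words = (w.strip(string.punctuation) for w in text.split())
    return " ".join(translate_word(w) for w in words if w)


print(translate("Release the hounds!"))
```

Because many source words hash to the same output word, there is no way to run this backwards, which is exactly what makes it work as a language barrier.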

This got me thinking - could you create a coding system such that you could reversibly encode data in something that looks like a foreign language? The point, of course, being to use the fact that it looks like a language as a decoy, while the information is actually in something like an encrypted form.

My first attempt at this was an algorithm I called Trans-Roman Alpha ("trans-Roman" because it used the Roman alphabet). This was an extremely simple algorithm: reversibly convert a word into a numeric form (basically treating it as a base-26 number), then decode it in the opposite direction, using a different mapping of "digit" to letter. A few other complications were also added in, such as word fusions and splitting to hide the original word lengths. Some familiar phrases, in Trans-Roman Alpha:

"pqr bxq pgy psc dywddw jjf"
"psc dynts bl dytg djp mckgy bzz cy gcxy sdwcn jfydkltd yd htc yn r vy lzt fcypt jc"

As you can see, the fact that the algorithm is too dumb to form pronounceable syllables means that the best the algorithm can do is to either work with syllabaries like Hiragana, which represent one syllable in each character, or to use an abjad writing system: a system where only consonants are written. In the latter case, the resulting phrases would be far longer than the source text, making it somewhat impractical.
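For the curious, the core base-26 step described earlier can be sketched roughly as follows. This is a reconstruction under assumptions, not the original code: the `CODED` alphabet is an arbitrary permutation picked for illustration, the function names are hypothetical, the word length is kept fixed so that leading "zero" letters survive the round trip, and the word fusion/splitting step is left out entirely.

```python
import string

PLAIN = string.ascii_lowercase
# Hypothetical second digit-to-letter mapping; any permutation of the
# 26 letters would do.
CODED = "qwertyuiopasdfghjklzxcvbnm"


def word_to_number(word: str, alphabet: str) -> int:
    """Read a word as a base-26 number, first letter most significant."""
    n = 0
    for ch in word:
        n = n * 26 + alphabet.index(ch)
    return n


def number_to_word_lsd_first(n: int, length: int, alphabet: str) -> str:
    """Write n as exactly `length` base-26 digits, least significant first."""
    out = []
    for _ in range(length):
        n, digit = divmod(n, 26)
        out.append(alphabet[digit])
    return "".join(out)


def encode_word(word: str) -> str:
    n = word_to_number(word, PLAIN)
    return number_to_word_lsd_first(n, len(word), CODED)


def decode_word(coded: str) -> str:
    # The coded word stores the least significant digit first, so reverse
    # it before reading it back as a number.
    n = word_to_number(coded[::-1], CODED)
    return number_to_word_lsd_first(n, len(coded), PLAIN)[::-1]


def encode_text(text: str) -> str:
    """Encode lowercase a-z words separated by whitespace."""
    return " ".join(encode_word(w) for w in text.lower().split())


print(encode_text("the quick brown fox"))
```

With the length held fixed like this, the scheme boils down to reversing each word and applying a letter substitution, which goes some way toward explaining why the output looks like consonant soup.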

While use of a syllabary would not have the problem of length (and in fact would be about the best you could do with this algorithm), both it and the abjad solution have a more significant problem: both will generate a fairly even distribution of characters, in a nearly random order. This is entirely unlike real languages, which do not form an even distribution in this regard, nor do they occur in random order (though encryption systems do both). Such a system would be unlikely to hold up to the most basic tests used to identify languages (or at least make a best guess), and so would not be particularly likely to fool anyone knowledgeable about the topic.
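One of the simplest such tests is the index of coincidence: the probability that two letters picked at random from a text are the same. For a perfectly even distribution over 26 letters it sits near 1/26 (about 0.038), while ordinary English text is usually quoted at around 0.066. The sketch below is only meant to show the idea; the function name and the sample strings are my own choices for illustration.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two randomly chosen letters of the text match."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


# A short English sample versus a perfectly flat letter distribution.
english = "it was the best of times it was the worst of times"
flat = "abcdefghijklmnopqrstuvwxyz" * 4

print(index_of_coincidence(english))
print(index_of_coincidence(flat))
```

Short samples give noisy numbers, so this is a best-guess indicator rather than proof, but a long text that scores close to the flat value is a strong hint that it is not a natural language.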