I was just reading the paper Analysis of Approximate Nearest Neighbor Searching with Clustered Point Sets, which analyzes the performance of three kd-tree splitting policies on clustered data: the standard policy, the sliding-midpoint policy, and the minimum-ambiguity policy, which is new in this paper.
The minimum-ambiguity policy takes into account not only the full set of data points, as the other policies do, but also a set of training regions that represent the expected distribution of search regions - that is, the future searches themselves. As with the other policies, the goal is to minimize the average number of nodes in the tree that overlap each search region; however, when both the searches and the data points are known in advance, the minimum-ambiguity policy can do this better than the others.
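To make the idea a bit more concrete, here is a rough Python sketch of how a single split might be chosen when training regions are available. This is not the paper's exact algorithm: the function name `choose_split`, the candidate planes (midpoints between neighboring point coordinates), and the cost (the larger of the two counts of training regions overlapping each child) are all my own simplifying assumptions. It also assumes the caller has already filtered the points and regions down to the cell being split.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner), axis-aligned


def choose_split(points: Sequence[Point],
                 regions: Sequence[Box]) -> Optional[Tuple[int, float]]:
    """Pick (axis, value) so that as few training regions as possible
    overlap the more heavily overlapped child cell.

    Hypothetical cost function: max(left overlaps, right overlaps).
    Returns None when no plane can separate the points.
    """
    if len(points) < 2:
        return None
    dims = len(points[0])
    best = None  # (cost, axis, value)
    for axis in range(dims):
        coords = sorted({p[axis] for p in points})
        # Candidate planes sit halfway between neighboring distinct
        # coordinates, so each child always receives at least one point.
        for lo, hi in zip(coords, coords[1:]):
            value = (lo + hi) / 2.0
            # A region overlaps the left child if it starts before the plane,
            # and the right child if it ends after the plane.
            left = sum(1 for rmin, _ in regions if rmin[axis] < value)
            right = sum(1 for _, rmax in regions if rmax[axis] > value)
            cost = max(left, right)
            if best is None or cost < best[0]:
                best = (cost, axis, value)
    return None if best is None else (best[1], best[2])


# Two clusters of points, with one training region around each cluster.
points = [(0.0, 0.0), (1.0, 1.0), (10.0, 0.0), (11.0, 1.0)]
regions = [((-0.5, -0.5), (1.5, 1.5)), ((9.5, -0.5), (11.5, 1.5))]
print(choose_split(points, regions))  # (0, 5.5)
```

In this toy case the chosen plane falls in the gap between the two clusters, so neither training region straddles it and each future search only has to descend into one child.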
Two of the scenarios analyzed are of particular interest. In both, the data points were clustered; they differ in the distribution of the training regions, which either followed the same clustered distribution as the data points or were uniformly distributed. When the searches followed the same clustered distribution as the data, the minimum-ambiguity policy > the standard policy > the sliding-midpoint policy (using '>' to mean "is superior to"). When the searches were distributed uniformly, the sliding-midpoint policy > the minimum-ambiguity policy, with both far superior to the standard policy.
So, what's it mean for writing a kd-tree for a game? Well, it provides some pretty interesting information, though it doesn't change the bottom line. As mentioned in my paper, more complex splitting policies like sliding-midpoint and minimum-ambiguity are only viable for data sets that are essentially fixed. In a game, this corresponds to immobile objects that are either unchanging (e.g. cannot die) or change extremely infrequently; in E Terra, this corresponds to doodads - objects which take up space but do not have any function - and immobile game objects such as grass (which is edible but not consumable).
As also mentioned previously, the distribution of points is not expected to be uniform - it's expected that there will be clusters of things at various focal points on the map. Furthermore, in the case of mobile objects, the search distribution will roughly equal the distribution of the data points themselves.
Unfortunately, neither of these facts is useful to us. Despite the mostly known distribution of searches, we cannot use the minimum-ambiguity policy in any of our trees because the set of search regions - corresponding mostly to the mobile game objects - is dynamic. Furthermore, it wouldn't be of any particular benefit to use the data points in the static trees as the search region distribution, as the majority of searches will be from the mobile objects, for things like collision detection and sight radii.