Thursday, June 05, 2008

Empirical Data and the RIAA

A while ago I wrote up a rather lengthy list of factors that could, in theory, produce false positives when identifying users sharing copyrighted files via peer-to-peer programs. Most of these risks could be mitigated by thorough investigation, though I noted that because the RIAA clearly cuts every corner it can, few if any of these mitigating measures are likely to be taken in actual investigations.

Now the University of Washington has shown some of these risks actually occurring, in its project Tracking the Trackers: Investigating P2P Copyright Enforcement. While the researchers have only looked at a couple of the risks I suggested, the results show quite a few false positives, which suggests my prediction was accurate: measures to minimize these risks are not being applied.

The research paper is here, if you don't want to go through the project's web site itself. The New York Times blog has also picked up this story. They also have a cute logo/illustration:

This is a study I've wanted to see done for some time. The other study that I think is very important, but that has not yet been done, would determine empirically how the number of requests a single computer gets for a single file varies with the popularity of that file, on a system like eDonkey where users search all peers for a given file. The motivation for such a study is the claim by the RIAA and others that users could be sharing thousands or millions of copies of each copyrighted work, and that constitutional limitations on civil damage awards therefore do not apply.

Clearly, files that are popular (e.g. the latest hit song) will be downloaded more in total than files that are unpopular. But does this mean any single computer will upload popular files significantly more often than unpopular files? I believe the answer is no: because the files are more popular, not only are they downloaded more, but they are also available from more computers. In theory, the increase in demand is accompanied by a proportionate increase in supply, keeping the ratio of the two invariant regardless of demand. On this basis, I have argued on forums (one example here) that most of the people the RIAA has sued have, by simple probability, uploaded no more than a single copy of each file on average (so about $0.70 of damage per file, if you assume 1 download = 1 lost sale, which is itself highly suspect).
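To make the supply-and-demand argument above concrete, here is a toy simulation in Python. It is only a sketch under strong assumptions that are mine, not measured eDonkey behavior: each file starts with one seeder, every downloader fetches the whole file from one uniformly random current sharer, and every downloader then keeps sharing. The function name and the download counts are hypothetical.

```python
import random


def mean_uploads_per_sharer(num_downloads, seed=0):
    """Average number of uploads per sharer for one file.

    Toy model (all assumptions, not measured behavior):
    - the file starts with a single seeder,
    - each downloader picks one current sharer uniformly at random,
    - each downloader becomes a sharer after finishing.
    """
    rng = random.Random(seed)
    uploads = [0]  # upload count per sharer; index 0 is the initial seeder
    for _ in range(num_downloads):
        source = rng.randrange(len(uploads))  # pick a random current sharer
        uploads[source] += 1                  # that sharer uploads one copy
        uploads.append(0)                     # the downloader now shares too
    return sum(uploads) / len(uploads)


unpopular = mean_uploads_per_sharer(100)       # a rarely requested file
popular = mean_uploads_per_sharer(100_000)     # a hit song
print(f"unpopular file: {unpopular:.5f} uploads per sharer")
print(f"popular file:   {popular:.5f} uploads per sharer")
```

In this model every download is matched by exactly one upload, and every download also adds one sharer, so the average stays just under one upload per sharer however popular the file is. Individual sharers will still differ (early sharers tend to upload more), and real networks break these assumptions in both directions, which is exactly why the empirical study would be worth doing.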

