
Wednesday, February 24, 2010

Content Filtering

Following a brief scuffle with Patrick Ross in the comments of the Copyright Alliance blog, I thought the topic I briefly touched on there deserved elaboration: content filtering. That is, the analysis and identification of packets containing illicit content as they pass through a router, based on hashing (not very effective) or fingerprinting (more effective), usually followed by dropping the packet.
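To make the hashing half of that distinction concrete, here's a minimal sketch in Python, with a made-up blocklist (the digests and payloads are, of course, hypothetical):

```python
import hashlib

# Hypothetical blocklist: SHA-1 digests of known infringing files.
BLOCKED_HASHES = {
    hashlib.sha1(b"contents of a copyrighted file").hexdigest(),
}

def hash_filter_blocks(payload: bytes) -> bool:
    """Hash-based identification: flags only exact byte-for-byte matches."""
    return hashlib.sha1(payload).hexdigest() in BLOCKED_HASHES

print(hash_filter_blocks(b"contents of a copyrighted file"))          # True
print(hash_filter_blocks(b"contents of a copyrighted file" + b"\0"))  # False
```

A single altered byte defeats the hash entirely, which is why fingerprinting - computing a signature that survives small changes - fares better (more on that below, when we get to YouTube).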

I'll discuss the full range of content filtering, though one thing at a time.

First of all, the specific type I referred to previously: filtering at the internet service provider level. Imagine an ISP being legally pressured by copyright industry representatives to do something about file-sharing - a situation hundreds of ISPs around the world find themselves in at this very moment.

Now Mr. Sales Rep, from a company that makes deep packet inspection hardware, offers to sell Mr. Business Suit at the ISP a product that will solve the file-sharing problem, as many sales reps are currently doing. The company's tests show that the product is 95% effective at identifying and blocking traffic containing illicit copyrighted material, with a low rate of false positives. Naturally, Mr. Business Suit looks at this product and sees an amazing solution to all of the ISP's problems. Mr. Engineer at the same ISP looks at the product, sees a million-dollar paperweight, and tells Mr. Sales Rep to get out of his office. Nevertheless, convinced by Mr. Sales Rep, Mr. Business Suit purchases the product and has Mr. Engineer install it.

Now the product has been installed, and everybody watches eagerly as Mr. Engineer turns it on. Immediately the product begins logging transfers of copyrighted content by the thousands, and successfully blocks them. Yet Mr. Engineer looks at his network statistics and sees that not only is the product having zero effect on the volume of internet traffic, but just as much illicit content is being successfully uploaded by the ISP's users as before.

What could possibly have gone wrong? And why were the appraisals of the product so drastically different between Mr. Business Suit and Mr. Engineer, to begin with? Is the product defective? Did Mr. Sales Rep lie?

Well, not exactly.

What happened is that the ISP's users adapted effortlessly to the new filtering hardware. While it's certainly viable, if implemented competently, to detect and block things like copyrighted content, this is only possible if you have access to the data being transmitted. The universal Achilles' heel of such identification algorithms is encryption.

Modern file-sharing software supports end-to-end encryption - the same kind used to secure credit card transactions online: the uploader encrypts the data, the downloader decrypts it, and nothing in between can read it, because nothing else has the encryption key. This "nothing" includes that million-dollar product our ISP just bought.
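A minimal sketch of the idea, using Python's cryptography library (a pre-shared symmetric key stands in for the key-exchange handshake a real P2P client performs, but the effect on the filter is the same):

```python
from cryptography.fernet import Fernet  # pip install cryptography

# The uploader and downloader share a key; the ISP's filter does not.
key = Fernet.generate_key()

plaintext = b"contents of a copyrighted file"
ciphertext = Fernet(key).encrypt(plaintext)

# What the filter sees on the wire: opaque bytes with no recognizable
# signature. Any hash or fingerprint computed over them matches nothing
# in any blocklist.
print(ciphertext[:32])

# Only the downloader, who holds the key, can recover the data.
assert Fernet(key).decrypt(ciphertext) == plaintext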

Now, this encryption is not a technology that needs to be developed, nor does it need to be downloaded and installed by the user. It's already there. If a user is able to share files through a P2P application, the encryption code is already in that P2P application; it needs only to be enabled by a user clicking a check box. And, of course, you can be certain that it will be turned on by default in future versions of said software if content filtering by ISPs becomes common.

In other words, each of those "blocked" uploads the product registers is merely the first of two attempts. A blocked upload is simply an upload that will succeed seconds later, after the user ticks the box to enable encryption (though if content filtering is widely deployed, users won't even need to do that). Thus, to make a long story short: you have successfully prevented file-sharers from uploading unencrypted illicit content, but you haven't actually prevented a single copyright infringement.
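As a toy model of that two-attempt pattern (all names here are hypothetical, and a hash check stands in for the DPI box):

```python
import hashlib
from cryptography.fernet import Fernet  # pip install cryptography

BLOCKED_HASHES = {hashlib.sha1(b"contents of a copyrighted file").hexdigest()}

def filter_allows(packet: bytes) -> bool:
    """The ISP's box drops packets whose hash is on the blocklist."""
    return hashlib.sha1(packet).hexdigest() not in BLOCKED_HASHES

def upload(payload: bytes) -> bytes:
    """Hypothetical client behavior: try plaintext first; if the transfer
    is blocked, tick the encryption box and resend seconds later."""
    if filter_allows(payload):
        return payload                     # passed through unfiltered
    encrypted = Fernet(Fernet.generate_key()).encrypt(payload)
    assert filter_allows(encrypted)        # ciphertext matches no blocklist entry
    return encrypted                       # logged as "blocked", delivered anyway

upload(b"contents of a copyrighted file")
```

The product's logs count the first attempt as a success; the network statistics count the second one.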

This is a fundamental, theoretical problem, not an implementation issue: a filter cannot identify data it cannot read. As such, there is no basis to hope that this limitation will ever be overcome.

But look on the bright side: Mr. Sales Rep got a nice commission off the million bucks the ISP paid his company, and since he technically never lied, the ISP has no grounds to claim fraudulent advertising.

However, the fact that ISP-level filtering is a technological dead end should not be taken to mean that all filtering technology is useless. As stated, filtering can be effective, provided the filter has access to the data. The hackers of the world will, of course, continually find new ways to evade such filtering algorithms, but it should still be possible to filter enough content to justify the cost of the filtering hardware and software.

One example where this works to a satisfactory degree, both in theory and in practice, is YouTube. Because YouTube actually processes and decodes the content uploaded to it, it cannot help but have access to the data - it couldn't function otherwise. As such, it always has the full, unencrypted content in hand, at which point filtering that content is possible, and is in fact already being performed.
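A toy illustration of why access to decoded content makes robust matching possible (this is not YouTube's actual system, just a stand-in "fingerprint" that averages chunks of decoded samples):

```python
import math

def fingerprint(samples, chunks=8):
    """Toy perceptual fingerprint: per-chunk means of decoded samples.
    Small re-encoding noise barely moves the means, unlike a hash."""
    n = len(samples) // chunks
    return [sum(samples[i * n:(i + 1) * n]) / n for i in range(chunks)]

def matches(fp_a, fp_b, tol=0.05):
    return all(abs(a - b) < tol for a, b in zip(fp_a, fp_b))

# A sine wave stands in for decoded audio from an upload...
original = [math.sin(t / 10) for t in range(800)]
# ...and the same content after a lossy re-encode adds tiny noise.
reencoded = [s + 0.001 for s in original]

assert matches(fingerprint(original), fingerprint(reencoded))  # fingerprint survives
assert original != reencoded                                   # an exact hash would not
```

None of this works on ciphertext, of course - but YouTube never receives ciphertext, because it has to be able to play what you upload.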

Dumb file storage sites - sites like RapidShare, which store data without any regard to what type of data it is - fall somewhere in the middle. Since they do not require access to the data itself, encryption is entirely possible, and would indeed evade any filtering of uploaded content done on the site's end. However, this kind of encryption is much more of an inconvenience than the built-in encryption of P2P programs: it must be done manually, by the user, through a completely separate program (almost anything that can make ZIP files can encrypt them, for instance), and the encryption key must be distributed through other channels, such as forums that link to the encrypted file. So while filtering at the level of such sites will certainly not prevent encrypted transmission of content (nor, probably, even a majority of total transmission), such systems might reduce the sharing of illicit content by a fraction large enough to be worthwhile, through sheer annoyance.
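For illustration, a sketch of that manual step using Python's cryptography library (the password, its distribution channel, and the file contents are all hypothetical; an encrypted ZIP would serve the same purpose):

```python
import base64, os
from cryptography.fernet import Fernet  # pip install cryptography
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

password = b"password-posted-on-some-forum"  # shared out of band, not with the file
salt = os.urandom(16)

# Derive a key from the password, then encrypt the file before uploading it.
kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32,
                 salt=salt, iterations=200_000)
key = base64.urlsafe_b64encode(kdf.derive(password))

file_bytes = b"contents of a copyrighted file"  # stand-in for the real file
upload_payload = salt + Fernet(key).encrypt(file_bytes)
# The site stores only opaque bytes; downloaders who got the password from
# the forum (plus the salt, which travels with the blob) reverse the process.
```

The extra tooling, the separate password channel, and the failed downloads when people lose the key are exactly the "sheer annoyance" doing the work here.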
