Thursday, September 30, 2004

File sharing data analysis

The RIAA has recently made a big deal about targeting file sharing service users who download music, and has said that they've been making progress in reducing the amount of music that is downloaded. Let's hear it in their own words:
"Last year, illegal file sharing was soaring, outpacing even the surge of bandwidth penetration. Peer-to-peer services were viewed as ‘legitimate,’ ... Today, we are in a very different world. Traffic on one of the largest peer-to-peer file sharing systems is down, even with the exponential increase in bandwidth penetration. Awareness about the law, legal alternatives, and the security and privacy risks of file sharing systems, has skyrocketed."
So, while they don't come right out and say it (at least in this press release), they strongly imply that their litigious activities have significantly reduced the amount of file sharing that occurs.

Now, let's look at this logically. One hypothesis certainly is that the RIAA's actions have decreased file sharing traffic overall, and this is the hypothesis that the RIAA believes is supported by the data. However, an alternate hypothesis is that file sharing traffic overall has not decreased, but has simply moved to other networks whose users are not prosecuted by the RIAA. An implicit assumption in both of these hypotheses is that peer-to-peer traffic levels are determined primarily by music-file sharing.

So, how do we test these hypotheses? We look at traffic levels. BoingBoing recently linked to a bandwidth analysis by CacheLogic that provides some data we can use.

Slide 9 of CacheLogic's presentation presents the proportional traffic volume on different peer-to-peer networks over time. Their data clearly show that KaZaA's (FastTrack's) share of traffic fell over the six-month period they examined (from 46% of traffic in January 2004 to 19% in June 2004). This does match the prediction of hypothesis #1 (as the RIAA happily advertises), though it also matches the prediction of our alternate hypothesis (which the RIAA fails to mention). BitTorrent's share increased during the same period (from 26% to 53% of overall traffic), which supports our alternate hypothesis and not the RIAA's hypothesis. Unfortunately, the presentation doesn't provide overall traffic volume analyses (that I saw), so we can't determine from these proportions alone whether overall file sharing volume has decreased, but the presentation implies that it has not.
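To make the limits of proportional data concrete, here is a rough sketch in Python. The shares are the slide 9 figures quoted above; the overall totals are purely hypothetical numbers in arbitrary units (the presentation gives no such figures), and `volume_change` is just an illustrative helper name. The point is only that the same shares are compatible with overall traffic shrinking, staying flat, or growing.

```python
# Shares of peer-to-peer traffic quoted from slide 9 of the CacheLogic presentation.
SHARES = {
    "FastTrack": {"jan": 0.46, "jun": 0.19},
    "BitTorrent": {"jan": 0.26, "jun": 0.53},
}


def volume_change(network, total_jan, total_jun):
    """Change in one network's absolute volume, given ASSUMED overall totals."""
    share = SHARES[network]
    return share["jun"] * total_jun - share["jan"] * total_jan


# HYPOTHETICAL overall totals (January, June) in arbitrary units.
# These are illustrative assumptions, not measurements from any source.
SCENARIOS = {
    "total shrinks": (100, 80),
    "total flat": (100, 100),
    "total grows": (100, 120),
}

for name, (total_jan, total_jun) in SCENARIOS.items():
    fasttrack = volume_change("FastTrack", total_jan, total_jun)
    bittorrent = volume_change("BitTorrent", total_jan, total_jun)
    print(
        f"{name}: FastTrack {fasttrack:+.1f}, "
        f"BitTorrent {bittorrent:+.1f}, overall {total_jun - total_jan:+d}"
    )
```

In the flat scenario, FastTrack's loss of 27 share points is exactly offset by BitTorrent's gain of 27, which is what pure migration between networks would look like. Telling the scenarios apart requires absolute volume figures, which proportions alone cannot supply.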

Other researchers (PDF of their paper) have attempted to estimate the actual traffic volume that occurs across peer-to-peer networks. They found that, at least until January 2004, the traffic levels of peer-to-peer networks had been increasing, not declining as the RIAA implies (and they also found the same trend of growth in BitTorrent and decline in KaZaA / FastTrack).

The CacheLogic presentation also summarizes some other important facts about file sharing that the RIAA typically fails to mention:
  • "The vast majority of Peer-to-Peer traffic volume comes from large objects >100MB in size," implying that MP3 file sharing is not driving file sharing network volume. (slide 10)
  • "Many free software projects now use BitTorrent to distribute large CD images." (slide 10)
  • File sharing is not done by just a few users. "Nearly 10% of the number of broadband connections in the world" are logged on to Peer-to-Peer networks at any given time, and "75% of European Broadband subscribers use Peer-to-Peer Networks every month". (slide 11)

