Friday, July 08, 2005

Privacy on the web

[Part one of a three-part series detailing the uses of Privoxy and Tor.]

Most websites these days are filled with ads (note, dear reader, that this site is ad-free), and the majority of websites track their users' behavior through cookies, web bugs (1x1-pixel GIFs), scripts, server logs, and other tools. Even my humble site tracks your activity. The counter service I use (sitemeter, which is popular with bloggers) places a small script on each page, and from that script I can typically determine when you visited this site, your IP address, which page you entered the site on, how many pages you loaded, how long you kept browsing, and which link you followed to get here.

Don't worry, I'm not out to track your individual behavior. My counter service only lets me see data on the last 100 visitors (because I'm cheap and don't pay for it), and I use the data solely to see how many hits I'm getting, whether I'm getting referrals from sites I don't already know about, and which pages people are viewing. The service only records your IP address; I don't know your name.

The point, though, is that it's terribly easy to track people's activities on the web. If people run their own server, and thus have access to its logs, their website doesn't even need a script on the page to collect all the information I described above; the server records it automatically whenever you request a page.
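To make the server-log point concrete, here's a minimal Python sketch that pulls the same details a counter script reports (time of visit, IP address, entry page, page count, time spent, referring link) out of plain access-log lines. It assumes Apache's "combined" log format; the log lines, the IP address (from a range reserved for documentation), and the page paths are made up for illustration, and grouping by IP address is only a rough stand-in for a single "visitor".

```python
import re
from datetime import datetime

# Apache "combined" log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

def summarize(lines):
    """Group requests by IP address. Assumes lines are in chronological
    order, so the first request seen for an IP is its entry page."""
    visitors = {}
    for line in lines:
        e = parse_line(line)
        if e is None:
            continue
        v = visitors.setdefault(e["ip"], {
            "first": e["time"], "last": e["time"],
            "entry_page": e["path"], "referer": e["referer"], "pages": 0,
        })
        v["pages"] += 1
        v["first"] = min(v["first"], e["time"])
        v["last"] = max(v["last"], e["time"])
    return visitors

# Hypothetical log lines for one visitor arriving from a search engine.
SAMPLE = [
    '203.0.113.7 - - [08/Jul/2005:14:02:11 -0500] "GET /2005/07/privacy-on-web.html HTTP/1.1" 200 5120 "http://www.google.com/search?q=privoxy" "Mozilla/5.0 (Windows; U) Firefox/1.0.4"',
    '203.0.113.7 - - [08/Jul/2005:14:05:40 -0500] "GET /2005/06/cookie-culler.html HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Windows; U) Firefox/1.0.4"',
]

if __name__ == "__main__":
    for ip, v in summarize(SAMPLE).items():
        print(ip, v["pages"], v["last"] - v["first"], v["entry_page"], v["referer"])
```

Nothing in this sketch needs any cooperation from your browser: every field comes from the request itself, which is why blocking scripts or cookies alone doesn't stop this kind of logging.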

While your IP address can't give a website administrator your name, the cookies on your computer can. I recently visited a Pepsi website (part of their recent Star Wars movie promotion) and clicked through to a form for entering a sweepstakes prize number. Even though this was the first time I'd ever visited a Pepsi website, all of my personal information (name, address, e-mail) was automatically filled in on the form.

How had they gotten the information? They'd used cookies on my computer to identify me as a Yahoo! user, and had apparently gotten the information from Yahoo!. Pepsi was doing this in good faith (to make registering easier), but the point is clear: it's easy to pull private information out of cookies. I recently browsed through the text of all the cookies on my computer and found my full (real) name, multiple personal e-mail addresses, passwords, and other private information, all stored in plain text. Many more cookies contained encrypted (or at least unreadable) data; I have no idea what information they held. It wouldn't be hard for a website to take the information in its cookies, link it to my IP address, and keep a database of what I, personally, do on the web. Sites like Amazon do it all the time, and there's even some evidence that they alter their prices based on the presence (or absence) of cookies on your computer.

If you don't like all this, there are things you can do to keep sites from tracking you. One option is to restrict how sites can set cookies. Unfortunately, many websites now rely on cookies to function (ever wonder how Flickr and Gmail remember that you're logged in?), so blocking cookies from all websites isn't a workable solution. However, you can set Firefox (and other browsers) to treat all cookies as "session only", which means they'll be deleted once you close your browser. Most browsers also let you add sites to a "blocked" list, which prevents those sites from ever setting cookies. Firefox extensions like Cookie Culler (which I've written about before) can make managing cookies somewhat easier, but they aren't a complete solution to the problem.
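What makes a cookie "session only" is simple: if the server's Set-Cookie header carries no Expires or Max-Age attribute, the browser discards the cookie when it closes. The browser setting above roughly amounts to ignoring those attributes on every cookie. Here's a small sketch of that idea using Python's standard http.cookies module; the function names and cookie values are hypothetical, and this illustrates the concept rather than how any browser actually implements the option.

```python
from http.cookies import SimpleCookie

def is_session_cookie(set_cookie_value):
    """True if a Set-Cookie value has no Expires or Max-Age attribute."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_value)
    return all(not m["expires"] and not m["max-age"] for m in cookie.values())

def make_session_only(set_cookie_value):
    """Strip Expires and Max-Age so the cookie lasts only until the browser closes."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_value)
    for morsel in cookie.values():
        morsel["expires"] = ""
        morsel["max-age"] = ""
    # Morsel.OutputString() omits attributes whose value is empty.
    return "; ".join(m.OutputString() for m in cookie.values())

if __name__ == "__main__":
    persistent = "id=abc123; Expires=Wed, 13 Jul 2005 12:00:00 GMT; Path=/"
    print(is_session_cookie(persistent))   # False: it has an Expires date
    print(make_session_only(persistent))   # the same cookie, minus Expires
```

The trade-off is the one described above: sites that need you logged in still work for the length of your browsing session, but nothing they set survives a restart.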

Still, manually configuring your browser can only do so much. It's a pain to figure out which sites need cookies and which don't, and it's frustrating to sift through your cookies day in and day out, slowly building up a block list.

And cookies are only part of the problem; even if you were to block all cookies, websites can still log a great deal of information about you just based on your page accesses. That's where programs like Privoxy and Tor come in. Privoxy focuses on filtering web content to enhance your privacy and reduce the number of ads rendered on pages, whereas Tor focuses on preventing websites from tracking you by anonymizing your IP address.

In the next post in this series I'll discuss installing, using, and configuring Privoxy; in the third post in this series I'll do the same with Tor.
