Q: What information do you collect, and why?
A: In addition to any information you offer through a form on the site, I collect as much data as possible about your activities on the site, with the primary goal of improving this site’s quality and your experience while visiting.
Q: Who sees this information?
A: Mostly me. Because I use their tracking platforms, some data is also disclosed to Google, Quantcast, and Reinvigorate. Given the volume of data they handle, I doubt anyone there looks at it.
Q: How is this information collected, and what is it used for?
A: Short answer? By mixing header data, tracking scripts, and server logs with human insight and speculation.
A: Long answer? Grab some coffee, and read on.
HTTP Header: “Referer”
This tells me which page you were on before the one you requested. I use it to learn two things:
- How you found my site, and if the answer is search, what your query terms were.
- Your navigation path when on the site.
Together, these tell me what you were looking for and (combined with other data) whether you found it. I use this to improve site navigation and make it easier for you to find things here.
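To illustrate how search terms can be pulled out of a Referer value, here is a minimal sketch. It assumes the search engine puts the query in a `q` URL parameter (many engines no longer pass the query along at all), and the host list and function name are hypothetical, not taken from any real analytics tool.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical list of search-engine host fragments; extend as needed.
SEARCH_HOSTS = ("google.", "bing.", "duckduckgo.")


def search_terms_from_referer(referer):
    """Return the search query carried in a Referer URL, or None if it isn't a search."""
    parsed = urlparse(referer)
    host = parsed.netloc.lower()
    if not any(fragment in host for fragment in SEARCH_HOSTS):
        return None  # referred by an ordinary page, not a search engine
    # Assumption: the engine passes the query in a "q" parameter.
    # parse_qs decodes "+" and %-escapes back into plain text.
    values = parse_qs(parsed.query).get("q")
    return values[0] if values else None


print(search_terms_from_referer("https://www.google.com/search?q=privacy+policy+example"))
print(search_terms_from_referer("https://example.com/some-page"))
```

Referers from pages on this site itself would instead be chained together to reconstruct a navigation path.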
You can opt out of this by configuring your browser to not send referer data with HTTP requests.
HTTP Header: “User-Agent”
This tells me which operating system and browser you are using, along with version information for both.
You can opt out of this by configuring your browser to send an empty User-Agent string. Please don’t fill it with false information; that just pollutes my data.
HTTP Header: “Accept-Language”
This tells me which languages your browser is set to accept. I use it to decide which articles would benefit from translation.
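As a rough sketch of how Accept-Language values could be tallied to guide translation decisions, the code below parses the header's comma-separated tags and optional `q` weights, then counts each visitor's top base language. The function name and sample headers are hypothetical, and counting only the top language is an assumed simplification.

```python
from collections import Counter


def parse_accept_language(header):
    """Return language tags from an Accept-Language header, highest weight first."""
    languages = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # tags without an explicit weight default to q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        languages.append((tag.strip(), q))
    # sort() is stable, so tags with equal weight keep their header order
    languages.sort(key=lambda item: item[1], reverse=True)
    return [tag for tag, q in languages if q > 0]


# Count each visitor's top base language ("fr-CA" -> "fr").
headers = ["fr-CA,fr;q=0.9,en;q=0.8", "en-US,en;q=0.9", "en-GB", "de;q=1.0,en;q=0.5"]
top = Counter(parse_accept_language(h)[0].split("-")[0] for h in headers)
print(top.most_common(1))
```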
I’m not sure how you can opt out of this, or why you would want to.
Server Logs: URL and Request Status
When you access the site, this tells me which page you requested, and what happened to the request (success or error code). I use this to figure out when parts of the site are having problems, and what those problems are. I use the successes to assess the relative popularity of different parts of the site.
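A minimal sketch of that kind of log review, assuming the server writes the widely used Common (or Combined) Log Format; the regular expression, function name, and sample lines (which use reserved documentation IP addresses) are illustrative rather than taken from this site's setup.

```python
import re
from collections import Counter

# Captures the requested path and the status code from a Common/Combined Log Format line.
LOG_PATTERN = re.compile(r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})\b')


def summarize(lines):
    """Count successful hits per path, and errors per (path, status)."""
    hits = Counter()
    errors = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # skip lines that don't look like log entries
        status = int(match.group("status"))
        path = match.group("path")
        if 200 <= status < 300:
            hits[path] += 1              # popularity of each page
        elif status >= 400:
            errors[(path, status)] += 1  # where the site is having problems
    return hits, errors


log = [
    '203.0.113.7 - - [10/Oct/2011:13:55:36 -0700] "GET /about HTTP/1.1" 200 2326',
    '198.51.100.4 - - [10/Oct/2011:13:56:02 -0700] "GET /about HTTP/1.1" 200 2326',
    '198.51.100.4 - - [10/Oct/2011:13:57:10 -0700] "GET /old-post HTTP/1.1" 404 512',
]
hits, errors = summarize(log)
print(hits, errors)
```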
You cannot opt out of this.
Server Logs: Date and Time
This tells me when the server logged a request. I use it to sort the rest of the log, find traffic patterns and peak times, and to measure change over time.
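For example, peak times can be found by bucketing log timestamps by hour. This sketch assumes timestamps in the Common Log Format style shown below; hours are counted in whatever time zone offset the server recorded, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime


def requests_per_hour(timestamps):
    """Count requests per hour of day from Common Log Format timestamps."""
    counts = Counter()
    for ts in timestamps:
        # e.g. "10/Oct/2011:13:55:36 -0700"; %b expects English month abbreviations
        when = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
        counts[when.hour] += 1  # hour in the logged time zone
    return counts


stamps = ["10/Oct/2011:13:55:36 -0700", "10/Oct/2011:13:59:01 -0700", "10/Oct/2011:21:05:44 -0700"]
print(requests_per_hour(stamps).most_common(1))
```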
You cannot opt out of this.
Server Logs: IP Address
This is the network address assigned to your connection, typically by your Internet Service Provider (ISP). I use this for two main things:
- Sessions: Combined with referer data from HTTP headers (see above) and timestamps in server logs, this lets me group on-site activity into “sessions” and analyze each session as a whole, rather than as fragmented actions.
- Geolocation: Knowing your rough location[1] helps me understand where the information I publish is relevant. I would compare it to a publisher tracking regional sales of a book. I may also use this information to filter out ads that are not sensible for a given region (for instance, omitting an ad for a product that doesn’t ship to your country).
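The session grouping in the first point can be sketched as follows: requests from the same IP address belong to one session until that address goes quiet for longer than a cutoff. The 30-minute gap is an assumed (though common) convention, the function name is hypothetical, and this simplified version ignores referer data, which a fuller implementation could use to split or join sessions more accurately.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumption: common inactivity cutoff


def group_sessions(requests):
    """Group (ip, timestamp, path) records into sessions; returns a list of path lists."""
    sessions = []
    last_seen = {}  # ip -> (time of its last request, index of its current session)
    for ip, when, path in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(ip)
        if previous is None or when - previous[0] > SESSION_GAP:
            sessions.append([path])  # first request, or too long since the last one
            index = len(sessions) - 1
        else:
            index = previous[1]
            sessions[index].append(path)
        last_seen[ip] = (when, index)
    return sessions


t0 = datetime(2011, 10, 10, 13, 0)
requests = [
    ("203.0.113.7", t0, "/"),
    ("203.0.113.7", t0 + timedelta(minutes=5), "/about"),
    ("198.51.100.4", t0 + timedelta(minutes=6), "/"),
    ("203.0.113.7", t0 + timedelta(hours=2), "/contact"),
]
print(group_sessions(requests))
```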
You can opt out of this by using a proxy, though many proxies are known to mangle JavaScript, which may affect functionality here on the site. A Virtual Private Network is a more robust solution that will not affect your browsing experience, and gives you more control over how servers see you (including geographic location).
Tracking Scripts: Google Analytics, Reinvigorate, and Quantcast
While an exhaustive list of what these tools can do is beyond the scope of this document[2], I primarily use Google Analytics as a “bird’s-eye view” of activity on the site, Reinvigorate for more detailed activity analysis, and Quantcast for demographic data.
These systems use many of the methods above (excluding server logs) to help me understand how you interact with the site, and give me the requisite figures to guide changes and improvements to it.
You can opt out of all three by disabling third-party scripts in your browser, or selectively blocking content from the following sites:
- Google Analytics: www.google-analytics.com
- Reinvigorate: include.reinvigorate.net
- Quantcast: edge.quantserve.com
Cookies
In addition to being delicious, cookies help me treat your visit as a session, rather than individual page views. Some examples:
- They help you stay logged in, if you have done so.
- When you vote on things, they remember how you voted, so your browser doesn’t have to ask the server.
- They remember if you’ve shared something on Facebook or Twitter.
For notes on opting out of this, see the Addendum section towards the end.
Browser Capabilities
Detected through HTTP headers and JavaScript, this tells me what your browser can do. I use this to find the best way to serve content to your browser, and I use historic logs of it to decide when I can do things like stop caring about Internet Explorer 6, use PNG transparency in my images, or deploy newer technologies (HTML5 and CSS3, for instance).
Q: Can I opt out of this data collection?
A: Partially, though I would appreciate it if you didn’t. As noted above, data collection helps me improve this site. If you feel strongly about it, I have included instructions for selectively preventing data collection, where possible.
There are certain things you simply cannot opt out of, like server-side logs, though you can use a proxy to obscure your IP address in them.
Addendum
I’d like to make a few notes on the impact of some more aggressive blocking methods:
- All modern browsers offer a means of disabling cookies. I would not expect disabling third-party cookies to cause any problems, though I would strongly recommend allowing first-party cookies. Some parts of the site may not function correctly without them.
- While it is possible to disable JavaScript entirely, I can assure you that this will make the site quite unusable. Check whether your browser has a way to disable only third-party JavaScript; I would expect that to cause fewer (but not zero) problems.
Copyright secured by Digiprove © 2011 Chris Olstrom
[1] Geolocation has acceptable accuracy at the country level, with accuracy diminishing as the estimate gets more specific.
[2] Seriously, there are entire books dedicated to the subject.