Maintaining my own Pageview/Visit Database

I have started maintaining my own pageview/visit database for both mattberseth.com (this blog) and mattberseth2.com (my demo site).

 

Why?

I currently use Google Analytics (GA) and I really like it.  But ...

  • You can't (yet) export your raw visit data from GA
  • You can't (yet) browse your GA cube
  • I have other data in my MoveableType database (post dates, categories, commentors, tracebacks, etc...) that GA just doesn't know about.  This information would be interesting to get into the cube.

 

What have I done so far?

Added a small bit of JavaScript to the pages I want to track on on both mattberseth.com and mattberseth2.com.  The script extracts the pages referrer and a few other tokens and sends them a HttpHandler that I have setup on mattberseth2.com (hosted by DiscountASP).  Here is a sample of what is the request/response looks like.

image

The handler extracts the tokens and any http header values and pushes them to the SQL Server 2005 database I have setup (also with DiscountASP).  Below is a screen shot showing these records ...

image 

Then, I added a linked server connection to this database from my home PC and setup a SQL Job that calls a sproc that moves the data from the remote server to the local PC on my desktop.  This keeps the live transaction table relatively empty.

Then, again on my local PC, I setup a scheduled task that processes the visit table and creates a master/detail structure for each of the raw visit records where the header contains stuff like the url requested, the pages title, the referring url, the IP of the client that requested the page and the date.  The detail table is filled with the key/value pairs taken from any querystring or header tokens.

Here is a quick screen shot of a handful of records from my header table.  Only 4 columns and ~20 records, but still you can see what kind of information you can get from this data (I blacked out all of the IP's except mine).

image

 

How is it working?

It seems to be working OK.  I am benchmarking my hit counts with what GA is reporting to make sure I have an accurate baseline.  So far they seem to be relatively close.

 

What's Next?

Building the cube.  Should be fun!

 

That's it.  Enjoy!


TrackBack

TrackBack URL for this entry:
http://mattberseth.com/blog-mt/mt-tb.fcgi/153

Comments


I did the same thing except I'm tracking it a different way. I like looking at individual user sessions and random statistics. The only down side is I am getting a lot of bot traffic that is a pain to filter out. What I've done is create a table with bot ip's and user agents and just dont save that traffic after I've identified something as a bot. Is there anyone else that is recording their own data like this?

Posted by: Renato Golia on September 26, 2008 01:11 AM

it would be nice if Google gives us a way (webservice?) to access our own data!

waiting for the cube! ;)

Posted by: J on September 26, 2008 01:39 PM

What's a GA 'Cube' ?

Your discipline in delivering top notch articles really strikes me.

Maybe you can do for us a post onhow to build such discipline every successfull developer needs it.

Thanks.

Any chance we could see the basics of your httphandler, I was also looking to add tracking for certain parts of a website I am working on.

Great Article

This reminds me of what I attempted to setup with LogParser at one point in time. Too bad Microsoft ditched that project. Codinghorror has listed some bullet points about LogParser http://www.codinghorror.com/blog/archives/000369.html .

The method you propose though has more flexibility and allows you to manipulate the data better. I am always amazed at your posts!

Did you have plans to attend MicrosoftPDC? It would be nice to meet you face to face.

@Rush -
Are you looking at your server logs or do you have use script like GA does?

@Renato Golia -
Or even if they don’t have a webservice, it would also be nice to get an export (csv or something) of the raw pageview data (request variables and http headers). I just want some way to get the pre-aggregate data out …

@J -
GA stands for Google Analytics. It’s a free web application (up to 5 million pageviews per month) that google provides that allows you to view pageview/visit statistics for your web site. It works by adding a bit of JavaScript to your page that sends some basic information about your visitors back to googles servers.

Cube = A data structure for analyzing highly dimensional data.

Google populates a cube from the raw pageview/visit data it collections and then provides access to the aggregate information via charts, grids and graphs from their web application. If you have a blog or run a web site you should check it out.

@Joseph Ghassan -
Thanks Joseph. I will add the topic for to the queue. And I really appreciate the compliment.

@Bobby -
Absolutely. I had some serious issues obtaining the same hit count as GA. I don’t have all of the problems solved, but you can expect a post soon discussing what issues I hit and how I am currently working around them.

@Justin Wenlandt -
Thanks Justin. No – no PDC for me this year. But if you are ever in Jacksonville FL, send me an email and we’ll meet up for a beer.

Posted by: TK on October 1, 2008 07:01 AM

Another great article Matt! I am working on a project that is using SQL Analysis Server Cubes to process survey results. We currently are not capturing user statistics, but I would like to add this information to the data we are collecting. Does anyone know of a good way to detect connection speed in .NET? It needs to very light and unobtrusive for the clients.

Thanks, keep up the hard work!

Posted by: David on October 1, 2008 02:34 PM

Matt, very interesting article. I have been toying with something similar for a while and even built a cube. I have been using a combination of parsing the IIS logs, and collecting data during the usual page lifecycle on the server. What's the thinking behind using a client side script to gather the data and post this to a handler instead of using the original aspx request to collect and store the data.

Thanks, David.

Posted by: Onat on October 5, 2008 11:31 AM

Hi Matt, i would like to know how you can get ref. urls. i haven't use GA yet, is it a GA feature?

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)

Consulting Services

Yep - I also offer consulting services. And heck, I'll do just about anything. If you enjoy my blog just drop me an email describing the work you need done.

Recent Comments

  • Onat wrote: Hi Matt, i would like to know how you can get ref. urls. i haven't use GA yet, is it a GA feature?...
  • David wrote: Matt, very interesting article. I have been toying with something similar for a while and even built...
  • TK wrote: Another great article Matt! I am working on a project that is using SQL Analysis Server Cubes to pro...
  • Matt Berseth wrote: @Rush - Are you looking at your server logs or do you have use script like GA does? @Re...
  • Justin Wendlandt wrote: This reminds me of what I attempted to setup with LogParser at one point in time. Too bad Microsoft ...
  • Bobby wrote: Any chance we could see the basics of your httphandler, I was also looking to add tracking for certa...
  • Joseph Ghassan wrote: Your discipline in delivering top notch articles really strikes me. Maybe you can do for us a post ...
  • J wrote: What's a GA 'Cube' ?...