Fun with Perl & search logs

Perl December 2nd, 2006

A while ago AOL released a large amount of anonymised search data from its users. This consisted of over 4 GB of data.

Now that’s a hell of a lot of search queries to be looking through manually. However, thanks to some Perl & Excel trickery, it’s possible to get some useful info from all this. ‘Perl’ is often expanded as ‘Practical Extraction and Report Language’, so it’s a natural fit for the task.

I’m far from being a Perl expert, but here’s the script I wrote to parse this data:

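#!/usr/bin/perl
use strict;
use warnings;

# Collect every matching line from all ten log files into one output file
open my $out, '>>', 'test.txt' or die "couldn't open test.txt: $!";

for my $i (0 .. 9) {
    my $file = "searchlogs_0$i.txt";
    open my $in, '<', $file or die "couldn't open $file: $!";

    while (my $line = <$in>) {
        # Print each matching line once, however often the pattern occurs in it
        print {$out} $line if $line =~ /test/;
    }
    close $in;
}

close $out or die "couldn't close test.txt: $!";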

The search logs come in 10 separate ~200MB files (or at least mine did), so this script goes through each one programmatically. Of course, this can be tweaked if it’s all in a single file. To find what people are searching for on your term, all you need to do is come up with the relevant regular expression to replace the ‘test’ pattern in the script. This can also tell you what search terms people are using when they come to your site (provided your site appears in the logs).
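As a rough illustration of what a more selective pattern might look like, here’s a minimal sketch that matches a term only as a whole word and ignores case. The `$term` value and the `matches_term` helper are made up for this example, and the sample queries are invented rather than taken from the logs.

```perl
use strict;
use warnings;

# Hypothetical search term; swap in your own
my $term = 'test';

# \b marks a word boundary, /i ignores case,
# and \Q...\E escapes any regex metacharacters in $term
my $pattern = qr/\b\Q$term\E\b/i;

# Returns 1 if the line contains the term as a whole word, 0 otherwise
sub matches_term {
    my ($line) = @_;
    return $line =~ $pattern ? 1 : 0;
}

# Invented sample queries, for illustration only
for my $query ('perl test', 'Test drive', 'latest news', 'contest') {
    printf "%-12s %s\n", $query, matches_term($query) ? 'match' : 'no match';
}
```

A plain `/test/` would also pick up ‘latest’ and ‘contest’, which may or may not be what you want, so it’s worth trying a pattern on a handful of lines first.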

Next, you’ll want to open up your generated text file in MS Excel. The AOL search logs in particular contain a lot of identical searches from the same person. To filter out multiple searches from one individual, select the User ID column, click Data > Filter > Advanced Filter…, select ‘Unique records only’ and copy the result into another worksheet.
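If you’d rather not do this step in Excel, the same kind of de-duplication could be done in Perl. This is only a sketch: it assumes each line is tab-separated with the user ID in the first column and the query in the second, so check that against your own files. The `unique_searches` name and the sample rows are made up for the example. It keeps one line per user and query pair; keying on the user ID alone would be closer to the Excel filter above.

```perl
use strict;
use warnings;

# Keep only the first occurrence of each (user ID, query) pair.
# Assumes tab-separated lines: user ID first, query second.
sub unique_searches {
    my @lines = @_;
    my (%seen, @unique);
    for my $line (@lines) {
        my ($user, $query) = split /\t/, $line;
        next unless defined $user && defined $query;
        $query =~ s/\s+$//;    # drop a trailing newline or spaces
        # Post-increment: false the first time a pair is seen, true afterwards
        push @unique, $line unless $seen{ "$user\t" . lc $query }++;
    }
    return @unique;
}

# Invented sample rows, for illustration only
my @sample = (
    "101\tperl test\t2006-03-01 10:00:00\n",
    "101\tperl test\t2006-03-01 10:05:00\n",
    "202\tperl test\t2006-03-02 09:00:00\n",
);
print unique_searches(@sample);
```

With the sample rows above, the second line is dropped because user 101 has already searched for ‘perl test’.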

So, the next task is to identify the most common search patterns within this data…
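One simple starting point would be to count how often each query appears. Here’s a minimal sketch of that idea, again assuming tab-separated lines with the query in the second column; `count_queries` and the sample rows are made up for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Count how many times each query appears, ignoring case.
# Assumes tab-separated lines with the query in the second column.
sub count_queries {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my (undef, $query) = split /\t/, $line;
        next unless defined $query;
        $query =~ s/\s+$//;    # drop a trailing newline or spaces
        $count{ lc $query }++;
    }
    return %count;
}

# Invented sample rows, for illustration only
my %count = count_queries(
    "101\tperl test\n",
    "202\tPerl test\n",
    "303\ttest drive\n",
);

# Most frequent first; ties are broken alphabetically
for my $q (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$count{$q}\t$q\n";
}
```

Running this over the de-duplicated file rather than the raw output means a single user repeating a search won’t inflate the counts.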
