Fresh & New(er)

discussion of issues around digital media and museums by Seb Chan


The new Google Analytics

May 9th, 2007 by Seb Chan

Avinash Kaushik writes a very detailed post about the new-look Google Analytics that is rolling out across accounts right now.

Now, more than ever, web analytics are essential. In my experience, web analytics at museums have been the last thing on people’s busy timetables. Most organisations report the most basic level of statistics to their funding bodies and leave it at that, yet analytics offer the best and most immediate opportunity to see exactly what users are doing on your site and how they are using it.

Google Analytics is free. And because of this, the new version is going to put a lot of pressure on vendors whose commercial solutions are charged on usage or fixed fee models. Also, it puts a lot of pressure on the slightly underdone open source solutions which, whilst also ‘free’, don’t offer the level of detail and analysis that the new look Google Analytics does.

The question for some organisations will be – do you want all your usage data to be held by Google?

But if you have a site where adding the necessary tracking code to a common sitewide page element is relatively easy then I’d suggest the new look version is certainly worth trialling – even if in tandem with your existing package to compare accuracy.

Google’s official announcement post is also available.

15 Comments

  • http://electronicmuseum.wordpress.com Mike

    Hi Seb

    Google Analytics is pretty good. But…

    1. We have issues in our museums deciding how / if to move from log based stats to page tagging. The former is what we’ve always reported to our funders; the latter is more accurate and easier but in our experience always reports lower – usually by about 10-20%. So we have a “wait and all leap together” thing going on which doesn’t look like resolving itself any day now.

    2. One of the major issues – which you touch on – is that the data is in the cloud. I don’t have a problem with this per se (all our data is pretty much public knowledge anyway) BUT it does raise a question from an audit perspective. Although I’m confident that Google’s data is backed-up and available 24/7 (at probably far more 99999’s than we could ever hope to provide), I’d feel much happier using GA as our only source of stats if I could *get* at my data. A downloadable db backup, an XML file, etc.

    Final thought – when is a major stats system like GA going to start providing data via a public API…? It’d be great to be able to use a feed to help generate content on our sites (“most popular article”, “most searched for phrase”..)

  • http://www.freshandnew.org Seb Chan

    Hi Mike

    In response to your point #1 – the key reason why log stats are always higher than page tagged stats is because of spiders and bots. I see this almost every time I look at other people’s stats packages.

    When we started excluding bots and spiders from our analysis of our logs our traffic took a 10-15% hit. Fortunately we timed the reporting of new stats with a rise in overall traffic so we didn’t ‘lose out’.

    Spiders and bots – especially the Yahoo bots can add thousands of visits and page views to your site each day. And realistically these are NOT actual visitors and their presence in your webstats doesn’t add anything positive to your figures other than sheer volume. If anything they make it much harder to work out which parts of your site are actually performing well or poorly.

    This is partially why I am keen to see museums (and web projects in general) start moving to other forms of measurement that are more reflective of real usage AND also more useful in identifying underperforming parts of your site.

  • http://blog.liverpoolmuseums.org.uk Billy

    We’ve been using Google Analytics alongside Web Trends for the last nine months. I was never happy with using Web Trends for day-to-day analysis of the website, we continue to run it for the same reporting purposes as Mike.

    GA is great for tracking trends on the fly. We can share access to the stats (or just one part of the stats) with anyone in the organisation who has a Google account. The layout has been the biggest drawback but I’m looking forward to seeing this redesign.

    Even discounting robots/spiders, we still see a pretty big gap between our two sets of stats. I’ve yet to come across a satisfactory explanation for this, I’m not convinced it can be totally due to browsers with javascript disabled.

  • http://electronicmuseum.wordpress.com Mike

    I’m pretty sure the disparity isn’t just spiders, as Billy says. We’re obviously filtering what we can from our logged stats, and have done for some time. It could be that GA is better at this than our current log-based software (Summary). Either way, I think the best approach is to run both. The question we have is which we use for reporting purposes. At the moment it’s logs, but that could change as we get better at using both sets of tools.

  • Seb Chan

    We use Web Trends for our log files – the latest version – which has very difficult licensing arrangements (charged by page view).

    I’d be interested to know if the GA figures were closer to your WT or Summary figures if you subtracted ALL visits under 1 or 2 seconds in length as well as all that are identified automatically as bots and spiders. That would effectively remove any non-human traffic – including deep linked image leeching which might be as much as another 10-15% if your site is image rich and your MySpace or forum referrer traffic is high.

    I’m not sure we really want to be counting people who use an image from our websites as an avatar in a forum or on their MySpace page as a ‘visitor’. And if we do the first time then we certainly do not want to do so each time someone views their MySpace profile or forum post (which will be counted in the logs as a ‘visit’).

    Unfortunately web metrics have historically been devised for selling advertising and justifying the price of advertising on a website. For those who don’t do this, these metrics are not reliable or particularly useful measures of engagement or use.

  • http://www.finds.org.uk Dan Pett

    Hi Seb and Mike
    I just implemented GA across the entire portfolio of British Museum websites and did a presentation on what we’re finding out from using it. It highlighted lots of interesting usage stats for certain sites – false figures in some cases due to dodgy site structures and pop ups etc.

    See http://www.finds.org.uk/wordpress/index.php/271

    Anyone know how I can determine if a website is blocked by the Great Chinese firewall? I’m really bemused about lack of traffic from China.

    Ultimately, I think it shows that the DCMS needs to rethink the way they collect statistical data from national Museums and cultural institutions.
    Re: api – there are a few articles that claim to make use of GA data eg: http://www.thinkingphp.org/2006/06/19/google-analytics-php-api-cakephp-model/ and a java based XML api
    http://code.google.com/p/janalytics/wiki/quickstartguide

    Don’t think they actually work that well yet!

    Dan

  • Seb Chan

    Hey Dan

    That is a super excellent presentation. I really enjoyed the detail and it was pretty comprehensive. Thanks for sharing.

    I like how you have tweaked it to deliver the financials based on ecommerce transactions. That is the extra step most people don’t take.

    Seb

  • http://www.museumlab.org Juha

    Dan,

    perhaps this website is what you are looking for: http://www.greatfirewallofchina.org/ . It is a project that is run here in Amsterdam. I met them a year ago, when they held the pre-launch presentation.

    Looks like it works pretty well: your site seems to be blocked (as is ours).

    Juha

  • Seb Chan

    Dan, Juha

    That Dutch project is cool except I’m not sure that it is accurate at the moment. As it says, it relies on only one server in China for pingback and that it may be unreliable.

    I ran http://www.powerhousemuseum.com through the site and it came up blocked too . . . except our bloggers in China, rural China to be precise, who are using internet cafes are frequently connecting and updating the blog from China. They have found Wikipedia to be blocked but Powerhouse is fine.

    I would surmise that the reason for low traffic from Asia is a matter of language and language specific search. We have had huge traffic spikes from China for some image leeching – one prominent Chinese blog used an image of a Chairman Mao cap from a school case study on our site and in 3 hours we had 51,000 visitors from China. We scrubbed them from our reported stats because they just leeched the image (which was deep linked) and left.

    We almost made our recent Great Wall of China exhibition bi-lingual and we did actually do tri-lingual versions of our 1000 Years of the Olympic Games content back in 2000 . . . however even when we have done translations the traffic hasn’t been as much as we expected.

    We do, however, get quite a lot of public enquiries from the subcontinent and China from traders and exporters who think our collection database is some kind of list of products we sell – and they offer their products at wholesale prices to us! Some even try to place orders . . . we had one a few weeks ago for several thousand tonnes of steel . . . Surprisingly this isn’t spam and they are actually manually filling out our ‘contact us’ form!

  • http://www.finds.org.uk Dan

    Seb
    Thanks for reading that, if you’re interested I can send you our report for the year to date and you can see more.

    Juha
    I’d seen that site, but hadn’t thought it was very reliable. The reason why I am surprised is the number of Chinese tourists I see in the Museum at the moment allied to the fact that the warriors from Xian are here later in the year. Management were rather surprised!
    D

  • http://www.museumlab.org Juha

    Seb, Dan,

    seems I wasn’t too critical about this one. I can ask the Rijksmuseum webmaster how he deals with this, since they have mainly tourists visiting their museum.

    Juha

  • http://www.indianhandicraftsexporters.com Samay

    We have just installed Google Analytics on our website. The classic version, that is the old format, is much easier for us as we are new to it. However the new one is a bit complex.

  • Rob

    My organization has been running WebTrends for 40+ web sites averaging at least 1 million page views each for the past 4 years.

    I migrated us to the WebTrends SmartSource Data Collector (SDC) method of capturing web usage information last Fall. This uses a javascript that is embedded within each page and makes a call back to a customized Apache server.

    I also implemented Google Analytics at the same time last year. I was interested in the adwords integration.

    My findings:
    1. Log files aren’t accurate. Due to client-end caching services (we use Akamai) or ISP caching you won’t realize all actual visitors to your site.
    2. The SDC method is most accurate. This ignores any system that doesn’t interpret the page (e.g. spiders), but that’s fine.
    3. The GA numbers are consistently around 50% lower than the SDC numbers.

    I’ve validated these numbers to ensure I’m not double counting anywhere. GA is just not accurate at collection.

    GA has great reports and I like the approach, but without accurate collection it’s not a worthy tool.

  • http://www.freshandnew.org/ Seb Chan

    Hi Rob

    I know that GA doesn’t count as much as WT does but I would also suggest that GA is going to be much more aggressive and up to date with spiders and bots – especially those that don’t identify themselves as such. Also, GA’s measure of visits is different from WT – and WT bases its licensing on page views . . .

    Anyway, I’m not so sure that actually measuring page views or visits is necessarily the point of analytics – rather that we should be concentrating on attention metrics (time spent) and conversions (what people actually do).

  • Rob

    I agree with time spent and conversions, but would add it should really be focused to the needs of the specific site. I have some sites that are very focused to visitors and conversions. I have others where time on site, loyalty (returning visitors), and gross usage growth are important.

    Spiders by default don’t interpret pages, therefore they won’t execute the javascript. If they don’t execute the javascript then no record is made in the logging server. Unless the people are being very sneaky about disguising their headers and also having spiders interpret pages, they’d be excluded.

    I’d like for GA to be better, but I just can’t trust the inconsistency. Even when I look through other tools GA is way off.
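
The filtering discussed in the comments above (dropping self-identified spiders and bots, plus any visit shorter than a second or two, before comparing log-based figures with page-tagged ones) can be sketched roughly as follows. This is only an illustration under stated assumptions: the Visit record, the BOT_MARKERS list, the 2-second threshold and the function names are all hypothetical and do not come from Web Trends, Summary or Google Analytics. A real stats package would use a much longer, regularly updated bot list.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical substrings for spotting crawlers that identify themselves
# in the User-Agent header. Real lists are far longer.
BOT_MARKERS = ("bot", "spider", "crawl", "slurp")


@dataclass
class Visit:
    """One visit reconstructed from server logs (illustrative shape only)."""
    user_agent: str
    duration_seconds: float


def is_probable_bot(user_agent: str) -> bool:
    """True if the user agent contains one of the known crawler markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def human_visits(visits: Iterable[Visit], min_duration: float = 2.0) -> List[Visit]:
    """Keep visits that are not self-identified bots and last at least min_duration seconds."""
    return [
        v for v in visits
        if not is_probable_bot(v.user_agent) and v.duration_seconds >= min_duration
    ]


def gap_percent(log_count: int, tagged_count: int) -> float:
    """How far the page-tagged count falls below the log-based count, in percent."""
    if log_count == 0:
        return 0.0
    return 100.0 * (log_count - tagged_count) / log_count


# Made-up sample data, for illustration only.
sample = [
    Visit("Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0", 45.0),
    Visit("Mozilla/5.0 (compatible; Yahoo! Slurp)", 0.2),
    Visit("Mozilla/4.0 (compatible; MSIE 6.0)", 0.5),  # e.g. a deep-linked image request
    Visit("Googlebot/2.1", 1.0),
]
kept = human_visits(sample)
print(len(kept), "of", len(sample), "visits kept")
print("gap vs raw log count:", gap_percent(len(sample), len(kept)), "%")
```

Running the same filter over a month of log-derived visits and comparing the result with the page-tagged total for that month would show how much of the gap is explained by bots and very short visits, and how much remains unexplained.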