Conferences and event reports Web metrics

The value of museum content, attention, and time

It is that time of year again.

In a few weeks’ time I’ll be running the nth iteration of my annual ‘web metrics for museums’ workshop at Museums and the Web. This year I’m joined by the Smithsonian’s analytics guru Brian Alpert. As usual we will be working through the realities of a museum’s web presence, the new ways to measure how it is performing, and how to communicate that to the rest of the organisation.

Every year it gets harder.

There are now more people than ever before with access to the web, and with that comes the unrealistic expectation from management that those new web users are going to flock to a museum’s content, even though it was likely never created or designed with them in mind.

Let me digress.

I spent most of my spare time in my twenties and early thirties involved in music. My friends and I put on a huge number of gigs, we toured international artists, put out some CDs, ran a weekly club night for a decade, put on festivals, ran a music magazine, and did a weekly radio show on public radio (equivalent of US college radio) for nearly two decades.

We were doing this just as the web became mainstream, and the way that music was distributed and consumed, and the cultures that grew around it, were in rapid transformation. The music scene that we were involved in was niche but not small – some of the larger parties drew as many as 4000 – and there were only one or two international tours that we lost money on. In a city the size of Sydney that wasn’t too bad. The value of what we did in those years was best measured in its long-term impact – not on an event-by-event basis.

We knew how to make it work financially but over the years we also realised that there was a difference between ‘growing a scene’ and ‘sustaining a community’.

The former reaches a point at which the bubble bursts and the scene rapidly contracts, whilst the latter keeps supporting the social needs of the people involved as they get older, their tastes change, and in some cases, pair off into domesticity.

What the web brought to music was two-fold. First, it opened the gates for ‘publishing’ – anyone could upload their music, release it, and cut out (or downgrade) the middleman. Second, it opened the gates for ‘fans’ – anyone could, in theory, get access to all this music, talk about it, and build communities around it by themselves.

Music discovery metastasized. Personal networks exploded globally, record stores began to be eaten by chains and then die, music media was no longer constrained by ‘issues’ and freight, and then Napster/SoulSeek/torrents took over at the turn of the millennium. Online music media, YouTube and Spotify and similar services have replaced much of what there used to be in terms of music magazines (especially NME/Melody Maker in the 1980s), record stores and music discovery through radio.

So what we have is easier publication, easier access, and transformed discovery. (Arguably music has gained more than it has lost, although that doesn’t mean musicians have gained.)

What didn’t change was people’s time to listen to music, or their urge to listen to music. Listeners just don’t have more hours in their days.

It is worse for museums.

We make short videos. We record long epic lectures. We write essays and ebooks. We publish these online. We ‘effectively utilise social media’ (whatever that means these days). And then we foolishly expect that the world is all going to rush to watch/listen/read them.

But we misunderstand the value of what we’ve made. Unlike the transactional parts of our websites, these are all things that will only reveal their value over the long term.

We barely manage to create the time and momentum for people to interrupt their busy lives and spend their precious spare time visiting a museum – how can we expect it to be any different with our online content?

If you have doubts, the Culture24 Let’s Get Real project reports are essential reading.

It’s not just museums; everyone is struggling with this.

More at Museums and the Web in Baltimore.

User behaviour Web metrics

Let’s Get Real report from Culture24 now available

Over in the UK right now Culture24 are launching a report I worked on with them and many of the major cultural institutions in the UK. Coming from a need amongst web/digital people to find better ways of measuring the effectiveness of their work in the sector, the report – Let’s Get Real – pulls together analytics data from three years of activities online and in social media, and makes a number of recommendations aimed at kickstarting, in the words of Culture24 Director Jane Finnis, “a dramatic shift in the way we plan, invest and collaborate on the development of both the current and next generation digital cultural activities”.

The inability to effectively communicate the connection between digital projects and the institutional mission is an ongoing concern for everyone working in museums. At a time when there are increasing calls for museums to take on roles more akin to broadcasters and publishers in the digital space, yet most internal and external stakeholder value is still perceived as coming from visits to exhibitions and buildings, there is a pressing need to keep thinking about the ways digital projects report success (or otherwise!).

From my perspective, working with this diverse group of institutions was a lot of fun and very illuminating. It helped consolidate much of my thinking about the state of digital projects in the cultural sector and the long road ahead to really transform the way museums in particular (less so the performing arts) use and adequately resource digital in their institutions. At the same time there were many surprises – the very different geographies of online visitors between institutions, and the comparatively low impact of social media in terms of website traffic, even for particularly well-promoted campaigns, were revealing. The social media work by Rachel Clements also demonstrated that the easy option – reporting the numbers – greatly undersells the value of social media. The alternative, qualitative analysis, is much harder and requires more time and an understanding of why you are active in social media in the first place.

Have a read of the report (PDF) and see what you think.

For those involved in the project there was a lot more than number crunching – there were some amazingly productive working sessions and meetups – and the launch conference that is taking place right now in Bristol (check the #C24LGR hashtag conversations!). In many ways the report captures only a fragment of the ‘value’ of the project as a whole.

Developer tools Web metrics

Fixing document download and link tracking with the Google Analytics asynchronous tracking code

If you’ve been using the gatag.js from Good Web Practices in conjunction with your Google Analytics code for the past few years you may have noticed that it stopped working when you updated to the newer, better asynchronous Google Analytics tracking code.

What was nice about the gatag.js code was that it was quick and easy to implement and tracked downloads of PDFs and other file types as well as traffic following any outgoing links. For cultural institutions which are full of such PDFs and external links, tracking these as distinct ‘EVENTS’ in Google Analytics was very useful for understanding user behaviour on your site.

There’s not been a clean and simple fix for this problem and until I saw Stephen Akins’ solution using jQuery I thought we’d have to go back to other methods.

Our developer, Carlos Arroyo, made some minor modifications to Stephen’s code so that the way in which downloads and external links appeared in the reports would stay the same as those used by gatag.js allowing for historical comparisons.

First remove your references to gatag.js then place this after your Google Analytics asynchronous tracking code. (If you already load jQuery on your site then you probably want to check the version and you can omit the first section.)

(And of course you use this at your own risk!)

<script type="text/javascript">
	if(typeof jQuery != 'function'){
		// jQuery isn't already loaded, so write a script tag for it.
		// (The src attribute was empty in the original post - point it
		// at wherever your copy of jQuery lives.)
		var script = '<script type="text/javascript" src=""><\/script>';
		document.write(script);
	}
</script>

<script type="text/javascript">
	$(document).ready(function(){
		// Attach the tracker to every link on the page.
		$('a').click(function(){
			var href = $(this).attr('href');
			if(!href){ return; }
			var href_lower = href.toLowerCase();
			// Track document downloads as 'Downloads' events, using the
			// same category and labels that gatag.js used.
			if(href_lower.substr(-3) == "pdf" || href_lower.substr(-3) == "xls" || href_lower.substr(-3) == "doc" ||
			   href_lower.substr(-3) == "mp3" || href_lower.substr(-3) == "mp4" || href_lower.substr(-3) == "flv" ||
			   href_lower.substr(-3) == "txt" || href_lower.substr(-3) == "csv" || href_lower.substr(-3) == "zip") {
				_gaq.push(['_trackEvent', 'Downloads', href_lower.substr(-3), href]);
			}
			// Track links pointing off-site as 'Outbound Traffic' events.
			if(href_lower.substr(0, 4) == "http") {
				var domain = document.domain.replace("www.",'');
				if(href_lower.indexOf(domain) == -1){
					href = href.replace("http://",'');
					href = href.replace("https://",'');
					_gaq.push(['_trackEvent', 'Outbound Traffic', href]);
				}
			}
		});
	});
</script>

User behaviour Web metrics

A/B headline switching for museum content

Regular readers will know that I’ve been fascinated by the overlap between museum curatorial practice and journalism over the past while. Similarly I’ve also been very interested in the impact of behavioural data on these professions that is emerging at scale and in real-time on digital platforms.

So I was very excited to find that a ‘headline tester’ plugin for WordPress has been released out of a Baltimore Hackcamp.

You will have noticed how the headlines on news websites change throughout the day for the same article. This has been the subject of several online projects like The Quick Brown that tracked changes in Fox News headlines, and News Sniffer that tracks full article edits in the UK.

This sort of A/B testing usually takes a lot of work and planning, and is hard to deploy at a daily level with the kind of resources that museums have available to them. In news journalism time is of the essence – readership fluctuations directly impact the commercial model in a highly competitive environment – so it makes a lot of sense to have systems in place for journalists to track and edit their stories as they go. Museums don’t face these pressures but do face the same competition for attention.

What this plugin allows us to do is – like a news website – pose two different headlines for the same blog post, then, over time, the one that generates the most clicks becomes the one that sticks for that post. Visitors and readers effectively vote through their actions for the ‘best’ title.
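Under the hood, a headline tester of this kind just counts impressions and clicks per variant and lets the best click-through rate win. Here is a minimal sketch of that logic in JavaScript – the function names are mine, not the WordPress plugin’s actual API:

```javascript
// Sketch of click-voting between headline variants. Illustrative only -
// not the plugin's real implementation.
function makeHeadlineTest(variants) {
  const stats = variants.map(function (title) {
    return { title: title, impressions: 0, clicks: 0 };
  });

  return {
    // Serve the variant with the fewest impressions so exposure stays even.
    serve: function () {
      const v = stats.reduce(function (a, b) {
        return a.impressions <= b.impressions ? a : b;
      });
      v.impressions += 1;
      return v.title;
    },
    // Record that a reader clicked through on a given headline.
    recordClick: function (title) {
      const v = stats.find(function (s) { return s.title === title; });
      if (v) { v.clicks += 1; }
    },
    // The variant with the best click-through rate 'sticks'.
    winner: function () {
      return stats.reduce(function (a, b) {
        const ra = a.impressions ? a.clicks / a.impressions : 0;
        const rb = b.impressions ? b.clicks / b.impressions : 0;
        return ra >= rb ? a : b;
      }).title;
    }
  };
}
```

Visitors never see the bookkeeping: they are simply shown one headline or the other, and their clicks act as the votes.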

We’ve just started to deploy this on the Photo of the Day blog and it will progressively roll out over the others as we go.

Today’s Photo of the Day post introduces a camera from our collection. So which out of these two headlines do you think would generate the most traffic?

Are you interested in hearing about our camera collection?
The Bessa 66 folding camera

Paula Bray who wrote the post expected the first headline would be most popular. And now we can test that hypothesis!

Surprisingly, right now it is the second more direct headline – ‘The Bessa 66 folding camera‘ – that is generating the most traffic by almost 2 to 1.

Over time we will be able to better refine the headlines written by curators and other staff who blog. And of course this feeds back into improving the effectiveness of the museum’s writing style in these digital media.

User behaviour Web metrics

Testing an engagement metric and finding surprising results

As regular readers know I’ve been working on web metrics for a few years now and experimenting with different models for cultural institutions. So it was with interest I read Philly.com’s equation for online engagement over at Nieman Journalism Lab.

… two months ago Philly.com, home of the Philadelphia Inquirer and Daily News, began analyzing their web traffic with an “engagement index” — an equation that goes beyond pageviews and into the factors that differentiate a loyal, dedicated reader from a fly-by. It sums up seven different ways that users can show “engagement” with the site, and it looks like this: Σ(Ci + Di + Ri + Li + Bi + Ii + Pi)


One possibility they considered was measuring engagement simply through how many visitors left comments or shared content on a social media platform. But that method “would lose a lot of people,” Meares said. “A lot of our users don’t comment or share stories, but we have people — 45 percent — [who] come back more than once a day, and those people are very engaged.”

They ultimately decided on seven categories, each with a particular cutoff:

Ci — Click Index: visits must have at least 6 pageviews, not counting photo galleries
Di — Duration Index: visits must have spent a minimum of 5 minutes on the site
Ri — Recency Index: visits that return daily
Li — Loyalty Index: visits that either are registered at the site or visit it at least three times a week
Bi — Brand Index: visits that come directly to the site by bookmark or by typing the URL, or that come through search engines with keywords like “philly.com” or “inquirer”
Ii — Interaction Index: visits that interact with the site via commenting, forums, etc.
Pi — Participation Index: visits that participate on the site via sharing, uploading pics, stories, videos, etc.

Philly’s equation draws heavily on Eric T. Peterson and Joseph Carrabis’ “Measuring the Unmeasurable: Visitor Engagement” (PDF).

I started thinking about how to apply this equation to the Powerhouse’s web metrics.

Click (6 pages or more) and Duration (5 minutes or more) indexes are fine. However, Recency set at daily visitation is simply not achievable for museums – especially when through-the-door museum visitors are likely to average around one visit a year – and our online content is never going to be as responsive as ‘news’ has to be. So in thinking about Recency I settled on a 90-day figure.
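Those adapted cutoffs can be sketched in code. This is only an illustration – the visit field names are invented, and how many indexes a visit must satisfy to count as ‘high value’ is a judgment call (here, any one):

```javascript
// Sketch: classify a visit as 'high value' using museum-adapted
// Philly-style cutoffs. Field names on the visit object are illustrative.
function isHighValueVisit(visit) {
  const click = visit.pageviews >= 6;             // Click index: 6+ pages
  const duration = visit.secondsOnSite >= 5 * 60; // Duration index: 5+ minutes
  const recency = visit.daysSinceLastVisit <= 90; // Recency: 90 days, not daily
  // The published equation sums indicator variables; here a visit
  // qualifies if it meets at least one index.
  const score = [click, duration, recency].filter(Boolean).length;
  return score >= 1;
}

// Share of visits in the high-value segment.
function highValueShare(visits) {
  const n = visits.filter(isHighValueVisit).length;
  return visits.length ? n / visits.length : 0;
}
```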

Here’s an eight quarter look at how we’ve been tracking against a variant of this metric – downplaying the interaction and participation indexes as our content type and site doesn’t work evenly for these.

I’ve added a column for Sydney-only visitors so you can get a sense of how geographically specific this engagement metric is for a museum such as ours.

Quarter Philly-style High Value % Philly-style High Value Sydney %
Q3 2010 3.73% 8.10%
Q2 2010 3.20% 7.78%
Q1 2010 2.38% 7.69%
Q4 2009 1.60% 5.56%
Q3 2009 1.73% 5.14%
Q2 2009 1.75% 5.67%
Q1 2009 2.12% 7.24%
Q4 2008 1.45% 4.59%

Taking a closer look at Q3 2010 and the Sydney Philly-style high-value segment there are some interesting data.

This apparently highly-engaged segment comprises 8.10% of all Sydney traffic to the Powerhouse website for the period. 71.25% of this segment are new visitors to the Powerhouse, looking at a remarkable average of 17.3 pages per visit and spending an average of 19:44 minutes on the site up until the final page of their visit. These are clearly a highly desirable group of web visitors.

So what do they do?

Interestingly it turns out that these are primarily what we used to call ‘traditional education visitors’. I’ve written about them before in my paper for Museums & the Web earlier in the year.

31.47% visit Australian Designers at Work, a resource built and last modified in 2004
15.45% visit Australia Innovates, a curriculum resource built in 2001
7.58% visit exhibition promotional pages
7.54% visit the online collection

Perhaps unsurprisingly for such committed, but traditional, web visitors, they also accounted for 50% of the online membership purchases during the period.

Collection databases User behaviour Web metrics

Actual use data from integrating collection objects into Digital NZ

Two months ago the New Zealand cultural aggregator Digital NZ ingested metadata from roughly 250 NZ-related objects from the Powerhouse collection and started serving them through their network.

When our objects were ingested into Digital NZ they became accessible not just through the Digital NZ site but also through all manner of widgets, mashups, and institutional websites that had integrated Digital NZ’s data feeds.

So, in order to strengthen the case for further content sharing in this way, we used Google Analytics’ campaign tracking functionality to quickly and easily see whether users of our content in Digital NZ actually came back to the Powerhouse Museum website for more information on the objects beyond their basic metadata.
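Campaign tracking works by appending the standard Google Analytics utm_* parameters to the links served through the aggregator, so inbound clicks show up as a distinct campaign in the reports. A minimal sketch – the parameter values here are illustrative, not the ones we actually used:

```javascript
// Sketch: tag an object URL with Google Analytics campaign parameters so
// visits arriving via an aggregator such as Digital NZ are attributable.
// The utm_* names are GA's standard campaign parameters.
function tagCampaignUrl(baseUrl, source, medium, campaign) {
  const params = [
    'utm_source=' + encodeURIComponent(source),
    'utm_medium=' + encodeURIComponent(medium),
    'utm_campaign=' + encodeURIComponent(campaign)
  ].join('&');
  // Append with '?' or '&' depending on whether a query string exists.
  return baseUrl + (baseUrl.indexOf('?') === -1 ? '?' : '&') + params;
}
```

Once the feed carries tagged URLs like these, the aggregator’s traffic appears under Campaigns in Google Analytics with no further instrumentation.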

Here’s the results for the last two months.

Total collection visits from Digital NZ – 98 (55 from New Zealand)
Total unique collection objects viewed – 66
Avg pages per visit – 2.87
True time on site per visit (excluding single page visits) – 11:57min
Repeat visits – 37%

From our perspective these 55 NZ visitors are entirely new visitors (well, except for the 8 visits we spotted from the National Library of NZ, who run Digital NZ!) who probably would never have otherwise come across this content, so that’s a good thing – and very much in keeping with our institutional goals of ‘findability’.

For the same period, here are the top 6 sources for NZ-only visitors to the museum’s collection (not the website as a whole) –


Remember that the Digital NZ figure is for around only 250 discrete objects and so we are looking at just under 1 new NZ visitor a day to them via Digital NZ, whereas the other sources are for any of the ~80,000 collection objects.

However, I don’t have access to the overall usage data for Digital NZ so I can’t make a call on whether these figures are higher, lower, or average. But maybe one of the Digital NZ team can comment?

Conceptual User behaviour Web metrics

Museum implications of the Columbia report on metrics for digital journalism

Web analytics is a tricky game and often the different ways of measuring things confuse the very people they are there to help make better decisions.

For the museum sector, analytics seems even more foreign, largely because we’ve never had a very good way of generating such huge amounts of quantitative data about our visitors before.

We’re not alone in this.

As you’ve probably read in recent weeks there has been a fair bit of discussion, debate, and doomsday predictions coming out of the journalism world as it was revealed that, lo and behold, newspapers were using web analytics in their newsrooms.

This month, though, Lucas Graves, John Kelly and Marissa Gluck at the Tow Center for Digital Journalism at Columbia University, have published an excellent report on the different types of traffic metrics that news websites are confronted with.

Provocatively titled Confusion Online: Faulty Metrics & the Future of Digital Journalism, the report explains the history and the reasons for the widely divergent figures that result from different types of reader measurement – panel-based and census-based approaches.

A lot of these reasons have to do with who the figures are being generated for, and the historical role that readership figures have played in the pricing and sale of advertising. So we need to take this into account when we in the museum sector work with the same types of measurement tools.

Indeed, the resistance to shifting from the historical panel-based measurement to site-based (or as the authors call it, census-based) measurement is largely to do with the enormous commercial implications for how advertising is priced and sold that would result. (Fortunately museums cannot afford the panel-based solutions so we’re already mostly committed to census-based site analytics.)

There are two telling sections –

This is the case at both the New York Times and the Wall Street Journal, which sell most online inventory on a CPM [cost per thousand impressions] or sponsorship basis and do not participate in ad networks (other than Google’s AdSense, which the Times uses). “We sell brand, not click‐through,” declares the Journal’s Kate Downey flatly. “We’re selling our audience, not page counts.”

Marc Frons echoes the sentiment, pointing out that the Times can afford to take the high road. “For us as the New York Times, brand is important,” he says. “You really want to make the Internet a brand medium. To the extent CPC [cost per click] wins, thatʹs a bad thing.”


. . . the rise of behavioral targeting represents a distinct threat to publishers: By discriminating among users individually, behavioral targeting diminishes the importance of a site’s overall brand and audience profile. Suddenly the decisive information resides not with the publisher but in the databases of intermediaries such as advertising networks or profile brokers. A similar threat may be emerging in the domain of demographic targeting. As it becomes more possible to attach permanent demographic profiles to individual users as they travel the Web, the selection of outlets will matter less in running a campaign.

This is why online media outlets tend not to participate in third‐party ad networks if they can avoid it. “We donʹt want to be in a situation where someone can say, ‘I can get you somebody who reads the Wall Street Journal while theyʹre on another site that costs half as much,’” explains Kate Downey.

Museums and others in the cultural sector operate on the web as (mostly) ad-free publishers. We’ve traditionally thought of our websites as building the brand – in the broadest possible terms. In fact we don’t usually use the term ‘brand’ but replace it with terms like ‘trustworthiness’. Now we’re not ‘selling ad space’ but we are trying to build a loyal visitor base around our content – and that relies on building that ‘trustworthiness’ and that only happens over time and through successful engagement with our activities.

We invest in making and developing content other than the opening hours and what’s on information – the brochure parts of our web presences – because it builds this sense of trust and connection with visitors. This sense of trust and connection is what makes it possible to achieve the downstream end goals of supporting educational outcomes and the like.

But just as the base unit of news becomes the ‘article’, not the publication, we are also seeing the base unit of the ‘online museum experience’ reduce from the website (or web exhibit) to the objects, and in some cases to just being hyperlinked ‘supporting reference material’. This is where we need to figure out the right strategies and rationales for content aggregation; unless we do, this is going to continue to cause consternation.

We also need to pay a lot more attention to the measurement approaches that best support the needs we have, which differ from those of advertising-supported publishers.

Web 2.0 Web metrics

Tip #461: Segmenting and counting Facebook fans with the Ad Planner tool

Another thing that has emerged from the web analytics discussions has been the lack of clarity over how to consider the success or otherwise of museum Facebook fan pages. Not surprisingly there is a lot of superficial focus on the total number of fans, but this doesn’t give the necessary granularity you are going to need to justify the investment in these platforms going forward.

Is a museum with 100,000 fans doing better than one with 10,000 fans? Maybe not if both have 5,000 fans from their home city. Worse, what if a considerable number of your Facebook fans were other museum professionals! But how would you discover this?

One very very simple thing you can do is to use the Facebook Ad Planner tool to interrogate and segment your fans (and those of others as well!).

To do this, go to any Facebook Fan Page you are an administrator for. (You can create a new one if you need to.) In the right hand column you will see an advertisement encouraging you to ‘Get more connections’. Click it.

Next you will land at a page that looks like this. Just click ‘Continue’.

Now the useful part.

Now on the screen you should have ‘Targeting’. Here’s where you become brutally aware of what happens to your data when you become a ‘fan’ of something, join a group on Facebook, or list an interest in your profile. Yes, you are now a target market.

You now need to select a country (and then you can drill down into a city or region). You can add up to 25 countries, and you can also tweak demographic facets like ‘age’ and ‘gender’ if you want.

Now in ‘Likes and interests’ start typing and choose another organisation or topic. Once selected you will see the ‘Estimated reach’ box in the right hand column update. That’s the information you want.

Here’s some from our profile.

Now it looks like there might be 40 people in the UK or USA who express a ‘like’ for us but haven’t yet become ‘fans’ on the fan page.

And we could definitely reach more people in Sydney who like the Art Gallery of NSW but not yet the Powerhouse! And you can see how that also gives us an insight into the geographic segmentation of our friends over at the Art Gallery of NSW‘s near 10K fans, as well as a better comparative picture of how we are going. Not surprisingly The Art Gallery of NSW are doing a great job – much better than us!

Go on, try it out for yourself. Better to know how the tools you unwittingly contribute data to actually work than not.

User behaviour Web metrics

Which social web platforms create the most return visitors to our website?

I’m in Europe right now doing a slew of web analytics health checks, workshops and evaluations to help various institutions get the most out of their digital initiatives in a rapidly contracting financial environment. Everyone is rushing to figure out which initiatives are performing better for them than others – especially as decisions need to be made as to which ‘experiments’ are worth continuing and which have been ‘learning experiences’.

In several workshops so far the ‘return visitor’ has been highlighted as a valuable key user of digital resources. Return visitors, the argument goes, are more likely to be engaged with the organisation (and the ‘brand’), and also more likely, where geographically possible, to engage with the institution offline as well as online. And, at a time where we are all tweaking our digital content strategies, design and interfaces, they are also the visitors with whom we can measure the relative effectiveness of techniques.

And so one of the questions raised more than once has been: which, out of Flickr, Wikipedia, Facebook and Twitter, is best at turning casual visitors into return visitors?

Now obviously the intentions of visitors who come in from these third-party sites are going to differ (not to mention the difficulties in accurately tracking visitors from Twitter), but we’re interested in the broad patterns.

I did some digging through six months (March to August) of Powerhouse data and this is what I found.

Unsurprisingly, organic search generates 72.34% of site visitation. 20.36% of this traffic are return visitors.

Direct traffic (browser bookmarks, typing the URL, etc) generates 13.88% of site visitation. 16.05% of this are return visitors. Interestingly this skews low because of the inclusion of several very popular educational resources in curriculum kits – students follow a very task-oriented link given by their teacher and don’t look around or come back.

Third party referring sites (people following links from other websites) as a whole generate 13.21% of site visitation. 20.36% of this are return visitors.

So let’s break down those top referring sites and look at traffic coming in from Wikipedia, Flickr, Twitter, and Facebook. None of these are generating enormous volumes of traffic but there are significant differences between them.

Site % of total visits % repeat
Wikipedia 0.63% 11.95%
Flickr 0.28% 42.64%
Facebook 0.49% 32.74%
Twitter 0.18% 34.50%
All referrers 13.21% 20.36%
Overall (100%) 20.60%
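The table above boils down to a simple segmentation: for each referrer, its share of total visits and the fraction of its visits that are repeats. A sketch of that computation, with invented visit records:

```javascript
// Sketch: per-referrer traffic share and repeat-visit rate.
// Visit records here are illustrative ({referrer, isRepeat}).
function referrerBreakdown(visits) {
  const byRef = {};
  visits.forEach(function (v) {
    const r = byRef[v.referrer] || (byRef[v.referrer] = { total: 0, repeat: 0 });
    r.total += 1;
    if (v.isRepeat) { r.repeat += 1; }
  });
  const out = {};
  Object.keys(byRef).forEach(function (ref) {
    const r = byRef[ref];
    out[ref] = {
      shareOfTotal: r.total / visits.length, // '% of total visits' column
      repeatRate: r.repeat / r.total         // '% repeat' column
    };
  });
  return out;
}
```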

Of the four sites we are interested in, Wikipedia delivers the most traffic. However it brings the lowest percentage of return visitors at only 11.95%. This is well below the site average and also well below the average for all referring sites.

Facebook is next: 32.74% of its visits are return visitors, well above the site average. Flickr brings the highest proportion of return visitors at 42.64%, whilst Twitter also performs well at 34.50%.

So ranked by traffic volume Wikipedia is a clear winner, but in terms of those valuable return visitors the list transforms, with Flickr bringing proportionally the most returning visitors.

Flickr is delivering nearly 3.5x the return visitor proportion of Wikipedia, and the two social communication platforms, Facebook and Twitter, almost 3x as much.

Thinking about why Wikipedia performs so poorly as a source of return traffic, it is clear that there is a difference in the user intentions. A visitor coming in from Wikipedia is likely coming for additional information on a subject or topic. But it looks like there is minimal brand association with that information retrieval goal – they get the answer and don’t explore further at a later date. This is what I’d call the ‘trivia quiz’ visitor.

I looked at which Wikipedia articles were sending the most traffic and the top five were a little unexpected. Wikipedia articles in order of volume of traffic were on Thrust bearing, Easy edges, Powerhouse Museum, Crumpler, Liberty bodice and a long tail of several hundred others. Other than ‘Powerhouse Museum’ this traffic is the equivalent of the casual visit traffic we also receive via the long tail of search – but is less likely to return to the site later.

Informational websites deal increasingly with entirely commoditised content, and this throws up the issue of where to dedicate resources.

The effort expended in the more social web platforms – social communication platforms (Facebook, Twitter) and social object platforms like Flickr – is working to create more valuable return visitors than the informational sites like Wikipedia and organic long tail search.

I was a bit surprised by this result so I narrowed it down a bit and looked at only traffic from Sydney. Here’s the results.

Site % total Sydney visits % Sydney repeat
Wikipedia 0.35% 26.87%
Flickr 0.20% 32.31%
Facebook 0.82% 46.43%
Twitter 0.16% 37.34%
All referrers 12.81% 34.09%
Overall (100%) 34.18%

For Sydney-only traffic, Wikipedia performs much better in terms of generating return visitors – but it is greatly outpaced as a traffic source by Facebook. Here we find that it is clearly the social communication platforms that are generating the repeat visitation as well as the volumes.

Of course the overall volumes here are very low so there is a fair degree of statistical error creeping in but this is something I’ll be keeping an eye on – I’m certainly interested in why Wikipedia is creating proportionally more repeat visitors in Sydney than globally and whether this correlates to some notion of ‘brand awareness’.

More questions than answers.

MW2010 User experience Web metrics

Tracking what gets ‘used’

There’s been a fair bit of excitement around the traps today about the revelation of Amazon’s tracking of highlighting on their Kindle devices.

In fact this sort of interaction tracking has been going on on the web for quite a while – but the Kindle example is one of the first where this data is being used to encourage serendipitous discovery and interest.

I started doing some work around this on the Powerhouse collection site in July last year and it forms the basis of the paper I presented at Museums and the Web this year (as well as briefly mentioning it at Webstock in February).

We’ve been trying to figure out alternative ways of measuring the success or otherwise of making large amounts of our content available on the web. Traditional web metrics just don’t cut it – millions of views of your content isn’t really helpful in improving the content you make available. And whilst qualitative research is invaluable it is generally expensive and just doesn’t scale.

So in July last year we started using a tool called Tynt Tracer.

What Tynt does is intercept cut & paste using Javascript. It records what is copied and inserts the license information and a unique hyperlink into the clipboard buffer. We chose Tynt because it was the least intrusive and most anonymous of the options available for the task (there are quite a number of similar solutions out there). Tynt was also the option that made the least mention of ‘enforcement’ – which seems to be the selling point of the others.

We aren’t interested in ‘enforcement’ or preventing visitors from cutting and pasting content – we are primarily interested in learning which parts of our content are the most useful to cut & pasters, and where they end up, so we can improve the content and its structure.
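The general idea can be sketched without Tynt’s actual code: intercept the copy, let very short copies pass through untouched, and append license and linkback text to longer ones. Everything here is illustrative – the 7-word threshold mirrors the ‘search copies’ cutoff reported later in this post:

```javascript
// Sketch of copy-interception (not Tynt's real implementation).
// Short 'search copies' pass through untouched; longer copies get
// license and linkback text appended.
var SHORT_COPY_WORDS = 7; // illustrative threshold

function buildCopyPayload(selectedText, pageUrl, license) {
  const words = selectedText.trim().split(/\s+/).filter(Boolean);
  if (words.length <= SHORT_COPY_WORDS) {
    return selectedText; // likely destined for a search box; leave as-is
  }
  return selectedText + '\n\nRead more: ' + pageUrl + '\n' + license;
}

// In a browser this would be wired to the copy event, roughly:
// document.addEventListener('copy', function (e) {
//   var text = String(window.getSelection());
//   e.clipboardData.setData('text/plain',
//     buildCopyPayload(text, location.href, 'Licensed CC BY-NC'));
//   e.preventDefault();
// });
```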

Here’s what Tynt says about their service.

Tynt Insight anonymously detects when content is copied from your site, and can help determine what they are doing with it. At Tynt we believe content copying can be beneficial to the site owner. We find that most people copy content innocently because they are your fans. They copy content to either preserve it for themselves or to share it. Half of copied content is still shared by email because it is still the easiest and most familiar way to share content.

My paper explores how we applied this in a fair bit of detail as well as some of the findings of roughly six months’ worth of data. Suffice to say, it isn’t perfect and the paper ended up revealing that there is far less educational use of our collection in schools than we hoped for (education users being the ones we’d expect would most likely cut & paste!) – but that’s another blogpost.

Nearly 3 million words had been cut and pasted during the sample period. That’s possibly a better measure of the success, or ‘usefulness’, of our collection metadata than object views.

During a six-month period, 20,749 copies were made: 5% of these copies were images – predominantly thumbnails and, curiously, the Museum’s corporate logo; 36% (7,601) were copies of 7 words or fewer. Tynt calls these ‘search copies’, implying they were most likely destined for a search box. Search copies do not have licence and linkback text appended to them. The remaining 58% (12,608) were copies of more than 7 words and thus had license and linkback details added. These 12,608 copies contained nearly 3 million copied words (2,906,330 words).

We’ve been looking at the resultant heatmaps that highlight the content that gets most cut and pasted. These offer the opportunity for us to learn and think about how we present and refine content for certain types of users.