Year: 2008

Some new technologies talked about at the Horizon.au Inaugural Meeting – July 2008

Post author By Seb Chan
Post date July 10, 2008

It has been an interesting day down in Melbourne brainstorming many of the technologies that might impact on the higher education sector in the next 5 years. This brainstorming is forming the basis of the upcoming Horizon.Au Report – a version of the Horizon Report tailored specifically for the Australian and New Zealand community.

The North American 2008 report is available from Horizon, and there is a special Museums Report coming very very soon too.

Metadata Tools

A web citation tool – dealing with impermanent references

Post author By Seb Chan
Post date July 9, 2008

We’re all working hard to ensure that our own content is identified with persistent URLs – references that will stand the test of time – but often when we are writing a paper we need to refer to someone else’s URL, and most of those are not designed to be permanent.

Traditionally when we reference something on a website we put ‘accessed on X date’ but that is of little use to a reader who follows up a reference only to find the original has moved or gone.

That’s where WebCite comes in. WebCite is a bit like a URL shortening service such as TinyUrl, or a social bookmarking tool like Del.icio.us, combined with a snapshotting tool. It provides a ‘shorter’ URL and it also keeps a copy of the entire page you have cited in its archive. This means that readers can read the exact same page, as it was when you were referencing it, at any time into the future – even if that page changes regularly (like the front page of a newspaper website).

You can also add custom DC metadata.

Here’s a WebCite capture of the Sydney Morning Herald’s front page as it was at the time of this post. http://www.webcitation.org/5ZAbxFdgI

As you can see there are some problems in that it has been unable to capture the CSS to lay out the page properly, but for references to the text contained in a page it does a pretty good job.

Here’s a capture of an article from an online journal, D-Lib, which being predominantly text, works better. http://www.webcitation.org/5ZAcGpnPz

There’s even a bookmarklet to add to your browser toolbar to make capturing even easier. Otherwise use the service manually via their archiving submission page. A submission takes about 20 seconds to capture.

Conferences and event reports Interactive Media Young people & museums

Henry Jenkins – notes from CCI ‘Creating Value Between Commons and Commerce’ conference, Brisbane, 2008

Post author By Seb Chan
Post date June 28, 2008

I’ve been in Brisbane the last few days – presenting the Powerhouse Museum’s Creative Commons and public domain projects – and also managed to attend one day of the CCI’s conference ‘Creating Value Between Commons and Commerce’. In amongst some truly awful examples of how not to use Powerpoint, there were some interesting presentations and papers.

Here’s the first of a set of notes scribed during the main sessions.

Web metrics

Google Trends does basic comparative metrics

Post author By Seb Chan
Post date June 22, 2008

Google Trends has started to allow domain level searches. This means that you can now pull up rough traffic figures, as calculated by Google, on any top level domain (subdomains like play.powerhousemuseum.com or artgallery.nsw.gov.au won’t work), and compare them to others. This moves Google Trends into territory covered by services like Compete, Quantcast (both US-centric) and, to a lesser extent, Hitwise.

Metadata Semantic Web

Collaborative collective classification – BBC Labs on using Wikipedia as metadata

Post author By Seb Chan
Post date June 14, 2008

Chris Sizemore at the BBC’s Radio Labs demonstrates an experiment in automated metadata, much akin to Open Calais.

Sizemore has taken Wikipedia and has built a simple web application that uses Wikipedia to disambiguate entities in a block of text and suggest broad categories for the content. Because Wikipedia has broad coverage of topics and deep coverage of specific niches, it can provide, as Sizemore writes, for some areas (especially popular culture), a good enough data source for automated classification.

Here’s Sizemore’s methodology –

1. Download entire contents of the English language Wikipedia (careful, that’s a large 4GB+ xml file!)

2. Parse that compressed XML file into individual text files, one per Wikipedia article (and this makes things much bigger, to the tune of 20GB+, so make sure you’ve got the hard drive space cleared)

3. Use a Lucene indexer to create a searchable collection (inc. term vectors) of your new local Wikipedia text files, one Lucene document per Wikipedia article

4. Use Lucene’s ‘MoreLikeThis’ to compare the similarity of a chunk of your own text content to the Wikipedia documents in your new collection

5. Treat the ranked Wikipedia articles returned as suggested categories for your text

Basically what is going on here is that the text you wish to classify is compared to Wikipedia articles, and the articles with the ‘closest match’ in terms of content have their URLs thrown back as potential classification categories.
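To make the ‘closest match’ idea concrete, here is a minimal sketch in Python. It is not Sizemore’s code and it does not use Lucene: it swaps in a plain TF-IDF cosine similarity as a rough stand-in for what ‘MoreLikeThis’ does, and the three tiny ‘articles’ (and all the function names) are invented purely for illustration. A real run would index the full Wikipedia dump instead.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the local Wikipedia collection: title -> article text.
ARTICLES = {
    "Tennis": "tennis is a racket sport played on a court with a net between two players",
    "Evening gown": "an evening gown is a long formal dress often made of chiffon silk or satin",
    "Search engine": "a search engine indexes web pages and ranks results for a query",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(articles):
    """Return (TF-IDF vector per article title, IDF weight per term)."""
    token_counts = {title: Counter(tokenize(body)) for title, body in articles.items()}
    doc_freq = Counter()
    for counts in token_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(articles)
    # Smoothed IDF so that terms found in every article still get a small weight.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = {
        title: {term: count * idf[term] for term, count in counts.items()}
        for title, counts in token_counts.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_categories(text, vectors, idf, top_n=3):
    """Rank article titles by similarity to the text; the titles act as categories."""
    counts = Counter(tokenize(text))
    # Terms that never appear in the collection cannot match anything, so drop them.
    query = {term: count * idf[term] for term, count in counts.items() if term in idf}
    scored = [(title, cosine(query, vec)) for title, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(title, score) for title, score in scored[:top_n] if score > 0]


vectors, idf = build_index(ARTICLES)
for title, score in suggest_categories("a chiffon evening dress worn to a formal event", vectors, idf):
    print(f"{title}: {score:.3f}")
```

The ranked titles play the role of the Wikipedia URLs in step 5: the higher the score, the stronger the suggestion that the article names a sensible category for your text.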

Combine this with Open Calais and there will be some very interesting results across a broad range of text datasets.

As regular readers will know, we’ve been experimenting quite a bit with Open Calais at the Powerhouse with some exciting initial results. We’ve been looking at the potential of Calais in combination with other data sources including Wikipedia/dbPedia/Freebase and we’ll be watching Sizemore’s experiment with interest.

Perhaps my throwaway line in recent presentations that ‘humans should never have to create metadata’ might actually be becoming closer to a reality.

Collection databases Search Web metrics

OPAC2.0 – Examining Delta Goodrem’s dress again / more on search

Post author By Seb Chan
Post date June 14, 2008

The most popular object in our online collection database is still a dress worn by Delta Goodrem.

I’ve previously written about how the popularity of this dress was driven in part by coverage on a number of Delta Goodrem fan forums. But this neglects the importance of search. Google has always driven traffic to this object, and in last month’s analytics, where Google search represented 86% of referrers to the object, the top 5 keywords used to discover this dress were these –

1. lisa ho – 11.24%
2. evening dresses – 4.55%
3. lisa ho dresses – 2.71%
4. formal dress – 2.13%
5. chiffon dress – 1.07%
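For anyone wanting to produce a similar breakdown from their own analytics export, a rough sketch of the arithmetic follows. It assumes you can get a flat list of the search phrases that led to a page; the function name and the sample phrases are hypothetical, and the numbers are not the Powerhouse figures above.

```python
from collections import Counter


def keyword_share(search_terms, top_n=5):
    """Return the top_n search phrases with their percentage share of all searches."""
    # Normalise case and whitespace so 'Lisa Ho' and 'lisa ho ' count as one phrase.
    counts = Counter(term.strip().lower() for term in search_terms if term.strip())
    total = sum(counts.values())
    if total == 0:
        return []
    return [(term, round(100 * n / total, 2)) for term, n in counts.most_common(top_n)]


# Invented sample data: one entry per visit that arrived via search.
sample = ["Lisa Ho"] * 5 + ["evening dresses"] * 3 + ["chiffon dress"] * 2
for rank, (term, share) in enumerate(keyword_share(sample), start=1):
    print(f"{rank}. {term} – {share}%")
```

Note that each share is a percentage of all search referrals to the page, not just of the top five, which is why a top-5 list like the one above adds up to well under 100%.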

Because of the frequency of the keywords ‘lisa ho’ in the title, description and body text of the object record, and the trusted PageRank of the Powerhouse Museum domain, we rank 11th in Google search results for ‘lisa ho’; 2nd for ‘lisa ho dress’; and 4th for ‘lisa ho dresses’.

Fortunately for us, this external traffic isn’t fleeting. Visitors to this object view almost double the average number of pages viewed by others on our site; and they spend more time on the site too.

Looking at the internal search terms for that same object the results are very different.

1. Australian fashion (also a subject classification)
2. tennis (user tag)
3. lisa ho
4. delta goodrem
5. elegant (user tag)

External search has effectively driven nearly 10 times the traffic of internal users to this object. It has also brought audiences to the object who have very few behavioural similarities to those who search within the context of our own site (internal search). This creates many new challenges in terms of usability and user experience.

Over the entire collection there are pockets of objects for which the difference between internal and external search is not as great; however, this needs much greater data analysis (and may be the subject of a future post or paper).

Search User experience

SEO (search engine optimisation) basics and museums

Post author By Seb Chan
Post date June 14, 2008

One of the most common questions asked over the past few years has been “how do I get the best out of SEO for my museum?”. This comes up in casual conversations and without fail at conferences. We are all becoming increasingly aware of the higher and higher proportion of our traffic coming via search, and that as content on the web grows exponentially the chance of our content lying buried deep in search engine results increases.

Often the problem for museums with search relates to the diversity of their web presence. Other than our brand name, our content, especially that held in collections, is often very diverse and our exhibitions equally so. I’ve previously written about the need to tackle exhibition naming so that at least on the web exhibition titles are more ‘search-friendly’, but this is very tricky to apply to collection and education content.

The news media have taken to rewriting headlines for search, knowing that timeliness and findability are crucial to the success of their content. Scott Gledhill’s fantastic SEO presentation from Web Directions South 2007 is an eye-opening look at how News Limited journalists in Australia are maximising the reach of their articles (link is to a full Slidecast).

Is this possible with museum content?

Should (and can) curators, education staff and marketing staff get a quick dashboard that reports the web performance of the content they are creating? Should (and can) they iterate their content, improving it, guided by real world performance? If museums are ‘slow media’, then is performance-guided content creation even a desirable outcome? (Update: do we really want to get to a situation like this one parodied in Slate?)

Maybe you need to tackle the basics first – getting your key content more visible. So where do you start?

Fortunately there are plenty of great SEO resources on the web and plenty of ways of testing SEO performance for free or at very low cost. Last month Web Designers Wall posted a simple introduction to SEO which is worthwhile reading for the very basics. This, along with Scott’s presentation, should provide a good starting point.
