Fresh & New(er)

discussion of issues around digital media and museums by Seb Chan


The museum APIs are coming – some thoughts on interoperability

May 28th, 2008 by Seb Chan

At MW08 there were the beginnings of a push amongst the technically oriented for the development of APIs for museum data, especially collections. That push was driven in part by discussions and early demonstrations of semantic web applications in museums, by the conceptual work of Ross Parry, and by the presence of Eric Miller and Brian Sletten of Zepheira and of Aaron Straup Cope and George Oates of Flickr. MW08 might well prove a historic turning point for the sector in terms of data interoperability and experimentation.

Since April there has been a lot of movement, especially in the UK.

The ‘UK alpha tech team’ of Mike Ellis, Frankie Roberto, Fiona Romeo, Jeremy Ottevanger and Mia Ridge is leading the charge, all working on new ways of connecting, extracting and visualising data from the Science Museum, the Museum of London and the National Maritime Museum. Together with them and a few other folk from the UK commercial sector, I’ve been contributing to a strategy wiki that makes the case for APIs in museums.

Whilst the tech end of things is (comparatively) straightforward, the strategic case for an API is far harder to make. As we fiddle, though, others are making significant progress.

Already a community project, dbPedia, has taken the content of Wikipedia and made it available as an open database. This means it is now possible to make reasonably complex semantic queries of Wikipedia, something I’m yet to see done on a museum collection. There is a whole range of examples and mini web applications already built to demonstrate queries like “people born in Paris” or “people influenced by Nietzsche”. More exciting still are the opportunities to combine Wikipedia’s data with other datasets.
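To make “semantic queries of Wikipedia” concrete, here is a minimal sketch of how the two example queries might be expressed in SPARQL against dbPedia. It assumes the public endpoint at https://dbpedia.org/sparql and the present-day dbPedia ontology properties `dbo:birthPlace` and `dbo:influencedBy` (the property names in use in 2008 differed); the helper functions are hypothetical and only `run_query` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public dbPedia SPARQL endpoint; check dbPedia's current documentation.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX dbr: <http://dbpedia.org/resource/>\n"
)


def build_query(predicate: str, resource: str, limit: int = 10) -> str:
    """Select every subject linked to `resource` through the ontology property `predicate`."""
    return (
        PREFIXES
        + f"SELECT ?subject WHERE {{ ?subject dbo:{predicate} dbr:{resource} . }}\n"
        + f"LIMIT {limit}"
    )


def query_url(sparql: str) -> str:
    """Encode the query as a GET request that asks for SPARQL JSON results."""
    params = {"query": sparql, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


def run_query(sparql: str) -> list[str]:
    """Fetch the results over the network and return the ?subject URIs."""
    with urlopen(query_url(sparql), timeout=30) as response:
        data = json.load(response)
    return [row["subject"]["value"] for row in data["results"]["bindings"]]


if __name__ == "__main__":
    # "People born in Paris" and "people influenced by Nietzsche" (offline: print only).
    for predicate, resource in [("birthPlace", "Paris"),
                                ("influencedBy", "Friedrich_Nietzsche")]:
        print(query_url(build_query(predicate, resource)))
```

The same pattern, one triple pattern per relationship, is what a museum collection would need to expose before a query like the Pollock one could be answered.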

What should be obvious is that, just as Wikipedia already draws audiences away from museum sites, making its dataset openly available for combination with other datasets will draw even more away. You might well ask why similar complex queries are so hard to make in our own collection databases: “Show me all the artwork influenced by Jackson Pollock.”

On June 19 the MCG’s Museums on the Web UK takes place at the University of Leicester with the theme of “Integrate, federate, aggregate”. There are going to be some lovely presentations there: I expect Fiona Romeo will be demoing the work she and her colleagues have been doing, and Frankie Roberto will be reprising his highly entertaining MW08 presentation.

As at the MCGUK07 conference, there will be a mashup day the day before. Last year’s mashup day produced a remarkable number of quick, working prototypes drawing on data sources provided by the 24 Hour Museum (now Culture24). This year the data looks like it will be coming from the collection databases of some of the UK nationals.

Already Box UK and Mike Ellis have whipped up a really nice demonstration of combining data, built by scraping the websites of the major museums with a little bit of PHP code. Even better, the site provides XML feeds, and I expect it will be a major source of mashups at MCG UK.
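The post doesn’t describe the structure of those XML feeds, so this minimal sketch simply assumes they are RSS 2.0. It shows the first step of a typical mashup: parsing items from several feeds and merging them while remembering which feed each came from. The two sample feeds and the example.org URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Two tiny, made-up RSS 2.0 feeds standing in for the aggregator's XML output
# (the real feed format is an assumption).
SAMPLE_FEEDS = {
    "Museum A": """<rss version="2.0"><channel><title>Museum A</title>
        <item><title>Brass astrolabe</title><link>http://example.org/a/1</link></item>
        </channel></rss>""",
    "Museum B": """<rss version="2.0"><channel><title>Museum B</title>
        <item><title>Pocket chronometer</title><link>http://example.org/b/7</link></item>
        <item><title>Sextant</title><link>http://example.org/b/9</link></item>
        </channel></rss>""",
}


def parse_items(xml_text: str) -> list[dict]:
    """Return the title and link of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext("title", default=""),
         "link": item.findtext("link", default="")}
        for item in root.iter("item")
    ]


def merge_feeds(feeds: dict[str, str]) -> list[dict]:
    """Combine items from several feeds, tagging each with the feed it came from."""
    merged = []
    for source, xml_text in feeds.items():
        for item in parse_items(xml_text):
            merged.append({**item, "source": source})
    return sorted(merged, key=lambda item: item["title"])


if __name__ == "__main__":
    for item in merge_feeds(SAMPLE_FEEDS):
        print(f'{item["title"]} ({item["source"]}): {item["link"]}')
```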

I like the FAQ that accompanies the site, especially this:

Q: Doesn’t this take traffic away from the individual sites?

We don’t think so, but not many studies have been done into how “off-site” browsing affects the “in-site” metrics. Already, users will be searching for, consuming, and embedding your images (and other content) via aggregators such as Google Images. This is nothing new.

Also, ask yourself how much of your current traffic derives from users coming to explicitly browse your online collections?

The aim is that by syndicating your content out in a re-usable manner, whilst still retaining information about its source, an increasing number of third-party applications can be built on this data, each addressing specific user needs. As these applications become widely used, they drive traffic to your site that you otherwise wouldn’t have received: “Not everyone who should be looking at collections data knows that they should be looking at collections data”.
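One simple way to syndicate content “whilst still retaining information about its source” is the optional `<source url="…">` element that RSS 2.0 allows on each item. This is a minimal sketch of a re-syndicated feed built that way; the feed titles and example.org URLs are hypothetical, and nothing here is taken from the aggregator’s actual implementation.

```python
import xml.etree.ElementTree as ET


def build_item(title: str, link: str, source_name: str, source_feed_url: str) -> ET.Element:
    """Build an RSS 2.0 <item> whose <source> element records where it came from."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link  # still points back at the museum's own page
    # RSS 2.0's optional <source url="..."> names the channel the item originated in.
    source = ET.SubElement(item, "source", url=source_feed_url)
    source.text = source_name
    return item


def build_feed(title: str, link: str, description: str, items: list[ET.Element]) -> str:
    """Wrap items in an RSS 2.0 channel (title, link and description are required)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_feed(
        "Hypothetical aggregator", "http://example.org/", "Objects from several museums",
        [build_item("Sextant", "http://example.org/b/9", "Museum B", "http://example.org/b/feed.xml")],
    ))
```

Because the item `link` and the `source` element both point back to the originating museum, any third-party application built on the feed can still send traffic home.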

I’ve spoken and written about this issue of metrics before; metrics and the issues of control both need to be sorted out if there is going to be any real traction in the sector.

Unlike the New York Times (which apparently announced an API recently) and notable commercial examples like Flickr, the museum sector doesn’t have a working (business) model for its collections other than a) exhibitions, b) image sales and possibly c) research services.

Now back to that semantic query. Wouldn’t it be useful if we could do this: “Play me all the music videos of singles that appear on albums whose record cover art was influenced by Jackson Pollock”? This could, of course, be done by combining the datasets of, say, the Tate, Last.FM, Amazon and YouTube, with the Tate as the missing link.
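The query above is really a chain of joins, one hop per dataset. This minimal sketch fakes each dataset as a small in-memory dictionary to show the shape of that chain. Which service supplies which link, and every album, single and video ID, are hypothetical placeholders rather than real data.

```python
# Hypothetical stand-ins for four separate datasets; every value is a placeholder.
COVER_INFLUENCES: dict[str, set[str]] = {   # cover art -> artists it was influenced by ("Tate" hop)
    "Cover of Album X": {"Jackson Pollock"},
    "Cover of Album Y": {"Piet Mondrian"},
}
ALBUM_COVERS: dict[str, str] = {            # album -> its cover art (an "Amazon"-style hop)
    "Album X": "Cover of Album X",
    "Album Y": "Cover of Album Y",
}
ALBUM_SINGLES: dict[str, list[str]] = {     # album -> singles on it (a "Last.FM"-style hop)
    "Album X": ["Single 1", "Single 2"],
    "Album Y": ["Single 3"],
}
MUSIC_VIDEOS: dict[str, str] = {            # single -> video ID (a "YouTube"-style hop)
    "Single 1": "video-001",
    "Single 3": "video-003",
}


def videos_influenced_by(artist: str) -> list[str]:
    """Follow album -> cover -> influence, then album -> singles -> videos."""
    playlist = []
    for album, cover in ALBUM_COVERS.items():
        if artist not in COVER_INFLUENCES.get(cover, set()):
            continue  # this album's cover art has no recorded link to the artist
        for single in ALBUM_SINGLES.get(album, []):
            video = MUSIC_VIDEOS.get(single)
            if video is not None:  # not every single has a video
                playlist.append(video)
    return playlist


if __name__ == "__main__":
    print(videos_influenced_by("Jackson Pollock"))
```

If any one hop is missing, here the cover-to-artist influence data, the whole query fails, which is exactly the point about the Tate being the missing link.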

6 Comments

  • I’m starting to put my money where my mouth is by releasing Science Museum APIs at http://api.sciencemuseum.org.uk/

    There will be more coming over the next few weeks. I’d be interested to hear any feedback either on the feeds, the documentation or the basic premise!

  • P.S. I love the idea of being an ‘alpha tech team’, but surely you should be including yourself in this team?!

  • We’ve been working on a semantic wiki called placeography.org for the last several months. Feel free to add some articles for your favorite places and pull them in through an API. It has a special page to export to RDF, but the rendered page looks as if it is only pointing to an ontology, so I don’t know whether it is working correctly.

  • Nice post, Seb. You’re right, it is an exciting time and we’re lucky to have amongst our number people who are both thinkers and doers, notably yourself. As Mike is always banging on, there’s only so much dithering you should do, so much worrying about which standard, which audience and exactly what the payoff will be, but at the same time we need a certain amount of that. As you say, if we’re going to invest significant resources in APIs then we have to be able to make a convincing case to the hands holding the purse strings.
    That said, there’s also the Trojan horse approach: building around a set of APIs internally has sound arguments of its own, ones that translate directly from the world of “business”. That case should be easier to make than one couched in museological terms. Once the internal architecture is prepared like this, opening the API to the public is more a political decision than a resource one.

    @Rose: I’m off to look at placeography now, sounds cool. I’m hoping I’ll also find a way to push up sets of data…?

  • Seb, Jeremy, I agree with y’all. I’m challenged right now by a project we’ve embarked upon to search across the collections of cultural organizations in our state and a few bordering states. The project, Great Rivers Network, greatriversnetwork.org (there’s not much at that web site yet), needs to deal with content that goes beyond museum collections databases and OAI. It needs to deal with one-off resources like our own indexes to birth and death certificates. You can see this working on our web site in two ways: http://www.mnhs.org/peoplefinder, a people-oriented search, and a Google-type search, by entering a search term in the search boxes at the top right of the header. We’re working on figuring out the “threshold of participation” for our partners … it may be that they need to supply content via OAI, PastPerfect (a collections database for smaller museums), Omeka, and perhaps other methods such as the Sitemaps.org protocol. Moreover, after attending Eric Miller and Brian Sletten’s workshop at MW2008, we’ll be exploring semantic web technologies. Bottom line, I’ll be following this conversation closely. Thank you, Seb, for having a useful blog that hosts such conversations.

    Jeremy, let me know what you think of placeography. As for uploading sets of data, I’m not sure about that … this is MediaWiki software with semantic web and semantic form plug-ins applied. I know content can be exported and there is RDF output, but I don’t see a special page for import. My searches from several months back turned up import tools for HTML pages, though I don’t know how good they are.

  • Rose, I like the placeography and I hope it builds momentum beyond Minnesota. The People Finder search engine is great, and you’ve gathered a huge amount of data. Have you got any plans to put the person data in any exportable or interoperable form? GEDCOM might be attractive to the family history community.
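To illustrate the GEDCOM suggestion in the last comment, here is a minimal sketch of exporting person records as a GEDCOM 5.5.1 file. The `Person` fields, the `SOUR` identifier and the submitter name are all hypothetical (People Finder’s actual data model isn’t described here), and dates are assumed to be supplied already in GEDCOM’s “12 MAR 1880” style.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical People Finder-style record; field names are illustrative only."""
    given: str
    surname: str
    birth_date: str = ""   # assumed already in GEDCOM form, e.g. "12 MAR 1880"
    birth_place: str = ""
    death_date: str = ""


def to_gedcom(people: list[Person], submitter: str = "Example Submitter") -> str:
    """Serialise people as a minimal lineage-linked GEDCOM 5.5.1 file."""
    lines = [
        "0 HEAD",
        "1 SOUR PEOPLEFINDER_EXPORT",  # hypothetical system identifier
        "1 SUBM @U1@",
        "1 GEDC",
        "2 VERS 5.5.1",
        "2 FORM LINEAGE-LINKED",
        "1 CHAR UTF-8",
        "0 @U1@ SUBM",
        f"1 NAME {submitter}",
    ]
    for n, person in enumerate(people, start=1):
        lines.append(f"0 @I{n}@ INDI")
        # GEDCOM marks the surname with slashes inside the NAME value.
        lines.append(f"1 NAME {person.given} /{person.surname}/")
        if person.birth_date or person.birth_place:
            lines.append("1 BIRT")
            if person.birth_date:
                lines.append(f"2 DATE {person.birth_date}")
            if person.birth_place:
                lines.append(f"2 PLAC {person.birth_place}")
        if person.death_date:
            lines.append("1 DEAT")
            lines.append(f"2 DATE {person.death_date}")
    lines.append("0 TRLR")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_gedcom([Person("Anna", "Larson", "12 MAR 1880", "Saint Paul, Minnesota")]))
```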