Categories
API Collection databases Conceptual Interviews Metadata

Making use of the Powerhouse Museum API – interview with Jeremy Ottevanger

As part of a series of ‘things people do with APIs’ here is an interview I conducted with Jeremy Ottevanger from the Imperial War Museum in London. Jeremy was one of the first people to sign up for an API key for the Powerhouse Museum API – even though he was on the other side of the world.

He plugged the Powerhouse collection into a project he’s been doing in his spare time called Mashificator which combines several other cultural heritage APis.

Over to Jeremy.

Q – What is Mashificator?

It’s an experiment that got out of hand. More specifically, it’s a script that takes a bit of content and pulls back “cultural” goodies from museums and the like. It does this by using a content analysis service to categorise the original text or pull out some key words, and then using some of these as search terms to query one of a number of cultural heritage APIs. The idea is to offer something interesting and in some way contextually relevant – although whether it’s really relevant or very tangential varies a lot! I rather like the serendipitous nature of some of the stuff you get back but it depends very much on the content that’s analysed and the quirks of each cultural heritage API.

There are various outputs but my first ideas were around a bookmarklet, which I thought would be fun, and I still really like that way of using it. You could also embed it in a blog, where it will show you some content that is somehow related to the post. There’s a WordPress plugin from OpenCalais that seems to do something like this: it tags and categorises your post and pulls in images from Flickr, apparently. I should give it a go! Zemanta and Adaptive Blue also do widgets, browser extensions and so on that offer contextually relevant suggestions (which tend to be e-commerce related) but I’d never seen anything doing it with museum collections. It seemed an obvious mashup, and it evolved as I realised that it’s a good way to test-bed lots of different APIs.

What I like about the bookmarklet is that you can take it wherever you go, so whatever site you’re looking at that has content that intrigues you, you can select a bit of a page, click the bookmarklet and see what the Mashificator churns out.

Mashificator uses a couple of analysis/enrichment APIs at the moment (Zemanta and Yahoo! Terms Extractor) and several CH APIs (including the Powerhouse Museum of course!) One could go on and on but I’m not sure it’s worth while: at some point, if this is helpful to anyone, it will be done a whole lot better. It’s tempting to try to put a contextually relevant Wolfram Alpha into an overlay, but that’s not really my job, so although it would be quite trivial to do geographical entity extraction and show amap of the results, for example, it’s going too far beyond what I meant to do in the first place so I might draw the line there. On the other hand, if the telly sucks on Saturday night, as it usually does, I may just do it anyway.

Beside the bookmarklet, my favourite aspect is that I can rapidly see the characteristics of the enrichment and content web services.

Q – Why did you build it?

I built it because I’m involved with the Europeana project, and for the past few years I’ve been banging the drum for an API there. When they had an alpha API ready for testing this summer they asked people like me to come up with some pilots to show off at the Open Culture conference in October. I was a bit late with mine, but since I’d built up some momentum with it I thought I may as well see if people liked the idea. So here you go…

There’s another reason, actually, which is that since May (when I started at the Imperial War Museum) it’s been all planning and no programming so I was up for keeping my hand in a bit. Plus I’ve done very little PHP and jQuery in the past, so this project has given me a focussed intro to both. We’ll shortly be starting serious build work on our new Drupal-based websites so I need all the practice I can get! I still no PHP guru but at least I know how to make an array now…

Q – Most big institutions have had data feeds – OAI etc – for a long time now, so why do you think APIs are needed?

Aggregation (OAI-PMH‘s raison d’etre) is great, and in many ways I prefer to see things in one place – Europeana is an example. For me as a user it means one search rather than many, similarly for me as a developer. Individual institutions offering separate OPACs and APIs doesn’t solve that problem, it just makes life complicated for human or machine users (ungrateful, aren’t I?).

But aggregation has its disadvantages too: data is resolved to the lowest common denominator (though this is not inevitable in theory); there’s the political challenge of getting institutions to give up some control over “their” IP; the loss of context as links to other content and data assets are reduced. I guess OAI doesn’t just mean aggregation: it’s a way for developers to get hold of datasets directly too. But for hobbyists and for quick development, having the entirety of a dataset (or having to set up an OAI harvester) is not nearly as useful or viable as having a simple REST service to programme against, which handles all the logic and the heavy lifting. And conversely for those cases where the data is aggregated, that doesn’t necessarily mean there’ll be an API to the aggregation itself.

For institutions, having your own API enables you to offer more to the developer community than if you just hand over your collections data to an aggregator. You can include the sort of data an aggregator couldn’t handle. You can offer the methods that you want as well as the regular “search” and “record” interfaces, maybe “show related exhibitions” or “relate two items” (I really, really want to see someone do this!) You can enrich it with the context you see fit – take Dan Pett’s web service for the Portable Antiquities Scheme in the UK, where all the enrichment he’s done with various third party services feeds back into the API. Whether it’s worthwhile doing these things just for the sake of third party developers is an open question, but really an API is just good architecture anyway, and if you build what serve’s your needs it shouldn’t cost that much to offer it to other developers too – financially, at least. Politically, it may be a different story.

Q – You have spent the past while working in various museums. Seeing things from the inside, do you think we are nearing a tipping point for museum content sharing and syndication?

I am an inveterate optimist, for better or worse – that’s why I got involved with Europeana despite a degree of scepticism from more seasoned heads whose judgement I respect. As that optimist I would say yes, a tipping point is near, though I’m not yet clear whether it will be at the level of individual organisations or through massive aggregations. More and more stuff is ending up in the latter, and that includes content from small museums. For these guys, the technical barriers are sometimes high but even they are overshadowed by the “what’s the point?” barriers. And frankly, what is the point for a little museum? Even the national museum behemoths struggle to encourage many developers to build with their stuff, though there are honourable exceptions and it’s early days still – the point is that the difficulty a small museum might have in setting up an API is unlikely to be rewarded with lots of developers making them free iPhone apps. But through an aggregator they can get it in with the price.

One of my big hopes for Europeana was that it would give little organisations a path to get their collections online for the first time.
Unfortunately it’s not going to do that – they will still have to have their stuff online somewhere else first – but nevertheless it does give them easy access both to audiences and (through the API) to third party developers that otherwise would pay them no attention. The other thing that CHIN, Collections Australia, Digital NZ, Europeana and the like do, is offer someone big enough for Google and the link to talk to. Perhaps this in itself will end up with us settling on some de facto standards for machine-readable data so we can play in that pool and see our stuff more widely distributed.

As for individual museums, we are certainly seeing more and more APIs appearing, which is fantastic. Barriers are lowering, there’s arguably some convergence or some patterns emerging for how to “do” APIs, we’re seeing bold moves in licensing (the boldest of which will always be in advance of what aggregators can manage) and the more it happens the more it seems like normal behaviour that will hopefully give others the confidence to follow suit. I think as ever it’s a matter of doing things in a way that makes each little step have a payoff. There are gaps in the data and services out there that make it tricky to stitch together lots of the things people would like to do with CH content at the moment – for example, a paucity of easy and free to use web services for authority records, few CH thesuari, no historical gazetteers. As those gaps get filled in the use of museum APIs will gather pace.

Ever the optimist…

Q – What is needed to take ‘hobby prototypes’ like Mashificator to the next level? How can the cultural sector help this process?

Well in the case of the Mashificator, I don’t plan a next level. If anyone finds it useful I suggest they ask me for the code or do it themselves – in a couple of days most geeks would have something way better than this. It’s on my free hosting and API rate limits wouldn’t support it if it ever became popular, so it’s probably only ever going to live in my own browser toolbar and maybe my own super-low-traffic blog! But in that answer you have a couple things that we as a sector could do: firstly, make sure our rate limits are high enough to support popular applications, which may need to make several API calls per page request; secondly, it would be great to have a sandbox that a community of CH data devotees could gather around/play in. And thirdly, in our community we can spread the word and learn lessons from any mashups that are made. I think actually that we do a pretty good job of this with mailing lists, blogs, conferences and so on.

As I said before, one thing I really found interesting with this experiment was how it let me quickly compare the APIs I used. From the development point of view some were simpler than others, but some had lovely subtleties that weren’t really used by the Mashificator. At the content end, it’s plain that the V&A has lovely images and I think their crowd-sourcing has played its part there, but on the other hand if your search term is treated as a set of keywords rather than a phrase you may get unexpected results… YTE and Zemanta each have their own characters, too, which quickly become apparent through this. So that test-bed thing is really quite a nice side benefit.

Q – Are you tracking use of Mashificator? If so, how and why? Is this important?

Yes I am, with Google Analytics, just to see if anyone’s using it, and if when they come to the site they do more than just look at the pages of guff I wrote – do they actually use the bookmarklet? The answer is generally no, though there have been a few people giving it a bit of a work-out. Not much sign of people making custom bookmarklets though, so that perhaps wasn’t worthwhile! Hey, lessons learnt.

Q – I know you, like me, like interesting music. What is your favourite new music to code-by?

Damn right, nothing works without music! (at least, not me.) For working, I like to tune into WFMU, often catching up on archive shows by Irene Trudel, Brian Turner & various others. That gives me a steady stream of quality music familiar and new. As for recent discoveries I’ve been playing a lot (not necessarily new music, mind), Sharon van Etten (new), Blind Blake (very not new), Chris Connor (I was knocked out by her version of Ornette Coleman’s “Lonely Woman”, look out for her gig with Maynard Ferrguson too). I discovered Sabicas (flamenco legend) a while back, and that’s a pretty good soundtrack for coding, though it can be a bit of a rollercoaster. Too much to mention really but lots of the time I’m listening to things to learn on guitar. Lots of Nic Jones… it goes on.

Go give Mashificator a try!