Categories
API, Conceptual

Things that didn’t get made #754 – the ‘eBay/museum API valuation’ web service

One of the things that is most commonly asked of a museum’s collection is “so, how much is it worth?”.

In an art museum context this question is usually asked with an air of incredulity – as in “That much? Really? For that?”. In a history museum it is often asked because the inquisitive person has something similar sitting gathering dust in their attic or shed.

In both situations the museum is mute. And with good reason – even if it sometimes results in uncomfortable exchanges.

So one of the digital products that sat unmade, despite staring everyone in the face at the Powerhouse, was an eBay/museum API mashup. The idea was that 'recent prices' would be shown alongside collection objects, much as, say, Discogs does for its own marketplace.

[Image: example Discogs sale history]

It made a lot of sense for much of the social history collection. We even talked internally about how many public enquiries such a service would reduce for the museum.

But these things can’t be made inside an institution.

Now, harvesting auction house sale prices from Blouin's Art Sales Index and making a browser plugin that revealed recent sale prices as you hovered over artist names on art museum websites – that would be a thing. In fact I'm sure it is already on Blouin's roadmap.

But less provocative, and arguably more useful, would be to build that more prosaic, less political eBay lookup service for social history collections. Think of what it could do for thrift store hunts.

This came to mind again as I was reading one of Dan Hon's recent daily letters (a veritable treasure trove). Dan mentioned, in passing, Amazon's Flow app (iOS and Android) – "the idea of being able to point a camera at anything and being able to find out its current worth via a simple lookup on Amazon Marketplace or eBay". Right now Flow is aimed at buying new consumer goods and isn't about secondhand items, but it won't be long.

It would make for a nice two-day project for a student . . . just not one working inside a museum. DPLA or Europeana APIs, anyone?
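
For the developers in the audience, here's a rough sketch of what such a lookup might look like in Python. YOUR_EBAY_APP_ID is a placeholder for a real eBay Finding API key; the findCompletedItems operation and response shape below follow eBay's Finding API documentation as I understand it, but check the current docs before building on this.

import requests

EBAY_ENDPOINT = "https://svcs.ebay.com/services/search/FindingService/v1"

def recent_sale_prices(object_name, app_id="YOUR_EBAY_APP_ID", limit=10):
    """Look up recently sold eBay prices for items matching an object name."""
    params = {
        "OPERATION-NAME": "findCompletedItems",
        "SERVICE-VERSION": "1.13.0",
        "SECURITY-APPNAME": app_id,
        "RESPONSE-DATA-FORMAT": "JSON",
        "keywords": object_name,
        "itemFilter(0).name": "SoldItemsOnly",   # completed AND actually sold
        "itemFilter(0).value": "true",
        "paginationInput.entriesPerPage": str(limit),
    }
    resp = requests.get(EBAY_ENDPOINT, params=params, timeout=10)
    resp.raise_for_status()
    items = (resp.json()["findCompletedItemsResponse"][0]
                 .get("searchResult", [{}])[0]
                 .get("item", []))
    return [(i["title"][0],
             float(i["sellingStatus"][0]["currentPrice"][0]["__value__"]))
            for i in items]

# e.g. recent_sale_prices("Singer sewing machine model 99")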

Categories
API, Collection databases

More on museum datasets, un-comprehensive-ness, data mining

(Another short response post)

Thus far we’ve not had much luck with museum datasets.

Sure, some of us have made our own internal lives easier by developing APIs for our collection datasets, or generated some good PR by releasing them without restrictions. In a few cases enthusiasts have made mobile apps for us, or made some quirky web mashups. These are fine and good.

But the truth is that our data sucks. And by ‘our’ I mean the whole sector.

Earlier in the year Cooper-Hewitt released its collection data on GitHub under a Creative Commons Zero license – the first in the Smithsonian family to do so. But as PhD researcher Mia Ridge found after spending a week in our offices trying to wrangle it, the data itself was not very good.

As I said at the time of release,

Philosophically, too, the public release of collection metadata asserts, clearly, that such metadata is the raw material on which interpretation through exhibitions, catalogues, public programmes, and experiences is built. On its own, unrefined, it is of minimal 'value' except as a tool for discovery. It also helps remind us that collection metadata is not the collection itself.

One of the reasons for releasing the metadata was simply to get past the idea that it was somehow magically ‘valuable’ in its own right. Curators and researchers know this already – they’d never ‘just rely on metadata’, they always insist on ‘seeing the real thing’.

Last week Jasper Visser pointed to a recent SIGGRAPH 2012 presentation whose authors had developed an algorithm to look at similarities across millions of Google Street View images to determine 'what architectural elements of a city made it unique'. I and many others (see Suse Cairns) loved the idea and immediately started to think about how this might work with museum collections – surely something must be hidden amongst those enormous collections that might be revealed with mass digitisation and documentation?

I was interested a little more than most because one of our curators at Cooper-Hewitt had just blogged about a piece of balcony grille in the collection from Paris. In the blogpost the curator wrote about the grille but, as one commenter quickly pointed out, didn't provide a photo of the piece in its original location. Funnily enough, a quick Google search for the street address in Paris from which the grille had been obtained revealed not only Google Street View imagery of the building but also a number of photos of the building on Flickr specifically discussing the same architectural features that our curator had written about. Whilst Cooper-Hewitt had the 'object' and the 'metadata', the 'amateur web' held all the most interesting context (and discussion).

So then I began thinking about the possibilities for matching all the architectural features from our collections to those in the Google Street View corpus . . .
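
(To be clear about scale: the SIGGRAPH work mined millions of images with purpose-built machinery. But as a toy sketch of what 'matching architectural features' means at the level of two photographs, a freely available library like OpenCV already gets you surprisingly far. The file paths below are placeholders.)

import cv2  # pip install opencv-python

def similarity(path_a, path_b, ratio=0.75):
    """Count 'good' ORB feature matches between two images."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create()
    _, desc_a = orb.detectAndCompute(img_a, None)
    _, desc_b = orb.detectAndCompute(img_b, None)
    if desc_a is None or desc_b is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(desc_a, desc_b, k=2)
    good = 0
    for pair in matches:
        # Lowe's ratio test: keep matches clearly better than the runner-up.
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good += 1
    return good

# e.g. similarity("grille_object_photo.jpg", "street_view_facade.jpg")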

But the problem with museum collections is that they aren’t comprehensive – even if their data quality was better and everything was digitised.

As far as ‘memory institutions’ go, they are certainly no match for library holdings or archival collections. Museums don’t try to be comprehensive, and at least historically they haven’t been able to even consider being so. Or, as I’ve remarked before, it is telling that the memory institution that ‘acquired’ the Twitter archive was the Library of Congress and not a social history museum.

Categories
API, Collection databases, Search

Museum collection meets library catalogue: Powerhouse collection now integrated into Trove

The National Library of Australia's Trove is one of those projects whose importance you only come to understand once it is built and 'live in the world'. At its most basic, Trove provides a meta-search of disparate library collections across Australia as well as the cultural collections of the National Library itself. Being an aggregator, it brings together a number of different National Library products that used to exist independently – such as the very popular Picture Australia – under the one Trove banner.

Not only that, Trove has a lovely (and sizeable) user community of historians, genealogists and enthusiasts that diligently goes about helping transcribe scanned newspapers, connect up catalogue records, and add descriptive tags to them along with extra research.

Last week Trove ingested the entirety of the Powerhouse's digitised object collection. Trove has had the collection of the Museum's Research Library for a while, but now it has the Museum's objects too.

So this now means that if you are researching Annette Kellerman in Trove, you come across all the Powerhouse objects in your search results too – not just books about Kellerman but also her mermaid costume and other objects.

The Powerhouse is the first big museum object collection to be ingested by Trove. This matters because over the past 12 months Trove has quickly become the first choice of the academic and research communities, not to mention family historians and genealogists. As one of the most popular Australian Government-run websites, Trove has become the default starting point for these types of researchers, so it makes sense for museum collections to be well represented in it.

The Powerhouse had been talking about integrating with Trove and its predecessor sub-projects for at least the last five years. Back in the early days the talk was mainly about exposing our object records using OAI, but in the end Trove used the Powerhouse Collection API to ingest. The benefits of this have been significant – and surprising. Trove has been able to ingest much richer records, merging and adapting fields via the API and inferring structure to extract additional metadata from the Powerhouse records. Whilst this approach doesn't scale to other institutions (unless they model their API query structure on the Powerhouse's), it does give end users access to much richer records on Trove.
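
Trove's actual ingest code isn't public as far as I know, but the general shape of an API-based harvest is simple enough. Here's a sketch – the endpoint, parameters and field names are hypothetical stand-ins, not the real Powerhouse API:

import requests

API_BASE = "https://api.example-museum.org/v1/objects"  # hypothetical

def harvest(api_key, page_size=100):
    """Yield simplified records, page by page, for an aggregator to ingest."""
    offset = 0
    while True:
        resp = requests.get(API_BASE, params={
            "api_key": api_key, "limit": page_size, "offset": offset,
        }, timeout=30)
        resp.raise_for_status()
        records = resp.json().get("records", [])
        if not records:
            break
        for rec in records:
            # Map the richer API record onto the aggregator's schema.
            yield {
                "identifier": rec["id"],
                "title": rec.get("title"),
                "description": rec.get("description"),
                "subjects": rec.get("categories", []),
                "url": rec.get("permalink"),
            }
        offset += page_size

# for record in harvest("YOUR_API_KEY"):
#     ...  # hand each record to whatever the ingest side consumes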

After Trove integration quietly went live last week there was an immediately noticeable flow of new visitors to collection records from Trove. And because Trove uses the API, these visitors can be accurately attributed to Trove as their origin. The Powerhouse will be keeping an eye on how these numbers grow and which collection areas Trove brings new interest to – and whether those interests differ from those of visitors arriving at collection records via organic search, onsite search, or the other places that have integrated the Powerhouse collection, such as Digital NZ.

Stage two of Trove integration – coming soon – will allow the Powerhouse to ingest any user-generated metadata back into its own site, much in the way it has ingested Flickr tags for photographs that are also in the Commons on Flickr.

This integration also signals the irreversible blending of museum and library practice in the digital space.

Only time will tell if this delivers more value to end users than expecting researchers to come to institutional websites. But I expect that this sort of merging – much like the expanding operations of Europeana – does suggest that in the near future museum collections will need to offer far more than a 'rich catalogue record' online to pull visitors from aggregator products (and 'communities of practice') like Trove through to individual institutional websites.

Categories
API, Collection databases, Metadata, open content, Semantic Web

Things clever people do with your data #65535: Introducing ‘Free Your Metadata’

Last year Seth van Hooland at the Free University Brussels (ULB) approached us to look at how people used and navigated our online collection.

A few days ago Seth and his colleague Ruben Verborgh from Ghent University launched Free Your Metadata – a demonstrator site showing how even irregular metadata can have value to others and how, if it is released rather than clutched tightly until that mythical day when it is 'perfect', it can be cleaned up and improved using new software tools.

What’s awesome is that Seth & Ruben used the Powerhouse’s downloadable collection datafile as the test data for the project.

Here’s Seth and his team talking about the project.

F&N: What made the Powerhouse collection attractive for use as a data source?

Number one, it's available to everyone and therefore our experiment can be repeated by others. Apart from that, the records are very representative of the sector.

F&N: Was the data dump more useful than the Collection API we have available?

This was purely due to the way Google Refine works: on large amounts of data at once. But also, it enables other views on the data, e.g., to work in a column-based way (to make clusters). We’re currently also working on a second paper which will explain the disadvantages of APIs.

F&N: What sort of problems did you find with our collection?

Sometimes really broad categories. Other inconveniences could be solved in the cleaning step (small textual variations, different units of measurement). All issues are explained in detail in the paper (which will be published shortly). But on the whole, the quality is really good.
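
(For those who haven't tried Google Refine, its clustering feature is the key trick for those 'small textual variations'. As a toy illustration of the idea behind its 'fingerprint' method – my own re-implementation, not Refine's code – values that differ only in case, punctuation or word order collapse onto the same key:)

import re
from collections import defaultdict

def fingerprint(value):
    """Normalise a value: lowercase, strip punctuation, sort unique tokens."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))

def clusters(values):
    """Group values whose fingerprints collide."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

print(clusters(["Bakelite", "bakelite", "Bake-lite",
                "Sterling silver", "Silver, sterling"]))
# -> [['Bakelite', 'bakelite', 'Bake-lite'],
#     ['Sterling silver', 'Silver, sterling']]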

F&N: Why do you think museums (and other organisations) have such difficulties doing simple things like making their metadata available? Is there a confusion between metadata and ‘images’ maybe?

There is a lot of confusion about what the best way is to make metadata available. One of the goals of the Free Your Metadata initiative is to put forward best practices for doing this. Institutions such as libraries and museums have a tradition of only publishing information which is 100% complete and correct, which is more or less impossible in the case of metadata.

F&N: What sorts of things can now be done with this cleaned up metadata?

We plan to clean up, reconcile, and link several other collections to the Linked Data Cloud. That way, collections are no longer islands, but become part of the interlinked Web. This enables applications that cross the boundaries of a single collection. For example: browse the collection of one museum and find related objects in others.

F&N: How do we get the cleaned up metadata back into our collection management system?

We can export the result back as TSV (like the original) and e-mail it. Then you can match the records with your collection management system using record IDs.
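
That round trip is easy to script. A sketch, assuming a tab-separated export keyed on a record_id column (both the file names and the column name are illustrative only):

import csv

def load_tsv(path, key="record_id"):
    """Read a TSV into a dict keyed on the record identifier column."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[key]: row for row in csv.DictReader(f, delimiter="\t")}

original = load_tsv("collection_export.tsv")   # what the CMS knows
cleaned = load_tsv("collection_cleaned.tsv")   # what Refine produced

# Overlay the cleaned fields onto the matching original records.
for record_id, row in cleaned.items():
    if record_id in original:
        original[record_id].update(row)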

Go and explore Free Your Metadata and play with Google Refine on your own ‘messy data’.

If you’re more nerdy you probably want to watch their ‘cleanup’ screencast where they process the Powerhouse dataset with Google Refine.

Categories
API, Collection databases

Powerhouse Object Name Thesaurus now available via our API!

Luke Dearnley is at LOD-LAM this week and he and Carlos Arroyo are pleased to publicly announce that the Powerhouse Object Name Thesaurus is now available through our API.

The Object Name Thesaurus was developed by the Powerhouse Museum to standardise the terms used to describe its own collection. It was first published in 1995 as the Powerhouse Museum Collection Thesaurus. Since then, many new terms have been added to the thesaurus within the Powerhouse’s collection information and management system. The print version has long been popular with collecting institutions to assist in the documentation of their own collections.

Whilst you have been able to download the thesaurus as a PDF for a fair while, the API now makes it possible to build applications on top of the thesaurus to do things like explain terms, or even expand the search on your own website to show results from 'related or child terms'. And of course, if you've built applications using the Powerhouse collection, you can now show related parent and child objects. The thesaurus, like the rest of the API, defaults to a CC-BY-NC license, although you can approach the Museum for a variation on request.

The hierarchical structure of the thesaurus assists in searching. By organising object names, the relationships between objects can be made explicit. Object names are organised according to their hierarchical, associative or equivalence relationships, and any term is permitted to have multiple broader terms: for example, 'Bubble pipes' has the broader terms 'Pipes' and 'Toys'. There is no single hierarchy in which an object name is located, enabling it to be found by searchers approaching with different concepts in mind.

Here’s an example of the sort of return you can now get from the API.

{
    "status": 200, 
    "end": 50, 
    "start": 0, 
    "result": 50, 
    "terms": [
        {
            "status": "APPROVED", 
            "scope_notes": "Any of a variety of brushes used to remove dirt and lint from clothing.", 
            "term": "Clothes brushes", 
            "num_items": 4, 
            "num_narrower_items": 0, 
            "relations": {
                "narrower": [
                    {
                        "status": "APPROVED", 
                        "scope_notes": null, 
                        "term": "Hat brushes", 
                        "num_items": 2, 
                        "num_narrower_items": 23, 
                        "id": 5104
                    }
                ], 
                "broader": {
                    "status": "APPROVED", 
                    "scope_notes": null, 
                    "term": "Laundry equipment", 
                    "num_items": 11, 
                    "num_narrower_items": 0, 
                    "id": 1189
                }, 
                "related": {
                    "status": "APPROVED", 
                    "scope_notes": "Used to remove dust and dirt from clothing by beating.", 
                    "term": "Clothes beaters", 
                    "num_items": 0, 
                    "num_narrower_items": 0, 
                    "id": 2802
                }
            }
        }
    ]
}

The code snippet above shows the scope notes for each term (sometimes reading a bit like a definition) and the broader/narrower relationships between the terms themselves.

Laundry equipment is the broader term under which Clothes brushes sits. Clothes brushes are "any of a variety of brushes used to remove dirt and lint from clothing", and they have a single narrower term, Hat brushes.

Not only that, but Clothes brushes are related to Clothes beaters, which are "used to remove dust and dirt from clothing by beating".

If you were, say, running a collection search (or even an ecommerce system) for old washing machines and related equipment, your application could use the thesaurus in the API to make recommendations on your own site using the broader/narrower terms from our system. A user searching for "hat brushes" on your website could then also be shown results for "clothes brushes" and "clothes beaters".

And of course, you can also get the Powerhouse objects under each of these categories.
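
To make the search-expansion idea concrete, here's a sketch in Python. The base URL and query parameters are placeholders rather than the documented endpoint (see the documentation below for the real thing), but the response handling follows the JSON example above:

import requests

THESAURUS_URL = "https://api.example-museum.org/v1/thesaurus"  # placeholder

def expanded_terms(term, api_key):
    """Return a term plus its narrower and related terms for searching."""
    resp = requests.get(THESAURUS_URL, params={
        "api_key": api_key, "term": term,
    }, timeout=10)
    resp.raise_for_status()
    terms = {term}
    for entry in resp.json().get("terms", []):
        relations = entry.get("relations", {})
        for narrower in relations.get("narrower", []):
            terms.add(narrower["term"])
        related = relations.get("related")
        if related:
            terms.add(related["term"])
    return terms

# A search for "Clothes brushes" could then also query
# "Hat brushes" and "Clothes beaters".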

Rough documentation is available (with better documentation coming soon).

We’ll be adding to this over the coming months and we’d love your thoughts on how this might be useful to you in your own applications.

Categories
API, Collection databases, Developer tools, Museum blogging, Tools

Powerhouse Museum collection WordPress plugin goes live!

Today the first public beta of our WordPress collection plugin was released into the wild.

With it and a free API key anyone can now embed customised collection objects in grids in their WordPress blog. Object grids can be placed in posts and pages, or even as a sidebar widget – and each grid can have different display parameters and contents. It even has a nice friendly backend for customising, and because we're hosting it through WordPress, when new features are added it can be auto-upgraded through your blog's control panel!

Here it is in action.

So, if you have a WordPress blog and feel like embedding some objects, download it, read the online documentation, and go for it.

(Update 22/1/11: I’ve added a new post explaining the backstory and rationale for those who are interested)

Categories
API, Interviews

Quick interview with Amped Powerhouse API winners – Andrea Lau & Jack Zhao

Andrea Lau & Jack Zhao were the winners of the Powerhouse Museum challenge at the recent Amped hack day organised by Web Directions in Sydney.

As part of their prize they won a basement tour to see all the things that the Powerhouse doesn’t have out on display. Renae Mason, senior online producer at the Museum, bailed them up for a quick Q&A in the noisy confines of the basement.

Apologies for the noisy audio! Museum storage facilities can be surprisingly loud places!

Categories
API, Collection databases, Conceptual, Interviews, Metadata

Making use of the Powerhouse Museum API – interview with Jeremy Ottevanger

As part of a series of ‘things people do with APIs’ here is an interview I conducted with Jeremy Ottevanger from the Imperial War Museum in London. Jeremy was one of the first people to sign up for an API key for the Powerhouse Museum API – even though he was on the other side of the world.

He plugged the Powerhouse collection into a project he's been doing in his spare time called Mashificator, which combines several other cultural heritage APIs.

Over to Jeremy.

Q – What is Mashificator?

It’s an experiment that got out of hand. More specifically, it’s a script that takes a bit of content and pulls back “cultural” goodies from museums and the like. It does this by using a content analysis service to categorise the original text or pull out some key words, and then using some of these as search terms to query one of a number of cultural heritage APIs. The idea is to offer something interesting and in some way contextually relevant – although whether it’s really relevant or very tangential varies a lot! I rather like the serendipitous nature of some of the stuff you get back but it depends very much on the content that’s analysed and the quirks of each cultural heritage API.

There are various outputs but my first ideas were around a bookmarklet, which I thought would be fun, and I still really like that way of using it. You could also embed it in a blog, where it will show you some content that is somehow related to the post. There’s a WordPress plugin from OpenCalais that seems to do something like this: it tags and categorises your post and pulls in images from Flickr, apparently. I should give it a go! Zemanta and Adaptive Blue also do widgets, browser extensions and so on that offer contextually relevant suggestions (which tend to be e-commerce related) but I’d never seen anything doing it with museum collections. It seemed an obvious mashup, and it evolved as I realised that it’s a good way to test-bed lots of different APIs.

What I like about the bookmarklet is that you can take it wherever you go, so whatever site you’re looking at that has content that intrigues you, you can select a bit of a page, click the bookmarklet and see what the Mashificator churns out.

Mashificator uses a couple of analysis/enrichment APIs at the moment (Zemanta and Yahoo! Terms Extractor) and several CH APIs (including the Powerhouse Museum of course!). One could go on and on but I'm not sure it's worthwhile: at some point, if this is helpful to anyone, it will be done a whole lot better. It's tempting to try to put a contextually relevant Wolfram Alpha into an overlay, but that's not really my job, so although it would be quite trivial to do geographical entity extraction and show a map of the results, for example, it's going too far beyond what I meant to do in the first place so I might draw the line there. On the other hand, if the telly sucks on Saturday night, as it usually does, I may just do it anyway.

Beside the bookmarklet, my favourite aspect is that I can rapidly see the characteristics of the enrichment and content web services.
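
(To make that pipeline concrete, here's a sketch of the Mashificator pattern – not Jeremy's actual code. The real thing uses Zemanta or the Yahoo! Terms Extractor for the analysis step; a crude keyword counter stands in below, and the cultural heritage search endpoint is a placeholder.)

import re
from collections import Counter

import requests

CH_SEARCH_URL = "https://api.example-heritage.org/search"  # placeholder

STOPWORDS = {"the", "and", "for", "that", "with", "this", "from", "was"}

def extract_keywords(text, n=3):
    """Crude stand-in for an entity-extraction service."""
    words = re.findall(r"[a-z]{4,}", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(n)]

def mashify(text, api_key):
    """Turn a chunk of selected page text into related museum objects."""
    query = " ".join(extract_keywords(text))
    resp = requests.get(CH_SEARCH_URL, params={
        "api_key": api_key, "query": query,
    }, timeout=10)
    resp.raise_for_status()
    return resp.json().get("results", [])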

Q – Why did you build it?

I built it because I’m involved with the Europeana project, and for the past few years I’ve been banging the drum for an API there. When they had an alpha API ready for testing this summer they asked people like me to come up with some pilots to show off at the Open Culture conference in October. I was a bit late with mine, but since I’d built up some momentum with it I thought I may as well see if people liked the idea. So here you go…

There’s another reason, actually, which is that since May (when I started at the Imperial War Museum) it’s been all planning and no programming so I was up for keeping my hand in a bit. Plus I’ve done very little PHP and jQuery in the past, so this project has given me a focussed intro to both. We’ll shortly be starting serious build work on our new Drupal-based websites so I need all the practice I can get! I still no PHP guru but at least I know how to make an array now…

Q – Most big institutions have had data feeds – OAI etc – for a long time now, so why do you think APIs are needed?

Aggregation (OAI-PMH's raison d'être) is great, and in many ways I prefer to see things in one place – Europeana is an example. For me as a user it means one search rather than many, similarly for me as a developer. Individual institutions offering separate OPACs and APIs doesn't solve that problem, it just makes life complicated for human or machine users (ungrateful, aren't I?).

But aggregation has its disadvantages too: data is resolved to the lowest common denominator (though this is not inevitable in theory); there’s the political challenge of getting institutions to give up some control over “their” IP; the loss of context as links to other content and data assets are reduced. I guess OAI doesn’t just mean aggregation: it’s a way for developers to get hold of datasets directly too. But for hobbyists and for quick development, having the entirety of a dataset (or having to set up an OAI harvester) is not nearly as useful or viable as having a simple REST service to programme against, which handles all the logic and the heavy lifting. And conversely for those cases where the data is aggregated, that doesn’t necessarily mean there’ll be an API to the aggregation itself.

For institutions, having your own API enables you to offer more to the developer community than if you just hand over your collections data to an aggregator. You can include the sort of data an aggregator couldn't handle. You can offer the methods that you want as well as the regular "search" and "record" interfaces – maybe "show related exhibitions" or "relate two items" (I really, really want to see someone do this!). You can enrich it with the context you see fit – take Dan Pett's web service for the Portable Antiquities Scheme in the UK, where all the enrichment he's done with various third-party services feeds back into the API. Whether it's worthwhile doing these things just for the sake of third-party developers is an open question, but really an API is just good architecture anyway, and if you build what serves your needs it shouldn't cost that much to offer it to other developers too – financially, at least. Politically, it may be a different story.

Q – You have spent the past while working in various museums. Seeing things from the inside, do you think we are nearing a tipping point for museum content sharing and syndication?

I am an inveterate optimist, for better or worse – that’s why I got involved with Europeana despite a degree of scepticism from more seasoned heads whose judgement I respect. As that optimist I would say yes, a tipping point is near, though I’m not yet clear whether it will be at the level of individual organisations or through massive aggregations. More and more stuff is ending up in the latter, and that includes content from small museums. For these guys, the technical barriers are sometimes high but even they are overshadowed by the “what’s the point?” barriers. And frankly, what is the point for a little museum? Even the national museum behemoths struggle to encourage many developers to build with their stuff, though there are honourable exceptions and it’s early days still – the point is that the difficulty a small museum might have in setting up an API is unlikely to be rewarded with lots of developers making them free iPhone apps. But through an aggregator they can get it in with the price.

One of my big hopes for Europeana was that it would give little organisations a path to get their collections online for the first time. Unfortunately it's not going to do that – they will still have to have their stuff online somewhere else first – but nevertheless it does give them easy access both to audiences and (through the API) to third-party developers that otherwise would pay them no attention. The other thing that CHIN, Collections Australia, Digital NZ, Europeana and the like do is offer someone big enough for Google and the like to talk to. Perhaps this in itself will end up with us settling on some de facto standards for machine-readable data so we can play in that pool and see our stuff more widely distributed.

As for individual museums, we are certainly seeing more and more APIs appearing, which is fantastic. Barriers are lowering, there's arguably some convergence or some patterns emerging for how to "do" APIs, we're seeing bold moves in licensing (the boldest of which will always be in advance of what aggregators can manage), and the more it happens the more it seems like normal behaviour – which will hopefully give others the confidence to follow suit. I think as ever it's a matter of doing things in a way that makes each little step have a payoff. There are gaps in the data and services out there that make it tricky to stitch together lots of the things people would like to do with CH content at the moment – for example, a paucity of easy, free-to-use web services for authority records, few CH thesauri, no historical gazetteers. As those gaps get filled in, the use of museum APIs will gather pace.

Ever the optimist…

Q – What is needed to take ‘hobby prototypes’ like Mashificator to the next level? How can the cultural sector help this process?

Well in the case of the Mashificator, I don't plan a next level. If anyone finds it useful I suggest they ask me for the code or do it themselves – in a couple of days most geeks would have something way better than this. It's on my free hosting and API rate limits wouldn't support it if it ever became popular, so it's probably only ever going to live in my own browser toolbar and maybe my own super-low-traffic blog! But in that answer you have a couple of things that we as a sector could do: firstly, make sure our rate limits are high enough to support popular applications, which may need to make several API calls per page request; secondly, it would be great to have a sandbox that a community of CH data devotees could gather around and play in. And thirdly, in our community we can spread the word and learn lessons from any mashups that are made. I think actually that we do a pretty good job of this with mailing lists, blogs, conferences and so on.

As I said before, one thing I really found interesting with this experiment was how it let me quickly compare the APIs I used. From the development point of view some were simpler than others, but some had lovely subtleties that weren’t really used by the Mashificator. At the content end, it’s plain that the V&A has lovely images and I think their crowd-sourcing has played its part there, but on the other hand if your search term is treated as a set of keywords rather than a phrase you may get unexpected results… YTE and Zemanta each have their own characters, too, which quickly become apparent through this. So that test-bed thing is really quite a nice side benefit.

Q – Are you tracking use of Mashificator? If so, how and why? Is this important?

Yes I am, with Google Analytics, just to see if anyone's using it, and if, when they come to the site, they do more than just look at the pages of guff I wrote – do they actually use the bookmarklet? The answer is generally no, though there have been a few people giving it a bit of a work-out. Not much sign of people making custom bookmarklets though, so that perhaps wasn't worthwhile! Hey, lessons learnt.

Q – I know you, like me, like interesting music. What is your favourite new music to code-by?

Damn right, nothing works without music! (At least, not me.) For working, I like to tune into WFMU, often catching up on archive shows by Irene Trudel, Brian Turner and various others. That gives me a steady stream of quality music familiar and new. As for recent discoveries I've been playing a lot (not necessarily new music, mind): Sharon Van Etten (new), Blind Blake (very not new), Chris Connor (I was knocked out by her version of Ornette Coleman's "Lonely Woman"; look out for her gig with Maynard Ferguson too). I discovered Sabicas (flamenco legend) a while back, and that's a pretty good soundtrack for coding, though it can be a bit of a rollercoaster. Too much to mention really but lots of the time I'm listening to things to learn on guitar. Lots of Nic Jones… it goes on.

Go give Mashificator a try!