Last year Seth van Hooland at the Free University of Brussels (ULB) approached us to look at how people used and navigated our online collection.
A few days ago Seth and his colleague Ruben Verborgh from Ghent University launched Free Your Metadata – a demonstrator site showing how even irregular metadata can have value to others and how, if it is released rather than clutched tightly (until that mythical day when it is ‘perfect’), it can be cleaned up and improved using new software tools.
What’s awesome is that Seth & Ruben used the Powerhouse’s downloadable collection datafile as the test data for the project.
Here’s Seth and his team talking about the project.
F&N: What made the Powerhouse collection attractive for use as a data source?
Number one, it’s available to everyone, so our experiment can be repeated by others. Beyond that, the records are very representative of the sector.
F&N: Was the data dump more useful than the Collection API we have available?
Yes, but purely because of the way Google Refine works: on large amounts of data at once. It also enables other views on the data, e.g., working in a column-based way (to make clusters). We’re currently also working on a second paper which will explain the disadvantages of APIs.
F&N: What sort of problems did you find with our collection?
Sometimes really broad categories. Other inconveniences could be solved in the cleaning step (small textual variations, different units of measurement). All issues are explained in detail in the paper (which will be published shortly). But on the whole, the quality is really good.
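To make the "small textual variations" point concrete, here is a minimal Python sketch of key-collision clustering, loosely modelled on the fingerprint-style clustering Google Refine offers. It is a simplified illustration, not the team's actual workflow or Refine's exact algorithm, and the function names `fingerprint` and `cluster` as well as the sample values are made up for this example.

```python
import re
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a value to a normalised key so near-duplicates collide."""
    # Trim, lowercase, and drop punctuation.
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    # Split into tokens, remove duplicates, and sort so word order is ignored.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; return only real clusters."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    # A cluster is only interesting if it holds more than one distinct spelling.
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Hypothetical category values, not taken from the real datafile.
    sample = [
        "Glass plate negative",
        "glass plate negative.",
        "Negative, glass plate",
        "Teapot",
    ]
    for group in cluster(sample):
        print(group)
```

Each cluster is a set of spellings that a person can then review and collapse into one preferred form, which is roughly the review step Refine puts in front of you.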
F&N: Why do you think museums (and other organisations) have such difficulties doing simple things like making their metadata available? Is there a confusion between metadata and ‘images’ maybe?
There is a lot of confusion about the best way to make metadata available. One of the goals of the Free Your Metadata initiative is to put forward best practices for doing this. Institutions such as libraries and museums have a tradition of publishing only information that is 100% complete and correct, which is more or less impossible in the case of metadata.
F&N: What sorts of things can now be done with this cleaned up metadata?
We plan to clean up, reconcile, and link several other collections to the Linked Data Cloud. That way, collections are no longer islands, but become part of the interlinked Web. This enables applications that cross the boundaries of a single collection. For example: browse the collection of one museum and find related objects in others.
F&N: How do we get the cleaned up metadata back into our collection management system?
We can export the result back as TSV (like the original result) and e-mail it. Then you can match the records with your collection management system using record IDs.
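For the curious, matching a cleaned TSV export back to existing records by ID could look something like the Python sketch below. It only uses the standard library. The column names `record_id` and `categories`, and the helper names `read_tsv` and `apply_cleaned`, are assumptions for illustration; a real collection management system would have its own import route and field names.

```python
import csv
import io


def read_tsv(text: str):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def apply_cleaned(existing, cleaned, id_column="record_id"):
    """Overlay cleaned rows onto existing rows that share the same ID.

    Returns the merged rows plus the IDs of cleaned rows that matched nothing,
    so they can be checked by hand instead of being silently dropped.
    """
    cleaned_by_id = {row[id_column]: row for row in cleaned}
    merged = []
    for record in existing:
        update = cleaned_by_id.pop(record[id_column], None)
        # Cleaned values win; records with no cleaned row are kept unchanged.
        merged.append({**record, **update} if update else dict(record))
    unmatched_ids = list(cleaned_by_id)
    return merged, unmatched_ids


if __name__ == "__main__":
    # Hypothetical data standing in for the system's records and the export.
    existing = read_tsv("record_id\tcategories\n1\tglass plate negative.\n2\tTeapot\n")
    cleaned = read_tsv("record_id\tcategories\n1\tGlass plate negative\n3\tChair\n")
    merged, unmatched_ids = apply_cleaned(existing, cleaned)
    print(merged)
    print("No match for IDs:", unmatched_ids)
```

Keeping a list of unmatched IDs is a deliberate choice here: it makes it obvious when the export and the live system have drifted apart.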
Go and explore Free Your Metadata and play with Google Refine on your own ‘messy data’.
If you’re more nerdy you probably want to watch their ‘cleanup’ screencast where they process the Powerhouse dataset with Google Refine.