Collection databases Web 2.0 Web metrics

OPAC2.2 – New look and new features on our collection database

Today we have also made live a slightly enhanced version of our collection database search.

You will notice a few useful cosmetic usability tweaks such as the tabbed navigation bar at the top which allows you to quickly get to the tag cloud and the top level of category browsing. We did this to make it easier for users who landed on an object to be able to get access to the tags and categories, as well as the search. We have also removed the tag cloud and category browser from the front page and prettied it up with a few selected objects which can act as entry points.

Under the hood we have done some optimising of the ‘related searches’ and improved searching for foreign characters, which we noticed weren’t previously searchable. We have also added a stack of new images (with many more still to come) and quite a few new acquisitions.

My paper (Tagging and Searching – Serendipity and museum collection databases) for Museums & the Web 2007, which gives a background to the OPAC project and presents some preliminary results from our ever-growing datastore, is now online.

Those who will be present for the paper in San Francisco will get an updated set of statistics as well as quite a bit of material that couldn’t be fitted into the written version.

Collection databases Web metrics

10 millionth object!

Early this morning one lucky internet user became the person who viewed the 10 millionth object on our online collection database.

More than 10 million objects have been viewed online in the last 295 days and we are currently averaging 50,000 objects each day.
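
As a quick sanity check on those two figures, here is a minimal sketch of the arithmetic. It assumes "more than 10 million" is exactly 10 million, so the average it gives is a lower bound; the variable names are just for illustration.

```python
# Figures quoted in the post; "more than 10 million" is treated as exactly 10 million.
total_views = 10_000_000
days_online = 295
current_daily_views = 50_000

# Average daily views over the whole period since launch.
average_daily_views = total_views / days_online

# How the current daily rate compares with that long-run average.
growth_factor = current_daily_views / average_daily_views

print(f"Average since launch: about {average_daily_views:,.0f} views per day")
print(f"Current rate is about {growth_factor:.1f}x that average")
```

In other words, the current daily rate is well above the average since launch, which suggests traffic has grown over the period rather than stayed flat.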

Collection databases Interactive Media MW2007 Web 2.0

Does your audience want Web 2.0? Lessons from SFMOMA

When ploughing through the M&W2007 papers (more are still going up), pay particular attention to Do You Know Who Your Users Are? The Role Of Research In Redesigning by Dana Mitroff and Katrina Alcorn from SFMOMA, which looks at the evaluation and redesign process behind their forthcoming new SFMOMA website.

Of particular pertinence to discussions about implementing, encouraging (and sometimes requiring) user interaction comes this caveat/warning –

Example 3. Web 2.0

The finding: When we talked with our users about potential Web 2.0 features we could offer on our site (blogs, wikis, etc.), they showed surprisingly little interest in them. The users we interviewed were fairly passive about the types of interactive things they would like to do on our site. Instead of asking an artist a question, they would rather read what other people asked. Instead of giving feedback about an exhibition, they would rather read what other people wrote.

The insight: We realized that if we were going to add any of these new types of Web 2.0 features, we should not invest in designing things that our visitors would not use. And if we were to incorporate any of these features in the future, they should extend the interpretation dimension and make the artwork more accessible.

The design: In addition to providing an authoritative museum perspective on an artwork, we must include features that incorporate perspectives from a variety of users, from front-line staff to visitors. On the “On View” main page, for example, we plan to include a feature called something along the lines of “Guest Take” that will present rotating works from SFMOMA’s collection selected by prominent local community members, artists, writers, museum members, etc. These guests will write about what the works mean to them and share their personal reactions, thoughts, and musings. Another feature, called something like “In Focus,” will allow museum staff members at different levels throughout the organization to select works from the collection and share their personal thoughts and reactions. This informal, multi-vocal approach will bring Web 2.0 values to the site and complement what we are already doing with SFMOMA Artcasts, our podcast audio-zine. SFMOMA Artcasts feature “Guest Take” commissions of music, poetry, and prose in response to works on view as well as “Vox Pop” pieces that capture live reflections from visitors in the galleries. We see these as methods of engaging the community in a dialogue of art and ideas; they are excellent ways to bring Web 2.0 values to the interpretative dimension of the museum experience.

Nina Simon picks up on the importance (and dominance) of lurkers in commercial 2.0 applications and reconsiders them in the context of museums.

We would concur.

Of the most “2.0” aspects of the Powerhouse Museum’s collection database – the tagging – it is important to note that out of nearly 10 million object views there have been only about 4000 tags. That’s 0.04% of views resulting in a tag – at most. Some views result in multiple tagging of the same object by the same person.

However, because lurkers can benefit from other people’s tags (frictionlessly/effortlessly), tags represent up to 40% of search interactions – they add usability and thus access points to content.
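
The 0.04% figure above is easy to check. This is a minimal sketch using the rounded numbers from the post (both are approximate); the function name is just for illustration.

```python
# Rounded figures quoted in the post.
object_views = 10_000_000   # "nearly 10 million object views"
tags = 4_000                # "only about 4000 tags"


def tag_rate_percent(tag_count: int, view_count: int) -> float:
    """Upper bound on the share of views that produced a tag, as a percentage.

    It is an upper bound because one view can produce several tags,
    so the number of tagging views is at most the number of tags.
    """
    return tag_count / view_count * 100


rate = tag_rate_percent(tags, object_views)
print(f"At most {rate:.2f}% of views resulted in a tag")
```

Put another way, that is roughly one tag for every 2,500 object views.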

Collection databases Metadata

Linden on ‘end of federated search?’ and Google

Greg Linden speculates that Google is pulling back from the notion of federated search. (via O’Reilly)

Google instead prefers a “surfacing” approach which, put simply, is making a local copy of the deep web on Google’s cluster.

Not only does this provide Google the performance and scalability necessary to use the data in their web search, but also it allows them to easily compare the data with other data sources and transform the data (e.g. to eliminate inconsistencies and duplicates, determine the reliability of a data source, simplify the schema or remap the data to an alternative schema, reindex the data to support faster queries for their application, etc.).

Google’s move away from federated search is particularly intriguing given that Udi Manber, former CEO of A9, is now at Google and leading Google’s search team. A9, started and built by Udi with substantial funding from Amazon, was a federated web search engine. It supported queries out to multiple search engines using the OpenSearch API format they invented and promoted. A9 had not yet solved the hard problems with federated search — they made no effort to route queries to the most relevant data sources or do any sophisticated merging of results — but A9 was a real attempt to do large scale federated web search.

If Google is abandoning federated search, it may also have implications for APIs and mashups in general. After all, many of the reasons given by the Google authors for preferring copying the data over accessing it in real-time apply to all APIs, not just OpenSearch APIs and search forms. The lack of uptime and performance guarantees, in particular, are serious problems for any large scale effort to build a real application on top of APIs.
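
To make the distinction in the passage above concrete, here is a minimal sketch contrasting the two approaches. Everything in it is an assumption for illustration: the two "museum" sources, their records and the function names are invented, and real federated search and real crawling are far more involved.

```python
from typing import Callable, Dict, List

# Hypothetical record sets standing in for two remote collection databases.
REMOTE_RECORDS: Dict[str, List[str]] = {
    "museum_a": ["Solar heater", "Gas heater"],
    "museum_b": ["Water heater model", "Mantel clock"],
}

Source = Callable[[str], List[str]]


def make_source(records: List[str]) -> Source:
    """Wrap a record list as a search function, imitating a remote search API."""
    def search(query: str) -> List[str]:
        return [r for r in records if query.lower() in r.lower()]
    return search


def federated_search(query: str, sources: Dict[str, Source]) -> List[str]:
    """Federated: send the query to every source at request time, then merge."""
    results: List[str] = []
    for name, search in sources.items():
        results.extend(f"{name}: {hit}" for hit in search(query))
    return results


def surface(all_records: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Surfacing: copy every record into one local word index ahead of time."""
    index: Dict[str, List[str]] = {}
    for name, records in all_records.items():
        for record in records:
            for word in set(record.lower().split()):
                index.setdefault(word, []).append(f"{name}: {record}")
    return index


sources = {name: make_source(recs) for name, recs in REMOTE_RECORDS.items()}
federated_hits = federated_search("heater", sources)

local_index = surface(REMOTE_RECORDS)
surfaced_hits = local_index.get("heater", [])
```

Both calls find the same three heaters here. The difference is where the work happens: the federated version depends on every source answering at query time, while the surfaced version answers from a local copy that can be cleaned up and reindexed in advance.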

Google has put its energies into Google Co-Op which allows users to create their own sub-Google search engines using the Google database as the datasource. This has the effect of encouraging traditionally deep web databases like museum collection databases to become spiderable, indexed and cached by Google. For individual end users this makes sense – they probably already go to Google first, but does it make sense for content providers?

Try this example.

Here is a search for ‘heater’ using the Powerhouse’s own collection search.

Top five –

B1431 Solar heater, plus base, wood/metal, Lawrence Hargrave, Australia, [1870-1915]
K693 Immersion water heater, electric, made in Australia, late 1930s (OF).
93/176/15 Light globe, heater lamp, glass/metal, British Thompson Houston, England, 1920
93/176/16 Light globe, heater lamp, glass/metal, Osram, England, 1950
85/69 Brochure, Instruction and Operating Chart for Emmco Fryside heater

Here is the same search for ‘heater’ using a Google Co-op search of the same data within the same collection (using a Co-op search I created).

Top five –

86/676 Gas heater – Malley’s No. 1, copper, Metters, Australia …
97/331/1 Convection heater, domestic, portable gas, metal/paint …
H7061 Water heater, “The Schwer”, constructed of copper & can be …
B1538 Water heater model, steam, “Friar”, [Australia or UK]; A A …
95/117/1 Kerosene water heater and instruction sheet, Challenger …

So which is more accurate?

Google’s Co-op bases its results on a number of different factors, all of which are unknown to the searcher, and most of which are unknown to the content provider. At least with our internal search we can tweak the ordering and relevance of results using our own known variables.

Collection databases Digitisation Web 2.0 Young people & museums

Dempsey on ‘getting with the flow’, Morville on ‘findability’

OCLC’s Lorcan Dempsey’s idea of libraries “getting with the flow” (from 2005) is something that has resonated well beyond the library world.

The importance of flow underlines recurrent themes:

– the library needs to be in the user environment and not expect the user to find their way to the library environment

– integration of library resources should not be seen as an end in itself but as a means to better integration with the user environment, with workflow.

Increasingly, the user environment will be organized around various workflows. In fact, in a growing number of cases, a workflow application may be the consumer of library services.

As evidenced also in the discussions by Holly Witchey at Musematic, who has been covering the Webwise IMLS conference with regular session reports, and in Guenter Waibel’s follow-up commentary from RLG, libraries are at a far more pointy end of changes in customer/user behaviour than most museums. Waibel raises the very hefty 290 page OCLC report titled Perceptions, in which the survey suggests 84% of general users begin an information search with a search engine, and only 1% with a library website (PDF page 35/1-17). If conducted again now I would expect Wikipedia to rate highly.

Libraries are seen as more trustworthy/credible and as providing more accurate information than search engines. Search engines are seen as more reliable, cost-effective, easy to use, convenient and fast. (PDF page 70/2-18)

Where are museums in this? Is your content in the “flow”? Do users need to come to your site and use your onsite search to be able to find it? If so, they are probably going to look elsewhere first, if they haven’t already.

Over at the University of Minnesota they have just held the CLC Library Conference titled “Getting In The Flow” with Dempsey as one of the speakers. There are some great summaries of the presentations including slides over in their conference blog.

Other than Dempsey, one of their speakers was Peter Morville, who some readers may remember from his first O’Reilly book Information Architecture for the World Wide Web, or the less technically oriented Ambient Findability (which has been doing the rounds of the office for the past 9 months).

Morville’s presentation slides are an excellent introduction to his work and given their tweaking for the library/information-seeking context are very useful for those in museums too. Ellysa Cahoy has some notes taken during the presentation at the CLC blog as well for the slides that aren’t immediately self-explanatory.

Collection databases Web 2.0

OPAC2.0 – popular collection categories

In preparation for my presentation at Museums & the Web we have been busy generating a new set of user statistics from our collection database. (Which is also why the frequency of new posts has dropped!)

Objects in the collection, when ‘fully catalogued’, are assigned an object category and an object name from the Powerhouse Museum thesaurus, which was first published in 1995 (ISBN 186317060X). This thesaurus is used by other museums as well.

Here are updated top 20 ‘popularity’ tables, the first by object, and the second by category. The top 20 categories indicate the broad collecting areas which receive most interest online.

Top 20 most viewed objects since launch (June 2006)

1 – (17087 views) 2005/1/1 Evening dress, beaded pink chiffon trimmed with charms, designed by Lisa Ho and made in the …
2 – (8875) 94/129/1 Evening dress, womens, `Chocolate box’, plastic / fabric, designed by Jenny Bannister for C …
3 – (7029) 95/23/1 Dress, evening, silk / polyester, designed by Jenny Bannister, Melbourne, Victoria, Australi …
4 – (6636) B1495 Aircraft, flying boat, Catalina, PB2B-2, “Frigate Bird II”, VH-ASA, metal / fabric, made by Bo …
5 – (6133) 88/4 Steam locomotive, No. 3830, iron/steel/brass, New South Wales Government Railways, Eveleigh Rai …
6 – (5504) 97/208/1 Shoes, pair, womens, ‘Super elevated gillies’, leather/ cork/ silk, Autumn/ Winter collecti …
7 – (4672) 88/5 Locomotive, full size, steam, No.1243, metal / glass, made by Davy and Company, Atlas Engineeri …
8 – (4534) 90/816 Aircraft, full-size, helicopter, Bell 206B Jetranger III, “Dick Smith Australian Explorer”, V …
9 – (4384) 2005/127/1 Clothing (9), boys, cotton / wool / metal / mother-of-pearl / plastic / paper / cardboard …
10 – (4297) 98/54/1 Bicycle, Olympic ‘Superbike’, carbon fibre / metal, Australian Institute of Sport / Royal Me …
11 – (3936) 92/405 Mantel clock, Sessions Clock Co, USA, 1905-1915 …
12 – (3557) 86/1015 Room Divider, “Carlton”, wood / plastic laminate, designed by Ettore Sottsass, made by Memph …
13 – (3481) 2006/68/1 Three piece suit, men’s, corduroy cotton, made by David Jones Ltd, Sydney, New South Wales …
14 – (3336) 2003/83/1 Chair, ‘Wiggle’, cardboard, designed by Frank Gehry, United States, 1972, made by Vitra, G …
15 – (3227) 85/1975 Armchair, `Globe’, fibreglass / aluminium / fabric / synthetic materials, designed by Eero A …
16 – (3204) 99/4/46 Model steam engine and box, donkey engine, metal / cardboard, Scorpion Superior Model / Mode …
17 – (3001) 92/305 Food safe (bush pantry), wood/ metal, unknown maker, [Queensland], Australia, c. 1925 …
18 – (2879) 7949 Locomotive, steam, No. 1, metal, hauled the first passenger train in New South Wales in 1855, m …
19 – (2798) 96/386/2 Evening dress, womens, silk, Madeleine Vionnet, Paris, France c. 1930 …
20 – (2772) L611 Aircraft, full size, Bleriot XI monoplane, wood / canvas / wire, designed by Louis Bleriot, mad …

Top 20 most popular categories* since launch (June 2006)

1 – clothing and dress (1419335 viewed objects)
2 – ceramics (1104058)
3 – numismatics (584429)
4 – pictorials (466852)
5 – textiles (394591)
6 – domestic equipment-home (320764)
7 – decorative metalwork (313593)
8 – toys (277752)
9 – arms and armour (245407)
10 – documents (235438)
11 – health and medical equipment (224492)
12 – glass (223899)
13 – jewellery (222371)
14 – models (220519)
15 – transport-land (217331)
16 – personal effects (202413)
17 – photographs (177066)
18 – musical instruments (156136)
19 – furniture (154648)
20 – juvenilia (143011)

*note – some objects belong to multiple categories
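
The footnote matters when reading the table: if a view of an object is counted once for each category it belongs to, the category totals can add up to more than the total number of object views. The sketch below illustrates that under this assumption only. The object ids, view counts and category assignments are invented, and this is not necessarily how our statistics are actually generated.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical data, not real collection statistics.
views_per_object: Dict[str, int] = {"obj1": 10, "obj2": 5, "obj3": 2}
categories_per_object: Dict[str, List[str]] = {
    "obj1": ["clothing and dress", "textiles"],  # belongs to two categories
    "obj2": ["ceramics"],
    "obj3": ["textiles"],
}


def views_by_category(
    views: Dict[str, int], categories: Dict[str, List[str]]
) -> Counter:
    """Credit each object's views to every category the object belongs to."""
    totals: Counter = Counter()
    for obj, view_count in views.items():
        for category in categories.get(obj, []):
            totals[category] += view_count
    return totals


totals = views_by_category(views_per_object, categories_per_object)
ranked = totals.most_common()  # categories ordered by viewed objects, highest first

total_object_views = sum(views_per_object.values())
total_category_views = sum(totals.values())
```

With this toy data there are 17 object views but 27 category views, because the ten views of `obj1` are counted under both of its categories.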

Collection databases

Powerhouse Museum’s Castle Hill stores open and online

We have just launched the website (and the physical site too) for our new Powerhouse Discovery Centre located in Castle Hill.

The PDC at Castle Hill dramatically increases the proportion of objects available for the public to look at (up from the museum world standard 5% to around 40%!). In the warehouse spaces visitors can browse drawers and storage racks, and then go to a series of internet-connected kiosks to look up more information using a slightly enhanced version of our collection database.

In building the Castle Hill site we have added several thousand new high quality images to the collection database. Have a look either using the Castle Hill OPAC or the Powerhouse Museum’s main OPAC. Both use the same data source, the main difference being that the Castle Hill version has an early iteration of a forthcoming visual browser we are building (which is highly dependent on colour images!).

Collection databases Web 2.0

Google co-op search experiments

Jim at Ideum encouraged me to have a play with Google’s Co-op Search.

In about 5 minutes I set up the start of a global museum collection search.

Give it a go – either by using the box below or visiting its own page museum collection search.

Then contribute your own museum collection URLs to it by following the instructions on the search page.

Obviously this only works for collections that have been well spidered by Google already. It won’t pick up those that aren’t – I tried adding the Victoria & Albert Museum’s image search without much success, for example. Others that are well spidered like ours and the Met Museum work very well.

Google Co-op is really a way of refining the results of standard Google by focussing its results on an aggregated selection of URLs – think of it as a way of performing multiple advanced searches at the same time.
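
The "multiple advanced searches at the same time" idea can be sketched as a single query that restricts results to a list of sites with Google's `site:` operator. This is an analogy only: Co-op itself is configured through Google's own interface rather than by building a query string, and the host names below are placeholders, not real collection URLs.

```python
from typing import List


def build_restricted_query(terms: str, sites: List[str]) -> str:
    """Combine search terms with an OR-ed list of site: restrictions."""
    if not sites:
        return terms
    site_clause = " OR ".join(f"site:{site}" for site in sites)
    return f"{terms} ({site_clause})"


# Placeholder hosts standing in for contributed museum collection URLs.
collection_sites = ["collection-a.example.org", "collection-b.example.org"]
query = build_restricted_query("heater", collection_sites)
print(query)
```

As with Co-op, this only returns anything for sites Google has already spidered well; restricting a search to a site does not make an uncrawled database searchable.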

Collection databases Interactive Media Web 2.0

Kathy Sierra on serendipity

I’ve just spent the last while finishing off my papers for Museums & The Web 2007. One of them, on the OPAC2.0 collection database, talks a lot about the idea of ‘serendipity’ and its importance in creating new ways for users to not only navigate but to find and create meaning in a database.

Kathy Sierra has a nice post introducing the very idea and calls for more randomness to be added to products, software and experiences.

Collection databases Web 2.0 Web metrics

Lorcan Dempsey on ‘intentional data’

Lorcan Dempsey opens the new year with a great post with lots of outward linkages on the under-utilisation of intentional data by libraries.

In general, consumer sites on the web make major use of such data, and it is especially valuable when they can connect it to individual identities. They use it to build up user profiles, to do rating and comparisons across sites, to recommend, and so on. Of course this is increasingly important in an environment of abundant choice and scarce attention: they are investing more effort in ‘consumption management’. We are all familiar with the benefits, and the irritations, of organizations who want to build a deeper understanding of what we do and make us offers based on that.

Libraries have a lot of data about users and usage. And there are now some initiatives which are looking at sharing it. However, in general, libraries do not have a data-driven understanding of individual users’ behaviors, or of systemwide performance of particular information resources. This is likely to change in coming years given the value of such data. So, we are seeing the growth in interest in sharing database usage data. And technical agreements and business incentives for third party providers will support this development. And, of course, libraries want to preserve the privacy of learning and research choices.

Whilst libraries are in a fundamentally better position to know more about the intentions of their users, museums tend to restrict their interest to the very visitation/donation-oriented CRM model of intention tracking.

As Dempsey points out, such data actually has much broader implications for organisations, and he summarises Chunka Mui’s proposed taxonomy of ‘Emergent Knowledge’ – knowledge that is gained about users by analysing behaviour gathered from log data and user pattern analysis.

At the Powerhouse Museum we have only very recently, with our OPAC2.0 project, started to move beyond simple log file analysis of intentional data from our website users and begun to examine the emergent trends in collection popularity. I hope that by the time Museums & The Web 2007 comes around in April, we will have the first of our open APIs to connect and use data patterns from our Synonymiser Beta.

This will allow any museum with a similar collection (or subset) to mine our anonymous behaviour data to generate recommendation data for their own collections.
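
As a rough illustration of what mining anonymous behaviour data for recommendations could look like, here is a minimal co-viewing sketch. It is not the Synonymiser's method or the planned API. The sessions are invented, and "objects viewed in the same session are related" is an assumption chosen because it is about the simplest approach that works.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List

# Hypothetical anonymous sessions: each is the list of objects one visitor viewed.
sessions: List[List[str]] = [
    ["evening dress", "shoes", "suit"],
    ["evening dress", "shoes"],
    ["locomotive", "aircraft"],
]


def co_view_counts(all_sessions: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each pair of objects appears in the same session."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in all_sessions:
        # set() so repeat views of one object in a session are counted once.
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts: Dict[str, Counter], obj: str, limit: int = 3) -> List[str]:
    """Return the objects most often viewed alongside obj, most frequent first."""
    related = counts.get(obj, Counter())
    return [other for other, _ in related.most_common(limit)]


counts = co_view_counts(sessions)
dress_recommendations = recommend(counts, "evening dress")
```

Another museum with a similar collection would still need to map these object identifiers onto its own records, which is where shared categories and object names from a common thesaurus would help.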