Folksonomies Web 2.0

OPAC2.0 – New feature – more similar searches

Continuing on the additions from yesterday.

Today we added the top three search terms for each object to the USER KEYWORDS section of an object page. This section is where folksonomy tags can be added and deleted.

Here is a 1960s raincoat for example.

Why did we add this to the USER KEYWORDS section?

What we are finding through our ongoing analysis of folksonomy tagging behaviour is that users are generally adding synonyms. What folksonomies are doing in this instance is effectively ‘crowd-sourcing’ synonym generation. Now, when a user searches for a term and selects an object from a list of results, they are making an association between the object and that term. In many instances these may be tentative associations and generally full of false drops, but when aggregated, patterns begin to emerge (see Chan 2006 forthcoming). Effectively, on our site we are noticing that the search terms are beginning to offer the same kind of synonym behaviour that tags do.
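To make the aggregation idea concrete, here is a minimal sketch in Python. It assumes the search log can be reduced to (search term, object id) pairs, one per click. The function name `top_terms_per_object`, the log format and the sample data are all hypothetical and are not taken from our actual code.

```python
from collections import Counter, defaultdict


def top_terms_per_object(click_log, n=3):
    """Return the n most frequent search terms that led to each object.

    click_log is an iterable of (search_term, object_id) pairs; this
    shape is an assumption made for the sketch.
    """
    counts = defaultdict(Counter)
    for term, object_id in click_log:
        # Normalise case and whitespace so 'Mac' and 'mac' count together.
        counts[object_id][term.strip().lower()] += 1
    return {
        object_id: [term for term, _ in counter.most_common(n)]
        for object_id, counter in counts.items()
    }


# Hypothetical click log: each pair is one search that ended in a click.
click_log = (
    [("raincoat", "obj-1")] * 4
    + [("1960s fashion", "obj-1")] * 3
    + [("Mac", "obj-1")] * 2
    + [("umbrella", "obj-1")]
    + [("glass", "obj-2")]
)

top_terms = top_terms_per_object(click_log)
print(top_terms["obj-1"])
```

A single stray click (the ‘umbrella’ false drop here) never makes the top three once a few genuine searches have accumulated, which is the point of aggregating.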

We are presenting them together as a way of making explicit the terms that are already associated with an object (automatically). This way we hope to improve the diversity of user tags (why tag with a word/term that already appears?). This is just a trial though and if we see little or no change then we may move the search terms off to a separate section.

I’d welcome any comments or thoughts on this as we’re experimenting here.

Folksonomies Web 2.0

OPAC2.0 – New feature – similar searches

Take a look at the new ‘related searches’ feature on our collection search.

Now a search for ‘glass’ will return these ‘similar searches’ –

glassware vase bottles bowl bowls

This result will change over time. Hopefully we will implement a timescale simulator in our upcoming ‘experimental’ browsing section which will allow users to view the changing search language over time.

How does it work?

Because we keep a large store of relationships between search terms and clicked objects, we are able to reverse query terms such as ‘glass’ and see what terms have been used to find similar objects. We currently show only the top 5 terms – aggregated – which lessens the probability of ‘false drops’. False drops are most likely to occur for uncommon search terms although this will change over time, too.
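The reverse query can be sketched roughly as follows. This is an illustration only: it assumes the store of relationships is a list of (search term, clicked object id) pairs, and it treats ‘similar objects’ as simply the objects the query term itself led to. The name `similar_searches` and the sample log are hypothetical.

```python
from collections import Counter, defaultdict


def similar_searches(click_log, query, n=5):
    """Return up to n other terms used to reach the objects `query` reached."""
    objects_by_term = defaultdict(set)
    terms_by_object = defaultdict(Counter)
    for term, object_id in click_log:
        term = term.strip().lower()
        objects_by_term[term].add(object_id)
        terms_by_object[object_id][term] += 1

    query = query.strip().lower()
    totals = Counter()
    # Aggregate term counts across every object the query has led to.
    for object_id in objects_by_term.get(query, ()):
        totals.update(terms_by_object[object_id])
    totals.pop(query, None)  # do not suggest the query term itself
    return [term for term, _ in totals.most_common(n)]


# Hypothetical click log.
click_log = [
    ("glass", "vase-1"), ("glass", "bowl-1"), ("glass", "bottle-1"),
    ("glassware", "vase-1"), ("glassware", "bowl-1"), ("glassware", "bottle-1"),
    ("vase", "vase-1"), ("vase", "vase-1"),
    ("bowl", "bowl-1"),
    ("teapot", "teapot-1"),
]

suggestions = similar_searches(click_log, "glass")
print(suggestions)
```

Because counts are summed across all matching objects before the cut-off is applied, a term needs repeated use to surface, which is why rarely used terms are the ones most exposed to false drops.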

Does this use folksonomy tags?

Yes. We allow user tags to be included in the search terms and from time to time certain objects will be most visited as a result of their user tag.

Folksonomies Web 2.0

OPAC2.0: A better search is here

Today we finally ironed out one of the major problems with the search engine on our OPAC2.0/collection database.

There are still some tweaks to be done on the results, and the advanced search needs to be implemented, but the new search is much better than the original.

If you have some spare moments and feel like trying a few searches, please do so. If something odd happens then I’d welcome your thoughts in a comment on this post.

Next up for OPAC2.0 is the presentation of ‘other search terms similar to X’ and ‘others who searched for X looked at’ alongside search results. We have already implemented this on Design Hub and have the code and data ready to go.

Then it is on to adding an automatic spell checker for the folksonomy tags to reduce post-user tag editing.

Folksonomies Imaging Web 2.0

Google Image Labeller

Everyone is talking about the new Google image labeller. Think the ESP Game but where your tags help Google deliver better image search results.

O’Reilly nails it in their description of it.

The launch of Google Image Labeler, a “game” that asks people to label images, and figures that images given the same label by multiple people are likely to be correct, continues the Web 2.0 trend towards bionic software, that is, software that combines machine and human intelligence. This is really just another version of the web 2.0 principle, harnessing collective intelligence, but with an emphasis on “harnessing” rather than on “collective.”

Like Distributed Proofreaders (the granddaddy in the space), Amazon’s Mechanical Turk, and mycroft, but unlike, say, a Flickr tag cloud as a reflection of collective labeling of images, Google Image Labeler puts people explicitly to work.

There’s a spectrum of ways to put humans to work refining computer results, from the implicit to the explicit. The most explicit, of course, is going to be when the third world job shops now engaged in making booty for World of Warcraft start offering their services for more general hire.

[UPDATE : O’Reilly continues their investigation looking at the roots of the Image Labeller in the ESP Game]

Folksonomies Web 2.0

Taxonomies of tagging

danah boyd, Cameron Marlow, Marc Davis and Mor Naaman, all of Yahoo, explore social tagging in detail in their paper presented to ACM/Hypertext06 in Denmark.

Of particular note are the sections on system design and user incentives, which cover the differing types of systems and methods behind different implementations of tagging. They also suggest that considerably more research is required into the different ‘lects’ used in tagging and the phenomenon of ‘vocabulary overlap’ between random users’ tags in a short Flickr case study.

Essential reading.


In recent years, tagging systems have become increasingly popular. These systems enable users to add keywords (i.e., “tags”) to Internet resources (e.g., web pages, images, videos) without relying on a controlled vocabulary. Tagging systems have the potential to improve search, spam detection, reputation systems, and personal organization while introducing new modalities of social communication and opportunities for data mining. This potential is largely due to the social structure that underlies many of the current systems.

Despite the rapid expansion of applications that support tagging of resources, tagging systems are still not well studied or understood. In this paper, we provide a short description of the academic related work to date. We offer a model of tagging systems, specifically in the context of web-based systems, to help us illustrate the possible benefits of these tools. Since many such systems already exist, we provide a taxonomy of tagging systems to help inform their analysis and design, and thus enable researchers to frame and compare evidence for the sustainability of such systems. We also provide a simple taxonomy of incentives and contribution models to inform potential evaluative frameworks. While this work does not present comprehensive empirical results, we present a preliminary study of the photo-sharing and tagging system Flickr to demonstrate our model and explore some of the issues in one sample system. This analysis helps us outline and motivate possible future directions of research in tagging systems.

Folksonomies Web 2.0

OPAC2.0 Quick log charting of object popularity

A few weeks back I posted an initial chart showing distributions of object usage on our OPAC2.0.

Here’s a quick updated chart, this time done with logarithmic scales on both axes.

MS Excel seems to cope with only 32,000 values on one axis, so the chart cuts off artificially at 32,000 (out of 55,134 objects viewed of the total 61,780 currently available).

Some other useful data:
Total object views to date = 1,776,259
Max views for single object = 2104 (Delta Goodrem dress)
Average views per object = 28.751
Standard deviation = 40.819
Median views = 19

Popularity drops below 10 views at rank 39,313 (not shown on graph as a result of Excel limitations)
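For anyone wanting to reproduce figures like these without Excel’s row limit, here is a minimal sketch using Python’s standard library. It assumes the per-object view counts are available as a plain list of integers. The function name, the sample counts and the choice of population standard deviation are assumptions for illustration, not a record of how the numbers above were produced.

```python
import statistics


def popularity_summary(views, threshold=10):
    """Summarise per-object view counts and find where the tail begins.

    views must be a non-empty list of integer view counts.
    """
    ranked = sorted(views, reverse=True)  # rank 1 = most viewed object
    # First rank (1-based) whose view count falls below the threshold.
    first_below = next(
        (rank for rank, count in enumerate(ranked, start=1) if count < threshold),
        None,
    )
    return {
        "total": sum(ranked),
        "max": ranked[0],
        "mean": statistics.mean(ranked),
        "median": statistics.median(ranked),
        "stdev": statistics.pstdev(ranked),  # population form; an assumption
        "first_rank_below_threshold": first_below,
    }


# Hypothetical view counts, not the real collection data.
views = [2104, 950, 40, 19, 19, 12, 9, 3, 0]
summary = popularity_summary(views)
print(summary)
```

Plotting `ranked` against rank on log-log axes (for example with a charting library of your choice) would give the same kind of chart without the 32,000-point cut-off.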

Already there is a clear line emerging which droops around the 20K rank point – which indicates that there is still some way to go with driving traffic down to the more obscure objects in the tail. The bump at the head is the result of objects that are receiving abnormally large amounts of traffic – the Delta Goodrem dress, the Nu-U bra (the one from 1957) – as a result of time-specific cultural factors.

Folksonomies Web 2.0

OPAC2.0 More on tag clouds

Lynda Kelly at the Australian Museum has relayed some reporting on tagging from a recent Web Usability seminar. (Lynda is part of an ARC project we are collaborating on.)

Roger Hudson took us through a brief history of classification and taxonomy (Linnaeus I think, Dewey, etc etc), making mention of an interesting Indian historical figure who had introduced the idea of classifying by “facets”. This idea was not widely taken up but is now highly relevant to the ways that tags are used. He also presented some *very* preliminary research with punters about tags – what they were and how they were being and could be used. The messages for me from his talk were (with apologies in advance to Roger as I am just outlining my impressions which could be wrong!):

1. Little understanding of the concept of tagging

2. Little understanding of why some words were larger in a tag cloud than others

3. A wide variety in the ways that people could potentially tag something. For example a picture of a redback spider was tagged as a spider (obviously); redback (also obviously); however other tags were Slim Dusty and dunny (think about it…) which I thought were pretty cool

4. The potential that as tag clouds make the “popular” tags the biggest, there could be “expert” tags that are lost (as in the above example where only 3 or so people used the word “arachnology” as a tag which is something that other experts may search on)

When it comes to collections we are noticing some different trends emerging. This is mainly because tags on our site are combined with controlled vocabularies and are enhanced by them, so the end result for users is better and broader.

The stats we are accumulating are now showing a clear preference for tag as entry point, but interestingly enough, NOT necessarily tagged content as end point. Thus a user might click on the big tag MODEL TRAIN but then not view an actual OBJECT tagged as model train, but one of the results from a free text search for the term.

(I’ll be presenting some statistical evidence on these trends in future presentations and perhaps in a future post.)

Unlike a lot of other sites that use tags we are not JUST using tags as a folksonomic classification system, we are also using them as search entry points. The use of tags as search entry points means that we are increasing the likelihood of users widening rather than narrowing their search.

Lynda has posted links to two excellent introductory pieces on folksonomies as well.

Folksonomies Interactive Media Web 2.0

OPAC2.0 Effects of tag clouds on search term usage

Rob Stein from Indianapolis Museum of Art asked me on the STEVE list –

Do you have a feel[ing] for how many people are actually entering the collection through the tag cloud you have on your page versus how many are using the category listings? I’ve often wondered if the nature of a tag cloud naturally bias’ big terms to get bigger, and smaller terms to disappear. Presenting a cloud like this side-by-side with the categorical hierarchy seems like an interesting comparison.

Since launch we’ve had nearly 1500 user classifications. Interestingly there seems to be no immediate pattern in the way in which objects are user classified, and the rationale for classification is unsurprisingly very mixed (as is our collection). Most of the larger user classifications, such as ‘bowling club’, are the result of a single user classifying multiple objects in one go. (The ‘bowling club’ objects were all tagged on the same day and none have been added since.)

Whilst we don’t specifically track category listing use, we do track tag cloud use. Here are the figures for the last 7 days.

Date | Total successful searches | Subset of searches using tag cloud
13/08/2006 | 11,665 | 4,006
12/08/2006 | 12,165 | 4,847
11/08/2006 | 13,613 | 1,352
10/08/2006 | 5,572 | 569
09/08/2006 | 6,782 | 318
08/08/2006 | 4,530 | 564
07/08/2006 | 9,605 | 1,638

At its lowest tag cloud searches represent 4.68% of searches, and its highest 39.84%. That is a pretty large difference but I have a feeling that the reason for the recent few days generating both more total searches and a higher percentage of tag cloud searches is that Google has again spidered the site and picks up the tag cloud words as keywords.
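The percentages can be checked with a few lines of Python using the seven days of figures given above. The variable and function names are just for illustration.

```python
# (date, total successful searches, searches using the tag cloud)
daily = [
    ("13/08/2006", 11665, 4006),
    ("12/08/2006", 12165, 4847),
    ("11/08/2006", 13613, 1352),
    ("10/08/2006", 5572, 569),
    ("09/08/2006", 6782, 318),
    ("08/08/2006", 4530, 564),
    ("07/08/2006", 9605, 1638),
]


def tag_cloud_share(total, via_cloud):
    """Percentage of successful searches that came from the tag cloud."""
    return 100 * via_cloud / total


shares = {date: tag_cloud_share(total, cloud) for date, total, cloud in daily}
lowest = min(shares, key=shares.get)
highest = max(shares, key=shares.get)

for date, share in shares.items():
    print(f"{date}: {share:.2f}%")
print("lowest:", lowest, "highest:", highest)
```

The lowest share falls on 09/08/2006 and the highest on 12/08/2006, in line with the jump in the most recent few days.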

Because tag cloud words are user-generated there is a greater chance that they will be ‘more used’ than words from our official taxonomies. This means not only will they be more used on the site, but that they are probably also going to be words that are more often searched for in Google as well.

Now when a user clicks a tag cloud word they get TWO sets of search results. The first set of results is a simple tag search, the second is a general free text search for that keyword.
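A rough sketch of the two result sets, in Python, might look like this. It is not our production code: the object records, the field names `tags` and `description`, the function name `tag_cloud_results`, and the choice to leave tagged objects out of the free text set are all assumptions made for the example.

```python
def tag_cloud_results(tag, objects):
    """Return (tagged, free_text) result lists for a clicked tag cloud word.

    tagged    : objects that users have tagged with the word
    free_text : other objects whose description contains the word
    """
    needle = tag.strip().lower()
    tagged = [
        obj for obj in objects
        if needle in {t.lower() for t in obj["tags"]}
    ]
    tagged_ids = {obj["id"] for obj in tagged}
    # Assumption: objects already shown in the tag results are not repeated.
    free_text = [
        obj for obj in objects
        if needle in obj["description"].lower() and obj["id"] not in tagged_ids
    ]
    return tagged, free_text


# Hypothetical records standing in for collection objects.
objects = [
    {"id": 1, "description": "Model train, tinplate, clockwork", "tags": ["model train", "toy"]},
    {"id": 2, "description": "Photograph of a boy with a model train set", "tags": []},
    {"id": 3, "description": "Glass vase", "tags": ["glassware"]},
]

tagged, free_text = tag_cloud_results("model train", objects)
print([obj["id"] for obj in tagged], [obj["id"] for obj in free_text])
```

In this toy data the untagged photograph only appears in the second list, which is the kind of object a tag cloud click can still lead to.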

Rather than necessarily biasing the ‘tagged’ objects what we are actually observing is that a tag cloud click more frequently results in the viewing of an untagged object which appears in the later free text results. I’ll have to keep an eye on this and see if this trend continues as more objects are tagged.

As for the categorical hierarchies, we are seeing very little usage of them. The vast majority of users are using direct search terms or clicking the tag cloud, or, more often than not, getting to objects or search results via a Google search.

What has changed recently is that we have added ‘subject terms’. These are slightly looser taxonomic classifications which address particular ‘themes’. An example of this is the term ‘federation’, which is used to refer to objects related to the period of Australian federation. These subject terms don’t describe the actual object but are related to its provenance and significance – and thus are particularly useful to high school teachers and students. A small portion of our total objects currently have subject terms attached, and they tend to be those relatively recently acquired.

What I am noticing is a very marked appearance of subject terms in the search terms indicating that they are being used as navigation devices to discover ‘related’ objects. In the next week or two we will be making the subject terms much more prominent as it seems that they are perhaps more useful to the user than our broad object categories despite their limitations.

Folksonomies Interactive Media Web 2.0 Young people & museums

Who will own museum content?

Angelina Russo put me on to this interesting short think piece from The Art Newspaper Oct 2005.

Whatever solutions are preferred, the landscape looks like this: museums will ultimately embrace file-sharing, and overcome their fear of loss of authority. Curatorial scholarship will likely find its way near the top of the information pyramid, but is best served up in a more accessible format if it has the public at large in mind. Furthermore, the way forward will likely be with a combination of free content and licensable, high-resolution multimedia content, most economically built by consortia instead of by one museum at a time. The content will have to be updated, open to folksonomy protocols that encourage end users to contribute to databases, and that emphasize live features (real-time tours of shows and behind-the-scenes experiences) that people will pay a modest amount for. Museums will begin focusing on those things that younger audiences will be prepared to download for a micro-payment or subscription, alongside ample free offerings.

Have you tried the folksonomy tools on our recently released OPAC 2.0?

Folksonomies Social networking Web 2.0

More on prod-users, Wikipedia and the like

Excellent and wide-ranging perspectives and commentary at The Edge in response to Jaron Lanier’s essay on Digital Maoism.

Essential reading.

A short extract of the summary –

Projects like Wikipedia do not overthrow any elite at all, but merely replace one elite — in this case an academic one — with another: the interactive media elite.
— Douglas Rushkoff

Our new tool for communication and computation may take us away from distinct individualism, and towards something closer to the tender nuance of folk art or the animal energy of millenarianism.
— Quentin Hardy

Networked-based, distributed, social production, both individual and cooperative, offers a new system, alongside markets, firms, governments, and traditional non-profits, within which individuals can engage in information, knowledge, and cultural production. This new modality of production offers new challenges, and new opportunities. It is the polar opposite of Maoism.
— Yochai Benkler

The personal computer produced an incredible increase in the creative autonomy of the individual. The internet has made group forming ridiculously easy. Since social life involves a tension between individual freedom and group participation, the changes wrought by computers and networks are therefore in tension. To have a discussion about the plusses and minuses of various forms of group action, though, is going to require discussing the current tools and services as they exist, rather than discussing their caricatures or simply wishing that they would disappear.
— Clay Shirky

Wikipedia isn’t great because it’s like the Britannica. The Britannica is great at being authoritative, edited, expensive, and monolithic. Wikipedia is great at being free, brawling, universal, and instantaneous.
— Cory Doctorow