
Synonymiser Beta – proof of concept

Synonymiser is an experimental micro-application that returns related words, derived from relationships in the search data held in the Powerhouse Museum’s collection database. These ‘synonyms’ are generated dynamically from real-time user interaction with the collection database.

On the Synonymiser site you can enter any search word or phrase, and it will return a list of ‘related’ words or phrases along with a measure of how closely each one is related.

Of course, the results are not synonyms in the dictionary sense of the word. Instead, they show meaning relationships specific to the way people search our collection database.

The idea is that these word relationships can then be used to query other data sources. To demonstrate the concept, we retrieve images from Flickr™. The related terms can be merged with the original query, or offered as alternative searches, to improve the results.
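As a rough illustration of merging terms, the Python sketch below expands a search term with its close Synonymiser matches and builds a Flickr search request from them. The `RELATED` scores, the `expand_terms` and `flickr_search_url` helpers and the API key placeholder are all hypothetical; the request itself uses the public Flickr REST method `flickr.photos.search` with its `tags` and `tag_mode` parameters, and the cut-off of 10 borrows the promiscuity rule of thumb explained in the next section.

```python
from urllib.parse import urlencode

# Hypothetical Synonymiser output: related term -> promiscuity score.
# Illustrative values only, not real Powerhouse data.
RELATED = {
    "locomotive": {"train": 4, "railway": 6, "transport": 25},
}

FLICKR_REST = "https://api.flickr.com/services/rest/"


def expand_terms(term, related, max_promiscuity=10):
    """Return the original term plus related terms scoring below
    max_promiscuity, closest (lowest score) first."""
    candidates = related.get(term, {})
    close = sorted(
        (t for t, score in candidates.items() if score < max_promiscuity),
        key=lambda t: candidates[t],
    )
    return [term] + close


def flickr_search_url(terms, api_key):
    """Build a flickr.photos.search request for photos tagged with any term."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "tags": ",".join(terms),  # comma-separated tag list
        "tag_mode": "any",        # "any" ORs the tags; "all" would AND them
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)


terms = expand_terms("locomotive", RELATED)
print(terms)  # ['locomotive', 'train', 'railway']
print(flickr_search_url(terms, api_key="YOUR_API_KEY"))
```

Here the highly promiscuous `transport` is left out of the merged query; it could instead be offered to the user as an alternative search.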

There is a proposal to make these synonym relationships available via an API to allow other museums to use and build upon our usage data to improve their own search tools.

Is this useful? Would you like to be involved or help with this?

What is ‘synonym promiscuity’?

‘Synonym promiscuity’ is our term for how unique the relationship between one word and another is. If the value is low (less than 10), the synonym has a very close relationship with the word entered. If the value is high, the synonym is related to many other words (high promiscuity). We are still refining the mathematics behind these values, but the current figures should provide a means of comparing words.
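The post does not give the formula, and it is still being refined, so the following is only a minimal Python sketch of one plausible reading: a term's promiscuity is the number of distinct other terms it is related to, and a score under 10 counts as close. The function names and the example relationships are invented for illustration.

```python
from collections import defaultdict


def promiscuity(pairs):
    """Count, for each term, how many distinct other terms it is related to.

    pairs: iterable of (term_a, term_b) relationships, treated as symmetric.
    """
    partners = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-relationships
            partners[a].add(b)
            partners[b].add(a)
    return {term: len(others) for term, others in partners.items()}


def closeness(score, threshold=10):
    """Label a score using the 'less than 10 is close' rule of thumb."""
    return "close" if score < threshold else "promiscuous"


# Invented example relationships (not real Powerhouse data).
GENERIC = ["ship", "boat", "chair", "lamp", "coin", "stamp",
           "clock", "camera", "dress", "radio", "toy", "poster"]
PAIRS = [("ship", "boat"), ("ship", "vessel"), ("boat", "vessel")]
PAIRS += [("object", t) for t in GENERIC]

scores = promiscuity(PAIRS)
for term in ("vessel", "ship", "object"):
    print(term, scores[term], closeness(scores[term]))
# vessel 2 close
# ship 3 close
# object 12 promiscuous
```

A vague word like `object` ends up related to everything, which is exactly the kind of synonym a high promiscuity score is meant to flag.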

2 replies on “Synonymiser Beta – proof of concept”

Seb,

can you say a bit more about how the synonymiser works? is it co-occurrence frequency that’s defining ‘relatedness’, or are you using other lexicographical resources to establish synonymy?

we’re looking at term analysis next week in a steve research meeting; thanks for such great food for thought!

jt

Hi Jennifer

Synonymiser works by taking a query for content and comparing the query term against other query terms that return the same subset of content. We use the promiscuity measure to rank the results.
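The matching step described in this reply can be sketched in Python as follows. This is a minimal illustration under assumptions, not the Powerhouse implementation: `SEARCH_LOG` is invented data mapping each query term to the object IDs its search returned, two terms are treated as related when their result sets share at least one object (a loose reading of ‘the same subset of content’), and promiscuity is taken to be the number of distinct terms a term is related to.

```python
from collections import defaultdict

# Hypothetical search log: query term -> IDs of collection objects returned.
# Invented example data.
SEARCH_LOG = {
    "locomotive": {101, 102, 103},
    "train": {101, 102, 104},
    "railway": {102, 103},
    "transport": {101, 104, 201, 301, 401},
    "car": {201},
    "aeroplane": {301},
    "ship": {401},
}


def related_terms(log):
    """Relate two query terms if their result sets share at least one object."""
    related = defaultdict(set)
    terms = list(log)
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if log[a] & log[b]:  # non-empty intersection of results
                related[a].add(b)
                related[b].add(a)
    return related


def synonyms(term, log):
    """Return (related_term, promiscuity) pairs, least promiscuous first."""
    related = related_terms(log)
    promiscuity = {t: len(others) for t, others in related.items()}
    return sorted(
        ((t, promiscuity[t]) for t in related.get(term, ())),
        key=lambda pair: (pair[1], pair[0]),  # score, then alphabetical
    )


print(synonyms("locomotive", SEARCH_LOG))
# [('railway', 2), ('train', 3), ('transport', 5)]
```

Ranking puts `railway` ahead of `transport` because `transport` also matches several unrelated queries. The pairwise comparison is quadratic in the number of distinct terms; an inverted index from object ID to query terms would scale better on a large log.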

Obviously it is early days – both in terms of total query terms in our database, and in terms of refining the way this works.

But it is interesting to see that patterns are already emerging around clusters of queries. In many ways we are finding that user queries act in much the same way tagging does, especially on objects for which we have images. This is the same philosophy as the Google Image Tagger etc.

By M&W07 I will have a better dataset to work with, show and share.
