Categories
Conceptual Social networking Web 2.0 Web metrics Young people & museums

Social production, cut and paste – what are kids doing with ‘your’ images?

It has been one of the worst kept secrets of web statistics – deep linked image traffic. While this has been going on for years, since the beginning of the web in fact, it has increased enormously in the past few years. On some cultural sector sites such traffic can be very substantial – a quick test is to look at exactly how much of your traffic is ‘referred’ from MySpace. It is also one of the main reasons why Photobucket has traditionally reported so much more traffic than Flickr – its deep linking and cut-and-paste engagement with MySpace. With the move away from log file analysis to page tagging in web analytics, some, but not all, of this deep linking traffic is fortunately being expunged from analytics reporting.
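If you want to run that quick referrer test against raw log files, here is a minimal sketch in Python. The Apache ‘combined’ log format and the file name are assumptions for illustration, not a description of our analytics setup:

```python
# Minimal sketch: count image requests referred from MySpace in an
# Apache combined-format access log. The file name and log format are
# assumptions, not a description of our actual setup.
import re
from collections import Counter

LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" \d{3} \S+ "(?P<referrer>[^"]*)"')

def deep_link_hits(log_path):
    hits = Counter()
    with open(log_path) as log:
        for line in log:
            m = LOG_LINE.search(line)
            if not m:
                continue
            path, referrer = m.group('path'), m.group('referrer')
            if path.lower().endswith(('.jpg', '.jpeg', '.gif', '.png')) \
                    and 'myspace.com' in referrer.lower():
                hits[path] += 1
    return hits

# Print the ten most deep-linked images and their hit counts.
for path, count in deep_link_hits('access.log').most_common(10):
    print(count, path)
```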

Two Powerhouse examples: a Chinese news and comment portal deep linked a Mao suit image (from an educational resource on our site), sending us 51,000 visits in under 24 hours in August 2005; and an A-grade Singaporean blogger deep linked an image of Gollum (from our archived Lord of the Rings exhibition pages) to describe an ugly celebrity, generating over 180,000 visits over eight days in January 2007. (In both cases the visits were removed from the figures reported to management and funders.)

What is going on here sociologically?

At the recent ICA2007 event in San Francisco danah boyd and Dan Perkel presented an interesting look at the subcultural behaviours that are, in part, producing this effect. Although they look specifically at MySpace, there are threads that can be drawn across many social sites from forums to blogs. Drawing on the work of many cultural theorists, they argue that what is going on on MySpace is a form of ‘code remix’. That is, young people’s MySpace pages are essentially ‘remixes’ of other content – but unlike a more traditional remix in audio and video cultures, these code remixes occur through the simple cut and paste of HTML snippets. By ‘producing’ both their MySpace pages and their online cultural identity in this way, they are reshaping concepts of ‘writing’ and digital literacy. They are also, importantly, not in control of the content they are remixing – a deep linked image can easily be changed, replaced or removed by the originating site.

There are plenty of examples – boyd and Perkel give a few – where the content owner changes the linked image to disrupt the deep linker. In the case of our Singaporean blogger we renamed the linked image to prevent it from appearing on her site (and in our statistics).
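The rename tactic itself is trivial to script. Here is a rough sketch of the idea, with entirely hypothetical file names. Move the image to a new name (updating your own pages to match), and leave a small placeholder at the old URL so any remote embeds show the placeholder instead:

```python
# Sketch of the 'rename and replace' tactic described above. All file
# names here are hypothetical placeholders.
import os
import shutil

def disrupt_deep_link(original, new_name, placeholder):
    target = os.path.join(os.path.dirname(original), new_name)
    shutil.move(original, target)       # our own pages get updated to point at the new name
    shutil.copy(placeholder, original)  # any remote embed of the old URL now shows the placeholder
    return target

disrupt_deep_link('images/gollum.jpg', 'gollum_2007.jpg', 'images/hotlink_notice.jpg')
```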

Revealingly, Perkel’s research is showing that many MySpace users have little, if any, knowledge of or interest in website production – that is, CSS and HTML. Instead, what has formed is a technically simple but sociologically complex ‘cut and paste’ culture. This is what drives the ‘easy embedding’ features found on almost every content provider site like YouTube – it is in the content providers’ interest to allow as much re-use of their content (or the content they host) as possible, because it allows for the insertion of advertising and branding, including persistent watermarking. Of course, the museum sector is not geared up for this – instead our content is being cut and pasted, often without anyone outside the web team having a deep understanding of what is actually going on. There are usually two reactions – one negative (“those kids are ‘stealing’ our content”) and the other overly positive (“those kids are using our content, therefore they must be engaging with it”). Certainly the research by Perkel and others deeply problematises any notion that these activities are in large part about technical upskilling – they aren’t – instead those involved are learning and mastering new communication skills, and emerging ways of networked life.

One approach that some in the sector have advocated is the widget approach – create museum content widgets for embedding – to make repurposing of content (and code snippets) easier. There have been recent calls for museum Facebook apps, for example. But I’m not sure that this is going to be successful, because a great deal of embedded content is of the LOLcats variety – perhaps trivial and superficial, but highly viral and jammed full of flexible and changing semiotic meaning. Our content tends to be the opposite – deep, complex and relatively fixed.

Categories
Digitisation Web 2.0

How to do low cost transcription of hand written and difficult documents

So your museum has already done the easy part of digitisation – taking digital photos of your objects – but now you have complex hand-written material you need to digitise . . . what can you do?

This is a question that has popped up in several meetings over recent months.

Our Curator of Information Technology, Matthew Connell, came up with a brilliantly simple solution – and there is no need for the original material to leave your organisation.

With the low cost of MP3 recorders it is now very easy to record a large amount of audio into a single, already compressed file. Take one of these MP3 recorders and ask the expert who is familiar with the document or material requiring digitisation to read it clearly into the recorder. This may be done over an extended period of time – there is no need to do it all in one go.

When completed, upload the MP3 of clearly spoken audio to a web server. Then use one of several online audio transcription services to transcribe the audio. We have been using such services to get quick, low cost transcriptions of public lectures and podcasts, and have been impressed with their timeliness and accuracy.

Even factoring in the cost of reading time, this will almost certainly be cheaper and less error-prone than scanning and transcribing directly from the written original. It also provides significantly more flexibility in terms of pricing, as there is a high level of competitiveness amongst audio transcription services at the moment – a level of competition that may not exist amongst specialist manuscript transcription services.

Categories
Conceptual Digitisation

Filtering memory – SEO, newspaper archives, museum collections

When Bad News Follows You in the New York Times (via Nick Carr) is a fascinating article about what can happen when ‘everything’ is put online.

The article looks at the new array of problems that have come about as a by-product of the NYT optimising their site and archives for Google with SEO techniques. Suddenly stories that were either of minor significance, or were corrected in later editions, are appearing toward the top of Google searches for names, places and events.

Most people who complain want the articles removed from the archive.

Until recently, The Times’s response has always been the same: There’s nothing we can do. Removing anything from the historical record would be, in the words of Craig Whitney, the assistant managing editor in charge of maintaining Times standards, “like airbrushing Trotsky out of the Kremlin picture.”

Whitney and other editors say they recognize that because the Internet has opened to the world material once available only from microfilm or musty clippings in the newspaper’s library, they have a new obligation to minimize harm.

But what can they do? The choices all seem fraught with pitfalls. You can’t accept someone’s word that an old article was wrong. What if that person who was charged with abusing a child really was guilty? Re-report every story challenged by someone? Impossible, said Jonathan Landman, the deputy managing editor in charge of the newsroom’s online operation: there’d be time for nothing else.

(snip)

Viktor Mayer-Schönberger, an associate professor of public policy at Harvard’s John F. Kennedy School of Government, has a different answer to the problem: He thinks newspapers, including The Times, should program their archives to “forget” some information, just as humans do. Through the ages, humans have generally remembered the important stuff and forgotten the trivial, he said. The computer age has turned that upside down. Now, everything lasts forever, whether it is insignificant or important, ancient or recent, complete or overtaken by events.

Following Mayer-Schönberger’s logic, The Times could program some items, like news briefs, which generate a surprising number of the complaints, to expire, at least for wide public access, in a relatively short time. Articles of larger significance could be assigned longer lives, or last forever.

Mayer-Schönberger said his proposal is no different from what The Times used to do when it culled its clipping files of old items that no longer seemed useful. But what if something was thrown away that later turned out to be important? Meyer Berger, a legendary Times reporter, complained in the 1940s that files of Victorian-era murder cases had been tossed.

“That’s a risk you run,” Mayer-Schönberger said. “But we’ve dealt with that risk for eons.”

There are interesting parallels with our experience in making our online collection more usable and accessible. Public enquiries have skyrocketed and now range from the scholarly to the trivial – the greatest increase being in the latter category. Whilst a significant amount of extremely valuable object-related information is sent in by members of the public, there are also false leads, material that cannot be adequately verified, and more still that the Museum already knows but has not yet made available online. Managing public expectations and internal workflow is a difficult balancing act, and a continuing challenge facing museums that not only put their collections online but also make them highly accessible.

Categories
Imaging

Content aware image resizing from Siggraph 07

A common bugbear encountered when working with diverse collections and images is the inability to gracefully create resized versions. We have never found a suitable solution to creating thumbnails of our collection for the OPAC and Design Hub – the current solution is to take the existing large image, resize it to 500 pixels on the longest side, crop a 400 pixel square from the middle, and resize that square to 80×80 pixels, including any white space borders. This is run as a batch process. Whilst this works for most rectangular images it still has the unintended side effect of lopping off heads and feet, and on rare irregular shapes, such as very long artworks, the thumbnail is virtually useless even for quick object recognition tasks.
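For the curious, that batch rule is simple enough to sketch in a few lines of Python using the Pillow imaging library. Treat the directory names as placeholders rather than our actual production script:

```python
# A sketch of the batch thumbnailing rule described above, using Pillow.
# Directory names are illustrative placeholders, not our production setup.
from PIL import Image
import os

def make_thumbnail(src_path, dst_path):
    img = Image.open(src_path)
    # 1. resize so the longest side is 500 pixels
    img.thumbnail((500, 500))
    # 2. crop a 400 x 400 square from the middle (smaller if the image is narrow)
    w, h = img.size
    left, top = max((w - 400) // 2, 0), max((h - 400) // 2, 0)
    square = img.crop((left, top, left + min(w, 400), top + min(h, 400)))
    # 3. shrink the square to the final 80 x 80 thumbnail
    square.resize((80, 80), Image.LANCZOS).save(dst_path)

for name in os.listdir('collection_images'):
    if name.lower().endswith(('.jpg', '.png')):
        make_thumbnail(os.path.join('collection_images', name),
                       os.path.join('thumbnails', name))
```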


But here, in a presentation from Siggraph 07, is a fascinating potential solution. It is quite amazing, and by reducing or expanding an image based on its ‘content’ it has very interesting implications for intellectual property legislation. In many ways it does for images what MP3 compression does (poorly) for audio: intelligently removing the parts of the image that are least noticed by the viewer. In so doing it makes assumptions about the overall image – and about how we ‘see’ images.
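For readers who want a feel for how it works, below is a very stripped-down sketch of the underlying ‘seam carving’ idea in Python and numpy: repeatedly deleting the vertical path of pixels with the lowest gradient energy. It assumes a greyscale image array and ignores the colour handling, masking and seam insertion covered in the actual Siggraph presentation:

```python
# A minimal sketch of the 'content aware' (seam carving) idea: repeatedly
# remove the vertical path of pixels with the lowest gradient energy.
# Assumes a 2-D greyscale numpy array; a real implementation handles
# colour images, protection masks and seam insertion as well.
import numpy as np

def energy(img):
    gy, gx = np.gradient(img.astype(float))
    return np.abs(gx) + np.abs(gy)

def remove_one_seam(img):
    e = energy(img)
    h, w = e.shape
    cost = e.copy()
    for y in range(1, h):                          # dynamic programming pass
        left = np.r_[np.inf, cost[y - 1, :-1]]
        right = np.r_[cost[y - 1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)
    # trace the cheapest seam back up, deleting one pixel per row
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    keep = np.ones_like(img, dtype=bool)
    keep[np.arange(h), seam] = False
    return img[keep].reshape(h, w - 1)

def shrink_width(img, pixels):
    for _ in range(pixels):
        img = remove_one_seam(img)
    return img
```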

Categories
Interactive Media Web 2.0

The new Google Maps, Google Earth and Google Sky

Everyone is buzzing about the new features that have popped up with the easily embeddable Google Maps today. This is a big step towards making map mashups completely mainstream – increasing the popular acceptance of the map as a user interface.

To see how things might work for the museum and cultural sector, take a look at this query. Scroll to the bottom and you will see a map showing all the places mentioned in the book, together with pop up page references! There has obviously been a lot of parsing of OCRed text to pull out the place names, but the result is pretty incredible.

Something a few have missed are the astronomy features now available in Google Earth, called Google Sky.

Download the new version of Google Earth and you will find a new toolbar icon that toggles between Earth and Sky. Once in Sky mode you can find galaxies, constellations and planets – all of which link to data from NASA and other sources including Hubble telescope pictures. It is very impressive and lots of fun.

Next task is to look into making KML files to accompany our monthly night sky guide podcasts at the Sydney Observatory . . .
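Generating basic KML turns out to be straightforward. Here is a minimal sketch of writing a placemark file in Python; the placemark name, description and coordinates are placeholders, and a proper sky overlay would need more than this, but it shows the shape of the file:

```python
# Minimal sketch of writing a KML placemark file. The names, description
# and coordinates below are placeholders, not our actual podcast data.
from xml.sax.saxutils import escape

def placemark(name, description, lon, lat):
    return (
        "  <Placemark>\n"
        f"    <name>{escape(name)}</name>\n"
        f"    <description>{escape(description)}</description>\n"
        f"    <Point><coordinates>{lon},{lat},0</coordinates></Point>\n"
        "  </Placemark>\n"
    )

def write_kml(path, placemarks):
    with open(path, 'w', encoding='utf-8') as f:
        f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        f.write('<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n')
        f.writelines(placemarks)
        f.write('</Document>\n</kml>\n')

write_kml('night_sky_guide.kml',
          [placemark('Sydney Observatory', 'Home of the monthly night sky guide',
                     151.2045, -33.8587)])
```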

Categories
Museum blogging Web 2.0

Blogs as a ‘community strategy’

New Matilda has a short but interesting piece by Kevin Anderson, blogs editor at The Guardian. In the article he stresses that blogging is about generating and engaging the community, not just a new means of publishing. Rather than seeing blogging as a threat to traditional publishing, it should be viewed as a new strategy for engaging audiences and readers.

This has strong resonances with experiences of museum blogging. Blogs aren’t replacing traditional forms of official communication, but they are engaging audiences in new and effective ways.

Neil McIntosh and Jack Schofield launched The Guardian’s first blog in 2001, realising it was better to be part of the conversation than listen to it from a lofty perch. The Guardian now has blogs covering everything from current affairs — on ‘Comment is Free’ — to sport, arts and culture, and most recently food and gardening.

But blogging is not a publishing strategy, it’s a community strategy. Being one of the world’s bloggiest newspapers has led to bloggers linking to our stories, helping us grow a grass-roots following in the United States, so that The Guardian now has more online visitors outside of the UK than inside.

One of The Guardian’s stated goals is to become the world’s leading liberal voice. And our website’s ‘Head of Communities and User Experience,’ Meg Pickard, has said that we also need to enable the world’s liberal voices.

The art of blogging is about building a community and coaxing people out from behind their keyboards.

Categories
Web 2.0 Wikis

Wikipedia, Wikiscanner, revealing the hidden power struggles over knowledge production

Last week featured a rather robust debate in the office about whether museums should encourage the use of Wikipedia, and, perhaps participate in adding and editing entries themselves. Now most Fresh + New readers will be familiar with the arguments – they’ve been around since Wikipedia began.

Of course what most anti-Wikipedians, if they don’t dismiss it outright, claim is that ‘Wikipedia is only as good as its last edit’. But to me that is missing the point. Wikis, and Wikipedia as an example of a wiki, are interesting because they reveal the history of edits, changes, revisions and re-versions. They reveal the collaborative and argumentative nature of knowledge production.

Well, almost as if to prove my point, along comes Virgil Griffith’s Wikiscanner which has gotten coverage in Wired and is struggling under the burden of the resultant high traffic load.

Wikiscanner basically matches the IP addresses of those doing edits with information about their network provider – known IP address ranges of government departments, corporations and the like. By doing this Wikiscanner is beginning to reveal the complex web of individuals, and increasingly, corporations that are using Wikipedia to argue and dispute versions of the ‘truth’. You can start to get an idea of the otherwise hidden agendas and power struggles over knowledge and information quite quickly . . . .
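The core matching step is conceptually very simple. A rough sketch of the idea in Python, with invented documentation-only address ranges standing in for real allocations, might look like this:

```python
# A rough sketch of the Wikiscanner idea: match the IP address recorded
# against an anonymous edit to known organisational address ranges.
# The ranges listed are invented examples, not real allocations.
import ipaddress

KNOWN_RANGES = {
    'Example Corp':       ipaddress.ip_network('192.0.2.0/24'),
    'Example Government': ipaddress.ip_network('198.51.100.0/24'),
}

def attribute_edit(ip_string):
    ip = ipaddress.ip_address(ip_string)
    for organisation, network in KNOWN_RANGES.items():
        if ip in network:
            return organisation
    return 'unknown'

print(attribute_edit('192.0.2.57'))   # -> Example Corp
```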

Griffith says he launched the project hoping to find scandals, particularly at obvious targets such as companies like Halliburton. But there’s a more practical goal, too: By exposing the anonymous edits that companies such as big pharmaceutical companies make in entries that affect their businesses, it could help experts check up on the changes and make sure they’re accurate, he says.

Categories
Conceptual Web 2.0 Web metrics

Valuing different audiences differently – usability, threshold fear and audience segmentation

It is important to realise that to deliver more effective websites we need to move away from a one-size-fits-all approach not only when designing sites but also when evaluating and measuring their success. We know that some online projects are specifically intended to target specialist audiences – a site telling the histories of recent migrants might require translation tools, and a site aimed at teenagers might, by design, specifically discourage older and younger audiences in order to better attract teenage usage.

Remembering, too, that some key museum audiences (regional, remote, socially disadvantaged) may have no representation at all in online visit figures, and others may have only limited and sporadic online interactions because of unequal internet access, it is important to look at the overall picture of museum service delivery. Some audiences cannot be effectively engaged online. Others still may only feel confident engaging in online conversations about the museum using non-museum services – as I’ve written before – on their own blogs, websites, and social media sites.

If we acknowledge ‘threshold fear’ in our physical institutions, then we need to realise this applies online as well. The difference is that in the online world there are many more, less ‘fearful’ options to which potential visitors and users can easily flee. The ‘back’ button is just a click away.

The measure of the ‘value’ of visitors therefore needs to differ across parts of the same website. We may need different measures for a user in the ‘visiting the museum’ part of the website than for one in the ‘tell us your story’ section, even though in a single visit they might explore both areas. Likewise, a museum visitor who blogs about their positive experience of a real world visit on their own family blog might be considered valuable. Or a regionally-oriented microsite that gets discussed on a specialist forum might be more valuable – to that particular project – than a posting on a more diffuse national discussion list.

Visit-oriented parts of the website should be designed and created with known target audiences in mind, understanding that not everyone can visit the museum, and their success measured accordingly. It might be sensible to attempt to address ‘threshold fear’ by using images of the museum that are more people-oriented than object-oriented, in order to promote the notion that the museum is explicitly a place for people.

When we were building our children’s website we specifically decided against creating a resource for ‘all’ children – that would have resulted in too generic a site – and instead targeted the pre- and post-visit needs of a known subset of visitors with children. We don’t actively exclude other visitors (other than through language choice, visual design, and bandwidth requirements), but we have actively attempted to better meet the needs of a subset of visitors. This subset will necessarily diversify over time, but we also understand that out on the internet there are plenty of other options for children.

The problem with traditional measurements is that every visitor to our online resources is homogenised into single figures – visits, time spent, pages viewed. Not only does this reduce the value of the web analytics, it does the visitor a great disservice. Instead, good analytics is about segmentation – segmentation based on task completion and conversions, and on understanding visit intentions.

So who is a ‘valuable’ visitor?

It depends on context.

For our children’s site we place a greater internal value on those who complete one of the two main site conversions: first, spending a particular amount of time in the visit information areas; and second, browsing, finding and, most critically, downloading an offsite activity. Focussing in on these subsets of users allows us to implement evaluation and tracking. For those who complete the visit-related tasks we might offer discount coupons for visiting and track virtual to real-world conversions. What proportion of online visitors who look at visit information actually convert their online interest into a real world action? And in what time frame (today, this week, this month)? Of the second group we may conduct evaluation of downloader satisfaction – did they make the craft activity they downloaded? Was it too hard, too easy? Did they enjoy the experience?
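To make the segmentation concrete, here is a toy sketch in Python that buckets visits by which, if any, of those two conversions they completed. The visit records and the time threshold are invented for illustration:

```python
# Toy sketch of segmenting visits by task completion rather than
# reporting a single homogenised figure. The visit records and the
# time threshold are invented for illustration.
from collections import Counter

VISIT_INFO_THRESHOLD = 120  # seconds spent in the 'visiting the museum' pages

def segment(visit):
    """visit is a dict like {'visit_info_seconds': 140, 'activity_downloads': 0}"""
    if visit.get('activity_downloads', 0) > 0:
        return 'downloaded offsite activity'
    if visit.get('visit_info_seconds', 0) >= VISIT_INFO_THRESHOLD:
        return 'engaged with visit information'
    return 'did not convert'

visits = [
    {'visit_info_seconds': 140, 'activity_downloads': 0},
    {'visit_info_seconds': 30,  'activity_downloads': 2},
    {'visit_info_seconds': 15,  'activity_downloads': 0},
]

print(Counter(segment(v) for v in visits))
```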

What of the others who visit the children’s site? They are a potential audience who have shown an interest but for many reasons haven’t ‘converted’ their online visit. We can segment this group by geography and origin – drill down deeper and really begin to examine the potential for them to ever ‘convert’.

On other parts of our website – say our SoundHouse VectorLab pages – we may see as valuable those users who simply use and link back to our ‘tip of the day’ resources. Although these pages are primarily an advertisement for onsite courses run in the teaching labs, we do see great value in having our ‘tip of the day’ resources widely read, the RSS feed subscribed to, and articles linked back to. However this has to be a secondary objective to actually taking online bookings for courses.

Postscript – I’d also suggest reading the 2004 Demos report ‘Capturing Cultural Value’ for some important philosophical and practical caveats.

Categories
Collection databases Web 2.0

OPAC2.0 – Latest features update

We’ve added a whole range of new features to our OPAC that we think further enhance its usability.

Tooltips

Each ‘feature’ on the search results and object view pages now has an explanatory tooltip. Given the OPAC has become quite complex and there is a lot going on on the screen now, we felt CSS tooltips offered a more practical solution than a ‘help’ screen or more text in the form of user documentation. More tooltips will be added this week to explain museum-centric language like ‘statement of significance’.

Failed search suggestions

Now when a search term is misspelled or returns no results, our system generates a series of possible ‘alternatives’. These are generated on the fly using the Levenshtein (edit) distance calculation: the misspelt term is compared against our table of successful searches, possible matches are ranked, and the top eight variants are presented to the user. In order to make this reasonably quick we have had to rebuild quite a bit of our search technology.
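A simplified sketch of the suggestion step might look like the following Python: rank previously successful search terms by edit distance from the failed query and return the closest eight. (The list of successful terms here is a stand-in for our actual database table.)

```python
# Sketch of the failed-search suggestion step: rank previously successful
# search terms by Levenshtein distance from the failed query and return
# the closest eight. The list of terms is a stand-in for the real table.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def suggest(failed_term, successful_terms, limit=8):
    ranked = sorted(successful_terms, key=lambda t: levenshtein(failed_term, t))
    return ranked[:limit]

print(suggest('locomotiv', ['locomotive', 'lace', 'loom', 'ceramics']))
```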

Opensearch RSS with thumbnails

About two months ago our Opensearch feed was updated to include thumbnails in search results. We added the thumbnails to ensure that our feed delivered optimal results to the National Library of Australia’s Libraries Australia search. We also use this modified RSS to drive the search results on Design Hub.
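For those wondering what the thumbnail markup might look like, here is a hedged sketch in Python that emits an RSS item with a Media RSS media:thumbnail element. Whether this matches the exact markup our feed or Libraries Australia consumes is an assumption; it is just one common way of attaching thumbnails to feed items:

```python
# Hedged sketch: emit an RSS item carrying a Media RSS <media:thumbnail>
# element. Whether this matches the exact markup our feed uses is an
# assumption; titles, links and URLs below are placeholders.
from xml.sax.saxutils import escape

def rss_item(title, link, thumbnail_url):
    return (
        "  <item>\n"
        f"    <title>{escape(title)}</title>\n"
        f"    <link>{escape(link)}</link>\n"
        f"    <media:thumbnail url=\"{escape(thumbnail_url)}\" />\n"
        "  </item>\n"
    )

header = ('<rss version="2.0" '
          'xmlns:media="http://search.yahoo.com/mrss/" '
          'xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">\n'
          '<channel>\n')

print(header
      + rss_item('Mao suit', 'https://example.org/object/12345',
                 'https://example.org/thumbs/12345.jpg')
      + '</channel>\n</rss>')
```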

Categories
Conceptual Interactive Media Museum blogging Web metrics

Authority in social media – Why We Twitter: Understanding Microblogging Usage and Communities

From Akshay Java, Xiaodan Song, Tim Finin, and Belle Tseng comes an interesting academic paper titled Why We Twitter: Understanding Microblogging Usage and Communities.

Following my recent post looking at diffused brand identity in social media, this paper is a useful examination of the emergent ‘authority’ and ‘connectedness’ of users amongst a dataset of 75,000 users and 1.3 million ‘posts’.

Twitter is something that I’ve seen limited potential for in most museum applications so far, but increasingly Twitter-style communication is replacing email – see, for example, the frequent updates your friends make to Facebook’s ‘what I am doing/feeling now’ mood monitor.

Abstract:

Microblogging is a new form of communication in which users can describe their current status in short posts distributed by instant messages, mobile phones, email or the Web. Twitter, a popular microblogging tool has seen a lot of growth since it launched in October, 2006. In this paper, we present our observations of the microblogging phenomena by studying the topological and geographical properties of Twitter’s social network. We find that people use microblogging to talk about their daily activities and to seek or share information. Finally, we analyze the user intentions associated at a community level and show how users with similar intentions connect with each other.