Anthony Grafton on digitisation in the New Yorker

Over in the New Yorker, historian Anthony Grafton has written an excellent article on digitisation, the various book-scanning projects, and a historical look at the urge to record and catalogue everything.

Here are some pull quotes of specific note –

Collection databases Digitisation Geotagging & mapping

Brantley on digital collections and the location-awareness OPAC

Peter Brantley over at O’Reilly has put together a short post on his vision of the future of collections – specifically those held by university libraries – which should have resonance with those in collecting museums.

Collection databases Digitisation Folksonomies Web 2.0

OPAC2.0 – latest tag statistics and trends for simple comparison with Steve project

Another paper from the Steve researchers has gone online and is generating interesting discussions. It elaborates on the content of an earlier summary podcast. To be presented at ICHIM07, the paper describes some of the emerging patterns in tagging behaviour in the different interface trials.


OCLC/RLG on access and digitisation

Late in August OCLC held a special event called ‘Digitization matters: breaking through the barriers, scaling up digitization of special collections’ in Chicago. The audio of the event is now available on the OCLC site and is important listening for museums trying to come to terms with mass digitisation and the new access demands of digital users/customers.

Amongst a slew of excellent, well-thought-out short talks, Michael Jenkins from the Met reads Susan Chun’s provocative paper in her absence. It is a great way to start things off. Susan emphasises the importance of keeping pace with users and their expectations, and not just scholarly users. As she points out, users will not wait for us, nor will they necessarily need to, since in the digital realm borders are extremely porous. She argues that audiences now value quantity over exacting quality, and that this is what really matters. Meeting this demand requires new organisational structures, internal capacity building and looking beyond project-based funding models. She draws on several important examples from her time at the Met, especially the Artstor/scholars license project and the lessons learned from it.

Download the lot to your media player.

Conceptual Digitisation

Subscription museum content? Some implications of the NYT announcement for museums

The New York Times announced last week that it will stop charging for archival and subscription content on its website. As its own report explains, the NYT has realised that selling and managing subscription access to archival content is no longer going to be as profitable as making that content easy and free to reach and selling advertising against it.

This turnaround has come at the hands of Google and the power of search – search that now drives less ‘serious’ readers, unwilling to buy a subscription, to the NYT’s content. The NYT expects better returns from onsite advertising on material previously available only by subscription, and from the ability of readers to engage in conversations around that content. Conversations mean exposure, and exposure means advertising. Blogger Jason Kottke (amongst many others) has already been digging through the archives, exposing some of the more interesting material.

There has been an explosion of discussion across the web, but the Future of the Book pulls together three commentaries to suggest that

Whatever the media business models of tomorrow may be, they will almost certainly not revolve around owning content.

and drawing on Jeff Jarvis’s 2005 proclamations, explains that the future lies in integrating your content into your audience’s conversations.

But in this new age, you don’t want to own the content or the pipe that delivers it. You want to participate in what people want to do on their own. You don’t want to extract value. You want to add value. You don’t want to build walls or fences or gardens to keep people from doing what they want to do without you. You want to enable them to do it. You want to join in.

So what might this mean for museums?

Whilst those in collecting institutions often see themselves as the sole holders of particular objects, it is becoming more acceptable to acknowledge that institutions still have a lot to learn about these objects, and that that knowledge may lie elsewhere in the community. Certainly, injecting museum content into audience conversations, and encouraging those conversations, is not as controversial as it might have been ten years ago. Exclusivity might work in our physical spaces, but not online.

At the same time increasing commercial pressures are asking museums to find new revenue streams – image sales, licensing, syndication, partnerships. Already the V&A and the Met have moved to ‘no-fee’ image licensing for small run academic publishing after discovering that the internal cost of charging for these operations outweighed their commercial returns. From the WIPO’s Guide to Managing Intellectual Property for Museums (pt 6.6) –

Recent developments in business models concerning the production and distribution of content on the Internet, coupled with a continued examination by museums of their missions and mandates has led to an awareness that the making available of museum images is merely a means to a commercial end, and not the end, itself. Indeed, in a recent press release, the Victoria and Albert Museum announced that it would no longer charge fees for academic and scholarly reproduction and distribution for its images, claiming that while it earned approximately 250,000 a year from scholarly licensing programs, the overhead costs associated with licensing fees rendered their profits much less. What is not reported, but suspected, is that the Victoria and Albert Museum determined that it was wise business practice to allow its copyright-protected images to be made available for free, thereby increasing their circulation and delivering significant promotional opportunities back to the museum.

As the WIPO Guide suggests, there is some potential for museums in the online space in the brand and promotional opportunities in the short term, and then flowing from these in the medium-long term, the partnerships and commercial content syndication options that are expected to flow from a greater awareness (and discussion) of content.

What the NYT announcement does, along with the increased commercial activity around digitising state-held records, especially those relating to the profitable family history space, is create significant competition for museum content online.

Digitisation Web 2.0

How to do low cost transcription of hand written and difficult documents

So your museum has already done the easy part of digitisation – taking digital photos of your objects – but now you have complex hand-written materials you need to digitise . . . what can you do?

This is a question that has popped up in several meetings over recent months.

Our Curator of Information Technology, Matthew Connell, came up with a brilliantly simple solution – and there is no need for the original material to leave your organisation.

With the low cost of MP3 recorders it is now very easy to record a large amount of audio into a single, already-compressed file. Take one of these MP3 recorders and ask an expert who is familiar with the document or material requiring digitisation to read it clearly into the recorder. This can be done over an extended period of time – there is no need to do it all in one go.

When completed, upload the MP3 of clearly spoken audio to a web server. Then use one of several online audio transcription services to transcribe the audio. We have been using such services to get quick, low-cost transcriptions of public lectures and podcasts, and have been impressed with their timeliness and accuracy.

Even factoring in the cost of reading time, this will almost certainly be cheaper and less error-prone than scanning and transcribing directly from the written original. It also provides significantly more flexibility in pricing, as there is currently a high level of competition amongst audio transcription services – a level of competition that may not exist amongst specialist manuscript-transcription services.
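The economics of the read-aloud approach can be sketched with a quick back-of-the-envelope calculation. All of the rates below are hypothetical placeholders, not actual vendor prices – substitute quotes from the transcription services and staff costs that apply to your institution.

```python
# Rough cost comparison for the read-aloud transcription workflow described
# above. All rates are invented placeholders for illustration only.

def audio_route_cost(audio_minutes, reader_hourly_rate=40.0,
                     per_minute_transcription_rate=1.5):
    """Cost of having an expert read the document aloud, then sending
    the MP3 to an online audio transcription service."""
    reading_cost = (audio_minutes / 60.0) * reader_hourly_rate
    transcription_cost = audio_minutes * per_minute_transcription_rate
    return reading_cost + transcription_cost

def manual_route_cost(pages, specialist_rate_per_page=15.0):
    """Cost of a specialist transcribing directly from the handwritten
    original."""
    return pages * specialist_rate_per_page

# Example: a 50-page handwritten diary, read aloud at roughly
# two minutes per page.
pages = 50
audio_minutes = pages * 2
print(round(audio_route_cost(audio_minutes), 2))  # reading time + service fee
print(round(manual_route_cost(pages), 2))         # specialist, per page
```

With these (assumed) figures the audio route comes out well under a third of the specialist route, and because reading can be done in short sessions, the expert's time is easy to schedule.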

Conceptual Digitisation

Filtering memory – SEO, newspaper archives, museum collections

When Bad News Follows You in the New York Times (via Nick Carr) is a fascinating article about what can happen when ‘everything’ is put online.

The article looks at the new array of problems that have arisen as a by-product of the NYT optimising its site and archives for Google using SEO techniques. Suddenly, stories that were of minor significance, or that were corrected in later editions, are appearing towards the top of Google searches for names, places and events.

Most people who complain want the articles removed from the archive.

Until recently, The Times’s response has always been the same: There’s nothing we can do. Removing anything from the historical record would be, in the words of Craig Whitney, the assistant managing editor in charge of maintaining Times standards, “like airbrushing Trotsky out of the Kremlin picture.”

Whitney and other editors say they recognize that because the Internet has opened to the world material once available only from microfilm or musty clippings in the newspaper’s library, they have a new obligation to minimize harm.

But what can they do? The choices all seem fraught with pitfalls. You can’t accept someone’s word that an old article was wrong. What if that person who was charged with abusing a child really was guilty? Re-report every story challenged by someone? Impossible, said Jonathan Landman, the deputy managing editor in charge of the newsroom’s online operation: there’d be time for nothing else.


Viktor Mayer-Schönberger, an associate professor of public policy at Harvard’s John F. Kennedy School of Government, has a different answer to the problem: He thinks newspapers, including The Times, should program their archives to “forget” some information, just as humans do. Through the ages, humans have generally remembered the important stuff and forgotten the trivial, he said. The computer age has turned that upside down. Now, everything lasts forever, whether it is insignificant or important, ancient or recent, complete or overtaken by events.

Following Mayer-Schönberger’s logic, The Times could program some items, like news briefs, which generate a surprising number of the complaints, to expire, at least for wide public access, in a relatively short time. Articles of larger significance could be assigned longer lives, or last forever.

Mayer-Schönberger said his proposal is no different from what The Times used to do when it culled its clipping files of old items that no longer seemed useful. But what if something was thrown away that later turned out to be important? Meyer Berger, a legendary Times reporter, complained in the 1940s that files of Victorian-era murder cases had been tossed.

“That’s a risk you run,” Mayer-Schönberger said. “But we’ve dealt with that risk for eons.”
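Mayer-Schönberger’s tiered-expiry idea can be expressed as a very small piece of policy logic. The article types and lifetimes below are invented for the sake of the sketch – nothing here reflects an actual Times (or museum) implementation.

```python
from datetime import date, timedelta

# Illustrative retention periods per article type. Categories and lifetimes
# are assumptions for this sketch; None means the item never expires.
RETENTION = {
    "news_brief": timedelta(days=365 * 2),      # trivial items fade quickly
    "feature": timedelta(days=365 * 20),        # larger stories live longer
    "investigation": None,                      # major work lasts forever
}

def widely_accessible(article_type, published, today):
    """Return True if an item should still be exposed to wide public
    access (e.g. left open to search-engine indexing)."""
    lifetime = RETENTION.get(article_type)
    if lifetime is None:
        return True
    return today - published <= lifetime

# A minor 2000 news brief has "expired"; an old investigation has not.
print(widely_accessible("news_brief", date(2000, 1, 1), date(2007, 9, 1)))
print(widely_accessible("investigation", date(1950, 1, 1), date(2007, 9, 1)))
```

In practice the “expiry” need not delete anything: the record could simply be withdrawn from wide public access (say, excluded from search indexing) while remaining in the archive.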

There are interesting parallels with our experience in making our online collection more usable and accessible. Public enquiries have skyrocketed and now range from the scholarly to the trivial, with the greatest increase in the latter category. Whilst members of the public send in a significant amount of extremely valuable object-related information, there are also false leads, material that cannot be adequately verified, and more still that the Museum already knows but has not yet made available online. Managing public expectations and internal workflow is a difficult balancing act, and a continuing challenge for the many museums that not only put their collections online but also make them highly accessible.

Digitisation Imaging Interactive Media

Open Library demo launches

The Internet Archive/Open Content Alliance has launched a public demo of its forthcoming Open Library project. Having heard Brewster Kahle speak about the OCA at Museums and the Web 2007, it is fantastic to finally be able to get some hands-on time with the work he was talking about.

Open Library is a very exciting project because it offers an open alternative to Google Books’ proprietary and retail/consumer solutions.

The search is impressive and the ability for users (both community and commercial) to improve the metadata of each record – adding reviews, publisher information etc – is exciting. The ability to locate the book in retail outlets (Amazon etc) as well as in your local library is nice too.

The page turning interface works quite well and is less flashy/dramatic than some of the others I have seen around. However, as Ben Vershbow at The Future of the Book writes,

But nice as this looks, functionality is sacrificed for the sake of fetishism. Sticky tabs are certainly a cool feature, but not when they’re at the expense of a straightforward list of search returns showing keywords in their sentence context. These sorts of references to the feel and functionality of the paper book are no doubt comforting to readers stepping tentatively into the digital library, but there’s something that feels disjointed about reading this way: that this is a representation of a book but not a book itself. It is a book avatar. I’ve never understood the appeal of those Second Life libraries where you must guide your virtual self to a virtual shelf, take hold of the virtual book, and then open it up on a virtual table. This strikes me as a failure of imagination, not to mention tedious. Each action is in a sense done twice: you operate a browser within which you operate a book; you move the hand that moves the hand that moves the page. Is this perhaps one too many layers of mediation to actually be able to process the book’s contents? Don’t get me wrong, the Book Viewer and everything the Open Library is doing is a laudable start (cause for celebration in fact), but in the long run we need interfaces that deal with texts as native digital objects while respecting the originals.

And, look – here’s a book from the Powerhouse Museum’s former incarnation – the Sydney Technological Museum! (hat tip to Paul for finding this!)

Copyright/OCL Digitisation

Amazon and rare books on demand

A very interesting new development in the digitisation space as reported in The Chronicle (via Siva Vaidhyanathan).

Amazon, which made its name selling books online, is now entering the book-digitizing business.

Like Google and, more recently, Microsoft, Amazon will be making hundreds of thousands of digital copies of books available online through a deal with university libraries and a technology company.

But, unlike Google and Microsoft, Amazon will not limit people to reading the books online. Thanks to print-on-demand technology, readers will be able to buy hard copies of out-of-print books and have them shipped to their homes.

And Amazon will sell only books that are in the public domain or that libraries own the copyrights to, avoiding legal issues that have worried many librarians — and that have prompted publishers to sue Google for copyright infringement.

Whilst I agree with Siva’s argument that this is “a massive privatization of public treasures”, at the same time this effective republishing in physical form, via print-on-demand, can potentially bring older books – especially those without a large reprint value – to a much larger audience beyond just scholars and researchers.

The privatisation process began long ago with economic rationalist politics and the scaling back of the public sector and public institutions. This has left us in this situation where in some countries only the private sector has the resources and capital to make grand idealistic projects like this a reality – something that used to be the preserve of visionary government (although the reality was often different).

Depending upon the quality of the print on-demand I can also see this opening up a whole new genre of coffee table ‘cultural capital’ enhancing books . . . .

Collection databases Digitisation Imaging Interactive Media Metadata Web 2.0

Hyperlinking collectively shared images – Seadragon/Photosynth

There’s been a lot of discussion on the web about Microsoft’s Photosynth, but this demonstration from TED reveals the real possibilities. The image-navigation opportunities offered by Seadragon are quite amazing, but as Blaise Aguera y Arcas points out in the short demonstration (around the 6:10–6:30 mark), what a collective Photosynth experience offers is the ability for one user/contributor’s content to benefit from the metadata associated with everyone else’s visually related content.

If the cultural sector contributed images to, or made use of, this sort of application, our very rich contextual metadata could be added to the common pool, allowing holiday snaps to be explored with deep connections to cultural collections and to other people’s snapshots. And, again as Blaise Aguera y Arcas makes clear, the other side effect is the ability to generate rich virtual reconstructions as well.
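The metadata-pooling idea can be sketched in a few lines: given links between visually matched photographs (the kind of links Photosynth computes automatically), each image simply inherits the tags of the images it matches. The filenames, tags and match pairs below are invented examples for illustration.

```python
# Sketch of metadata pooling across visually matched images. The match
# graph and tags are invented; in a Photosynth-style system the links
# would come from automatic visual feature matching between photographs.

tags = {
    "tourist_snap.jpg": {"cathedral"},
    "museum_photo.jpg": {"cathedral", "west facade", "gothic"},
    "postcard.jpg": {"cathedral", "paris"},
}

# Pairs of images the matcher found to depict the same subject.
matches = [
    ("tourist_snap.jpg", "museum_photo.jpg"),
    ("tourist_snap.jpg", "postcard.jpg"),
]

def pooled_tags(image, tags, matches):
    """Tags for `image`, enriched with tags from visually related images."""
    pool = set(tags.get(image, set()))
    for a, b in matches:
        if a == image:
            pool |= tags.get(b, set())
        elif b == image:
            pool |= tags.get(a, set())
    return pool

# A casual holiday snap picks up the museum's richer contextual metadata.
print(sorted(pooled_tags("tourist_snap.jpg", tags, matches)))
```

This is exactly the asymmetry that favours the cultural sector: one well-catalogued museum image can enrich every amateur snapshot it visually matches.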

The BBC has already been exploring these possibilities.