Museums & the Web is very big this year. There must be nearly 1000 people here and there is a good buzz in between sessions.
Today opened with an entertaining and motivational plenary from Brewster Kahle, founder of the Internet Archive. Kahle talked about the various types of media the Internet Archive is digitising and making openly accessible, for free, using open standards. The big stumbling block is rights.
Starting with books, he gave some interesting figures on digitisation costs. The Archive is scanning 12,000 books per month across three locations (USA, Canada and the UK). It costs about $0.10 per page to cover scanning, OCR, PDF conversion, file transfer and permanent storage (forever). Distribution problems are being solved by print on demand, which costs as little as $0.01 per page and is being rolled out through mobile digital book buses in Uganda, India and China. Kahle handed around some samples of the print-on-demand titles, and they were of acceptable quality with proper covers. He also handed around one of the 300 prototype $100 laptops from MIT. It was pretty cool, with a great hi-res screen that makes the concept of a low-cost e-book reader suited to the developing world viable.
Audio recordings are costing $10 per CD, or roughly $10 per hour of recording, and the Internet Archive will host them forever, for free. Video recordings cost slightly more, at $15 per hour. They have also been recording broadcast television, 20 channels worldwide, 24/7. Only one week is available online so far: the week of 9/11. They have also started on software archiving but are stymied by the DMCA.
The Wayback Machine (the web archive) takes a snapshot every two months, at 100 terabytes of storage per snapshot. Interestingly, he said the average web page changes or is deleted every 100 days, which makes regular archiving critical.
Kahle emphasised the importance of public institutions digitising in open formats rather than entering exclusive Google Books deals. His catch-all warning for museums was “public or perish”, which is a great start to the conference.
4 replies on “M&W07 – Day two: Brewster Kahle”
Seb, thanks for this post. I’ve already alerted our whole institution to it on our internal blog. Our costs were slightly higher when we did our Official Histories (http://www.awm.gov.au/histories/index.asp). I think the low costs he is talking about are only achievable with automatic scanning using sheet feeders, which means you need books that you can rip apart into individual pages. We had spare copies of the Official Histories for both WW1 and WW2, but we won’t be able to do it that way when we start digitising our really rare and still regularly used copies of the unit histories for the same wars. I remember discussing with Carmel McInerny here, about five years ago, whether we needed to offer print on demand. So far there hasn’t been a single request for it, even though the Official Histories have been widely used since they went live on our site. Interesting.
I think the most interesting paragraphs are your last two re web archiving. Hopefully all the Pandora partners are reading this. Public or perish is a great catch-cry!
Actually, Brewster showed a short video of the scanning process. It was done by hand and was non-destructive. The device consisted of two digital cameras mounted at right angles, with the book in a 90-degree V in the middle. The cameras photograph the left and right pages, then a human turns the page, and so on.
The print-on-demand was remarkably good and they’ve been distributing thousands of public domain books and literature into the developing world using this p-o-d model.
Kahle also offered free hosting and storage (transparently) to cultural institutions, as long as the institutions were happy to give their hosted content away freely.
OK, yes, I should have realised. That is, I think, the Scribe system, which is Linux-based and has been developed with the Library of Congress. We are interested, but are still using flatbed scanners here. Our previous attempts on documents with (affordable) overhead cameras did not yield the best results, but technology has marched on since then. Some bound material simply cannot be unbound, so we are keen to see this system at some stage, as it sounds both open and affordable. Thanks again, Seb.
Brewster showed the results of a couple of different processes: some of the scanning has been done with robotic scanners (Kirtas machines) and some with the Scribe, developed by the Internet Archive in collaboration with a number of its partners. Both are camera-based rather than flatbed, which is much kinder on the source material, as the books sit in a cradle to be photographed and are not opened fully.
I’ve seen both in operation, and the throughput and post-processing are impressive.