Moving out into the cloud – reverse-proxying our website

For a fair while we’ve been thinking about how we can improve our web hosting. At the Powerhouse we host everything in-house and our IT team does a great job of keeping things up and running. However, as traffic to our websites has grown exponentially, along with an explosion in the volume of data we make available, scalability has become a huge issue.

So when I came back from Museums and the Web in April I dropped Rob Stein, Charles Moad and Edward Bachta’s paper on how the Indianapolis Museum of Art was using Amazon Web Services (AWS) to run Art Babble on the desk of Dan, our IT manager.

A few months ago a new staff member, Chris Bell, started in IT. Chris had a background in running commercial web-hosting services and his knowledge and skills in the area have been invaluable. In a few short months our hosting setup has been overhauled. With the Museum as a whole moving to virtualisation, Chris started working with one of our developers, Luke, on how we might try AWS ourselves.

Today we started our trial of AWS, beginning with the content in the Hedda Morrison microsite. Now when you visit that site all the image content, including the zoomable images, is served from AWS.

We’re keeping an eye on how that goes and will then switch over the entirety of our OPAC.

I asked Chris to explain how it works and what is going on – the solution he has implemented is elegantly simple.

Q: How have you changed our web-hosting arrangements so that we make use of Amazon Web Services?

We haven’t changed anything actually. The priorities in this case were to reduce load on our existing infrastructure and improve performance without re-inventing our current model. That’s why we decided on a system that would achieve our goals of outsourcing the hosting of a massive number of files (several million) without ever actually having to upload them to a third-party service. We went with Amazon Web Services (AWS) because it offers an exciting opportunity to deliver content from a growing number of geographical points that will suit our users. [Our web traffic over the last three months has been split 47% Oceania, 24% North America, 21% Europe]

Our current web servers deliver a massive volume and diversity of content. By identifying areas where we could out-source this content delivery to external servers we both reduce demand on our equipment – increasing performance – and reduce load on our connection.

The Museum does not currently have a connection intended for high-end hosting applications (despite the demand we receive), so moving content out of the network promises to not only deliver better performance for our website users but also for other applications within our corporate network.

Q: Reverse-proxy? Can you explain that for the non-technical? What problem does it solve?

We went with Squid, which is a cache server. Squid is basically a proxy server, usually used to cache inbound Internet traffic and spy on your employees or customers – but also to optimise traffic flow. For instance, if one user within your network accesses a page from a popular website, it’s retained for the next user so that it needn’t be downloaded again. That’s called caching – it saves traffic and improves performance.

Squid is a proven, robust open-source platform, which in this case allows us to do this in reverse – a reverse proxy. When users access specified content on our website, if a copy already exists in the cache it is served from Amazon Web Services instead of from our own network, whose limited bandwidth is more appropriately allocated to internal applications such as security monitoring, WAN applications and – naturally – in-house YouTube users (you know who you are!).
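For the curious, a reverse-proxy (“accelerator”) setup of the kind Chris describes can be sketched in a few lines of squid.conf. The hostnames, ports and cache size below are illustrative placeholders, not the Museum’s actual configuration:

```
# Listen on port 80 in accelerator (reverse-proxy) mode
http_port 80 accel defaultsite=images.example.org

# The origin web server that holds the master copies of the content
cache_peer origin.example.org parent 80 0 no-query originserver name=origin

# Only accept and forward requests for our own site
acl our_site dstdomain images.example.org
http_access allow our_site
http_access deny all
cache_peer_access origin allow our_site

# A 10 GB disk cache to hold the popular content
cache_dir ufs /var/spool/squid 10240 16 256
```

Requests for cached objects are answered straight from the cache; anything not yet cached is fetched once from the origin server and retained for subsequent visitors.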

Q: What parts of AWS are you using?

At this stage we’re using a combination. S3 (Simple Storage Service) is where we store our virtual machine images – that’s the back-end stuff, where we build virtual machines and create AMIs (Amazon Machine Images) to fire up the virtual server that does the hard work. We’re using EC2 (Elastic Compute Cloud) to load these virtual machines into running processes that implement the solution.

Within EC2 we also use Elastic IPs to forward services to our virtual machines – in the first instance web servers and our proxy server – and to enforce security protocols and implement management tools, such as SNMP monitoring, for assessing the performance of our cache server. We also use EBS (Elastic Block Store) to create virtual hard drives which hold the cache, can be backed up to S3, and can be re-attached to a running instance should we ever need to re-configure the virtual machine. All critical data, including logs, are maintained on EBS.

We’re also about to implement a solution for another project called About NSW where we will be outsourcing high bandwidth static content (roughly 17GB of digitised archives in PDFs) to Amazon CloudFront.

Q: If an image is updated on the Powerhouse site how does AWS know to also update?

It happens transparently, and that’s the beauty of the design of the solution.

We have several million files that we’re trying to distribute, and they are virtually unmanageable in a normal Windows environment – trying to push all this content to the cloud would be a nightmare. By using the reverse-proxy method we effectively pick and choose: the most popular content is pulled on demand and automatically copied to the cloud for re-use.

Amazon have recently announced an import/export service, which would effectively allow us to send them a physical hard drive of content to upload to a storage unit that they call a “bucket”. However, this is still not a viable solution for us because it’s not available in Australia and we add new content every day. By using a reverse proxy we ensure that the first time a piece of content is accessed it becomes rapidly available to any future users. And we can still do our work locally.
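Under the hood, how long Squid keeps serving a cached copy before checking back with the origin is driven by the origin server’s HTTP headers (Cache-Control, Expires, Last-Modified) and, where those are absent, by Squid’s refresh_pattern rules. A hedged sketch with illustrative values only:

```
# Treat images as fresh for at least a day and at most a week
# when the origin sends no explicit expiry headers; otherwise
# the origin's Cache-Control/Expires headers take precedence.
refresh_pattern -i \.(jpg|jpeg|png|gif)$ 1440 80% 10080

# Conservative default for everything else
refresh_pattern . 0 20% 4320
```

Once a cached copy goes stale, Squid revalidates it against the origin, so an updated image on the source server propagates out without anyone having to push it manually.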

Q: How scalable is this solution? Can we apply it to the whole site?

I think it would be undesirable to apply it to dynamic content in particular, so no – things such as blogs, which change frequently, or search results, which will always differ slightly depending on changes made to the underlying databases at the back end. In any case, once the entire site is fed via a virtual machine in another country you’d actually experience a reduction in performance.

The solution we’ve implemented is aimed at re-distributing traffic in order to improve performance. It is an experiment, and the measurement techniques that we’ve implemented will gauge its effectiveness over the next few months. We’re trying to improve performance and save money, and we can only measure that through statistics, lies and invoices.

We’ll report back shortly once we know how it goes, but go on – take a look at the site we’ve got running in the cloud. Can you notice the difference?


Dan Collins on our move to virtualisation

Our IT manager, Dan Collins, is in the Australian broadsheet today talking about our move over the last year to virtualisation of our servers.

“We have got a much-reduced infrastructure spend in terms of the replacement cycle of hardware,” Mr Collins said. “When you look at what it saved us having to replace over three years, I would say that is about $200,000 worth of equipment.”

There were additional savings on labour costs for maintaining the equipment, along with reduced service calls, he said.

“We have gone down now to three host servers, a massive change from 35, and that has obviously had effects on power and cooling in our server room. It is much quieter than before.”

The museum has cut its technology power costs by 33 per cent…

Web metrics

How much is your website worth?

I’ve noticed that I’ve been tweeting a lot of links rather than blogging them as I used to. And from time to time there are some links that need to be blogged to get to those who miss the tweets or don’t follow.

Here’s one from the Web & Information Team at Lincolnshire County Council in the UK titled ‘Let’s Turn Off The Web’.

In order to try to calculate how much the local council website is worth, they turn the question around and ask how much it would cost to provide the same services and level of interaction with citizens if they didn’t have a website.

I like this way of thinking as it provides a way of demonstrating the value of your online services to those who see them only as a ‘cost’. (Your organisation probably already thinks in this way when it is trying to calculate the value of marketing and PR, but web units rarely do.)

So discussions of cost per user, as in a recent Freedom of Information request to many councils, miss the point. It’s not just about cost per user. It’s about value to the user and savings to the council.

If we turned off our web services:

177,000 visitors per month (May 2009 figures) to our web site would find no web site.

If only 10% of these visitors were to contact us by phone – say 17,000 – then we would incur an extra cost of approx £51,000 per month* (based on Socitm’s costs of phone contact).

Obviously a whole lot of things couldn’t be done at all, but I was particularly drawn to these figures quoted by Lincolnshire Council from work by SOCITM called Channel Value Benchmarking:

*The costs of customer contact are:

Face to face: £6.56
Phone: £3.22
Web: £0.27
(These figures provided by Socitm, 2008.)
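As a back-of-the-envelope check of those figures (a sketch only – the council’s approx £51,000 rounds the calls down to 17,000 at roughly £3 each, while using the exact Socitm rate gives a number in the same ballpark):

```python
# Channel-cost comparison using the Socitm 2008 figures quoted above.
visitors_per_month = 177_000   # May 2009 website visits
phone_share = 0.10             # assume 10% would phone instead

cost_phone = 3.22              # £ per phone contact (Socitm 2008)
cost_web = 0.27                # £ per web contact (Socitm 2008)

calls = int(visitors_per_month * phone_share)   # 17,700 contacts
phone_bill = calls * cost_phone                 # ≈ £56,994 per month
web_bill = calls * cost_web                     # ≈ £4,779 per month

print(f"{calls} contacts: £{phone_bill:,.0f} by phone vs £{web_bill:,.0f} on the web")
print(f"monthly saving of roughly £{phone_bill - web_bill:,.0f}")
```

The point survives any rounding: serving a contact on the web costs around a twelfth of serving it by phone.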

Suddenly your web unit is looking pretty good value for money.

Conceptual open content

Some clarifications on our experience with ‘free’ content

Over on the Gov2 blog a comment was posted that asked for more information about our experience at the Powerhouse with ‘giving away content’ for free.

I’d be interested to know more about your experience with Flickr and your resulting sales increase. Are these print sales or licensing sales? And are they sales, through your in-house service, of the identical images you have on Flickr, or are you using a set of images on Flickr as a ‘teaser’ to a premium set of images you hold in reserve? How open is this open access? I am trying to understand the mindset of users in an open access environment who will migrate from ‘free use’ to ‘pay-for use’ for identical content, as this makes no sense, either commercially or psychologically, unless there is additional service or other value-add.

Whilst I communicated privately a lengthy response I think some of it is valuable to post here to clarify and build upon the initial findings published by my colleague Paula Bray earlier this year.

Here’s what I wrote. Some of this will be familiar to regular readers, some of it is new.

(Please also bear in mind that I am focussing here on predominantly economic/cost-related issues. Regular readers will know that our involvement in the Commons on Flickr has been largely driven by community and mission-related reasons – don’t take this post as a rebuff of those primary aims.)

First, a couple of things that are crucial for understanding the nuances of our situation (and how it differs, say, from that of other institutions, galleries and museums):

  1. The Powerhouse is, more or less, a science museum in its ‘style’ (although not by our collection). Our exhibitions have traditionally, since our re-launch/re-naming in 1988, been heavy on ‘interactivity’ (in an 80s kind of way), and ‘hands on’. We aren’t known for our photographic or image collections and we haven’t done pure photographic exhibitions (at least for the last 15 years).
  2. Consequently we have a small income target for image sales. This target doesn’t even attempt to cover the salaries of the two staff in our Photo Library.
  3. In 2007/8 around 72% of our income was from State government funding.
  4. The Powerhouse has an entry charge for adults, and children aged 4 and over. In 2007/8 this made up 65% of the remainder of our income. Museum membership (which entitles free entry) added a further 8%.

(You can find these figures in our annual reports)

So what have we found by releasing images into the Commons on Flickr?

Firstly we’ve been able to connect with the community that inhabits Flickr to help us better document and locate the images that we have put there. This has revealed a huge amount about the images in our collection – especially as these images weren’t particularly well documented in the first place. It has of course incurred a resource cost, in terms of sifting responses and then fact-checking by curatorial staff, but that cost is outweighed by the value of the information we are getting back from the community.

Secondly we’ve been able to reach much wider audiences and better deliver on our mission. In their first four weeks on Flickr these images attracted more views than the same images had received in an entire year on our own website. Our images were already readily Google-able and were also available through Picture Australia, the National Library of Australia’s federated image and picture search.

(I’ve written about this quite a bit on the blog previously.)

Thirdly, we’ve found that as very few people knew we had these images in the first place, we’ve been able to grow the size of the market for them whilst simultaneously reducing the costs of supplying images.

How has this ‘reduced the costs’?

What Flickr has done is reduce the internal cost of delivering these images to “low economic/high mission value” clients such as teachers, school kids and private citizens. Rather than come through us to request ‘permission’, these clients can now directly download a 1024px version for use in their school projects or private work. The reduction in staff time and resources as a result of this is not to be underestimated, nor is the increased ease of use for clients.

At the same time, Flickr’s reach has opened up new “high economic/low mission value” client groups. Here I am talking about commercial publishers, broadcasters and businesses. These clients want a specific resolution, crop or format, and we can now charge for supply in those formats. We are also finding that we are getting orders and requests from businesses that had never considered us as a source of such material, and we are actively expanding our capacity to deliver art prints to meet their growing needs.

It is about relationships and mission!

At the same time, we can now build other relationships with those clients – rather than seeing them only in the context of image sales. This might be through physical visitation, corporate venue hire, membership, or donations.

Likewise, we know that the exposure of our public domain images is leading to significant offers of other photographic collections to the Museum alongside other commercial opportunities around digitisation and preservation services. Notably we have also been trying to collapse and flatten the organisation so that business units and silos aren’t in negative competition internally – so we can actually see a 360 degree view of a visitor/patron/consumer/citizen.

Conferences and event reports

Upcoming talks, workshops and presentations

I’ve got a bunch of sector talks, workshops and presentations coming up over the next few months. I’ll be talking about some brand new (and right now, top secret) projects that focus on ‘linked data’, maps and the ‘Papernet’, as well as delving deeper into metrics, ‘value’ and digital strategy.

So you just missed me at GLAM-WIKI at the Australian War Memorial in Canberra, but I’ll be giving a whole-day seminar titled Social Collections, New Metrics, Maps and Other Australian Oddities at the San Francisco Museum of Modern Art on August 27. I’m really excited to be catching up with everyone on the West Coast and exchanging new ideas and strategies. This is a free seminar presented by the Wallace Foundation, The San Francisco Foundation, Grants for the Arts/The San Francisco Hotel Tax Fund, Theatre Bay Area and Dancers’ Group. It is also part of the National Arts Marketing Project (NAMP), a program of Americans for the Arts that is sponsored nationally by American Express.

Then I’ll be giving a presentation focussing on ‘The Social Collection’ at Raise Your Voice: the Fourth National Public Galleries Summit, September 9–11 in Townsville. I’m looking forward to hearing Virginia Tandy from Manchester City Council, and the NGV’s Lisa Sassella is running a masterclass on audience segmentation and psychographics which looks fascinating. And of course artist Craig Walsh is speaking as well and, well, we’ve been working on a little something.

There are even rumours that there might be a reprise of my UK workshops of last September, run jointly by Culture24 and Collections Trust, sometime in early November – but right now, UK readers, that is still just a rumour.

After that I’m at the New Zealand National Digital Forum in Wellington, NZ on November 23-24 where I’m presenting and facilitating sessions around locative cultural projects. I’m excited about NDF because it is always full of inspirational Kiwi initiatives and a couple of well chosen international speakers – this year the inimitable Nina Simon, and Daniel Incandela from the Indianapolis Museum of Art. No doubt there’ll be some exciting new initiative from Digital NZ announced at NDF – just because they can.

Young people & museums

Odditoreum visitor-written-labels now on Flickr

Thanks to encouragement from Shelley Bernstein at the Brooklyn Museum, Paula Bray has started uploading photos of some of the ‘visitor-generated labels’ from our Odditoreum mini-exhibition.

The ‘write-your-own-labels’ continue to be a roaring success.

More on the Odditoreum here and on the basic info page.

Conferences and event reports, open content, Wikis

Some thoughts: post #GLAM-WIKI 2009


Photography by Paula Bray
License: Creative Commons Attribution-Noncommercial-No Derivative Works 2.0

(Post by Paula Bray)

Seb and I have just spent two days at a conference in the nation’s rather chilly capital that involved a bunch of Wikimedians (I wonder what the collective noun would be) and members of the GLAM (Galleries, Libraries, Archives and Museums) sector. This event was touted as a two-way dialogue to see how the two sectors could work more closely together for “the achievement of better online public access to cultural heritage”.

So what do we do post conference?

GLAM-WIKI was a really interesting conference to be a part of, even if some of us were questioning why we were there. Some of the tweets on Twitter said that there was a need for concise decisions instead of summary. I am not sure at this stage if there are complete answers; concise decisions will need to be made by us, the GLAMs.

Jennifer Riggs, Chief Program Officer at the Wikimedia Foundation, summed it up quite well when she asked the question “what is one thing you will do when you leave this conference?” I think this is exactly the type of action that can lead to bigger change. Perhaps it is a presentation to other staff members in your organisation, a review of your licensing policies and business models, a suggestion of better access to your content in your KPIs, or starting a page on Wikimedia about what you do and have in your collections.

One of the disturbing things for me came from Delia Browne, National Copyright Director at the Ministerial Council for Employment, Education, Training and Youth Affairs. Browne highlighted the rising costs the education sector is paying to copy assets, including content from our own institutions. She stated that there has been a 720% increase in statutory licensing costs, and the more content that goes online the more this cost will increase. The GLAM sector can help here by rethinking its licensing options and looking towards a Creative Commons license for content it may own the rights to, including things like teachers’ notes. Teachers can do so much with our content, but they need to know what they can use. She raised the question, “What sort of relationships do we want with the education sector?” The education sector will be producing more and more content for itself, and this will enter into our sector. We don’t want to be competing but rather complementing each other. Schools make up 60% of CAL’s (Copyright Agency Limited) revenue. What will this figure be when the Connected Classrooms initiative is well and truly operational in the “digital deluge”, a term mentioned by Senator Kate Lundy?

Lundy gave the keynote presentation titled Finding Common Ground. She brought up many important issues in her presentation including the rather awkward one around access to material that is already in the public domain. Lundy:

“These assets are already in the public domain, so concepts of ‘protection’ that inhibit or limit access are inappropriate. In fact, the motivation of Australia’s treasure house institutions is, or should be, to allow their collections to be shared and experienced by as many people as possible.”

Sharing, in turn, leads to education, research and innovation. This is something that we have experienced with our images in the Commons on Flickr and we only have 1200 images in our photostream.

The highlight for me was the question she says we should be addressing: “why are we digitising in the first place?”

This is a really important question and should be asked at the beginning of every digitisation project. The public needs fast access to content that it trusts, and our models are not going to be able to cope with the need for fast dissemination of our digital content in the future if we don’t make it accessible. It costs so much to digitise our collections – so surely we need to ask this question first and foremost. Preservation is not enough anymore. There are too many hoops to go through to get content and we are not fast enough. “The digital doors must be opened”, and this is clearly demonstrated by the great Australian Newspapers Digitisation Program, presented by Rose Holley of the National Library of Australia.

However, as Lundy said during the panel discussion following her presentation, “goodwill will have to bust out all over”. There is a lot of middle ground that the GLAM sector needs to address in relation to policy around its access initiatives and digital strategies and, yes, I think policy does matter. If we can get this right then the doors can be opened and staff in organisations can work towards the KPIs, missions and aims of unlocking our content and making it publicly available.

Perhaps your one thing, post GLAM-WIKI conference, could be to comment on the Government 2.0 Taskforce Issues Paper and ensure that all the talk of Government 2.0 clearly includes reference to the Government-funded GLAM sector.