Load testing web applications for free

March 2014

I've spent a lot of time on previous projects artificially simulating load on software the team has built before it hits production, but this has always left me feeling that we were kidding ourselves. A simulated system will give you simulated results.

What's more, I've learnt that it's very hard, time-consuming, and potentially very expensive (when consultants get involved, or you outsource) to do anything beyond a short, superficial benchmarking exercise.

Building the next version of theguardian.com, we took a different, cheaper, and ultimately more accurate approach.

Refresher

A typical load test looks like this.

A target system comprising an application and some infrastructure is our candidate under test. We want to deliver a payload (or millions of payloads) to the target from a client (or millions of clients). Most importantly, we want to instrument both the client and target so we can measure the performance of the thing.

In web applications, the clients are browsers, the payload an HTTP request (with a corresponding HTTP response), and the target system a web server, which is the front door to your application. Externally we might measure things like response times, failed requests etc. and internally things like latency, load average and so on.
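Those external measurements boil down to a handful of summary statistics. As a minimal sketch (the sample shape and field names here are illustrative, not taken from our actual tooling), you might aggregate recorded response times like this:

```javascript
// Summarise a batch of load test samples. Each sample is an
// object like { ms: 120, status: 200 } - a response time in
// milliseconds plus the HTTP status code returned.
function summarise(samples) {
  // Treat 5xx responses as failures; keep the rest, sorted by time
  var times = samples
    .filter(function (s) { return s.status < 500; })
    .map(function (s) { return s.ms; })
    .sort(function (a, b) { return a - b; });
  var failed = samples.length - times.length;
  var total = times.reduce(function (a, b) { return a + b; }, 0);
  return {
    count: samples.length,
    failureRate: failed / samples.length,
    meanMs: total / times.length,
    // 95th percentile response time of the successful requests
    p95Ms: times[Math.min(times.length - 1,
                          Math.floor(times.length * 0.95))]
  };
}
```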

Previously we've attempted to simulate all of this - a temporary staging target system, thousands of fake clients sending fake (and fairly uniform) payloads.

A better way is to deal directly with reality.

DDOS

The basic principle here is that most established organisations already have load (or could borrow some load from a friend) and can harness that load to transparently generate any amount of traffic against our system under test.

For example, The Guardian has 90m visitors a month generating millions of page views every day.

As we've been building the new service we've been siphoning off a gradually increasing percentage of the current Guardian website and transparently hitting the new system (our load test target).

Today, a visitor to the current version of theguardian.com will generate three requests to our new stack, and various bots (e.g. Facebook, Twitter, etc.) have been routed to the new system for several months - well ahead of any full production launch.

In effect, tongue in cheek, we've used our own audience to launch a distributed denial-of-service attack against ourselves.

This gives us a huge degree of confidence that the system can not just take load but cope, over a long period of time, with the weekly fluctuations in traffic a news site sees, as well as all sorts of real traffic our service will see once fully released into production.

When dealing with reality for a long period of time we see slow HTTP connections, naughty proxies, odd user-agent headers, cache busting, crawlers/bots, weird cookies, etc., and on the server, over the days and weeks this is running, we see outages, slow responses, memory leaks, CPU spikes, and so on.

Reproducing this in a simulated environment is hard.

It also gives our upstream and downstream dependent systems (think CDNs, APIs, alerting, monitoring and paging systems) a good idea of what to expect as the traffic grows. Again, we are talking about actual live production systems, not fake APIs etc.

XHR

How do we do this?

For a week in early January, one in every hundred Guardian users who visited the music section were randomly selected to become our first group of load test clients.

To do this we attached a simple function to every page load event on the current site to make an XHR request to the new site,

jQ(document).ready(function () {
  if (Math.random() * 100 < 1
      && guardian.page.section === 'music') {
    // Nb. helpfully here, the URL structure is mirrored
    // between the old and new sites, so we can get away
    // with using 'pathname'
    jQ.get(location.pathname + '?view=mobile');
  }
});
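One refinement worth noting (a sketch of the general technique, not what our production code did): instead of re-rolling the dice on every page view, you can bucket visitors deterministically from a stable identifier, such as a first-party cookie value, so the same visitor stays consistently in or out of the test group:

```javascript
// Deterministically assign a visitor to a bucket from 0-99
// based on a stable id (e.g. a first-party cookie value -
// the cookie name would be site-specific). A visitor is in
// the test group when their bucket falls below the sampling
// percentage.
function inTestGroup(visitorId, percent) {
  var hash = 0;
  for (var i = 0; i < visitorId.length; i++) {
    // Simple 31-based rolling hash, kept small to avoid overflow
    hash = (hash * 31 + visitorId.charCodeAt(i)) % 100000;
  }
  return (hash % 100) < percent;
}
```

Ramping the test up then means changing a single percentage rather than redrawing the whole pool.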

And for several days afterwards we gradually increased the scope of the load test to encompass a larger amount of traffic.

Firstly we dropped the random pool so that the music section, a relatively small section of the site, would see broadly the same amount of traffic on the old and new systems,

if ( guardian.page.section === 'music' ) { ...

Next we expanded the number of sections day by day,

if ( /(music|travel|world...)/.test(guardian.page.section) )

And lastly we added several additional requests that each page of the new system makes,

var edition = guardian.page.edition.toLowerCase()
  , n = Math.ceil(Math.random() * 10)
  , section = location.pathname.split("/")[1];
jQ.get(location.pathname + '?view=mobile');
jQ.get('/top-stories/trails.json?page-size=' + n + '&_e=' + edition);
jQ.get('/most-read/' + section + '.json?edition=' + edition);

Within a week or so we were running a full production load and could poke around the logs and various other systems we have to understand how the stack is performing.

Extending this idea, we could send POST requests, simulate sequences of requests, multiply the traffic at random periods, rig up a chaos monkey to spam the service, and so on.
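A traffic multiplier, for instance, could be as simple as firing each test request several times over, spread across a short window so the duplicates don't all land at once. A hypothetical sketch (the function and its parameters are made up for illustration):

```javascript
// Fire a request 'factor' times, each after a random delay
// spread over 'windowMs' milliseconds, so the duplicates
// don't land on the server simultaneously. 'send' is
// whatever issues the request, e.g.
//   function () { jQ.get(location.pathname + '?view=mobile'); }
function multiply(send, factor, windowMs) {
  for (var i = 0; i < factor; i++) {
    setTimeout(send, Math.random() * windowMs);
  }
}
```

Bumping `factor` above 1 for a slice of real visitors would let us rehearse, say, double or triple today's traffic without recruiting a single extra user.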

Varnish

A typical website has many things that can be silently routed to a new service without the user ever realising - e.g. pings from social networks, bots & unrecognised user agents, RSS clients, archived/dormant content, etc.

So the second part of the load test is to reproduce (or better, enhance) the services in the new environment and transparently begin to route certain clients to the new service over a period of time.

Varnish sat in front of your stack makes this quite easy to do (and undo). For example, we can catch traffic from Facebook (which hits our site every time an article is shared, looking for Open Graph data) by inspecting the incoming user-agent and proxying to the new service. E.g.,

sub vcl_recv {
  if (req.http.user-agent ~ "(?i).*facebookexternalhit.*") {
    // Nb. X-Gu-Device is used to direct traffic at the
    // new backend in a later VCL
    set req.http.X-Gu-Device = "beta";
  }
}
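That "later VCL" might look something like the following (a sketch only - the backend name is an assumption, and `set req.backend` is the Varnish 3-era syntax; our exact config differs):

```vcl
sub vcl_recv {
  if (req.http.X-Gu-Device == "beta") {
    // Route flagged requests to the new stack's backend;
    // 'beta_backend' is a hypothetical backend declared elsewhere
    set req.backend = beta_backend;
  }
}
```

Because the flag is just a request header, rolling the migration back is as simple as removing the header-setting rule.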

Unbeknown to anyone, except those who spot the occasional error we introduce, this process of gradual traffic migration has been happening since autumn 2013; piece by piece, the new system has been introduced to real traffic, behaving as real traffic does.

Why?

Load testing this way has saved the organisation a lot of time and money, and given the team a greater degree of confidence that the system can cope with the weekly cycle of Guardian web traffic, as well as an understanding of how things react to spikes, humps, outages, etc.

It's allowed us to gauge the amount of infrastructure we need to have in place. (Even in an on-demand world, knowing the amount of hardware you need to buy - or reserve - upfront is the cheaper option.)

And, as an aside, for anyone planning any sort of large system migration: incrementally introducing the new service in the back-end will drastically reduce the risk and nervousness around a big launch day.