Banking on effective load testing
CEO Deri Jones discusses what to consider when load testing online banking systems
The recent publication of an investigation into the disruption experienced by TSB customers during the bank’s 2018 migration to a new IT platform is a cautionary tale for anyone involved in online financial services provision.
While the benefits of load testing eCommerce platforms are well documented, online banking and financial services applications present their own load testing challenges for web and service managers. In many ways, the goals are the same: to ensure the platform is robust enough to deliver a satisfactory user experience (UX), although some would argue that the stakes are higher. Trust is the biggest potential sticking point for online financial services providers; if customers repeatedly experience poor availability, they could begin to question the integrity of the broader service offering.
In common with other platforms, the more complex the site, the heavier the IT load and the greater the potential for performance problems. When users open apps, this usually kicks off a series of server requests to update an account’s balance and display recent transactions, for instance. Each request loads the servers that are processing the request and returning information to the app. If tens of thousands of customers are using the app simultaneously, the servers have to be equal to the job.
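As a back-of-the-envelope sketch of the load described above, the snippet below estimates the request rate generated when many customers open the app within a short window. The function name and all the figures (50,000 users, four requests per app launch, a 60-second window) are illustrative assumptions, not measurements from any real bank.

```python
def peak_requests_per_second(concurrent_users: int,
                             requests_per_launch: int,
                             window_seconds: float) -> float:
    """Average request rate if every user launches the app once within the window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return concurrent_users * requests_per_launch / window_seconds


# Hypothetical figures: 50,000 users, 4 requests each (balance, transactions, etc.), 60 s window.
rate = peak_requests_per_second(50_000, 4, 60)
print(f"Estimated load: {rate:.0f} requests per second")
```

With these assumed figures the estimate is roughly 3,333 requests per second. An average like this understates short bursts, so a real test plan would add headroom on top.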
The level of peak traffic a bank or other financial services portal would be expected to handle should, in theory, be lower than that faced by the big retailers, who have to consider seasonal spikes – or even the perma-peaks that result from customer-driven demand. But sector-specific issues can create problems that simply wouldn’t affect eCommerce performance.
The problem of ultra-complex customer journeys
OK, so applying for a mortgage online is many times more complex than buying a new duvet. There will be many, many questions to answer which require varying levels of detail and are underpinned by a mesh of micro-services that are called on at various stages.
One recent load test we conducted for a banking client showed a mortgage application error rate of 15 percent – way above acceptable failure levels. The root cause was traced back to a queue time-out setting that was too short. It was hidden in the code of one micro-service, which itself sat beneath another. Under peak load the time-out was exceeded, so null responses were returned to the higher-level micro-service and queue items were dropped, even though the system showed no signs of overload.
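To illustrate how a time-out that looks harmless at low volumes can start dropping work at peak, here is a deliberately simplified toy model. It is not the client's code: the function names, the latency formula and every number (200 ms base latency, a capacity of 50, a 500 ms time-out) are assumptions chosen only to show the shape of the failure.

```python
from typing import Optional


def call_downstream(position_in_queue: int, base_ms: float,
                    capacity: int, timeout_ms: float) -> Optional[str]:
    """Return a response, or None if the modelled latency exceeds the time-out."""
    # Toy assumption: latency grows linearly with the number of requests already queued.
    latency_ms = base_ms * (1 + position_in_queue / capacity)
    if latency_ms > timeout_ms:
        return None  # the lower service times out and hands back a null response
    return "ok"


def error_rate(concurrent_requests: int, base_ms: float = 200,
               capacity: int = 50, timeout_ms: float = 500) -> float:
    """Fraction of requests dropped because the downstream call returned None."""
    if concurrent_requests == 0:
        return 0.0
    dropped = 0
    for position in range(concurrent_requests):
        if call_downstream(position, base_ms, capacity, timeout_ms) is None:
            dropped += 1  # the higher service silently discards the queue item
    return dropped / concurrent_requests


for load in (50, 100):
    print(f"{load} concurrent requests -> error rate {error_rate(load):.0%}")
```

In this model nothing fails at 50 concurrent requests, while at 100 a sizeable share of requests is dropped without any server being "overloaded" in the usual sense. That is why this kind of fault tends to surface only under realistic peak load.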
It’s worth noting that micro-service architectures create additional traffic between the application and its services, and this extra demand can easily slip under the radar.
The issue of not planning far enough in advance
In this instance, management concerns over the potential impact that the error rate in mortgage applications would have on the UX almost resulted in the launch being delayed. No-one wants to launch a product with a problem that will affect a guaranteed percentage of customers, but no-one wants to slip the launch date, either.
We can’t overstate the importance of building in time for at least two rounds of testing so that tech teams have the opportunity to identify and rectify problems before an application goes live. We caution against viewing the second round of capacity testing as ‘optional’, as, in our experience, it’s almost always needed.
The report into the TSB incident states that the platform wasn’t ready because ‘it hadn’t been sufficiently tested or proved’, concluding that the bank ‘went live with a platform that was neither stable nor complete’. This arose partly because the bank had already chosen a specific time slot for the launch of the platform without a proper assessment of the work that would need to be completed before the platform was ready to go live. It’s never a good idea to add unnecessary time constraints to an already pressured process.
Matching the testing environment to the production environment
In the banking sector, it can be tricky to create test environments that precisely mirror their real-world counterparts, primarily because of data and security issues. And yet, unless the test environment replicates the production setting, errors will ensue.
In the case of our recent financial services load test project, the test environment was almost a perfect copy of the production site, except two services had been set up to run on two shared server pools, whereas in reality there would have been two separate server pools, each running a single service. A slip-up like this can result in wasted time and effort, so it pays to be diligent when setting up the test environment.
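A simplified sketch of how this kind of mismatch might be caught automatically: the snippet compares which services share a server pool in each environment, ignoring the pool names themselves. The service names, pool names and the idea of describing each environment as a service-to-pool mapping are all hypothetical, chosen purely for illustration.

```python
from itertools import combinations


def colocated_pairs(service_to_pool: dict) -> set:
    """Return every pair of services that run on the same server pool."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(service_to_pool), 2)
        if service_to_pool[a] == service_to_pool[b]
    }


def colocation_mismatches(test_env: dict, prod_env: dict) -> set:
    """Pairs of services that share a pool in one environment but not the other."""
    return colocated_pairs(test_env) ^ colocated_pairs(prod_env)


# Hypothetical layouts: two services share a pool in test but are separated in production.
test_env = {"quotes": "pool-a", "credit-check": "pool-a"}
prod_env = {"quotes": "pool-1", "credit-check": "pool-2"}

for pair in colocation_mismatches(test_env, prod_env):
    print("Pool layout differs for:", ", ".join(sorted(pair)))
```

A check like this only covers pool layout; instance sizes, data volumes and third-party endpoints would need their own comparisons.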
Why you should definitely keep it real
There really is little point in load testing a system if you can’t stand up before your company’s C-level team and be confident that the numbers you’re – literally – banking on are as realistic as possible. We know from experience that shortcuts undermine realism and undercut the dependability of the data. This was a criticism levelled after TSB’s catastrophic failure: specifically, that decisions made to ‘simplify’ the testing process, such as excluding transactional journeys from the test, impaired the results.
We can’t emphasise this enough: you need user accounts – lots of them – set up just like real customer accounts with banking and credit history, to fuel a realistic load test. Obviously, this can be problematic, as real customer data has to be scrupulously anonymised. And, because the whole point of capacity testing is to work out how your systems will cope with future traffic, you have to ensure you have more users than you think you might need. You’ll also need to factor in any third-party dependencies, including address lookup and credit check add-ons, to keep it realistic. It’s a challenge – but it’s one we’ve handled countless times, so we can talk you through it.
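As a minimal sketch of the idea, the snippet below generates purely synthetic accounts with a balance, a transaction history and a credit score, and deliberately creates more of them than the expected peak. The field names, value ranges and the 1.5x headroom factor are assumptions for illustration; a real data set would follow the bank's own schema and anonymisation rules.

```python
import math
import random


def build_test_accounts(expected_peak_users: int, headroom: float = 1.5,
                        seed: int = 0) -> list:
    """Create synthetic accounts, with more accounts than the expected peak."""
    rng = random.Random(seed)  # seeded so a test run can be reproduced
    count = math.ceil(expected_peak_users * headroom)
    accounts = []
    for n in range(count):
        # Stand-in for banking history: 5 to 50 credits and debits.
        transactions = [round(rng.uniform(-500, 500), 2)
                        for _ in range(rng.randint(5, 50))]
        accounts.append({
            "account_id": f"TEST-{n:08d}",       # synthetic ID, not derived from a real customer
            "balance": round(rng.uniform(0, 20_000), 2),
            "transactions": transactions,
            "credit_score": rng.randint(0, 999),  # arbitrary illustrative range
        })
    return accounts


accounts = build_test_accounts(expected_peak_users=1_000)
print(len(accounts), "synthetic accounts created")
```

Seeding the generator means the same accounts can be recreated for a second round of testing, which makes results easier to compare between runs.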
And, speaking of which…
In our experience, the systems we test contain bugs that can break the user interface and make it harder for us to deliver the load testing we plan as quickly and smoothly as we’d like. Often, these bugs aren’t even in the front-end application but are rooted in business logic errors in the background that ripple through to failures at the front. In the case of our bank client, the smaller-scale ‘pre-test’ we ran picked up a specific error, which the client was able to feed back to their third-party supplier and have fixed straight away (thanks to the reproducible profiles, which mean we can demonstrate the problem).
There’s no such thing as a quick fix
In the banking and financial services sector, there’s no room for avoidable errors. This applies to online services as much as it does to every other part of the service companies deliver to their clients. We’d love to talk more about how we can help you swerve performance problems. Give us a call on 01227 768276 or drop us a line at firstname.lastname@example.org and we’ll show you how it’s done.