Updated on 26th November – more on the BorderFree, Google Ads and HSBC bank failures.
- Despite some Black Friday problems in the UK, the good news is that the pre-Christmas shopping event is now an established part of the calendar, with online retailers preparing more thoroughly than in previous years
- However, there were still plenty of frustrated shoppers online – like those at position 117,731 in the queue at Game
The weekend started badly when Facebook’s Ad platform was down for some hours ahead of Black Friday, preventing media buyers from tweaking their campaigns. But – here in the UK at least – we noticed that many retailers were better prepared for the increased footfall, with extensive load testing in the months running up to Black Friday, and additional 24/7 regression/CX work during the final changes before code freeze.
Across the many retailers to whom we provide realistic load testing, this preparation prevented nasty problems including:
- Migration from separately hosted UK and USA instances of the same Hybris/Demandware-type platform on to a single instance, which caused an unexpected and large drop in capacity during load testing. Everything was hosted in the cloud, but the root causes of the dip were complex interactions between CDN settings, cloud settings and coded caching
- Minor version upgrades to platforms causing CX bottlenecks, only exposed under specific load testing
- Problems stemming from late changes in the last ten days before Black Friday: code freezes seemed to be later this year than last year, with some retailers still testing queuing systems as late as Thursday
- Online queuing! Will 2019 see an end to this?
Looking at the problems on Black Friday itself
An early problem frustrated shoppers at the Game online store – as queuing systems kicked in late Thursday night.
But the pattern of failure we observed across our many retailers this year was of Black Friday surges causing blips and glitches that affected a percentage of users, rather than bringing down entire websites:
- Mobile Android vs iOS – subtle CSS/JS differences causing buttons to be hidden
- Checkout buttons not responding for certain products
- Search results stalling or returning incorrect options
- Checkout page – greyed out button connected with a problem product in the basket
The issues can be grouped into categories, including:
- Business logic errors that are invisible to tech teams and often caused by regression – e.g. rules about minimum basket value, or the handling of products that have gone out of stock but are already in the basket
- Manual errors that arise when information is incomplete or has been incorrectly entered – for instance where a price is missing because a country-discount-code value is wrong
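Business logic rules of this kind can be pinned down with a small regression check that runs on every release. The following is a minimal sketch only: the `Line` type, the `checkout_blockers` function and the minimum basket value of 20.00 are hypothetical names and values chosen for illustration, not taken from any retailer's platform.

```python
from dataclasses import dataclass

# Hypothetical business rule: the basket must reach this value to check out.
MIN_BASKET_VALUE = 20.00


@dataclass
class Line:
    """One basket line (illustrative structure, not a real platform's model)."""
    sku: str
    unit_price: float
    quantity: int
    in_stock: bool


def checkout_blockers(lines, min_value=MIN_BASKET_VALUE):
    """Return the reasons, as readable messages, why checkout should be disabled.

    An empty list means the checkout button can be enabled.
    """
    blockers = []
    # Rule 1: an item that went out of stock after being added blocks checkout.
    for line in lines:
        if not line.in_stock:
            blockers.append(f"{line.sku} is out of stock; remove it to continue")
    # Rule 2: only buyable items count towards the minimum basket value.
    buyable_total = sum(l.unit_price * l.quantity for l in lines if l.in_stock)
    if buyable_total < min_value:
        blockers.append(
            f"Basket total {buyable_total:.2f} is below the minimum of {min_value:.2f}"
        )
    return blockers
```

Returning explicit reasons, rather than a bare true/false, means the page can tell the shopper why a button is greyed out, and a regression suite can assert on each rule separately.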
A glaring example today of how frustrating errors can be for users was a store that displayed a significant percentage of findable products that were not buyable, leaving users confused about the message that replaced the ‘Buy’ button. In this case, the handling of out-of-stock items was the root cause – it appeared that some new business logic did not perform quite as planned under load.
Queuing systems also remained controversial. Customers took to Twitter to complain about being placed in queues, only to have to begin the process again later on. It may be that load testing of queuing systems has been inadequate, although it’s not an easy thing to get right! The load test projects we have worked on have required the ultimate in realism to handle the triple whammy of huge load, real browsers (not mere HTTPS calls) and modelling of complex user behaviour under queuing.
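To give a feel for why user behaviour matters when modelling a queue, here is a deliberately simplified sketch of a waiting room in which shoppers give up after waiting too long. The function name, the tick-based model and all parameters are assumptions made for illustration; it does not model shoppers rejoining the queue, and it is no substitute for load testing with real browsers.

```python
from collections import deque


def simulate_queue(arrivals_per_tick, admit_per_tick, patience_ticks, ticks):
    """Simulate a simple first-in, first-out waiting room, tick by tick.

    A waiting shopper abandons once they have waited `patience_ticks` ticks
    without being admitted. Returns counts of admitted, abandoned and
    still-waiting shoppers.
    """
    queue = deque()  # each entry is the tick at which that shopper joined
    admitted = 0
    abandoned = 0
    for tick in range(ticks):
        # New shoppers join the back of the queue.
        for _ in range(arrivals_per_tick):
            queue.append(tick)
        # The site admits a fixed number of shoppers from the front.
        for _ in range(min(admit_per_tick, len(queue))):
            queue.popleft()
            admitted += 1
        # The longest-waiting shoppers are at the front; drop those out of patience.
        while queue and tick - queue[0] >= patience_ticks:
            queue.popleft()
            abandoned += 1
    return {"admitted": admitted, "abandoned": abandoned, "waiting": len(queue)}
```

Even in this toy model, arrivals that outpace admissions turn quickly into abandoned sessions, which is the kind of behaviour a realistic load test needs to reproduce.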
Some of the third-party postcode checking services had problems.
We saw this at a store between 7am and 8am on Black Friday morning. This doesn’t always translate to a lost sale if shoppers enter their postcode manually instead, but it does spark a painful increase in UX friction, when the user enters text and is left hanging.
Some stores did suffer full outages on Black Friday:
The Perfume Shop and Superdrug store outages
John Lewis and Harveys Beds also had online problems:
The Lululemon site was down – but a banner later ironically said ‘no queues’
Business Insider screen grab
And the ‘working’ site has this banner: ironic, eh?
And HSBC Bank was down on Black Friday
This HSBC user took to Twitter to report a system error page, which is, perhaps, an example of the increasing Black Friday problem scenario whereby some users see problems but most don’t. A pain for trouble-shooting!
Later in the day it became apparent that HSBC had serious Black Friday problems
BorderFree suffered a complete service failure.
By the late afternoon of Black Friday in the UK, BorderFree had run into problems, with retailers reporting a lapse in service that was causing some products to be displayed without prices – this was certainly true for a couple of UK retailers on their US sites.
BorderFree screen shot
Tips for planning 2019 – starting with the big picture
Business and marketing teams
Plan to meet your tech teams early – sit down in the next month and ask: ‘How can we test more realistically this year? Each year the business cost of failure is higher, and I don’t want to be exposed to a grilling from the board down through my directors!’
Don’t let 2019 be the year where you’re caught offering the lame excuse that you knew the preparation load testing programme was not fully realistic, but ‘hoped’ it would be good enough!
Answer this question truthfully: are the demands of implementing truly realistic load testing currently beyond the capabilities of your in-house team or of the providers you used last year? Base your planning on a workable strategy rather than on seeing how far you can get before you come unstuck.
Ask the business for more money to bring in new processes and expert test partners. Let your managers know that your team can’t guarantee a problem-free Black Friday without the budget to prep for it! We know that the tech challenges are going to be even bigger in 2019, whether related to more cloud, more CDN, more personalisation, PWA adoption, experiments with dynamic pricing or AI-driven 3rd-party search add-ons.
Top Tips for tech planning
- Make load testing part of your CD/CI.
- Match load testing to the environment. Load testing on a ‘similar but smaller’ environment is good for frequent CD/CI but make sure you’re also testing on the big, real scale too – maybe monthly or quarterly.
- Stop load testing if it’s not using real browsers! The days of using ‘curl’ and JMeter to grab URLs are over!
- Be diligent. Whenever your tech teams say something is ‘tricky to test realistically, it’ll be easier if instead we just…’, log it and list it on the front page of your Black Friday document, so everyone is aware of the realism gap and can focus on minimising it.
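One way to make load testing part of CD/CI is to have the pipeline fail when a load test run breaches agreed limits. The sketch below assumes the load test tool can export one (duration in milliseconds, success flag) pair per request; the function names and the thresholds of 2000 ms and 1% errors are illustrative assumptions, not recommendations.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct given as 0-100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def gate(samples, max_p95_ms=2000, max_error_rate=0.01):
    """Decide whether a load test run passes.

    `samples` is a list of (duration_ms, ok) tuples, one per request.
    Returns (passed, report) so the pipeline can log the figures either way.
    """
    durations = [duration for duration, _ in samples]
    errors = sum(1 for _, ok in samples if not ok)
    p95 = percentile(durations, 95)
    error_rate = errors / len(samples)
    passed = p95 <= max_p95_ms and error_rate <= max_error_rate
    return passed, {"p95_ms": p95, "error_rate": error_rate}
```

A pipeline step would call `gate` on the exported results and exit with a non-zero status when `passed` is false, so a capacity regression blocks the release rather than surfacing on Black Friday.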
Black Friday 2019 – will there be problems due to over-reliance on common capacity?
Now that we’re well and truly in a Cloud world, what would happen if there were problems with the shared cloud that so many retailers depend upon?
The big spike events – Black Friday, Super Bowl, seasonal sales and so on – mean that everyone is simultaneously dependent on a small number of providers. Computer systems professionals have long cautioned against single points of failure but the trend towards greater concentration on fewer platforms continues.
We know that the big cloud providers do make mistakes – witness just days ago when Amazon exposed customers’ personal information to the public web – luckily it was quickly spotted and resolved. In the days running up to Black Friday, Azure and Office365 went effectively offline when users were unable to log in for some hours. It’s not been a good year for Microsoft Azure, with problems in June 2018 and September 2018. How long before an Azure or AWS problem happens just when we all depend on them most – perhaps one Black Friday in the future?
Google’s own Ad service failed on Black Friday, preventing ad-buying staff at Google’s clients from accessing the control portals, so the eggs-in-one-basket syndrome is wobbling already!