No one wants to hear about downtime; everybody wants to know what is up, what is available, what is running well. But downtime happens. Sometimes it is planned, for an update or for maintenance, and sometimes it happens when you don’t expect it.
In either case you need to be in control, so that the impact of the downtime on both your brand and your bottom line is minimised. Downtime is unavoidable, as it is usually necessary for at least the following:
- Diagnostics to isolate a detected fault
- Hardware fault repair
- Fixing an error or omission in a configuration database or in a recent configuration database change
- Fixing an error in an application database or in a recent application database change
- Software patching or updates to fix a software fault
- Rolling out a new version, development or upgrade
The bottom line is that downtime costs money. This means that organisations are constantly seeking ways to manage the impact of downtime as part of a push to reduce costs and to provide reliable, more responsive online services to users. When errors do occur, it is vital that causes are diagnosed and eliminated as rapidly as possible.
As sites become more complex, and the number of platforms that can access them increases, the danger is that periods of downtime will grow longer, because testing for all of these variables following a repair, rollout or update takes an increasing amount of time. At the same time, as eCommerce, lead generation and other online activities increase in importance to organisations, the drive is to reduce downtime. This presents a challenge for IT Managers everywhere.
Research has shown that while most thinking and planning happens around the idea of “unplanned downtime”, about 70-90% of downtime is, in fact, planned. That being the case, it is important to make sure it is well planned. To plan effectively for downtime you need to understand how your site is used, so that you can take elements down at the time of least impact, whether on revenue, sales or traffic.
Letting customers know that downtime is planned, and when they can expect to use your site, or particular functions and content, again can do a lot to mitigate the risk of brand damage. No one likes to find that they cannot perform the task, or visit the website, that they intended to. However, the knowledge that this is a temporary situation, under the control of the brand, will instil more confidence than leaving them to wonder what is happening with no end in sight.
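One way to choose a slot is to look for the contiguous block of hours with the least historical traffic. The sketch below assumes you already have an average visit count for each hour of the day. The function name and the sample figures are invented for illustration, and revenue or orders per hour could be substituted for visits.

```python
def quietest_window(hourly_visits, window_hours):
    """Return (start_hour, total_visits) for the quietest contiguous window.

    hourly_visits: 24 numbers, index 0 covering 00:00-01:00 (illustrative input).
    The window is allowed to wrap past midnight.
    """
    if len(hourly_visits) != 24:
        raise ValueError("expected 24 hourly values")
    if not 1 <= window_hours <= 24:
        raise ValueError("window_hours must be between 1 and 24")

    best_start, best_total = 0, None
    for start in range(24):
        total = sum(hourly_visits[(start + i) % 24] for i in range(window_hours))
        # Keep the earliest window when two windows tie.
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


# Made-up traffic profile: quiet overnight, busy in the evening.
sample_visits = [120, 80, 50, 40, 45, 90, 200, 450, 700, 900, 950, 1000,
                 1100, 1050, 980, 940, 990, 1200, 1500, 1700, 1600, 1100, 600, 300]

start, visits = quietest_window(sample_visits, 2)
print(f"Quietest 2-hour window starts at {start:02d}:00 ({visits} visits)")
```

With this sample profile the quietest two-hour slot starts at 03:00. In practice you would want several weeks of data, and you may want a separate profile for each day of the week.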
In addition, customer services can be informed of any planned downtime and better support callers accordingly.
Slow-downs, Availability and Poor Performance
What we have been talking about above is “hard downtime”, where the site, application or service is unavailable for a period. However, there is a more insidious kind of “soft downtime”, where basic “synthetic” monitoring software will show you that the site is “up” but performance issues, such as slow delivery of pages or content, or problems with search and selection of options, make the site “down” in all practical senses to users who are not prepared to wait for slow-loading pages or fight with poor rendering times. If customers become frustrated with performance and leave without completing a transaction, the effect is the same as if the site went down: the owner loses revenue and customers.
In fact, this simplistic “up/down monitoring” of a handful of pre-defined URLs may be giving site owners a false sense of security.
Tribe’s realistic “Dynamic User Journeys” avoid this issue by following the real paths that users take through the site, choosing randomly from the same options that users can, and alerting on slow steps and journeys (as configured by the user) as well as on errors, issues and outages.
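The difference between an up/down check and a threshold-aware journey check can be shown with a small sketch. This is not the product's implementation: the class, the function and the threshold values are hypothetical, and it covers only the "alert on slow steps and journeys" idea, not the random choice of paths.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """One step of a journey run (hypothetical structure for illustration)."""
    name: str
    succeeded: bool
    seconds: float


def classify_journey(steps, slow_step_seconds, slow_journey_seconds):
    """Classify one journey run as 'error', 'slow' or 'ok'.

    Both thresholds are assumed to be configured by the user.
    """
    if any(not step.succeeded for step in steps):
        return "error"
    if sum(step.seconds for step in steps) > slow_journey_seconds:
        return "slow"
    if any(step.seconds > slow_step_seconds for step in steps):
        return "slow"
    return "ok"


run = [
    StepResult("home page", True, 1.2),
    StepResult("search", True, 9.5),       # responds, but far too slowly
    StepResult("add to basket", True, 1.8),
]

# A plain up/down check sees only that every step responded.
print("up/down check:", "up" if all(s.succeeded for s in run) else "down")
# A threshold-aware check flags the slow search step.
print("journey check:", classify_journey(run, slow_step_seconds=5.0,
                                         slow_journey_seconds=20.0))
```

Here the up/down check reports "up" while the journey check reports "slow", which is the soft downtime case described above.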
thinkTribe Monitoring Portal
There are also monitoring and reporting issues with downtime. Where a scheduled maintenance period occurs, and downtime or errors are expected to happen, this should be recorded, of course, but there should be the option of excluding it from SLA, KPI and business-level reporting about performance so as not to skew the results.
The thinkTribe Monitoring Portal allows administrators to take control of planned downtime with “Planned Maintenance Exclusions” (or PMEs). These define time windows during which the monitoring data collected can be filtered out, excluding its effect from your overall availability and delivery times. Alerts to the team can also be disabled during these windows (apart from the over-riding and scarily named ‘total meltdown alert’), as the team already knows that the site will be down during this time.
By using PME settings for all of your planned and intended outages, you can ensure that a Journey reports as 100% Available even though it had errors, so long as the errors occurred only in PME time windows that are not considered true downtime within your organisation. These excluded errors can still be viewed and investigated if needed. It is also possible to keep alerting enabled when you configure PME windows. This can be useful for any predictable time periods where errors may occur and you wish to be notified for support, annotation, recording or analysis purposes, but where the impact of the errors is not considered true downtime within your organisation.
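The effect of an exclusion window on an availability figure can be sketched as follows. This is an illustration of the arithmetic only, not the portal's API: the function names, the sample data and the choice to treat the window end as exclusive are all assumptions.

```python
from datetime import datetime


def in_any_window(moment, windows):
    """True if moment falls inside any (start, end) window; end is exclusive."""
    return any(start <= moment < end for start, end in windows)


def availability(samples, pme_windows):
    """Percentage of successful samples, ignoring those inside PME windows.

    samples: list of (timestamp, succeeded) pairs.
    Returns None if every sample was excluded.
    """
    counted = [ok for ts, ok in samples if not in_any_window(ts, pme_windows)]
    if not counted:
        return None
    return 100.0 * sum(counted) / len(counted)


# Made-up monitoring samples: one failure during a planned 02:00-03:00 slot.
samples = [
    (datetime(2024, 1, 7, 1, 45), True),
    (datetime(2024, 1, 7, 2, 15), False),
    (datetime(2024, 1, 7, 3, 0), True),
    (datetime(2024, 1, 7, 3, 30), True),
]
pme = [(datetime(2024, 1, 7, 2, 0), datetime(2024, 1, 7, 3, 0))]

print("Raw availability:     ", availability(samples, []))    # 75.0
print("With PME exclusion:   ", availability(samples, pme))   # 100.0
```

The failed sample is still present in the data and can be inspected; it is simply left out of the headline figure.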
There are Ad Hoc PME Windows:
– ones that you create for a one-off event, e.g. for a planned maintenance slot coming up
and Recurring PME Windows:
– for predictable events – such as a weekly overnight batch process on your site which causes Journeys to fail and trigger Alerts you don’t wish to receive
– for predictable events that briefly cause Journeys to run slowly, which you want to exclude when calculating overall Delivery Times
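A recurring window such as a weekly overnight batch process can be modelled as a weekday, a start time and a duration. The sketch below is one possible way to test whether a timestamp falls inside such a window, including windows that run past midnight. The function name and parameters are hypothetical, and it assumes naive local datetimes and a duration of less than a week.

```python
from datetime import datetime, time, timedelta


def in_weekly_window(moment, weekday, start_time, duration):
    """True if moment falls in the weekly window that opens on `weekday`.

    weekday follows datetime.weekday(): 0 = Monday ... 6 = Sunday.
    The window end is exclusive.
    """
    # Find the most recent window opening on or before the moment's date.
    days_back = (moment.weekday() - weekday) % 7
    window_start = datetime.combine(
        moment.date() - timedelta(days=days_back), start_time
    )
    # Same weekday, but earlier than today's opening: use last week's window.
    if window_start > moment:
        window_start -= timedelta(days=7)
    return window_start <= moment < window_start + duration


# Example: a batch job every Sunday from 23:00 for two hours.
batch = dict(weekday=6, start_time=time(23, 0), duration=timedelta(hours=2))

print(in_weekly_window(datetime(2024, 1, 7, 23, 30), **batch))  # Sunday night: True
print(in_weekly_window(datetime(2024, 1, 8, 0, 30), **batch))   # after midnight: True
print(in_weekly_window(datetime(2024, 1, 8, 9, 0), **batch))    # Monday morning: False
```

A check like this could be used either to suppress alerts or to leave slow samples out of delivery-time averages, depending on which of the two cases above applies.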
Site Release Monitoring (for performance)
Related to downtime due to errors, slowdown and maintenance is the need to temporarily take down the site while making a major update or release. PME Windows can be used in conjunction with thinkTribe’s Site Release Monitor, which enables clients to understand the impact of changes, updates and releases on both site performance and the bottom line in the critical hours following deployment.
More and more often, our discussions with clients about monitoring and testing have turned to ways of ensuring that the impact of releases can be managed and measured accurately for all stakeholders throughout the organisation. Anything that affects user behaviour (whether that is a network, hardware, software, content or design issue) will be felt directly on both the brand and the bottom line. We found that the constant cycle of development, testing and release, the use of agile development methodologies and the “permanent beta” approach have put greater pressure on the release management team and increased the need for robust, reliable, comparative performance and impact data.
The SV-Release Monitor reporting provides you not only with technical and operational information, but also shows the impact of changes in performance on opportunity cost, the value of lost sales, the potential of lost leads, perception of customer service, user experience and brand management across all media. Reports are automatically generated and emailed directly to all stakeholders specified by the administrator, enabling immediate information sharing throughout the organisation, reducing waiting time and allowing for instant response and action where necessary.
Performance is compared in terms of percentage change from the comparison periods, complemented by an intuitive graphical display that offers “understanding at a glance” with a red bar indicating a decline in performance and a green bar an improvement.
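The comparison itself is simple arithmetic. As a minimal sketch, assuming the metric is a delivery time in seconds (so a lower figure is better), the percentage change and the bar colour could be derived like this. The function names and figures are illustrative and not taken from the product.

```python
def percent_change(baseline, current):
    """Percentage change of current relative to the comparison-period baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (current - baseline) / baseline


def bar_colour(baseline_seconds, current_seconds):
    """Red for slower delivery (a decline), green for faster (an improvement).

    Assumes a lower-is-better metric such as page delivery time.
    """
    change = percent_change(baseline_seconds, current_seconds)
    if change > 0:
        return "red"
    if change < 0:
        return "green"
    return "none"


# Made-up figures: average delivery time before and after a release.
print(percent_change(2.0, 2.5), bar_colour(2.0, 2.5))   # 25.0 red
print(percent_change(4.0, 3.0), bar_colour(4.0, 3.0))   # -25.0 green
```

For a higher-is-better metric such as availability, the colour test would be reversed.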
As with all SV-Monitor services, the SV-Site Release Monitoring includes the ability to add annotations against any errors or slowdowns for future documentation, analysis and review, and the ability to cross-post information with your trouble-ticket system.
Load Testing is usually thought of in terms of new developments and releases, or as a way of checking that there will be no performance issues at peak times for site usage (e.g. pre-Christmas for retailers). Another little-used but very valuable application of load testing is to stress test your existing setup to understand how much redundancy you have, or what can be taken down for maintenance and repair without adversely affecting user experience.