November 24, 2013
Most of the Greater Toronto Area and Southern Ontario has been reeling in the aftermath of the ice storm that, at its peak, left over 300,000 people and businesses without power. For businesses, the most painful part of this outage was the risk of losing access to their pertinent data. Now that everything relies on data services, the internet and connected services are as important as power and heating, if not more so. Connectivity is a utility and, as with any utility, you only notice its importance when it is broken. While many of our clients may have had local power outages, in a lot of cases Pathway BCP (business continuity planning) and backup and DR (disaster recovery) services have helped mitigate these local outages through features like multiple transparent uplinks (connectivity continuity), power continuity infrastructure, transparent remote desktop access, synchronized global data availability, and geographical DR.
A connected world
Although the above statement and list of methods may seem like an unfair plug, they are meant to emphasize the importance of implementing disaster and business continuity planning at multiple layers, from facilities right up to staff and their iPads, even if it’s not implemented with Pathway. Why? Your uptime helps others, and the economy, as well. A healthy and working economy where our suppliers and clients are up and running is also good for us because we all rely on one another.
Controlling the controllable
Business owners and operators often feel that the stakes following an outage are higher for them than for residential data service users, because the ability to generate revenue by the minute, and the safety and integrity of data and operations, are compromised. While events like ice storms, heavy rains, floods, and tornadoes can’t be avoided or controlled, at least two aspects can be controlled (details are left out):
- The integrity and availability of business data, and of the systems that control and manipulate that data. Security, capacity, and uptime are often discussed under this umbrella.
- The speed with which the business recovers from an outage-causing event. This encapsulates concepts like recovery time and recovery point.
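To make recovery time and recovery point a little more concrete, here is a minimal Python sketch. It assumes the common reading of the two terms: the recovery point is how much recent data is lost (the gap between the last good backup and the outage), and the recovery time is how long the service stays down. The function names and all timestamps are hypothetical and chosen purely for illustration.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_backup: datetime,
                     outage_start: datetime,
                     service_restored: datetime) -> tuple[timedelta, timedelta]:
    """Return (data-loss window, time to recover) for a single outage."""
    if not last_backup <= outage_start <= service_restored:
        raise ValueError("expected last_backup <= outage_start <= service_restored")
    data_loss_window = outage_start - last_backup    # achieved recovery point
    time_to_recover = service_restored - outage_start  # achieved recovery time
    return data_loss_window, time_to_recover


def meets_objectives(last_backup: datetime,
                     outage_start: datetime,
                     service_restored: datetime,
                     recovery_point_target: timedelta,
                     recovery_time_target: timedelta) -> bool:
    """True if the outage stayed within both (hypothetical) targets."""
    loss, recovery = recovery_metrics(last_backup, outage_start, service_restored)
    return loss <= recovery_point_target and recovery <= recovery_time_target


if __name__ == "__main__":
    # Hypothetical timeline: backup at midnight, outage at 02:30, back up at 06:30.
    backup = datetime(2014, 1, 15, 0, 0)
    outage = datetime(2014, 1, 15, 2, 30)
    restored = datetime(2014, 1, 15, 6, 30)
    print(recovery_metrics(backup, outage, restored))
    print(meets_objectives(backup, outage, restored,
                           timedelta(hours=4), timedelta(hours=4)))
```

In this made-up timeline the business loses two and a half hours of data and is down for four hours. Whether that is acceptable depends entirely on the targets the business has set for itself.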
A few facts
Consider a few key facts and stats:
- DR is worth the money — bad stuff happens very often. Technology outages are at least 100 times more frequent than automobile accidents. The frequency of hard drive, network, power, CPU and desktop failures is very high.
- Costing is very poorly done. When events of increasing potency occur more frequently, mid-sized to large businesses have no choice but to take notice. Look at variables like frequency of interruptions, reliance on certain systems, and the amount of revenue that is lost each hour that an integral system is down.
- Growth plans can be affected. For example, the retail industry was planning for 3-8 percent growth in 2013 Christmas sales and must now make up for the downtime in both physical footfall and online commerce. Retail isn’t the only industry affected; during an outage, suppliers and customers slow down entire verticals horizontally, not just their own businesses.
- Frequency and severity are both factors. In Toronto, summer storms have been more frequent and extreme every year. Tornadoes, once a statistical outlier, are now a staple. Between 2010 and 2013, the average global distribution, frequency, and severity of weather events have continued to increase.
- Not just summer. Although winter storms are less frequent, they can sometimes have very adverse effects on the economy. Entire supply chains can be disrupted as a result.
- Not just power. Security-related outages have grown exponentially in the last three years. Between 2011 and 2013, almost all major technology providers were breached in some manner. Political and economic events have made security, and the related threat tools and countermeasures, a new battleground for hackers and businesses. Private, resilient, globally available and secure cloud services have emerged to meet this need, designed so that no single point of breach compromises any given service in its entirety. We believe this field will only grow.
- Other factors include capacity and general monitoring-related outages. For example, running out of disk space is an embarrassing reason to declare an outage. Modern software and the growth of BYOD (bring your own device) and of data obscure a lot of these metrics and compound the problem. Poor local asset monitoring and capacity planning account for almost 35% of avoidable outages on the business premises (i.e. when systems are not placed in a properly monitored setup).
- Service level agreements need to have teeth. A promise of “five nines” uptime (99.999%) for a given service, on an “around the clock” basis, means less than 6 minutes of downtime for an entire year. Ask questions about the services to which it applies (is it power, network, apps, everything?), the list of exclusions, how guarantees are honoured and penalties paid, and whether it applies around the clock.
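The “five nines” arithmetic in the last bullet, and the hourly-revenue costing suggested earlier in the list, can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a costing model: the function names, the 365-day year, and every figure in the usage example are hypothetical and do not come from any real SLA or business.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year and around-the-clock coverage


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per year that an uptime guarantee still permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


def annual_downtime_cost(outages_per_year: float,
                         hours_per_outage: float,
                         revenue_lost_per_hour: float) -> float:
    """Rough yearly revenue loss: frequency x duration x hourly revenue at risk."""
    return outages_per_year * hours_per_outage * revenue_lost_per_hour


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        print(f"{pct}% uptime allows about "
              f"{allowed_downtime_minutes(pct):.2f} minutes of downtime per year")
    # Hypothetical business: 4 outages a year, 2.5 hours each, $10,000 lost per hour.
    print(annual_downtime_cost(4, 2.5, 10_000))
```

The same two functions make it easy to compare what a provider promises with what an outage actually costs: a guarantee that sounds strong on paper can still allow hours of downtime per year once it drops from five nines to two or three.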