The high cost of downtime is well documented. According to research by the Ponemon Institute, the average cost of a data center outage is nearly $9,000 per minute, and the average total cost per outage is nearly $750,000. These figures include the cost to detect, contain and recover from the outage, along with productivity losses, lost revenue, opportunity costs, customer churn and reputational damage.
Clearly, rapid response to an incident is the key to minimizing the cost of downtime. That’s why data centers develop emergency operating procedures (EOPs) that personnel follow to prevent an incident from becoming a full-scale outage. In practice, however, rapid response is often hampered by an inability to access IT equipment remotely.
If an outage occurs in a distant facility or in the middle of the night, the engineer on call must travel to the data center to troubleshoot the issue. If the organization instead uses a remote “smart hands” resource, that person may have very little information to go on. The wrong equipment may be switched on or off, adding to the pain. Delays and human error result in a longer outage and higher costs.
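To put the cost of delay in concrete terms, here is a rough back-of-the-envelope sketch in Python. It assumes cost grows linearly at the approximate per-minute average cited above, which is a simplification, since real outage costs vary widely by industry and incident. The function names and the 45-minute travel time are hypothetical and chosen only for illustration.

```python
# Rough outage-cost arithmetic. Assumes a flat per-minute cost, which is a
# simplification of how outage costs actually accumulate.

COST_PER_MINUTE_USD = 9_000  # approximate average cited above


def outage_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimated cost of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


def implied_duration_minutes(total_cost: float,
                             cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Outage length implied by a total cost at a flat per-minute rate."""
    return total_cost / cost_per_minute


# Hypothetical example: a 45-minute drive to the facility before
# troubleshooting can even begin.
print(f"Cost of a 45-minute delay: ${outage_cost(45):,.0f}")

# Duration implied by the two approximate averages cited above.
print(f"Implied average duration: {implied_duration_minutes(750_000):.1f} minutes")
```

Under these assumptions, 45 minutes of travel time alone adds roughly $405,000 to an incident, and the two cited averages together imply an outage lasting a little over 83 minutes.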
Redundancy vs. Resilience
To mitigate the risk of an outage, organizations typically develop business continuity plans that focus on the physical data center infrastructure. They implement redundant power sources and uninterruptible power supplies, along with redundant systems and networking gear, as a hedge against equipment failure. They may also have the ability to fail over to a redundant facility or cloud service in the event of a catastrophic outage.
But redundancy isn’t enough to create resilience. What if there’s a cable cut in the “last mile” Internet connection? What if a security breach takes down one or more network segments? What if there’s a problem with a firmware update across the entire fleet of network switches? No amount of redundant equipment will protect against these kinds of problems.
Many organizations have invested in advanced infrastructure management tools to streamline their data center operations, but these tools require a network connection. With the primary network down, data center engineers have no way of troubleshooting the issue without going onsite.
Why Out-of-Band Management
One of Rahi’s European customers recently decided to make its infrastructure more resilient by implementing an out-of-band management solution. With out-of-band management, data center engineers can access and control IT equipment remotely, even when the primary network is unavailable. A separate network is set up to support serial console servers, which allow administrators to remotely manage network gear through each device’s serial port. Service processors provide low-level access for remote monitoring and management.
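The fallback logic behind this approach can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular product: it assumes each device has an in-band management address on the primary network and a console server address on the separate management network, and it treats a successful TCP connection as "reachable." The hostnames, ports and function names are hypothetical.

```python
import socket
from typing import Callable, Optional, Tuple

Endpoint = Tuple[str, int]  # (host, port)


def tcp_reachable(endpoint: Endpoint, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds within the timeout."""
    host, port = endpoint
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def choose_access_path(in_band: Endpoint,
                       out_of_band: Endpoint,
                       probe: Callable[[Endpoint], bool] = tcp_reachable) -> Optional[str]:
    """Prefer the primary network; fall back to the out-of-band path."""
    if probe(in_band):
        return "in-band"
    if probe(out_of_band):
        return "out-of-band"
    return None  # neither path responds; an onsite visit may be unavoidable


# Hypothetical endpoints for illustration only.
IN_BAND = ("switch01.dc.example", 22)
OUT_OF_BAND = ("console01.oob.example", 22)


def primary_down(endpoint: Endpoint) -> bool:
    """Stub probe simulating a primary network outage."""
    return endpoint == OUT_OF_BAND


print(choose_access_path(IN_BAND, OUT_OF_BAND, probe=primary_down))  # out-of-band
```

The probe is passed in as a parameter so the selection logic can be exercised without touching a real network. In production, the default `tcp_reachable` check (or a more thorough health check) would take its place.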
Setting up an out-of-band management network is not cheap, but neither is buying redundant equipment that sits idle unless there’s a failure. And unlike idle spares, the management network actively helps accelerate problem resolution and minimize downtime. Out-of-band management can also help streamline upgrades and facilitate the maintenance of equipment in offsite data centers, co-location facilities and remote offices.
If you’re looking to create a truly resilient infrastructure, an out-of-band management network can provide complete visibility and control without costly and time-consuming “truck rolls” to remote facilities. Rahi Systems has proven expertise in out-of-band management and can help you implement a solution that will minimize the cost and risk of a data center outage.