What if data center failure were a value-added option? This idea gained some traction at the recent Rackspace Solve Event in San Francisco as attendees dove into concepts like the new CoreOS, a “Linux-based operating system built for running large server deployments” that updates automatically across all deployment nodes, an idea the OS borrowed from the Google Chrome browser. Ideally, CoreOS could endure the failure of multiple servers in a cluster without any impact to end users or data center customers. Some experts believe that the rise of CoreOS and similar open data center initiatives will pave the way for Google-style mega servers, but is a 'failure feature' really the future?
In Columbia, TN, a critical data center equipment failure on August 13th left more than 5000 Columbia Power and Water System (CPWS) subscribers without cable television or Internet service. According to the Daily Herald, a transfer switch in the data center — which transfers operations to a generator in the event of a power loss — caught fire and was destroyed, leaving both primary and secondary systems offline.
While this kind of incident is the exception, not the rule, it's no surprise that companies considering their data center options still put reliability at the top of the list. A recent study from Emerson Network Power found that 77 percent of IT professionals say that “there is more concern today about maintaining availability uptime than there was 12 months ago”. And yet 91 percent of those asked also said their data center had experienced an unplanned outage in the last 24 months.
The open data center concept is gaining ground as companies like Docker and CoreOS look to convince small and midsize companies that any kind of computing at scale requires the same type of redundant infrastructure approach used by power players like Google and Facebook, and that failure might not be so bad. It's useful, therefore, to consider the idea of failure as simply a different way of looking at uptime and redundancy.
Typical data center thinking goes like this: failure can be minimized through the use of multiple power systems, intelligent heating profiles and strict server maintenance. The problem? Outages still occur, often when IT professionals least expect them.
New data center thinking takes server failure as a given, and then looks for ways to make it less onerous for everyone involved. Has a server gone down? Rather than scrambling to find a backup, concepts like CoreOS advance the idea that distribution and large-scale replication make it possible to keep an entire system running even when key components start to fail. It's like the software-side version of colocated, carrier-neutral data centers: multiplicity is inherently stronger than singularity.
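To make the replication argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from CoreOS or any vendor tool. The function name, the 99 percent per-server uptime figure and the "3 of 5" cluster size are all hypothetical, and the math assumes servers fail independently of one another, which real data centers only approximate (a shared transfer switch, for example, breaks that assumption).

```python
from math import comb


def availability(replicas: int, needed: int, p_up: float) -> float:
    """Probability that at least `needed` of `replicas` servers are up.

    Assumes each server is up independently with probability `p_up`
    (a simplifying assumption, not a property of any real cluster).
    """
    return sum(
        comb(replicas, k) * p_up**k * (1 - p_up) ** (replicas - k)
        for k in range(needed, replicas + 1)
    )


# Hypothetical numbers for illustration only.
single = availability(replicas=1, needed=1, p_up=0.99)
cluster = availability(replicas=5, needed=3, p_up=0.99)

print(f"single server:  {single:.6f}")
print(f"3-of-5 cluster: {cluster:.6f}")
```

Under these assumptions the cluster that tolerates two failed servers comes out far more available than any single machine in it, which is the intuition behind treating individual server failure as routine rather than exceptional.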
Do IT professionals want their data centers to fail? Never. But they're fully aware that avoiding failure altogether isn't possible. Is it better, then, to consider the idea of failure as a manageable, auditable feature?
As the Marketing Manager for vXchnge, Kaylie handles the coordination and logistics of tradeshows and events. She is responsible for social media marketing and brand promotion through various outlets. She enjoys developing new ways and events to capture the attention of the vXchnge audience.