As the amount of data being generated has grown over the last several decades, the tech media has experienced periodic outbursts of panic about whether or not the world’s data centers have the storage capacity to handle all that information. Take, for instance, this alarmist headline from 2007: “Capacity Crisis: Data Centers Running Out of Space and Are on Power Overload.” The article goes on to warn that data centers faced a “time of crisis” and would soon face “more and more instances of downtime and failure.”
And yet, more than ten years later, data centers are stronger than ever: well over 500,000 facilities worldwide hold an estimated total capacity of more than 1,400 exabytes (1 exabyte is roughly equal to 1 billion gigabytes). That’s a lot of storage, but considering that the US alone produces over 2.5 million gigabytes of data every minute, will it be enough?
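To put those two figures side by side, here is a rough back-of-the-envelope sketch in Python. It takes the numbers quoted above at face value and makes two deliberately pessimistic assumptions that are not claims from the article: the rate of data production stays constant, and every byte produced is stored forever. The function names are illustrative only.

```python
# Figures quoted in the article; everything else is an assumption for illustration.
GB_PER_EXABYTE = 1_000_000_000          # 1 exabyte is roughly 1 billion gigabytes
MINUTES_PER_YEAR = 60 * 24 * 365        # ignores leap years

US_GB_PER_MINUTE = 2_500_000            # US data production per minute
GLOBAL_CAPACITY_EB = 1400               # estimated worldwide data center capacity


def annual_output_exabytes(gb_per_minute):
    """Convert a per-minute production rate in GB into exabytes per year."""
    return gb_per_minute * MINUTES_PER_YEAR / GB_PER_EXABYTE


def years_to_fill(capacity_eb, gb_per_minute):
    """Years until capacity is exhausted if nothing is ever deleted (assumption)."""
    return capacity_eb / annual_output_exabytes(gb_per_minute)


if __name__ == "__main__":
    print(f"US output per year: {annual_output_exabytes(US_GB_PER_MINUTE):,.0f} EB")
    print(f"Years to fill capacity: {years_to_fill(GLOBAL_CAPACITY_EB, US_GB_PER_MINUTE):.2f}")
```

Under these assumptions, US output alone would come to roughly 1,314 exabytes a year, enough to fill the estimated global capacity in a little over twelve months. That is why the naive version of the question sounds so alarming, and why the assumptions behind it deserve scrutiny.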
One problem with asking this question is that it rarely takes into account the rapid developments in storage technology over the course of the last decade. Much like the early 2000s panic over data center power consumption, concerns over storage limitations assume that today’s technology is all that will ever be available. For decades, hard disk drives (HDDs) were the foundation of data storage due to their reliability and relatively low cost. Although there is still some room for innovation, HDDs are finally approaching their performance limits, and further improvements would require far too much power to be practical.
In the meantime, solid state drives (SSDs), better known as flash storage, have made tremendous advances in reliability, performance, and capacity. While the technology has not yet reached price parity with HDD storage, the performance gains have already convinced many manufacturers and enterprises to abandon the older format. More importantly, SSD technology has only scratched the surface of its potential, with new advances coming every year to deliver better speed and capacity.
A major source of growing data volume is the proliferation of Internet of Things (IoT) devices. Continuously connected to network infrastructures and constantly gathering data, IoT devices are expected to exceed 20 billion units by 2020. These devices account for many of the exponential growth projections that are raising questions about global data storage capacity. Even the most conservative estimates expect IoT devices to generate dozens of zettabytes (1 zettabyte is roughly equal to 1 trillion gigabytes) worth of data annually within the next five years.
Fortunately, the very nature of IoT devices makes them a slightly less daunting problem than they might initially appear. Much of the data that these devices gather is processed locally, oftentimes with the device’s own computing power. In other cases, this data will be relayed to an edge data center rather than an enterprise-grade or hyperscale data center. Edge data centers may gather data, but they aren’t primarily used for storing it. Much of the information gathered by IoT devices is either redundant or non-essential and can easily be discarded. While edge computing architectures will require powerful analytics to determine what data needs to be retained and what can be marked for deletion, implementing these filtering measures will greatly diminish the pressure on data center capacity.
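As a simple illustration of what such filtering might look like, the Python sketch below drops sensor readings that have barely changed since the last value kept for the same device. It is a minimal example and not a description of any particular edge platform: the `Reading` type, the `filter_redundant` function, and the `tolerance` threshold are all hypothetical names chosen for this example, and real deployments would likely use far more sophisticated analytics.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """A single sensor reading (hypothetical structure for illustration)."""
    device_id: str
    value: float


def filter_redundant(readings, tolerance=0.5):
    """Keep a reading only if it differs from the last kept value for the
    same device by more than `tolerance`; everything else is discarded."""
    last_kept = {}   # device_id -> last value forwarded for storage
    kept = []
    for reading in readings:
        previous = last_kept.get(reading.device_id)
        if previous is None or abs(reading.value - previous) > tolerance:
            kept.append(reading)
            last_kept[reading.device_id] = reading.value
    return kept


if __name__ == "__main__":
    stream = [
        Reading("thermostat-1", 20.0),
        Reading("thermostat-1", 20.1),   # nearly unchanged, discarded
        Reading("thermostat-1", 20.2),   # nearly unchanged, discarded
        Reading("thermostat-1", 21.0),   # meaningful change, kept
        Reading("thermostat-2", 5.0),    # first reading from this device, kept
    ]
    kept = filter_redundant(stream)
    print(f"Forwarded {len(kept)} of {len(stream)} readings")
```

In this toy stream only three of the five readings would be forwarded for storage. The threshold is the key design choice: set it too loosely and useful detail is lost, set it too tightly and little storage is saved.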
Much of the data being generated today, perhaps up to 80% of it if estimates are accurate, is considered unstructured. Unstructured data is distinguished by its lack of any specific format. It can come in many sizes, shapes, and forms, making it a challenge to manage. This data can contain many valuable business insights, but finding those insights can be like searching for a needle in a haystack. Only about 10% of unstructured data is worth being saved for analysis.
Much of the data generated by IoT devices is unstructured, and so the same basic strategies can be deployed to deal with other forms of unstructured data. By using cognitive, AI-driven technology, companies are already finding ways to better interpret, evaluate, and derive insights from this data, making it easier to manage in the process. Just because data centers have exabytes worth of storage doesn’t mean every scrap of data needs to be preserved. Much of this information is redundant, irrelevant, or damaged, so any tools that make it possible to identify and discard “useless” data will prove invaluable to effective data storage and management.
Of course, when all else fails, there’s always the simple solution: build more data centers!
In addition to investing in new memory technology and using analytics to separate the wheat from the chaff, companies are also taking steps to increase their overall data capacity, investing more than $18 billion in data center construction in the US alone. This figure doesn’t include plans to retrofit existing facilities to take advantage of new storage technology and best data practices. Over the next two to five years, multi-tenant data center revenue is expected to increase by 12%-14% per year. Much of that growth will come from expanded capacity.
In addition to highly agile edge data centers that are helping to realize the potential of IoT, massive hyperscale data centers are rapidly taking over the enterprise sector. The US is a leading player here as well, with about 44% of the world’s hyperscale facilities, far outpacing China (8%) and Germany (5%). Some estimates even predict that over 50% of all data traffic will pass through these massive data centers within the next few years. On the other end of the spectrum, many companies are experimenting with modular data centers that can be assembled onsite to place storage closer to end users and repurpose existing commercial space to take the pressure off network capacity.
With all of the innovations in memory hardware and data center construction over the last decade, reports of the “data center overload” seem to have been greatly exaggerated. While data storage will always be a critical concern for companies, the combination of new storage technology and more efficient memory management is poised to provide companies with all the tools they need to deal with the truly massive amounts of data being generated today and in the future.
Ross is a Regional Vice President, Operations at vXchnge and is responsible for managing all 14 data center locations. With more than 30 years of experience, Ross has managed data center construction, engineering, repair and maintenance, leading him to the emerging business of colocation.