Data Gravity: What Is It And How Does It Impact Your Network?

By: Kaylie Gyarmathy on January 2, 2020

While most people are familiar with the notion that “what goes up, must come down,” this often distorts the way they think of gravity as a physical force. They picture gravity in strictly two-dimensional terms even though it constantly exerts its pull in every direction. Planets provide one of the best illustrations of the concept because there’s no up or down to consider in space. Bodies of data in a network system are easy to think of as virtual planets in digital space, and like their physical counterparts, they exert a pull of their own, a phenomenon commonly referred to as data gravity.

What is Data Gravity?

The term “data gravity” is a metaphor coined by software engineer Dave McCrory in 2010 to convey the idea that large masses of data exert a form of gravitational pull within IT systems. In physics, more massive objects pull less massive ones toward them, which is why moons orbit planets and why planets, in turn, orbit the sun.

While data doesn’t literally exert a gravitational pull, McCrory used the concept to explain why smaller applications and other bodies of data seemed to gather around larger masses. More importantly, as applications and the datasets associated with them grow larger, they become more difficult to move.
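As a rough, hypothetical illustration (not McCrory's actual model), you can think of the attraction between a dataset and an application as growing with the size of both and shrinking with the latency between them, loosely mirroring Newton's law of gravitation. The function and figures below are invented purely to make the analogy concrete.

```python
# Toy illustration of the data gravity analogy (not McCrory's actual formula).
# Inspired by Newton's law of gravitation: attraction grows with "mass"
# (data volume) and shrinks with the square of the "distance" (latency).

def data_gravity_score(data_mass_gb: float, app_mass_gb: float, latency_ms: float) -> float:
    """Return a relative attraction score between a dataset and an application."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return (data_mass_gb * app_mass_gb) / (latency_ms ** 2)

# A 10 TB dataset attracts a 50 GB application far more strongly at 2 ms
# (same data center) than at 40 ms (a remote region).
print(data_gravity_score(10_000, 50, 2))    # 125000.0
print(data_gravity_score(10_000, 50, 40))   # 312.5
```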

Data Gravity and Latency

No discussion of data distribution would be complete without mentioning latency. The primary reason data gravity exerts such powerful influence is that network systems and workloads simply perform better the closer they are to one another. That’s because, fast as it moves, data is still constrained by the laws of physics. Every byte must physically travel from one location to another within a network system. The closer data is to its destination, the faster it arrives and the faster the system can execute a task.
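A quick back-of-the-envelope sketch makes the point: light in fiber travels at roughly two-thirds of its speed in a vacuum, so distance alone puts a floor under round-trip time that no amount of software tuning can remove. The figures below are approximations that ignore routing, queuing, and processing delays.

```python
# A minimal sketch of why physical distance matters for latency.
SPEED_OF_LIGHT_KM_S = 299_792   # km per second in a vacuum
FIBER_FACTOR = 0.67             # rough penalty for light traveling in fiber

def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip propagation delay over fiber, ignoring routing and queuing."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return one_way_s * 2 * 1000

print(f"{round_trip_ms(10):.2f} ms")      # same metro area: a fraction of a millisecond
print(f"{round_trip_ms(4_000):.2f} ms")   # cross-continent: tens of milliseconds
```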

Data Gravity, Storage, and Cloud Computing

When data was stored primarily in mainframes and physical databases, this centralizing effect was taken for granted. With the widespread adoption of cloud computing, however, the impact of data gravity has become a much more important consideration for organizations. Where data is gathered and how large it grows matters tremendously for a company’s IT and digital transformation strategy.

The problem is that as data gathers, it becomes increasingly unwieldy. Moving it from one location to another is complicated, especially when it isn’t well organized. Yet modern IT infrastructures are spread across multiple storage systems and servers, both physical and virtual. In theory, data should be able to flow freely from one system and application to another; in reality, moving large stores of data is difficult, costly, and risky. As more organizations adopt hybrid IT deployments that mix public and private cloud computing, the problem becomes even more complex.
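To see why bulk migration is so painful, consider a rough estimate of transfer time over a sustained network link. The dataset size, link speeds, and efficiency factor below are illustrative assumptions, not benchmarks.

```python
# A back-of-the-envelope sketch of why large datasets resist migration:
# even at full line rate, transfer time scales linearly with data volume.

def transfer_days(dataset_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Days needed to move a dataset over a sustained network link."""
    bits = dataset_tb * 8 * 10**12                      # decimal terabytes to bits
    seconds = bits / (link_gbps * 10**9 * efficiency)   # sustained throughput
    return seconds / 86_400

# Moving 500 TB over a 1 Gbps link takes weeks; even 10 Gbps still takes days.
print(f"{transfer_days(500, 1):.1f} days")
print(f"{transfer_days(500, 10):.1f} days")
```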

The Demands of Big Data and Enterprise Data Analytics Architecture

Massive amounts of data are being gathered today by customer-facing applications and edge computing IoT devices distributed along the edges of networks. By analyzing this data for patterns and other unique characteristics, organizations can gain tremendous insights that help them to drive innovation and grow their business. The vast majority of that data, however, is unstructured. Sifting through it requires complex algorithms that incorporate the latest developments in artificial intelligence and machine learning.

Organizations need to be able to shift their data from where it’s being stored to where those analytics applications are located. For a network system organized along traditional lines, with a variety of enterprise storage systems and data analytics architectures, data gravity makes it difficult to manage the demands of big data.

Data Gravity and Hyperconvergence

These challenges have driven many organizations to implement hyper-converged infrastructures that break down the barriers between computing and storage resources. Enterprise data analytics applications function best when they’re able to own the data they’re analyzing. A hyper-converged infrastructure that enables data consolidation makes it much easier to manage that data and bring processing power to bear even in the face of data gravity.

Of course, scalability and storage optimization remain major issues in such deployments. Any organization dealing in data needs to be able to scale its ability to manage that data as its business grows. That scalability needs to be accompanied by ongoing storage optimization that ensures frequently accessed data is stored in the fastest, most easily accessed locations while data used less often is relegated to slower, higher capacity storage.
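As a minimal sketch of such a tiering policy, imagine a catalog that records when each object was last accessed and maps that age to a hot, warm, or cold tier. Real storage-optimization engines use far richer heuristics; the thresholds here are assumptions chosen only to illustrate the hot/warm/cold split.

```python
# Hypothetical storage-tiering policy based solely on last access time.
from datetime import datetime, timedelta, timezone

def choose_tier(last_accessed: datetime, now: datetime | None = None) -> str:
    """Map an object's last access time to a storage tier."""
    now = now or datetime.now(timezone.utc)
    age = now - last_accessed
    if age <= timedelta(days=7):
        return "hot"      # fast SSD/NVMe close to compute
    if age <= timedelta(days=90):
        return "warm"     # standard block or object storage
    return "cold"         # slower, higher-capacity archive storage

print(choose_tier(datetime.now(timezone.utc) - timedelta(days=3)))    # hot
print(choose_tier(datetime.now(timezone.utc) - timedelta(days=200)))  # cold
```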

As data collection has accelerated and organizations have become increasingly dependent upon data analysis, the problems associated with data gravity have become much more acute. Putting off questions about how to manage growing datasets can leave a company forced to deal with bodies of data so dense and complex that they are all but impossible to migrate or manage. By adopting more converged network infrastructures as they scale, organizations can better cope with the dynamics of data gravity and ensure that they aren’t trapped in its orbit.
