How to stop data lakes from getting swamped
A "data lake" sure sounds inviting.
Cool flows of structured and unstructured data, all streaming into a vast repository, where companies are free to fish out awesome new insights all day long.
But without the right approach, that data lake isn't as welcoming as it looks on the surface.
The sheer volume of data, for instance, can easily overwhelm companies that aren't discerning about what is filling the lake, and why.
The weight of all this data can also clog things up unless companies are committed to using the latest technology to integrate it and process it for maximum insight.
The data also needs to be secure as well as fast and easy to access, so companies can get value from it while ensuring the data isn't misused or compromised.
In short, it doesn't take much for a data lake to start looking like a data swamp: a stagnant, murky place where you can't be sure what will come up when you stick in a net.
Avoiding data swamps is a must to truly capitalize on increasing volumes of data and generate new business intelligence that propels growth.
Fortunately, there are ways to keep data lakes dynamic, pristine and viable business assets.
Save the lakes
The rise of data lakes is the result of the sheer amount of information available today.
Technologies like the Internet of Things (IoT) and its billions of global sensors stream out data that's never been collected before, promising the discovery of insights that just a few years ago weren't knowable and the monetization of data flows that we didn't imagine existed.
Today, for instance, agriculture companies can crunch centuries of crop data to better predict weather patterns and yields.
Transportation firms can turn to big data to optimize traffic routes by combining past and current records about vehicle speeds, weather, road conditions and fuel consumption. It's exciting, but this kind of information must live somewhere where it's useful, accessible and safe.
Data lakes that can't offer those things are a waste of money and a lost opportunity to capitalize on today's unbelievably rich data resources. Here are a few quick tips for companies looking to avoid data swamps:
- Be selective
Information overload isn't a new problem, but it takes on new dimensions for data lakes in an age when Cisco says global big data volumes are soaring toward 402 exabytes (1 exabyte = 1 billion gigabytes) by 2021, an eightfold increase from 2016.
In the face of all that information, companies need to resist the temptation to over-collect data just because it's available.
Companies need to know exactly what business problem they are trying to address and precisely what they hope to achieve with the data they're gathering.
This can help them avoid filling data lakes with volumes of information that do nothing but bury them in the muck and prevent them from taking advantage of what their data offers.
- Automate
To truly make sense of the data filling their data lakes, companies need to take advantage of emerging technologies like artificial intelligence (AI) and machine learning that can help them sort, analyze and learn from the data with superhuman efficiency.
These capabilities help companies spot patterns, form hypotheses and find value in their data lakes that might otherwise go unnoticed.
Companies are increasingly learning this. In NewVantage Partners' annual executive survey, 76.5% of executives indicate that the proliferation and greater availability of data are empowering AI and cognitive initiatives in their organizations.
"The survey results make clear that executives now see a direct correlation between big data capabilities and AI initiatives," according to the MIT Sloan Management Review.
In short, more automation means fewer data swamps.
- Keep it close
Distance matters because it delays many of the functions that prevent data lakes from devolving into data swamps.
The farther data lakes are from where data is created or needs to be accessed and analyzed, the greater the chance that latency will slow analytics engines or the processes that drive AI, such as interconnection between cloud apps, data sources and users.
Creating data lakes in proximity to where data is stored, produced or needed by users and applications maximizes security and optimizes the functions powered by the data the lakes contain, which keeps the lakes fresh and productive.
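One rough way to reason about the "keep it close" tip is to compare candidate locations by the latency they impose on the sources and users that matter most. The Python sketch below is only an illustration under stated assumptions: the site names, endpoint names, traffic weights and round-trip times are all hypothetical, and a real decision would also weigh cost, security and compliance.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """A data source or consumer that talks to the data lake (hypothetical)."""
    name: str
    weight: float              # assumed share of traffic for this endpoint
    rtt_ms: dict[str, float]   # candidate site -> measured round-trip time in ms


def weighted_latency(site: str, endpoints: list[Endpoint]) -> float:
    """Traffic-weighted average round-trip time to one candidate site."""
    total_weight = sum(e.weight for e in endpoints)
    return sum(e.weight * e.rtt_ms[site] for e in endpoints) / total_weight


def closest_site(sites: list[str], endpoints: list[Endpoint]) -> str:
    """Pick the candidate site with the lowest traffic-weighted latency."""
    return min(sites, key=lambda s: weighted_latency(s, endpoints))


# Illustrative numbers only; none of these come from the article.
SITES = ["site-a", "site-b"]
ENDPOINTS = [
    Endpoint("factory-sensors", 0.6, {"site-a": 8.0, "site-b": 95.0}),
    Endpoint("analytics-cluster", 0.3, {"site-a": 12.0, "site-b": 70.0}),
    Endpoint("remote-office", 0.1, {"site-a": 140.0, "site-b": 20.0}),
]

if __name__ == "__main__":
    for site in SITES:
        print(site, round(weighted_latency(site, ENDPOINTS), 1), "ms")
    print("closest:", closest_site(SITES, ENDPOINTS))
```

With these made-up figures, the site nearest the heaviest data producers wins even though one small office is far from it, which is the trade-off the tip describes.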
Data lakes thrive here
A global interconnection platform is a place where a data lake can thrive.
It provides the proximity to various sources, data stores, analytics, and cloud and network partners that's so crucial to keeping data lakes healthy.
Platform Equinix spans 48 markets on five continents, so companies can create data lakes close to almost anywhere.
The network- and cloud-density on Platform Equinix (1,700+ networks, 2,900+ cloud and IT service providers) is also a huge benefit because it enables interconnection to the cloud and network services needed to fully exploit a company's data assets.
In addition, Equinix Data Hub is a solution deployed on Platform Equinix that's designed to enable companies worldwide to store vast amounts of data at a local level, for quick access by the people and applications that need it.
That's a data swamp preventative if there ever was one.
Article by Jim Poole, Equinix Blog Network