
Why companies are jumping into data lakes

21 Nov 2016

We’re living in a world awash with expanding amounts of data. Some of it has been generated by business intelligence workloads, and some of it is less structured content that’s produced during manufacturing processes, or by retail point-of-sale devices and an ever-growing number of mobile, intelligent devices.

Then, of course, there is the Internet of Things, and its growing number of connected devices continuously streaming out increasing volumes of structured and unstructured data.

This huge wave of data is overwhelming many existing enterprise storage infrastructures, regardless of whether the intent is to store and process the data locally, in a cloud service provider’s data center, or in some combination of the two.

“Data lakes” are designed to address this data storage challenge, making the data more useful and accessible, and still allowing enterprises to meet their security, privacy and data governance needs.

What is a data lake?

Data lakes are a developing concept, and the industry hasn’t coalesced around a single, universally accepted definition. A consensus definition, derived from several different sources, follows:

“A data lake is a storage mechanism designed to facilitate the colocation and use of many different types of data, including data defined by various schemata and structural frameworks, as well as blobs and other unstructured files.”

The hope is that a data lake will make it possible for an enterprise to gain new business insights by accumulating large amounts of data, in the format chosen by each workload, and then make it easy to process using big data analytics, cross-workload analysis, reporting, research, and even some forms of transactional workloads.
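The “store in the format chosen by each workload, impose structure at query time” idea described above is often called schema-on-read. A minimal Python sketch follows; the in-memory `lake` dict, the `ingest` and `read_sales` helpers, and the file formats are illustrative assumptions, not any particular product’s API (real lakes typically sit on object storage such as S3 or HDFS):

```python
import csv
import io
import json

# Toy "data lake": raw objects stored as-is, keyed by path.
# A dict stands in for object storage purely for illustration.
lake = {}

def ingest(path, raw_bytes):
    """Schema-on-write is skipped: data lands in its native format."""
    lake[path] = raw_bytes

def read_sales(path):
    """Schema-on-read: a schema is imposed only when the data is used."""
    raw = lake[path].decode("utf-8")
    if path.endswith(".json"):
        records = [json.loads(line) for line in raw.splitlines()]
    elif path.endswith(".csv"):
        records = list(csv.DictReader(io.StringIO(raw)))
    else:
        raise ValueError(f"no reader for {path}")
    # Normalise to a single schema at query time.
    return [{"sku": r["sku"], "qty": int(r["qty"])} for r in records]

# Two sources, two native formats, one lake.
ingest("pos/2016-11-21.csv", b"sku,qty\nA1,3\nB2,5\n")
ingest("web/2016-11-21.json", b'{"sku": "A1", "qty": 2}\n')

sales = read_sales("pos/2016-11-21.csv") + read_sales("web/2016-11-21.json")
total = sum(r["qty"] for r in sales)
print(total)  # 10
```

The point of the sketch is that ingestion never rejects or reshapes data; each consumer decides how to interpret it, which is what makes cross-workload analysis over heterogeneous sources feasible.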

New tools, new thoughts

The movement toward implementation of data lakes is at the intersection of several trends. One is a move by cloud service providers who are seeking to innovate and provide new storage products.

Another trend sees enterprises experiencing fundamental shifts in where their data comes from and how they use it. Data now arrives from many types of end user-focused devices and systems, even as it continues to be generated and processed by traditional systems.

Efforts are underway to combine all of this structured and unstructured data, regardless of its form or original intent, making it easier to join with other systems of record. That’s where data lakes come in.

In addition, older approaches based on monolithic application and database design simply can’t offer the speed to keep up with consumer expectations, but they’re still being used to support legacy workloads.

A data lake is a new tool to help developers deal with the tsunami of data coming from everywhere and deliver the on-demand performance expected by all users.

Finally, there’s the cloud. The horizontal scalability of cloud computing has introduced new database architectures allowing enterprises to build massive data lakes at hyperscale while maintaining the necessary data consistency across distributed environments.

Concerns about diving into data lakes

Some industry research firms have published notes or conference presentations that warn that enterprises shouldn’t dive into a data lake without proper planning. Some things to watch for include:

- Make sure providers define data lakes in a way that ensures their tools and products actually serve your requirements.
- Consider whether your organization has the data analysis and data manipulation skills needed to make optimal use of a data lake.
- Ensure your corporate data governance, security and privacy policies match up with your data lake implementation.
- Test that the storage performance of the data lake meets the needs of all workloads.

A storage and interconnection solution for data storage demands

Data lakes may be an emerging enterprise tool, but the underlying need for better ways to store and exploit burgeoning amounts of data is longstanding and only growing. Equinix Data Hub offers a data storage and interconnection solution that enables enterprises to move massive data stores – including data lakes – closer to where their data is created or needs to be accessed by users, analytics and clouds.

Data Hub is a localized storage repository that can be easily deployed in 40 markets worldwide, so companies can safely store their data close to users, analytics engines and clouds for faster access and accelerated processing and insights. Data Hub also enables robust disaster recovery strategies and makes it easy to comply with regulations worldwide requiring companies to house data within certain borders.

Article by Lance Weaver, Equinix blog network 
