Why companies are jumping into data lakes

21 Nov 16

We’re living in a world awash with expanding amounts of data. Some of it has been generated by business intelligence workloads, and some of it is less structured content that’s produced during manufacturing processes, or by retail point-of-sale devices and an ever-growing number of mobile, intelligent devices.

Then, of course, there is the Internet of Things, and its growing number of connected devices continuously streaming out increasing volumes of structured and unstructured data.

This huge wave of data is overwhelming many existing enterprise storage infrastructures, regardless of whether the intent is to store and process the data locally, in a cloud service provider’s data center, or in some combination of the two.

“Data lakes” are designed to address this data storage challenge, making the data more useful and accessible, and still allowing enterprises to meet their security, privacy and data governance needs.

What is a data lake?

Data lakes are still evolving, and the industry hasn’t coalesced around a single, universally accepted definition. A working definition, drawn from several sources, follows:

“A data lake is a storage mechanism designed to facilitate the colocation and use of many different types of data, including data defined using various schemata and structural frameworks, as well as blobs and other files.”

The hope is that a data lake will make it possible for an enterprise to gain new business insights by accumulating large amounts of data, in the format chosen by each workload, and then make it easy to process using big data analytics, cross-workload analysis, reporting, research, and even some forms of transactional workloads.
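To make the idea concrete, here is a minimal sketch of the pattern described above: each workload lands data in its native format, and a schema is applied only when the data is read for analysis. The directory layout, file names and figures are hypothetical, and the standard library stands in for real lake storage and analytics engines.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical data lake root; each workload writes in its native format.
lake = Path(tempfile.mkdtemp()) / "lake"
(lake / "pos").mkdir(parents=True)
(lake / "sensors").mkdir(parents=True)

# A point-of-sale system lands structured CSV records as-is.
with open(lake / "pos" / "sales.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sku", "qty", "price"])
    writer.writerow(["A100", 2, 9.99])

# An IoT device lands semi-structured JSON events, untouched.
(lake / "sensors" / "events.json").write_text(
    json.dumps([{"device": "d1", "temp_c": 21.5}])
)

# Schema-on-read: structure is imposed only by the analysis that needs it.
with open(lake / "pos" / "sales.csv") as f:
    sales = list(csv.DictReader(f))
events = json.loads((lake / "sensors" / "events.json").read_text())

revenue = sum(int(r["qty"]) * float(r["price"]) for r in sales)
print(revenue)  # 19.98
```

The point of the design is that the writers never coordinate on a shared schema up front; a new analysis can combine the CSV and JSON zones later without any migration of the stored files.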

New tools, new thoughts

The movement toward data lakes sits at the intersection of several trends. One is cloud service providers seeking to innovate with new storage products.

Another trend sees enterprises experiencing fundamental shifts in where their data comes from and how they use it. Data now arrives from many types of end-user devices and systems, even as it continues to be generated and processed by traditional systems.

Efforts are underway to combine all of this structured and unstructured data, regardless of its form or original intent, making it easier to join with other systems of record. That’s where data lakes come in.
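The join described above can be sketched in a few lines: semi-structured events from a lake are enriched at read time against a structured system-of-record table. The customer table, event records and field names are all hypothetical, chosen only to illustrate the pattern.

```python
import json

# Hypothetical system-of-record table (e.g., exported from a CRM).
customers = {
    "c1": {"name": "Acme", "region": "EU"},
    "c2": {"name": "Globex", "region": "US"},
}

# Semi-structured events as they might land in a data lake, one JSON per line.
events = [json.loads(line) for line in [
    '{"customer": "c1", "action": "view"}',
    '{"customer": "c2", "action": "buy"}',
    '{"customer": "c1", "action": "buy"}',
]]

# Join at read time: enrich each raw event with record data.
enriched = [{**e, **customers.get(e["customer"], {})} for e in events]

# A simple cross-source analysis: purchases by region.
buys_by_region = {}
for e in enriched:
    if e["action"] == "buy":
        buys_by_region[e["region"]] = buys_by_region.get(e["region"], 0) + 1

print(buys_by_region)  # {'US': 1, 'EU': 1}
```

Neither source had to change its format for the join to work; the lake simply colocates both so the lookup is cheap at analysis time.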

In addition, older approaches based on monolithic application and database design simply can’t offer the speed to keep up with consumer expectations, but they’re still being used to support legacy workloads.

A data lake is a new tool to help developers deal with the tsunami of data coming from everywhere and deliver the on-demand performance expected by all users.

Finally, there’s the cloud. The horizontal scalability of cloud computing has introduced new database architectures allowing enterprises to build massive data lakes at hyperscale while maintaining the necessary data consistency across distributed environments.

Concerns about diving into data lakes

Some industry research firms have published notes or conference presentations that warn that enterprises shouldn’t dive into a data lake without proper planning. Some things to watch for include:

- Make sure providers define data lakes in a way that their tools and products really do serve your requirements.
- Consider your organization’s level of expertise in data analysis and data manipulation, so you can make optimal use of a data lake.
- Ensure your corporate data governance, security and privacy policies match up with your data lake implementation.
- Test that the storage performance of the data lake meets the needs of all workloads.

A storage and interconnection solution for data storage demands

Data lakes may be an emerging enterprise tool, but the underlying need for better ways to store and exploit burgeoning amounts of data is longstanding and only growing. Equinix Data Hub offers a data storage and interconnection solution that enables enterprises to move massive data stores – including data lakes – closer to where their data is created or needs to be accessed by users, analytics and clouds.

Data Hub is a localized storage repository that can be easily deployed in 40 markets worldwide, so companies can safely store their data close to users, analytics engines and clouds for faster access and accelerated processing and insights. Data Hub also enables robust disaster recovery strategies and makes it easy to comply with regulations worldwide requiring companies to house data within certain borders.

Article by Lance Weaver, Equinix blog network 
