Article by Cloudera product marketing manager Vineeth Varughese and Dell Technologies advisory systems engineer Kyle Prins.
The vast swell in data generated by COVID-19 is creating new growth opportunities for enterprises, but having the right infrastructure is critical in navigating what has effectively become a data storm.
Gartner recently warned that “data and analytics leaders must prepare for the complexities of multi-cloud and intercloud deployments to avoid potential performance issues, unplanned cost overruns and difficulties with integration efforts”.
According to recent research from Cloudera, 47% of organisations retain their data on-premises, 32% in private cloud, 26% in hybrid cloud, 24% in multi-cloud and 21% in single cloud.
This split provides options for enterprises at various stages of their data journey, but what is increasingly needed is a bird's-eye view across these multiple platforms and the ability to analyse different data points efficiently. Security and access across multiple teams are arguably even more critical in a highly scrutinised world — no one wants to be leading the news for the wrong reasons.
If data scientists can access high-quality data more easily, then organisations can make more informed data-driven decisions and respond to threats and opportunities. This, in turn, leads to a comprehensive view that supports strategically driven business growth — be it through improving customer engagement, smart automation or reducing churn.
Modern architecture in private and public cloud
More organisations are looking to adopt the modern architecture that private cloud offers. Driving this interest is enterprises' desire to bring cloud-native capabilities, with hybrid support, on-premises into their data centres.
These ‘container clouds' bring analytic experiences together in the data centre. They are also often much faster to provision for enterprises seeking to separate the compute and storage layers completely and optimise each independently, while maintaining the security and governance that are so critical today.
Solving ‘noisy neighbours' and cluster sprawl
A key challenge faced by those responsible for setting up data clusters is ‘noisy neighbours', where a shared cluster has multiple business applications running on it. If one of them ‘spikes' at an unexpected time, it can take resources away from the other applications on the cluster, making performance difficult for a platform manager to predict.
To avoid the ‘noisy neighbour' problem, it's common to create entirely new clusters to support new critical applications. And that leads to the second problem — cluster sprawl and cost.
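The noisy-neighbour dynamic can be sketched in a few lines of Python. This is a simplified illustration of resource contention, not any vendor's actual scheduler; the tenant names and figures are invented:

```python
# Simplified illustration of the 'noisy neighbour' effect on a shared cluster.
# All tenant names and figures are hypothetical.

CLUSTER_CORES = 100

def allocate(demands, quotas=None):
    """Grant each tenant its demand, capped by an optional per-tenant quota,
    then by whatever cluster capacity remains (first come, first served)."""
    remaining = CLUSTER_CORES
    grants = {}
    for tenant, demand in demands.items():
        cap = min(demand, quotas[tenant]) if quotas else demand
        grants[tenant] = min(cap, remaining)
        remaining -= grants[tenant]
    return grants

# A reporting job spikes unexpectedly and starves the other applications.
demands = {"reporting": 90, "fraud-detection": 30, "recommendations": 20}
print(allocate(demands))
# -> {'reporting': 90, 'fraud-detection': 10, 'recommendations': 0}

# With per-tenant quotas, as in an isolated, containerised setup,
# the spike is contained and the other tenants keep their share.
quotas = {"reporting": 50, "fraud-detection": 30, "recommendations": 20}
print(allocate(demands, quotas))
# -> {'reporting': 50, 'fraud-detection': 30, 'recommendations': 20}
```

Without quotas, whichever application spikes first consumes the shared pool; with per-tenant caps, its blast radius is bounded — which is essentially what tenant isolation buys.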
As new clusters are added, so is complexity. Managing the different clusters becomes harder, and data often has to be replicated multiple times, creating additional costs. Sprawl also increases the risk of data silos and a lack of elasticity. It's common, too, for data managers to leave clusters sitting idle while they plan for data growth, driving utilisation rates down.
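The replication cost compounds quickly. A back-of-the-envelope sketch, with all dataset sizes hypothetical (3 is the default HDFS replication factor):

```python
# Back-of-the-envelope cost of cluster sprawl: every dedicated cluster that
# keeps its own copy of a shared dataset multiplies raw storage spend.
# Dataset sizes are hypothetical; 3 is the default HDFS replication factor.

def sprawl_storage_tb(dataset_tb, clusters, hdfs_replication=3):
    """Raw storage consumed when each cluster holds a full copy of the
    dataset, each applying its own HDFS-style block replication."""
    return dataset_tb * clusters * hdfs_replication

print(sprawl_storage_tb(100, clusters=1))  # 300 TB raw for one cluster
print(sprawl_storage_tb(100, clusters=5))  # 1500 TB raw once sprawl sets in
```

Five clusters holding the same 100 TB dataset consume five times the raw storage of one — before counting the operational cost of keeping the copies in sync.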
When it comes time to upgrade, a platform manager can find multiple business applications running on one cluster and different business teams to negotiate with — some may want newer, more innovative experiences; others may want stability. One team, for example, may worry about the risks while upgrades are happening and want its data isolated. By isolating tenants, companies can let each upgrade at a time of its own choosing.
Not only can enterprises have agility through application upgrades, but they can also have hardware upgrade agility where they can fully automate and deploy firmware upgrades to their infrastructure.
It can be easy to underestimate the time needed for planning, procuring hardware, configuring clusters, onboarding, maintaining, and supporting applications. By saving time in onboarding and upgrading applications, organisations can make data-driven decisions more quickly and avoid downstream problems, such as a lack of data integration and misalignment with their data strategy.
A flexible, software-defined architecture with consolidation allows data for several Hadoop distributions to be managed simultaneously, enabling a phased upgrade or migration from a traditional on-premises platform to a modern, cloud-native, on-premises platform that can support hybrid cloud deployments. This simplifies the process and significantly reduces business risk for those migrating to a new Hadoop distribution.
Ultimately, well-designed private clouds make people more productive. Teams work better because they collaborate more efficiently across the entire data lifecycle, and businesses work better because they leverage their data to make more informed decisions — and thrive in a data storm that shows no sign of easing.