
Onehouse unveils Open Engines for flexible data lakehouses
Onehouse has announced the launch of Open Engines, designed to enable users to deploy open compute engines to process any table on their open data lakehouse platform.
The company states that Open Engines addresses challenges associated with traditional data platforms, which often require users to commit to a single query engine or data warehouse, potentially limiting flexibility for diverse data needs.
Open Engines is described as a new component of the Onehouse Cloud Lakehouse Platform. It permits users to deploy open source compute engines such as Apache Flink for stream processing, Trino for distributed SQL queries, and Ray for machine learning, artificial intelligence, and data science workloads. The deployment process operates through the Onehouse Compute Runtime, a specialised runtime optimised for lakehouse workloads.
Onehouse claims that, in combination, Open Engines and Onehouse Compute Runtime can accelerate queries by a factor of 2 to 30, while also reducing cloud infrastructure costs by 20 to 80 percent. Open Engines also integrates with OneSync multi-catalog synchronisation, allowing engines to access any table managed or created by Onehouse across different data catalogues.
Traditional data platform architectures have often taken an engine-centric approach, which the company asserts can restrict users. According to Onehouse, alternatives such as self-deploying open source engines can involve substantial engineering effort and ongoing maintenance.
Onehouse promotes an alternative approach described as a "data-centric architecture", in which the open, interoperable data platform is prioritised over the choice of engine. With this structure, data is central, and multiple popular engines are supported, giving users the freedom to select the engine best suited to each use case.
Vinoth Chandar, Founder and Chief Executive Officer at Onehouse and the original creator of the data lakehouse architecture, stated: "Open Engines is the final brick on our vision for the Universal Data Lakehouse, cementing the power and flexibility of open data architectures.
"Since emerging out of stealth in 2022, we've been steadfast in pursuing our vision to bring to life a data platform that inverts the current engine-centric thinking and puts data at the center. Built on open foundations like fast and efficient open lakehouse storage with Apache Hudi, format and catalog interoperability with Apache XTable (incubating), our managed service made it possible to ingest and transform open data for consumption by any downstream engine, with unmatched speed and efficiency from our runtime. Today, with Open Engines, we are making it seamless to bring open source compute engines directly to your data. There simply isn't a more complete or open data lakehouse solution out there."
According to the company, Open Engines can be deployed with only a few selections in the Onehouse user interface, after which the necessary compute infrastructure is provisioned and managed automatically.
Onehouse says this makes it convenient for organisations to deploy open engines on their lakehouses, and emphasises that the engines benefit from the performance enhancements of the Onehouse Compute Runtime and can connect to tables managed either inside or outside the Onehouse environment.
Onehouse was established in 2021 by Vinoth Chandar, who previously built the first data lakehouse at Uber. Since then, the company has delivered its Universal Data Lakehouse platform as a managed, cloud-based service, and has formed partnerships with companies including Amazon Web Services and Confluent.