Just days after rival data lakehouse provider Snowflake said that it would open up the source code to its Polaris Catalog, Databricks is open sourcing its Unity Catalog offering.
Databricks’ Unity Catalog, which was made generally available in June 2022 and later updated with Okera’s capabilities, used to be a closed-source unified governance offering that provided centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces.
When Snowflake released Polaris Catalog at its annual conference earlier this month, it said it would open source it within three months. Polaris Catalog offers similar capabilities to Unity Catalog, but is built atop the popular open source Apache Iceberg data table format.
“It is difficult to look at the Unity Catalog announcement without thinking about the consistent contest that exists between Databricks and Snowflake for enterprise attention,” said Hyoun Park, chief analyst at Amalgam Insights.
“By open sourcing Unity before Polaris, Databricks wants to position as being the first to open source its data catalog,” Park added.
Now Databricks says it has open-sourced Unity Catalog under the Apache 2.0 license and opened up all its APIs as well.
The Apache 2.0 license, introduced by the Apache Software Foundation in 2004, is a permissive software license that allows users to use, modify, and distribute code free of charge, provided they retain the license and copyright notices.
The open source catalog will provide users with a universal interface that supports data in any format and any compute environment, the company said. For example, tables can be read with Delta Lake, Apache Iceberg, and Apache Hudi clients via Delta Lake UniForm.
The now open-sourced version also supports the Iceberg REST Catalog and Hive Metastore (HMS) interface standards, it added.
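To make the idea of an open catalog API concrete, here is a minimal sketch of how an external tool might address the catalog over HTTP. The server address, the `/api/2.1/unity-catalog` path prefix, the `catalog_name` and `schema_name` query parameters, and the catalog and schema names are all assumptions made for illustration; they are not taken from Databricks’ announcement and should be checked against the project’s own API documentation.

```python
from urllib.parse import urlencode

# Assumed values for a locally running open source Unity Catalog server.
# Both are illustrative placeholders, not confirmed by the announcement.
BASE_URL = "http://localhost:8080"
API_PREFIX = "/api/2.1/unity-catalog"


def list_tables_url(catalog: str, schema: str, base_url: str = BASE_URL) -> str:
    """Build the URL for a hypothetical 'list tables' request."""
    # Encode the catalog and schema names as query parameters.
    query = urlencode({"catalog_name": catalog, "schema_name": schema})
    return f"{base_url.rstrip('/')}{API_PREFIX}/tables?{query}"


if __name__ == "__main__":
    # "unity" and "default" are hypothetical catalog and schema names.
    print(list_tables_url("unity", "default"))
```

Any HTTP client could then issue a GET request against the resulting URL, which is the kind of access that lets third-party catalogs pull metadata out of the Databricks environment.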
Additionally, Unity Catalog will continue to provide unified governance across AI assets, such as machine learning (ML) models and generative AI tools.
The move to open up Unity Catalog’s APIs, according to IDC’s research vice president Stewart Bond, provides open access to intelligence about data held within the Databricks environment.
“This is significant as it provides opportunities for an enterprise to include intelligence about data on Databricks to be integrated into and shared with catalogs that maintain intelligence about data stored elsewhere,” Bond said. He added that the move supports the unification of data intelligence, so that data consumers, engineers, and executives do not need multiple tools to discover, manage, and govern all the data in an enterprise.
This approach of supporting data unification, according to Steven Dickens, The Futurum Group’s practice lead for hybrid cloud, eliminates vendor lock-in, allowing businesses to choose the best tools and platforms for their needs while ensuring consistent governance and security across their data estate.
The open sourcing of Unity Catalog, coming on the heels of Snowflake’s announcement that it would open source Polaris Catalog within three months, is viewed by analysts as part of a race to appear more open and win over data catalog users.
Futurum’s Dickens said Databricks’ move to open source Unity Catalog represents a significant challenge for rivals such as Snowflake, Teradata, and Dremio.
“The emphasis on interoperability and open-source commitment ensures that Databricks can cater to a wider range of customer needs, reducing the friction associated with data format compatibility,” he said.
“Teradata and Dremio, while strong in their respective niches, have not demonstrated the same level of integration and comprehensive tooling for data and AI governance,” Dickens added.
However, IDC’s Bond pointed out that the success of the now open-sourced Unity Catalog will depend on how much metadata about data stored in competing platforms is made available to external processes.
“Unity is still a very technical catalog. Making it open source may accelerate innovations in business-level user experiences and make Unity more competitive,” Bond said.
Anirban Ghoshal is a senior writer covering enterprise software for CIO.com and databases, cloud, and AI for InfoWorld.