Get More Value From Your Data With Boomi Data Catalog and Preparation

June 3, 2020 Sean Keenan

Earlier this year, Boomi acquired Unifi, a company with a comprehensive suite of self-service data discovery and preparation tools that has consistently earned a high ranking among analysts in this space.

The tools empower business users to break down the barriers of operational data silos and make their information more accessible across the enterprise. Now, these capabilities have been incorporated into the Boomi Platform as Boomi Data Catalog and Preparation (DCP).

This post covers three aspects of Boomi Data Catalog and Preparation:

  1. Why DCP's modular, microservices architecture is an advantage for our customers
  2. What back-end services are common to the platform
  3. How DCP compares to legacy approaches for cataloging and preparing data

Read our blog post "The Path to Insight: Making Data Clean, Usable, and Accessible" to learn more about the need for data cataloging and data preparation.

It All Starts With Architecture

Product offerings from legacy players in the data management space have a monolithic architecture. All the product’s capabilities exist in one large application. In contrast, modern software companies build their application architectures as a series of loosely coupled microservices grouped under several main functions.

Why is this important? A modular, microservices architecture offers a lot of speed and flexibility from a development perspective.

It means you can develop or change features more quickly than you could with a monolithic or legacy architecture. In a monolithic architecture, services are tightly coupled, so adding or changing a feature can have a serious and sometimes unpredictable impact across the entire application. Rather than an opportunity cost, it’s a “complexity cost.”

In a microservices architecture, changes can be made to an individual service without affecting its “contract” with other services. This flexibility also applies to deployment. Individual services can be scaled up or down to meet changing demands, such as concurrent users and data processing loads.
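To make the idea of a service "contract" concrete, here is a minimal sketch in Python. It is an illustration only, not DCP code: the names `ProfilingService`, `ProfilerV1`, `ProfilerV2`, and `summarize` are hypothetical. The caller depends on the contract alone, so the implementation behind it can be reworked without the caller changing.

```python
from __future__ import annotations

from typing import Protocol


class ProfilingService(Protocol):
    """Hypothetical contract: callers rely only on this method signature."""

    def profile(self, values: list[str]) -> dict[str, int]:
        ...


class ProfilerV1:
    """First implementation: counts distinct values with an explicit loop."""

    def profile(self, values: list[str]) -> dict[str, int]:
        seen = set()
        for value in values:
            seen.add(value)
        return {"rows": len(values), "distinct": len(seen)}


class ProfilerV2:
    """Reworked internals behind the same contract."""

    def profile(self, values: list[str]) -> dict[str, int]:
        return {"rows": len(values), "distinct": len(set(values))}


def summarize(service: ProfilingService, values: list[str]) -> str:
    """A caller that depends on the contract, not on one implementation."""
    stats = service.profile(values)
    return f"{stats['rows']} rows, {stats['distinct']} distinct"
```

Swapping `ProfilerV1()` for `ProfilerV2()` in a call to `summarize` gives the same result, which is the property that lets an individual service change without rippling across the application.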

This microservices architecture also allows Boomi Data Catalog and Preparation to take advantage of the latest data processing and analytics frameworks, such as Apache Spark, and cloud database as a service (DBaaS) offerings, such as Google BigQuery and Snowflake. Boomi can write natively to these frameworks rather than relying on connectors, as legacy applications must, which puts certain functionality out of reach. These frameworks, in turn, support a range of services and automation that traditional data catalog and processing solutions can’t match.

We look at DCP as a “harness” that can be configured to sit on top of any data processing platform.

[Figure: high-level product architecture overview]

DCP Architecture Delivers Sophisticated Back-End Services

Boomi Data Catalog and Preparation back-end services fall into four categories: the Intelligent Agent (IAgent), Executor, Discovery, and Access services. The IAgent service is responsible for the bulk of the machine learning and automation that occurs within DCP. Features such as the Knowledge Graph, natural language processing, and autocomplete originate in the IAgent service. The service is written in Scala, with the Knowledge Graph supported by the graph database JanusGraph.

The Executor service is responsible for data profiling, data preview, and job execution. It uses a combination of Scala, Java, and Spark. The Discovery service detects data types and file formats such as CSV, JSON, and XML. It also provides data sampling. The Access service is an extensible framework that allows for easy adapter creation using Java.
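As a rough illustration of what format and data type detection involves, here is a minimal sketch using only the Python standard library. It is not how the Discovery service is implemented; the helper names `detect_format` and `infer_type` and the candidate delimiter list are assumptions made for the example.

```python
from __future__ import annotations

import csv
import json
import xml.etree.ElementTree as ET


def detect_format(sample: str) -> str:
    """Guess whether a text sample is JSON, XML, or CSV (hypothetical helper)."""
    text = sample.strip()
    if not text:
        return "unknown"
    # Only attempt JSON for objects and arrays, so a bare number is not misread.
    if text[0] in "{[":
        try:
            json.loads(text)
            return "json"
        except ValueError:
            pass
    if text[0] == "<":
        try:
            ET.fromstring(text)
            return "xml"
        except ET.ParseError:
            pass
    try:
        # Assumption: only these delimiters are considered for CSV.
        csv.Sniffer().sniff(text, delimiters=",;\t|")
        return "csv"
    except csv.Error:
        return "unknown"


def infer_type(values: list[str]) -> str:
    """Return the narrowest type that every value in a column parses as."""

    def all_parse(cast) -> bool:
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "integer"
    if values and all_parse(float):
        return "float"
    return "string"
```

A production service would work on byte streams, handle encodings, and cover many more formats and types. The sketch only shows the general try-the-strictest-parser-first approach.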

These services offer best-in-class user features such as:

  • Robust relevance-based search of all enterprise metadata including data sets, jobs, business glossary, workflows, schedules, and more.
  • A recommendation engine that enables all classes of users to prepare and enrich data.
  • Knowledge sharing about data across the enterprise among users. Rate and comment on data. Request access to a data set by collaborating with a data steward.
  • Natural language queries that allow users to ask questions of their data without any technical knowledge.
  • Comprehensive features for IT to maintain control and security of the data and understand its lineage.
  • The capacity to build repeatable, predictable data pipelines to operationalize workflows.
  • Deployment on-premises, in the cloud, or in a hybrid environment. Boomi is at home anywhere, leveraging the elastic scalability of these environments when workloads require additional compute performance.
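To give a feel for relevance-based search across different kinds of metadata, here is a toy sketch in Python. It is not DCP's search implementation: the `CatalogEntry` type, the `score` and `search` functions, and the weighting of name matches at three times description matches are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical metadata record; kind might be a data set, job, or glossary term."""

    kind: str
    name: str
    description: str


def score(entry: CatalogEntry, query: str) -> int:
    """Count query-term matches, weighting name matches above description matches."""
    terms = query.lower().split()
    name_words = entry.name.lower().split()
    desc_words = entry.description.lower().split()
    # The weight of 3 for name matches is an arbitrary assumption.
    return sum(3 * name_words.count(t) + desc_words.count(t) for t in terms)


def search(entries: list[CatalogEntry], query: str) -> list[CatalogEntry]:
    """Return entries with at least one match, most relevant first."""
    scored = [(score(entry, query), entry) for entry in entries]
    hits = [(s, entry) for s, entry in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in hits]
```

Because every entry shares one record shape regardless of its kind, a single query can rank data sets, jobs, and glossary terms together.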

Comparing DCP to Legacy Data Management Solutions

The biggest issue that Boomi Data Catalog and Preparation solves in comparison to legacy solutions is one of integration. Legacy solutions are typically a disconnected group of bespoke, on-premises application suites. For example, if an organization were to buy all the data management products from a legacy player, it would discover that very few of those products talk to each other.

You can deploy these products independently to solve a single problem or a series of individual problems. But when they solve a problem, they don’t share how it was solved with any of the downstream applications that are solving another problem in the data pipeline. So, there’s no reusability across the application suites, which means a lot of rework and duplication of effort.

Often this problem is exacerbated by mergers and acquisitions. You'll find different architectures across the various product stacks, which makes integrating them even harder. If I’m the customer, and there’s no consistent methodology for integrating them, it becomes a painful and expensive problem. Some application vendors will try to retrofit certain products to solve that problem, but it ends up being the equivalent of "cloud washing" an on-premises solution to compete with native cloud applications.

There are four critical areas where the Boomi Platform and Boomi Data Catalog and Preparation make life easier for IT than a collection of individual data management solutions:

  1. Architectural flexibility: A microservices architecture simplifies deployment and supports the latest data processing frameworks.
  2. Total cost of ownership: With DCP, fewer resources and less infrastructure are required.
  3. Security and authorizations: Rules and permissions populate across the entire product and its component services.
  4. Scalability: Customers can scale up or down as needed. DCP is compatible with all major cloud service providers.

It’s common today to proclaim a software product is AI-enabled or AI-assisted. Some of these claims are pretty flimsy. But genuine AI practices are pervasive in Boomi Data Catalog and Preparation, which can’t be said about a cobbled-together set of data management applications from the legacy era.

Boomi Data Catalog and Preparation applies machine learning techniques such as neural networks and natural language processing to data management. By blending usability and advanced functionality, Boomi DCP creates a “cooperative AI,” where the decisions of people using the system make the system smarter and improve business outcomes.

For more information on Boomi Data Catalog and Preparation, watch our recorded webinar "Discover, Understand, and Integrate Your Data for Better Outcomes."

About the Author

Sean Keenan is Boomi's vice president of product, Data Catalog and Preparation.
