At Slalom, we work with a lot of organizations that are migrating from on-premise operations to the cloud. And it’s tempting to think of migration projects strictly in terms of applications: You start with a legacy on-premise application, and you replace it with a fast, new, scalable cloud application.
That’s part of the work, but another important part involves paying attention to the data being used by those applications.
I recently wrote about the importance of multi-mastering data. That article talks about synchronizing data across multiple applications using a centralized management tool like Boomi Master Data Hub.
But in order to master your data, you need to wrangle it.
Data Wrangling and Why It Matters
Data wrangling means transforming data from one format into another to make it more suitable for uses downstream, such as for analytics. We refer to developers who build these transformation processes as "data wranglers."
Why is data wrangling important?
We often find that two applications need to share data but use different formats for the same data fields. For example, a financial application and a sales CRM might use different data types for locations. When the sales department records a sale and triggers an invoice in the financial app, the two apps need to work together and ensure that location data is handled consistently.
In a situation like this, there are several ways to translate or “wrangle” data between applications:
Duplicate data fields
If there are only two applications involved, you can configure each application to store two data fields for each location, so that one field can be translated easily to the other. But this approach tends to work well only if two applications are involved.
If five or ten applications are using different data types for the same data field, storing, updating, and retrieving all these data translations in application logic becomes slow and unwieldy.
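To see why this approach stops scaling, consider that every pair of applications needs its own translation, in each direction. A quick sketch (plain Python, purely illustrative) shows how the number of pairwise mappings grows with the number of applications:

```python
from itertools import permutations

# With the duplicate-fields approach, each ordered pair of applications
# needs its own translation, so mappings grow as n * (n - 1).
def pairwise_mappings(num_apps: int) -> int:
    return len(list(permutations(range(num_apps), 2)))

print(pairwise_mappings(2))   # 2 mappings: manageable
print(pairwise_mappings(10))  # 90 mappings: slow and unwieldy
```

Two applications means two translations; ten applications means ninety, each of which has to be stored, updated, and kept consistent in application logic.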
A table in a SQL database
For a more scalable approach to data wrangling, set up a table storing all the relationships in a centralized SQL database. A SQL database can perform translations among multiple applications far more easily than having applications store multiple values for data fields themselves.
When applications need to translate fields, they simply query the database, get a response and use the translated value in the response to continue with their work.
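The flow described above can be sketched in a few lines. This is a minimal illustration using SQLite; the table name, column names, and location values are assumptions for the example, not a prescribed schema:

```python
import sqlite3

# Hypothetical cross-reference table: each application's location value
# maps to one canonical location ID.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE location_xref (
        app_name     TEXT NOT NULL,
        app_location TEXT NOT NULL,
        canonical_id TEXT NOT NULL,
        PRIMARY KEY (app_name, app_location)
    )
""")
conn.executemany(
    "INSERT INTO location_xref VALUES (?, ?, ?)",
    [
        ("finance_app", "US-CA-SF", "LOC-001"),
        ("sales_crm", "San Francisco, CA", "LOC-001"),
    ],
)

def translate(from_app: str, value: str, to_app: str):
    """Translate one app's location value into another's by joining
    through the shared canonical ID."""
    row = conn.execute(
        """
        SELECT tgt.app_location
        FROM location_xref src
        JOIN location_xref tgt ON tgt.canonical_id = src.canonical_id
        WHERE src.app_name = ? AND src.app_location = ? AND tgt.app_name = ?
        """,
        (from_app, value, to_app),
    ).fetchone()
    return row[0] if row else None

print(translate("sales_crm", "San Francisco, CA", "finance_app"))  # US-CA-SF
```

Adding an eleventh application means inserting its location values into the table, not touching the other ten applications, which is what makes the centralized table scale where duplicated fields do not.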
For high-volume cross-reference checks as part of integration or automation processes, you can use a Boomi database connector to connect applications to the central SQL database.
We find that data wrangling with a central SQL database works best for clients with strong in-house SQL programming skills. The downside? For some organizations, it's performance: without deep SQL expertise, setting up caching and tuning the database to perform at scale can be a struggle.
Boomi Crossref tables
Another approach to data wrangling, which we’re implementing for a client now, is to leverage Boomi Crossref tables.
The Boomi unified platform includes a capability called Crossref, which lets you store data relationships and perform lookups very quickly. You can set up a Crossref without any programming, and it performs faster than a database lookup would.
In your integration mappings, you have the ability to cache those lookups so that you get the benefit of having a centralized lookup without taking the huge performance hit that sometimes comes from working with SQL.
For example, using our location data scenario, you could look up a country code that comes as part of an integration, get the key for the country code, get its ID, and then cache that relationship. The next time the same ID comes through an integration being managed in Boomi, you’ve already got its relationship.
A central data hub
To handle data transformations and complex business logic for your most important data types, I recommend using a governance tool like the Boomi Hub.
Boomi Hub can store data relationships, handle complex business logic, and enforce master data policies as needed for data types and data fields. Because it’s centralized and part of the Boomi unified platform, it’s easy to access while mapping integrations and data transformations.
Boomi Hub is especially useful for clients who need to automatically detect changes in data sets and enforce survivorship rules. (In data management, survivorship means systematically dealing with cases where data is inconsistent or overlapping.) Hub is a great choice for data wrangling when organizations want to guarantee data quality for key data types as part of data integration and application modernization.
Boomi Hub is also a great choice for scenarios where having clean, accurate data is critical to the business. Data sources aren't always reliable (think of any task involving manual data entry, or data that resides in two different systems). With Hub, you can define data quality rules and data enrichment services (e.g. external address validation services) to trap this bad data before it propagates to other systems.
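A data-quality rule of the kind described above can be sketched in plain Python (Hub lets you define such rules without code; the field names and rules below are illustrative assumptions, not Hub's syntax):

```python
import re

def validate_record(record: dict) -> list:
    """Return a list of quality problems; an empty list means the
    record is clean enough to propagate to other systems."""
    problems = []
    code = record.get("country_code")
    if not code:
        problems.append("missing country_code")
    elif not re.fullmatch(r"[A-Z]{2}", code):
        problems.append("country_code must be a two-letter ISO code")
    if not record.get("postal_code"):
        problems.append("missing postal_code")
    return problems

# A manually entered record with a malformed country code is trapped
# before it reaches downstream systems.
print(validate_record({"country_code": "usa", "postal_code": "94105"}))
```

Trapping records like this at the hub, rather than in each application, is what keeps one system's bad manual entry from contaminating every system it integrates with.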
Choose the Approach That Works Best for You
As you can see from this list, you’ve got a variety of choices when it comes to wrangling data. Choose the approach that meets your needs for ease of use and scalability. By wrangling your data, you'll ensure the applications you’re integrating can translate data from other applications as needed. And that means you'll get the most out of your data — always a good thing.
By Shane Fisher