What is data integration?
Data integration is the process of bringing data from disparate sources together to provide users with a unified view. The premise of data integration is to make data more freely available and easier for systems and users to consume and process. Done right, data integration can reduce IT costs, free up resources, improve data quality, and foster innovation, all without sweeping changes to existing applications or data structures. And though IT organizations have always had to integrate, the payoff for doing so has arguably never been greater than it is right now.
Companies with mature data integration capabilities have significant advantages over their competition, including:
- Increased operational efficiency by reducing the need to manually transform and combine data sets
- Better data quality through automated data transformations that apply business rules to data
- More valuable insights through a holistic view of data that can be analyzed more easily
A digital business is built around data and the algorithms that process it, and it extracts maximum value from its information assets, wherever they sit across the business ecosystem and whenever they are needed. Within a digital business, data and related services flow unimpeded, yet securely, across the IT landscape. Data integration enables a full view of all the information flowing through an organization and gets your data ready for analysis.
The evolution of data integration
The scope and importance of data integration have changed completely. Today, organizations augment business capabilities by leveraging standard SaaS applications while continuing to develop custom applications. With a rich ecosystem of partners ready to leverage an organization’s information, the information about an organization’s services that is exposed to customers is now as important as the services themselves. Integrating SaaS, custom, and partner applications, and the data contained within them, is now a requirement. These days, an organization differentiates itself by combining business capabilities in a unique way. For example, many companies analyze data in motion and at rest, use their findings to create business rules, and then apply those rules to respond even faster to new data. Typical goals for this type of innovation are stickier user experiences and improved business operations.
How does data integration work?
One of the biggest challenges organizations face is trying to access and make sense of the data that describes the environment in which they operate. Every day, organizations capture more and more data, in a wider variety of formats, from a larger number of data sources. They need a way for employees, users, and customers to capture value from that data, which means they must be able to bring relevant data together, wherever it resides, to support organizational reporting and business processes.
But the required data is often distributed across applications, databases, and other data sources hosted on premises, in the cloud, on IoT devices, or provided by third parties. Organizations no longer maintain data in a single database; they maintain traditional master and transactional data, as well as new types of structured and unstructured data, across multiple sources. For instance, an organization could have data in a flat file, or it might need to access data from a web service.
The traditional approach to data integration is physical data integration: data is physically moved from its source system to a staging area, where cleansing, mapping, and transformation take place, before being moved to a target system such as a data warehouse or a data mart. The other option is data virtualization. This approach uses a virtualization layer to connect to physical data stores and creates virtualized views of the underlying physical environment without physically moving any data.
A common data integration technique is Extract, Transform, and Load (ETL), where data is physically extracted from multiple source systems, transformed into a different format, and loaded into a centralized data store.
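For illustration, here is a minimal ETL sketch in Python. It assumes a hypothetical flat-file source named orders.csv and uses a SQLite database as the centralized target; the file, table, and column names are invented for the example, not part of any particular product.

```python
# Minimal ETL sketch: extract order records from a flat file,
# transform them according to simple business rules, and load them
# into a SQLite "warehouse". All names here are illustrative.
import csv
import sqlite3

def extract(path):
    # Extract: read raw rows from the source system (a flat file here).
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: normalize amounts and drop records with no customer ID.
    cleaned = []
    for row in rows:
        if not row.get("customer_id"):
            continue
        cleaned.append({
            "customer_id": row["customer_id"].strip(),
            "amount_usd": round(float(row["amount"]), 2),
        })
    return cleaned

def load(rows, db_path="warehouse.db"):
    # Load: write the transformed rows into the centralized target store.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (customer_id TEXT, amount_usd REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount_usd) VALUES (?, ?)",
        [(r["customer_id"], r["amount_usd"]) for r in rows],
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(transform(extract("orders.csv")))
```

In practice this pipeline would be handled by a data integration tool rather than hand-written code, but the three stages map directly onto what such tools do.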
Considerations for improving simple integration
The value gained from implementing data integration technology is, first and foremost, the savings from no longer having to integrate data manually. There are other benefits as well, including avoiding custom coding for each integration. Whenever they can, organizations should look to use an integration tool provided by a vendor rather than write custom integration code, for three reasons: improved data quality, better performance, and time savings.
Organizations could derive much greater value by adding the following additional goals to their integration maturity roadmaps:
Streamline development
Choose a solution that lets you create a catalog of formats and sub-processes for reuse, especially non-functional processes such as logging and retries. The ability to test any integration logic on the fly will also dramatically reduce the time needed for implementation and maintenance.
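As a rough sketch of what such a reusable non-functional sub-process might look like, here is a retry-with-logging wrapper in Python that any integration step could share instead of re-implementing its own error handling. The attempt count, delay, and step name are illustrative assumptions.

```python
# Sketch of a reusable sub-process: retry-with-logging for integration steps.
import logging
import time
from functools import wraps

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("integration")

def with_retries(attempts=3, delay_seconds=2):
    # Wrap any integration step so failures are logged and retried.
    def decorator(step):
        @wraps(step)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return step(*args, **kwargs)
                except Exception as exc:
                    log.warning("%s failed (attempt %d/%d): %s",
                                step.__name__, attempt, attempts, exc)
                    if attempt == attempts:
                        raise
                    time.sleep(delay_seconds)
        return wrapper
    return decorator

@with_retries(attempts=3)
def push_to_target(record):
    # Hypothetical integration step; a real one would call a database or API.
    ...
```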
Configuration
Data integration processes are configured to connect applications and systems. These configurations need to reflect any change immediately, ensure the right systems are being used, and propagate changes across environments (development, test, quality assurance, and production). Most organizations report that they still change configuration parameters manually within their integrated development environment (IDE), a costly manual process that also risks tampering with integration logic. The better alternative, managing these variables in scripts or deployment interfaces, allows fully automated deployments that reduce project duration.
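As one way to picture externalized configuration, the sketch below reads connection details from environment variables at deploy time instead of hard-coding them, so the same integration logic can be promoted from development to test to production untouched. The variable and field names are assumptions for the example.

```python
# Sketch of externalized configuration for an integration process.
import os
from dataclasses import dataclass

@dataclass
class TargetConfig:
    host: str
    database: str
    user: str

def load_config():
    # Each environment (dev, test, QA, production) sets these variables
    # in its deployment pipeline; the integration code never changes.
    return TargetConfig(
        host=os.environ.get("TARGET_DB_HOST", "localhost"),
        database=os.environ.get("TARGET_DB_NAME", "warehouse"),
        user=os.environ.get("TARGET_DB_USER", "etl_user"),
    )
```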
Testing
Testing is at the core of data integration development. It verifies the integration logic against the data integration technology and the target systems, so it should be performed as soon as the developer creates or updates that logic. In practice, however, most organizations have to deploy processes before they can test them, which causes delays. An IDE that allows immediate debugging dramatically shortens integration process development. Moreover, because certain data integration processes are critical, they need to be tested in environments that closely resemble production, and updates to them need to be tested for non-regression. This requires test scenarios to be written, and many organizations have to develop that test logic on top of the integration process logic, along with the probes to capture results, which increases development duration and costs. Using an API to inject data and record test scenarios, or an integration testing solution, can dramatically reduce project duration.
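As a simple illustration of a non-regression test, the sketch below injects a known input through a transformation step (mirroring the hypothetical ETL sketch above) and asserts the expected output, so the logic can be verified before deployment. The rule being tested is an assumption for the example.

```python
# Sketch of a non-regression test for an integration transformation.
def transform(rows):
    # Business rule under test: trim IDs, round amounts, drop rows
    # with no customer identifier.
    return [
        {"customer_id": r["customer_id"].strip(),
         "amount_usd": round(float(r["amount"]), 2)}
        for r in rows if r.get("customer_id")
    ]

def test_transform_drops_rows_without_customer_id():
    injected = [
        {"customer_id": " 42 ", "amount": "19.999"},
        {"customer_id": "", "amount": "5.00"},
    ]
    assert transform(injected) == [{"customer_id": "42", "amount_usd": 20.0}]

if __name__ == "__main__":
    test_transform_drops_rows_without_customer_id()
    print("transform test passed")
```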
Establish a common data model
In addition to limiting the number of technologies in play, building a common data model eases future integrations because all integration processes speak the same language. The business benefits as well, because services and events involving business objects can be created easily, and subscribing to the right events provides increased business visibility.
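To make "speaking the same language" concrete, here is a sketch of a common data model: every integration process exchanges the same Customer object regardless of which source system the data came from. The source payloads and field names are hypothetical.

```python
# Sketch of a common data model shared by all integration processes.
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: str
    name: str
    email: str

def from_crm(record: dict) -> Customer:
    # Map a hypothetical CRM payload onto the shared model.
    return Customer(
        customer_id=record["Id"],
        name=record["FullName"],
        email=record["EmailAddress"],
    )

def from_billing(record: dict) -> Customer:
    # Map a hypothetical billing-system payload onto the same model.
    return Customer(
        customer_id=record["cust_no"],
        name=record["cust_name"],
        email=record["contact_email"],
    )
```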
Savings from leveraging past investments
Many legacy applications are still a vital part of business processes and hold important data that needs to be integrated with all the other systems in your environment. Though their core business functionalities provide great assets for reuse in other services, many of their components and capabilities have since been replaced by other applications. Data integration can help you infuse the data in your legacy systems into your more modern environments.
Typically, data integration is a prerequisite for further processing of the data, most notably analytics. You need to bring data together to facilitate analytical reporting and to give users a full, unified view of all the information flowing through their organization. A good way to think about data integration is "create once, use many times." For instance, you don't want to have to enter an order into one system manually; you want to enter it once and have one system pass it to another. That's the main value of data integration.