What is data migration?
Data migration is the process of moving data from one physical or virtual location to another. It includes moving data between different data sources and application suites, as well as between data formats. Organizations also use the process to move data from a traditional data center to a cloud-based data store.
The need for data migration
Data migration can be used in multiple scenarios in which an organization needs to change how its data is hosted. Generally, data is migrated when an organization needs to:
- Upgrade its existing infrastructure, such as hardware, application environment, or database management system
- Move from traditional storage to a cloud environment
- Change the format of its data
- Support analytics and reporting initiatives by building data warehouses and data lakes
- Consolidate multiple data sources, such as during company acquisitions and mergers
- Archive and delete data when it is no longer used by the business
- Reduce the number of systems that host data to achieve cost savings
This list is not exhaustive, but offers a general idea of the use cases of data migration.
The data migration project
The goal and results of each data migration project are unique to each organization; however, every data migration project requires a detailed project plan to succeed. The plan defines a series of activities, and their owners, that fulfill due diligence, security, compliance, data quality, and other requirements that prevent data loss or breach during the process. It also includes a project schedule that shows when these activities will occur and the dependencies between them. Because every data migration must achieve the business or technical goals that prompted it while limiting business disruption, a well-thought-out plan and schedule are critical to minimizing risk and unnecessary work.
1. Creating a migration plan
A data migration project team must create a project plan, complete with milestones and a go-live date for migrated data. Key elements of the project plan include:
- Objectives: A data migration is executed to support either business goals or technical objectives. Communicating those objectives clearly in the project plan ensures that all activities are aligned with them.
- Team and Responsibilities: A data migration project involves the efforts of many people who are often not co-located. A data migration project plan must clearly define the people on the team and the roles they play in the migration project, including those who have decision authority.
- Schedule & Milestones: The data migration team will execute a wide variety of tasks, each of which must be completed at a defined time to meet the project deadline.
2. Catalog all source systems
A simple migration moves data from one source system to another. However, many migrations involve moving data hosted on multiple source systems, as in systems consolidation and business acquisitions and mergers. Further, the systems where source data is physically hosted, and the owners of those data sources who grant access to them, are often unknown or poorly understood. In other cases, such as data consolidation or data archiving, the source data that will not be migrated must be completely understood and clearly communicated to ensure compliance with regulations and business rules. In every case, the project plan must allow sufficient time to discover and catalog the source systems that are in and out of scope for the migration.
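A source-system catalog can start as a simple structured list. The following is a minimal, hypothetical Python sketch; the `SourceSystem` fields and the helper functions are illustrative assumptions rather than a standard schema. It records where each system is hosted, who owns it, and whether it is in scope, and it flags systems whose owner has not yet been identified.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class SourceSystem:
    """One entry in a hypothetical source-system catalog."""
    name: str
    host: str              # where the data physically lives
    owner: Optional[str]   # who grants access; None if not yet discovered
    in_scope: bool         # False for data deliberately not migrated
    notes: str = ""

def open_discovery_items(catalog: List[SourceSystem]) -> List[str]:
    """Names of systems whose owner has not been identified yet."""
    return [s.name for s in catalog if s.owner is None]

def scope_report(catalog: List[SourceSystem]) -> Tuple[List[str], List[str]]:
    """Split catalog entries into in-scope and out-of-scope system names."""
    in_scope = [s.name for s in catalog if s.in_scope]
    out_of_scope = [s.name for s in catalog if not s.in_scope]
    return in_scope, out_of_scope

# Illustrative entries only.
catalog = [
    SourceSystem("crm_legacy", "dc1-db03", "Sales Ops", True),
    SourceSystem("acquired_erp", "unknown", None, True, "came with acquisition"),
    SourceSystem("old_billing", "dc1-db07", "Finance", False, "archive only"),
]
```

Keeping out-of-scope systems in the same catalog, rather than omitting them, makes it easier to show auditors which data was deliberately left behind.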
3. Data preparation
A data migration project provides an excellent opportunity to improve the quality of data, which increases trust in that data and therefore its value to the organization. Before quality can be improved, however, it must be assessed. Low-quality data creates business disruptions, so simply moving it to a new target system allows those disruptions to persist. The migration project must therefore audit the source data, assess its quality, and ascertain its consistency, accuracy, and completeness. Once the assessment is complete, the team can direct effort toward improving data quality.
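As a rough illustration of the assessment step, the following Python sketch profiles a list of records for completeness (the share of rows in which a required field is filled) and for one simple kind of consistency (the share of non-empty values matching an expected pattern). The `profile` function, the sample fields, and the regular expressions are assumptions for illustration; assessing accuracy usually requires comparison against a trusted reference and is not shown.

```python
import re
from typing import Dict, List

def profile(records: List[dict], required: List[str],
            patterns: Dict[str, str]) -> Dict[str, dict]:
    """Per-field completeness and pattern-conformance ratios.

    required: fields that should always have a value
    patterns: field -> regular expression a non-empty value should match
    """
    total = len(records)
    report: Dict[str, dict] = {}
    for f in required:
        filled = sum(1 for r in records if r.get(f) not in (None, ""))
        report[f] = {"completeness": filled / total if total else 1.0}
    for f, pattern in patterns.items():
        rx = re.compile(pattern)
        values = [r[f] for r in records if r.get(f) not in (None, "")]
        ok = sum(1 for v in values if rx.fullmatch(str(v)))
        report.setdefault(f, {})["conformance"] = ok / len(values) if values else 1.0
    return report

rows = [
    {"id": "1", "email": "a@example.com", "country": "US"},
    {"id": "2", "email": "",              "country": "usa"},
    {"id": "3", "email": "c@example",     "country": "DE"},
    {"id": "4", "email": "d@example.com", "country": "FR"},
]
report = profile(rows, required=["id", "email"],
                 patterns={"email": r"[^@\s]+@[^@\s]+\.[^@\s]+",
                           "country": r"[A-Z]{2}"})
```

Here one in four emails is missing, one of the three present emails is malformed, and one country code does not follow the two-letter upper-case convention; these ratios give the team a baseline to improve against.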
4. Prepare the target system
The target system(s) must be acquired, provisioned, validated, and ready to accept source data. Furthermore, the team must thoroughly understand the target system, including its data model, the access rights required, and the storage available to support the new data.
5. Mapping data to be migrated
The data models of the source and target systems are often not identical, so migration demands more than simply moving data. Often, especially when two organizations merge, data fields on the source and target systems are similar but not identical. In other cases, multiple source data fields may need to be merged into one target field, or a single source field split across several target fields. Either way, when the structure of data needs to change, the planning phase must include developing a mapping between source fields and the appropriate target fields.
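One way to express such a mapping is as a table of target fields, each produced by a function of the source record, which covers renames, merges, and splits in the same structure. The field names below (`cust_no`, `first_nm`, `addr`, and so on) are hypothetical, and the address split assumes a "street, city" format.

```python
from typing import Callable, Dict

# Hypothetical mapping: each target field is computed from the source record.
MAPPING: Dict[str, Callable[[dict], str]] = {
    "customer_id": lambda s: s["cust_no"],                               # 1:1 rename
    "full_name":   lambda s: f"{s['first_nm']} {s['last_nm']}".strip(),  # merge two fields
    "street":      lambda s: s["addr"].partition(",")[0].strip(),        # split, part 1
    "city":        lambda s: s["addr"].partition(",")[2].strip(),        # split, part 2
}

def map_record(source: dict) -> dict:
    """Apply every mapping rule to one source record."""
    return {target: rule(source) for target, rule in MAPPING.items()}
```

Because the mapping is data rather than scattered code, it can be reviewed with business owners and versioned alongside the project plan.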
6. Data transformation
As a business evolves, its business rules change, yet these changes may not be reflected in current data. The challenge for the project team is to uncover the business rules that affect data. Typically, this information is scattered across documentation throughout the organization or exists only in the minds of data users. Sufficient time must therefore be allowed to discover and catalog these rules and to determine how and where they will be applied to transform source data.
The value of the knowledge gained during a data migration, including the business rules for transforming data, extends beyond the migration project itself. It can be reused in other data initiatives such as data integration, data quality, metadata management, and master data management. Unfortunately, this knowledge is often discarded when the migration project ends. The project plan should therefore include provisions and systems to persist this knowledge so that it can benefit the organization in the future and be easily adapted. Because organizations typically run data migration projects on an ongoing basis, cataloging this knowledge will help reduce the effort required for future projects.
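To keep business rules from being lost when the project ends, each rule can be recorded with a description and where it was discovered, alongside the code that applies it. The following is a minimal Python sketch under that assumption; the rule IDs, the two rules, and the `BusinessRule` structure are hypothetical. Only the rule metadata is exported, so the catalog can be stored and reused by later initiatives.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class BusinessRule:
    rule_id: str
    description: str               # the rule in plain language
    discovered_in: str             # where the rule was found
    apply: Callable[[dict], dict]  # the transformation itself

def upper_country(r: dict) -> dict:
    return {**r, "country": r["country"].upper()}

def status_from_flag(r: dict) -> dict:
    # Hypothetical rule: the legacy Y/N flag becomes a status value.
    return {**r, "status": "ACTIVE" if r.get("active_flag") == "Y" else "INACTIVE"}

RULES: List[BusinessRule] = [
    BusinessRule("BR-001", "Country codes are stored in upper case",
                 "data standards document", upper_country),
    BusinessRule("BR-002", "Customers flagged Y are active",
                 "interview with billing team", status_from_flag),
]

def transform(record: dict, rules: List[BusinessRule] = RULES) -> dict:
    """Apply rules in order; each rule returns a new record."""
    for rule in rules:
        record = rule.apply(record)
    return record

def export_catalog(rules: List[BusinessRule] = RULES) -> str:
    """Serialize rule metadata (not code) so it can outlive the project."""
    return json.dumps(
        [{"rule_id": r.rule_id, "description": r.description,
          "discovered_in": r.discovered_in} for r in rules],
        indent=2,
    )
```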
7. Data construction
Examining source data will likely uncover data fields with no values. In addition, the data model of the target system may include fields that do not map to any corresponding or similar source field. The resulting missing data may violate business rules or stall processes, so the project plan must include activities that construct the missing data to ensure completeness on the target system.
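Construction rules can be written much like mapping rules, except that they fill target fields that have no source value. In this hypothetical Python sketch, the default currency and the country-to-region table are illustrative assumptions. The function also returns which fields it constructed, so constructed values can be distinguished from migrated ones.

```python
from typing import List, Tuple

REGION_BY_COUNTRY = {"US": "AMER", "DE": "EMEA", "JP": "APAC"}  # illustrative only

# Hypothetical rules for target fields that have no source equivalent.
CONSTRUCTION_RULES = {
    "currency": lambda r: "USD",  # assumed organization-wide default
    "region":   lambda r: REGION_BY_COUNTRY.get(r.get("country"), "UNKNOWN"),
}

def construct_missing(record: dict) -> Tuple[dict, List[str]]:
    """Fill absent or empty target fields; return the record and what was constructed."""
    out = dict(record)
    constructed: List[str] = []
    for field, rule in CONSTRUCTION_RULES.items():
        if out.get(field) in (None, ""):
            out[field] = rule(out)
            constructed.append(field)
    return out, constructed
```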
8. Data movement
Data physically moves between environments either via external media (such as DVDs) or over a network. In either case, the data must be secured to prevent leakage. In addition, small data migration projects may move data all at once, while larger projects may move it in stages, and the project schedule must reflect the chosen approach.
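For a staged approach, rows can be moved in fixed-size batches with a progress checkpoint after each one. This Python sketch assumes the caller supplies hypothetical `load_batch` and `checkpoint` callables (for example, a bulk insert and a write to a progress table); a restart procedure could use the last checkpoint to skip batches that were already loaded.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

def batches(rows: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

def migrate_in_stages(rows: Iterable, size: int,
                      load_batch: Callable[[List], None],
                      checkpoint: Callable[[int], None]) -> int:
    """Load rows one batch (stage) at a time, recording progress after each."""
    done = 0
    for batch in batches(rows, size):
        load_batch(batch)      # e.g. a bulk insert into the target
        done += len(batch)
        checkpoint(done)       # e.g. persist the number of rows loaded so far
    return done
```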
Moving data between systems typically requires tooling. Some projects simply use spreadsheets to move data. Some business suites and database management systems provide their own tools to support data migration. Third-party tools may also be used; for example, an integration Platform as a Service (iPaaS) is a cloud-based solution for application and data integration that can migrate and synchronize data between hosted environments, including diverse private and public cloud platforms and owned data centers. More traditionally, Extract, Transform, Load (ETL) tools extract data from source systems and place it into a staging area, where it is transformed to conform to the established data mapping, transformation, and construction rules; it is then loaded into the target systems. By contrast, Extract, Load, Transform (ELT) tools move data from source to target systems first and transform it afterward. The choice of tool depends on the characteristics of the project.
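The difference between ETL and ELT is mainly where the transformation runs. The following Python sketch uses in-memory SQLite databases from the standard `sqlite3` module as stand-ins for a source and a target; the table and the two transformations (trimming names and upper-casing country codes) are assumptions chosen only to show the ordering of steps, not a substitute for a dedicated migration tool.

```python
import sqlite3

def make_source() -> sqlite3.Connection:
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
    src.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                    [(1, " Ada ", "us"), (2, "Grace", "de")])
    return src

def etl(src: sqlite3.Connection, tgt: sqlite3.Connection) -> None:
    """Extract, transform in a staging area (here a Python list), then load."""
    staged = [(i, name.strip(), country.upper())
              for i, name, country in src.execute("SELECT id, name, country FROM customers")]
    tgt.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
    tgt.executemany("INSERT INTO customers VALUES (?, ?, ?)", staged)

def elt(src: sqlite3.Connection, tgt: sqlite3.Connection) -> None:
    """Extract and load raw rows, then transform inside the target with SQL."""
    tgt.execute("CREATE TABLE raw_customers (id INTEGER, name TEXT, country TEXT)")
    tgt.executemany("INSERT INTO raw_customers VALUES (?, ?, ?)",
                    src.execute("SELECT id, name, country FROM customers").fetchall())
    tgt.execute("CREATE TABLE customers AS "
                "SELECT id, TRIM(name) AS name, UPPER(country) AS country "
                "FROM raw_customers")
```

Both paths produce the same target table here; ELT additionally leaves the untransformed `raw_customers` table in the target, which can help with later validation but uses target storage and compute.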
Because data serves processes and decision making across the organization, readily available and trusted data is critical. Sufficient time and effort must therefore be spent validating transformed data, preferably before it is loaded into the target system, to ensure it conforms to the objectives of the migration project.
Validating data only immediately before the go-live date increases the risk of not completing the migration on time. Validation will likely uncover many data errors that need to be debugged and corrected, which can take an unpredictable amount of time. Because the activities of mapping, transforming, constructing, and moving data are typically performed iteratively rather than linearly, continuous validation throughout the project reduces the risk of a delayed go-live.
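A simple form of continuous validation is to reconcile the expected (already transformed) source rows against what actually landed in the target after each iteration. The Python sketch below, built around a hypothetical `reconcile` function, hashes the compared fields of each row with SHA-256 and reports keys that are missing, unexpected, or mismatched. A real project would add checks specific to its own mapping and business rules.

```python
import hashlib
from typing import Dict, Iterable, List

def row_fingerprint(row: dict, fields: List[str]) -> str:
    """Stable hash of selected fields, used to compare a row across systems."""
    joined = "\x1f".join(str(row.get(f, "")) for f in fields)  # unit separator
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def reconcile(expected: Iterable[dict], actual: Iterable[dict],
              key: str, fields: List[str]) -> Dict[str, list]:
    """Compare expected rows with rows found in the target, by key."""
    src = {r[key]: row_fingerprint(r, fields) for r in expected}
    tgt = {r[key]: row_fingerprint(r, fields) for r in actual}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }
```

Running this after every load, rather than once before go-live, turns each iteration into a small, bounded list of defects to investigate.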
Upon successful completion of these steps, the data migration is considered complete, and the data is available for use by the organization in the new environment. If proper project planning and monitoring have been performed, go-live should be an uneventful moment for the business; in other words, business continuity is maintained through go-live.
Factors to consider when migrating data
The process of data migration is complex, so it is imperative that organizations consider a number of key factors before embarking on a data migration project. The following factors help determine an ideal migration strategy, although every organization will have its own systems and considerations.
Data migration is not just a technical exercise. It potentially involves the coordination of activities performed by many different people. Therefore, it requires disciplined project management skills, including accurate and timely communication, monitoring, reporting, and decision making.
Many decisions made during a migration project will affect one or more downstream activities. To ensure these decisions are carried out correctly, every change made to data should be traceable back to the business rule or decision that caused it.
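Traceability can be as simple as writing an audit entry for every field change, tagged with the ID of the rule or decision responsible. In this hypothetical Python sketch, the `ChangeRecord` structure and the `BR-001` rule ID are illustrative; unchanged values are not logged, so each entry represents a real change.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class ChangeRecord:
    record_key: str
    field: str
    old_value: object
    new_value: object
    reason: str   # ID of the business rule or decision that caused the change

def apply_with_trace(record: dict, key: str, field: str, new_value: object,
                     reason: str, log: List[ChangeRecord]) -> dict:
    """Change one field and append an audit entry linking it to its reason."""
    old = record.get(field)
    if old != new_value:
        log.append(ChangeRecord(str(record[key]), field, old, new_value, reason))
        record = {**record, field: new_value}
    return record

def trace(log: List[ChangeRecord], record_key: str, field: str) -> List[str]:
    """All recorded reasons a given field of a given record changed."""
    return [c.reason for c in log if c.record_key == record_key and c.field == field]
```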
One thing must be understood from the outset: errors will occur throughout the process. A migration project must therefore track all discovered defects, assign owners to resolve them, and monitor the progress of defect removal to ensure the integrity of migrated data.
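Defect tracking is usually handled by an existing issue tracker, but the underlying data is small. The following Python sketch assumes a hypothetical three-state workflow (`open`, `fixed`, `verified`) and shows how status counts and unresolved defects per owner might be derived to monitor defect removal.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Defect:
    defect_id: str
    description: str
    owner: str
    status: str = "open"   # assumed workflow: open -> fixed -> verified

def status_summary(defects: List[Defect]) -> Counter:
    """Count defects by status to show progress of defect removal."""
    return Counter(d.status for d in defects)

def unresolved_by_owner(defects: List[Defect]) -> Dict[str, List[str]]:
    """Defect IDs not yet verified, grouped by the owner responsible."""
    out: Dict[str, List[str]] = {}
    for d in defects:
        if d.status != "verified":
            out.setdefault(d.owner, []).append(d.defect_id)
    return out
```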
A key part of planning is determining how long it will take to physically move the data. This includes accounting for the volume of data to be moved, the available network bandwidth, and the latency between the source and target systems. Minimizing that latency as much as possible helps reduce disruption.
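The transfer-time estimate is mostly arithmetic: data volume divided by effective throughput, plus the cost of any sequential round trips, each of which pays the full latency. In this Python sketch, the 70% bandwidth efficiency and the simple round-trip model are assumptions, not measured values.

```python
def estimate_transfer_hours(volume_gb: float, bandwidth_mbps: float,
                            efficiency: float = 0.7,
                            round_trips: int = 0, rtt_ms: float = 0.0) -> float:
    """Rough transfer-time estimate in hours.

    volume_gb:      data to move, in gigabytes (10**9 bytes)
    bandwidth_mbps: link speed, in megabits per second (10**6 bits)
    efficiency:     assumed fraction of the bandwidth actually achieved
    round_trips:    number of sequential request/response exchanges
    rtt_ms:         round-trip latency between source and target, in ms
    """
    bits = volume_gb * 8 * 10**9
    throughput_bps = bandwidth_mbps * 10**6 * efficiency
    seconds = bits / throughput_bps + round_trips * rtt_ms / 1000
    return seconds / 3600
```

Under these assumptions, 2,000 GB over a 1,000 Mbps link takes roughly 6.3 hours, while one million sequential round trips at 40 ms latency (for example, committing row by row) add about 11 hours more, which is why batching and reducing latency matter.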
Risks in data migration
Because data migration involves many moving parts and often vast amounts of data, various challenges can hinder the process. Here are some of them and how they can affect the migration.
Risk of business disruption
Business disruption is a major risk during the transfer process. Generally, organizations do not want to stop operations during the transfer; they want customers, employees, and partners to face no downtime and all systems to remain up and running for the duration of the migration. However, changes made to the data while the migration is under way can be difficult to reflect in the migrated data, which creates discrepancies between systems and inaccurate data.
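One common way to handle changes made during the migration is to follow the initial load with incremental delta syncs that copy only the rows modified since the last sync point (a watermark). This Python sketch assumes every source row has a reliable, comparable `updated_at` value and that the target can be modeled as a dictionary keyed by `id`; it does not handle deleted rows or timestamp ties, which a real change-data-capture approach would need to address.

```python
from typing import Dict, List

def changed_since(rows: List[dict], watermark: str) -> List[dict]:
    """Rows modified after the last sync point."""
    return [r for r in rows if r["updated_at"] > watermark]

def delta_sync(source_rows: List[dict], target: Dict[int, dict], watermark: str) -> str:
    """Upsert rows changed since `watermark` into the target; return the new watermark."""
    changes = changed_since(source_rows, watermark)
    for r in changes:
        target[r["id"]] = dict(r)
    return max((r["updated_at"] for r in changes), default=watermark)
```

Each sync copies only what changed, so the final cutover window shrinks to the last small delta instead of a full reload.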
Risk of data loss or corruption
During migration, organizations need to consider and minimize the risk of data loss or corruption. Data can be lost for several reasons, including, but not limited to, incomplete or incorrect transfer, incompatibility between systems, and human error. Data loss can result in severe business losses and put an organization's reputation at risk, affecting its current as well as future prospects. In the worst case, a lack of planning results in permanent data loss or heavy recovery costs.
Risk of exposure
A data breach is a significant danger during the migration process. While data is being migrated, both the systems and the data itself are more vulnerable, because moving data requires passing it through a medium such as a network or external storage. Threat actors can exploit this exposure to breach, steal, or tamper with data in transit, resulting in a failed, incomplete, or corrupted migration.
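Encryption in transit (for example, TLS) addresses confidentiality; tampering can additionally be detected by attaching a message authentication code to each transferred batch. The following Python sketch uses the standard `hmac`, `hashlib`, and `secrets` modules with HMAC-SHA256. The payload is illustrative, and the shared key is assumed to be exchanged separately from the data.

```python
import hashlib
import hmac
import secrets

def sign(payload: bytes, key: bytes) -> str:
    """HMAC-SHA256 tag computed by the sender before transfer."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, tag: str, key: bytes) -> bool:
    """Receiver recomputes the tag; compare_digest avoids timing side channels."""
    return hmac.compare_digest(sign(payload, key), tag)

key = secrets.token_bytes(32)   # shared out of band; never sent with the data
batch = b'{"id": 1, "balance": "100.00"}'
tag = sign(batch, key)          # transmitted alongside the batch
```

A batch that fails verification should be rejected and re-sent rather than loaded.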
Ways to minimize risks of data migration
Because data migration is full of challenges, project teams need to create protocols and practices that ensure success. The following are some ways to overcome these challenges.
The importance of planning cannot be overstated. From initial research into data types, structure, location, and other relevant information to the constant tracking of activities, defects, and processes, having a plan and tracking progress against it increases the odds of a successful migration.
Before beginning the migration process, an organization-wide planning exercise must take place to ensure that all required parties understand the objectives of the migration. This ensures that all decisions and activities are directly aligned with these objectives.
Understanding the data environments
Spending the time needed to completely understand both the source and target data environments pays large dividends, because it reduces the number of defects discovered later in the migration project. The target environment in particular is unfamiliar territory before the migration, so the team should prepare a detailed report on it to study the likely effects and results of the migration.
Validation and testing
Testing cannot prove the absence of defects, but it uncovers them before new systems are put into production. The earlier defects are found in a project, the less costly they are to remove. Testing must therefore be continuous throughout the migration process, especially if the organization is transforming large amounts of data.
Best practices for data migration
Data migration can be a complicated process, and an important way for the project team to ensure a successful migration is to follow certain best practices and guidelines. These tried-and-true best practices help organizations reduce the risk associated with migration while improving the efficiency of the entire process.
Here are some of the best practices an organization must consider.
Know your data
Before migrating data, it is important to understand what the data is used for today and what purpose it will serve after migration. An in-depth understanding of stakeholders is also necessary: it is critical to know who uses the system today and how they will use it once the data is migrated.
It is also imperative to analyze the types of assets being migrated and their compatibility with the target ecosystem. This not only improves efficiency within the system but also optimizes the data for future use.
Understanding the data environments
The target environment is of extreme importance. It governs how the data will behave and interact with the organization, and how it will help achieve the objectives of the migration. Because the target environment is unfamiliar territory before the migration, a detailed report on it should be prepared to study the effects and results of the migration.
It is also critical to understand how compatible the organization's existing data and systems are with the new environment before migrating to it.
Impact analysis is the study of the possible effects of an event. When preparing a migration strategy, an organization must understand how the new environment will affect its operations and business. Because migration can incur heavy costs, it is also important to run a cost-benefit analysis. Essentially, companies must ensure that the migration creates value for the organization and fulfills a genuine requirement.
This analysis ensures that everyone on the migration team understands why the migration is being performed and what benefits are expected.
Data migration is crucial for organizations to get right
Data migration gives organizations several opportunities to improve the quality of their data, the performance of existing systems, and access to datasets. Existing infrastructure, even if sufficient today, will eventually require an upgrade. When companies migrate data, they can move to a better environment, enabling faster systems and improving the overall efficiency of the organization.
The migration process also allows an organization to take an in-depth look at all of its data, identifying discrepancies and duplicates within its systems. This information can be used to correct errors and to delete duplicate data sets, reducing storage requirements. It also gives organizations a clear understanding of where their data is stored.
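Duplicate detection often starts with exact matching on normalized values. The Python sketch below groups records whose selected fields match after case-folding and collapsing whitespace; the field names and the normalization are assumptions, and fuzzier matching (for example, on misspelled names) would require additional techniques.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def normalize(value: str) -> str:
    """Normalization used for matching: case-fold and collapse whitespace."""
    return " ".join(value.casefold().split())

def find_duplicates(rows: List[dict], match_fields: List[str]) -> List[List[int]]:
    """Groups of row ids whose normalized match fields are identical."""
    groups: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for r in rows:
        key = tuple(normalize(str(r[f])) for f in match_fields)
        groups[key].append(r["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```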
Finally, data migration is typically an ongoing organizational activity that supports a variety of business and technical objectives. Developing migration competence and preserving the organizational knowledge gained during these projects will increase the odds that future projects succeed.