What is Reference Data Management?

Reference data management is the process of managing classifications and hierarchies across systems and business lines. This may include performing analytics on reference data, tracking changes to reference data, distributing reference data, and more. For effective reference data management, companies must set policies, frameworks, and standards to govern and manage both internal and external reference data.

[Figure: Reference data management diagram]

After coming to widespread prominence in 2012, Reference Data Management (RDM) has become a key element of Master Data Management (MDM). RDM provides the processes and technologies for recognizing, harmonizing, and sharing coded, relatively static data sets for “reference” by multiple constituencies (people, systems, and other master data domains). An RDM system provides governance, process, security, and audit control around the mastering of reference data. RDM systems also manage complex mappings between different reference data representations and different data domains across the enterprise. Most contemporary RDM systems also provide connectivity, typically through a service-oriented architecture (SOA) service layer or, more recently, microservices, for sharing reference data with enterprise applications, analytics and data science tools, and governance applications.

Why is reference data management so important?

Prior to the availability of commercial RDM solutions, organizations built custom solutions using existing software such as relational databases (RDBMS), spreadsheets, workflow software (business process management, or BPM), and other tools. Such systems often lacked change management, audit controls, and granular security and permissions. As a result, these legacy solutions have increasingly become compliance risks. Because reference data drives key business processes and application logic, errors in reference data can have a major, multiplicative negative impact on the business. Mismatches in reference data (1) degrade data quality, (2) undermine the integrity of BI reports, and (3) are a common source of application integration failure. Just as businesses no longer build their own CRM, ERP, and MDM systems, organizations are beginning to acquire commercial RDM or RDG solutions, which can be tailored or configured and come with the ongoing support of a software vendor.

What are the benefits of reference data management?

One benefit of reference data management is that centralizing control helps maintain consistency and compliance. It lets business teams access, distribute, and update reference data across multiple systems in a consistent, governed way. Effective reference data management also allows a business to scale up its operations and analytics processes, and to react quickly to new data requirements or market changes without restructuring the entire enterprise’s data.

Reference data management can bring consistency to your data. By managing every version of a reference data set and connecting the versions through correspondence tables, businesses can achieve semantic consistency across time and between different standards. Without this consistency, organizations suffer from poor data quality, and small errors can grow into costly ones over time.
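
The correspondence tables mentioned above can be pictured as rows that link a code in one code set (or release) to its equivalent in another. The following is a minimal Python sketch under stated assumptions, not a description of any particular RDM product: the `Correspondence` type, the `build_index` and `translate` helpers, and the `sales-region` codes are hypothetical names chosen for illustration, while the ISO 3166-1 country codes are real.

```python
from typing import NamedTuple, Optional


class Correspondence(NamedTuple):
    """One row of a correspondence table (hypothetical structure)."""
    source_set: str   # code set, optionally with a release, e.g. "sales-region@2023"
    source_code: str
    target_set: str
    target_code: str


# Illustrative rows only; a real hub would load these from governed storage.
TABLE = [
    # Mapping between two standards (ISO 3166-1 alpha-2 to alpha-3).
    Correspondence("iso3166-alpha2", "DE", "iso3166-alpha3", "DEU"),
    Correspondence("iso3166-alpha2", "FR", "iso3166-alpha3", "FRA"),
    # Mapping across time: hypothetical internal codes, 2023 release to 2024 release.
    Correspondence("sales-region@2023", "EMEA-N", "sales-region@2024", "EU-NORTH"),
    Correspondence("sales-region@2023", "EMEA-S", "sales-region@2024", "EU-SOUTH"),
]


def build_index(rows):
    """Index rows by (source set, source code, target set) for direct lookup."""
    return {(r.source_set, r.source_code, r.target_set): r.target_code for r in rows}


def translate(index, source_set: str, source_code: str, target_set: str) -> Optional[str]:
    """Return the equivalent code in target_set, or None if no mapping exists."""
    return index.get((source_set, source_code, target_set))


index = build_index(TABLE)
print(translate(index, "iso3166-alpha2", "DE", "iso3166-alpha3"))
print(translate(index, "sales-region@2023", "EMEA-S", "sales-region@2024"))
```

Returning `None` for an unmapped code, rather than guessing, makes gaps in the correspondence table visible to whoever governs it.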

Reference Data Management Evaluation Criteria

  1. Ability to Map Reference Data: In addition to the canonical reference data sets (country codes, currencies, languages, etc.), an RDM hub must be able to manage application-specific, industry-specific, and use-case-specific reference data sets, as well as new versions and local adaptations of them (e.g., foreign-language versions). The relationships between reference data sets and all of those permutations also need to be managed.
  2. Administration of Reference Data Types: One of the common problems with homegrown reference data solutions is that a single data model cannot easily represent the many different types of reference data. The data model must be extensible to support new reference data sets and new properties specific to the varied types of reference data being managed.
  3. Management and User Experience of Reference Data Sets: RDM solutions should be designed with the business user in mind. By providing intuitive UIs and a flexible data model, an enterprise can quickly install, configure and import reference data with minimal need for ongoing IT involvement.
  4. Architecture/Performance: Because reference data is highly interrelated, a semantic model is useful for managing the relationships between reference data sets and across time. The need to document reference data and its complex connections to other domains requires the platform to have robust data and semantic modeling.
  5. Hierarchy Management Over Sets of Reference Data: Reference code tables can be either flat lists or hierarchies. Hierarchical structure is a key aspect of reference data that needs to be managed in addition to the values and mapping relationships.
  6. Connectivity: It is vital that an RDM solution provide multiple, flexible means of connection to maximize accessibility. Reference data must be made easily available to downstream application systems, remote subscribers, etc. Further, each consumer of RDM data must be able to access the data by the means and in the format most convenient to them.
  7. Import and Export: An RDM solution should enable the import and export of reference data in multiple formats. For example, it should support inbound and outbound mappings between its data definitions and sources or destinations such as flat files and databases, in formats including CSV and XML.
  8. Versioning Support: RDM solutions should also support versioning of reference data sets and related mappings. Such versioning is used in conjunction with lifecycle management to manage changes to the reference data sets and mappings over time.
  9. Security and Access Control: Modern RDM solutions provide robust role-based security. For example, CRUD access to a particular entity should be controlled by the user’s role, the group that the user is a member of, and related ownership of the entity, plus the lifecycle state of the entity itself.
  10. End-to-end Lifecycle Management: RDM solutions should employ a governance UI and workflow processes to support formal governance of reference data. This puts end-to-end (E2E) lifecycle management of enterprise reference data into the hands of business users, reducing the burden on IT and improving the overall quality of data used across the organization.
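
To make criterion 9 more concrete, the sketch below shows one way an access check could combine the four inputs named there: the user's role, the user's group membership, ownership of the entity, and the entity's lifecycle state. It is an illustrative Python example only. The role names (`viewer`, `editor`, `steward`), the lifecycle states, and the specific rules are assumptions made for this sketch, not the policy of any particular RDM product.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    DELETE = "delete"


class Lifecycle(Enum):  # hypothetical lifecycle states
    DRAFT = "draft"
    APPROVED = "approved"
    RETIRED = "retired"


@dataclass(frozen=True)
class User:
    name: str
    role: str          # assumed roles: "viewer", "editor", "steward"
    groups: frozenset  # names of the groups the user belongs to


@dataclass(frozen=True)
class Entity:
    entity_id: str
    owner_group: str   # group that owns this reference data entry
    state: Lifecycle


# Assumed role-to-action policy for this sketch.
ROLE_ACTIONS = {
    "viewer": {Action.READ},
    "editor": {Action.READ, Action.CREATE, Action.UPDATE},
    "steward": set(Action),
}


def is_allowed(user: User, action: Action, entity: Entity) -> bool:
    """Combine role, group ownership, and lifecycle state into one decision.

    For CREATE, `entity` stands for the new entry being proposed.
    """
    # 1. Role: the action must be granted to the user's role at all.
    if action not in ROLE_ACTIONS.get(user.role, set()):
        return False
    # Reads need nothing beyond the role in this sketch.
    if action is Action.READ:
        return True
    # 2. Group and ownership: writes require membership of the owning group.
    if entity.owner_group not in user.groups:
        return False
    # 3. Lifecycle state: retired entries are frozen; approved entries
    #    can be changed only by stewards; drafts follow the role policy.
    if entity.state is Lifecycle.RETIRED:
        return False
    if entity.state is Lifecycle.APPROVED:
        return user.role == "steward"
    return True


bob = User("bob", "editor", frozenset({"finance"}))
draft = Entity("cur-001", "finance", Lifecycle.DRAFT)
approved = Entity("cur-002", "finance", Lifecycle.APPROVED)
print(is_allowed(bob, Action.UPDATE, draft))
print(is_allowed(bob, Action.UPDATE, approved))
```

Keeping the decision in a single function means the same rule can be applied by the governance UI, the import jobs, and the service layer alike.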