What is Reference Data?
Reference data is a special subset of master data that is used for classification throughout an organization. In data management, it defines the characteristics of an identifier. Reference data may originate internally (authored within the organization) or externally (mandated by or gathered from outside sources); either way, it is meant to be unambiguous and non-negotiable. It includes complex hierarchies, mappings, and more, and it may be described in one-to-one relationships, one-to-many relationships, or hierarchies.
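To make the three shapes mentioned above concrete, here is a minimal sketch in Python. The country and subdivision codes are real ISO 3166 values, but the cost center codes and the `path_to_root` helper are hypothetical names chosen only for illustration.

```python
from typing import Dict, List, Optional

# One-to-one: each ISO 3166-1 alpha-2 code maps to exactly one alpha-3 code.
ALPHA2_TO_ALPHA3: Dict[str, str] = {"US": "USA", "DE": "DEU", "JP": "JPN"}

# One-to-many: one country code maps to several ISO 3166-2 subdivision codes.
COUNTRY_TO_SUBDIVISIONS: Dict[str, List[str]] = {
    "US": ["US-CA", "US-NY"],
    "DE": ["DE-BY", "DE-BE"],
}

# Hierarchy: hypothetical cost centers stored as child -> parent.
# None marks the root of the hierarchy.
COST_CENTER_PARENT: Dict[str, Optional[str]] = {
    "CC-ROOT": None,
    "CC-EMEA": "CC-ROOT",
    "CC-EMEA-SALES": "CC-EMEA",
}


def path_to_root(code: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of codes from `code` up to the root of the hierarchy."""
    path = [code]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path
```

In this sketch, `path_to_root("CC-EMEA-SALES", COST_CENTER_PARENT)` walks from the sales cost center up through its regional parent to the root.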
Organizations such as the MDM Institute describe two main varieties of reference data: multi-domain reference data and real-time reference data. Multi-domain reference data is not industry-specific and can span multiple functional areas (finance, risk and compliance, human resources, etc.) and content types (ISO country codes and other non-volatile reference data) to be mastered and shared. Real-time reference data is typically used in the capital markets industry (brokers, asset managers, and securities services firms) as well as in command and control military/intelligence markets. It’s also increasingly used in Internet of Things (IoT) applications that require metadata tagging of streaming data sources flowing into big data lakes.
According to the MDM Institute, reference data can be either public or private, and it connects different domains and applications through consistent values and semantics to create multi-domain views and hierarchies. For example, data that connects customers and products, such as cost/revenue accounting information, sales personnel, business units, geographies, or industry data, would all count as reference data.

What is the difference between reference data and master data?
It’s important to understand the difference between reference data and master data. Master data represents key parts of the business, including customer data and data related to business activities and transactions, whereas reference data defines the set of permissible values used to classify that master data. Furthermore, reference data usually changes slowly over time, as a direct reflection of changes to business processes, while master data changes in the course of standard business processes. Although reference data is rarely changed, those slight changes over time must be managed and synchronized across the entire organization. This is a challenge that many businesses face and must address with consistent reference data management and governance principles.
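The relationship described above, where reference data supplies the permissible values for classifying master data, can be sketched as a simple validation step. This is an illustrative Python sketch only: the segment names, the record layout, and the `invalid_fields` function are assumptions, not part of any particular product.

```python
from typing import Dict, FrozenSet, List

# Reference data: small, slowly changing sets of permissible values.
CURRENCY_CODES: FrozenSet[str] = frozenset({"USD", "EUR", "JPY"})
# Hypothetical customer segments, invented for this example.
CUSTOMER_SEGMENTS: FrozenSet[str] = frozenset({"RETAIL", "SMB", "ENTERPRISE"})

# Which reference set governs which field of a master record (assumed layout).
FIELD_RULES: Dict[str, FrozenSet[str]] = {
    "currency": CURRENCY_CODES,
    "segment": CUSTOMER_SEGMENTS,
}


def invalid_fields(record: Dict[str, str]) -> List[str]:
    """Return the classified fields whose values are not permitted."""
    return [
        name
        for name, allowed in FIELD_RULES.items()
        if record.get(name) not in allowed
    ]


# Master data: a customer record that is classified using reference values.
good_customer = {"name": "Acme Ltd", "currency": "EUR", "segment": "SMB"}
bad_customer = {"name": "Globex", "currency": "US$", "segment": "SMB"}
```

Here `invalid_fields(good_customer)` returns an empty list, while `invalid_fields(bad_customer)` flags the `currency` field because `US$` is not in the permitted set.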
Examples of reference data
Here are some examples of reference data:
- Postal codes
- Transaction codes
- Cost centers
- Financial hierarchies
- State or country codes
- Currencies
- Organizational unit types
- Language codes
- Customer segments
- Tasks and business processes
Why is reference data important?
Because reference data forms complex connections between domains and applications, managing it can present some challenges. That is why reference data management (RDM) is so important: it handles potential issues such as governance, version control, and custom hierarchies. Reference data must be defined against business terms, governed by rules the business creates, managed in hierarchies, shared for collaboration, and monitored for changes made by different users, for data quality, and for reporting.
Reference data is everywhere and must be managed so that systems across an organization operate in sync with interoperable, accurate data. Without this management, reference data ends up siloed within an organization, typically defined and managed differently from application to application, which erodes accuracy and drives up costs.
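As a rough illustration of the version control and change monitoring mentioned above, the following Python sketch keeps a version number and a change log for one code list. The `ReferenceSet` class, its fields, and the sample values are all hypothetical; a real RDM tool would add approval workflows, effective dating, and distribution to consuming systems.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ReferenceSet:
    """A hypothetical versioned code list with a simple change log."""

    name: str
    version: int = 1
    values: Dict[str, str] = field(default_factory=dict)
    # Each entry: (new version, user, code, old label or None, new label)
    history: List[Tuple[int, str, str, Optional[str], str]] = field(
        default_factory=list
    )

    def set_value(self, code: str, label: str, user: str) -> None:
        """Add or change a code, bumping the version and logging who did it."""
        old = self.values.get(code)
        if old == label:
            return  # nothing changed, so no new version is created
        self.version += 1
        self.values[code] = label
        self.history.append((self.version, user, code, old, label))


segments = ReferenceSet(name="customer_segments")
segments.set_value("SMB", "Small and medium business", user="alice")
segments.set_value("SMB", "Small and medium business", user="bob")  # no-op
segments.set_value("SMB", "Small and midsize business", user="bob")
```

After these calls the set is at version 3 and `segments.history` holds two entries, one per actual change, each recording the user and the old and new labels.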

Benefits of Managing and Distributing Reference Data
Variations in an organization’s reference data can cause serious data quality issues, leading to downtime, inaccuracy, and poor decision making. But with a centralized way of managing its reference data, an organization can capitalize on the benefits listed below:
- Reduced IT costs: With one central place for storing and managing reference data, organizations can cut the storage costs associated with multiple systems. Centralization also reduces the time and money spent on modifying reference data across the enterprise.
- Agile updates: Centralized management also reduces the effort involved in changing and updating reference data, which in turn reduces the time needed for data integration projects.
- Reduced risks: As mentioned, poor reference data means poor data quality, which creates risk both when making business decisions based on inaccurate data and when complying with data regulations. By putting controls around reference data and auditing changes to it, organizations can greatly reduce the risk of incorrect or inconsistent data.
- Improved BI reporting: Reference data also informs business intelligence (BI) reporting. Better management of reference data helps ensure that an organization’s reports are accurate and trustworthy enough for business users to rely on in decision-making processes.
Who has reference data?
Every company across every industry has reference data. Below are some industries in which reference data is extensive and successful management of it is therefore crucial:
- Banking and financial services: Secure identifiers are often used in the finance industry to make trades or complete transactions. These markers are a form of reference data and essential to the industry.
- Government/public sector: Reference data is often used by governments and other regulatory bodies to track public records and maintain consistency across several organizations.
- Healthcare: Within the healthcare industry, reference data is critical for ensuring accurate associations between lab results and patient information, for example.
- Travel and hospitality: Airlines, hotels, car rental agencies, and other travel-related companies depend on reference data to organize large operations and maintain exceptional customer experiences.
- eCommerce: As more and more businesses move online and away from traditional brick-and-mortar business models, eCommerce operations, and therefore reference data, will continue to grow in importance. Transaction codes, customer information, and other data must be linked consistently for security and efficiency.