What is metadata management?
Metadata management is the business discipline of managing metadata, the data that describes other data. Metadata gives meaning to the information assets in your organization and describes them, and it unlocks the value of your data by making that data easier to use and easier to find. It provides the context required to understand and govern your systems, your data, and your business. With metadata management in place, your business and IT teams can find and use data more easily and get the critical data context they require.
Metadata gives basic information about data, including file type, time of creation, file size, author, and more. There are several distinct types of metadata, including descriptive, structural, administrative, reference, and statistical metadata, and each provides different information about your data. You can create metadata manually or automatically. Manual creation allows for more detail, while automatic creation generally captures only very basic information. Generally, the more important an information asset is, the more important it is to manage the metadata around it, because people need more information about how to use that valuable asset. A less important information asset needs less metadata.
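To make the manual-versus-automatic distinction concrete, here is a minimal Python sketch, assuming metadata is kept as a plain dictionary. The hypothetical `harvest_basic_metadata` function collects only what the file system exposes automatically (name, guessed file type, size, and modification time), while the hypothetical `enrich_manually` helper layers on richer descriptive metadata that a person supplies. Author is added manually here because file systems do not record it portably. The field names are illustrative, not a standard schema.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def harvest_basic_metadata(path: str) -> dict:
    """Automatically collect the basic metadata the file system exposes."""
    info = os.stat(path)
    mime_type, _encoding = mimetypes.guess_type(path)
    return {
        "name": os.path.basename(path),
        "file_type": mime_type or "unknown",
        "size_bytes": info.st_size,
        # Last-modification time; true creation time is platform-dependent.
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }


def enrich_manually(record: dict, **curated: str) -> dict:
    """Return a new record with human-curated descriptive metadata layered on top."""
    return {**record, **curated}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "q3_sales.csv")
        with open(path, "wb") as f:
            f.write(b"region,revenue\nEMEA,1200\n")
        basic = harvest_basic_metadata(path)
        full = enrich_manually(
            basic,
            author="Sales Analytics",
            description="Quarterly revenue by region",
        )
        print(full)
```

The harvested record stays small and generic; the value that makes an important asset usable comes from the curated fields, which is why the more important assets deserve more manual attention.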
A robust metadata management strategy helps ensure that an organization’s data is high quality, consistent, and accurate across various systems. Organizations with a comprehensive metadata management strategy are more likely to base business decisions on correct data than those with no metadata management solution in place. Metadata management is an important component of any data governance initiative.
Why do organizations want to document and manage their metadata?
Most organizations have an information architecture that resembles an overstuffed, completely disorganized bookstore: there is data everywhere. Because most organizations’ data is not organized or catalogued, people find it extremely challenging to locate what they are looking for.
That’s the core problem—lack of data findability and thus lack of data usability. And that problem is only growing. Organizations can move from gigabytes to terabytes to petabytes in the span of 10 years. In an age where “data is the new oil,” successful organizations must be able to find and use all their data to gain a competitive advantage. The descriptive and search capabilities of metadata management are crucial to successfully finding and using that data.
Metadata management is also important because definitions can change depending on the information context. Consider how different groups might think about and define the term “customer.” If you talk to people in IT, Sales, or Compliance, they might have different views about what customers represent and how that data is stored. For IT, customer data might be focused on producing analytics reports and dashboards for the company and on the technical aspects of storing that data. If you ask IT where “customer” data is located, they may answer: “It’s in our enterprise data warehouse that we use for reporting, dating back to 2015. We also have customer data in the data lake from the new acquisition, and that data needs to be transformed before we can report on it.” So for IT, “customer” data might be very analytics-focused or include a historical lookback.
Your Sales team may be more focused on operations, such as how they use customer data in their sales today. For them, customer data may mean only active customers, as opposed to every customer the company ever had, and it may refer to account-level data (such as the company name) rather than people-level data. Compliance, in contrast, may think of customer data at the people level, because their main use of the data is to comply with regulations like GDPR.
As you can see, the challenge is not just the definitions themselves but their inconsistency across these different teams and processes. In addition, data volumes keep growing, and you need to be able to find your data to do your best analysis. In operations, you want to understand all the different applications and where they get their data from. For compliance, you want to make sure the organization is adhering to rules. IT, meanwhile, will mostly be concerned with producing analytics and keeping a historical record.
Metadata management gives each part of your organization the metadata it needs to understand and govern your systems, your data, and your entire organization, while maintaining a consistent view of data throughout the organization. Without that shared, consistent view, teams cannot reliably perform their functions or be confident that they are ultimately doing things right.
Metadata management use cases
Metadata management helps different personas in your organization answer their particular questions while also ensuring they are adhering to a consistent view of the data.
- Analytics: For more insightful analytics, users search, understand, and provision data using self-service data catalogs and governed workflows for their analytics programs. It can help you answer questions like “What is the best sales dataset for my analytics job?”
- Operations: To optimize operations, teams discover, harvest, and manage all enterprise metadata assets and data lineages to improve the quality of operations. It can help answer questions like “What systems are involved in fulfilling customer orders?”
- Compliance: To meet regulatory mandates (GDPR, CCPA, BCBS 239, etc.), you can support your compliance programs with data governance capabilities, including data privacy. Data governance teams can identify critical data elements, document definitions, and report on compliance. It can help answer the question “Where do we store and process personal information?”
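The compliance and operations questions above can both be answered from the same catalog metadata. The following Python sketch assumes a tiny in-memory catalog in which each asset records its system, a set of tags, and its upstream assets. `CatalogEntry`, `find_tagged`, and `systems_upstream`, along with the asset and system names in the example, are hypothetical and not part of any product’s API. A tag lookup answers “Where do we store and process personal information?”, and a breadth-first walk of the lineage answers “What systems are involved in fulfilling customer orders?”

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One asset in a hypothetical metadata catalog."""
    name: str
    system: str
    tags: set[str] = field(default_factory=set)
    upstream: list[str] = field(default_factory=list)  # names of assets this one is derived from


def find_tagged(catalog: dict[str, CatalogEntry], tag: str) -> list[str]:
    """Compliance question: which assets carry a given tag?"""
    return sorted(name for name, entry in catalog.items() if tag in entry.tags)


def systems_upstream(catalog: dict[str, CatalogEntry], asset: str) -> set[str]:
    """Operations question: which systems does an asset's data pass through?

    Walks the lineage breadth-first from `asset` toward its sources,
    visiting each asset once. Assumes every upstream name exists in the catalog.
    """
    seen = {asset}
    queue = deque([asset])
    systems: set[str] = set()
    while queue:
        entry = catalog[queue.popleft()]
        systems.add(entry.system)
        for parent in entry.upstream:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return systems


# Hypothetical example catalog.
catalog = {
    "crm.contacts": CatalogEntry("crm.contacts", "CRM", {"personal_data"}),
    "shop.orders": CatalogEntry("shop.orders", "E-commerce", {"personal_data"}, ["crm.contacts"]),
    "erp.fulfillment": CatalogEntry("erp.fulfillment", "ERP", set(), ["shop.orders"]),
    "dw.order_facts": CatalogEntry("dw.order_facts", "Data warehouse", set(), ["erp.fulfillment"]),
}

print(find_tagged(catalog, "personal_data"))  # ['crm.contacts', 'shop.orders']
print(sorted(systems_upstream(catalog, "erp.fulfillment")))  # ['CRM', 'E-commerce', 'ERP']
```

Keeping tags and lineage in one store is what lets both teams query the same facts instead of maintaining separate, possibly conflicting, inventories.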
Capabilities of a metadata management solution
Comprehensive metadata management software is a single solution that captures and manages all of your metadata in one place. Capabilities to look for in your solution should include:
- Metadata Management Service: Discover, harvest, and manage all your metadata, such as business definitions, glossaries, and rules, in one place.
- Data Governance Solution: Document and support your governance policies and regulatory compliance efforts.
- Data Catalog: Catalog the physical metadata of the underlying systems that house critical data assets, so people can search for, collaborate on, and provision access to those assets.
- Available as a Service: 100 percent SaaS and easily provisioned.
A comprehensive solution will enable end-to-end metadata management. That includes:
- Discovery and Extraction: Automate metadata harvesting from your on-premises or cloud systems.
- Metadata Store: A single metadata store for all your business and technical metadata.
- Classification and Lineage: Machine-learning-driven classification that maps metadata assets to data elements, plus visual lineage.
- Governance and Security: Business glossary, data governance policies and regulatory compliance all in one platform.
- Search and Collaboration: Search across the entire data catalog. Collaborate with comments, ratings, and tags.
- Data Quality KPIs: Track key data quality indicators on all your metadata.
- Integration and Provisioning: Expose metadata as a service. Provision data access through the catalog.
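Automated classification and data quality KPIs both start from profiling the values in a column. The sketch below deliberately swaps in simple regular-expression rules where a real solution would use machine learning, and the pattern set, the 0.8 threshold, and the function names `classify_column` and `completeness` are hypothetical choices for illustration. It labels a column when most of its sampled values match one pattern, and it computes completeness (the share of non-empty values) as one example of a data quality KPI.

```python
import re
from typing import Optional, Sequence

# Hypothetical rules standing in for an ML classifier.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "us_phone": re.compile(r"^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$"),
}


def _non_empty(values: Sequence[Optional[str]]) -> list[str]:
    """Drop None and blank strings, trimming surrounding whitespace."""
    return [v.strip() for v in values if v and v.strip()]


def classify_column(samples: Sequence[Optional[str]], threshold: float = 0.8) -> Optional[str]:
    """Return the label whose pattern matches at least `threshold` of the non-empty samples."""
    values = _non_empty(samples)
    if not values:
        return None
    best_label, best_ratio = None, 0.0
    for label, pattern in PATTERNS.items():
        ratio = sum(1 for v in values if pattern.match(v)) / len(values)
        if ratio > best_ratio:
            best_label, best_ratio = label, ratio
    return best_label if best_ratio >= threshold else None


def completeness(values: Sequence[Optional[str]]) -> float:
    """Data quality KPI: share of values that are present and non-blank."""
    if not values:
        return 0.0  # treat an empty column as having no usable values
    return len(_non_empty(values)) / len(values)


print(classify_column(["a@example.com", "b@example.org", ""]))  # email
print(classify_column(["555-123-4567", "(555) 123-4567"]))  # us_phone
print(completeness(["a@example.com", "", None, "b@example.org"]))  # 0.5
```

Requiring a threshold rather than a single match keeps one stray value from mislabeling a column; an ML-based classifier would replace the fixed rules with learned features.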
How metadata management fosters collaboration
If an organization wants to achieve a certain level of data literacy, the different personas in the organization need to collaborate. Data literacy requires a team effort; individual teams cannot work on their own and hope to arrive at the same place. You need one solution to govern it all and to allow data stewards to interact with data users.
Let’s look at how different teams use metadata management. Governance teams might be most concerned with definitions and regulatory compliance, but they need to work hand-in-hand with the IT team. IT teams might be cataloging the physical systems that store the information, documenting the controls and security they’ve wrapped around those systems, and making sure the teams that manage those systems have been trained on privacy and compliance; to do this, they need to interact with the Compliance team. Then there are the analytics users who consume much of that data and want to ensure they comply with governance policies and follow the protocols the security and IT teams have established. All of these teams need to interact at various times.
Bottom line: Look for a metadata management solution that has collaborative governance processes, including workflows, stewardship, version control, and audit trails.
Today, much of your metadata is likely spread across multiple applications and systems, leaving silos of metadata with no connection between them. For instance, some companies use one set of applications for ETL and store that metadata there, another set for data governance with its own metadata, and yet another for data catalog information, stored separately again. All of that information is related, and a comprehensive metadata management solution should bring it into one place for better integration, consistency, and control.
Metadata management and AI
Edge devices, IoT, and AI are driving a shift in metadata management, and there is a growing need to use metadata to mine additional value from data.
Metadata’s influence on production environments (and on productivity) will increasingly hinge on cataloging its various types, mapping, data modeling, machine learning, and edge computing. Organizations that succeed in operationalizing metadata in these areas stand to benefit most from metadata management.
Built-in artificial intelligence (AI) and machine learning (ML) algorithms facilitate metadata classification and data lineage (horizontal, vertical, and regulatory). Together, these capabilities deliver the data context, coherency, and control you need to achieve the highest efficiency, best performance, and smartest decision-making across all your teams and departments.