What is data quality?
Data quality describes the degree to which data fits the purpose it was intended for. Data is considered high quality when it accurately and consistently represents real-world scenarios.
To understand this, think of data as the foundation of a hierarchy built on top of it. Above data sits information, which is data placed in context. From actionable information comes knowledge, which develops into wisdom when it is applied. Bad data quality produces bad information quality, and the damage moves up the hierarchy, ultimately resulting in bad business decisions.
When data fits its intended purpose and represents real-world constructs, it is considered to be of high quality. However, these two aspects of the definition can pull in opposite directions.
Take, for example, the master data record for a customer using one of the company's products. The data in the master record may be enough to issue a bill to that customer, but a lack of accurate address and telephone details may be inadequate for the customer service department, and this gap becomes a business problem.
Ideally, the master data record should serve multiple purposes. This requires real-world alignment: data that fits its primary purpose and can be used for other business objectives as well, collected without a disproportionate investment of resources. In other words, the two aspects of the data quality definition must be kept in balance.
Human error tops the list of causes of inaccuracy leading to bad-quality data. Correcting low-quality data is time-consuming and labor-intensive, and it needs the right mix of people, processes, and technologies. Other causes of inferior data quality include a lack of communication between departments and inadequate data strategies. Addressing these issues depends on proactive management.
Importance of data quality
Across the hierarchy of a business, there is no doubt that good-quality data works in favor of the bottom line. However, the questions remain as to who is responsible for keeping data quality consistently high and how those efforts will be funded. Data quality also needs to be tested at a fine-grained level to understand its impact, positive or negative, on the business, and this can be quite difficult. The importance of putting protocols in place to ensure data quality can be seen in the following examples:
- For the marketing department, duplicate records can lead to overspending the marketing budget. The same prospective customer may receive marketing materials several times because their name appears in the database with slight variations. This not only frustrates the customer but also creates outright duplicates of customer profiles.
- The online sales department may be pushing for a self-service buying experience. But with incomplete product data in existing databases, and with the way product data is syndicated between trading partners, maintaining data quality becomes a difficult task.
- For the supply chain side of the business, where you may be looking to automate processes, reliable location data is difficult to achieve because the same standards and precision cannot be applied to every location being serviced.
- For departments that depend on financial reporting, a single question can yield a wide range of answers because of inconsistent data, outdated data, or a lack of clear data parameters.
Together, these problems have a drastically negative impact on the business and make it difficult to meet business objectives, most of which are common across a wide range of companies.
Without good data quality, businesses:
- Will be unable to make use of new market opportunities. This can harm their profit margins and hamper their growth trajectory.
- Will not be able to implement cost-reduction measures. Low-quality data requires extensive manual inspection and correction before it can be used, and automating processes is difficult without complete and consistent data.
- Will find it increasingly difficult to meet compliance requirements, which cover privacy and data protection regulations, health and safety rules, and financial restrictions and guidelines. Good-quality data is essential to meeting compliance objectives.
- Will have difficulty applying predictive analytics to corporate data assets. Duplicated, incomplete, and inconsistent data produces inaccurate predictions, affecting both short-term and long-term decisions and making progress extremely difficult.
The benefits of quality data
Organizations that invest in creating quality data are able to leverage that data to make better business decisions.
High-quality data facilitates better decision-making
Today's market is naturally consumer-centric. With high-quality data, businesses can make better decisions. For example, if analysis shows that people have started spending more time shopping and dining out on Thursdays rather than the usual Fridays, businesses can stay open longer or introduce unique offerings to draw in that traffic.
Better team collaboration
When the many departments of an organization have constant access to the same high-quality data, the result is far better, more effective communication. It becomes easier for all team members to stay aligned on priorities, outgoing messaging, and branding, which together ensure better results.
Understanding the customer better
With good-quality data, companies can better assess customer interests and requirements. This helps an organization grow by creating better products driven by customer needs, and campaigns can then be based on consumer desires and direct feedback from the data, not just educated guesses.
How do you assess data quality?
Given that organizations stand to lose considerably when business processes are based on bad data, it is imperative that owners and managers understand how data quality can be assessed. This includes setting up metrics and processes to assess quality, and companies will need their data to rank highly on both objective and subjective assessments. To improve data quality, businesses must:
- Assess both objective and subjective data quality metrics in depth
- Analyze results and ascertain the causes for any discrepancies
- Work on ways to improve
Subjective data assessments
With subjective assessments, organizations measure how stakeholders, analysts, collectors, and other parties perceive the quality of the data. If a stakeholder makes a decision based on data they receive, only to find it inaccurate or incomplete, that decision is compromised. These perceptions have to be taken into account when looking for gaps in data quality.
Objective data assessments
Objective data quality assessments look at measurable indicators recorded within a dataset, which are then evaluated from two perspectives:
- The dataset's performance within a specific task
- The dataset's quality in its own right, judged against task-independent metrics
To set these metrics, organizations can develop key performance indicators (KPIs) matched to their specific needs. These are expressed through what are known as functional forms, and there are three common ways to measure quality with them (a worked sketch follows the list):
- Simple ratio: The total number of desired outcomes divided by the total number of possible outcomes. The result lies between 0 and 1, with 1 being the most preferred outcome. Both completeness and consistency can be measured with this ratio. The catch is that each of these dimensions can be measured in several different ways, so organizations need agreed criteria for which measure to use.
- Minimum or maximum: Created to handle multiple data quality variables, this functional form treats the minimum as a conservative score and the maximum as a more liberal one. Variables such as the accuracy of the data are scored by the minimum, while aspects such as timeliness and accessibility are represented by the maximum.
- Weighted average: An alternative to the minimum, useful when an organization wants to understand the value each variable contributes to the overall score.
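As a rough illustration of the three functional forms, here is a minimal Python sketch. It assumes each quality dimension has already been scored on a 0-to-1 scale; the dimension names and weights are hypothetical choices, not a standard.

```python
# Minimal sketch of the three functional forms, assuming each data
# quality dimension has already been scored on a 0-1 scale.
# The dimension names and weights below are illustrative only.

scores = {"completeness": 0.92, "accuracy": 0.85, "timeliness": 0.70}

# Simple ratio: desired outcomes divided by total possible outcomes,
# e.g. completeness as filled required fields / required fields.
filled_fields, required_fields = 460, 500
completeness_ratio = filled_fields / required_fields  # 0.92

# Minimum (conservative) and maximum (liberal) across dimensions.
conservative = min(scores.values())  # 0.70, driven by timeliness
liberal = max(scores.values())       # 0.92, driven by completeness

# Weighted average: weights reflect how much each dimension matters
# to the business; they must sum to 1.
weights = {"completeness": 0.5, "accuracy": 0.3, "timeliness": 0.2}
weighted = sum(scores[d] * weights[d] for d in scores)  # 0.855

print(completeness_ratio, conservative, liberal, round(weighted, 3))
```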
Once an organization has evaluated all objective and subjective data quality metrics, it can move on to measures that help streamline its processes. That evaluation is wasted effort, however, unless the resulting actions are effective and consistently carried out.
How to improve data quality
For any organization, improving data quality is about the right mix of qualified people, intelligent processes, and the right technologies. All of this, combined with proactive top-level management, can improve data quality substantially.
Data quality dimensions
When working to improve the quality of data, the main task is to strengthen a range of data quality dimensions. The most commonly addressed dimension is uniqueness of the customer master data, which often suffers from duplication: two or more rows hold data for the same entity (the customer). There are numerous ways to prevent duplication, both at the point of entry and through deduplication of data already stored in databases, as the sketch below illustrates.
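A minimal sketch of the deduplication idea, assuming customer rows carry name and email fields (hypothetical names); production systems usually add fuzzy matching and survivorship rules:

```python
# Hedged sketch: deduplicate customer records by normalizing a few
# fields and keeping the first record per normalized key. Real
# deduplication usually adds fuzzy matching and survivorship rules.

def normalize(record: dict) -> tuple:
    """Build a comparison key from lowercased, trimmed fields."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)

customers = [
    {"name": "Jane  Doe", "email": "jane.doe@example.com"},
    {"name": "jane doe",  "email": "Jane.Doe@example.com "},
    {"name": "John Roe",  "email": "john.roe@example.com"},
]

seen: dict[tuple, dict] = {}
for record in customers:
    key = normalize(record)
    seen.setdefault(key, record)  # keep the first occurrence

deduplicated = list(seen.values())
print(len(deduplicated))  # 2: both "Jane Doe" rows collapse to one
```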
In the case of product master data, uniqueness is not a large issue to contend with; completeness is. The primary reason for incompleteness is that different product categories have different required attributes, and not all of them get filled in. In many cases, the conformity of product data also relates directly to location, unit measures being a common example: the USA measures length in inches, while most of the rest of the world uses centimeters. A simple per-category completeness check follows below.
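A minimal sketch of a per-category completeness check, using the simple-ratio functional form described earlier; the categories and required attributes are invented for illustration:

```python
# Hedged sketch: completeness of a product record measured as the
# simple ratio of filled required attributes to required attributes,
# with requirements varying by product category (illustrative names).

REQUIRED = {
    "apparel":     ["name", "size", "color", "material"],
    "electronics": ["name", "voltage", "warranty_months"],
}

def completeness(product: dict) -> float:
    required = REQUIRED[product["category"]]
    filled = sum(1 for attr in required if product.get(attr) not in (None, ""))
    return filled / len(required)

shirt = {"category": "apparel", "name": "Oxford shirt", "size": "M", "color": ""}
print(completeness(shirt))  # 0.5: color is empty and material is missing
```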
Working on master data for location brings the issue of a lack of a consistent entry template. With so many formats in use across the world, standardizing inputs can be extremely difficult, as the sketch below suggests.
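As a rough sketch of input standardization, the following code coerces location records into a single canonical template; the template and the postal-code rules are assumptions, and real systems typically lean on dedicated address validation services:

```python
# Hedged sketch: coerce location records from varying templates into
# one canonical shape. The canonical fields and country-specific
# postal-code checks below are illustrative assumptions.

import re

CANONICAL_FIELDS = ("street", "city", "postal_code", "country")

POSTAL_PATTERNS = {
    "US": re.compile(r"^\d{5}(-\d{4})?$"),       # ZIP or ZIP+4
    "GB": re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$"),
}

def standardize(raw: dict) -> dict:
    """Trim fields into the canonical template and flag bad postcodes."""
    record = {f: raw.get(f, "").strip() for f in CANONICAL_FIELDS}
    record["country"] = record["country"].upper()
    pattern = POSTAL_PATTERNS.get(record["country"])
    record["postal_code_valid"] = bool(pattern and pattern.match(record["postal_code"]))
    return record

print(standardize({"street": " 221B Baker St ", "city": "London",
                   "postal_code": "NW1 6XE", "country": "gb"}))
```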
Aspects to consider at domain intersections
At some point, the location and customer domains will intersect, and the precision dimension becomes hard to maintain because different use cases call for different levels of location precision.
For this intersection to happen successfully, it is important to understand customer desires, so that the relevant details of a product can be shared with each customer. This also helps where the customer and product master data domains intersect.
Six primary dimensions on which to base data quality standards
These standards can vary from one project to another, but the base should generally remain the same: high-quality data will always meet these six basic standards.
- Comprehensiveness: Determine which essential fields must be filled in for a dataset to be considered complete. In a customer database, name and address are an absolute must, but depending on the product or service, gender may not be. Increasingly, dropping titles such as Mr/Mrs/Ms is being proposed, not only to be more inclusive of those who don't fit any one category, but because the information is simply not necessary.
- Consistency: Ensure that every iteration of a piece of data is the same across all reports, analysis results, and spreadsheets being made and used. Look for inconsistencies, as they lead to bad-quality data downstream; good software should help identify or remove them.
- Accuracy: Consistency ensures a single value across all channels; accuracy deals with whether those values are correct and reflect the reality the data represents. Where accuracy is a problem, automating data entry can remove human error.
- Format: Consistent data entry formats are the cornerstone of inputting data. Create a single format and stick to it, down to the tiniest details such as manufacturing year. American or British date format? All capitalization or not?
- Timeliness: The effectiveness of any data lies in how current and relevant it is at the time the end user pulls it up. Timeliness is assured when data is current, is the most up-to-date version of itself, and is available to decision-makers at the right time.
- Integrity: This criterion ascertains whether a dataset complies with the rules and standards set by the organization. Missing values can disrupt the efficacy of the data.
Ensuring that these dimensions are clearly adhered to will give organizations data sets that are accurate, high quality, and indispensable for quality decision-making.
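To make a few of these dimensions concrete, here is a minimal hedged sketch that checks a record for comprehensiveness, format, and timeliness; the field names, date format, and freshness threshold are illustrative assumptions:

```python
# Hedged sketch: check one customer record against three of the six
# dimensions. Field names, the ISO date format, and the 365-day
# freshness threshold are illustrative choices, not a standard.

from datetime import date, datetime

REQUIRED_FIELDS = ("name", "address")  # comprehensiveness rule
DATE_FORMAT = "%Y-%m-%d"               # format rule
MAX_AGE_DAYS = 365                     # timeliness rule

def check_record(record: dict, today: date) -> dict:
    issues = {}
    # Comprehensiveness: all required fields are present and non-empty.
    issues["complete"] = all(record.get(f) for f in REQUIRED_FIELDS)
    # Format: the last-updated stamp parses under the agreed format.
    try:
        updated = datetime.strptime(record["last_updated"], DATE_FORMAT).date()
        issues["well_formatted"] = True
        # Timeliness: the record was refreshed recently enough.
        issues["timely"] = (today - updated).days <= MAX_AGE_DAYS
    except (KeyError, ValueError):
        issues["well_formatted"] = False
        issues["timely"] = False
    return issues

record = {"name": "Jane Doe", "address": "1 Main St", "last_updated": "2024-02-29"}
print(check_record(record, date(2024, 6, 1)))
# {'complete': True, 'well_formatted': True, 'timely': True}
```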