When companies recruit data scientists and build out their analytics teams, it’s critical for decision makers to know upfront the types of research models that the data scientists they enlist tend to favor.
For instance, while some data scientists place a premium on the quality of the data being used, others are convinced that quantity is what ultimately delivers results.
A defensible case can be made for each approach, depending on the circumstances and what the organization’s goals are.
For example, quality should be a top consideration for projects where the integrity of customer data (contact, transaction history, etc.) is critical to the problem in question (e.g., How can we increase our email open rates?).
Data quality is also a critical consideration for projects where using the right data sets matters more than sheer volume.
For instance, an online retailer that’s looking to improve engagement (and conversion rates) among high-value customers would want to focus on specific data sets such as clickstream data and IP addresses for particular customers.
This retailer would not necessarily look to the universe of people who are engaging with its website to help determine product pages visited, length of visits, conversion rates, and products or topics that appear to resonate with different customer segments.
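The quality-first approach described above can be sketched in a few lines. This is a minimal illustration, not production code; the column names, customer IDs, and the notion of a precomputed "high-value" set are all assumptions for the example.

```python
import pandas as pd

# Hypothetical clickstream sample; column names are assumptions.
clicks = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "page": ["home", "product", "home", "product", "cart", "checkout"],
    "seconds_on_page": [12, 45, 8, 30, 20, 15],
})

# IDs flagged as high-value by prior spend (assumed to come from elsewhere).
high_value = {1, 3}

# Quality over quantity: restrict the analysis to the segment that matters,
# rather than the universe of all site visitors.
segment = clicks[clicks["customer_id"].isin(high_value)]
engagement = segment.groupby("customer_id")["seconds_on_page"].sum()
```

The filtering step is the point: everything downstream (pages visited, visit length, conversion) is computed only over the customers the business actually cares about.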
However, there are also situations where the quantity of data takes priority over data quality.
For instance, a regional manager for a national sportswear retailer discovers that one-piece swimsuits in a certain style and color (neon green) are selling faster than others in a particular area.
Here, applying a greater volume of data on swimwear purchasing habits across all of the retailer's regions can yield insights that help the manager, and other executives, decide whether to expand the inventory of neon green one-piece swimsuits in a particular set of stores or across a larger geography.
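The cross-region aggregation this scenario calls for is a straightforward group-and-sum. The sketch below uses invented region codes, style names, and unit counts purely for illustration:

```python
import pandas as pd

# Hypothetical sales records pooled from several regions; all values are
# made up for the example.
sales = pd.DataFrame({
    "region": ["NE", "NE", "SE", "SE", "MW", "MW"],
    "style": ["one-piece", "bikini", "one-piece", "one-piece", "bikini", "one-piece"],
    "color": ["neon green", "blue", "neon green", "red", "blue", "neon green"],
    "units": [120, 40, 95, 30, 25, 110],
})

# Quantity helps here: pooling every region's data shows whether the
# neon-green trend is local or national.
trend = (
    sales[(sales["style"] == "one-piece") & (sales["color"] == "neon green")]
    .groupby("region")["units"]
    .sum()
    .sort_values(ascending=False)
)
```

A ranking like this across many regions is what would support (or undercut) the case for expanding inventory beyond the one area where the trend was first spotted.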
Wikibon’s Jeff Kelly raises a valid point in the quality versus quantity debate.
“. . . Big data evangelists maintain that the sheer volume of data in big data scenarios mitigate the effects of occasional poor data quality,” Kelly notes in a blog post. “If you’re exploring petabytes of data to identify historical trends, a few data input errors will barely register as a blip on a dashboard or a report.”
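Kelly's point about volume diluting occasional bad records can be demonstrated with a quick simulation. The numbers below (100,000 records, five garbage entries, the value ranges) are arbitrary choices for the illustration:

```python
import random

random.seed(0)

# Simulate a large set of order values, then corrupt a handful of them.
clean = [random.uniform(20, 80) for _ in range(100_000)]
mean_clean = sum(clean) / len(clean)

dirty = clean.copy()
for i in range(5):            # five garbage records out of 100,000
    dirty[i] = 10_000.0       # an obviously bad data-entry value
mean_dirty = sum(dirty) / len(dirty)

# At this scale, the corrupted entries shift the mean by well under one unit.
shift = abs(mean_dirty - mean_clean)
```

The shift barely registers, which is exactly the "blip on a dashboard" effect Kelly describes. The same errors in a sample of a few hundred records would distort the mean badly.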
When interviewing prospective data scientists and talking to references about their work histories and project accomplishments, it’s worth probing each candidate’s approach to problem solving to determine how and whether it meshes with the organization’s philosophy.
Research repeatedly finds that organizations use only a fraction of the data available to them, which suggests an enormous amount of potential insight goes untapped. No doubt the 'quality versus quantity' debate will continue to draw advocates on both sides of the fence.
One thing is certain: organizations that devise sound strategies for leveraging the right types of data with the right analysis tools, and that align those efforts closely with company goals, position themselves to succeed.