Big data: keep it small, stupid.
That’s the advice of lead Forrester advanced analytics analyst James Kobielus (@jameskobielus), who says that as data scientists move deeper into big data territory, they have to be sure they don’t drown in useless information. If you’re a data scientist, take heed: it’s easier to make sense of all that data if you keep your data sample small and manageable.
In the past, data scientists have had to be satisfied with analyzing “mere samples.” They haven’t been able to collect “petabytes” of data on “every relevant variable of every entity in the population under study.”
Thanks to the big data revolution, these limitations no longer exist. Data scientists now have access to more comprehensive data sets, enabling them to more quickly determine the answers to business questions that require detailed, interactive, multidimensional statistical analysis.
Kobielus says to think of this new model as “whole-population analytics,” rather than just the ability to pivot, drill, and crunch into larger data sets.
“Over time, as the world evolves toward massively parallel approaches such as Hadoop, we will be able to do true 360-degree analysis,” he says.
For instance, as people around the world continue to engage in social networking and conduct more of their lives in public online forums, data scientists will have access to more comprehensive, current, and detailed market intelligence on every possible demographic.
But beware: big data can mean big trouble if you’re not careful about how you approach it.
For one thing, as your company’s analytics initiatives rapidly grow, you’re going to max out your IT budget on storage if you don’t keep the data as compact, compressed, and storage-efficient as possible, Kobielus says.
Not only that, but your users will be overwhelmed by the sheer volume of information they have to wade through unless you deliver just what they need to their tablets, smartphones, and other devices, so they can act on it quickly.
So all you data scientists out there, listen to Kobielus and don’t give in to the temptation to throw more data at every analytic challenge. More often than not, you only need tiny, representative samples to find the most relevant patterns.
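To make the sampling point concrete, here is a minimal Python sketch. It is not from Kobielus or Forrester; the synthetic "order values," the population size, and the helper name `sample_mean_estimate` are all assumptions made up for illustration. It draws a simple random sample of 1,000 records from a much larger data set and compares the sample's estimate of the average with the figure computed from every record.

```python
import math
import random
import statistics


def sample_mean_estimate(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample.

    Returns (estimate, standard_error). The standard error here ignores the
    finite-population correction, which is negligible when the sample is a
    tiny fraction of the population.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # drawn without replacement
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, std_error


# Hypothetical data: synthetic "order values", purely illustrative.
data_rng = random.Random(42)
population = [data_rng.gauss(100.0, 20.0) for _ in range(200_000)]

true_mean = statistics.fmean(population)
estimate, std_error = sample_mean_estimate(population, sample_size=1_000)

print(f"full-population mean: {true_mean:.2f}")
print(f"sample estimate:      {estimate:.2f} (+/- {2 * std_error:.2f})")
```

The precision of the estimate depends on the size of the sample, not the size of the population: the standard error shrinks with the square root of the number of records sampled. The catch is the word "representative." The sketch assumes every record has an equal chance of being drawn, which real-world data collection does not always guarantee.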
In fact, sometimes you need only one crucial observation or one piece of data to deliver the key insight. And quite often, all you’ll need is gut feel, instinct, or intuition to solve a really difficult problem.
“New data may be redundant at best, or a distraction at worst, when you’re trying to collect your thoughts,” Kobielus says.
So it’s worth repeating: when it comes to big data, keep it small, stupid (no offense).