Big data benefits an array of industries: science, medicine, finance, education, and others. Volumes of information, some of it being collected in real time, can be sorted and correlated to save a company time, money, and resources.
This brave new world might seem to hold only positives, but there are negatives as well. So much data can also produce false or misleading results that lead an analyst astray.
Garbage In, Garbage Out
Consider this maxim, often attributed to computer programmers: when the data entering a program is faulty, the results will be faulty too. The main sources of big data include social and digital media, online transactions, and sensors, and some of this information can be full of personal opinion, bias, or spin. In addition to the familiar three Vs of volume, variety, and velocity, it is important to remember three more: value, viability, and validity.
Considering the source of data can assist analysts in determining how to use the information that’s provided. For example, social media may be a good indicator of possible trends or issues that consumers have with your product. However, it may not be a good source for determining whether your company should revamp its production line or change the direction of the business.
Numbers can lie, and sometimes people are all too willing to believe the lies. Information can be twisted to support almost any argument, and with large amounts of data it becomes even easier to find correlations that fit the conclusion you want.
You reduce the possibility of twisting facts to fit your current point of view by continuing your analysis even after you find information that supports your hypothesis, and by using several different sources to verify your findings.
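To make the risk concrete, here is a minimal Python sketch of how chance correlations appear when you search through enough variables. Everything in it is an illustrative assumption: the data is pure random noise, and the function names and parameters (strongest_chance_correlation, n_features, n_rows, seed) are hypothetical, not part of any particular analytics product. The sketch picks the noise variable that best matches a noise target, then draws a fresh, independent sample to check whether the finding holds up.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_features=1000, n_rows=30, seed=0):
    """Search many unrelated noise variables for the best match to a target.

    Returns (discovery_r, replication_r). discovery_r is the strongest
    correlation found by searching; replication_r is the correlation
    measured on a fresh, independent sample of the same kind of noise.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]

    # Cherry-pick: keep whichever noise variable happens to fit best.
    discovery_r = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        r = pearson(feature, target)
        if abs(r) > abs(discovery_r):
            discovery_r = r

    # Verify with a second source. The variables are unrelated by
    # construction, so new data is simply new independent noise.
    fresh_feature = [rng.gauss(0, 1) for _ in range(n_rows)]
    fresh_target = [rng.gauss(0, 1) for _ in range(n_rows)]
    replication_r = pearson(fresh_feature, fresh_target)
    return discovery_r, replication_r


if __name__ == "__main__":
    found, replicated = strongest_chance_correlation()
    print(f"strongest correlation found by searching: {found:+.2f}")
    print(f"same check on fresh data:                 {replicated:+.2f}")
```

With 1,000 candidate variables and only 30 rows, the best match found by searching will typically look like a strong relationship even though none exists, while the check on fresh data usually does not reproduce it. That is the practical argument for continuing the analysis and confirming a result against independent sources before acting on it.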
Science Is Human
There is no such thing as pure science or analysis. Humans apply scientific and analytic methods, and humans are prone to bias. Don’t assume your analytics are free of error just because they rest on mathematical methods. People create the models, people formulate the hypotheses behind them, and people generate much of the information that goes into the analysis.
With big data, the possibility of error increases because technology hasn’t been able to keep pace with the vast amounts of data available. Programs used to analyze smaller databases may not be adequate for large-scale data analysis. An analyst is only as good as their data and the supporting technology, so be sure you know what software your company is using to interpret the huge datasets available on the Internet.
Companies should take advantage of everything big data has to offer. As an analyst, you will be asked to find the best way to leverage this opportunity for your business and its stakeholders. Ensure you use this information wisely and avoid the pitfalls of cherry picking, false discoveries, and discriminatory inference.
Try Spotfire today and confidently start analyzing your big data.