When it comes to data scientists, it seems the more things change, the more they stay the same.
At the beginning of the year, we told you that there was a looming shortage of data scientists and things were only going to get worse.
Recently some three dozen data scientists from academia and business met in Chicago to “share ideas, discuss challenges, vent frustrations, and generally compare notes about their suddenly high-profile profession,” notes Ellis Booker (@ellisbooker), in the Information Week article, Meet The Elusive Data Scientist.
According to the data scientists at the meeting, there is still a shortage of trained workers.
“For every hundred job openings, there may just be a couple of applicants,” says Kirk Borne, professor of astrophysics and computational science at George Mason University, which has graduated 200 Ph.D.s in the past 20 years.
“70% of my value is an ability to pull the data, 20% of my value is using data-science methods and asking the right questions, and 10% of my value is knowing the tools,” says Catalin Ciobanu, a physicist who spent 10 years at Fermi National Accelerator Laboratory (Fermilab) and is now senior manager-BI at Carlson Wagonlit Travel.
Ciobanu believes that a data scientist’s most important skill is the ability to think through complex problems before turning to computer programs.
Scott Nicholson, a Ph.D. in econometrics, agrees.
“My definition of a data scientist is someone who uses data to solve problems, end to end, from asking the right questions to making insights actionable,” says Nicholson, who’s also the chief data scientist at Accretive Health, which works with hospitals to improve their performance and patient outcomes.
To be successful, a data scientist must also work closely with business leaders to apply the data to their actual problems, he notes. And to conduct data analysis properly, data scientists have to overcome a number of organizational obstacles, such as data sources that are siloed for any number of reasons.
Nicholson says that much of a data scientist’s work – 70% – is the “messy but necessary” business of extracting and cleaning data and getting it ready to use in a model. And while data scientists can pick up the engineering part of the equation, the top data scientists will also have an innate sense of curiosity, he adds.
“Their curiosity is broad, and extends well beyond their day-to-day activities. They are interested in understanding many different areas of the company, business, industry, and technology,” says DJ Patil, a Data Scientist in Residence at Greylock Partners.
“As a result, they are often able to bring disparate areas together in a novel way. For example, I’ve seen data scientists look at sales processes and realize that by using data in new ways they can make the sales team far more efficient,” he says. “I’ve seen data scientists apply novel DNA sequencing techniques to find patterns of fraud.”
Patil says to be successful, a data scientist needs to hone the following skills:
- Finding rich data sources
- Working with large volumes of data despite hardware, software, and bandwidth constraints
- Cleaning the data and making sure that data is consistent
- Melding multiple datasets together
- Visualizing the data
- Building rich tooling that enables others to work with data effectively
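The cleaning and melding steps above are where, as Nicholson notes, most of a data scientist’s time goes. As a minimal sketch of what that “messy but necessary” work can look like, here is a hedged example using pandas; the dataset names, columns, and values are entirely hypothetical, invented only to illustrate the pattern:

```python
import pandas as pd

# Hypothetical raw data: amounts stored as strings, with a missing
# value and a duplicate row -- typical of data pulled from the wild.
sales = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "amount": ["100", "250", "250", None],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["East", "West", "East"],
})

# Clean: coerce amounts to numbers, drop unusable rows, deduplicate.
sales["amount"] = pd.to_numeric(sales["amount"], errors="coerce")
sales = sales.dropna(subset=["amount"]).drop_duplicates()

# Meld: join the two datasets on a shared key.
merged = sales.merge(customers, on="customer_id", how="left")
print(merged)
```

Only after steps like these does the data become ready for a model, which is consistent with the 70% figure Nicholson cites for extraction and preparation work.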
Companies looking to add more data scientists to their workforces should hire people with diverse backgrounds who have a history of playing with data to create something novel. They should also take incredibly bright, creative people right out of college and put them through a robust internship program, according to Patil.
“To build teams that create great data products, you have to find people with the skills and the curiosity to ask the big questions,” Patil notes. “You have to build cross-disciplinary groups with people who are comfortable creating together, who trust each other, and who are willing to help each other be amazing. It’s not easy, but if it were easy, it wouldn’t be as much fun.”
- Subscribe to our blog to stay up to date on the latest insights and trends in data analysis and the role of the data scientist.
- To hear how organizations that have adopted in-memory computing can analyze larger amounts of data faster than their competitors, watch our on-demand webcast, “In-Memory Computing: Lifting the Burden of Big Data,” presented by Nathaniel Rowe, Research Analyst, Aberdeen Group, and Michael O’Connell, PhD, Sr. Director, Analytics, TIBCO Spotfire.
- Download a copy of the Aberdeen In-Memory Big Data whitepaper here.
Spotfire Blogging Team