What is a Data Scientist?
A data scientist is a professional who uses a range of technical and domain-based skills to manage and analyze data in order to solve business problems. Data scientists are part mathematician, part business analyst, and part computer scientist. A good data scientist can spot trends and patterns in data and knows how to turn that data into helpful, actionable outcomes. Data scientists are at the forefront of modern businesses, transforming the way we work.
History of Data Scientists
In 2001, the statistician and computer scientist William S. Cleveland published an article titled ‘Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics’. The article proposed data science as a discipline that extends the work of applied statisticians. That was only about 20 years ago, and the technology and business worlds have moved quickly since then.
Because this is a relatively new career path, today’s data scientists come from a range of backgrounds and specialties. Many started their careers as statisticians, mathematicians, or data analysts. But as computers, artificial intelligence (AI), and machine learning tools have become widely accessible, the role has evolved. A data scientist is no longer confined to the IT department; they have become an integral part of the overall business. Because the role has expanded and now has a pivotal influence on the business, it requires a person with logical, innovative thinking who can translate data insights into business strategy.
What Qualifications Does a Data Scientist Need?
Over the last ten years, universities and other higher education institutions have developed courses specifically for data scientists. Those who want to work in the industry can now earn a bachelor’s or master’s degree in data science from a large number of universities.
Data science courses typically cover statistical modeling, data management, data visualization, machine learning, software engineering, data ethics, research design, and user experience. Students may learn the SQL query language along with programming languages such as Python, R, and Perl, and become familiar with big data tools such as Hadoop, MapReduce, Pig, Hive, and Spark.
However, as more open-source software and commercial data science tools become available, what people learn today may soon become obsolete. Data scientists therefore need to be agile and keep learning new skills and techniques throughout their careers.
Data Scientists Need More than Just a Degree
An excellent data scientist is curious, always seeking new information and new ways of thinking about business challenges. Strong intuition, balanced by the habit of demanding proof, is also an excellent trait for a data scientist. They need to be creative enough to find answers where none seemed to exist, continuously looking for new insights and outcomes.
Data scientists also need deep knowledge of the business domain they work in. Knowing data and programming is one thing; being able to turn the resulting insights into business strategy is another. They should be able to see risks and opportunities for the business and use data to develop strategies for growth. It is one thing to know that people buy more in a certain type of weather, but how can a business take advantage of that information? The role of the data scientist is to answer questions like this, which continuously push the business to new heights.
A great data scientist also needs excellent communication skills. They must be able to report back to stakeholders and managers and clearly explain the results of their analyses, including where the data was incomplete and what is needed to fill the gaps. They must also be able to persuade decision makers of the best course of action based on those results. Programs and techniques will change, but the ability to think critically, combined with strong quantitative and domain-specific skills, will always be in demand.
What Does a Data Scientist Do?
A data scientist takes data, forms hypotheses, draws inferences, and uses machine learning to detect patterns, relationships, and trends within that data. On any given day, they may be:
- Analyzing data sets
- Cleaning data
- Building dashboards and reports
- Visualizing data
- Making statistical inferences
- Developing statistical learning models
- Creating complex predictive models
- Using statistical tools
- Communicating results of analysis to stakeholders
- Convincing decision makers
Large retail companies can produce up to 40 petabytes of data each day. Their data scientists use that data to predict a range of outcomes, including when and where people buy certain items. This allows retailers to schedule events and promotions to maximize sales, and to set prices that protect their profit margins while still moving as much stock as possible.
Data scientists typically work in teams to mine big data for relevant information. They can also advise management on what type of data should be collected, how it should be analyzed, and how to interpret the results. A 2017 study found that data scientists spend 80 percent of their time on data management: finding, cleaning, and organizing data. That leaves only 20 percent of their working time for actual analysis. Even this is changing, however. As automated machine learning and deep learning tools take over much of the data cleansing and organization, data scientists are finding they have more time for analysis.
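To make the "finding, cleaning, and organizing" step more concrete, here is a minimal sketch in Python using only the standard library. The records, the field names (`customer_id`, `age`, `city`), and the cleaning rules (dropping exact duplicates, normalizing city names, filling missing ages with the median) are hypothetical examples chosen for illustration, not a standard procedure.

```python
from statistics import median

# Hypothetical raw records with typical problems: duplicates,
# missing or invalid values, and inconsistent text formatting.
raw_records = [
    {"customer_id": "C001", "age": "34", "city": " Boston "},
    {"customer_id": "C002", "age": "", "city": "boston"},
    {"customer_id": "C001", "age": "34", "city": " Boston "},  # exact duplicate
    {"customer_id": "C003", "age": "n/a", "city": "Chicago"},
    {"customer_id": "C004", "age": "51", "city": "CHICAGO"},
]


def parse_age(value):
    """Return the age as an int, or None if the field is missing or invalid."""
    try:
        return int(value)
    except ValueError:
        return None


def clean(records):
    seen = set()
    cleaned = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key in seen:  # drop exact duplicate rows
            continue
        seen.add(key)
        cleaned.append({
            "customer_id": record["customer_id"],
            "age": parse_age(record["age"]),
            # Normalize stray whitespace and inconsistent capitalization.
            "city": record["city"].strip().title(),
        })
    # Impute missing ages with the median of the ages we do know.
    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    fill_value = median(known_ages) if known_ages else None
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill_value
    return cleaned


for row in clean(raw_records):
    print(row)
```

Median imputation is only one of many possible choices; in practice the right rule depends on why the values are missing.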
Why is the Role of Data Scientists So Important?
For a business, a data scientist is invaluable. They take millions or even billions of data points and turn them into crucial information, making predictions that can help save or grow a business. The following are some examples from different industries.
Data scientists are a crucial part of marketing. For instance, a data scientist can create a set of triggers that alert the business when customers are at high risk of churn. In marketing, it is well known that the cost of finding a new customer hugely outweighs the cost of retaining an existing one. The triggers set up by the data scientist allow a company to step in, make changes, or speak to the customer to retain them.
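A simple version of such churn triggers can be sketched as a set of rules, with a customer flagged when enough of them fire. Everything here is an assumption for illustration: the `Customer` fields, the three trigger rules and their thresholds, and the `min_triggers=2` cutoff are hypothetical. In practice, a data scientist would usually derive these thresholds from historical data or replace them with a trained predictive model.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    days_since_last_purchase: int
    support_tickets_last_90_days: int
    monthly_spend_change_pct: float  # e.g. -40.0 means spend fell by 40%


# Hypothetical trigger rules: each maps a name to a test on one customer.
TRIGGERS = {
    "inactive": lambda c: c.days_since_last_purchase > 60,
    "many_complaints": lambda c: c.support_tickets_last_90_days >= 3,
    "spend_drop": lambda c: c.monthly_spend_change_pct <= -30.0,
}


def fired_triggers(customer):
    """Names of the trigger rules this customer currently matches."""
    return [name for name, rule in TRIGGERS.items() if rule(customer)]


def high_risk(customers, min_triggers=2):
    """Return (customer_id, triggers) for customers matching at least min_triggers rules."""
    flagged = []
    for c in customers:
        fired = fired_triggers(c)
        if len(fired) >= min_triggers:
            flagged.append((c.customer_id, fired))
    return flagged


customers = [
    Customer("C001", 10, 0, 5.0),
    Customer("C002", 75, 4, -10.0),
    Customer("C003", 90, 1, -45.0),
    Customer("C004", 20, 3, 0.0),
]

for customer_id, fired in high_risk(customers):
    print(customer_id, fired)
```

Returning the list of fired triggers, not just a yes/no flag, tells the retention team why a customer was flagged and what to talk to them about.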
Healthcare is a huge field with massive opportunities for data scientists. From managing rosters and staffing at optimal levels to identifying which patients are at high risk of not complying with their doctor’s orders, a data scientist can find thousands of opportunities to improve business practices and health outcomes.
The insurance and banking industries are saving billions of dollars each year by using data scientists to identify fraud risks. For instance, when a customer applies for a loan, a number of data points are collected about them. This information is processed and compared with known information about previous fraud cases, and the system can almost immediately indicate whether the applicant is a risk.
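One simple way to compare a new application with previous fraud cases is a k-nearest-neighbors check: find the past applications most similar to the new one and see what share of them were fraudulent. The sketch below is a toy illustration, not how any particular bank does it. The `HISTORY` data, the four features, min-max scaling, and `k=3` are all hypothetical choices. Real systems use far more data, more features, and usually more sophisticated models.

```python
import math

# Hypothetical past applications: (features, was_fraud).
# Features: loan amount (thousands), applicant age (years),
# months at current address, recent credit applications.
HISTORY = [
    ((50, 45, 120, 1), False),
    ((20, 30, 36, 0), False),
    ((35, 52, 200, 1), False),
    ((80, 23, 1, 6), True),
    ((95, 25, 2, 5), True),
    ((70, 28, 3, 7), True),
]


def min_max_bounds(rows):
    """(min, max) for each feature column."""
    return [(min(col), max(col)) for col in zip(*rows)]


def normalize(features, bounds):
    """Scale each feature to 0..1 so no single feature dominates the distance."""
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, (lo, hi) in zip(features, bounds)
    ]


def fraud_risk(applicant, history=HISTORY, k=3):
    """Share of the k most similar past applications that were fraud (0.0 to 1.0)."""
    bounds = min_max_bounds([features for features, _ in history])
    target = normalize(applicant, bounds)
    distances = sorted(
        (math.dist(target, normalize(features, bounds)), was_fraud)
        for features, was_fraud in history
    )
    nearest = distances[:k]
    return sum(was_fraud for _, was_fraud in nearest) / k


for applicant in [(85, 24, 2, 6), (30, 40, 100, 0)]:
    print(applicant, fraud_risk(applicant))
```

The score is a similarity-based signal, not proof of fraud; a real system would typically route high scores to a human reviewer.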
How to Become a Data Scientist
If you have a logical brain, can wrangle numbers, enjoy working with computers, and have a keen understanding of business, then a role as a data scientist may be your dream job.
The first step is getting a bachelor’s degree in computer science, statistics, or a related field. This degree will equip you with skills in:
- Math, in particular statistics
- Databases, data lakes and distributed storage
- Data cleansing techniques
- Data visualization and reporting
A bachelor’s degree gives you entry-level knowledge, but as the field grows, further qualifications or specializations will be required. Consider a master’s degree in data science or a related field, and start drilling down into a specific business domain that interests you.
Once you have the qualifications, the next step is getting experience in your field of interest. Healthcare, marketing, government, and business all offer excellent prospects for specialization. While the skills of a data scientist can be taught, understanding the relationships between the data and their real-world implications takes experience and time spent in the business.
Challenges Facing Data Scientists
Partly because data science is a new industry, data scientists face some challenges. It is a male-dominated industry, and as in many STEM (science, technology, engineering, and mathematics) careers, women sometimes face extra hurdles in entering the field and building their careers. In 2019, only 18 percent of data scientists were women. This lack of diversity is beginning to create problems in the field. Algorithms are created by humans and are susceptible to bias. For instance, in banking, a lending algorithm may count being a single woman against an applicant, even though data shows that women are better than men at repaying loans. As a result, banks may be missing out on their best customers, and women may be missing out on financial security and independence. A more diverse workforce will help to catch and correct these errors and biases.
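A common first check for this kind of bias is to compare a model's approval rates across groups. The sketch below computes per-group approval rates and the ratio of the lowest rate to the highest. The decision data is entirely hypothetical, and the 0.8 threshold is the "four-fifths" rule of thumb borrowed from US employment-selection guidelines, used here only as an illustrative warning level, not a legal or statistical standard for lending.

```python
from collections import defaultdict


def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> {group: approval rate}."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Lowest group approval rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Hypothetical decisions produced by a loan-approval model.
decisions = (
    [("men", True)] * 60 + [("men", False)] * 40
    + [("women", True)] * 42 + [("women", False)] * 58
)

rates = approval_rates(decisions)
ratio = disparate_impact_ratio(rates)
print(rates, round(ratio, 2))
if ratio < 0.8:  # illustrative "four-fifths" warning level
    print("Warning: approval rates differ enough to warrant a closer look")
```

A low ratio does not prove the model is unfair, and a high one does not prove it is fair; it simply tells the team where to look more closely, for example at whether repayment outcomes justify the difference.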
Because the industry is new, it also lacks a consistent vocabulary and standards of practice. Such standards are expected to emerge from a consensus among stakeholders, data scientists themselves, and legislators, but none has been established yet.
There is also a push for explainable AI, meaning models whose outputs can be interpreted easily. Thought leaders argue that a prediction should not be a figure drawn from thin air; instead, it should be possible to trace and clearly explain the logic behind the machine learning model that produced it.
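The simplest way to make a prediction traceable is to use a model that is interpretable by design, such as a linear score, where each feature's contribution to the result can be read off directly. In the sketch below, the feature names, `WEIGHTS`, and `BASELINE` are hypothetical values chosen for illustration. More complex models generally need separate explanation techniques to produce a comparable breakdown.

```python
# Hypothetical weights for a simple linear credit score.
WEIGHTS = {
    "years_employed": 2.0,
    "debt_to_income_pct": -0.5,
    "missed_payments": -10.0,
}
BASELINE = 50.0


def score_with_explanation(applicant):
    """Return the score plus each feature's exact contribution to it."""
    contributions = {
        name: weight * applicant[name] for name, weight in WEIGHTS.items()
    }
    score = BASELINE + sum(contributions.values())
    return score, contributions


applicant = {"years_employed": 6, "debt_to_income_pct": 30, "missed_payments": 1}
score, contributions = score_with_explanation(applicant)

print(f"score = {score}")
# List contributions from most to least influential.
for name, value in sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True):
    print(f"  {name:>20}: {value:+.1f}")
```

Because the score is just the baseline plus the listed contributions, anyone reviewing the decision can see exactly which factors pushed it up or down and by how much.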
Current Outlook for Data Scientists
Like professionals in most STEM careers, data scientists are highly sought after and valued employees. There is currently a shortage of qualified data scientists with the appropriate analytical skills. With higher-than-average salaries, a rapidly growing market, and increasing recognition of their value, data scientists have excellent employment options. In 2018, there was a shortage of about 151,000 people with data science skills in the United States, a sign that the field is secure and growing.
People from under-represented groups, in particular, are being encouraged to enter the field. Some universities offer incentives for members of these groups to join data science programs, and companies increasingly recognize that diversity is needed for unbiased outcomes, making data science an attractive and stable employment option.