I have been working in a “Data Science” consulting team for 8 months now. Before that I got a Master’s degree in Computer Science with a machine learning specialization. So I could argue that I have seen both sides of the coin here and I have noticed some differences.
Disclaimer: You will have trouble finding two data science jobs with the same requirements and tasks, so this is a highly subjective discussion.
With that out of the way, let’s do some research and chat about what I think makes data science differ slightly from machine learning.
Table of Contents
- Definitions of data science and machine learning
- Data science is not a science
- Data science is mainly about data – duh
- Data science needs to be actionable, not highly accurate
Definitions of data science and machine learning
I have picked two definitions here. No matter what your opinion is, you can probably find definitions that support your viewpoint, but the following two highlight some of the points I have observed.
Data science combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI), and machine learning with specific subject matter expertise to uncover actionable insights hidden in an organization’s data. (Source: IBM)
Let’s look at machine learning for contrast:
Machine learning (ML) is a field of inquiry devoted to understanding and building methods that ‘learn’, that is, methods that leverage data to improve performance on some set of tasks. (Source: Wikipedia)
Data science is not a science
Before I put “data science” into quotes one more time, let’s get this out of the way. Data science is a really unfortunate term and I think it sends some wrong signals. In this field we use some scientific methods to achieve our goal of analyzing data. We are not performing science; we are at best applying science.
I would say it is similar to engineering, which also uses scientific principles to design and build structures, but sadly the term “data engineer” is already taken by the people who design and build data warehouses and other data infrastructure.
For lack of a better term, I vote for “data detective” as our new title, but I’m open to suggestions here 😉
Machine learning, on the other hand, is a subfield of computer science, which differs from the natural sciences but shares aspects with mathematics. And at least the areas concerned with understanding and explaining the behaviour of models are closely related to research in mathematics.
But then again, there are people who call maths an art (some degrees in mathematics are a Bachelor of Arts), so who knows. Maybe we are “data artists” after all…
Data science is mainly about data – duh
Now this might be kind of obvious, but in my opinion the key aspect of data science is that it always starts with data.
Machine learning starts with a general class of problems that need an algorithm to solve them efficiently, but often these methods will then be tested on a range of benchmark datasets before they are deemed any good.
In data science we get one dataset or data source from a company or client, and we can fine-tune anything and everything we can get our hands on to this data, because our only goal is to get the most insight out of it.
Data science needs to be actionable, not highly accurate
This is especially true in my position in consulting – I can’t speak for my colleagues in more product-centric roles. The results of our machine learning and statistical analyses need to point to clear actions. Often data science projects are the basis for deciding on a new business strategy or adjusting the parameters of a product or sales effort. An example would be which clients to take on as a business or which form of advertising to choose.
When stakeholders are making decisions about people, they want to know how a decision was reached, and in a lot of cases they might even be required by law to know this, to make sure no discrimination is taking place due to a bias in the data or model.
This means that often the explainability or ability to deduce simple actions from the model is valued higher than the accuracy of the model, though of course accuracy is always desired.
Machine learning seems to be more focused on building the best and most robust method and product.
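To make the point about explainability a bit more concrete, here is a minimal sketch of the kind of model a stakeholder can read in one sentence: a single-threshold rule (often called a decision stump). Everything in it is an assumption for illustration: the client table, the feature names `budget_keur` and `past_disputes`, the labels, and the helpers `predict`, `fit_stump` and `describe` are all made up and do not come from a real project.

```python
# Hypothetical toy data: one row per past client.
# Columns: budget_keur, past_disputes, label (1 = "decline", 0 = "take on").
# Every name and number here is made up for illustration.
FEATURES = ["budget_keur", "past_disputes"]
LABELS = {0: "take on", 1: "decline"}
ROWS = [
    (20, 5, 1), (25, 4, 1), (30, 6, 1), (40, 1, 1), (60, 5, 1),
    (70, 2, 0), (75, 6, 1), (80, 1, 0), (85, 3, 0), (90, 0, 0),
]


def predict(rule, row):
    """Apply a one-threshold rule to a single row and return the predicted label."""
    value = row[FEATURES.index(rule["feature"])]
    if value > rule["threshold"]:
        return rule["above_label"]
    return 1 - rule["above_label"]


def fit_stump(rows):
    """Try every feature/threshold/direction and keep the most accurate rule."""
    best = None
    for idx, name in enumerate(FEATURES):
        values = sorted({row[idx] for row in rows})
        # Candidate thresholds are midpoints between neighbouring observed values.
        for low, high in zip(values, values[1:]):
            for above_label in (0, 1):
                rule = {
                    "feature": name,
                    "threshold": (low + high) / 2,
                    "above_label": above_label,
                }
                hits = sum(predict(rule, row) == row[-1] for row in rows)
                rule["accuracy"] = hits / len(rows)
                # Strict comparison: on ties the first rule found is kept.
                if best is None or rule["accuracy"] > best["accuracy"]:
                    best = rule
    return best


def describe(rule):
    """Turn the rule into a sentence a stakeholder can act on."""
    return (
        f"If {rule['feature']} > {rule['threshold']:g}, "
        f"{LABELS[rule['above_label']]}; "
        f"otherwise {LABELS[1 - rule['above_label']]} "
        f"(training accuracy {rule['accuracy']:.0%})."
    )


best_rule = fit_stump(ROWS)
print(describe(best_rule))
```

On this toy table the rule is not perfect, and a more flexible model could probably fit the data better. The trade-off is that the stump's whole decision logic fits in one sentence, which is often what the client actually needs.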