Data Scientists Versus Statisticians

19/6/2019 Data Scientists Versus Statisticians – ODSC - Open Data Science – Medium

ODSC - Open Data Science
May 9 · 5 min read
Ever since the term “data scientist” came onto the tech scene, there’s
been a cross-generational debate raging, attempting to define and
distinguish newly branded data scientists and traditional statisticians. I
personally adopted the data scientist title around 2012, and I recall a
rather pithy definition floating across the Twittersphere around that time:
“A data scientist is someone who is better at statistics than any software
engineer and better at software engineering than any statistician.”
In a more serious light, data science is often defined as the confluence
of three areas: computer science, mathematics/statistics, and specific
domain knowledge. Implicit in this definition is the focus on solving
specific problems, in contrast with the type of deep understanding that
is typical in academic statistics.
In this article, we’ll take yet another look at the data
scientist/statistician kerfuffle to see if we can find some common
ground and maybe even a common endpoint.
Data Science or Statistics?
It seems that the designation “data scientist” has taken the world by
storm. It’s a title that conjures up almost mystical abilities of a person
garnering information from deep data lakes with ease. It comes from a
belief that a data scientist can wave his or her hand like a 21st century
Houdini and effortlessly extract insights from the data.
What’s intriguing about the field of data science is its perceived threat
to other disciplines, specifically statistics. I don’t see this threat as
real, however, because the two fields are distinct and complementary. Over
the past decade, it has become clear that although the two fields can exist
on their own, each is weak without the other. Statisticians need to
understand the modeling and structure of data, while data scientists
need to understand applied statistics.
It’s no wonder that statisticians feel threatened by data scientists to a
certain degree. Statisticians deal with nebulous concepts like point
estimates, margins of error, confidence intervals, standard errors,
p-values, hypothesis testing, and the proverbial argument between the
“frequentists” and “Bayesians.” Statisticians can come across as confusing
to the general public, and many times they can’t even agree among
themselves on what is correct.
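To make a few of these terms less nebulous, here is a minimal sketch of how a point estimate, standard error, margin of error, and confidence interval relate for a sample mean. The sample values and the function name `mean_confidence_interval` are made up for illustration, and the interval uses a normal approximation; with a sample this small, a Student-t critical value would give a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (point estimate, standard error, margin of error, interval)."""
    n = len(sample)
    point_estimate = mean(sample)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value under a normal approximation (an assumption).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin_of_error = z * standard_error
    interval = (point_estimate - margin_of_error, point_estimate + margin_of_error)
    return point_estimate, standard_error, margin_of_error, interval


# Hypothetical measurements, invented for this example.
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.1]
estimate, se, moe, (low, high) = mean_confidence_interval(sample)
print(f"estimate={estimate:.3f}, SE={se:.3f}, margin={moe:.3f}, "
      f"95% CI=({low:.3f}, {high:.3f})")
```

The margin of error is simply the half-width of the interval, which is why the two are so often reported together.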
Data scientists, on the other hand, closely follow the “data science
process,” which is more approachable: data ingest, data transformation,
exploratory data analysis, model selection, model evaluation, and data
storytelling. Sure, many of these steps rely on statistical methods
behind the scenes, but they’re packaged in a more engaging and
understandable wrapper. Many more people can embrace data science.
To be sure, there will always be a need for a solid foundation in
statistics. There are many cases where a data scientist would not have a
clue what to do with certain data sets without help from someone with
a background in statistics. At the same time, if a statistician were handed
a high-dimensional data set with 5 billion rows and 10,000 variables,
they’d be hard-pressed to set up the data for analysis without consulting
a data scientist.
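One technique that helps with data too large to load at once is to stream it and keep only running summaries. The sketch below uses Welford's online algorithm to track the mean and variance of one column in a single pass with constant memory. The class name `RunningStats`, the helper `summarize_stream`, and the tiny in-memory "file" are hypothetical; a real data set of that size would also need distributed storage and processing that this example does not attempt.

```python
import csv
import io


class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two values.
        return self._m2 / (self.count - 1) if self.count > 1 else float("nan")


def summarize_stream(lines, column):
    """Consume CSV lines one at a time without holding them in memory."""
    stats = RunningStats()
    for record in csv.DictReader(lines):
        stats.update(float(record[column]))
    return stats


# A small in-memory stand-in for a file too large to load at once.
fake_file = io.StringIO("value\n" + "\n".join(str(v) for v in range(1, 101)))
stats = summarize_stream(fake_file, "value")
print(stats.count, stats.mean, stats.variance)
```

The same pattern extends to many columns by keeping one `RunningStats` per variable.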
Ultimately, the two disciplines need to find some common ground.
Statistics departments should make working with real-world data part of
their curriculum, and those working in data science need appropriate
training in statistics.
[Related article: What Will the Next Generation of Data Scientists Look
Like?]
Further Comparing and Contrasting
Although data scientists and statisticians tend to gather information for
similar purposes, their means of data collection are quite different. On
one hand, the amount of data that data scientists handle is often massive;
consequently, they spend a lot of time on tasks like large-scale data
ingest, data cleansing, and transformation. Statisticians, on the other
hand, still rely on more traditional and smaller-scale methods of data
collection, such as surveys, polls, and experiments.
Typically, data science problems are formulated using a modeling
process that focuses on the predictive accuracy of the model. Data
scientists do this by comparing the predictive accuracy of different
machine learning algorithms and selecting the model with the best
accuracy. Statisticians take a different approach to building and testing
their models. The starting point in statistics is usually a simple model,
such as linear regression, and the data is checked to determine
whether it is consistent with the assumptions of the model. The model
is then improved by addressing any assumptions that are violated.
The modeling process is considered complete when all model
assumptions have been checked and none is violated.
While data scientists focus on comparing a number of different
methods to create the best machine learning model, statisticians instead
work to improve a single, simple model to best fit the data.
Statisticians tend to focus more on quantifying uncertainty than data
scientists do. As part of the statistical model-building process, it’s common
to quantify the connection between the outcome being predicted and
each predictor. Any uncertainty about this connection is also quantified.
This process is not as common with the tools used by data scientists,
namely machine learning.
The two fields also use somewhat different nomenclature to describe
the same principles. Data scientists say “example” where statisticians
say “observation,” “feature” where statisticians say “predictor” or
“independent variable,” and “label” where statisticians say “response”
or “dependent variable.”
[Related article: The Difference Between Data Scientists and Data
Engineers]
Conclusion
At present, the fields of data science and statistics differ in a
number of ways: in modeling processes, the size of
data consumed, the types of problems studied, the academic
background of the people in the field, and the terminology used. At the
same time, the fields are closely related in the sense that both data
science and statistics aim to extract knowledge from data.
Given time, the fields of data science and statistics will likely converge
on a common endpoint. Statisticians have been gathering data
and applying analysis techniques like linear regression for
centuries. As more statisticians pick up skills like
implementing algorithms that learn from data and provide predictions
and actions, and as more data scientists pick up statistical science
(sampling, experimental design, confidence intervals, p-values, etc.),
the boundary between data scientists and statisticians will
blur.
