11/16/2015

For data scientists, the big money is in open source - TechRepublic

BIG DATA

For data scientists, the big money is in open source


Data scientists that focus on open source technologies make more money than those
dealing in proprietary technologies.
By Matt Asay | in Big Data Analytics, January 21, 2014, 10:09 AM PST

Big data means big compensation for data scientists. But the kind of data scientist you are
largely determines just how big your paycheck will be. As a new O'Reilly survey
(http://www.oreilly.com/data/free/files/stratasurvey.pdf) reveals, data scientists that focus
on open source technologies make more money than those still dealing in proprietary
technologies. The more open source software you know, the more money you stand to
make in big data.
Big data, big money

Given the level of interest in Big Data, it's not surprising that enterprises are willing to pay
hefty salaries to recruit top talent, particularly given the difficulty in sourcing such talent. In
2012 NewVantage Partners canvassed
(http://newvantage.com/wp-content/uploads/2012/12/NVP-Big-Data-Survey-Themes-Trends.pdf)
a relatively small but highly qualified group of executives at large organizations and found
that 100 percent of those surveyed were at least "somewhat challenged" to recruit data
scientists. A full 40 percent are finding it "very difficult" or "impossible."
Against such scarcity, data scientists are priced at a premium.
According to Glassdoor data
(http://www.glassdoor.com/Salaries/data-scientist-salary-SRCH_KO0,14.htm), the median
salary for data scientists in the United States is $117,500. By contrast, a business analyst
can expect to make around $61,000
(http://www.glassdoor.com/Salaries/business-analyst-salary-SRCH_KO0,16.htm) and a data
analyst around $55,000
(http://www.glassdoor.com/Salaries/data-analyst-salary-SRCH_KO0,12.htm).

Gartner analyst Svetlana Sicular pokes fun at the whole category of data science, laughing
that "a data scientist is 1) a data analyst in California or 2) a statistician under 35."

http://www.techrepublic.com/blog/big-data-analytics/data-scientists-can-find-big-money-in-open-source/


That's a big price jump for polishing one's job title.


Tools of the data science trade

In reality, there's more to being a data scientist than simply upgrading one's job title. As
O'Reilly's 2013 Data Science Salary Survey
(http://www.oreilly.com/data/free/files/stratasurvey.pdf) suggests, "the field of big data has
ushered in the arrival of new, complex tools that relatively few people understand or have
even heard of." Knowing those tools is what yields such outsized salaries.
But which tools a data scientist masters turns out to have a large, material impact on her
earning power.
The top data tool by far is SQL, which isn't surprising: data analysis was around long
before we gave it a sexy "data science" label, and accessing data through SQL has long
been the standard for data analysis. This isn't changing overnight.

Figure: 2013 Data Science Salary Survey (Credit: O'Reilly)

But once we move beyond SQL, it's telling just how many of the most widely used Big Data
tools are open source: R, Python, Hadoop and more. More interesting, however, is the
bifurcation between what O'Reilly calls "the Hadoop group" (orange) and the "SQL/Excel
group" (blue):

Figure: 2013 Data Science Salary Survey (Credit: O'Reilly)


Data scientists who use one group of tools don't use the other: the industry is roughly split
into the two camps, with the red group essentially forming a periphery around the Hadoop
group. As the O'Reilly report suggests, "The two clusters have no tools in common and are
quite distant in terms of correlation: only four positive correlations exist between the two
sets (mostly through Tableau), while there are a whopping 51 negative correlations."
The money is in open

While it is somewhat interesting that data scientists split along party lines - Hadoop vs.
SQL, open vs. closed - the more interesting observation in O'Reilly's report is just how
much this divide translates into salary differentials.
The more data tools a data scientist uses, the more her salary rises. Once a data scientist
uses at least 10 tools, her salary grows considerably:

Figure: 2013 Data Science Salary Survey (Credit: O'Reilly)


Interestingly, those in the open source/Hadoop cluster tend to use far more tools and,
hence, stand to make considerably more money. As the report authors point out, "Median
base salary generally rises with the number of tools used from the Hadoop cluster, from
$85k for those who do not use any such tools to $125k for those who use at least six." For
those in proprietary/SQL land, using five or more tools from the proprietary cluster leads
to a significant drop in salary.
While there are ways to explain away the apparent divergence in salaries, the authors
conclude:
"It seems very likely that knowing how to use tools such as R, Python, Hadoop frameworks,
D3, and scalable machine learning tools qualifies an analyst for more highly paid positions
more so than knowing SQL, Excel, and RDB platforms. We can also deduce that the
more tools an analyst knows, the better: if you are thinking of learning a tool from the
Hadoop cluster, it's better to learn several."
And then they make a highly telling point:
"The tools in the Hadoop cluster share a common feature: they all allow access to large
data sets and/or support analysis of large data sets. The demand for analysts who know
how to work with large data sets is growing, in particular for those who can perform more
advanced machine learning, graph and real-time tasks on large data sets. Until the supply
of such analysts catches up, their salaries will naturally be bid up."
In other words, the open source tools may be better suited for handling large data sets,
whereas the proprietary tools tend to have a narrower, query-based utility. Furthermore,
tools like Python and R give a user wide latitude to shape data analytics, rather than living
within the constraints that a proprietary vendor provides.
What this may mean is that taking the SQL/Excel route is a decent way to plod along with
old-school data analysis, but if you want to really go deep into data science, and get paid
handsomely for your efforts, you really need to go open with Hadoop, Python, NoSQL and
other leading open-source big data tools.

About Matt Asay


Matt Asay is a veteran technology columnist who has written for CNET, ReadWrite, and
other tech media. He is currently VP of Mobile at Adobe. Previous positions include VP of
business development and marketing at MongoDB and COO at Canonical, the Ubu...