
Implementing data science projects – the five essential skill sets

 Posted by Archisman Majumdar on March 17, 2016 at 11:30pm

The future of business, it is argued, is digital. At the core of this digital transformation is
the ability to harness data in enabling better business decisions. Typically, organizations
have teams of experts who work on existing data sets to apply diverse analytic tools and
techniques to make sense of the data. The more statistically advanced among these
teams work on typical ‘data science’ problems: problems in which sophisticated algorithms
are applied to large data sets to derive business-relevant insights. These involve either
the application of advanced statistical techniques (such as support vector machines or
artificial neural networks) or the handling of very large data sets (running into
petabytes, for example). Whether you are just starting off or are a pro,
chances are that you have a couple of parallel data science projects going on in the
organization already.

The implicit promise is that all your data, which had been lying idle for so long, can now
start contributing to better business decisions. All you need to do is gather data from
these multiple sources, analyze it, and generate business-relevant insights. Yet you
stumble upon new challenges every day in managing these projects, defining the
outputs, and trying to link the outputs to the business goals. So how exactly is a ‘data
science’ project implemented? What are the key skills you or your team need for creating
such data science solutions?

In this post, I highlight the aspects that, in my opinion, are essential for driving
successful data analytics solutions and thinking. I identify five key skills/personalities
that are central to the success of any data science endeavor, and explain why a
combination of these skills is essential for deriving business-relevant insights and
creating scalable solutions.

The Five towers of a data science solution

Data science projects generally require three essential skills – statistics/machine
learning skills, business skills, and coding skills. It is not a stretch to say that a person
possessing all the relevant skills is extremely rare – a unicorn, in data science
parlance. Most people have some combination of one or more of these skills. It is in
this context that identifying the right skill composition of a data science team, and
finding ways for its members to collaborate, becomes essential. Further, two additional
skills that people often overlook are data visualization skills and project/product
management skills.

In the following sections, I briefly describe each of the profiles that I believe every data
science project requires.

1. The Professors (or the Algorithms team)

This is the person responsible for all your algorithms and for implementing them in a
statistical computing language. She is typically a PhD or master’s degree holder with
relevant experience in creating and handling data models and in advanced
statistical/machine learning techniques. Her key skills include one or more of R, Python,
algorithms, and machine learning. The professor and her team are key to identifying the
new and existing algorithms that can help you generate insights and do things with data
that you never thought were possible.

Yet the models developed by this team can quickly become very difficult to implement
(imagine a combination of natural language processing, machine learning, and social
network analysis in a single module) and impossible to scale and deploy unless supported
by some other key skill sets.

2. The Data Nerds (or the Big Data team)

This is the person who can handle loads of (big) data without batting an eyelid. Typically
able to find any needle in any haystack, the big data person builds castles and
databases in the cloud. She is proficient in skills like ETL, big data, and cloud
computing platforms. These people form the backbone of data science projects and
are key to making solutions scalable and deployable. It is often said that almost 80% of
the time in any analytics project is spent on gathering, cleaning, and massaging the data.
Linking this to the business objectives is the obvious next step, and this is where the
domain expert makes her entrance.
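
To make the "gathering, cleaning, and massaging" step concrete, here is a minimal sketch in Python using only the standard library. The sample data, the column names, and the `clean_rows` helper are all hypothetical, invented for illustration; real projects typically rely on dedicated ETL tooling and far messier inputs.

```python
import csv
import io

# Hypothetical raw export (an assumption for this sketch): inconsistent
# casing, stray whitespace, a missing value, and a duplicate record.
RAW = """customer,region,revenue
 Acme ,north,1200
acme,North,1200
Globex,south,
Initech, East ,950
"""


def clean_rows(text):
    """Normalize text fields, drop rows without revenue, remove duplicates."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        customer = row["customer"].strip().title()
        region = row["region"].strip().lower()
        revenue = row["revenue"].strip()
        if not revenue:
            continue  # skip rows with a missing revenue value
        key = (customer, region)
        if key in seen:
            continue  # skip duplicates of an already-seen customer/region pair
        seen.add(key)
        cleaned.append(
            {"customer": customer, "region": region, "revenue": float(revenue)}
        )
    return cleaned


rows = clean_rows(RAW)
for r in rows:
    print(r)
```

Even in this toy case, most of the code deals with inconsistencies in the input rather than with any analysis, which is the point behind the 80% figure quoted above.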

3. The Suits (or the Domain Experts)

While adequate expertise in the first two skill sets ensures great data models that work,
you will need a domain expert or a business person to actually put them to (your clients’)
use. Typically, this person is well versed in industry-specific analytics and
measures, and has excellent communication and presentation skills. This is the person
who typically has an MBA background and/or years of industry experience.

4. The Data Designers (or the Visualization and Design team)

Another increasingly important aspect of a data science project is the design and
visualization of the results and analysis. This is essential since you are trying to present
sophisticated analysis to people who may not have experience, training, or interest in
statistical and data science methods. Add to this the fact that all your outputs now need
to be responsive, i.e., display equally well on a laptop, tablet, or mobile phone. The
outputs and insights you generate must be a natural part of the workday of the end user.
Thus, understanding the user journey, the personas, and the user interactions becomes
crucial. You may not be building the next Apple, but a reasonably intuitive interface is
still essential, especially if you are building guided analytics projects.

5. The Cat Herders (or the Product Managers)

The product manager needs to manage this diverse group and get its members to work
together and agree on key points. The product manager for data science projects needs to
be an all-rounder with program management, client interfacing, data science, and team
management skills. Experience in herding cats is a bonus. These are the people who
need to understand the data models as well as the end user, and guide the outcomes of
the cross-functional team towards measurable business goals.

Very often, organizations form teams that consist of experts with only a subset of
these skills. The cycle of data gathering, data cleaning, model building, model
optimizing, and report generation is not just interesting but also very addictive. Getting
absorbed in it is also one of the most oft-repeated mistakes in the data science world. To
avoid this, make sure you do not lose sight of the ‘whys’ by concentrating too much on the
‘hows’. A correctly balanced team is one of the basic prerequisites on your journey towards
solving ever more sophisticated and challenging data science problems.