Вы находитесь на странице: 1из 7

3/14/2017 100DataScienceinPythonInterviewQuestionsandAnswersfor2017

Call us 1-844-696-6465 (US Toll Free) Home Courses Blog Tutorials Interview Questions Online Hackathons Student Portfolios Sign In

Build Projects, Learn Skills, Get Hired REQUEST INFO

Upcoming Live Data Science training

100 Data Science in Python Interview $399

26 Sat and Sun (6 weeks)

Questions and Answers for 2017 Mar 7:00 AM - 10:00 AM PST LEARN MORE

30 Dec 2015
23 Sat and Sun (6 weeks)
Apr 7:00 AM - 10:00 AM PST LEARN MORE

Pythons growing adoption in data science has pitched it as a competitor to R

programming language. With its various libraries maturing over time to suit all data
science needs, a lot of people are shifting towards Python from R. This might seem like
the logical scenario. But R would still come out as the popular choice for data
scientists. People are shifting towards Python but not as many as to disregard R
altogether. We have highlighted the pros and cons of both these languages used in
Data Science in our Python vs R article. It can be seen that many data scientists learn
both languages Python and R to counter the limitations of either language. Being
prepared with both languages will help in data science job interviews.

CLICKHERE to get thedata scientist salary report delivered to your inbox!

Python is the friendly programming language that plays well with everyone and runs
on everything. So it is hardly surprising that Python oers quite a few libraries that
deal with data eciently and is therefore used in data science. Python was used for
data science only in the recent years. But now that it has rmly established itself as an
important language for Data Science, Python programming is not going anywhere.
Mostly Python is used for data analysis when you need to integrate the results of data
analysis into web apps or if you need to add mathematical/statistical codes for

In our previous posts 100 Data Science Interview Questions and Answers (General)
and 100 Data Science in R Interview Questions and Answers, we listed all the
questions that can be asked in data science job interviews. This article in the series,
lists questions which are related to Python programming and will probably be asked in
data science interviews.

Data Science Python Interview

Questions and Answers
https://www.dezyre.com/article/100datascienceinpythoninterviewquestionsandanswersfor2017/188 1/7
3/14/2017 100DataScienceinPythonInterviewQuestionsandAnswersfor2017

The questions below are based on the course that is taught at DeZyre Data Science
Call us 1-844-696-6465 (US Toll Free) Home Courses Blog Tutorials Interview Questions Online Hackathons Student Portfolios Sign In
in Python. This is not a guarantee that these questions will be asked in Data Science
Relevant Courses
Interviews. The purpose of these questions is to make the reader aware of the kind of
Build Projects, Learn Skills, Get Hired REQUEST INFO
knowledge that an applicant for a Data Scientist position needs to possess. Hadoop Online Training

Apache Spark Training

Data Science Interview Questions in Python are generally scenario based or problem
based questions where candidates are provided with a data set and asked to do data Data Science in Python Training

munging, data exploration, data visualization, modelling, machine learning, etc. Most of Data Science in R Language Training
the data science interview questions are subjective and the answers to these
Salesforce Certication Training
questions vary, based on the given data problem. The main aim of the interviewer is to
NoSQL Database Training
see how you code, what are the visualizations you can draw from the data, the
conclusions you can make from the data set, etc. Hadoop Admin Training

1) How can you build a simple logistic regression model in Python?

2) How can you train and interpret a linear regression model in SciKit learn?

3) Name a few libraries in Python used for Data Analysis and Scientic You might also like
Top 100 Hadoop Interview Questions and
NumPy, SciPy, Pandas, SciKit, Matplotlib, Seaborn Answers 2017

4) Which library would you prefer for plotting in Python language: Seaborn or Pig Interview Questions and Answers

Matplotlib? Hive Interview Questions and Answers

Matplotlib is the python library used for plotting but it needs lot of ne-tuning to HBase Interview Questions and Answers

ensure that the plots look shiny. Seaborn helps data scientists create statistically MapReduce Interview Questions and
and aesthetically appealing meaningful plots. The answer to this question varies Answers

based on the requirements for plotting data. HDFS Interview Questions and Answers

5) What is the main dierence between a Pandas series and a single-column Real-Time Hadoop Interview Questions

DataFrame in Python? and Answers

Hadoop Admin Interview Questions and

6) Write code to sort a DataFrame in Python in descending order.

7) How can you handle duplicate values in a dataset for a variable in Python? Basic Hadoop Interview Questions and
8) Which Random Forest parameters can be tuned to enhance the predictive
Apache Spark Interview Questions and
power of the model?

9) Which method in pandas.tools.plotting is used to create scatter plot

Data Analyst Interview Questions and
matrix? Answers

Scatter_matrix 100 Data Science Interview Questions and

Answers (General)
10) How can you check if a data set or time series is Random?
100 Data Science in R Interview Questions
and Answers
To check whether a dataset is random or not use the lag plot. If the lag plot for the
given dataset does not show any structure then it is random. 100 Data Science in Python Interview
Questions and Answers
11) Can we create a DataFrame with multiple data types in Python? If yes,
Taming Big Data with Spark Streaming for
how can you do it?
Real-time Data Processing

12) Is it possible to plot histogram in Pandas without calling Matplotlib? If Recap of Data Science News for February
yes, then write the code to plot the histogram? 2017

Recap of Apache Spark News for February

13) What are the possible ways to load an array from a text data le in
Python? How can the eciency of the code to load data le be improved?
Recap of Hadoop News for February 2017
numpy.loadtxt ()
What is a data science platform and why

14) Which is the standard data missing marker used in Pandas? does your business need one?

Dierence between Data Analyst and Data


15) Why you should use NumPy arrays instead of nested Python lists? Emerging Big Data Trends for 2017

16) What is the preferred method to check for an empty array in NumPy? Recap of Data Science News for January
17) List down some evaluation metrics for regression problems.
https://www.dezyre.com/article/100datascienceinpythoninterviewquestionsandanswersfor2017/188 2/7
3/14/2017 100DataScienceinPythonInterviewQuestionsandAnswersfor2017

18) Which Python library would you prefer to use for Data Munging? Recap of Apache Spark News for January
Call us 1-844-696-6465 (US Toll Free) Home Courses Blog Tutorials Interview Questions 2017
Online Hackathons Student Portfolios Sign In
Recap of Hadoop News for January 2017
Build Projects, Learn Skills, Get Hired REQUEST INFO
19) Write the code to sort an array in NumPy by the nth column?

Using argsort () function this can be achieved. If there is an array X and you would
like to sort the nth column then code for this will be x[x [: n-1].argsort ()]

20) How are NumPy and SciPy related? Blog Categories

21) Which python library is built on top of matplotlib and Pandas to ease data Big Data

plotting? CRM

Seaborn Data Science

22) Which plot will you use to access the uncertainty of a statistic? Mobile App Development

NoSQL Database
Web Development
23) What are some features of Pandas that you like or dislike?

24)Which scientic libraries in SciPy have you worked with in your project?

25) What is pylab?

A package that combines NumPy, SciPy and Matplotlib into a single namespace. Tutorials

26)Which python library is used for Machine Learning? Hadoop Online Tutorial Hadoop HDFS
Commands Guide
MapReduce TutorialLearn to implement

Learn Data Science in Python to become an Enterprise Data Scientist Hadoop WordCount Example

Hadoop Hive Tutorial-Usage of Hive

Basic Python Programming Interview Questions Commands in HQL

Hive Tutorial-Getting Started with Hive

27) How can you copy objects in Python?
Installation on Ubuntu

The functions used to copy objects in Python are- Learn Java for Hadoop Tutorial:
Inheritance and Interfaces
1) Copy.copy () for shallow copy
Learn Java for Hadoop Tutorial: Classes
2) Copy.deepcopy () for deep copy and Objects

Learn Java for Hadoop Tutorial: Arrays

However, it is not possible to copy all objects in Python using these functions. For
instance, dictionaries have a separate copy method whereas sequences in Python Apache Spark TutorialRun your First

have to be copied by Slicing. Spark Program

PySpark Tutorial-Learn to use Apache

28) What is the dierence between tuples and lists in Python?
Spark with Python

Tuples can be used as keys for dictionaries i.e. they can be hashed. Lists are mutable R Tutorial- Learn Data Visualization with R
whereas tuples are immutable - they cannot be changed. Tuples should be used when using GGVIS

the order of elements in a sequence matters. For example, set of actions that need to Neural Network Training Tutorial
be executed in sequence, geographic locations or list of points on a specic route.
Python List Tutorial

29) What is PEP8? MatPlotLib Tutorial

PEP8 consists of coding guidelines for Python language so that programmers can Decision Tree Tutorial

write readable code making it easy to use for any other person, later on. Neural Network Tutorial

30) Is all the memory freed when Python exits? Performance Metrics for Machine
Learning Algorithms
No it is not, because the objects that are referenced from global namespaces of
R Tutorial: Data.Table
Python modules are not always de-allocated when Python exits.
SciPy Tutorial
31) What does _init_.py do?
Step-by-Step Apache Spark Installation

_init_.py is an empty py le used for importing a module in a directory. _init_.py Tutorial

provides an easy way to organize the les. If there is a module Introduction to Apache Spark Tutorial
maindir/subdir/module.py,_init_.py is placed in all the directories so that the module
R Tutorial: Importing Data from Web
can be imported using the following command-

https://www.dezyre.com/article/100datascienceinpythoninterviewquestionsandanswersfor2017/188 3/7
3/14/2017 100DataScienceinPythonInterviewQuestionsandAnswersfor2017

import maindir.subdir.module R Tutorial: Importing Data from Relational

Call us 1-844-696-6465 (US Toll Free) Home Courses Blog Tutorials Interview Questions DatabaseStudent Portfolios
Online Hackathons Sign In
32) What is the dierent between range () and xrange () functions in Python?
R Tutorial: Importing Data from Excel
Build Projects, Learn Skills, Get Hired REQUEST INFO
range () returns a list whereas xrange () returns an object that acts like an iterator for Introduction to Machine Learning Tutorial
generating numbers on demand.
Machine Learning Tutorial: Linear
33) How can you randomize the items of a list in place in Python?
Machine Learning Tutorial: Logistic
Shue (lst) can be used for randomizing the items of a list in Python Regression

34) What is a pass in Python? Support Vector Machine Tutorial (SVM)

Pass in Python signies a no operation statement indicating that nothing is to be done. K-Means Clustering Tutorial

dplyr Manipulation Verbs

35) If you are gives the rst and last names of employees, which data type in
Python will you use to store them? Introduction to dplyr package

Importing Data from Flat Files in R

You can use a list that has rst name and last name included in an element or use
Dictionary. Principal Component Analysis Tutorial

Pandas Tutorial Part-3

36) What happens when you execute the statement mango=banana in Python?
Pandas Tutorial Part-2
A name error will occur when this statement is executed in Python.
Pandas Tutorial Part-1
37) Write a sorting algorithm for a numerical dataset in Python.
Tutorial- Hadoop Multinode Cluster Setup
on Ubuntu
38) Optimize the below python code-
Data Visualizations Tools in R
word = 'word'
R Statistical and Language tutorial
print word.__len__ ()
Introduction to Data Science with R
Answer: print word._len_ ()
Apache Pig Tutorial: User Dened Function
39) What is monkey patching in Python?
Apache Pig Tutorial Example: Web Log
Monkey patching is a technique that helps the programmer to modify or extend other Server Analytics
code at runtime. Monkey patching comes handy in testing but it is not a good practice
Impala Case Study: Web Trac
to use it in production environment as debugging the code could become dicult.
Impala Case Study: Flight Data Analysis
40) Which tool in Python will you use to nd bugs if any?
Hadoop Impala Tutorial

Pylint and Pychecker. Pylint veries that a module satises all the coding standards or
Apache Hive Tutorial: Tables
not. Pychecker is a static analysis tool that helps nd out bugs in the course code.
Flume Hadoop Tutorial: Twitter Data

41) How are arguments passed in Python- by reference or by value? Extraction

Flume Hadoop Tutorial: Website Log

The answer to this question is neither of these because passing semantics in Python
are completely dierent. In all cases, Python passes arguments by value where all
Hadoop Sqoop Tutorial: Example Data
values are references to objects.

42) You are given a list of N numbers. Create a single list comprehension in Hadoop Sqoop Tutorial: Example of Data
Python to create a new list that contains only those values which have even Aggregation

numbers from elements of the list at even indices. For instance if list[4] has an Apache Zookepeer Tutorial: Example of
even value the it has be included in the new output list because it has an even Watch Notication

index but if list[5] has an even value it should not be included in the list because
Apache Zookepeer Tutorial: Centralized
it is not at an even index. Conguration Management

[x for x in list [: 2] if x%2 == 0] Hadoop Zookeeper Tutorial

Hadoop Sqoop Tutorial

The above code will take all the numbers present at even indices and then discard the
odd numbers. Hadoop PIG Tutorial

Hadoop Oozie Tutorial

43) Explain the usage of decorators.
Hadoop NoSQL Database Tutorial
Decorators in Python are used to modify or inject code in functions or classes. Using
Hadoop Hive Tutorial
decorators, you can wrap a class or function method call so that a piece of code can
be executed before or after the execution of the original code. Decorators can be Hadoop HDFS Tutorial

used to check for permissions, modify or track the arguments passed to a method, Hadoop hBase Tutorial
logging the calls to a specic method, etc.
https://www.dezyre.com/article/100datascienceinpythoninterviewquestionsandanswersfor2017/188 4/7
3/14/2017 100DataScienceinPythonInterviewQuestionsandAnswersfor2017

44) How can you check whether a pandas data frame is empty or not? Hadoop Flume Tutorial
Call us 1-844-696-6465 (US Toll Free) Home Courses Blog Tutorials Interview Questions Online Hackathons Student Portfolios Sign In
Hadoop 2.0 YARN Tutorial
The attribute df.empty is used to check whether a data frame is empty or not.
Hadoop MapReduce Tutorial
Build Projects, Learn Skills, Get Hired REQUEST INFO
45) What will be the output of the below Python code
Big Data Hadoop Tutorial for Beginners-
def multipliers (): Hadoop Installation

return [lambda x: i * x for i in range (4)]

print [m (2) for m in multipliers ()]

The output for the above code will be [6, 6,6,6]. The reason for this is that because of
Online Courses
late binding the value of the variable i is looked up when any of the functions returned
by multipliers are called. Hadoop Training

Spark Certication Training

46) What do you mean by list comprehension?
Data Science in Python
The process of creating a list while performing some operation on the data so that it
Data Science inR
can be accessed using an iterator is referred to as List Comprehension.
Data Science Training

[ord (j) for j in string.ascii_uppercase]

[65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90]

47) What will be the output of the below code

word = aeioubcdfg'

print word [:3] + word [3:]

The output for the above code will be: aeioubcdfg'.

In string slicing when the indices of both the slices collide and a + operator is applied
on the string it concatenates them.

48) list= [a,e,i,o,u]

print list [8:]

The output for the above code will be an empty list []. Most of the people might
confuse the answer with an index error because the code is attempting to access a
member in the list whose index exceeds the total number of members in the list. The
reason being the code is trying to access the slice of a list at a starting index which is
greater than the number of members in the list.

49) What will be the output of the below code:

def foo (i= []):

i.append (1)

return i

>>> foo ()

>>> foo ()

The output for the above code will be-


[1, 1]

Argument to the function foo is evaluated only once when the function is dened.
However, since it is a list, on every all the list is modied by appending a 1 to it.

50) Can the lambda forms in Python contain statements?

https://www.dezyre.com/article/100datascienceinpythoninterviewquestionsandanswersfor2017/188 5/7
3/14/2017 100DataScienceinPythonInterviewQuestionsandAnswersfor2017

No, as their syntax is restrcited to single expressions and they are used for creating
Call us 1-844-696-6465 (US Toll Free)
Home Courses Blog Tutorials Interview Questions Online Hackathons Student Portfolios Sign In
function objects which are returned at runtime.

This list of questions for Python interview questions and Learn

Build Projects, answers is not
Skills, Getan exhaustive

one and will continue to be a work in progress. Let us know in comments below if we
missed out on any important question that needs to be up here.



2Comments DeZyre
1 Login

Recommend 1 Share SortbyNewest


KhushbuShah Mod ayearago

Reply Share

Reply Share

Subscribe d AddDisqustoyoursiteAddDisqusAdd Privacy

Big Data and Hadoop Training Courses in Popular Cities

Hadoop Training in Texas Hadoop Training in New Jersey

Hadoop Training in California Hadoop Training in New York

Hadoop Training in Dallas Hadoop Training in Atlanta

Hadoop Training in Chicago Hadoop Training in Canada

Hadoop Training in Charlotte Hadoop Training in Abu Dhabi

Hadoop Training in Dubai Hadoop Training in Detroit

Hadoop Training in Edison Hadoop Trainging in Germany

Hadoop Training in Fremont Hadoop Training in Houston

Hadoop Training in San Jose Hadoop Training in Virginia

Hadoop Training in Washington

https://www.dezyre.com/article/100datascienceinpythoninterviewquestionsandanswersfor2017/188 6/7
3/14/2017 100DataScienceinPythonInterviewQuestionsandAnswersfor2017

Courses About DeZyre

Call us 1-844-696-6465 (US Toll Free) Home Courses Blog Tutorials Interview Questions Online Hackathons Student Portfolios Sign In
Live Courses About Us
Big Data and Hadoop Certication Training Contact Us
Build Projects, Learn Skills, Get Hired REQUEST INFO
Hadoop Project based Training DeZyre Reviews
Apache Spark Certication Training Blog
Data Science Training Course Tutorials
Self-Paced and One-on-One Traning
Online Hackathons
Data Science in R Programming
Student Portfolios
Salesforce Certications - ADM 201 and DEV 401
Privacy Policy
Hadoop Administration for Big Data
Certicate in NoSQL Databases for Big Data

Self-Paced Courses Connect with us

CCA175 - Cloudera Spark and Hadoop Developer Certication
Hadoop Interview Preparation DezyreOnline

Free Courses
Java for Beginners
Introduction to Data Science in Python EV SSL Certicate

Copyright 2017 Iconiq Inc. All rights reserved. All trademarks are property of their respective owners.

https://www.dezyre.com/article/100datascienceinpythoninterviewquestionsandanswersfor2017/188 7/7