
Air University

Lab-10
Data Science
Decision/Classification Trees

Before doing this lab, please read the instructions at the end of this file.

In this lab, we are going to apply random forests (both regression and classification) to a
dataset that measures machine (computer) efficiency. You are required to do the following tasks.

1. Use multivariate regression to predict the machine performance on the data provided to you,
named ComputerHardware.csv.
a. To do this, you have to do some preprocessing. A data description file is also provided
to you; read it carefully and name the data columns accordingly, because
ComputerHardware.csv does not have column names. Pick the names from the data
description file and assign them to your loaded data (loaded from ComputerHardware.csv).
b. The last column is your prediction column.
c. Split your data with some ratio (as before) and apply a random forest to it (in this case a
regression random forest, because the prediction column is continuous) with at least 500
decision trees, i.e., set ntree = 500.
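d. Measure the accuracy (for a regression forest, this is usually reported with a measure such
as RMSE or R² on the test data).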
2. Use a classification random forest to predict the machine performance.
a. The preprocessing required to apply a classification random forest was done in the first
task. All you have to do in this step is create a new categorical variable with two or more
categories (choose the number of categories you think is suitable and justify this
categorization in your report). Note that, while training your model, you must ignore the
variable from which you created the new variable. For example, suppose the last column
of your data in task 1 is named ERP, and in task 2 you create a new categorical variable
from this ERP column. When you train your model, do not use ERP as a predictor, because
it was the target variable and the new variable is derived from it. This can be done with
the line of code below:
i. randomForest(categorical ~ . - ERP, data = training, ntree = 500, importance = TRUE)
b. Train your model with this newly created categorical variable as the target.
c. Test the results by checking your model on the testing data.
d. Calculate the accuracy.
e. Find the difference between the two accuracies (classification and regression).
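
The two tasks above can be sketched in R as follows. This is a minimal sketch under stated assumptions, not the final lab code. It assumes that the randomForest package is installed, and that the data description file lists the column names of the UCI "Computer Hardware" dataset (check this against your own file). The 70/30 split, the three tertile-based categories, the dropping of the vendor and model columns, and the use of RMSE and R² as the regression "accuracy" are illustrative choices. The data frame is a synthetic stand-in so that the sketch runs without ComputerHardware.csv; replace it with the read.csv call shown in the comment.

```r
library(randomForest)

# Assumed column names (UCI "Computer Hardware" description); verify against your file.
col_names <- c("vendor", "model", "MYCT", "MMIN", "MMAX",
               "CACH", "CHMIN", "CHMAX", "PRP", "ERP")

# In the lab: hw <- read.csv("ComputerHardware.csv", header = FALSE)
# Synthetic stand-in (NOT real data) so the sketch runs without the file.
set.seed(1)
n <- 200
hw <- data.frame(
  V1 = sample(c("vendorA", "vendorB", "vendorC"), n, replace = TRUE),
  V2 = paste0("model", seq_len(n)),
  V3 = sample(20:1500, n, replace = TRUE),
  V4 = sample(c(64, 256, 1000, 4000), n, replace = TRUE),
  V5 = sample(c(4000, 8000, 16000, 32000), n, replace = TRUE),
  V6 = sample(0:256, n, replace = TRUE),
  V7 = sample(0:16, n, replace = TRUE),
  V8 = sample(1:64, n, replace = TRUE)
)
hw$V9  <- round(0.005 * hw$V5 + 0.4 * hw$V6 + rnorm(n, sd = 10))
hw$V10 <- round(0.005 * hw$V5 + 0.4 * hw$V6 + rnorm(n, sd = 5))

# Task 1a: attach the column names.
names(hw) <- col_names

# Drop the identifier columns (a choice made for this sketch).
hw <- hw[, !(names(hw) %in% c("vendor", "model"))]

# Task 2a: three ERP-based classes split at the tertiles (illustrative choice).
breaks <- quantile(hw$ERP, probs = c(0, 1/3, 2/3, 1))
hw$categorical <- cut(hw$ERP, breaks = breaks,
                      labels = c("low", "medium", "high"),
                      include.lowest = TRUE)

# Task 1c: 70/30 train/test split (the ratio is an assumption).
train_idx <- sample(seq_len(nrow(hw)), size = round(0.7 * nrow(hw)))
training <- hw[train_idx, ]
testing  <- hw[-train_idx, ]

# Task 1c-d: regression forest on ERP, excluding the derived class label.
rf_reg <- randomForest(ERP ~ . - categorical, data = training,
                       ntree = 500, importance = TRUE)
pred_reg <- predict(rf_reg, newdata = testing)
rmse <- sqrt(mean((testing$ERP - pred_reg)^2))
r2 <- 1 - sum((testing$ERP - pred_reg)^2) /
          sum((testing$ERP - mean(testing$ERP))^2)

# Task 2b-d: classification forest on the new label, excluding ERP itself.
rf_cls <- randomForest(categorical ~ . - ERP, data = training,
                       ntree = 500, importance = TRUE)
pred_cls <- predict(rf_cls, newdata = testing)
conf_mat <- table(actual = testing$categorical, predicted = pred_cls)
accuracy <- mean(pred_cls == testing$categorical)

print(c(RMSE = rmse, R2 = r2, accuracy = accuracy))
print(conf_mat)
```

Because the regression measures (RMSE, R²) and the classification accuracy are on different scales, the comparison in step 2.e needs a short justification in the report rather than a plain subtraction.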

Your answers should be well defined and properly justified. They should include figures/graphs
of the results for better understanding.
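
As one possible starting point for such figures, the sketch below draws three plots that are commonly used with a random forest: the out-of-bag error against the number of trees, the variable importance, and predicted against observed values. It assumes the randomForest package is installed and uses the mtcars dataset that ships with R purely as a stand-in; the same calls apply to a model fitted on the lab data.

```r
library(randomForest)

# mtcars ships with R; it stands in for the lab data here.
set.seed(1)
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 500, importance = TRUE)

# Out-of-bag error as a function of the number of trees.
plot(rf, main = "OOB error vs. number of trees")

# Variable importance (permutation-based and node-purity-based).
varImpPlot(rf, main = "Variable importance")

# Out-of-bag predictions against the observed values.
oob_pred <- predict(rf)   # with no newdata, returns the OOB predictions
plot(mtcars$mpg, oob_pred, xlab = "Observed mpg", ylab = "OOB-predicted mpg")
abline(0, 1, lty = 2)     # points on this line are predicted perfectly
```

Each figure in the report should then be accompanied by a sentence or two stating what it shows.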

Important Instructions:

1. Your submission should be a PDF file in the defined format and should include the
following details:
a. All figures and matrices required for this task. Every figure should be explained
with respect to the function it serves.
b. Code (at the end of your submission).
c. A comprehensive conclusion that answers the following questions:
i. What task have you done, and why is it important?
ii. What steps did you take to perform each specific step, i.e., how did you do that
step?
2. If you do not understand 2.a or any other part of any file, do not do your lab the wrong way.
Ask me to clarify it. You can ask questions on Google Classroom or on any other
platform that suits you best.
3. It has been observed that submissions are not properly formatted, and marks are being
deducted as a result. Please follow the submission format conventions you were given.
4. Code is also attached for help, but it is not the final code. You have to write your own
code; the attached code is there only to help you.
