
CS 910 Exercise Sheet 2: Trying out tools

Question 1

sep = ",")

## Taking the first letter of the make column and storing the counts as a data frame
count <- as.data.frame(table(substr(Auto$V3, start = 1, stop = 1)))
## Taking the count of rows whose first letter is M or m

subset(count, Var1 == "m" | Var1 == "M")

##   Var1 Freq
## 8    m   39

The number of cars whose make starts with M is 39.
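The count can also be cross-checked without building a frequency table. A minimal sketch, assuming a small made-up vector `make` that stands in for `Auto$V3` (the values are illustrative and are not the real data):

```r
# Hypothetical stand-in for Auto$V3; not the real column
make <- c("mazda", "audi", "Mercury", "bmw", "mitsubishi", "toyota")

# Count the entries whose first letter is m or M
m_count <- sum(grepl("^m", make, ignore.case = TRUE))
m_count
```

Anchoring the pattern with `^` restricts the match to the first character, and `ignore.case = TRUE` covers both the upper-case and lower-case spelling in one test.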

Question 2 (a)

## Tabulating the required columns and storing the result in a data frame count
count <- as.data.frame(table(Auto$V4, Auto$V5, Auto$V6, Auto$V7, Auto$V8, Auto$V9))
## removing the combinations with zero occurrence
count <- subset(count, as.numeric(count$Freq) > 0)
## taking count of rows
nrow(count)

## [1] 36

The total number of unique combinations, counting those with a missing value ("?") in one of the columns, is 36.
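Counting distinct rows directly gives the same kind of answer without tabulating every possible cell of the cross product. A minimal sketch, assuming a hypothetical three-column data frame `toy` in place of the six columns of `Auto` (names and values are made up for illustration):

```r
# Hypothetical stand-in for the categorical columns of Auto; not the real data
toy <- data.frame(
  fuel  = c("gas", "gas", "diesel", "gas", "gas"),
  doors = c("two", "four", "four", "two", "?"),
  body  = c("sedan", "sedan", "wagon", "sedan", "sedan"),
  stringsAsFactors = FALSE
)

# unique() keeps one row per distinct combination of values
n_combos <- nrow(unique(toy))
n_combos
```

`table()` over several factors allocates a cell for every possible combination, most of which are empty, whereas `unique()` only looks at the combinations that actually occur.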

(b)

## Tabulating the required columns and storing the result in a data frame count
count <- as.data.frame(table(Auto$V4, Auto$V5, Auto$V6, Auto$V7, Auto$V8, Auto$V9))
## removing the combinations with zero occurrence
count <- subset(count, as.numeric(count$Freq) > 0)
## saving the list of cols and removing the rows with ? in any field
collist <- c("Var1", "Var2", "Var3", "Var4", "Var5", "Var6")
sel <- apply(count[, collist], 1, function(row) !"?" %in% row)
count <- count[sel, ]
## taking count of rows
nrow(count)

## [1] 34

The total number of unique combinations with no missing value ("?") in any of the columns is 34.
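An alternative to filtering on the literal "?" is to let the reader convert it to `NA` on import. A minimal sketch, assuming inline CSV text with made-up column names and values instead of the real file:

```r
# Hypothetical inline data; "?" marks a missing value as in the exercise
csv_text <- "fuel,doors,body
gas,two,sedan
gas,four,sedan
diesel,four,wagon
gas,two,sedan
gas,?,sedan"

# na.strings turns every "?" into NA while reading
toy <- read.csv(text = csv_text, na.strings = "?", stringsAsFactors = FALSE)

# Keep only rows with no NA, then count the distinct combinations
complete <- toy[complete.cases(toy), ]
n_complete_combos <- nrow(unique(complete))
n_complete_combos
```

With the missing values encoded as `NA`, the standard tools such as `complete.cases()` and `na.rm = TRUE` apply without any string comparison.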


Question 3

##Selecting cars with four doors
q3.Auto <- subset(Auto, Auto$V6 == "four")
##converting the column cost into numeric
q3.Auto$V26 <- as.numeric(as.character(q3.Auto$V26))
##Removing the NA values and displaying median
median(q3.Auto$V26, na.rm= TRUE)

## [1] 11245

The median price of four-door cars is 11245.

##Removing the NA values and displaying mean
mean(q3.Auto$V26, na.rm= TRUE)

## [1] 13565.67

The mean price of four-door cars is 13565.67.
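The detour through `as.character()` matters when the price column has been read as a factor. A minimal sketch, assuming a hypothetical factor `price` with a "?" placeholder (the values are made up):

```r
# Hypothetical price column stored as a factor; not the real data
price <- factor(c("13495", "?", "16500", "9000"))

# as.numeric() on a factor returns the internal level codes, not the prices
codes <- as.numeric(price)

# Converting to character first recovers the numbers; "?" becomes NA
values <- suppressWarnings(as.numeric(as.character(price)))

# na.rm = TRUE drops the NA before summarising
median(values, na.rm = TRUE)
mean(values, na.rm = TRUE)
```

Without `na.rm = TRUE`, both `median()` and `mean()` would return `NA` as soon as a single price is missing.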

Question 4

sep = ",")

## Plotting the graph of height and length columns
plot(Abal$Height, Abal$Length, main = "Scatterplot showing Height and Length of Abalone",
     xlab = "Height", ylab = "Length", pch = 1, ylim = c(0, 1.2))
abline(lm(Abal$Length ~ Abal$Height), col = "red")
lines(lowess(Abal$Height, Abal$Length), col = "blue")


[Figure: Scatterplot showing Height and Length of Abalone. x-axis: Height; y-axis: Length]

##Equation of the regression line
lm(formula = Abal$Length ~ Abal$Height) -> equation
equation

##
## Call:
## lm(formula = Abal$Length ~ Abal$Height)
##
## Coefficients:
## (Intercept)  Abal$Height
##      0.1925       2.3761

Outliers are values in a dataset that do not follow the pattern of most of the data and hence stand out. They can arise for many reasons, e.g., data being entered incorrectly or missing values. In our plot, the points (0, 0.43) and (0, 0.315) are outliers because the Height has been entered as 0 for these observations. The points (0.515, 0.705) and (1.13, 0.455) are also outliers, as they lie very far from the regression line.
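The two kinds of outlier described here can also be flagged programmatically rather than read off the plot. A minimal sketch, assuming a made-up data frame `toy` and an arbitrary cut-off of 2 on the standardized residuals (both are illustrative choices, not part of the exercise):

```r
# Hypothetical data: ten points on a straight line, one point far above it,
# and one record whose Height was entered as 0
toy <- data.frame(
  Height = c(seq(0.02, 0.20, by = 0.02), 0.11, 0),
  Length = c(0.2 + 2 * seq(0.02, 0.20, by = 0.02), 0.90, 0.43)
)

# Rule 1: a height of zero is physically impossible, so flag it directly
zero_height <- toy$Height == 0

# Rule 2: fit the line on the remaining rows and flag large standardized residuals
valid <- toy[!zero_height, ]
fit <- lm(Length ~ Height, data = valid)
far_from_line <- abs(rstandard(fit)) > 2  # assumed threshold

toy[zero_height, ]
valid[far_from_line, ]
```

The zero-height rows are excluded before fitting so that an impossible measurement does not pull the regression line towards itself.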


Question 5

##Taking numeric columns
nAbal <- Abal[sapply(Abal, is.numeric)]
##making combinations of 2 columns
combn(colnames(nAbal), 2) -> combo
##calculate PPCC
apply(combo, 2, function(x) cor(nAbal[, x[1]], nAbal[, x[2]])) -> PPCCnAbal
##Storing result as data frame
as.data.frame(PPCCnAbal) -> PPCCnAbal
##taking transpose to convert column into rows
t(combo) -> combo
##binding with result
cbind(combo, PPCCnAbal) -> soln
##filtering as per condition
subset(soln, as.numeric(as.character(soln$PPCCnAbal)) > 0.95)

##               1              2 PPCCnAbal
## 1        Length       Diameter 0.9868116
## 19 Whole.weight Shucked.weight 0.9694055
## 20 Whole.weight Viscera.weight 0.9663751
## 21 Whole.weight   Shell.weight 0.9553554

The pairs for which the Pearson product-moment correlation coefficient is more than 0.95 are (Length, Diameter), (Whole.weight, Shucked.weight), (Whole.weight, Viscera.weight) and (Whole.weight, Shell.weight).
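The same filter can be expressed on the full correlation matrix instead of looping over column pairs. A minimal sketch, assuming a made-up data frame `toy` with columns `a`, `b` and `c` (hypothetical names and values):

```r
# Hypothetical numeric data; b is a monotone function of a, c is unrelated
toy <- data.frame(
  a = 1:10,
  b = (1:10)^2,
  c = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
)

# Pearson correlation of every column with every other column
cm <- cor(toy)

# upper.tri() keeps each pair once and drops the diagonal of ones
idx <- which(cm > 0.95 & upper.tri(cm), arr.ind = TRUE)

pairs_above <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
pairs_above
```

Because `cor()` returns a symmetric matrix, restricting to the upper triangle avoids reporting each pair twice.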


Question 6

## taking rows with sex as Males
Abal_m <- subset(Abal, as.character(Abal$Sex) == "M")
## calculating the ecdf of the subset
ecdf.male.rings <- ecdf(Abal_m$Rings)
## taking rows with sex as Females
Abal_f <- subset(Abal, as.character(Abal$Sex) == "F")
## calculating the ecdf of the subset
ecdf.female.rings <- ecdf(Abal_f$Rings)
## taking rows with sex as Infants
Abal_i <- subset(Abal, as.character(Abal$Sex) == "I")
## calculating the ecdf of the subset
ecdf.infant.rings <- ecdf(Abal_i$Rings)
## Plotting the ECDF for males
plot(ecdf.male.rings, main = "Empirical CDF of various Sexes", ylab = "Quantiles of diff Sexes",
     xlab = "Number of Rings", pch = 19, col = "blue")
## Adding female ECDF
lines(ecdf.female.rings, pch = 20, col = "red")
## Adding infant ECDF
lines(ecdf.infant.rings, pch = 20, col = "green")

[Figure: Empirical CDF of various Sexes. x-axis: Number of Rings; y-axis: Quantiles of diff Sexes]
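An ECDF object is itself a function, so the curves can be compared numerically as well as visually. A minimal sketch, assuming a made-up data frame `toy` with `Sex` and `Rings` columns (illustrative values, not the abalone data):

```r
# Hypothetical stand-in for the Sex and Rings columns of Abal
toy <- data.frame(
  Sex   = c("M", "M", "M", "F", "F", "F", "I", "I", "I", "I"),
  Rings = c(9, 10, 12, 10, 11, 14, 5, 6, 7, 8)
)

# One ECDF per sex, built in a single pass instead of one subset per group
ecdf_by_sex <- lapply(split(toy$Rings, toy$Sex), ecdf)

# Proportion of each group with at most 8 rings
at_most_8 <- sapply(ecdf_by_sex, function(f) f(8))
at_most_8
```

Building the ECDFs from `split()` ties each curve to its own group by name, which guards against reusing the wrong subset when the groups are handled one by one.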
