
Q1.

The age attribute has the following data: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.

a. Use smoothing by bin means to smooth this data, using a bin depth of 3. Comment on the effect
of this technique on the given data.

• Step 1: Sort the data. (This step is not required here because the data are already sorted.)

• Step 2: Partition the data into equidepth bins of depth 3.

Bin 1: 13, 15, 16

Bin 2: 16, 19, 20

Bin 3: 20, 21, 22

Bin 4: 22, 25, 25

Bin 5: 25, 25, 30

Bin 6: 33, 33, 35

Bin 7: 35, 35, 35

Bin 8: 36, 40, 45

Bin 9: 46, 52, 70

• Step 3: Calculate the arithmetic mean of each bin.

• Step 4: Replace each of the values in each bin by the arithmetic mean calculated for the bin.

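Bin 1: 14 2/3, 14 2/3, 14 2/3

Bin 2: 18 1/3, 18 1/3, 18 1/3

Bin 3: 21, 21, 21

Bin 4: 24, 24, 24

Bin 5: 26 2/3, 26 2/3, 26 2/3

Bin 6: 33 2/3, 33 2/3, 33 2/3

Bin 7: 35, 35, 35

Bin 8: 40 1/3, 40 1/3, 40 1/3

Bin 9: 56, 56, 56

Effect: each value is replaced by the mean of its bin, so small fluctuations between neighbouring
ages disappear and the 27 values collapse to 9 distinct ones. The price is lost detail, and the
means are sensitive to extreme values: the single value 70 pulls the mean of Bin 9 up to 56.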
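The binning steps above can be sketched in R as a quick check of the hand calculation. This is a minimal sketch that assumes the number of values is a multiple of the bin depth; the variable names (age, depth, bin_id, smoothed, bin_means) are illustrative only.

```r
# Age values from the question, sorted (Step 1).
age <- sort(c(13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
              33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70))

depth <- 3  # bin depth given in the question

# Step 2: assign consecutive sorted values to bins of `depth` values each.
bin_id <- ceiling(seq_along(age) / depth)

# Steps 3 and 4: ave() returns, for every value, the mean of its own bin.
smoothed <- ave(age, bin_id, FUN = mean)

# One mean per bin, for comparison with the table above.
bin_means <- tapply(age, bin_id, mean)
print(round(bin_means, 2))
```

Printing matrix(smoothed, ncol = depth, byrow = TRUE) lays the smoothed values out one bin per row, in the same layout as the list above.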

b. How would you determine outliers in the data?

Outliers in the data may be detected by clustering, where similar values are organized into groups,
or ‘clusters’. Values that fall outside the set of clusters can be considered outliers.
Alternatively, a combination of computer and human inspection can be used: the computer applies a
predetermined data distribution to flag possible outliers, and these candidates are then verified
by human inspection with much less effort than would be required to verify the entire initial
data set.
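As a rough illustration of the computer-assisted route, the sketch below shortlists candidates for human inspection. It does not use clustering or a fitted distribution; it swaps in the common interquartile-range (IQR) fence rule as a simple stand-in, and the multiplier 1.5 is a conventional choice assumed here, not something the question specifies.

```r
# Age values from the question.
age <- c(13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
         33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70)

# First and third quartiles (R's default quantile type), without names.
q1 <- unname(quantile(age, 0.25))
q3 <- unname(quantile(age, 0.75))
iqr <- q3 - q1

# Fences: values further than 1.5 * IQR beyond the quartiles are flagged.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Candidates to hand over for human verification.
outliers <- age[age < lower_fence | age > upper_fence]
print(outliers)
```

On this data the rule flags only the value 70, which a person would then confirm or reject.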

c. What other methods are there for data smoothing?

Other methods that may be used for data smoothing include alternative forms of binning, such as
smoothing by bin medians or smoothing by bin boundaries. Alternatively, equal-width bins can be
used to implement any of the forms of binning, where the interval range of values in each bin is
constant. Methods other than binning include regression techniques, which smooth the data by
fitting it to a function, for example through linear or multiple regression. Also, classification
techniques can be used to implement concept hierarchies that smooth the data by rolling up
lower-level concepts to higher-level concepts.
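Two of the binning variants mentioned above, bin medians and bin boundaries, can be sketched in R on the same age data. This is a minimal sketch using the same equal-depth bins of 3; the helper nearest_boundary is hypothetical, and sending ties to the lower boundary is an assumption made here.

```r
# Age values from the question, sorted, in equal-depth bins of 3.
age <- sort(c(13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
              33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70))
depth <- 3
bin_id <- ceiling(seq_along(age) / depth)

# Smoothing by bin medians: every value becomes the median of its bin.
by_median <- ave(age, bin_id, FUN = median)

# Smoothing by bin boundaries: every value becomes the nearer of the bin's
# minimum and maximum. Ties go to the lower boundary (an assumption).
nearest_boundary <- function(x) {
  lo <- min(x)
  hi <- max(x)
  ifelse(x - lo <= hi - x, lo, hi)
}
by_boundary <- ave(age, bin_id, FUN = nearest_boundary)

# One bin per row.
print(matrix(by_median, ncol = depth, byrow = TRUE))
print(matrix(by_boundary, ncol = depth, byrow = TRUE))
```

For example, Bin 1 (13, 15, 16) becomes 15, 15, 15 under bin medians and 13, 16, 16 under bin boundaries, because 15 is closer to 16 than to 13.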

Q3. Use the following metrics to create Chernoff faces using both Excel and R

a. Batting average - Height of face

b. Strike Rate - Curve of smile

c. Number of 4s per match - Width of eyes

d. Number of 6s per match - Height of eyes

e. Ratio of innings to total matches played - Width of face

Use data for 5 key Indian batsmen and submit your results.

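R code:

getwd()  # confirm the working directory that contains Data.csv

data <- read.csv("Data.csv")

# Give the columns used below readable, syntactically valid names.
names(data)[1] <- "Player"
names(data)[5] <- "FoursPerMatch"
names(data)[6] <- "SixesPerMatch"

library(aplpack)

# faces() maps columns to features by position: 1 = height of face, 2 = width of face,
# 6 = curve of smile, 7 = height of eyes, 8 = width of eyes. Unused features get a constant.
# Assumes columns 2, 3 and 4 of Data.csv hold the innings-to-matches ratio, batting average
# and strike rate, as the original column order implies.
const <- rep(3, nrow(data))
dat <- data.frame(data[3], data[2], A = const, B = const, C = const,
                  data[4], data[6], data[5],
                  D = const, E = const, G = const, H = const, I = const, J = const, K = const)

faces(dat, labels = data$Player)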

As can be seen, the happiest face appears to be Shikhar Dhawan's, since he has the
highest strike rate, the variable mapped to the curve of the smile. Also, notice that Virat
Kohli has a very long face. This is because batting average is mapped to the height of the
face, and Kohli has a very good batting average.
