The age attribute has the following data: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
a. Use smoothing by bin means to smooth this data, using a bin depth of 3. Comment on the effect
of this technique on the given data.
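• Step 1: Sort the data. (This step is not required here, as the data are already sorted.)
• Step 2: Partition the sorted data into equal-frequency bins of depth 3; the 27 values give 9 bins.
• Step 3: Calculate the arithmetic mean of each bin.
• Step 4: Replace each value in each bin by the arithmetic mean calculated for that bin.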
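The four steps can be sketched in base R as follows. This is a minimal illustration, and the function name `smooth_by_bin_means` is hypothetical, not taken from any package. With a depth of 3, the 27 ages form 9 bins whose means are about 14.67, 18.33, 21, 24, 26.67, 33.67, 35, 40.33 and 56, so the variation inside each bin disappears while the overall trend of the data is kept.

```r
# Age values from the exercise (already sorted in increasing order).
age <- c(13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
         33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70)

# Hypothetical helper: smoothing by bin means with equal-frequency bins.
smooth_by_bin_means <- function(x, depth) {
  x <- sort(x)                                  # Step 1: sort the data
  bin <- ceiling(seq_along(x) / depth)          # Step 2: bin ids 1,1,1,2,2,2,...
  bin_means <- as.numeric(tapply(x, bin, mean)) # Step 3: mean of each bin
  smoothed <- ave(x, bin, FUN = mean)           # Step 4: replace each value by its bin mean
  list(bins = split(x, bin), bin_means = bin_means, smoothed = smoothed)
}

res <- smooth_by_bin_means(age, depth = 3)
print(round(res$bin_means, 2))
```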
Outliers in the data may be detected by clustering, where similar values are organized into groups, or
'clusters'. Values that fall outside the set of clusters can be considered outliers.
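As a minimal sketch of the clustering idea, the base R code below groups the sorted ages with a simple gap rule: a new cluster starts wherever consecutive values differ by more than `max_gap`, and values that end up in clusters smaller than `min_size` are reported. This gap rule is a stand-in for a full clustering algorithm such as k-means; the helper names and the threshold `max_gap = 10` are illustrative assumptions. On the age data only the value 70 is isolated.

```r
age <- c(13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
         33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70)

# Hypothetical helper: 1-D clustering that starts a new cluster whenever the
# gap between consecutive sorted values exceeds max_gap.
cluster_1d <- function(x, max_gap) {
  x <- sort(x)
  cluster_id <- cumsum(c(TRUE, diff(x) > max_gap))  # id increases after each large gap
  split(x, cluster_id)
}

# Hypothetical helper: values in clusters smaller than min_size lie outside
# the main groups, so they are flagged as outliers.
find_cluster_outliers <- function(x, max_gap, min_size = 2) {
  clusters <- cluster_1d(x, max_gap)
  too_small <- vapply(clusters, length, integer(1)) < min_size
  unlist(clusters[too_small], use.names = FALSE)
}

outliers <- find_cluster_outliers(age, max_gap = 10)
print(outliers)
```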
Alternatively, a combination of computer and human inspection can be used: a predetermined data
distribution is assumed so that the computer can identify possible outliers. These candidates can then
be verified by human inspection with much less effort than would be required to verify the entire
initial data set.
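The computer-plus-human approach can be sketched as follows, assuming (purely for illustration) that age is roughly normally distributed. The computer flags values more than `k` sample standard deviations from the mean, and only those candidates are passed to a human for checking. The function name `flag_candidates` and the cut-off `k = 3` are assumptions; with these settings only 70 (about 3.09 standard deviations above the mean of 29.96) is flagged.

```r
age <- c(13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
         33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70)

# Hypothetical helper: assume a normal distribution and flag values whose
# z-score exceeds k in absolute value.
flag_candidates <- function(x, k = 3) {
  z <- (x - mean(x)) / sd(x)   # sd() is the sample standard deviation
  x[abs(z) > k]
}

candidates <- flag_candidates(age, k = 3)
print(candidates)  # only these values need human inspection
```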
Other methods that may be used for data smoothing include alternative forms of binning, such as
smoothing by bin medians or smoothing by bin boundaries. Alternatively, equal-width bins, in which the
interval range of values in each bin is constant, can be used with any of these forms of binning.
Methods other than binning include regression, which smooths the data by fitting it to a function,
for example through linear or multiple regression. Classification techniques can also be used to
implement concept hierarchies that smooth the data by rolling up lower-level concepts to
higher-level concepts.
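The binning variants named above can be sketched in base R. Smoothing by medians and by boundaries reuses depth-3 equal-frequency bins as in part (a), and `cut()` builds equal-width bins. The tie rule for boundaries (a value equally close to both ends goes to the lower boundary) and the choice of three equal-width intervals are assumptions made for illustration.

```r
age <- c(13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
         33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70)
bin <- ceiling(seq_along(age) / 3)   # equal-frequency bins of depth 3 (age is sorted)

# Smoothing by bin medians: replace each value by the median of its bin.
by_medians <- ave(age, bin, FUN = median)

# Smoothing by bin boundaries: replace each value by the closer of its bin's
# minimum and maximum (assumed tie rule: the lower boundary wins).
by_boundaries <- ave(age, bin, FUN = function(v) {
  lo <- min(v)
  hi <- max(v)
  ifelse(v - lo <= hi - v, lo, hi)
})

# Equal-width binning: split the range 13..70 into 3 intervals of constant width.
width_bins <- cut(age, breaks = 3)
print(table(width_bins))
```

Note that equal-width bins can be very unevenly filled when the data are skewed: here most values fall into the lowest interval.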
Q3. Use the following metrics to create Chernoff faces using both Excel and R.
Use data for 5 key Indian batsmen and submit your results.
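R code:
getwd()                                    # Data.csv must be in this working directory
data <- read.csv("Data.csv")
names(data)[1] <- "Player"
names(data)[5] <- "FoursPerMatch"          # a name starting with a digit is not a syntactic R name
names(data)[6] <- "SixesPerMatch"
library(aplpack)                           # provides faces()
n <- nrow(data)                            # one row per batsman (5 here)
# faces() maps columns, in order, to features: 1 face height, 2 face width,
# 3 face structure, 4 mouth height, 5 mouth width, 6 smile, 7 eye height,
# 8 eye width, 9 hair height, ... So batting average (data[3]) drives face height
# and strike rate (data[4]) drives the smile; the constant filler columns
# (value 3) hold the remaining features fixed across batsmen.
dat <- data.frame(data[3], data[2], A = rep(3, n), B = rep(3, n), C = rep(3, n),
                  data[4], D = rep(3, n), data[5], data[6],
                  E = rep(3, n), G = rep(3, n), H = rep(3, n), I = rep(3, n),
                  J = rep(3, n), K = rep(3, n), L = rep(3, n))
faces(dat, labels = data$Player)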
As can be seen, the happiest face belongs to Shikhar Dhawan, because he has the highest strike rate,
the variable mapped to the curvature of the smile. Virat Kohli's face, by contrast, is very long,
because batting average is mapped to the height of the face and Kohli has a very high batting
average.