
In-Class Short Exercises for Tutorial 7 – Grouping

As you work through the tutorial in class, complete one copy of these exercises per team.

Name: Cameron Inglis Student Number: 301260145

Name: Nancy Mo Student Number: 301251719

Exercise 4.1: For each of the four clusters, which is the most important characteristic of athletic programs
for the individuals in that cluster? Which is the least important? Which cluster considers the Win/Loss
percentage most important for judging an athletic program's success?
Most and least important characteristics: (most important, least important)
Cluster 1: Fem, Win
Cluster 2: Fem, Win
Cluster 3: Attnd, Violat
Cluster 4: Attnd, Violat
Cluster 3 considers the Win/Loss percentage most important for judging an athletic program's success.

Exercise 5.1: Generate the plot of means for each of the 7 characteristics and paste them into your Word
document.

Figure 1 Finan Plot of Means

Figure 2 Grads Plot of Means

Figure 3 Teams Plot of Means

Figure 4 Attnd Plot of Means

Figure 5 Fem Plot of Means

Figure 6 Win Plot of Means

Figure 7 Violat Plot of Means

Exercise 6.1: Which characteristic does each of the four segments care the least about?
Group 1 cares least about wins, group 2 cares least about grads, group 3 cares least about violations,
and group 4 cares least about financials.

Exercise 7.1: Rerun the biplot function, this time including the argument ellipse = TRUE. Copy your plot
into your Word document. Comment on how well the clusters appear to be separated. Also, which cluster
seems to have the most outliers, in the sense of being outside the 95% interval?

The clusters appear to be separated according to how the individuals feel about the four most distinct
attributes: financials, violations, graduates, and wins. Cluster 4 appears to have the most outliers
overall.
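
One way to make "outside the 95% interval" concrete is to count the points whose squared Mahalanobis distance from the cluster mean exceeds the 95% chi-square cutoff for two dimensions. The following is a minimal Python sketch of that idea, not the tutorial's R code. It assumes the ellipse is a normal-theory ellipse built from the sample mean and covariance, which may differ from how the biplot function draws it. The function name `outside_ellipse` and the `SCORES` values are hypothetical and are not the survey data.

```python
import math

def outside_ellipse(points, prob=0.95):
    """Return indices of 2-D points lying outside the normal-theory ellipse
    that is expected to contain a share `prob` of the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # With 2 degrees of freedom the chi-square quantile has a closed form.
    threshold = -2.0 * math.log(1.0 - prob)
    flagged = []
    for i, (x, y) in enumerate(points):
        dx, dy = x - mean_x, y - mean_y
        # Squared Mahalanobis distance using the inverse of the 2x2 covariance.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 > threshold:
            flagged.append(i)
    return flagged

# Hypothetical PC1/PC2 scores for one cluster (made up for illustration).
SCORES = [
    (1, 1), (-1, -1), (1, -1), (-1, 1),
    (0.5, 0.5), (-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5),
    (1, 0), (-1, 0), (0, 1), (0, -1),
    (0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5),
    (4, 0), (-4, 0),
]
print(outside_ellipse(SCORES))  # indices of the two far points
```

Running this once per cluster on that cluster's own scores and comparing the counts would give a numeric check on the visual impression from the plot.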

Exercise 8.1: An interactive window should pop up where you can drag the 3D plot around to view it at
different angles. To get your depth perception working, move the plot around. You can see that the
individuals in each of the four segments are still forming four clouds around the same four variables as in
the two-dimensional case. If you rotate it around so that you are looking end-on to PC3, it should look
like the previous 2D plots of the first two PCs. Rotate it more. Is it the case that the small attributes
were being captured by PC3, but we simply could not see that when we were looking at the 2D plots,
or are they still not important? Support your answer by taking a snapshot of a relevant angle of the 3D plot
view and saving it in a file called 2DFrom3D.png, using the following function, with the appropriate
filepath for your machine.
Insert the file into your Word document.

In the 3D plot, the attributes that looked small in the 2D plots are still small, so they remain
unimportant rather than being hidden by the 2D view. Their shorter lines mean that these variables have
smaller variances than the other variables being summarized.

Exercise 9.1: Print out the cluster centroids (as in step 4) and copy them into your Word document.
Finally, create a contingency table comparing the cluster assignments for K-Medians with the K-Means
cluster assignments. You may want to review the xtabs() function in Chapter 3. Copy the contingency
table into your Word document. The two clustering methods will likely assign different cluster numbers
to the same clusters, but if the clusters are even moderately reproducible across the two different
methods, it should be clear which clusters are nearly the same. For each K-Means cluster, what is the
corresponding K-Medians cluster?

# A tibble: 4 x 8
  ClusterKmd   Win  Grad Violat  Attnd    Fem  Teams Finan
  <fct>      <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>
1 1          0.103 0.170  0.358 0.0633 0.0659 0.0567 0.183
2 2          0.113 0.135  0.148 0.125  0.0728 0.0660 0.336
3 3          0.332 0.121  0.102 0.0878 0.0716 0.0532 0.233
4 4          0.171 0.298  0.128 0.0883 0.0832 0.0642 0.167
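
The centroid table can be read off mechanically by taking the largest and smallest weight in each row. The following is a minimal Python sketch of that reading, not the tutorial's R code. The numbers are copied from the K-Medians tibble above; the names `CENTROIDS`, `ATTRIBUTES` and `extremes` are hypothetical.

```python
# K-Medians centroids copied from the tibble (one list of weights per cluster).
ATTRIBUTES = ["Win", "Grad", "Violat", "Attnd", "Fem", "Teams", "Finan"]
CENTROIDS = {
    1: [0.103, 0.170, 0.358, 0.0633, 0.0659, 0.0567, 0.183],
    2: [0.113, 0.135, 0.148, 0.125, 0.0728, 0.0660, 0.336],
    3: [0.332, 0.121, 0.102, 0.0878, 0.0716, 0.0532, 0.233],
    4: [0.171, 0.298, 0.128, 0.0883, 0.0832, 0.0642, 0.167],
}

def extremes(centroids, attributes):
    """Return {cluster: (highest-weighted attribute, lowest-weighted attribute)}."""
    result = {}
    for cluster, weights in centroids.items():
        ranked = sorted(zip(weights, attributes))  # ascending by weight
        result[cluster] = (ranked[-1][1], ranked[0][1])
    return result

for cluster, (most, least) in extremes(CENTROIDS, ATTRIBUTES).items():
    print(f"Cluster {cluster}: most = {most}, least = {least}")
```

These are K-Medians clusters, so their numbering need not line up with the K-Means cluster numbers used in the earlier exercises.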

            K-Means cluster
ClusterKmd   1   2   3   4
         1  25   0   0   1
         2   4  29   0   2
         3   0   7   1  61
         4   0   2  34   2

The corresponding clusters are:
K-Means   K-Medians
1         1
2         2
3         4
4         3
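
The correspondence can be found by pairing each cluster with the cluster from the other method that shares the most individuals with it. The following is a minimal Python sketch of that rule, not the tutorial's R code. The counts are copied from the contingency table above. The orientation (rows as K-Medians, columns as K-Means) is an assumption, although for this table the pairing comes out the same either way. The name `match_columns_to_rows` is hypothetical.

```python
# Counts copied from the contingency table.
# Assumed orientation: rows = K-Medians clusters 1-4, columns = K-Means clusters 1-4.
TABLE = [
    [25, 0, 0, 1],
    [4, 29, 0, 2],
    [0, 7, 1, 61],
    [0, 2, 34, 2],
]

def match_columns_to_rows(table):
    """For each column cluster, return the 1-based row cluster sharing the most members."""
    mapping = {}
    for col in range(len(table[0])):
        counts = [row[col] for row in table]
        mapping[col + 1] = counts.index(max(counts)) + 1
    return mapping

print(match_columns_to_rows(TABLE))  # K-Means cluster -> K-Medians cluster
```

Taking the largest cell per column is only safe when the result is one-to-one, as it is here; with less reproducible clusters two columns could pick the same row, and the pairing would need to be resolved by hand.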

Exercise 9.2: Repeat Exercise 9.1 with the neural gas method. For BootCVD(), set method = "neuralgas".
This will take a minute or three, so be patient. Create another contingency table comparing the cluster
assignments for neural gas with the K-Means cluster assignments and copy it into your document.
Which neural gas cluster assignment corresponds to each K-Means cluster assignment?
            Neural gas cluster
ClusterKmn   1   2   3   4
         1   0   0  29   0
         2   0  36   0   2
         3  35   0   0   0
         4   1   0   0  65

The corresponding clusters are:
K-Means   Neural Gas
1         3
2         2
3         1
4         4
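
How reproducible the clusters are across methods can be summarized as the share of individuals that fall in the matched cells of each contingency table. The following is a minimal Python sketch of that summary, not the tutorial's R code. The counts are copied from the two contingency tables in Exercises 9.1 and 9.2, and the name `agreement` is hypothetical.

```python
# Counts copied from the two contingency tables against K-Means.
KMEDIANS_TABLE = [
    [25, 0, 0, 1],
    [4, 29, 0, 2],
    [0, 7, 1, 61],
    [0, 2, 34, 2],
]
NEURALGAS_TABLE = [
    [0, 0, 29, 0],
    [0, 36, 0, 2],
    [35, 0, 0, 0],
    [1, 0, 0, 65],
]

def agreement(table):
    """Share of individuals in each row's largest cell.

    Valid as an agreement measure only when the row maxima sit in
    distinct columns, which holds for both tables here.
    """
    total = sum(sum(row) for row in table)
    matched = sum(max(row) for row in table)
    return matched / total

print(f"K-Medians vs K-Means:  {agreement(KMEDIANS_TABLE):.3f}")
print(f"Neural gas vs K-Means: {agreement(NEURALGAS_TABLE):.3f}")
```

On these counts, 149 of 168 individuals are matched for K-Medians and 165 of 168 for neural gas, so the neural gas assignments follow the K-Means assignments more closely.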