
George Han

04/24/13
Regression and Multivariate Data Analysis STAT-UB 17
Homework 5
Professor Simonoff
Modeling 3x3 Rubik's Cube Solve Speeds
One of my hobbies is speedcubing, in which I try to solve a 3x3 Rubik's Cube in the
fastest possible time. I own three different models of cubes, and while solving, I sometimes like
to listen to music. Are these factors, by themselves or together, associated with any notable shifts
in my solve speeds? This report uses statistical methods to test whether they are. These
associations are worth examining because they may reveal how external factors affect solve
speeds, and may also help me improve as a speedcuber.
The numerical target variable is therefore solve time in seconds, henceforth Time.
The potential categorical predicting variables are the model of cube used and whether I am
listening to music while solving. The levels of these predictors are described below:

Model of cube used while solving* (levels Standard, ZhanChi, and PanShi,
respectively):
Standard White DIY (self-assembled) Speed Cube. This old model is outdated by at least
5 years and it is slow, rigid, and does not cut corners well. However, I have practiced with
it for many years and am accustomed to its feel, especially since it is easy to control in its
solid rigidity. Not tensioned* and not lubricated*. Corresponds to Model = 1.
DaYan* V ZhanChi. The fifth model in the DaYan 3x3 series, this relatively new cube is
almost unanimously regarded as the fastest and best. It is smooth, cuts corners well, and
is easy to control. Tensioned and lubricated. Corresponds to Model = 2.
DaYan VI PanShi. The sixth and newest model in the DaYan 3x3 series, this new cube is
proclaimed by DaYan to be the best, but speedcubers in reviews have claimed that the
PanShi is only as good as the ZhanChi. However, the PanShi is my favorite cube and I
think it is the best. It is not as smooth as the ZhanChi, but cuts corners so ridiculously
well that it is sometimes difficult to control. Tensioned and lubricated. Corresponds to
Model = 3.
Whether I am listening to music (that I enjoy) during the solve (levels Yes and No):
Yes, I was listening to music while solving. Corresponds to Music = 1.
No, I was not listening to music while solving, and have not listened to music recently;
therefore, I did not feel any potential lingering energizing or comforting effects while
solving. Corresponds to Music = 0.

It is unclear what results to expect. Some argue that better cubes will result in
faster times because better cubes are easier on the hands. However, others argue that,
in order to achieve the best times, cube quality should be consistent with the speedcuber's skill.
Sometimes, and I have personal experience with this, I get slower times with better cubes
because they turn so quickly that I lose track of what I am doing. Slower cubes also
allow for further look-ahead while solving, reducing the time wasted between executions of
algorithms and other actions.

The effect of music on solve times is also unclear. Music can create energizing or
comforting effects, but will this actually improve times or simply make one feel better without
impacting times?
The data:

Solve   Time    Model   Music
1       45.48   1       1
2       52.05   1       1
3       36.64   1       1
4       42.37   1       1
5       48.83   1       1
6       44.43   1       0
7       50.61   1       0
8       40.85   1       0
9       49.70   1       0
10      33.67   1       0
11      48.54   2       1
12      36.82   2       1
13      49.04   2       1
14      44.84   2       1
15      48.94   2       1
16      40.83   2       0
17      36.43   2       0
18      47.57   2       0
19      41.71   2       0
20      37.18   2       0
21      48.17   3       1
22      47.91   3       1
23      42.48   3       1
24      54.57   3       1
25      54.54   3       1
26      32.55   3       0
27      42.19   3       0
28      38.85   3       0
29      43.23   3       0
30      44.74   3       0

These data were obtained through the recording of 30 of my own solves, using a cube
timer (http://www.cubetimer.com/). 10 solves were with the Standard White DIY, 10 with the
ZhanChi, and 10 with the PanShi. Out of each of those three sets of 10, I was listening to music
for 5 of them and not listening to music for the other 5 (5 repetitions for every combination of
predictors). All data were obtained on 04/15/13 between approximately 9:30 pm and 11:00 pm.
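
The balanced layout described above can be checked directly. The following is a minimal Python sketch using only the standard library; the variable names (`times`, `model`, `music`, `cell_counts`, `grand_mean`) are illustrative choices rather than part of the original analysis, and the values are copied from the data table.

```python
from collections import Counter

# Solve times in seconds, in recording order (solves 1-30), copied from the data table.
times = [45.48, 52.05, 36.64, 42.37, 48.83, 44.43, 50.61, 40.85, 49.70, 33.67,
         48.54, 36.82, 49.04, 44.84, 48.94, 40.83, 36.43, 47.57, 41.71, 37.18,
         48.17, 47.91, 42.48, 54.57, 54.54, 32.55, 42.19, 38.85, 43.23, 44.74]
# Model: 1 = Standard, 2 = ZhanChi, 3 = PanShi (10 consecutive solves each).
model = [1] * 10 + [2] * 10 + [3] * 10
# Music: 1 = Yes, 0 = No (5 solves with music, then 5 without, for each cube).
music = ([1] * 5 + [0] * 5) * 3

# Count the solves in each (Model, Music) combination.
cell_counts = Counter(zip(model, music))
for cell in sorted(cell_counts):
    print(cell, cell_counts[cell])

# Overall mean solve time across all 30 solves.
grand_mean = sum(times) / len(times)
print(f"overall mean time: {grand_mean:.3f} s")
```

Every one of the six combinations should show a count of 5, which is what makes the later sums of squares for Model and Music separate cleanly.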

Other potential predicting variables such as skill, algorithms memorized, time of day, etc.
have not been included. Because I am the only solver in this data set and all solves were recorded
in a single session, the absence of those variables should not have a profound effect.
Consider the following side by side boxplots of time vs. cube model:

Nothing is too extreme here, but there is a very general feeling of times increasing as the
cube model shifts from Standard to ZhanChi to PanShi. This may hint at a possible but weak
cube model effect.
Consider the following side by side boxplots of time vs. music:

The boxplot for Yes Music looks higher up on the time scale than the boxplot for No
Music, hinting at a possible Music effect in which listening to music may be associated with
slower times.
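
The numbers behind the two sets of boxplots can be approximated with a five-number summary per group. This is a sketch under the assumption that the data table was transcribed correctly; the helper `summary` is a hypothetical name, and the quartile convention used here (`method="inclusive"`) may differ slightly from the one used by the plotting software.

```python
from statistics import median, quantiles

times = [45.48, 52.05, 36.64, 42.37, 48.83, 44.43, 50.61, 40.85, 49.70, 33.67,
         48.54, 36.82, 49.04, 44.84, 48.94, 40.83, 36.43, 47.57, 41.71, 37.18,
         48.17, 47.91, 42.48, 54.57, 54.54, 32.55, 42.19, 38.85, 43.23, 44.74]
model = [1] * 10 + [2] * 10 + [3] * 10   # 1 = Standard, 2 = ZhanChi, 3 = PanShi
music = ([1] * 5 + [0] * 5) * 3          # 1 = Yes Music, 0 = No Music


def summary(values):
    """Return (min, Q1, median, Q3, max) for a list of solve times."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


# One summary per level of each predictor.
by_model = {name: summary([t for t, m in zip(times, model) if m == code])
            for code, name in [(1, "Standard"), (2, "ZhanChi"), (3, "PanShi")]}
by_music = {name: summary([t for t, u in zip(times, music) if u == code])
            for code, name in [(1, "Yes Music"), (0, "No Music")]}

for name, stats in {**by_model, **by_music}.items():
    print(f"{name:10s}", [round(x, 2) for x in stats])
```

The Yes Music median comes out above the No Music median, in line with the reading of the Music boxplots given above.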
There may also be interaction effects, as music may have different impacts with different
cube models, and/or different cube models may have different impacts with music.
Consider the following abridged two-way ANOVA with a potential interaction effect:

Before going any further, it is important to note that there is no evidence of a
statistically significant interaction effect at the $\alpha = 0.05$ level of significance, because the
p-value of the F-statistic for the interaction terms is 0.28219 > 0.05 (do not reject $H_0$). This
corresponds to the following hypothesis test:
$H_0$: There are no interaction effects (all $(\alpha\beta)_{ij} = 0$).
$H_A$: There are indeed interaction effects (at least one $(\alpha\beta)_{ij} \neq 0$).
Therefore, interaction terms should not be included in the final model. It is also not currently
useful to interpret the possible main effects here because the presence of the insignificant
interaction effect may be skewing the results. So, no conclusion will be drawn for now other than
that the interaction effect is insignificant. Consider the following interaction plot:

This graph appears to be consistent with the findings above regarding interaction effects
because an interaction effect would be reflected in clearly nonparallel lines, and the lines above
are not far from parallel.
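
The interaction F-test and the cell means behind the interaction plot can be reproduced from the data table. The sketch below is an illustration, not the software output used in the report: it relies on the design being balanced, and it uses the fact that an F distribution with exactly 2 numerator degrees of freedom has a closed-form upper tail, so no statistics package is needed. All variable names are hypothetical.

```python
times = [45.48, 52.05, 36.64, 42.37, 48.83, 44.43, 50.61, 40.85, 49.70, 33.67,
         48.54, 36.82, 49.04, 44.84, 48.94, 40.83, 36.43, 47.57, 41.71, 37.18,
         48.17, 47.91, 42.48, 54.57, 54.54, 32.55, 42.19, 38.85, 43.23, 44.74]
model = [1] * 10 + [2] * 10 + [3] * 10   # 1 = Standard, 2 = ZhanChi, 3 = PanShi
music = ([1] * 5 + [0] * 5) * 3          # 1 = Yes Music, 0 = No Music


def mean(xs):
    return sum(xs) / len(xs)


n = len(times)
grand = mean(times)
models, musics = sorted(set(model)), sorted(set(music))
model_mean = {a: mean([t for t, m in zip(times, model) if m == a]) for a in models}
music_mean = {b: mean([t for t, u in zip(times, music) if u == b]) for b in musics}
cell = {(a, b): [t for t, m, u in zip(times, model, music) if m == a and u == b]
        for a in models for b in musics}
cell_mean = {k: mean(v) for k, v in cell.items()}   # the points of the interaction plot

# Interaction sum of squares (valid as written because the design is balanced).
ss_int = sum(len(cell[a, b])
             * (cell_mean[a, b] - model_mean[a] - music_mean[b] + grand) ** 2
             for a in models for b in musics)
# Within-cell (error) sum of squares for the model that includes the interaction.
ss_err = sum((t - cell_mean[k]) ** 2 for k, v in cell.items() for t in v)

df_int = (len(models) - 1) * (len(musics) - 1)   # (3 - 1)(2 - 1) = 2
df_err = n - len(models) * len(musics)           # 30 - 6 = 24
f_int = (ss_int / df_int) / (ss_err / df_err)

# Closed-form upper tail of F(2, df_err): (1 + 2F/df_err) ** (-df_err / 2).
assert df_int == 2
p_int = (1 + 2 * f_int / df_err) ** (-df_err / 2)
print(f"interaction F = {f_int:.3f}, p = {p_int:.5f}")
```

If the table was transcribed correctly, `p_int` should land close to the 0.28219 reported above.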
Consider the following standardized residual plots:

The normal probability plot implies that the residuals are fairly normally distributed, with a
possible hint of a slightly light right tail. The residuals vs. fitted values plot looks well behaved,
but there is a dip in variance at around fitted value 46. This slight heteroscedasticity may
indicate some non-linearity, but we cannot tell for sure because a large gap in the plot
follows it. Since this potential heteroscedasticity does not look extreme,
weighted least squares will not be used, nor will logs be taken. However, estimates of
regression coefficients may be slightly less accurate, and measures of predictive accuracy may be
slightly off. The histogram does not look normally distributed at all, with a very noticeable left
skew, which may indicate that the estimates of the regression coefficients are inappropriate because
part of the signal is being mistakenly treated as noise. This might be a result of the
inappropriate inclusion of the interaction effect. The plot of residuals vs. the order of the data
looks well behaved. Consider the following standardized residual plots vs. each predictor:

Both of these plots look well behaved. There do not appear to be any obvious extreme
values, but regardless, consider the following series of diagnostic plots to assess the magnitude
of any possible outliers or leverage points, as well as a table with specific values for the more
outstanding observations (the table contains observation numbers, standardized residual values,
hat values, and Cook's distances, from left to right):

The topmost plot shows the Cook's distances for each observation, which measure the
extent to which each observation influences the fitted regression coefficients. Any observation
with a Cook's distance above 1 (or simply large relative to those of other observations)
should be studied further, and here, there are none.
The second plot from the top shows the standardized residuals for each observation,
which measure how far an observation falls from where the fitted regression implies it should be.
Observations with standardized residuals beyond ±2.5 should be studied further because
a value that extreme would occur by pure chance only about 1% of the time, and here,
there are none.
The bottommost plot shows the hat values ($h_i$) for each observation, which measure, based on
the predictor values alone, how far a particular case is from the rest of the cases, indicating
leverage. Any observation with a hat value of $2.5(p + 1)/n$ or greater should be studied further,
where p is the sum of the degrees of freedom for all effects in the model,
$(3 - 1) + (2 - 1) = 2 + 1 = 3$, and n is the total number of observations (30). For this
regression the cutoff is $2.5(p + 1)/n = 2.5(3 + 1)/30 \approx 0.333$, and here, there are no
such observations.
Since no observations were flagged by this diagnostic checking, no action will be taken
regarding the addressing of extreme values.
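
The three diagnostic rules above can be applied numerically. The following sketch assumes the additive (no-interaction) model with p = 3, matching the cutoff computed in the text; whether the original plots were drawn from that model or from the interaction model is not stated, so treat this as an illustration. It also uses the fact that in a balanced additive design every observation has the same leverage, (p + 1)/n. The variable names are hypothetical.

```python
import math

times = [45.48, 52.05, 36.64, 42.37, 48.83, 44.43, 50.61, 40.85, 49.70, 33.67,
         48.54, 36.82, 49.04, 44.84, 48.94, 40.83, 36.43, 47.57, 41.71, 37.18,
         48.17, 47.91, 42.48, 54.57, 54.54, 32.55, 42.19, 38.85, 43.23, 44.74]
model = [1] * 10 + [2] * 10 + [3] * 10   # 1 = Standard, 2 = ZhanChi, 3 = PanShi
music = ([1] * 5 + [0] * 5) * 3          # 1 = Yes Music, 0 = No Music


def mean(xs):
    return sum(xs) / len(xs)


n = len(times)
grand = mean(times)
model_mean = {a: mean([t for t, m in zip(times, model) if m == a]) for a in set(model)}
music_mean = {b: mean([t for t, u in zip(times, music) if u == b]) for b in set(music)}

# Fitted values of the additive model in a balanced design.
fitted = [model_mean[m] + music_mean[u] - grand for m, u in zip(model, music)]
resid = [t - f for t, f in zip(times, fitted)]

p = (3 - 1) + (2 - 1)                    # effect degrees of freedom, as in the text
s = math.sqrt(sum(e * e for e in resid) / (n - p - 1))   # residual standard error

h = (p + 1) / n                          # common leverage in a balanced additive design
std_resid = [e / (s * math.sqrt(1 - h)) for e in resid]
cooks = [r * r * h / ((1 - h) * (p + 1)) for r in std_resid]

leverage_cutoff = 2.5 * (p + 1) / n
# Solve numbers (1-based) that break any of the three rules.
flagged = [i + 1 for i, (r, d) in enumerate(zip(std_resid, cooks))
           if abs(r) > 2.5 or d > 1 or h >= leverage_cutoff]

print(f"s = {s:.3f}, leverage = {h:.3f}, cutoff = {leverage_cutoff:.3f}")
print("largest |standardized residual|:", round(max(abs(r) for r in std_resid), 3))
print("flagged solves:", flagged)
```

Under these assumptions the list of flagged solves is empty, which agrees with the conclusion that no observation needs special treatment.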
As concluded before, the interaction effect was shown to be insignificant, so consider the
following two-way ANOVA without interaction terms:

There is no evidence of a statistically significant Model effect because the p-value of
its F-statistic is very high at 0.77190 > 0.05 (do not reject $H_0$). There is indeed evidence of a
statistically significant Music effect because the p-value of its F-statistic is low at
0.01814 < 0.05 (reject $H_0$). These correspond to the following hypothesis tests:
$H_0$: There is no main effect for that predictor (all $\alpha_i$ (or $\beta_j$) $= 0$).
$H_A$: There is indeed a main effect for that predictor (at least one $\alpha_i$ (or $\beta_j$) $\neq 0$).
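
The main-effect tests above can be recomputed from the data table with the standard library alone. This is a sketch, not the original software output: it assumes the balanced design, uses the closed-form tail of F(2, df) for Model, and for Music (1 numerator degree of freedom) uses the identity F = t² together with a Simpson's-rule integral of the t density. The helper names `t_pdf` and `two_sided_t_p` are hypothetical.

```python
import math

times = [45.48, 52.05, 36.64, 42.37, 48.83, 44.43, 50.61, 40.85, 49.70, 33.67,
         48.54, 36.82, 49.04, 44.84, 48.94, 40.83, 36.43, 47.57, 41.71, 37.18,
         48.17, 47.91, 42.48, 54.57, 54.54, 32.55, 42.19, 38.85, 43.23, 44.74]
model = [1] * 10 + [2] * 10 + [3] * 10   # 1 = Standard, 2 = ZhanChi, 3 = PanShi
music = ([1] * 5 + [0] * 5) * 3          # 1 = Yes Music, 0 = No Music


def mean(xs):
    return sum(xs) / len(xs)


n = len(times)
grand = mean(times)
model_mean = {a: mean([t for t, m in zip(times, model) if m == a]) for a in set(model)}
music_mean = {b: mean([t for t, u in zip(times, music) if u == b]) for b in set(music)}

ss_total = sum((t - grand) ** 2 for t in times)
ss_model = sum((model_mean[m] - grand) ** 2 for m in model)
ss_music = sum((music_mean[u] - grand) ** 2 for u in music)
ss_err = ss_total - ss_model - ss_music   # additive decomposition, balanced design

df_model, df_music = 2, 1
df_err = n - 1 - df_model - df_music      # 26
mse = ss_err / df_err
f_model = (ss_model / df_model) / mse
f_music = (ss_music / df_music) / mse

# Model: 2 numerator df, so the upper tail of F has a closed form.
p_model = (1 + 2 * f_model / df_err) ** (-df_err / 2)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_t_p(t, df, steps=2000):
    """P(|T| > t), integrating the density over [0, t] with Simpson's rule."""
    width = t / steps                     # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * width, df)
    return 1 - 2 * total * width / 3


# Music: 1 numerator df, so F = t**2 and the F-test p-value is the two-sided t p-value.
p_music = two_sided_t_p(math.sqrt(f_music), df_err)

r_squared = (ss_model + ss_music) / ss_total
s = math.sqrt(mse)
print(f"Model: F = {f_model:.3f}, p = {p_model:.5f}")
print(f"Music: F = {f_music:.3f}, p = {p_music:.5f}")
print(f"R^2 = {r_squared:.4f}, residual standard error = {s:.3f}")
```

Because Music has only two levels, its F-test is equivalent to a two-sided t-test on the difference between the Yes Music and No Music means.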

The regression is of somewhat weak strength with $R^2 = 0.2093$, meaning that 20.93% of
the variability in the target variable can be accounted for by the predicting variables. The residual
standard error is 5.552, representing the standard deviation of the observed times around their
fitted values. This means that roughly 95% of actual target values should be within
±2 × 5.552 = ±11.104 s of the predicted target values. Consider the following table of least squares means:

Assuming the errors have expected value 0, this two-way ANOVA model without interaction
has the equation:
$y_{ij} = \mu + \alpha_i + \beta_j$
where i is 1 (Standard), 2 (ZhanChi), or 3 (PanShi), depending on Model, and j is 1
(Yes Music) or 2 (No Music), depending on Music. $y_{ij}$ represents the modeled target variable
Time for Model level i and Music level j, $\mu$ is the overall level, $\alpha_i$ represents the Model
main effect, and $\beta_j$ represents the Music main effect. From the above table of least squares
means (the Yes Music solves average 46.75 s and the No Music solves 41.64 s), it can be seen that
$\mu = 44.192$, $\alpha_1 = 44.46 - \mu = +0.268$, $\alpha_2 = 43.19 - \mu = -1.002$,
$\alpha_3 = 44.92 - \mu = +0.728$, $\beta_1 = 46.75 - \mu = +2.558$, and
$\beta_2 = 41.64 - \mu = -2.552$. Therefore:

$y_{ij} = 44.192 + 0.268 \cdot (\text{Standard}) - 1.002 \cdot (\text{ZhanChi}) + 0.728 \cdot (\text{PanShi}) + 2.558 \cdot (\text{Yes Music}) - 2.552 \cdot (\text{No Music})$

where each variable = 1 if true, and = 0 if false.
My average Time is 44.192 s.
Regarding cube Models:

Standard is associated with an estimated expected increase in Time of 0.268 s
ZhanChi is associated with an estimated expected decrease in Time of 1.002 s
PanShi is associated with an estimated expected increase in Time of 0.728 s

Regarding Music:

Yes Music is associated with an estimated expected increase in Time of 2.558 s
No Music is associated with an estimated expected decrease in Time of 2.552 s

where s is seconds, and all effects are relative to the overall level $\mu = 44.192$.
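
The effects in the additive model are simply each level's mean minus the overall mean, so they can be recomputed from the data table (Model = 1, 2, 3 for Standard, ZhanChi, PanShi; Music = 1 for Yes, 0 for No). The sketch below is illustrative; `predict` is a hypothetical helper, and because it works from unrounded means its output can differ in the third decimal from figures derived from means rounded to two decimals.

```python
times = [45.48, 52.05, 36.64, 42.37, 48.83, 44.43, 50.61, 40.85, 49.70, 33.67,
         48.54, 36.82, 49.04, 44.84, 48.94, 40.83, 36.43, 47.57, 41.71, 37.18,
         48.17, 47.91, 42.48, 54.57, 54.54, 32.55, 42.19, 38.85, 43.23, 44.74]
model = [1] * 10 + [2] * 10 + [3] * 10
music = ([1] * 5 + [0] * 5) * 3

model_names = {1: "Standard", 2: "ZhanChi", 3: "PanShi"}
music_names = {1: "Yes Music", 0: "No Music"}


def mean(xs):
    return sum(xs) / len(xs)


grand = mean(times)   # overall level
# Effect of each level = level mean minus overall mean.
model_effect = {name: mean([t for t, m in zip(times, model) if m == code]) - grand
                for code, name in model_names.items()}
music_effect = {name: mean([t for t, u in zip(times, music) if u == code]) - grand
                for code, name in music_names.items()}


def predict(cube, listening):
    """Fitted Time (s): overall level + Model effect + Music effect."""
    return grand + model_effect[cube] + music_effect[listening]


for name, effect in {**model_effect, **music_effect}.items():
    print(f"{name:10s} {effect:+.3f} s")

# Combination with the lowest fitted Time under the additive model.
fastest = min(((c, l) for c in model_effect for l in music_effect),
              key=lambda pair: predict(*pair))
print("fastest fitted combination:", fastest, f"{predict(*fastest):.3f} s")
```

The effects within each factor sum to zero, and the lowest fitted Time belongs to the ZhanChi without music.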
It can be seen here that the Music effect is larger than the Model effect because the
absolute value of either Music coefficient is larger than the absolute value of any
of the three Model coefficients. However, all of these effects are less than 3 s in absolute value.
So, practically, when recreationally speedcubing, I will most likely continue to use whichever
cube I want and listen to music as I desire, because a few seconds of variation in time
is not a big deal to me. But when competitively speedcubing or when attempting to break my
previous solve time record (32.55 s, which was actually recorded in this data set with Model =
PanShi and Music = No Music), it might be a good idea to use a ZhanChi while not listening to
music.
Since the interaction effect is insignificant, Tukey multiple comparison tests can be used
to see how the solves with different cube models, with music, and without music separate, so
consider the following:

The very high p-values for all comparisons between cube models (0.866 between
ZhanChi and Standard, 0.981 between PanShi and Standard, and 0.767 between PanShi and
ZhanChi) indicate that the Model effect is very weak and that there may not be a difference
(with regard to Time) between Standard, ZhanChi, and PanShi. The p-value for the
comparison between with music and without music is low at 0.018, indicating a statistically
significant Music effect and adding strength to the statement that there is indeed a difference
between Yes Music and No Music.
Consider the following standardized residual plots:

The normal probability plot implies that the residuals are fairly normally distributed. The
residuals vs. fitted values plot looks well behaved, but there is a dip in variance at around fitted
value 46. This slight heteroscedasticity may indicate some non-linearity, but
we cannot tell for sure because a large gap in the plot follows it. Since this potential
heteroscedasticity does not look extreme, weighted least squares will not be used, nor
will logs be taken. However, estimates of regression coefficients may be slightly less accurate,
and measures of predictive accuracy may be slightly off. The histogram is roughly normally
distributed, a large improvement from the previous one. The plot of residuals vs. the order of the
data looks well behaved. Consider the following standardized residual plots vs. each predictor:

Both these plots look well behaved.


So what insight does all this reveal? For me, the model of cube does not have much of an
impact on my times. This adds strength to the argument that skill is more important than cube
quality. However, for me, not listening to music tends to yield better times. This may mean that
though music may create energizing or comforting effects, it may also be distracting. There is no
evidence that the Music effect differs among cube Models, or that the Model effect differs
between Yes Music and No Music. Also, the regression was not very strong, as only around 20%
of the variability in Time could be accounted for by the Model and Music effects. These
conclusions tell me that in order to improve as a speedcuber, the most important thing for me to
do is simply to practice more, improve my skill, and maybe memorize a few more algorithms to
help speed things up. I need not upgrade my equipment or seek the aid of external influences
such as music. But when going for record times or competing, it might be a good idea to
gain a slight edge by using the ZhanChi and not listening to music.

*: I solve with a simplified variant of the Fridrich (CFOP) method: solve the cross on one side
intuitively, solve the first two layers (F2L) intuitively, orient the last layer (OLL) with one to
three algorithms, and finally permute the last layer (PLL) with one or two algorithms. All solves
in the data set were solved with this method. All solves also included a 10 second inspection time
prior to the first turn. These 10 seconds were not included in the values of the observations in
Time. The scrambles were generated randomly by a computer program on the site of the timer,
http://www.cubetimer.com/.
*: Tensioning refers to the adjusting of the tensions in the screws that fasten each face of the
cube to the core. Tensioning is done to tweak the ability of the cube to cut corners to one's liking,
which should improve cube performance and therefore solve times.
*: Lubrication refers to the application of lubricant to the cube to reduce friction and improve
smoothness, speed, corner cutting, and durability, and therefore solve times. I use Lubix, a
silicone-based lubricant manufactured by Lubix Cube.
*: DaYan, a Chinese speedcube manufacturer, is widely regarded as the world's best.
