
# Sum of Squares

The ANOVA can be calculated using either partial, sequential, or classical sums of squares. The default is partial, but this is definable under Edit, Preferences, Math.

Partial SS is sometimes referred to as Type III sum of squares. It calculates the SS for a term after correcting for all other terms in the model. This is normally the desired form of sums of squares. A disadvantage is that for a non-orthogonal design, the term SS may not add up to the total SS.

Sequential SS is sometimes referred to as Type I sum of squares. It calculates sums of squares in sequence: the SS for a term is corrected only for the terms above it on the term list. The term SS will add up to the total SS, but they are order dependent.

More details, using the example model

Yhat = b0 + b1x1 + b2x2 + b12x1x2 + b3x3 + b13x1x3 + b23x2x3 + b123x1x2x3

## Type I Sum of Squares (aka Sequential)

Hierarchical decomposition: Type I SS is the SS corresponding to each effect adjusted for every other effect preceding it in the model. For example:

SS(b2) = SS(b2 | b0, b1)
SS(b13) = SS(b13 | b0, b1, b2, b12, b3)

## Type II Sum of Squares (aka Classical)

Type II SS is the reduction in the SSerror due to adding an effect after all other terms have been added to the model, except effects that contain the effect being tested. For example:

SS(b2) = SS(b2 | b0, b1, b3, b13)
SS(b13) = SS(b13 | b0, b1, b2, b12, b3, b23)

## Type III Sum of Squares (aka Partial)

Type III SS is the SS corresponding to each effect adjusted for every other effect in the model. For example:

SS(b2) = SS(b2 | b0, b1, b12, b3, b13, b23, b123)
SS(b13) = SS(b13 | b0, b1, b2, b12, b3, b23, b123)

# Post ANOVA and Prediction Equations

This section provides definitions for the post-ANOVA information for the individual terms.

Factor: Experimental variables selected for inclusion in the predictive model.
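To see why the distinction between the types matters less for orthogonal designs, here is a minimal sketch with made-up data: in an orthogonal two-level factorial, each term's SS can be computed from its contrast, and the term SS add up to the corrected total SS (the additivity that partial SS can lose in non-orthogonal designs).

```python
# Hypothetical data for an orthogonal 2^2 factorial, runs in standard order:
# (A, B) = (-1,-1), (+1,-1), (-1,+1), (+1,+1)
A = [-1, +1, -1, +1]
B = [-1, -1, +1, +1]
y = [10.0, 14.0, 12.0, 20.0]
n = len(y)

mean_y = sum(y) / n
ss_total = sum((yi - mean_y) ** 2 for yi in y)   # corrected total SS

def effect_ss(column):
    """SS for a term in an orthogonal two-level design: contrast^2 / n."""
    contrast = sum(c * yi for c, yi in zip(column, y))
    return contrast ** 2 / n

ss_A = effect_ss(A)
ss_B = effect_ss(B)
ss_AB = effect_ss([a * b for a, b in zip(A, B)])

print(ss_A, ss_B, ss_AB)              # 36.0 16.0 4.0
print(ss_A + ss_B + ss_AB, ss_total)  # 56.0 56.0 -- the term SS add up
```

Because the columns are orthogonal, correcting a term for the other terms changes nothing here, so Type I, II, and III all give these same values.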
Coefficient Estimate: Regression coefficient representing the expected change in response y per unit change in x when all remaining factors are held constant. In orthogonal designs, it equals one half the factorial effect.

Coefficient Estimate for General Factorial Designs: Coefficients for multi-level categorical factors are not as simple to interpret. They do not have a physical meaning, but they do have a mathematical one: Beta(1) is the difference of level 2 from the overall average, Beta(2) is the difference of level 3 from the overall average, and in general Beta(k) is the difference of level (k+1) from the overall average. The negative sum of the coefficients is the difference of level 1 from the overall average. Don't use these coefficients for interpretation of the model; use the model graphs!

DF: Degrees of freedom; equal to one for testing coefficients.

Standard Error: The standard deviation associated with the coefficient estimate.

95% CI High and Low: These two columns represent the range in which the true coefficient should be found 95% of the time. If this range spans 0 (one limit is positive and the other negative), then a coefficient of 0 could be true, indicating the factor has no effect.

VIF: Variance Inflation Factor. Measures how much the variance of the coefficient is inflated by the lack of orthogonality in the design. If the factor is orthogonal to all other factors in the model, the VIF is one. Values greater than 10 indicate that the factors are too correlated with each other (they are not independent).

The predictive model is listed in both actual and coded terms. (For mixture experiments, the prediction equations are given in actual, real, and pseudo values of the components.) The coded (or pseudo) equation is useful for identifying the relative significance of the factors by comparing the factor coefficients; this comparison cannot be made with the actual equation because those coefficients are scaled to accommodate the units of each factor. The equations give identical predictions.

These equations, used for prediction, have no block effects. Blocking is a restriction on the randomization of the experiment, used to reduce error; it is not a factor being studied. Blocks are only used to fit the observed experiments, not to make predictions.

For Linear Mixture Models Only: The coefficient table is augmented for linear mixture models to include statistics on the adjusted effects.
Since the linear coefficient cannot be compared to zero, the linear effect of component i is measured by how different the ith coefficient is from the other (q-1) coefficients. The t-test is not applicable to the mixture coefficient estimates, but it can be applied to the adjusted effect. When the design space is not a simplex, the formula for calculating the component effects must be adjusted for the differences in the ranges, which also provides an adjusted linear effect.

For One Factor Designs Only: The next section in the ANOVA lists results for each treatment (factor level) and shows the significance of the difference between each pair of treatments.

Estimated Mean: The average response at each treatment level.

Standard Error: The standard error associated with the calculation of this mean. It comes from the standard deviation of the data divided by the square root of the number of repetitions in a sample.

Treatment: Lists each pairwise combination of the factor levels.

Mean Difference: The difference between the average responses from the two treatments.

DF: The degrees of freedom associated with the difference.

Standard Error: The standard error associated with the difference between the two means.

t value: The Mean Difference divided by the Standard Error. It represents the number of standard deviations separating the two means.

Prob > t: The probability of getting this t value if the two means are not actually different. A value less than 0.05 indicates a statistically significant difference between the means; a value larger than 0.10 indicates no statistically significant difference.

# Prediction Equations

Design-Expert provides prediction equations in terms of actual units and coded units. In the case of mixture designs, the options are actual, pseudo, and real units. The coded equations are determined first, and the actual equations are derived from the coded ones. Experimenters often wonder why the equations look so different, even to the point of having different signs on the coefficients. To get the actual equation, replace each term in the coded equation with its coding formula:

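A minimal numeric sketch of the conversion, assuming the usual coding x = (actual − center) / (half-range); the coefficients and factor range below are made up for illustration, not taken from the text.

```python
# Hypothetical coded model: yhat = b0 + b1*x + b11*x^2,
# where x = (A - center) / half  (here center=50, half-range=10)
b0, b1, b11 = 10.0, 2.0, 3.0
center, half = 50.0, 10.0

def coded_pred(A):
    x = (A - center) / half
    return b0 + b1 * x + b11 * x ** 2

# Substitute x = (A - center)/half and collect powers of A:
a0 = b0 - b1 * center / half + b11 * center ** 2 / half ** 2  # new intercept
a1 = b1 / half - 2 * b11 * center / half ** 2                 # new linear coefficient
a2 = b11 / half ** 2                                          # new quadratic coefficient

def actual_pred(A):
    return a0 + a1 * A + a2 * A ** 2

for A in (40.0, 50.0, 60.0):
    assert abs(coded_pred(A) - actual_pred(A)) < 1e-9  # identical predictions

print(b1, a1)  # 2.0 -2.8 -> the correction even flipped the sign of the linear term
```

The two equations predict identically, yet the coded linear coefficient (+2.0) and the actual one (−2.8) differ in sign, which is exactly the behavior described below.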
Substituting the formula into each linear term results in a new linear coefficient and a correction to the intercept. Substituting it into each quadratic term results in a new quadratic coefficient and a correction to the intercept. Substituting it into each interaction term results in a new interaction coefficient, a correction to each main effect in the interaction, and a correction to the intercept. These corrections from the interactions can be large and opposite in sign from the linear terms, and can change the sign on the linear terms.

# Interpretation of R-squared

Experimenters frequently ask, "What is a good R-squared value? How low can it be before the results are not valid?" First of all, experimenters should focus on the adjusted R-squared and predicted R-squared values. The regular R-squared can be artificially inflated by simply continuing to add terms to the model, even if the terms are not statistically significant. The adjusted R-squared basically plateaus when insignificant terms are added to the model, and the predicted R-squared will decrease when there are too many insignificant terms. A rule of thumb is that the adjusted and predicted R-squared values should be within 0.2 of each other.

There is no commonly used "cut-off" value for R-squared. Focus instead on the objective of the experiment. If the objective is to create a model that will accurately describe the process so that very precise optimum parameter settings can be determined (often with response surface or mixture designs), then it is desirable to have high adjusted and predicted R-squared values (preferably 0.70+). In this case you need more than just to identify significant factors; you need to make sure you are modeling HOW the factors affect the responses and that you are not leaving anything out.
The other objective is one where the primary concern is simply to identify factors and interactions that are affecting the response and generally learn whether higher or lower factor levels are better (generally with factorial designs). In this case, you might have the situation where the model is statistically significant and there is no lack of fit, but the R-squared values are low. You can conclude that the significant terms identified are correct and the graphs will show the best directions to follow, but you have not yet found (or controlled) all the sources of variation in the process. There are other things left unidentified which may or may not give even better results. So there is little doubt that the factors found are correct and their model is correct, BUT there is more to be investigated if it is economically beneficial to the company. Do NOT use the model for prediction, because it does not explain enough of what is going on. A good next step would be to set the known factors at their best settings, brainstorm about other possible factors, and run another DOE.

# Don't Let R-squared Fool You

Has a low R² ever disappointed you during the analysis of your experimental results? Is this really the kiss of death? Is all lost? Let's examine R² as it relates to design of experiments (DOE) and find out. R² measures are calculated on the basis of the change in the response (ΔY) relative to the total variation of the response (ΔY + ε) over the range of the independent factor; roughly, R² ≈ ΔY / (ΔY + ε) (this relation is representative of the concept, not the mathematical calculation).

Let's look at an example. Response Y depends on factor X in a linear fashion: Y = β0 + β1X + ε. We run a DOE using levels X1 and X2 in the figure below to estimate β1. Having the independent factor levels far apart generates a large signal-to-noise ratio, and it is relatively easy to estimate β1. Because the signal (ΔY) is large relative to the noise (ε), R² approaches one.

What if we had run a DOE using levels X3 and X4 in the figure below to estimate β1? Having the independent factor levels closer together generates a smaller signal-to-noise ratio, and it is more difficult to estimate β1. We can overcome this difficulty by running more replicates of the experiments. If enough replicates are run, β1 can be estimated with the same precision as in the first DOE using levels X1 and X2. But because the signal (ΔY) is smaller relative to the noise (ε), R² will be smaller, no matter how many replicates are run!
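This can be checked numerically with a simulated sketch (the process y = 2x + noise and the factor ranges below are hypothetical): fitting the same straight-line model over a wide and a narrow factor range recovers a similar slope, but the narrow range yields a much lower R².

```python
import random

# Hypothetical linear process: y = 2*x + noise.  Fit a straight line by
# least squares and compare R^2 for a wide vs. a narrow factor range.
def fit_r2(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, 1 - ss_res / ss_tot

random.seed(1)

x_wide   = [i / 10 - 1.0  for i in range(21)]   # levels spread over [-1, 1]
x_narrow = [i / 100 - 0.1 for i in range(21)]   # same design squeezed into [-0.1, 0.1]

slope_w, r2_w = fit_r2(x_wide,   [2 * x + random.gauss(0, 0.5) for x in x_wide])
slope_n, r2_n = fit_r2(x_narrow, [2 * x + random.gauss(0, 0.5) for x in x_narrow])

print(round(slope_w, 2), round(r2_w, 2))  # slope near 2, R^2 high
print(round(slope_n, 2), round(r2_n, 2))  # slope still estimable, R^2 much lower
```

The slope (the β1 estimate) is recoverable in both cases, and could be tightened further with replicates; only the proportion of variation explained changes with the factor range.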