
Chapter 7

Conclusion

This paper discussed a range of imputation methods for compensating for partial nonresponse in survey data and presented results on the advantages and disadvantages of the methods. It showed that, when applying imputation procedures, it is important to consider the type of analysis and the type of point estimator of interest. The researcher's goal may be to produce unbiased and efficient estimates of means, totals, proportions, and official aggregate statistics. It may instead be to produce a complete data file that different users can use for a variety of analyses. In either case, the researcher should first clearly identify the type of analysis and the type of estimator that suit his or her purpose.

Anyone who has to make decisions about imputation procedures will usually have to strike a compromise between what is technically effective and what is operationally expedient. When resources are limited, this is a hard choice. It is hoped that this study will be helpful in guiding that choice.

Other issues must also be considered in determining which imputation method to use in a particular situation. Several practical issues concern ease of implementation, such as the difficulty of programming, the computing time required, and the complexity of the procedures.


In our implementation, all of the methods were programmed in a general-purpose programming language, because no available software could generate imputations for all of the methods the researchers intended to use. Of all the methods, overall mean imputation was the simplest to use and to program. The other three methods required the formation of imputation classes. The two regression imputation methods were the hardest to program and the most time-consuming.
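
The difference in programming effort can be illustrated with a minimal Python sketch. Overall mean imputation needs only the respondents' mean, whereas the other methods first need the records grouped into imputation classes. The function names, the use of `None` to mark nonresponse, and the hypothetical `class_key` grouping function are illustrative assumptions, not the study's actual program.

```python
from collections import defaultdict
from statistics import mean


def overall_mean_impute(values):
    """Overall mean imputation: every missing value (None) receives the
    mean of all respondents' values for the same variable."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no respondents to compute a mean from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def form_imputation_classes(records, class_key):
    """Group record indices into imputation classes.

    class_key is a hypothetical function mapping a record to its class
    label, e.g. a region or a household-size category."""
    classes = defaultdict(list)
    for index, record in enumerate(records):
        classes[class_key(record)].append(index)
    return dict(classes)


print(overall_mean_impute([10, None, 20, None]))  # [10, 15, 20, 15]
print(form_imputation_classes(
    [{"region": "A"}, {"region": "B"}, {"region": "A"}],
    lambda rec: rec["region"],
))  # {'A': [0, 2], 'B': [1]}
```

Once the classes exist, hot deck and regression imputation are applied separately within each class, which is part of why those methods take more effort to program.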

The performance of several methods in imputing partial nonresponse observations was compared using the 1997 Family Income and Expenditure Survey (FIES) data set. A set of criteria was computed for each method from the data set with imputed values and the data set with actual values in order to find the best imputation method for these data. The criteria for judging the best method were the bias and variance estimates of the population mean of the imputed data, the preservation of the distribution of the actual data, and other measures of accuracy and precision adopted from Kalton (1983).

The results show that the choice of imputation method significantly affected how closely the estimates matched the actual data. The similarity between the two best methods, deterministic and stochastic regression imputation, was due in part to the adequacy and predictive power of the regression models.

The bias and variance estimates of the population mean of the imputed data varied considerably across imputation methods. Unexpectedly, the hot deck imputation method produced the highest bias and variance estimates at most nonresponse rates and for most variables. Stochastic regression, on the other hand, was the best method on this criterion, since in most of the tests it delivered relatively small biases and variances.
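
The section does not state how the bias and variance of the imputed mean were computed. One common approach, assumed here, is a simulation: delete a random share of the actual values at a given nonresponse rate, impute, record the mean of the completed data, and repeat. The bias is then the average of these means minus the actual mean, and the variance is their spread across replicates. The function `bias_and_variance_of_mean`, its parameters, and the choice of random (missing completely at random) deletion are hypothetical; any imputation function can be passed as `impute`.

```python
import random
from statistics import mean, pvariance


def overall_mean_impute(values):
    """Fill missing values (None) with the respondents' mean."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def bias_and_variance_of_mean(actual, impute, rate, replicates=200, seed=1):
    """Bias and variance of the mean of the imputed data over repeated,
    randomly generated nonresponse at the given rate (e.g. 0.10).

    Each replicate deletes a random subset of the actual values, fills
    the gaps with `impute`, and records the mean of the completed data."""
    rng = random.Random(seed)
    true_mean = mean(actual)
    n_missing = round(rate * len(actual))
    estimates = []
    for _ in range(replicates):
        missing = set(rng.sample(range(len(actual)), n_missing))
        with_gaps = [None if i in missing else v for i, v in enumerate(actual)]
        estimates.append(mean(impute(with_gaps)))
    bias = mean(estimates) - true_mean
    variance = pvariance(estimates)
    return bias, variance


actual = [float(x) for x in range(1, 101)]
print(bias_and_variance_of_mean(actual, overall_mean_impute, rate=0.3))
```

Running the same function at several rates and for each method gives a table of biases and variances comparable across methods.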

The imputed data from each method were checked for preservation of the distribution using the Kolmogorov-Smirnov goodness-of-fit test. Of the methods used in this study, both regression imputation methods retained the distribution of the data, especially deterministic regression imputation, which generated exactly the same distribution as the actual data.
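
The Kolmogorov-Smirnov comparison can be sketched as follows, assuming the two-sample form that compares the imputed values with the actual values (the section does not say whether a one-sample or two-sample version was used). The statistic D is the largest vertical gap between the two empirical distribution functions, and D = 0 means the two empirical distributions coincide. The critical value uses the standard large-sample approximation; the function names and the toy values are hypothetical.

```python
from bisect import bisect_right
from math import log, sqrt


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|,
    where F_a and F_b are the empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    d = 0.0
    # Both step functions change only at observed values, so checking
    # every pooled observation finds the maximum gap.
    for x in a + b:
        gap = abs(bisect_right(a, x) / n - bisect_right(b, x) / m)
        d = max(d, gap)
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m)),
    with c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    return sqrt(-log(alpha / 2) / 2) * sqrt((n + m) / (n * m))


actual = [12.0, 15.5, 9.8, 20.1, 17.3]    # toy values, not FIES data
imputed = [12.0, 15.1, 10.2, 19.7, 17.3]
d = ks_statistic(imputed, actual)
print(d, d > ks_critical_value(len(imputed), len(actual)))
```

A method "retains the distribution" in this sense when D does not exceed the critical value, i.e. the test does not reject equality of the two distributions.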

In the other tests of accuracy and precision, namely the mean deviation, mean absolute deviation, and root mean square deviation, the methods gave mixed results at all nonresponse rates. Some methods did not yield consistently good results. Only half of the methods (the two regression methods) performed well on one particular criterion, the preservation of the distribution of the data. On the other criteria, the rankings of the methods alternated frequently, so the results were clearly inconsistent.
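
The three accuracy measures can be sketched as below. The definitions used here are the usual ones, computed over the imputed cases only from the differences between each imputed value and the actual value it replaced; treating them as the exact formulas used in the study (following Kalton, 1983) is an assumption, and the function name is hypothetical.

```python
from math import sqrt


def accuracy_measures(imputed, actual):
    """Mean deviation (MD), mean absolute deviation (MAD) and root mean
    square deviation (RMSD) between imputed values and the actual values
    they replaced."""
    if len(imputed) != len(actual) or not imputed:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [i - a for i, a in zip(imputed, actual)]
    m = len(diffs)
    md = sum(diffs) / m                       # signed: indicates bias
    mad = sum(abs(d) for d in diffs) / m      # average size of the error
    rmsd = sqrt(sum(d * d for d in diffs) / m)  # penalizes large errors more
    return md, mad, rmsd


print(accuracy_measures([12, 8, 15], [10, 10, 10]))
```

Because MD is signed, positive and negative errors can cancel, so a method may look good on MD while doing poorly on MAD or RMSD; this is one way the rankings can alternate across criteria.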

Given these criteria and procedures for judging the best of the four imputation methods, selecting the best method was difficult. Consequently, to determine the best method for imputing nonresponse observations for each variable in the study, the methods were ranked on several criteria. On each criterion, a method received a rank of 1 if it performed best and 4 if it performed worst.
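
The ranking step can be sketched as follows. The per-criterion ranks of 1 (best) to 4 (worst) follow the text; combining them by summing each method's ranks, giving tied methods the better rank, and passing signed criteria as absolute values are assumptions, since the section does not say how the ranks were aggregated. The method labels and scores are toy values, not results from the study.

```python
def rank_on_criterion(scores):
    """Rank methods on one criterion: 1 = best (smallest score), k = worst.
    Tied methods share the better rank."""
    ordered = sorted(scores.values())
    return {method: ordered.index(value) + 1 for method, value in scores.items()}


def overall_ranking(criteria):
    """Sum each method's ranks over all criteria; a lower total is better.

    criteria is a list of {method: score} dicts in which a smaller score
    is better. Signed criteria such as bias or mean deviation should be
    passed as absolute values."""
    totals = {}
    for scores in criteria:
        for method, rank in rank_on_criterion(scores).items():
            totals[method] = totals.get(method, 0) + rank
    return sorted(totals.items(), key=lambda item: item[1])


# Toy scores for four hypothetical methods, not results from the study.
criteria = [
    {"A": 0.1, "B": 0.3, "C": 0.2, "D": 0.4},  # e.g. |bias|
    {"A": 2.0, "B": 1.0, "C": 3.0, "D": 4.0},  # e.g. RMSD
]
print(overall_ranking(criteria))  # [('A', 3), ('B', 4), ('C', 5), ('D', 8)]
```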


After comparing the methods, the two regression methods, deterministic and stochastic regression imputation, gave outstanding results, ranking first and second (in either order) on most of the criteria. The researchers concluded that stochastic regression imputation is the best imputation method for this study.

The efficiency of stochastic regression imputation was supported by the coefficient of determination of the regression model and by the random residual added to the deterministic imputed value. The added random residuals made its estimates less biased than those of its deterministic counterpart.
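
The difference between the two regression methods can be sketched with a single predictor. Deterministic regression imputation fills each missing value with the fitted value a + bx; stochastic regression imputation adds a random residual to that fitted value. The coefficient of determination R² of the respondents' model is returned as well. Drawing the residual from a normal distribution with the estimated residual standard deviation, using one predictor, and ignoring imputation classes (within-class fitting would repeat this per class) are simplifying assumptions; the function names are hypothetical.

```python
import random
from statistics import mean


def fit_simple_regression(xs, ys):
    """Least-squares fit of y = a + b*x.

    Returns the intercept a, slope b, coefficient of determination R^2
    and residual standard deviation s (n - 2 degrees of freedom)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - sse / sst, (sse / (len(xs) - 2)) ** 0.5


def regression_impute(xs, ys, stochastic=False, seed=0):
    """Impute missing y values (None) from the regression fitted to respondents.

    Deterministic: y_hat = a + b*x.
    Stochastic:    y_hat = a + b*x + e, with e drawn from N(0, s^2)."""
    rng = random.Random(seed)
    resp = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, r2, s = fit_simple_regression([x for x, _ in resp], [y for _, y in resp])
    completed = []
    for x, y in zip(xs, ys):
        if y is None:
            y = a + b * x
            if stochastic:
                y += rng.gauss(0.0, s)
        completed.append(y)
    return completed, r2


xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, None, 8.2, 9.8, 12.1]   # toy values
print(regression_impute(xs, ys))
print(regression_impute(xs, ys, stochastic=True))
```

Deterministic imputations all lie on the fitted line, which understates the spread of the data; the added residual restores some of that variability, and a high R² keeps both versions close to the actual values.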

Deterministic regression imputation performed much better than hot deck imputation. This is surprising, since in related studies hot deck imputation emerged as the better of the two methods. The most likely cause of this poor performance is the selection of donors with replacement rather than the imputation classes. If the imputation classes were at fault, the estimates from both regression imputation methods, which used the same classes, would likely have been as poor as those from hot deck imputation, even with an adequate model.
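
The donor-selection issue can be made concrete with a sketch of random hot deck imputation within imputation classes, where drawing donors with or without replacement is a switch. With replacement, one donor can supply the values for several recipients in the same class, which tends to add variance to the estimates. The random (rather than sequential) hot deck, the function name, and the class labels are illustrative assumptions; the section does not describe the exact hot deck variant used.

```python
import random
from collections import defaultdict


def hot_deck_impute(values, classes, with_replacement=True, seed=0):
    """Random hot deck imputation within imputation classes.

    values:  list of observations, None for nonrespondents.
    classes: imputation-class label for each record (same length).
    Each nonrespondent receives the value of a randomly chosen respondent
    (donor) from its own class. With with_replacement=True a donor may be
    used many times; otherwise each donor is used at most once."""
    rng = random.Random(seed)
    donor_values = defaultdict(list)
    recipients = defaultdict(list)
    for index, (value, label) in enumerate(zip(values, classes)):
        if value is None:
            recipients[label].append(index)
        else:
            donor_values[label].append(value)

    completed = list(values)
    for label, indices in recipients.items():
        pool = donor_values[label]
        if not pool:
            raise ValueError(f"class {label!r} has no donors")
        if with_replacement:
            chosen = [rng.choice(pool) for _ in indices]
        elif len(pool) >= len(indices):
            chosen = rng.sample(pool, len(indices))
        else:
            raise ValueError(f"class {label!r} has fewer donors than recipients")
        for index, value in zip(indices, chosen):
            completed[index] = value
    return completed


values = [10, None, 20, None, 30, None]   # toy values
classes = ["a", "a", "b", "b", "b", "a"]
print(hot_deck_impute(values, classes))
```

Comparing results with `with_replacement=True` and `False` on the same data is one way to check whether donor reuse, rather than the imputation classes, drives the poorer hot deck estimates.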
