
Volume 4 Issue 1 January 2012
Researchers Corner
Determining Sample Size for a Given Confidence Interval

In our October 2011 issue, we discussed precision and reliability in terms of confidence interval and confidence level. Web sites such as surveysystem.com and raosoft.com instantly provide the sample size needed for a given confidence level, confidence interval and population size. One can also obtain the confidence interval (margin of error) for a given confidence level, population size and sample size, see how it changes at various levels of confidence, or compare the sample sizes needed for different levels of confidence.

It is very important to note that the confidence interval calculation assumes that the sample is drawn strictly at random from the relevant population (i.e., every item in the population has an equal, non-zero probability of being selected). If the sample is not truly random, one cannot rely on the confidence intervals. Thus, whether the results of a survey can validly be generalized depends on how well the sample represents the larger population.
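The kind of calculation performed by the sample-size calculators mentioned above can be sketched in a few lines. This is a minimal sketch using the standard normal-approximation formula for a proportion with a finite population correction; it is an assumption that this matches what any particular site does, and the function name `sample_size` and its parameters are introduced here for illustration only.

```python
import math
from statistics import NormalDist


def sample_size(margin, confidence=0.95, population=None, p=0.5):
    """Approximate sample size needed to estimate a proportion.

    margin     -- desired margin of error as a fraction (0.05 means +/- 5 points)
    confidence -- confidence level as a fraction (0.95 means 95%)
    population -- population size, or None if very large or unknown
    p          -- assumed response distribution (0.5 is the worst case)
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is None:
        return math.ceil(n0)
    # Finite population correction: smaller populations need fewer responses.
    return math.ceil(n0 / (1 + (n0 - 1) / population))


print(sample_size(0.05))                    # 385
print(sample_size(0.05, population=1000))   # 278
print(sample_size(0.05, population=20000))  # 377
print(sample_size(0.05, confidence=0.99))   # 664
```

Under these assumptions, a population of 20,000 needs almost as many responses as an unlimited one, while raising the confidence level from 95% to 99% increases the requirement considerably.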

Given a confidence level, three factors determine the size of the confidence interval (or margin of error):

1. Sample size: The larger the sample size, the smaller the confidence interval (better precision) and the more closely the answers reflect the population. The relationship is not linear: the margin of error shrinks roughly in proportion to the square root of the sample size, so doubling the sample size does not reduce the confidence interval by 50%; the sample has to be about four times as large to halve it. In other words, the confidence interval (margin of error) decreases as the sample size increases, but with diminishing returns: beyond a certain point, further increases in sample size add little precision. The confidence interval (margin of error) expresses the imprecision inherent in survey data as a range rather than as a single number. As far as the confidence level is concerned, a 90% confidence level means that, over many repeated surveys, about 10% can be expected to give a range that does not contain the true population value.

2. Response distribution: The response distribution is the percentage of the sample that picks a particular answer. If 99% of the sample said YES and 1% said NO to a question, the chance of error is remote, almost irrespective of sample size. However, if the percentages are 55% and 45%, the chance of error is much greater, since it is easier to be sure of extreme answers than of mid-range answers. Unfortunately, questions that are not designed properly often lead to mid-range replies. To determine the sample size needed for a given level of accuracy, we can either use the known response distribution or make the most conservative, worst-case assumption (50%).

3. Population size: Lastly, when the population is "large" (i.e., more than about 20,000) or its size is unknown, the required sample size does not change much with the population size. Let us see with an illustration.
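As a sketch of how the three factors interact, the usual normal-approximation formula for the margin of error of a proportion under simple random sampling can be written as follows. The symbols are introduced here for illustration and are not notation from the article: e is the margin of error, z the standard normal value for the chosen confidence level (about 1.96 for 95%), p the response distribution as a fraction, n the sample size and N the population size.

```latex
e = z \sqrt{\frac{p(1-p)}{n}} \cdot \sqrt{\frac{N-n}{N-1}}
```

The term p(1-p) is largest at p = 0.5, which is why 50% is the worst case; e falls with the square root of n rather than with n itself; and the last factor approaches 1 as N grows, which is why the population size matters little once it is large.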

Suppose there are 1000 students in a college and we wish to elicit the opinion of students on the infrastructure facilities of the college as Inadequate (I) or Adequate (A). Suppose we select a sample of 50 students and find that 15 students rate the facilities as I and the remaining 35 as A. How well our sample represents the entire population is often measured by the confidence interval and the confidence level. Note that the confidence interval (or margin of error) is the measure of precision and is given as the plus-or-minus figure (recall the marks predictions of students) usually reported in opinion poll results.

To further simplify the above illustration, let us assume that we are interested in knowing only how many students rate the infrastructure as A (Adequate). That is, out of 50 students, 35 or 70% have rated it as A. There is another dimension to this survey, namely the response rate: in practice, not everyone in the sample will respond to a survey. For the time being, let us assume that all have responded and proceed.

Suppose we draw another random sample of 50 students and repeat the survey, and the percentage rating the infrastructure as Adequate is 66. Similarly, a third sample of 50 gives a percentage of 74. If we keep repeating the sampling, the range within which the results fall (here 66% to 74%) changes considerably over the first few samples, but the change becomes marginal after a certain number of samples. Thus, by repeated sampling and carrying out the survey, we find that the results fall between 66% and 74%. This is what the theory of sampling distributions describes, and the range within which the results fall corresponds to the confidence interval (or margin of error). In other words, pooling repeated samples, or equivalently increasing the sample size, decreases the margin of error (i.e., increases the precision).
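The margin of error for the college illustration can also be computed directly instead of being found by repeated sampling. The following is a minimal sketch under the normal approximation with a finite population correction; the function name `margin_of_error` and its parameters are assumptions made for this example and do not come from any of the tools mentioned in the article.

```python
import math
from statistics import NormalDist


def margin_of_error(n, p=0.5, confidence=0.95, population=None):
    """Approximate margin of error (as a fraction) for a sample proportion."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction for sampling without replacement.
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


# College illustration: 35 of 50 sampled students answer "Adequate",
# out of a population of 1000 students, at a 95% confidence level.
p_hat = 35 / 50
moe = margin_of_error(50, p=p_hat, confidence=0.95, population=1000)
print(f"{p_hat:.0%} plus or minus {moe:.1%}")  # margin is about 0.124

# Diminishing returns: each doubling of the sample size shrinks the margin
# by a factor of about 0.71 (1 / sqrt(2)), not by half.
for n in (50, 100, 200, 400, 800):
    print(n, round(margin_of_error(n), 4))
```

Under these assumptions a single sample of 50 gives a margin of roughly plus or minus 12 percentage points at 95% confidence, so the 66% to 74% figures in the illustration are best read as hypothetical survey results rather than as a computed interval.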
There is no need for a researcher to take samples repeatedly to arrive at the desired confidence interval, as there are standard tables and even websites, as mentioned earlier, that give the confidence interval. For a given sample, the wider the confidence interval we are willing to accept, the more certain we can be that the answers of the whole population would be within that range. The confidence interval is the amount of error that we can tolerate; a lower margin of error requires a larger sample size. The confidence level reflects the amount of uncertainty we can tolerate; a higher confidence level requires a larger sample size.

M S Sridhar
sridhar@informindia.co.in

http://informindia.co.in/iil_newsletter_researhc.asp
