Statistics is integral to the design and analysis of survey data. It is needed to determine the correct sample size, to collect data in a scientific manner, and to analyze the collected data after the survey, drawing inferences from the sample through statistical tools such as measures of central tendency, hypothesis testing, and predictive models.

Determining Sample Size in a Survey: Identifying the appropriate sample size is one of the most critical aspects of survey design. The sample should be large enough for the estimated parameters to be statistically reliable. The larger the sample, the greater the statistical confidence in any survey metric. However, a sample larger than required adds cost without a matching benefit.

Cochran formula: This is the most widely used method. It calculates the sample size from the distribution of a binary response variable, that is, a survey question answered “Yes” or “No”. It is calculated as

$$\text{Sample size } (N) = \frac{Z^2 \, p \, (1 - p)}{e^2}$$

- Where ‘Z’ is the z statistic of the normal curve at the desired confidence level, for example 95% or 90%. At the 5% level of significance (95% confidence), Z is 1.96; at the 1% level of significance (99% confidence), it is 2.58. For most surveys, 5% is an acceptable significance level. Choosing 1% instead increases the sample size by roughly 73%, because the sample size scales with the square of Z.

- Where ‘p’ represents the expected proportion of the parameter of interest in the study. It cannot be estimated accurately before the survey, yet it is required to calculate the sample size. A good estimate should be derived from a previous study. Alternatively, a reasonable estimate can be obtained from a preliminary survey of a small group, say 30 people. The sample size can then be calculated from this estimated proportion. Many popular software packages and online calculators assume p = 50% by default, which gives the largest sample size for a given Z and e. Users should be aware of this assumption when they use such tools.

- Where ‘e’ is the acceptable margin of error in the parameter of interest. It can be set to 5% or 10%, depending on the requirement. Lowering the margin of error increases the sample size; halving it quadruples the sample size, because e enters the formula squared.
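The formula and its three inputs can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `z_value` and `cochran_sample_size` are hypothetical, the z statistic is derived from the confidence level with the standard library's `statistics.NormalDist` (so it uses 1.959964 rather than the rounded 1.96), and rounding the result up to a whole respondent is an assumption.

```python
import math
from statistics import NormalDist


def z_value(confidence_level: float) -> float:
    """Two-sided z statistic for a confidence level such as 0.95."""
    alpha = 1.0 - confidence_level  # level of significance
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def cochran_sample_size(p: float, e: float, confidence_level: float = 0.95) -> int:
    """Cochran sample size Z^2 * p * (1 - p) / e^2, rounded up (assumption)."""
    z = z_value(confidence_level)
    return math.ceil(z * z * p * (1.0 - p) / (e * e))


print(round(z_value(0.95), 2))  # 1.96
print(round(z_value(0.99), 2))  # 2.58

# Effect of the assumed proportion at a 5% margin of error, 95% confidence:
print(cochran_sample_size(p=0.50, e=0.05))  # 385 (the common default of 50%)
print(cochran_sample_size(p=0.30, e=0.05))  # 323
```

The comparison between p = 0.50 and p = 0.30 shows why the default used by a calculator matters: the 50% assumption produces the most conservative (largest) sample size.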

As an example, suppose the margin of error is 5% and a previous research study indicates a prevalence of 30% for the effect of interest. If the desired confidence level is 95%, the sample size is calculated as follows:

$$\text{Sample size } (N) = \frac{1.96 \times 1.96 \times 0.30 \times (1 - 0.30)}{0.05^2} \approx 322.69$$

Sample size = 323

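If the expected response rate is 40%, more people must be invited so that enough of them respond. Dividing the unrounded sample size (322.69) by the response rate gives the number of people to invite:

$$\frac{322.69}{0.40} \approx 807$$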
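The response-rate adjustment can be checked with a short Python sketch. The helper name `invitations_needed` is hypothetical, and rounding up to a whole person is an assumption. The sketch also shows that the result depends on whether the sample size is rounded before or after dividing by the response rate.

```python
import math


def invitations_needed(required_responses: float, response_rate: float) -> int:
    """People to invite so that the expected number of responses meets the target."""
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(required_responses / response_rate)


# Cochran sample size for Z = 1.96, p = 0.30, e = 0.05
n = 1.96 ** 2 * 0.30 * (1 - 0.30) / 0.05 ** 2
print(round(n, 2))                             # 322.69

print(invitations_needed(n, 0.40))             # 807 (unrounded sample size)
print(invitations_needed(math.ceil(n), 0.40))  # 808 (sample size rounded up to 323 first)
```

The one-person difference is immaterial in practice, but it explains why different tools may report 807 or 808 for the same inputs.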