GENERAL QUESTIONS ABOUT SURVEYS
Question: How can statisticians estimate specific characteristics of a population?
Statisticians obtain estimates of population characteristics by collecting data. There are two main methods of collecting data: 1) taking data from every unit in the population of interest (called a census), and 2) taking data from a sample. A census takes a great deal of time and costs a great deal of money, so it is often not practical. Modern survey design has been developed in order to collect data less expensively and in a more timely manner. In addition, sample surveys are frequently more accurate at the national level because quality control of field data collection is better. Whether one is doing a sample or a census, and no matter what kind of sampling units are selected, interviewers must collect data from entities called reporting units. Reporting units can be people, households, plots of land, or other items.
Question: How does one select a sample?
There are several procedures used to select samples from a population. One is informal sampling, such as interviewing people coming out of a bank or factory. When one samples informally, objectivity is compromised and accuracy cannot be measured, so the study loses credibility. Formal sampling is the preferred method.
Before one can undertake formal sampling, the population of interest is divided into sample units (SUs). The list of SUs is called the sampling frame. The list can be made up of districts, villages, holdings, persons, fields, plots, or other sub-units of a population. A sample of units from the sampling frame should represent all the diversity in the population of interest. Normally, we stratify the list to ensure that the sample is more nearly representative. Without stratification, for example, one might list all the land units and by chance select a sample with too many lowland parcels or too many mountain parcels. Stratification of the land means that similar types of land are grouped together. We then select samples from each group so that every group is represented in the proportion the design calls for. Normally, the population is stratified at many levels. For example, one first divides the land into administrative districts, then into land use categories. Within each stratum one constructs sampling units called segments.
For the data to be objective, the sample must be selected using random procedures. Random selection of segments also allows us to compute sampling errors, which are discussed later.
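The stratified selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame of 900 lowland and 100 mountain segments, the 10 percent sampling fraction, and the function name stratified_sample are all invented here, and the same fraction is drawn from every stratum (one of several possible allocation rules).

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, sampling_fraction, seed=0):
    """Draw the same fraction of units, at random, from every stratum."""
    rng = random.Random(seed)
    # Group similar units together (the strata).
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    # Select at random, without replacement, within each stratum.
    sample = {}
    for name, units in strata.items():
        n = max(1, round(len(units) * sampling_fraction))
        sample[name] = rng.sample(units, n)
    return sample

# Hypothetical sampling frame: (land type, segment id) pairs.
frame = [("lowland", i) for i in range(900)] + [("mountain", i) for i in range(100)]
sample = stratified_sample(frame, stratum_of=lambda u: u[0], sampling_fraction=0.1)
print({name: len(units) for name, units in sample.items()})
```

Because each stratum is sampled separately, neither land type can be over-represented by chance, which is the problem stratification is meant to prevent.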
Question: What types of errors are associated with statistical surveys?
There are two types of errors associated with surveys: sampling errors, which are a result of differences among samples, and nonsampling errors, which are a result of sampling frame errors, response errors, or mistakes in processing the data. Of the two, nonsampling errors are usually more important and more difficult to estimate and control. They are a major concern in any survey. The area frame (AF) is designed to control for both types of errors; however, special emphasis is placed on controlling nonsampling errors.
Question: What is sampling error?
Sampling error is the survey error that arises because data are collected from a sample rather than from the entire population. The following example illustrates sampling error.
An area of interest is divided into 30,000 segments of land and a random sample of 300 segments is selected. Data are obtained from the 300 segments and an estimate of wheat acreage is produced by multiplying the total wheat acreage found in the 300 segments by 30,000/300 = 100. If another 300 segments were selected, the estimate would be different. The differences in estimates among samples are called sampling errors.
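A small simulation can make this example concrete. The sketch below uses the 30,000 segments, the sample of 300, and the expansion factor of 100 from the text; the wheat acreages themselves are invented purely for illustration, so the printed numbers carry no real meaning.

```python
import random

N_SEGMENTS = 30_000                    # segments in the area of interest
SAMPLE_SIZE = 300                      # segments enumerated in each sample
EXPANSION = N_SEGMENTS / SAMPLE_SIZE   # 30,000 / 300 = 100

# Hypothetical population: wheat acreage in each segment (invented values).
pop_rng = random.Random(1)
population = [max(0.0, pop_rng.gauss(40, 25)) for _ in range(N_SEGMENTS)]
true_total = sum(population)

def expanded_estimate(rng):
    """Select 300 segments at random and expand their wheat total by 100."""
    sample = rng.sample(population, SAMPLE_SIZE)
    return EXPANSION * sum(sample)

# Five independent samples give five different estimates of the same total.
rng = random.Random(2)
estimates = [expanded_estimate(rng) for _ in range(5)]
for est in estimates:
    print(f"estimate: {est:,.0f}   difference from true total: {est - true_total:+,.0f}")
```

In a real survey the true total is unknown and only one sample is drawn; the simulation simply shows how estimates scatter from sample to sample.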
If the estimates vary considerably, then we would conclude that our estimator has a large variance, or sampling error. Estimators with small sampling errors are most desirable. However, another criterion is also important: nonsampling error, or bias.
Question: What is nonsampling error (or bias)?
Nonsampling error is the difference between the center of the distribution of estimates over all possible samples (the distribution that defines sampling error) and the true value being estimated. This difference is defined as bias. Whether or not the true value being estimated is at the center of that distribution depends on:

  • the completeness of the sampling frame,
  • every element in the population having a known positive chance of selection,
  • accuracy of data collection, recording, and data entry,
  • the technical properties of the estimators.
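In symbols, the definition of bias given above can be written as follows. The notation is an assumption introduced here for illustration: \hat{Y} stands for the survey estimate, E[\hat{Y}] for the center (average) of its distribution over all possible samples, and Y for the true value.

```latex
\operatorname{Bias}(\hat{Y}) = E[\hat{Y}] - Y
```

When the four conditions listed above hold, the bias is zero or close to it, and the estimates are centered on the true value.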
As for sampling error, it may seem that it cannot be determined unless other samples of 300 segments are selected and enumerated. However, with proper sampling techniques the variation can be measured with only one sample. The segment-to-segment variation within the sample is used to calculate the sample-to-sample variation. (This is one reason why use of random numbers to select the sample is so important.)
One common measure of sampling error is the coefficient of variation (CV). It is a relative sampling error and is expressed as a percent, such as plus or minus 3 percent. For example, with such a margin of error, one might estimate maize to be 427,000 hectares (plus or minus 3 percent). Margins of error are related to the size of the sample: smaller samples produce larger margins of error than larger samples. With a sample size of 100, for example, the CV might be plus or minus 6 percent. Because sampling error typically shrinks in proportion to the square root of the sample size, a sample of about 400 might be required to reduce the CV to 3 percent.
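The calculation of a CV from a single sample can be sketched as follows. This is a minimal illustration that assumes simple random sampling without replacement and an expansion estimate of a total; real area frame designs are stratified, so the actual formulas are more involved. The function names and the four sample values are hypothetical.

```python
import math
import statistics

def total_and_cv(sample, n_population):
    """Estimate a population total and its CV from one simple random sample."""
    n = len(sample)
    mean = statistics.mean(sample)
    s2 = statistics.variance(sample)      # segment-to-segment variance
    fpc = 1 - n / n_population            # finite population correction
    total = n_population * mean           # expanded estimate of the total
    se_total = n_population * math.sqrt(fpc * s2 / n)
    return total, se_total / total        # CV as a fraction of the estimate

def sample_size_for_cv(n_current, cv_current, cv_target):
    """Approximate sample size for a target CV, assuming CV is proportional
    to 1/sqrt(n) and ignoring the finite population correction."""
    return math.ceil(n_current * (cv_current / cv_target) ** 2)

# Hypothetical sample of four segment values from a population of 1,000.
total, cv = total_and_cv([10, 20, 30, 40], n_population=1000)
print(f"estimated total: {total:,.0f}   CV: {cv:.1%}")

# Halving the CV from 6 percent to 3 percent requires four times the sample.
print(sample_size_for_cv(100, 6, 3))
```

The second function reproduces the rule of thumb in the text: moving from a CV of 6 percent at a sample size of 100 to a CV of 3 percent calls for roughly 400 units.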