Frequently Asked Questions

General Questions About Surveys

Q. How can statisticians estimate specific characteristics of a population?

A. Statisticians obtain estimates of population characteristics by collecting data. There are two main methods used to collect data: 1) taking data from every unit in the population of interest (called a census), and 2) taking data from a sample. A census takes a great deal of time and costs a great deal of money. Because taking a census is often not practical, modern survey design has been developed in order to collect data less expensively and in a timely manner. In addition, sample surveys are frequently more accurate at the national level because quality control of field data collection is better. Whether one is doing a sample or a census, and no matter what kind of sampling units are selected, interviewers must collect data from entities called reporting units. Reporting units can be people, households, plots of land, or other items.
Q. How does one select a sample?

A. There are several procedures used to select samples from a population. One way is to obtain units informally, such as by interviewing people coming out of a bank or factory. When one samples in an informal way, objectivity and measures of accuracy are compromised and the study loses credibility. Formal probability sampling is the preferred method. Before one can undertake formal sampling, the population of interest must be divided into sampling units (SUs) and every SU must be assigned a probability of selection. This list of SUs with their probabilities is called the sampling frame. The list can be made up of districts, villages, holdings, persons, fields, plots, or other sub-units of a population. A sample of units from the sampling frame should represent all the diversity in the population of interest. Normally, we stratify the list so that the sample is more nearly representative. Without stratification, for example, one might list all the land units and select a sample with too many lowland parcels or too many mountain parcels. Stratification of the land means that similar types of land are grouped together. We select samples from each group so that all groups are represented in the proportions we intend. Normally, the population is stratified at many levels. For example, one first divides the land into administrative districts, then into land use categories. Within each stratum one constructs sampling units called segments and selects a probability sample. In order for the data to be objective, we select the sample using probability procedures. Probability sampling also allows us to compute sampling errors (discussed later). Moreover, selecting segments at random ensures objectivity in the sample.
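
As a rough illustration of selecting a probability sample within strata, the following Python sketch draws a simple random sample without replacement from each stratum. The function name, the stratum names, and the sample sizes are hypothetical and chosen only for illustration; an operational design may use other selection schemes (for example, unequal probabilities).

```python
import random


def select_stratified_sample(frame, sample_sizes, seed=None):
    """Select a simple random sample of sampling units within each stratum.

    frame: dict mapping stratum name -> list of sampling unit identifiers
    sample_sizes: dict mapping stratum name -> number of units to select
    seed: optional seed so that the selection can be reproduced
    """
    rng = random.Random(seed)
    selected = {}
    for stratum, units in frame.items():
        n_i = sample_sizes[stratum]
        # Within a stratum, every unit has the same known chance n_i / N_i.
        selected[stratum] = rng.sample(units, n_i)
    return selected


# Hypothetical frame: 50 lowland units and 20 mountain units.
frame = {
    "lowland": [f"L{i}" for i in range(50)],
    "mountain": [f"M{i}" for i in range(20)],
}
sample = select_stratified_sample(frame, {"lowland": 5, "mountain": 2}, seed=1)
```

Because each stratum is sampled separately, neither lowland nor mountain parcels can be over-represented by chance, and the known selection probabilities make it possible to compute sampling errors later.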
Q. What types of errors are associated with statistical surveys?

A. There are two types of errors associated with surveys: sampling errors, which are a result of differences among samples, and nonsampling errors, which are a result of sampling frame errors, response errors, or mistakes in processing the data. Of the two, nonsampling errors are usually more important and more difficult to estimate and control. They are a major concern in any survey. The area frame (AF) designs our company produces control both types of errors; however, special emphasis is always placed on controlling nonsampling errors.
Q. What is sampling error?

A. Sampling error is survey error associated with the sample. The following example illustrates sampling error. An area of interest is divided into 30,000 units of land without overlap or omission and a probability sample of 300 segments is selected. Data are obtained from the 300 sampled segments and an estimate of wheat acreage is produced by multiplying the total wheat acreage found in the 300 segments by the proper expansion factor (the inverse of the sampling fraction), 30,000/300 = 100. If another 300 segments were selected, the estimate would be different. The differences in estimates among samples are called sampling errors. If the estimates made from different samples of 300 vary considerably, then we would conclude that our estimator has a large variance or sampling error. Estimators with small sampling errors are most desirable. However, there is another criterion that is also important: nonsampling error, or bias.
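
The arithmetic in this example can be sketched in a few lines of Python. The function name and the per-segment wheat figures are hypothetical and serve only as an illustration; the frame size of 30,000 and the sample size of 300 come from the example above.

```python
def expansion_estimate(segment_values, frame_size):
    """Direct expansion estimate of a population total.

    segment_values: observed values (e.g. wheat acres) for each sampled segment
    frame_size: total number of sampling units in the frame (N)
    """
    n = len(segment_values)
    if n == 0:
        raise ValueError("sample is empty")
    expansion_factor = frame_size / n  # N / n, here 30,000 / 300 = 100
    return expansion_factor * sum(segment_values)


# Hypothetical data: 300 sampled segments, each containing 12 acres of wheat.
sample = [12.0] * 300
estimate = expansion_estimate(sample, frame_size=30_000)
```

A different sample of 300 segments would generally give a different sample total and therefore a different estimate; that sample-to-sample variation is the sampling error.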
Q. What is nonsampling error (or bias)?

A. Nonsampling error is the difference between the center of the distribution that defines sampling error and the true value being estimated. This difference is defined as bias. Whether or not the true value being estimated is at the center of the sampling error distribution is controlled by: the completeness of the sampling frame; whether all units of the population have a known positive chance of selection; the accuracy of data collection, recording, and data entry; and the technical properties of the estimators. It might seem that one cannot determine sampling errors unless other samples of 300 segments are selected and enumerated. However, with proper sampling techniques, sampling error can be calculated with only one sample. The unit to unit variation is used to calculate the sample to sample variation. In essence, sample to sample variation is estimated with only one sample. (This is one reason why use of probability sampling is so important.) One common measure of sampling error is the coefficient of variation (CV). It is a relative sampling error and is expressed as a percent, such as plus or minus 3 percent. For example, with such a margin of error, one might estimate maize to be 427,000 hectares (plus or minus 3 percent). Margins of error are related to the size of the sample. Smaller samples produce larger margins of error than larger samples. With a sample size of 100, for example, the CV might be plus or minus 6 percent. In order to reduce the CV to 3 percent, a sample of 400 might be required.
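
The two ideas above (estimating the CV from a single sample, and the link between CV and sample size) can be sketched as follows. This is a minimal illustration that assumes simple random sampling without replacement within a single stratum; the function names are hypothetical, and the sample-size rule relies on the usual approximation that the CV shrinks in proportion to one over the square root of the sample size.

```python
import math
import statistics


def cv_percent(segment_values, frame_size):
    """Estimate the CV (in percent) of an expansion estimate from one sample.

    Assumes simple random sampling without replacement in a single stratum.
    """
    n = len(segment_values)
    mean = statistics.mean(segment_values)
    s2 = statistics.variance(segment_values)  # unit-to-unit variance
    fpc = 1 - n / frame_size                  # finite population correction
    var_total = frame_size**2 * fpc * s2 / n  # variance of the expanded total
    total = frame_size * mean                 # the expansion estimate itself
    return 100 * math.sqrt(var_total) / total


def sample_size_for_cv(current_n, current_cv, target_cv):
    """Approximate sample size needed to reach a target CV.

    Assumes the CV is proportional to 1 / sqrt(n) and ignores the
    finite population correction.
    """
    return math.ceil(current_n * (current_cv / target_cv) ** 2)


# The figures from the text: CV of 6 percent at n = 100, target of 3 percent.
required_n = sample_size_for_cv(100, 6.0, 3.0)
```

Halving the CV requires roughly four times the sample, which is why a sample of 100 with a CV of 6 percent becomes a sample of about 400 for a CV of 3 percent.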
Q. Will the AF help reduce the sample size?

A. The sample size of a survey is independent of the frame used. The sample size has more to do with the level at which summarized data are produced. When data are required for the smallest administrative levels, sample sizes are driven high because each administrative unit must have a sufficient sample. There are better ways to generate small area estimates that do not require increasing the sample past the point where it can be managed properly by the survey institutions. The AF could also have a large sample, and it would therefore encounter some of the same problems that the current system encounters with respect to data quality and management. However, nonsampling errors are easier to control in AF surveys than in list frame surveys.

Questions About Area Frame Sampling Technology

Q. What is an area frame?

A. An area frame (AF) is a special case of cluster sampling. The SUs are areas of land commonly called segments or area parcels, which have identifiable boundaries. The goal is to divide the entire land area of interest into SUs and to select a sample of such segments. The process of area sampling is usually accomplished by selecting the sample in stages, an approach that avoids the necessity of dividing the entire population into segments. An AF is suitable for general purpose sampling and is designed for obtaining information about variables associated with land such as crops, livestock, forests, soil, and ground water. Households inside the segments can also be associated with land. At the end of AF construction, a small sample of representative land segments or parcels will be selected for data collection. Typically, land within the selected segments accounts for less than one percent of the country's total land area. In the US, the sample accounts for one half of one percent of the land.
Q. How is an area frame constructed?

A. In order to construct an AF, both maps and satellite images or ortho-photos of the land area are used. An AF is constructed as follows: The total land of the country is divided into administrative areas. Imagery and maps are used to subdivide the land into administrative strata and land use strata such as cropland, range, woods, estates, cities and wastelands. Small sampling units (SUs) are constructed using imagery. These SUs are numbered and a small sample of land parcels is selected in each stratum to represent the land in that stratum. The selected SUs are called segments. Segments are selected to represent all strata with land of interest to the data users.
Q. How do you determine sample size?

A. The sample size is determined by the uses to be made of the data, the resources available to collect and summarize data, and institutional expertise. The sample can be 350 to 3000 segments. These segments will differ in size from one stratum to the next. For example, in range land, segments may contain 400 hectares or more. In cropland, segments may contain 10 to 30 hectares. In cities, the segments may be less than a city block.
Q. How are AF methods implemented?

A. Implementation of AF methods is straightforward: 1) divide the land area of interest into homogeneous farming systems and land use strata; 2) subdivide each stratum into (Ni) units of land without overlap or omission; 3) select a representative sample of (ni) land units called segments from each stratum; 4) collect the desired information from the segments without error; and 5) estimate population totals by multiplying sample totals by the proper expansion factors (Ni/ni).
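
The final estimation step can be sketched in Python as follows. The function name, the stratum names, and all figures are hypothetical and chosen only for illustration; the expansion factor Ni/ni is the one named in the text.

```python
def stratified_total(strata):
    """Estimate a population total from a stratified sample of segments.

    strata: dict mapping stratum name -> (N_i, list of sampled segment values),
            where N_i is the number of land units in the stratum.
    """
    total = 0.0
    for name, (N_i, values) in strata.items():
        n_i = len(values)
        if n_i == 0:
            raise ValueError(f"stratum {name!r} has no sampled segments")
        # Expand the stratum's sample total by N_i / n_i and accumulate.
        total += (N_i / n_i) * sum(values)
    return total


# Hypothetical strata: hectares of a crop observed in each sampled segment.
strata = {
    "cropland": (1000, [20.0, 30.0, 25.0, 25.0]),  # expansion factor 250
    "range": (200, [0.0, 10.0]),                   # expansion factor 100
}
estimate = stratified_total(strata)
```

Each stratum is expanded with its own factor, so a stratum sampled at a lower rate simply carries a larger expansion factor.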
Q. What are advantages and disadvantages of AF?

A. Among sampling frames, the AF has the following advantages: The AF is permanent for 10 years or more. When employing AF methods, the same segments of land are used for many surveys and thus measure change accurately. The AF is useful for environmental and natural resource data as well as agricultural data. The enumerators can be quality checked with a smaller sample size. Data can be collected in an integrated fashion. There are several types of advanced methods of data collection that can be integrated with the AF method that will provide specialized data such as commercial crops and deforestation. AF data will withstand professional scrutiny. An AF is complete, has no duplication, and facilitates data collection in the field. Disadvantages of AF are primarily related to the expense of setting up the system. However, over a period of 10 years an AF is more cost-effective than other sampling frames.
Q. How can Geographic Information System (GIS) technology support AF technology?

A. GIS technology allows one to establish logical relationships among layers of information digitized and entered into a computer. For example, roads of a country might be overlaid onto soil maps in order to identify areas where produce can be transported easily. An essential consideration in a GIS is the quality of the data entered. For example, some environmental parameters involve observing the invertebrates in the bottom of a stream. Such minute detail must be collected in a scientific manner before it is useful in a GIS. AAIC personnel are concerned with the quality of data going into the GIS. By using an AF, our data help GIS technology reflect more accurate relationships.
Q. What data can the AF provide?

A. The AF is designed to collect data from the fields and farmland and from the households that are located inside the segments. For agricultural data, the first survey of each growing season is conducted after planting. Interviewers go to the segments and collect data on the number of hectares planted to each crop. At that time AAIC will recommend collecting data on soil erosion and land use. Close to harvest, enumerators can go back to these same fields and do crop cutting surveys, which estimate yields and, subsequently, total production. Yield estimates also may be obtained by asking farmers what they harvested (i.e., farmer recall). In addition, we can employ agrometeorological (or agromet) yield models. These models simulate crop growth in the computer. Weather data, rainfall, solar radiation, crop variety, soil fertility, and good historic yield data are required. With several years of data, models are calibrated to conditions in a country and crop yields can be forecasted accurately.
Q. Since animals are mobile, how do you estimate number of heads of livestock?

A. One can improve data from an AF for specific variables such as livestock by obtaining additional data through list sampling. When data are collected from both the AF and list frames, and combined removing any duplication, we call this method multiple frame sampling (MFS). Most countries improve data with MFS methods.
Q. How are large estate farms surveyed?

A. MFS, discussed in the previous answer, is employed when dealing with estate farms or what we in the US refer to as extreme operators or large corporate operations. A list of large farms is prepared and a sample is selected to represent the list. Data for the smaller farms are collected from the AF, and the two sets of data are combined in a way that avoids duplication. When MFS is employed, the advantages of both list frames (efficient for farms on the list) and the AF (complete for all items) are realized.
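
One common way to combine the two frames without duplication is to count farms that appear on the list only through the list sample, and to use the AF only for farms that are not on the list. The sketch below illustrates that idea under simple random sampling in each frame. It is an assumption made for illustration, not necessarily the estimator used in any particular survey, and the function name and data are hypothetical.

```python
def multiple_frame_total(list_sample, list_size, area_sample, area_frame_size):
    """Combine a list frame estimate and an area frame estimate.

    list_sample: values reported by farms sampled from the list
    list_size: number of farms on the list
    area_sample: one entry per sampled segment; each entry is a list of
                 (value, on_list) pairs for the farms found in that segment
    area_frame_size: number of segments in the area frame
    """
    # List frame: expand the sampled list farms to the whole list.
    list_estimate = (list_size / len(list_sample)) * sum(list_sample)

    # Area frame: keep only farms NOT on the list, so nothing is counted twice.
    nonoverlap_total = sum(
        value
        for segment in area_sample
        for value, on_list in segment
        if not on_list
    )
    area_estimate = (area_frame_size / len(area_sample)) * nonoverlap_total

    return list_estimate + area_estimate


# Hypothetical data: head of livestock per farm.
list_sample = [500.0, 700.0]            # 2 of 10 large farms on the list
area_sample = [
    [(10.0, False), (600.0, True)],     # second farm is on the list: skipped
    [(20.0, False)],
]
estimate = multiple_frame_total(list_sample, 10, area_sample, 100)
```

Here the list contributes 5 x 1,200 = 6,000 and the area frame contributes 50 x 30 = 1,500, for a combined estimate of 7,500.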
Q. Can the AF system provide baseline, midterm and end of project data?

A. An AF can be constructed for a project area that provides baseline, midterm and end of project data. Because permanent plots are established, measures of change are both accurate and cost effective.

Questions About Use of Satellite Imagery

Q. How does satellite imagery improve estimates of crop area?

A. Satellite imagery has three uses: to stratify land into land use strata, to improve estimates of land cover, and to estimate crops directly using current imagery and classification technology. The first use, stratification, has already been discussed. This answer deals with the use of digital satellite imagery to improve estimates of land cover and to reduce survey error. The AF is well suited to providing representative ground data for statistical calibration that accounts for atmospheric conditions and growth stage. In general, an AF makes estimates of populations based on small samples of segments. Satellite imagery covers an entire population (all sampling units) but the information is not perfect. That is, one has reflected energy at certain distances above sea level for all sampling units in the population. One uses the reflected energy in the satellite imagery in order to reduce the sampling error of the estimates. With AF methods, one has actual ground observations. One uses reflected energy from known fields in order to calibrate the reflected energy. We then classify the entire satellite image. The last step is to evaluate the misclassification of the satellite data using area frame ground data and to adjust the full frame classification based on the misclassification identified in the area frame segments where ground truth is available.
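
The final adjustment step can be illustrated with a simple calibration based on a misclassification table. This is only a sketch of the general idea, not necessarily the estimator used operationally (regression-type estimators are another common choice); the function name, class names, and pixel counts are hypothetical.

```python
def calibrated_areas(ground_counts, classified_pixels):
    """Adjust full-image classified pixel counts for misclassification.

    ground_counts[c][t]: number of pixels in the ground-truth segments that
                         the classifier labelled c and whose true cover is t
    classified_pixels[c]: number of pixels labelled c in the full image
    Returns a dict mapping true cover class -> adjusted pixel count.
    """
    adjusted = {}
    for c, row in ground_counts.items():
        row_total = sum(row.values())
        if row_total == 0:
            continue  # no ground truth for this label; nothing to calibrate
        for t, count in row.items():
            # Estimated share of pixels labelled c that are truly class t.
            share = count / row_total
            adjusted[t] = adjusted.get(t, 0.0) + share * classified_pixels[c]
    return adjusted


# Hypothetical misclassification table from the area frame segments.
ground_counts = {
    "wheat": {"wheat": 80, "other": 20},
    "other": {"wheat": 10, "other": 90},
}
classified_pixels = {"wheat": 1000, "other": 3000}
adjusted = calibrated_areas(ground_counts, classified_pixels)
```

In this example the raw classification reports 1,000 wheat pixels, but after accounting for the misclassification seen in the ground-truth segments the adjusted figure is about 1,100.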
Q. What are the prerequisites to use digital satellite imagery to identify specific crops?

A. A few basic prerequisites must be met before a decision is made to use digital data to identify specific crops and land cover. If one prerequisite is lacking, the entire effort may be seriously hampered. The prerequisites are the following:
1) Satellite availability. Current digital satellite images must be available for the area of interest. In most areas of the world, obtaining satellite data is not a major problem. Moreover, the new satellite systems provide a wide variety of resolutions and spectral bands.
2) Acquisition date. The acquisition of imagery must be timed to correspond with the critical periods in the growing cycles of the required crops, when the spectral values of the crops allow them to be differentiated.
3) Ground data. Representative ground data from an area frame must be available.
4) Sufficient field size. Fields of sufficient size of all crops to be classified must be available in sufficient number to obtain accurate signatures of the crops.
5) Sound statistical methodology. Using digital data requires sound statistical methodology. The area frame method used is one AAIC staff helped develop and one that is used by experienced personnel. Currently it is used in the United States by the National Agricultural Statistics Service and by SUPARCO in Pakistan.
6) General resources. Resources and personnel are required to implement the technology for: collecting data in the field with a probability survey, obtaining digital satellite and area frame ground data, and performing digital analysis.
If these requirements can be met, using digital satellite data to improve crop and land-cover estimates can be rewarding.
Q. When does digital image analysis become cost effective?

A. Every country is different. However, implementing digital satellite technology to identify crops can be expensive, so the improved accuracy must actually be used by policy and planning managers and must be worth the extra money. It is important to do a cost analysis to understand the costs as well as the value of the improved accuracy.
Q. How can a group learn more about area frame methods?

A. One possible approach would be to arrange for AAIC staff members to conduct a seminar at your local office to explain how area frame methodology affects estimates of: 1) crop production, 2) yield forecasts, 3) livestock production, 4) natural resources and environmental parameters, 5) social and economic status, and 6) land tenure. AAIC's professional staff can present a two-hour or two-day seminar or discuss individual cases with project managers and program officers. A combination of both is also possible.

Summary

By using area frame technology, it is possible to develop a standard survey methodology that allows accurate agricultural, natural resource, environmental, and social data to be collected, so that data can be compared across years and relationships can be established among variables in different sectors. Area frame methodology offers objective, accurate, comprehensive, and timely data at a reduced long-term cost. Surveys at the beginning of the project are based on a methodology that can be used again for mid-term and end of project data collection. This is crucial since different survey methods produce different survey results even if the population sampled is identical. In order to have comparable data, survey design and procedures must remain constant. AAIC would select segments at the beginning of the project to represent the project area and use those same segments throughout the project life so that we have a statistically controlled monitoring system. In many areas of the world there is a great need for better crop and livestock statistics at the national, provincial and project levels. Most of us in AAIC have devoted our lives to these advanced data collection methods that generate timely, comprehensive, objective, credible, replicated (time series) data. You will find our work useful, cost effective, and able to generate data that can be defended scientifically.
Although surveys with specialized purposes are useful, a comprehensive general purpose system will allow project managers to compare data collected across years by different institutions. AAIC would welcome the opportunity to facilitate development of this kind of system in collaboration with donor projects.