About Small Area Estimation

Sample surveys are usually designed to provide reliable estimates for the target survey population and for major population subgroups. Standard survey estimates for the nation and for major subgroups are termed "design-based" or "direct" estimates because they are based only on the survey data and the selection probabilities for the sample in the subgroup of interest. Direct estimates become problematic when the sample sizes in the subgroups of interest are small (or even zero). In this situation, model-dependent methods are increasingly being used to produce what are termed "small area" or "indirect" estimates.

The term "small area" usually refers to a small geographic area such as a state, county, municipality, school district, metropolitan area, or a small domain such as a specific age-sex-race group within a large geographic area. A geographical area is regarded as "small" if the area sample is insufficient to yield direct estimates with adequate precision and reliability. In order to make estimates for small areas with adequate levels of precision, it is standard to use indirect estimates that utilize information from outside areas with similar characteristics to the area of interest. Generally, a statistical model is used to obtain indirect estimates for geographical areas considered to be "small".

Information from respondents outside the geographical area, together with other characteristics of the area, is incorporated through the use of a statistical model. Using a model decreases the variability of the small area estimate, but if the characteristics included in the model are not chosen properly, the model may introduce bias into the estimates. Model diagnostics and evaluation are essential to validate the final models being used.

Frequently Asked Questions

Q. What are direct estimates? Why are there differences between the direct estimates and the estimates from the model?

A. The direct estimates are constructed based only on the survey data and the design weights for the sample in the small areas of interest. When the sample sizes are small, the direct estimates are unreliable. The model-based estimates are based on an explicit statistical regression model which combines the direct estimates and information from auxiliary data on small areas obtained from a variety of external sources. The model-based estimates are generally more reliable and stable assuming that the data fit the model reasonably well. However, when there is not much local data for a particular area, the model-based estimates rely on the relationship between the auxiliary data describing the characteristics of the area and the response variable of interest, e.g., the relationship between median family income and smoking prevalence at the county level.
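As a rough illustration of the first part of this answer, a direct estimate of a prevalence can be sketched as a weighted mean of the survey responses in an area. This is a minimal sketch, not the estimator used for the figures on this website: the function name `direct_estimate` and the example numbers are hypothetical, and the design weights are assumed to be the inverse of each respondent's selection probability.

```python
def direct_estimate(responses, weights):
    """Weighted mean of survey responses for one area.

    responses: 0/1 indicators (e.g. 1 = current smoker), hypothetical data.
    weights:   design weights, assumed here to be 1 / selection probability.
    """
    if len(responses) != len(weights) or len(responses) == 0:
        raise ValueError("responses and weights must be non-empty and equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * y for y, w in zip(responses, weights))
    return weighted_sum / total_weight


# Hypothetical area with only four respondents: the estimate uses nothing
# but this area's own data, so one respondent can shift it substantially.
area_responses = [1, 0, 0, 1]
area_weights = [100.0, 200.0, 150.0, 50.0]
estimate = direct_estimate(area_responses, area_weights)
print(f"direct estimate: {estimate:.3f}")
```

With so few respondents, dropping or adding a single person changes the result noticeably, which is the instability the answer describes.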

Q. When should the model-based estimates presented in this website be used?

A. Data users need to decide when to use the model-based small area estimates according to the situation in their area. The model-based estimates are expected to be better than the direct estimates on average if the models used are appropriate, but that does not mean that the model-based estimates are close to the true values for every area. For areas with large sample sizes from the surveys being used in the estimation, the model-based estimates are strongly influenced by the associated direct estimates. As the area-level sample sizes get smaller, the model-based estimates increasingly represent estimates for all the areas with similar profiles, based on characteristics reflected in the auxiliary data, rather than estimates that reflect any anomalies of a specific area. If, in this latter situation, special programs have been implemented in a specific area to promote cancer prevention or cancer screening, their effects will not be reflected in the estimates, and one needs access to the local data to obtain more accurate estimates.
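The dependence on sample size described above can be sketched with a composite (shrinkage) estimator of the kind commonly used in area-level models. This is an illustration under stated assumptions, not the model behind the estimates on this website: the function name `composite_estimate`, the binomial approximation for the sampling variance, the between-area model variance, and all numbers are hypothetical.

```python
def composite_estimate(direct, synthetic, sampling_var, model_var):
    """Blend a direct estimate with a model (synthetic) prediction.

    gamma is the weight on the direct estimate. It approaches 1 when the
    sampling variance is small (large sample) and approaches 0 when the
    sampling variance is large (small sample).
    """
    if sampling_var < 0 or model_var < 0 or sampling_var + model_var == 0:
        raise ValueError("variances must be non-negative and not both zero")
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1.0 - gamma) * synthetic, gamma


def binomial_sampling_var(p, n):
    """Simple-random-sampling approximation; ignores the survey design."""
    return p * (1.0 - p) / n


direct = 0.30      # hypothetical direct estimate for the area
synthetic = 0.20   # hypothetical prediction from auxiliary data
model_var = 0.001  # hypothetical between-area model variance

for n in (2000, 20):
    est, gamma = composite_estimate(
        direct, synthetic, binomial_sampling_var(direct, n), model_var
    )
    print(f"n={n}: weight on direct={gamma:.2f}, estimate={est:.3f}")
```

In this sketch the area with 2,000 respondents stays close to its own direct estimate, while the area with 20 respondents is pulled most of the way toward the prediction shared by areas with similar auxiliary characteristics. Any local anomaly in the small-sample area is therefore largely smoothed away.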

Researchers and cancer control planners should decide on the utility of these model-based small area estimates for their particular application based on the strengths and limitations discussed on this website. We hope that users will provide feedback to the NCI on the uses of these estimates. While these estimates may have great utility in local and regional cancer control planning, they should be supplemented with local knowledge and information when available.

Q. Can these model-based estimates presented in this website be used to rank and compare counties or states?

A. The model-based estimates alone cannot be used to rank and compare counties or states, because these estimates are subject to random errors that arise from sampling error in the direct estimates and from lack of fit of the models. Ranking the model-based estimates while accounting for the associated standard errors involves complex statistical techniques and is beyond the scope of this project.

Q. Whom should I contact if I have additional questions or comments?

A. Please contact the Small Area Estimates Web Staff with any questions or comments, and your message will be forwarded to the appropriate staff member. We hope that researchers and cancer control planners will provide feedback to the NCI on the utility, strengths, and shortcomings of these new estimates. Feedback is greatly appreciated, both on the overall utility of these estimates and on local anomalies.
