Model-based Estimates at the State and County Levels

Statistical models are developed to produce estimates of a number of cancer-related knowledge variables.

HINTS is designed to produce reliable estimates at the national and regional levels. GIS maps using HINTS data have been used to provide a visual representation of possible geographic relationships in HINTS cancer-related variables. Because some state values are unstable owing to relatively small sample sizes, these maps can only illustrate regional differences. However, there is high interest in specific state-level estimates for the variables of interest. Therefore, model-based small area estimation techniques are employed to produce state-level estimates for a number of cancer-related knowledge variables.

In this section we sketch the model-based estimation procedure for the main outcomes of interest. For a more complete explanation, including the mathematical formulation, see Section 5 of the Moser et al. (2013) report [View PDF].

For each outcome, the goal is to estimate the population parameter (e.g., percentage of adults age 18 and over who think that smoking increases their chance of cancer a lot).

The direct estimate of the population parameter is obtained from HINTS data for every available state in a specific survey year. Characteristics (auxiliary variables) describing each U.S. state are calculated from Census 2000 and 2010 and from other administrative sources. The associated design effects are obtained using the Kish formula (Kish 1965) and then smoothed to stabilize them. The state-level direct estimates, the auxiliary variables describing each state, and the smoothed design effects are then input into a mixed model based on the Fay-Herriot model (Fay and Herriot 1979), with an arcsine square root transformation applied to the direct estimates:

The first level of the model (the sampling model) assumes that the arcsine square root transformed direct estimate follows a normal distribution whose mean is the corresponding transformed population parameter and whose variance is the corresponding smoothed sampling variance. This smoothed sampling variance of the transformed direct estimate is a simple function of the smoothed design effect and the sample size.
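The quantities entering the sampling model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the report's implementation: it assumes "the Kish formula" refers to Kish's approximate design effect from unequal weighting, and that the "simple function" is the usual delta-method variance of the arcsine square root transform, deff / (4n). The function names are hypothetical, and the smoothing step is not shown because this section does not describe it.

```python
import math


def kish_design_effect(weights):
    """Kish's approximate design effect from unequal weighting (assumed form):
    deff = n * sum(w^2) / (sum(w))^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def arcsin_sqrt(p):
    """Variance-stabilizing transform of a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))


def transformed_sampling_variance(deff, n):
    """Approximate variance of arcsin(sqrt(p_hat)): 1 / (4 * n_eff),
    with effective sample size n_eff = n / deff (delta-method assumption)."""
    return deff / (4.0 * n)


def back_transform(z):
    """Map a value on the transformed scale back to a proportion."""
    return math.sin(z) ** 2
```

A useful property of this transform is that the approximate variance no longer depends on the unknown proportion itself, only on the design effect and the sample size, which is why a smoothed design effect is enough to fix the first-level variance.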

The second level of the model assumes the transformed population parameter is related to the characteristics describing the states.
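In symbols, the two levels can be written roughly as follows. The notation is an assumption introduced here for illustration and may differ from the report: $i$ indexes states, $\hat{p}_i$ is the direct estimate, $n_i$ the state sample size, $\widetilde{\mathrm{deff}}_i$ the smoothed design effect, $\mathbf{x}_i$ the vector of auxiliary variables, and the linear form with a normal state effect is the standard Fay-Herriot linking model rather than a detail confirmed in this section.

```latex
\begin{aligned}
z_i &= \arcsin\sqrt{\hat{p}_i}, \qquad \theta_i = \arcsin\sqrt{p_i},\\
\text{Level 1 (sampling):}\quad & z_i \mid \theta_i \sim N(\theta_i,\, D_i),
  \qquad D_i = \frac{\widetilde{\mathrm{deff}}_i}{4\,n_i},\\
\text{Level 2 (linking):}\quad & \theta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i,
  \qquad v_i \sim N(0,\, \sigma_v^2).
\end{aligned}
```

Under this formulation, conditional on $\boldsymbol{\beta}$ and $\sigma_v^2$, the posterior mean of $\theta_i$ is the weighted average $\gamma_i z_i + (1-\gamma_i)\,\mathbf{x}_i^{\top}\boldsymbol{\beta}$ with $\gamma_i = \sigma_v^2 / (\sigma_v^2 + D_i)$, so states with small samples (large $D_i$) borrow more strength from the regression prediction.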

The pool of candidate auxiliary variables includes 25+ state-level demographic and socioeconomic variables obtained from the decennial Census 2000 and 2010 and the American Community Survey. For each outcome, a classical model selection procedure is applied to select a reduced set of auxiliary variables. A logarithmic transformation is applied to those auxiliary variables.

A Bayesian approach is used to estimate the parameters of the statistical model. In the Bayesian approach, a prior distribution is assumed for the unknown model parameters and combined with the data using Bayes' rule. Here, the prior distribution of the second-level parameters (such as the regression coefficients and the variance components) can be thought of as a third level of the model. Alternatively, the approach can be thought of as a Bayesian analysis of a two-level model. Since we know little a priori about the second-level parameters, we assume a vague prior distribution, that is, one that is approximately constant over a wide range of (second-level) parameter values. We have performed sensitivity analyses to verify that the choice of prior distribution does not unduly influence the prevalence estimates.

Markov chain Monte Carlo (MCMC) methods are used to produce the final estimates of the model parameters. Extensive model selection and model diagnostic procedures are used to select the final models and assess the goodness of fit of each model.
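To make the MCMC step concrete, here is a toy Gibbs sampler for a two-level model of this kind. It is a minimal sketch under assumptions, not the procedure used for the published estimates: it assumes a normal linking model with a linear regression, a flat prior on the regression coefficients, and a vague inverse-gamma prior on the between-state variance (hypothetical hyperparameters `a0`, `b0`). The coefficients are updated one at a time, which is simple but can mix slowly when covariates are correlated. All names are illustrative.

```python
import math
import random


def gibbs_fay_herriot(z, D, X, n_iter=3000, burn_in=1000,
                      a0=0.001, b0=0.001, seed=0):
    """Posterior-mean proportions from a toy Gibbs sampler.

    z: arcsine square root transformed direct estimates, one per state
    D: their (smoothed) sampling variances, treated as known
    X: list of covariate rows (include 1.0 for an intercept)
    """
    rng = random.Random(seed)
    m, k = len(z), len(X[0])
    theta = list(z)      # start state parameters at the direct estimates
    beta = [0.0] * k
    sigma2 = 1.0
    totals = [0.0] * m
    kept = 0
    for it in range(n_iter):
        # 1. beta_j | rest: univariate normal under a flat prior.
        for j in range(k):
            sxx = sum(X[i][j] ** 2 for i in range(m))
            sxr = 0.0
            for i in range(m):
                others = sum(X[i][l] * beta[l] for l in range(k) if l != j)
                sxr += X[i][j] * (theta[i] - others)
            beta[j] = rng.gauss(sxr / sxx, math.sqrt(sigma2 / sxx))
        # 2. sigma2 | rest: inverse-gamma, drawn as rate / Gamma(shape, 1).
        fitted = [sum(X[i][l] * beta[l] for l in range(k)) for i in range(m)]
        rate = b0 + 0.5 * sum((theta[i] - fitted[i]) ** 2 for i in range(m))
        sigma2 = rate / rng.gammavariate(a0 + m / 2.0, 1.0)
        # 3. theta_i | rest: precision-weighted mix of data and regression.
        for i in range(m):
            g = sigma2 / (sigma2 + D[i])
            theta[i] = rng.gauss(g * z[i] + (1.0 - g) * fitted[i],
                                 math.sqrt(g * D[i]))
        if it >= burn_in:
            kept += 1
            for i in range(m):
                # Clip to [0, pi/2] so the back-transform stays monotone.
                t = min(max(theta[i], 0.0), math.pi / 2)
                totals[i] += math.sin(t) ** 2
    return [s / kept for s in totals]
```

When the sampling variances in `D` are very small, the returned estimates stay close to the direct estimates; as they grow, the estimates are pulled toward the regression prediction. In practice one would also run several chains and check convergence diagnostics before trusting the output.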

**References**

Moser, R.P., Naveed, S., Cantor, D., Blake, K.D., Rutten, L.J.F., Ramirez, A.S., Liu, B., Yu, M. (2013). HINTS Reports. National Cancer Institute. Section 5, pp. 41-56. [View PDF]

Kish, L. (1965). Survey Sampling. New York: John Wiley and Sons.

Fay, R.E., and Herriot, R.A. (1979). Estimates of income for small places: An application of James-Stein procedures to census data. *J Am Stat Assoc*, 74, 269-277.