Methodology for the Model-based Small Area Estimates of Tobacco Use and Policies

Statistical models are developed to produce estimates of several key outcomes related to tobacco use and smoke-free policies.

The Tobacco Use Supplement to the Current Population Survey (TUS/CPS) is designed to produce reliable estimates at the national and state levels. However, to better evaluate tobacco control programs, monitor progress in tobacco control, and conduct tobacco-related research, policy makers, cancer control planners, and researchers often need county-level data for tobacco-related measures. Unfortunately, county sample sizes are generally too small to support direct estimates with adequate precision. Therefore, model-based small area estimation techniques are used to produce county-level estimates for a few key outcomes related to tobacco use and smoke-free policies.

Model-Based Estimation Procedure for the Main Outcomes

In this section, we sketch the model-based estimation procedure for the main outcomes of interest. For a more complete explanation, including the mathematical formulation, see Liu and Gilary (2014) and Liu et al. (2021).

For each outcome, the goal is to estimate a county-level population parameter (e.g., the percentage of adults aged 18 and over who live in a residence where smoking inside is not allowed).

First, the direct estimate of the population parameter is computed from TUS/CPS data for every county with a TUS/CPS sample size greater than 3. Characteristics (auxiliary variables) describing each county in the U.S. are obtained from the decennial Census or the five-year American Community Survey. We adopted the Kish design effect formula, as justified by Gabler, Haeder, and Lahiri (1999), to obtain and smooth the associated design effects. The county-level direct estimates, the auxiliary variables describing each county, and the smoothed design effects are then input into a mixed model based on the Fay-Herriot model (Fay and Herriot 1979), with an arcsine square root transformation applied to the direct estimates. The model has two levels.

The first level of the model (the sampling model) assumes that the arcsine square root transformed direct estimate follows a normal distribution whose mean is the correspondingly transformed population parameter and whose variance is the corresponding smoothed sampling variance. This smoothed sampling variance is a simple function of the smoothed design effect and the sample size.
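To make the sampling model concrete, the sketch below computes a weighted direct estimate for one county, Kish's design effect due to unequal weighting, the arcsine square root transform, and a sampling variance for the transformed estimate. It is illustrative only and rests on assumptions not stated in the text: the design effect is Kish's weighting formula deff = n * sum(w^2) / (sum w)^2 for a single county (the clustering term and the smoothing across counties are omitted), and the transformed sampling variance takes the common delta-method form deff / (4n). All function names and data are hypothetical.

```python
import math

def kish_deff(weights):
    """Kish's design effect due to unequal weighting: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)

def weighted_proportion(y, weights):
    """Weighted direct estimate of a proportion from 0/1 responses."""
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)

def arcsine_sqrt(p):
    """Variance-stabilizing transform applied to the direct estimate."""
    return math.asin(math.sqrt(p))

def transformed_sampling_variance(deff, n):
    """Assumed delta-method variance of asin(sqrt(p_hat)): deff / (4 n), free of p."""
    return deff / (4.0 * n)

# Hypothetical county sample: 0/1 responses and survey weights.
y = [1, 1, 0, 1, 1, 1, 0, 1]
w = [120.0, 80.0, 150.0, 100.0, 90.0, 110.0, 200.0, 60.0]

MIN_N = 4  # the text uses counties with a sample size greater than 3
if len(y) >= MIN_N:
    p_hat = weighted_proportion(y, w)
    deff = kish_deff(w)  # raw, unsmoothed design effect in this sketch
    theta_hat = arcsine_sqrt(p_hat)
    var_hat = transformed_sampling_variance(deff, len(y))
    print(f"p_hat={p_hat:.3f} deff={deff:.3f} theta_hat={theta_hat:.3f} var={var_hat:.4f}")
```

Because the arcsine square root transform stabilizes the variance, the transformed sampling variance depends only on the design effect and the sample size, not on the unknown proportion itself.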

The second level of the model (the linking model) assumes that the transformed population parameter is related to the characteristics describing the counties through a linear regression with a county-level random effect. In cases where the arcsine square root transformed model fit poorly, we considered a model on the original scale (called the probability-scale model).
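In standard Fay-Herriot notation, the two levels can be written as follows for county i, where x_i is the vector of selected auxiliary variables, p_i is the population parameter and p̂_i its direct estimate. The symbols are ours, and the form of the sampling variance D_i (smoothed design effect over 4n_i) is an assumption consistent with the description above rather than a formula quoted from Liu and Gilary (2014); D_i* denotes the corresponding smoothed sampling variance on the probability scale, whose exact form is not given here.

```latex
\begin{aligned}
&\text{Level 1 (sampling model):} &&
  \hat\theta_i \mid \theta_i \sim N(\theta_i,\ D_i), \quad
  \hat\theta_i = \arcsin\sqrt{\hat p_i}, \quad
  D_i = \frac{\widetilde{\mathrm{deff}}_i}{4\,n_i} \\
&\text{Level 2 (linking model):} &&
  \theta_i = \arcsin\sqrt{p_i} = \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i, \quad
  v_i \overset{\text{iid}}{\sim} N(0,\ \sigma_v^2) \\
&\text{Probability-scale variant:} &&
  \hat p_i \mid p_i \sim N(p_i,\ D_i^{*}), \quad
  p_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i
\end{aligned}
```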

The pool of candidate auxiliary variables includes thirty county-level demographic and socioeconomic variables obtained from the American Community Survey, the decennial Census, and other administrative sources. It also includes five state-level smoking policy variables: state smoking bans, cigarette taxes, Medicaid coverage for tobacco-related treatment, overall tobacco control funding, and the year in which a state quitline was established.

For each outcome, a classical backward selection procedure is applied to select a reduced set of auxiliary variables. For the arcsine square root transformed models, a logarithmic transformation is applied to the auxiliary variables; for probability-scale models, no transformation is applied.
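A classical backward selection loop can be sketched as below. For illustration it assumes that each candidate covariate is log-transformed (as for the arcsine square root models), that the fit is ordinary least squares on the transformed direct estimates, and that the covariate with the smallest |t| is dropped until every remaining one has |t| of at least 2. The actual procedure may use a different criterion and would normally account for the sampling variances; the covariate names and data are hypothetical.

```python
import numpy as np

def backward_select(X, y, names, t_min=2.0):
    """Backward elimination by OLS t-statistics (hypothetical stopping rule |t| >= t_min).

    X: (m, k) matrix of candidate covariates, already transformed, no intercept column.
    y: length-m response, e.g. arcsine square root transformed direct estimates.
    """
    keep = list(range(X.shape[1]))
    while keep:
        Z = np.column_stack([np.ones(len(y)), X[:, keep]])  # add an intercept
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        sigma2 = resid @ resid / (len(y) - Z.shape[1])       # residual variance
        cov = sigma2 * np.linalg.inv(Z.T @ Z)
        t = np.abs(beta[1:]) / np.sqrt(np.diag(cov)[1:])     # skip the intercept
        worst = int(np.argmin(t))
        if t[worst] >= t_min:
            break                                            # all remaining covariates pass
        keep.pop(worst)                                      # drop the weakest covariate
    return [names[j] for j in keep]

# Hypothetical data: the response depends on the first covariate only.
rng = np.random.default_rng(0)
m = 300
raw = rng.uniform(1.0, 100.0, size=(m, 4))  # positive county-level covariates
X = np.log(raw)                             # log transformation of the auxiliary variables
y = 0.9 + 0.15 * X[:, 0] + rng.normal(0.0, 0.05, size=m)
names = ["pct_poverty", "noise_a", "noise_b", "noise_c"]
print(backward_select(X, y, names))
```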

A Bayesian approach is used to estimate the parameters of the statistical model. In the Bayesian approach, a prior distribution is assumed for the unknown model parameters and combined with the data using Bayes' rule. Here, the prior distribution of the second-level parameters (such as the regression coefficients and the variance component) can be thought of as a third level of the model. Alternatively, the approach can be thought of as a Bayesian analysis of a two-level model. Since little is known a priori about the second-level parameters, we assume a vague prior distribution, that is, a prior that is approximately constant over a wide range of second-level parameter values. We performed sensitivity analyses to verify that the choice of prior distribution does not unduly influence the prevalence estimates.

Markov chain Monte Carlo (MCMC) methods are used to produce the final estimates of the model parameters. Extensive model selection and model diagnostic procedures are used to select the final model for each outcome and to assess its goodness of fit.
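The MCMC step can be illustrated with a Gibbs sampler for the arcsine square root Fay-Herriot model. This is a minimal sketch, not the production code, and it assumes: known smoothed sampling variances D_i, a flat prior on the regression coefficients, an Inverse-Gamma(0.001, 0.001) prior on the random-effect variance as the "vague" prior (the text does not say which vague prior is used), and back-transformation of each posterior draw with sin². The function name and simulated data are hypothetical.

```python
import numpy as np

def gibbs_fay_herriot(theta_hat, D, X, n_iter=4000, burn=1000, a=0.001, b=0.001, seed=1):
    """Gibbs sampler for the Fay-Herriot model on the arcsine square root scale.

    theta_hat: (m,) transformed direct estimates; D: (m,) known sampling variances;
    X: (m, p) design matrix including an intercept column.
    Assumed priors: flat on beta, Inverse-Gamma(a, b) on sigma_v^2.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ theta_hat                 # OLS starting value
    sigma2 = float(np.var(theta_hat - X @ beta)) + 1e-6
    draws = []
    for it in range(n_iter):
        # 1. County parameters: shrink each direct estimate toward its regression fit.
        gamma = sigma2 / (sigma2 + D)
        mean = gamma * theta_hat + (1.0 - gamma) * (X @ beta)
        theta = rng.normal(mean, np.sqrt(gamma * D))
        # 2. Regression coefficients: with a flat prior, normal around the OLS fit of theta.
        beta = rng.multivariate_normal(XtX_inv @ X.T @ theta, sigma2 * XtX_inv)
        # 3. Random-effect variance: conjugate inverse-gamma update.
        resid = theta - X @ beta
        sigma2 = 1.0 / rng.gamma(a + m / 2.0, 1.0 / (b + resid @ resid / 2.0))
        if it >= burn:
            draws.append(theta)
    p_draws = np.sin(np.array(draws)) ** 2           # back-transform each draw to a proportion
    return p_draws.mean(axis=0), p_draws.std(axis=0)  # posterior means and SDs

# Hypothetical simulated counties.
rng = np.random.default_rng(0)
m = 60
X = np.column_stack([np.ones(m), rng.normal(size=m)])  # intercept + one covariate
n = rng.integers(4, 60, size=m)                         # county sample sizes (> 3)
D = 1.3 / (4.0 * n)                                     # assumed smoothed deff of 1.3
theta_true = X @ np.array([1.0, 0.1]) + rng.normal(0.0, 0.05, size=m)
theta_hat = rng.normal(theta_true, np.sqrt(D))
p_est, p_se = gibbs_fay_herriot(theta_hat, D, X)
```

Each county's estimate is a precision-weighted compromise: counties with small samples (large D_i) are pulled more strongly toward the regression prediction.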

Benchmarking the County Estimates to State Estimates

For data periods prior to 2018/2019, after producing the county-level modeled estimates for the different outcomes, we controlled these estimates to the corresponding direct state-level estimates computed from TUS/CPS. A simple ratio adjustment was made to the county-level estimates to ensure that they sum to the state totals, and the final standard errors of the county-level estimates were adjusted to reflect this additional level of control. For the 2018/2019 data period, we released the model-based estimates to the public without benchmarking, because the benchmarking process produced some out-of-range values (greater than 1.0) for proportion estimates close to 1.0.
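The ratio adjustment can be sketched as follows for one state. It assumes (our assumption) that county proportions are weighted by adult county population, so the state total is the state proportion times the summed population. The example also reproduces the problem described above: when a county estimate is close to 1.0 and the ratio factor exceeds 1, the benchmarked value exceeds 1.0. The standard error adjustment is not shown, and all names and numbers are hypothetical.

```python
def ratio_benchmark(county_est, county_pop, state_est):
    """Scale county proportions so their population-weighted total equals the state total."""
    modeled_total = sum(p * n for p, n in zip(county_est, county_pop))
    state_total = state_est * sum(county_pop)
    r = state_total / modeled_total  # a single ratio factor for the whole state
    return [r * p for p in county_est], r

# Hypothetical state with three counties.
county_est = [0.99, 0.85, 0.80]        # modeled county proportions
county_pop = [50_000, 30_000, 20_000]  # adult populations (assumed weights)
state_est = 0.95                       # direct state estimate

adjusted, r = ratio_benchmark(county_est, county_pop, state_est)
print(f"ratio factor r = {r:.4f}")
for p_old, p_new in zip(county_est, adjusted):
    flag = "  <- out of range" if p_new > 1.0 else ""
    print(f"{p_old:.3f} -> {p_new:.4f}{flag}")
```

Here r is about 1.044, so the first county's estimate of 0.99 becomes about 1.034, which is not a valid proportion.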

References
Liu, B., Dompreh, I. and Hartman, A.M. (2021). Small area estimation of smoke-free workplace polices and home rules in U.S. counties. Nicotine & Tobacco Research, ntab015. https://doi.org/10.1093/ntr/ntab015

Liu, B. and Gilary, A. (2014). Small area estimation for the Tobacco-Use Supplement to the Current Population Survey. In JSM Proceedings, Survey Research Methods Section, 2542-2552. Alexandria, VA: American Statistical Association.

Fay, R.E. and Herriot, R.A. (1979). Estimates of income for small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association, 74, 269-277.

Gabler, S., Haeder, S. and Lahiri, P. (1999). A Model Based Justification of Kish's Formula for Design Effects for Weighting and Clustering. Survey Methodology, 25, 105-106.
