Multiple Imputation

The 2nd Generation ICSD Multiple Imputation strategy is described in:

Curley, C., Krause, R., Hawkins, C., Feiock, R. (2017). "Dealing with Missing Data: A Comparative Exploration of Approaches Utilizing the Integrated City Sustainability Database." Urban Affairs Review.

The goal of this article is to compare three techniques for handling missing data and to demonstrate their utility in analyzing survey data using the ICSD. We generate three versions of the ICSD data, one for each of three common missing-data techniques (listwise deletion, mean replacement, and multiple imputation), and use them to run three identically specified models. Our analysis reveals great variation in the models’ performance depending on the version of the data used. Understanding why data are missing and how to treat the missingness explains the inflation of certain findings as well as null results that diminish theoretical progress.
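The comparison described above can be sketched on synthetic data. The block below is a minimal illustration, not the authors' procedure: the variables `x1`, `x2`, and `y`, the sample size, and the missingness rule are all hypothetical, and the multiple imputation step uses a simple bootstrap-based stochastic regression imputation pooled with Rubin's rules as a stand-in for whatever software the article used. It assumes NumPy is available.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols(X, y):
    """OLS with an intercept: coefficients, their variances, residual variance."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    variances = sigma2 * np.diag(np.linalg.inv(A.T @ A))
    return beta, variances, sigma2

# Hypothetical stand-in for survey data; the true slope on x1 is 2.0.
n = 1000
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Assumed missingness rule: x1 is more likely to be missing when the
# fully observed x2 is large, so the data are Missing at Random (MAR).
miss = rng.random(n) < 1 / (1 + np.exp(-x2))
x1_obs = np.where(miss, np.nan, x1)

# Version 1: listwise deletion keeps only rows with x1 observed.
keep = ~miss
b_lw, v_lw, _ = ols(np.column_stack([x1_obs[keep], x2[keep]]), y[keep])

# Version 2: mean replacement fills every gap with the observed mean.
x1_mean = np.where(miss, np.nanmean(x1_obs), x1_obs)
b_mean, v_mean, _ = ols(np.column_stack([x1_mean, x2]), y)

# Version 3: multiple imputation (simplified sketch).
def impute_once(x1_obs, x2, y, miss, rng):
    """Draw one completed copy of x1 from a regression on y and x2."""
    preds = np.column_stack([y, x2])
    obs_idx = np.flatnonzero(~miss)
    # Bootstrapping the observed rows reflects uncertainty in the
    # imputation model's parameters.
    boot = rng.choice(obs_idx, size=obs_idx.size, replace=True)
    beta, _, sigma2 = ols(preds[boot], x1_obs[boot])
    n_miss = int(miss.sum())
    design = np.column_stack([np.ones(n_miss), preds[miss]])
    filled = x1_obs.copy()
    # Adding residual noise keeps imputed values from being too regular.
    filled[miss] = design @ beta + rng.normal(scale=np.sqrt(sigma2), size=n_miss)
    return filled

m = 20  # number of imputed data sets (an arbitrary choice here)
betas, variances = [], []
for _ in range(m):
    x1_imp = impute_once(x1_obs, x2, y, miss, rng)
    b, v, _ = ols(np.column_stack([x1_imp, x2]), y)
    betas.append(b)
    variances.append(v)
betas, variances = np.array(betas), np.array(variances)

# Rubin's rules: pooled estimate, within- and between-imputation variance.
b_mi = betas.mean(axis=0)
within = variances.mean(axis=0)
between = betas.var(axis=0, ddof=1)
v_mi = within + (1 + 1 / m) * between

for name, b, v in [("listwise deletion", b_lw, v_lw),
                   ("mean replacement", b_mean, v_mean),
                   ("multiple imputation", b_mi, v_mi)]:
    print(f"{name:20s} slope on x1 = {b[1]:.3f} (SE {np.sqrt(v[1]):.3f})")
```

Each printed slope can be compared with the value of 2.0 used to generate the data. The same model specification is applied to all three versions, so any difference in the estimates comes from how the missing values were handled.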

The multiple imputation approach was most appropriate and resulted in the strongest outcomes. This is because the missing values in the ICSD are Missing at Random (MAR), and because the pattern of missingness in the multivariate regression model we estimated would have caused many observations to be dropped in the absence of value replacement. Despite the strong performance of multiply imputed data in our example, we emphasize that there is no one-size-fits-all “best” approach for handling missing data. It is imperative that researchers understand the causes of the missingness in their own data and the consequences of each potential approach.
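To see why a multivariate model can lose many observations without value replacement, consider a rough sketch in which each variable is missing independently at a modest rate. The numbers below (10 variables, 10% missing each, 5,000 rows) are hypothetical and are not taken from the ICSD, and independent missingness is a simplification used only to make the arithmetic transparent. It assumes NumPy is available.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 10 model variables, each missing 10% of the time.
n_rows, n_vars, p_missing = 5000, 10, 0.10

data = rng.normal(size=(n_rows, n_vars))
data[rng.random((n_rows, n_vars)) < p_missing] = np.nan

# A row survives listwise deletion only if every variable is observed.
complete = ~np.isnan(data).any(axis=1)
observed_share = complete.mean()

# Under independence the expected share is (1 - p)^k, about 0.349 here.
expected_share = (1 - p_missing) ** n_vars

print(f"complete rows: {observed_share:.3f} (expected {expected_share:.3f})")
```

Under these assumed rates, roughly two-thirds of the rows would be dropped by listwise deletion even though each individual variable is 90% complete.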

The mechanics of imputation may be relatively straightforward, but by developing ‘informing variables’ (broad groupings of variables that have theoretical relationships), we have greater confidence that the results reflect more accurate explanatory relationships than alternative methods of handling missing data would yield. Overall, the results of our analysis confirm the usefulness of the ICSD in the study of environmental, sustainability, and other policy in U.S. cities, and provide suggested pathways for studying urban issues with survey data.

The Integrated City Sustainability Database (ICSD) is the first comprehensive data set of U.S. municipal governments and their sustainability programs and policies. The first-generation ICSD merges the separate surveys.