Chapter 6 Application

This chapter details how we applied the parcel-level synthetic population to a use-case example and to a hypothetical health impact assessment.

6.1 Use-case: proximity to major roadway

We assign each parcel its proximity to a major roadway, as a proxy for air pollution, using the following scripts:

[04] - [Distance-to-road] - CALCULATE.R

  • Calculates shortest distance to major roadways.

[04b] - [Distance-to-road] - MERGE with synth pop.R

  • Merges each parcel's distance with the synthetic population households located in that parcel.

[05a] - [Distance-to-road] - Make calculated and misclassified datasets.R

  • Uses randomization to make a ‘misclassified’ version of the parcel-level data.

[05b] - [Distance-to-road] - Make combined dataset.R

  • Combines the calculated and ‘misclassified’ versions into one dataset.

[05c] - [Distance-to-road] - Make combined plots and maps.R

  • Creates plots and maps showing the results.

[06a] - [Distance-to-road] - Checking for related variables.R

  • Examines potential social determinants of health that may be associated with proximity to major roadway (as a proxy for air pollution exposure).
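
The calculate, merge, and randomize steps listed above can be illustrated with a minimal base R sketch. It is not the contents of the scripts themselves: the toy data, the column names (`parcel_id`, `dist_to_road_m`, `dist_misclassified_m`), the helper `point_segment_dist()`, and the choice to create the ‘misclassified’ version by permuting distances across households are all assumptions made for illustration. It also assumes planar (projected) coordinates in meters and treats each parcel as a single point; a real workflow would more likely rely on a GIS package.

```r
# Hypothetical toy parcels as points in projected coordinates (meters).
parcels <- data.frame(
  parcel_id = c("P1", "P2", "P3"),
  x = c(0, 50, 200),
  y = c(30, 400, 0)
)

# Hypothetical major roadways: each row is one straight segment
# from (x1, y1) to (x2, y2).
roads <- data.frame(
  x1 = c(0, 100), y1 = c(0, 0),
  x2 = c(100, 100), y2 = c(0, 100)
)

# Distance from one point to one segment: project the point onto the
# segment, clamp the projection to the endpoints, then measure the gap.
point_segment_dist <- function(px, py, x1, y1, x2, y2) {
  dx <- x2 - x1
  dy <- y2 - y1
  len2 <- dx^2 + dy^2
  t <- if (len2 == 0) 0 else ((px - x1) * dx + (py - y1) * dy) / len2
  t <- max(0, min(1, t))
  sqrt((px - (x1 + t * dx))^2 + (py - (y1 + t * dy))^2)
}

# Shortest distance from each parcel to any road segment.
parcels$dist_to_road_m <- vapply(seq_len(nrow(parcels)), function(i) {
  min(mapply(point_segment_dist, parcels$x[i], parcels$y[i],
             roads$x1, roads$y1, roads$x2, roads$y2))
}, numeric(1))

# Hypothetical synthetic households, each already allocated to a parcel.
households <- data.frame(
  household_id = 1:4,
  parcel_id = c("P1", "P1", "P2", "P3")
)

# Attach the parcel-level distance to every household in that parcel.
hh <- merge(households, parcels[, c("parcel_id", "dist_to_road_m")],
            by = "parcel_id", all.x = TRUE)

# Assumed form of randomization: shuffle distances across households so
# that each household keeps a plausible value but loses its true location.
set.seed(1)
hh$dist_misclassified_m <- sample(hh$dist_to_road_m)
```

Keeping the calculated and shuffled distances as two columns of the same data frame makes it straightforward to compare them household by household in later plots.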

6.2 Hypothetical health impact assessment

To evaluate whether the use of a synthetic population produced with matched vs. random allocation leads to different conclusions about health impacts, we conducted a hypothetical health impact assessment.

Health impact assessments use exposure-response associations derived from the epidemiological literature. We constructed a conceptual logistic regression model for our example that includes a main effect of exposure, a modifier variable (for example, income level), and their interaction.

\[ \text{logit}(Y) = \beta_0 + \beta_M \cdot \text{modifier} + \beta_E \cdot \text{exposure} + \beta_I \cdot \text{modifier} \cdot \text{exposure} \]
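
To make the roles of the coefficients concrete, the model can be moved to the probability scale. This is a sketch under assumptions not stated in the source: \(Y\) is read as the probability \(p\) of the outcome for a householder, and the modifier \(M\) and exposure \(E\) are both coded as 0/1 indicators.

\[ p(M, E) = \frac{1}{1 + \exp\left(-(\beta_0 + \beta_M M + \beta_E E + \beta_I M E)\right)} \]

\[ \text{OR}_{E \mid M = 0} = e^{\beta_E}, \qquad \text{OR}_{E \mid M = 1} = e^{\beta_E + \beta_I} \]

Under these assumptions, \(\beta_I = 0\) gives the same exposure odds ratio in both modifier groups, while \(\beta_I > 0\) means the exposure raises the odds of the outcome more among householders with the modifier.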

We used this model to estimate the change in cases among householders with the exposure and modifier, testing different exposure-modifier associations and model coefficients.
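
A minimal R sketch of this kind of calculation follows. It is not the code in the assessment script: the coefficient values, the toy population of 1,000 householders, the 0/1 coding of modifier and exposure, and the helper names `expected_cases()` and `attributable_cases()` are all hypothetical. It contrasts a population in which exposure is concentrated among householders with the modifier against one in which exposure is independent of the modifier, with the same overall share exposed.

```r
# Hypothetical coefficients on the log-odds scale (illustrative only).
beta <- c(b0 = -3, bM = 0.5, bE = 0.4, bI = 0.3)

# Expected number of cases: sum of predicted probabilities over householders.
expected_cases <- function(modifier, exposure, beta) {
  eta <- beta[["b0"]] + beta[["bM"]] * modifier + beta[["bE"]] * exposure +
    beta[["bI"]] * modifier * exposure
  sum(plogis(eta))  # plogis() is the inverse logit
}

# Change in cases if the exposure were removed for everyone.
attributable_cases <- function(modifier, exposure, beta) {
  expected_cases(modifier, exposure, beta) -
    expected_cases(modifier, rep(0, length(exposure)), beta)
}

# Toy population: 500 householders with the modifier, 500 without.
modifier <- rep(c(1, 0), each = 500)

# Exposure associated with the modifier (80% vs. 20% exposed).
exposure_associated <- c(rep(1, 400), rep(0, 100), rep(1, 100), rep(0, 400))

# Exposure independent of the modifier (50% exposed in both groups).
exposure_independent <- c(rep(1, 250), rep(0, 250), rep(1, 250), rep(0, 250))

ac_associated <- attributable_cases(modifier, exposure_associated, beta)
ac_independent <- attributable_cases(modifier, exposure_independent, beta)

# With a positive interaction term, the estimated change in cases is larger
# when exposure is concentrated among householders with the modifier.
print(c(associated = ac_associated, independent = ac_independent))
```

Because the interaction term is positive in this toy setup, the two populations give different estimates even though the same number of householders are exposed, which is the kind of sensitivity the assessment is designed to probe.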

Although the model is presented as hypothetical, candidate exposures might include:

  • particulate matter
  • heat index
  • greenspace

Modifiers may include:

  • median household income
  • householder race and ethnicity
  • householder tenure
  • householder education status

The code used to conduct the hypothetical health impact assessment is provided here:

[HIA] - hypothetical health impact assessment.R
