Chapter 3 Literature

Some literature on approaches to synthetic population generation:

3.1 PopGen

Synthetic population generator: PopGen (Mobility Analytics Research Group, 2016)

Enhanced synthetic population generator that accommodates control variables at multiple geographic resolutions (Konduri, You, Garikapati, and Pendyala, 2016)

A methodology to match distributions of both household and person attributes in the generation of synthetic populations (Ye, Konduri, Pendyala, Sana, and Waddell, 2009)

The PopGen software (MARG 2016) was designed by the Mobility Analytics Research Group, with the main team of scientists cited in the second source above (Konduri et al. 2016), and was last updated in 2016.

The methodology was presented at the 88th Annual Meeting of the Transportation Research Board in Washington, D.C., USA, and is described in (Ye et al. 2009).

N.B. More detail could be added here, but the methods are mathematically complex and require further reading.

3.2 PopSynWin

Creating a synthetic population: A comparison of tools (Jain, Ronald, and Winter, 2015)

This article was produced for the 3rd Conference of the Transportation Research Group of India by a group of infrastructure engineers at The University of Melbourne, Australia (Jain, Ronald, and Winter 2015). The work generates and compares synthetic populations using two software programs: PopSynWin, which uses an iterative proportional fitting algorithm, and PopGen (see above), which uses an iterative proportional update algorithm. Differences between the actual and synthesized population characteristics are presented. The authors concluded that PopGen yielded better results, with person-level characteristic distributions that matched those of the actual population more closely.
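To make the fitting idea concrete, the following is a minimal Python sketch of iterative proportional fitting on a two-way table. It is an illustration only and not PopSynWin's implementation: the function name `ipf`, the seed table, and the marginal totals are all hypothetical, and the sketch assumes a strictly positive seed and marginals with equal grand totals.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Scale a seed table until its row and column sums match the targets.

    Assumes every seed cell is positive and that both sets of targets
    add up to the same grand total (hypothetical toy setting).
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Row step: rescale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            table[i] = [v * target / total for v in table[i]]
        # Column step: rescale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / total
        # Columns match exactly after the column step, so only the rows
        # need checking for convergence.
        if all(abs(sum(table[i]) - t) <= tol
               for i, t in enumerate(row_targets)):
            break
    return table


# Hypothetical example: a 2x2 seed (e.g. from a survey sample) fitted to
# made-up marginal totals that both sum to 100.
seed = [[1, 2], [3, 4]]
fitted = ipf(seed, row_targets=[40, 60], col_targets=[30, 70])
for row in fitted:
    print([round(v, 3) for v in row])
```

The fitted table reproduces the target margins while keeping the association structure (the cross-product ratio) of the seed, which is why the choice of seed data matters for this kind of method.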

3.3 synthpop R

synthpop: Bespoke Creation of Synthetic Data in R (Nowok, Raab, & Dibben, 2016)

The synthpop R package was published in 2016 and is described in (Nowok, Raab, and Dibben 2016) with a step-by-step example.

Users can choose among several methods for generating the synthetic data:
- random sample from observed data (default)
- function of other synthesized data
- non-parametric methods: classification and regression trees
- parametric methods: synthesis based on variable type (numeric, binary, unordered and ordered factors)
  - normal linear regression (with or without preserving the marginal distribution)
  - logistic regression
  - polytomous logistic regression (ordered or unordered)
  - predictive mean matching
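As a rough illustration of how two of the listed options can be combined, the Python sketch below synthesizes a first variable by random sampling with replacement from the observed values, then synthesizes a second variable conditional on the first using a simplified form of predictive mean matching. This is not the synthpop API (synthpop is an R package); the function names, the `k` nearest-donor rule, and the toy age and income data are assumptions made for this example.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def synthesize(age, income, n, k=3, seed=0):
    """Return n synthetic (age, income) values from observed data."""
    rng = random.Random(seed)
    # Step 1: first variable by random sampling with replacement.
    syn_age = rng.choices(age, k=n)
    # Step 2: second variable by simplified predictive mean matching.
    intercept, slope = fit_line(age, income)
    observed_pred = [intercept + slope * a for a in age]
    syn_income = []
    for a in syn_age:
        pred = intercept + slope * a
        # Find the k observed records whose predicted income is closest
        # to the prediction for this synthetic record.
        donors = sorted(range(len(age)),
                        key=lambda i: abs(observed_pred[i] - pred))[:k]
        # Copy the *observed* income of one randomly chosen donor.
        syn_income.append(income[rng.choice(donors)])
    return syn_age, syn_income


# Made-up toy data, for illustration only.
observed_age = [23, 35, 41, 52, 60, 29]
observed_income = [21000, 34000, 39000, 48000, 45000, 27000]
syn_age, syn_income = synthesize(observed_age, observed_income, n=10)
print(list(zip(syn_age, syn_income)))
```

Because predictive mean matching copies donor values, every synthesized income is a value that actually occurs in the observed data, which keeps the synthetic values plausible but can also reproduce real values exactly.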

3.4 simPop R

Simulation of Synthetic Complex Data: The R Package simPop (Templ, Meindl, Kowarik & Dupriez, 2017)

The simPop R package is described in (Templ et al. 2017), an article published in the Journal of Statistical Software in 2017.

References

Jain, Shubham, Nicole Ronald, and Stephan Winter. 2015. “Creating a Synthetic Population: A Comparison of Tools.” In Proceedings of the 3rd Conference of the Transportation Research Group of India, Kolkata, India, 17–20.

Konduri, Karthik C, Daehyun You, Venu M Garikapati, and Ram M Pendyala. 2016. “Enhanced Synthetic Population Generator That Accommodates Control Variables at Multiple Geographic Resolutions.” Transportation Research Record 2563 (1): 40–50.

MARG. 2016. “PopGen: Synthetic Population Generator.” Mobility Analytics Research Group. http://www.mobilityanalytics.org/popgen.html.

Nowok, Beata, Gillian M. Raab, and Chris Dibben. 2016. “synthpop: Bespoke Creation of Synthetic Data in R.” Journal of Statistical Software 74 (11). https://doi.org/10.18637/jss.v074.i11.

Templ, Matthias, Bernhard Meindl, Alexander Kowarik, and Olivier Dupriez. 2017. “Simulation of Synthetic Complex Data: The R Package simPop.” Journal of Statistical Software 79 (10): 1–38. https://doi.org/10.18637/jss.v079.i10.

Ye, Xin, Karthik Konduri, Ram M Pendyala, Bhargava Sana, and Paul Waddell. 2009. “A Methodology to Match Distributions of Both Household and Person Attributes in the Generation of Synthetic Populations.” In 88th Annual Meeting of the Transportation Research Board, Washington, DC.