Chapter 3 Literature
Some literature on approaches to synthetic population generation:
3.1 PopGen
Synthetic population generator: PopGen (Mobility Analytics Research Group, 2016)
Enhanced synthetic population generator that accommodates control variables at multiple geographic resolutions (Konduri, You, Garikapati, and Pendyala, 2016)
A methodology to match distributions of both household and person attributes in the generation of synthetic populations (Ye, Konduri, Pendyala, Sana, and Waddell, 2009)
The PopGen software (MARG 2016) was designed by the Mobility Analytics Research Group, with the main team of scientists cited in the second source above (Konduri et al. 2016), and was last updated in 2016.
The methodology was presented at the 88th Annual Meeting of the Transportation Research Board in Washington, D.C., USA, and is described in Ye et al. (2009).
N.B. Might want more here but methods are mathematically complex; requires further reading.
3.2 PopSynWin
Creating a synthetic population: A comparison of tools (Jain, Ronald, and Winter, 2015)
This article was produced for the 3rd Conference of the Transportation Research Group of India by a group of infrastructure engineers at The University of Melbourne, Australia (Jain, Ronald, and Winter 2015). The work generates and compares synthetic populations using two software programs: PopSynWin, which uses an iterative proportional fitting algorithm, and PopGen (see above), which uses an iterative proportional updating algorithm. Differences between the actual and synthesized population characteristics are presented. The authors concluded that PopGen yielded better results, with person-level characteristic distributions that more closely matched those of the actual population.
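To make the iterative proportional fitting idea concrete, here is a minimal sketch for a two-way table. It is not PopSynWin's implementation; the function name ipf, the seed table, and the target totals are all hypothetical and chosen only for illustration. The seed table is rescaled row by row and then column by column until both sets of marginal totals are matched.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a 2-D seed table until it matches the target marginal totals.

    Illustrative sketch only: assumes strictly positive seed cells and
    row/column targets that sum to the same grand total.
    """
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(max_iter):
        # Step 1: scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            table[i] = [cell * target / total for cell in table[i]]
        # Step 2: scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / total
        # Columns now match exactly; stop once the rows match as well.
        max_err = max(abs(sum(table[i]) - row_targets[i])
                      for i in range(len(row_targets)))
        if max_err < tol:
            break
    return table


# Hypothetical example: seed counts from a sample, targets from census margins.
seed = [[10.0, 20.0], [30.0, 40.0]]
fitted = ipf(seed, row_targets=[40.0, 60.0], col_targets=[50.0, 50.0])
```

Because each step only multiplies whole rows or whole columns by a constant, the association structure of the seed table (its cross-product ratio) is carried over to the fitted table; only the margins change.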
3.3 synthpop R
synthpop: Bespoke Creation of Synthetic Data in R (Nowok, Raab, and Dibben, 2016)
The synthpop R package was published in 2016 and is described, with a step-by-step example, in Nowok, Raab, and Dibben (2016).
Users can choose among several methods for generating the synthetic data:
- random sample from observed data (default)
- function of other synthesized data
- non-parametric methods: classification and regression trees
- parametric methods: synthesis based on variable type (numeric, binary, unordered and ordered factors)
  - normal linear regression (preserving the marginal distribution or not)
  - logistic regression
  - polytomous logistic regression (ordered or unordered)
  - predictive mean matching
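As a rough illustration of how two of the options above can be combined, the sketch below synthesizes one variable by random sampling from the observed values and a second variable from a normal linear regression on the first. This is not the synthpop API (which is an R package); the function names, the variable names age and income, and the toy data are hypothetical, and the regression parameters are treated as fixed rather than drawn from a posterior distribution.

```python
import random


def fit_line(x, y):
    """Ordinary least squares fit of y on x; returns intercept, slope, residual SD."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return intercept, slope, sigma


def synthesize(age, income, n, seed=0):
    """Generate n synthetic (age, income) pairs from observed data."""
    rng = random.Random(seed)
    # First variable: random sample (with replacement) from the observed values.
    syn_age = [rng.choice(age) for _ in range(n)]
    # Second variable: fit the model on the observed data, then draw from the
    # fitted conditional distribution given the *synthesized* ages.
    intercept, slope, sigma = fit_line(age, income)
    syn_income = [rng.gauss(intercept + slope * a, sigma) for a in syn_age]
    return syn_age, syn_income


# Hypothetical toy data, for illustration only.
observed_age = [23, 35, 41, 52, 60, 29, 47, 38]
observed_income = [21.0, 34.5, 39.0, 48.0, 55.5, 27.0, 45.0, 36.0]
syn_age, syn_income = synthesize(observed_age, observed_income, n=5)
```

No synthetic record is a copy of an observed record as a whole: the ages are resampled, and each income is a fresh draw conditional on its synthetic age.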
3.4 simPop R
Simulation of Synthetic Complex Data: The R Package simPop (Templ, Meindl, Kowarik, and Dupriez, 2017)
The simPop R package was published in 2017 and is described in Templ et al. (2017).