On integrating the number of synthetic data sets m into the a priori synthesis approach

Jackson, James and Mitra, Robin and Francis, Brian and Dove, Iain (2022) On integrating the number of synthetic data sets m into the a priori synthesis approach. In: Privacy in Statistical Databases: International Conference, PSD 2022, Paris, France, September 21–23, 2022, Proceedings. Lecture Notes in Computer Science. Springer, Cham, pp. 205-219. ISBN 9783031139444

Full text: Multiple_Data_Sets_19_.pdf (Accepted Version, 865kB). Available under License Creative Commons Attribution.

Abstract

The synthesis mechanism given in Jackson et al. (2022) uses saturated models, along with overdispersed count distributions, to generate synthetic categorical data. The mechanism is controlled by tuning parameters, which can be set according to a specific risk or utility metric. Thus, expected properties of synthetic data sets can be determined analytically a priori, that is, before they are generated. While Jackson et al. (2022) considered the case of generating m = 1 data set, this paper considers generating m > 1 data sets. In effect, m becomes a tuning parameter, and the role of m in the risk-utility trade-off can be shown analytically. The paper introduces a pair of risk metrics, τ3(k,d) and τ4(k,d), that are suited to m > 1 data sets. It also considers the more general issue of how best to analyse categorical data sets: averaging the data sets before analysis or averaging the results after analysis. Finally, the methods are demonstrated empirically by synthesising a constructed data set that is used to represent the English School Census.
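
The two analysis options mentioned in the abstract (averaging the data sets before analysis, or averaging the results after analysis) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each synthetic cell count is drawn from a negative binomial distribution whose mean is the original count f and whose variance is f + sigma * f^2; the function names, the parameter name `sigma`, the example counts and the chosen statistic are all hypothetical and serve only as illustration.

```python
import numpy as np

def synthesize(counts, sigma, m, rng):
    """Draw m synthetic tables of cell counts (illustrative assumption:
    negative binomial with mean f and variance f + sigma * f**2)."""
    counts = np.asarray(counts, dtype=float)
    n = 1.0 / sigma              # NumPy's "number of successes" parameter
    p = n / (n + counts)         # gives mean = counts for each cell
    return rng.negative_binomial(n, p, size=(m, counts.size))

def first_cell_proportion(table):
    """A nonlinear statistic: share of the total held by the first cell."""
    return table[0] / table.sum()

rng = np.random.default_rng(0)
original = np.array([50, 30, 15, 5])   # made-up cell counts
synthetic = synthesize(original, sigma=0.1, m=5, rng=rng)

# Option 1: average the m data sets first, then analyse the averaged table.
pre = first_cell_proportion(synthetic.mean(axis=0))

# Option 2: analyse each data set, then average the m results.
post = np.mean([first_cell_proportion(t) for t in synthetic])

# For a linear statistic such as the table total, the two options coincide.
total_pre = synthetic.mean(axis=0).sum()
total_post = np.mean([t.sum() for t in synthetic])

print(pre, post, total_pre, total_post)
```

For a linear statistic the order of averaging does not matter, whereas for a ratio such as the first-cell proportion the two options will generally give different values, which is why the choice between them is a substantive question.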

Item Type:

Contribution in Book/Report/Proceedings

Uncontrolled Keywords:

Research Output Funding/yes_externally_funded

Departments:

Faculty of Science and Technology > Mathematics and Statistics

ID Code:

178318

Deposited By:

ep_importer_pure

Deposited On:

01 Nov 2022 11:30

Refereed?:

Yes

Published?:

Published

Last Modified:

13 Dec 2025 13:41

URI:

https://eprints.lancs.ac.uk/id/eprint/178318
