Using saturated models for data synthesis

Jackson, James and Francis, Brian and Mitra, Robin and Dove, Iain (2022) Using saturated models for data synthesis. In: Proceedings of the 36th International Workshop on Statistical Modelling : July 18-22, 2022 - Trieste, Italy. EUT Edizioni Università di Trieste, Trieste 2022, ITA, pp. 205-210. ISBN 9788855113090

Jackson_et_al._2022_IWSM.pdf - Published Version
Available under License Creative Commons Attribution.


Abstract

The use of synthetic data sets is becoming ever more prevalent, as regulations such as the General Data Protection Regulation (GDPR), which place greater demands on the protection of individuals' personal data, are coupled with the conflicting demand to make more data available to researchers. This paper discusses the approach of synthesizing categorical data at the aggregated (contingency table) level using a saturated count model, which adds noise, and hence protection, to cell counts. The paper also discusses how the distributional properties of synthesis models are intrinsic to generating synthetic data with suitable risk and utility profiles.
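To make the idea of saturated-model synthesis concrete, the following is a minimal sketch and not the authors' implementation. In a saturated count model each cell has its own parameter, so the fitted mean of every cell equals its observed count; synthetic counts are then drawn around those means. The abstract does not name the count distributions, so the sketch assumes a Poisson model (variance equal to the mean) and a negative binomial model whose size parameter `size` controls how much extra noise is added. The function names `synthesize_saturated` and `replicated_uniques` are hypothetical, and "replicated uniques" (the share of original count-1 cells that are still 1 after synthesis) is used only as an illustrative disclosure-risk measure.

```python
import numpy as np


def synthesize_saturated(counts, rng, size=None):
    """Draw one synthetic contingency table from a saturated count model.

    Assumption: fitted cell means equal the observed counts (saturated model),
    and synthetic counts are drawn from
      * size=None  -> Poisson, variance = mean
      * size=k > 0 -> negative binomial, variance = mean + mean**2 / k
    Smaller k adds more noise: more protection, typically less utility.
    """
    counts = np.asarray(counts, dtype=np.int64)
    synthetic = np.zeros_like(counts)
    observed = counts > 0  # empty cells have fitted mean 0, so they stay empty
    mu = counts[observed].astype(float)
    if size is None:
        synthetic[observed] = rng.poisson(mu)
    else:
        # numpy parameterises the NB by (n, p); p = k / (k + mu) gives mean mu
        synthetic[observed] = rng.negative_binomial(size, size / (size + mu))
    return synthetic


def replicated_uniques(original, synthetic):
    """Illustrative risk measure: share of original 1-cells that are 1 again."""
    original = np.asarray(original)
    synthetic = np.asarray(synthetic)
    uniques = original == 1
    if not uniques.any():
        return 0.0
    return float(np.mean(synthetic[uniques] == 1))


# Usage on a small hypothetical 3x3 table
rng = np.random.default_rng(2022)
table = np.array([[12, 1, 0],
                  [5, 30, 1],
                  [0, 2, 7]])
pois = synthesize_saturated(table, rng)
nb = synthesize_saturated(table, rng, size=1.0)
print(pois)
print(nb)
print(replicated_uniques(table, pois), replicated_uniques(table, nb))
```

The choice of distribution is where risk and utility are traded off in this sketch: the Poisson draw keeps noise proportional to the mean, while a small negative binomial `size` inflates the variance, making it less likely that small (potentially disclosive) counts are reproduced exactly.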

Item Type:
Contribution in Book/Report/Proceedings
Subjects:
synthetic data; data privacy; count models
ID Code:
173351
Deposited On:
14 Dec 2022 13:35
Refereed?:
No
Published?:
Published
Last Modified:
17 Sep 2024 23:56