Continuous Diffusion for Mixed-Type Tabular Data

Markus Mueller*, Kathrin Gruber, Dennis Fok

*Corresponding author for this work

Research output: Working paper, Academic

Abstract

Score-based generative models (or diffusion models for short) have proven successful across many domains in generating text and image data. However, the consideration of mixed-type tabular data with this model family has fallen short so far. Existing research mainly combines different diffusion processes without explicitly accounting for the feature heterogeneity inherent to tabular data. In this paper, we combine score matching and score interpolation to ensure a common type of continuous noise distribution that affects both continuous and categorical features alike. Further, we investigate the impact of distinct noise schedules per feature or per data type. We allow for adaptive, learnable noise schedules to ensure optimally allocated model capacity and balanced generative capability. Results show that our model consistently outperforms state-of-the-art benchmark models and that accounting for heterogeneity within the noise schedule design boosts the sample quality.
Original language: English
Publisher: arXiv
Publication status: Published - 2023

Erasmus Sectorplan

  • Sector plan SSH-Breed
