Skip to main navigation Skip to search Skip to main content

Continuous Diffusion for Mixed-Type Tabular Data

Research output: Chapter/Conference proceedingConference proceedingAcademicpeer-review

1 Citation (Scopus)

Abstract

Score-based generative models, commonly referred to as diffusion models, have proven to be successful at generating text and image data. However, their adaptation to mixed-type tabular data remains underexplored. In this work, we propose CDTD, a Continuous Diffusion model for mixed-type Tabular Data. CDTD is based on a novel combination of score matching and score interpolation to enforce a unified continuous noise distribution for both continuous and categorical features. We explicitly acknowledge the necessity of homogenizing distinct data types by relying on model-specific loss calibration and initialization schemes. To further address the high heterogeneity in mixed-type tabular data, we introduce adaptive feature- or type-specific noise schedules. These ensure balanced generative performance across features and optimize the allocation of model capacity across features and diffusion time. Our experimental results show that CDTD consistently outperforms state-of-the-art benchmark models, captures feature correlations exceptionally well, and that heterogeneity in the noise schedule design boosts sample quality. Replication code is available at https://github.com/muellermarkus/cdtd.

Original languageEnglish
Title of host publication13th International Conference on Learning Representations, ICLR 2025
Pages72107-72146
Number of pages40
ISBN (Electronic)9798331320850
Publication statusPublished - 2025
Event13th International Conference on Learning Representations, ICLR 2025 - Singapore, Singapore
Duration: 24 Apr 202528 Apr 2025

Publication series

Series13th International Conference on Learning Representations, ICLR 2025

Conference

Conference13th International Conference on Learning Representations, ICLR 2025
Country/TerritorySingapore
CitySingapore
Period24/04/2528/04/25

Bibliographical note

Publisher Copyright:
© 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.

Fingerprint

Dive into the research topics of 'Continuous Diffusion for Mixed-Type Tabular Data'. Together they form a unique fingerprint.

Cite this