15 days ago

Tabular Data Generation using Binary Diffusion

Vitaliy Kinakh, Slava Voloshynovskiy
Abstract

Generating synthetic tabular data is critical in machine learning, especially when real data is limited or sensitive. Traditional generative models often face challenges due to the unique characteristics of tabular data, such as mixed data types and varied distributions, and require complex preprocessing or large pretrained models. In this paper, we introduce a novel, lossless binary transformation method that converts any tabular data into fixed-size binary representations, and a corresponding new generative model called Binary Diffusion, specifically designed for binary data. Binary Diffusion leverages the simplicity of XOR operations for noise addition and removal and employs binary cross-entropy loss for training. Our approach eliminates the need for extensive preprocessing, complex noise parameter tuning, and pretraining on large datasets. We evaluate our model on several popular tabular benchmark datasets, demonstrating that Binary Diffusion outperforms existing state-of-the-art models on Travel, Adult Income, and Diabetes datasets while being significantly smaller in size.
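The three ingredients named in the abstract (a fixed-size binary encoding of a row, XOR-based noise addition and removal, and a binary cross-entropy objective) can be illustrated with a minimal Python sketch. The abstract does not give the details, so everything specific here is an assumption for illustration only: the function names are hypothetical, numeric columns are encoded as 32-bit IEEE-754 floats, categorical columns as fixed-width integer codes, noise is an independent Bernoulli bit mask, and the loss simply scores predicted bit probabilities against target bits. The round trip is lossless only for values exactly representable in float32.

```python
import math
import random
import struct


def float_to_bits(value):
    """Encode a number as 32 bits (assumed encoding: IEEE-754 single precision)."""
    (packed,) = struct.unpack(">I", struct.pack(">f", value))
    return [(packed >> shift) & 1 for shift in range(31, -1, -1)]


def bits_to_float(bits):
    """Invert float_to_bits."""
    packed = 0
    for bit in bits:
        packed = (packed << 1) | bit
    (value,) = struct.unpack(">f", struct.pack(">I", packed))
    return value


def category_to_bits(index, num_categories):
    """Encode a category index with just enough bits to cover all categories."""
    width = max(1, (num_categories - 1).bit_length())
    return [(index >> shift) & 1 for shift in range(width - 1, -1, -1)]


def bits_to_category(bits):
    """Invert category_to_bits."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


def sample_noise(length, flip_prob, rng):
    """Draw a random bit mask; each bit is 1 with probability flip_prob."""
    return [1 if rng.random() < flip_prob else 0 for _ in range(length)]


def xor_bits(a, b):
    """Element-wise XOR. Applying the same mask twice restores the input."""
    return [x ^ y for x, y in zip(a, b)]


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and target bits."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


# Hypothetical row: one numeric column (38.0) and one categorical column
# (category 2 out of 3 possible values).
rng = random.Random(0)
row_bits = float_to_bits(38.0) + category_to_bits(2, num_categories=3)

noise = sample_noise(len(row_bits), flip_prob=0.3, rng=rng)
noisy = xor_bits(row_bits, noise)    # noise addition
recovered = xor_bits(noisy, noise)   # noise removal with the same mask

assert recovered == row_bits
assert bits_to_float(recovered[:32]) == 38.0
assert bits_to_category(recovered[32:]) == 2
```

In this sketch the noise mask is known, so removal is exact. In a trained model the mask would have to be estimated by a network, and `binary_cross_entropy` is the kind of per-bit loss such an estimate could be trained with.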
