Tabular Data Generation using Binary Diffusion

Vitaliy Kinakh, Slava Voloshynovskiy

Abstract

Generating synthetic tabular data is critical in machine learning, especially when real data is limited or sensitive. Traditional generative models often face challenges due to the unique characteristics of tabular data, such as mixed data types and varied distributions, and require complex preprocessing or large pretrained models. In this paper, we introduce a novel, lossless binary transformation method that converts any tabular data into fixed-size binary representations, and a corresponding new generative model called Binary Diffusion, specifically designed for binary data. Binary Diffusion leverages the simplicity of XOR operations for noise addition and removal and employs binary cross-entropy loss for training. Our approach eliminates the need for extensive preprocessing, complex noise parameter tuning, and pretraining on large datasets. We evaluate our model on several popular tabular benchmark datasets, demonstrating that Binary Diffusion outperforms existing state-of-the-art models on Travel, Adult Income, and Diabetes datasets while being significantly smaller in size.
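The abstract names two mechanisms without giving their details: a lossless transformation of mixed-type rows into fixed-size bit vectors, and XOR-based noising trained with binary cross-entropy. The Python sketch below illustrates both ideas under explicit assumptions. Continuous values are encoded as the 32 bits of their IEEE-754 float32 form and categorical values as the binary code of their category index (the paper's exact encoding may differ). Noise is a Bernoulli bit mask with a single fixed flip probability rather than the paper's noise schedule. All helper names (`float_to_bits`, `category_to_bits`, `add_xor_noise`, `bce`, and so on) are hypothetical. No denoising network is included; the loss is computed for an arbitrary vector of predicted noise probabilities.

```python
import math
import random
import struct

# Hypothetical encoders: the bit layouts are illustrative assumptions,
# not necessarily the encoding used by Binary Diffusion.

def float_to_bits(value: float) -> list[int]:
    """Encode a number as the 32 bits of its IEEE-754 float32 form."""
    (as_int,) = struct.unpack(">I", struct.pack(">f", value))
    return [(as_int >> (31 - i)) & 1 for i in range(32)]

def bits_to_float(bits: list[int]) -> float:
    """Invert float_to_bits (exact for values representable as float32)."""
    as_int = 0
    for b in bits:
        as_int = (as_int << 1) | b
    (value,) = struct.unpack(">f", struct.pack(">I", as_int))
    return value

def category_to_bits(index: int, n_categories: int) -> list[int]:
    """Encode a category index with a fixed width of just enough bits."""
    width = max(1, (n_categories - 1).bit_length())
    return [(index >> (width - 1 - i)) & 1 for i in range(width)]

def bits_to_category(bits: list[int]) -> int:
    """Invert category_to_bits."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index

def add_xor_noise(x0: list[int], flip_prob: float, rng: random.Random):
    """Forward step (assumed): flip each bit independently with prob flip_prob."""
    noise = [1 if rng.random() < flip_prob else 0 for _ in x0]
    xt = [a ^ z for a, z in zip(x0, noise)]
    return xt, noise

def remove_xor_noise(xt: list[int], noise: list[int]) -> list[int]:
    """XOR is its own inverse, so applying the same mask restores x0."""
    return [a ^ z for a, z in zip(xt, noise)]

def bce(pred_probs: list[float], targets: list[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted and true noise bits."""
    total = 0.0
    for p, t in zip(pred_probs, targets):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(targets)

if __name__ == "__main__":
    rng = random.Random(0)
    # One row: a numeric column (32 bits) followed by a 3-way categorical column (2 bits).
    row = float_to_bits(0.75) + category_to_bits(2, 3)
    xt, noise = add_xor_noise(row, flip_prob=0.3, rng=rng)
    assert remove_xor_noise(xt, noise) == row
    assert bits_to_float(row[:32]) == 0.75 and bits_to_category(row[32:]) == 2
    # Loss for a made-up "prediction" that is confident and correct on every bit.
    print(bce([0.9 if z else 0.1 for z in noise], noise))
```

Because every column maps to a fixed number of bits, each row becomes a vector of the same length. And because XOR undoes itself, a model that predicts the flipped-bit mask recovers the clean row by XOR-ing that prediction back into the noisy row. In this sketch, that makes a per-bit binary cross-entropy on the mask a natural training objective.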
