TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations

Alan Arazi, Eilam Shapira, Roi Reichart
Published: 5/26/2025
Abstract

While deep learning has achieved remarkable success across many domains, it has historically underperformed on tabular learning tasks, which remain dominated by gradient boosted decision trees (GBDTs). However, recent advancements are paving the way for Tabular Foundation Models, which can leverage real-world knowledge and generalize across diverse datasets, particularly when the data contains free-text. Although incorporating language model capabilities into tabular tasks has been explored, most existing methods utilize static, target-agnostic textual representations, limiting their effectiveness. We introduce TabSTAR: a Foundation Tabular Model with Semantically Target-Aware Representations. TabSTAR is designed to enable transfer learning on tabular data with textual features, with an architecture free of dataset-specific parameters. It unfreezes a pretrained text encoder and takes as input target tokens, which provide the model with the context needed to learn task-specific embeddings. TabSTAR achieves state-of-the-art performance for both medium- and large-sized datasets across known benchmarks of classification tasks with text features, and its pretraining phase exhibits scaling laws in the number of datasets, offering a pathway for further performance improvements.
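To make the target-aware idea concrete, here is a minimal, hypothetical sketch (not the official TabSTAR code) of how a tabular example with textual features might be serialized together with verbalized target tokens, so that a text encoder sees the candidate labels as part of its input. The column names, label verbalization format, and function names below are illustrative assumptions.

```python
# Hypothetical sketch of target-aware input construction for a text encoder.
# Each feature becomes a "column: value" fragment, and the candidate target
# labels are verbalized and appended, giving the encoder task context.

def verbalize_targets(target_name: str, classes: list[str]) -> list[str]:
    # One fragment per candidate label, e.g. "income: >50K".
    return [f"{target_name}: {c}" for c in classes]

def serialize_example(features: dict, target_name: str, classes: list[str]) -> list[str]:
    # Serialize features, then append the verbalized target tokens so the
    # encoder can produce task-specific (target-aware) embeddings.
    feature_text = [f"{col}: {val}" for col, val in features.items()]
    return feature_text + verbalize_targets(target_name, classes)

tokens = serialize_example(
    {"occupation": "software engineer", "hours_per_week": 45},
    target_name="income",
    classes=[">50K", "<=50K"],
)
print(tokens)
```

In this sketch, swapping in a different target (or a different label set) changes the encoder input, so the same frozen-or-finetuned encoder can yield different, task-conditioned representations of identical feature values.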