T-Wix Russian SFT Dataset
T-Wix is a Russian supervised fine-tuning (SFT) dataset. The related paper is "From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning". The dataset aims to strengthen a model's capabilities across a range of tasks, from solving algorithmic and mathematical problems to dialogue, logical thinking, and reasoning patterns. It contains 499,598 Russian-language samples in two subsets. The general subset has 468,614 samples covering mathematics, science, programming, general knowledge, instruction following, role-playing, and other areas. The reasoning subset has 30,984 samples that focus on advanced mathematics and science problems and include detailed reasoning traces.
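As a quick sanity check on the composition described above, the following minimal Python sketch recomputes the total and the share of each subset from the stated counts. The names `SUBSET_SIZES` and `subset_shares` are hypothetical and chosen only for illustration; they are not part of the dataset or of any library.

```python
# Sample counts as stated in the dataset description.
# The dictionary keys are illustrative labels, not official split names.
SUBSET_SIZES = {
    "general": 468_614,
    "reasoning": 30_984,
}


def subset_shares(sizes: dict[str, int]) -> dict[str, float]:
    """Return each subset's fraction of the total sample count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(SUBSET_SIZES.values())
shares = subset_shares(SUBSET_SIZES)

print(f"total samples: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The two counts sum to the stated 499,598 samples, with the general subset making up roughly 93.8% of the data and the reasoning subset roughly 6.2%.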
Citation
@inproceedings{stoianov-etal-2026-pro,
    title = "{T}-pro 2.0: An Efficient {R}ussian Hybrid-Reasoning Model and Playground",
    author = "Stoianov, Dmitrii and Taranets, Danil and Tsymboi, Olga and Latypov, Ramil and Dautov, Almaz and Kruglikov, Vladislav and Nikita, Surkov and Abramov, German and Gein, Pavel and Abulkhanov, Dmitry and Gashkov, Mikhail and Zelenkovskiy, Viktor and Batalov, Artem and Medvedev, Aleksandr and Potapov, Anatolii",
    editor = "Croce, Danilo and Leidner, Jochen and Moosavi, Nafise Sadat",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 3: System Demonstrations)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-demo.22/",
    doi = "10.18653/v1/2026.eacl-demo.22",
    pages = "297--319",
    ISBN = "979-8-89176-382-1"
}