Developing Question-Answering Models in Low-Resource Languages: A Case Study on Turkish Medical Texts Using Transformer-Based Approaches

Murat Aydogan, Mert Incidelen
Abstract

In this study, transformer-based pre-trained language models were fine-tuned on medical texts for question-answering (QA) tasks in Turkish, a low-resource language. Variants of the BERTurk pre-trained language model, created from a large Turkish corpus, were used for the QA tasks. The study presents a medical Turkish QA dataset built from Turkish Wikipedia and from medical theses held in the Thesis Center of the Council of Higher Education in Turkey. This dataset, containing a total of 8,200 question-answer pairs, was used to fine-tune the BERTurk models. Model performance was evaluated with Exact Match (EM) and F1 score. The BERTurk (cased, 32k) model achieved an EM of 51.097 and an F1 score of 74.148, while the BERTurk (cased, 128k) model achieved an EM of 55.121 and an F1 score of 77.187. The results show that pre-trained language models can be used successfully for QA tasks in low-resource languages such as Turkish. This study lays an important foundation for Turkish medical text processing and automatic QA, and sheds light on future research in this field.
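The abstract does not include the authors' training or evaluation code, but the described setup maps onto a standard extractive-QA workflow. The sketch below is a minimal illustration, assuming the Hugging Face transformers library and the public BERTurk checkpoint dbmdz/bert-base-turkish-cased (the cased, 32k-vocabulary variant); the fine-tuned weights and the 8,200-pair dataset are not published with the abstract, so the model's span-prediction head here is untrained and the Turkish medical example is hypothetical. The EM and F1 functions follow the usual SQuAD-style definitions (exact string match and token-overlap F1), without the English-specific normalization steps, which would differ for Turkish.

```python
from collections import Counter

from transformers import pipeline


def exact_match(prediction: str, ground_truth: str) -> float:
    """EM: 1.0 if the normalized prediction equals the gold answer string."""
    return float(prediction.strip().lower() == ground_truth.strip().lower())


def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between the predicted and gold answer spans."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# dbmdz/bert-base-turkish-cased is the public BERTurk (cased, 32k) checkpoint.
# Its QA head is randomly initialized until fine-tuned, so the prediction
# below is illustrative only, not the paper's result.
qa = pipeline("question-answering", model="dbmdz/bert-base-turkish-cased")

# Hypothetical Turkish medical context/question pair in SQuAD style.
context = (
    "Diyabetin en yaygın belirtileri arasında sık idrara çıkma, "
    "aşırı susama ve açıklanamayan kilo kaybı yer alır."
)
prediction = qa(
    question="Diyabetin yaygın belirtileri nelerdir?",
    context=context,
)

gold = "sık idrara çıkma, aşırı susama ve açıklanamayan kilo kaybı"
print("EM:", exact_match(prediction["answer"], gold))
print("F1:", token_f1(prediction["answer"], gold))
```

Swapping the model name for dbmdz/bert-base-turkish-128k-cased would correspond to the paper's second variant; averaging EM and token F1 over all held-out question-answer pairs yields scores comparable in kind to the 51.097/74.148 and 55.121/77.187 figures reported above.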