Developing Question-Answering Models in Low-Resource Languages: A Case Study on Turkish Medical Texts Using Transformer-Based Approaches

Murat Aydogan, Mert Incidelen
Abstract

In this study, transformer-based pre-trained language models were fine-tuned on medical texts for question-answering (QA) tasks in Turkish, a low-resource language. Variants of the BERTurk pre-trained language model, created from a large Turkish corpus, were used for the QA tasks. The study presents a medical Turkish QA dataset built from Turkish Wikipedia and from medical theses held in the Thesis Center of the Council of Higher Education in Turkey. This dataset, containing a total of 8,200 question-answer pairs, was used to fine-tune the BERTurk models. Model performance was evaluated with Exact Match (EM) and F1 score. The BERTurk (cased, 32k) model achieved an EM of 51.097 and an F1 score of 74.148, while the BERTurk (cased, 128k) model achieved an EM of 55.121 and an F1 score of 77.187. The results show that pre-trained language models can be used successfully for QA tasks in low-resource languages such as Turkish. This study lays an important foundation for Turkish medical text processing and automatic QA, and sheds light on future research in this field.
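The abstract does not include the authors' training or evaluation code, but the described setup maps onto a standard extractive-QA workflow. The sketch below is a minimal illustration, assuming the Hugging Face transformers library and the public BERTurk checkpoint dbmdz/bert-base-turkish-cased (the cased, 32k-vocabulary variant); the fine-tuned weights and the 8,200-pair dataset are not published with the abstract, so the model's span-prediction head here is untrained and the Turkish medical example is hypothetical. The EM and F1 functions follow the usual SQuAD-style definitions (exact string match and token-overlap F1), without the English-specific normalization steps, which would differ for Turkish.

```python
from collections import Counter

from transformers import pipeline


def exact_match(prediction: str, ground_truth: str) -> float:
    """EM: 1.0 if the normalized prediction equals the gold answer string."""
    return float(prediction.strip().lower() == ground_truth.strip().lower())


def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between the predicted and gold answer spans."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# dbmdz/bert-base-turkish-cased is the public BERTurk (cased, 32k) checkpoint.
# Its QA head is randomly initialized until fine-tuned, so the prediction
# below is illustrative only, not the paper's result.
qa = pipeline("question-answering", model="dbmdz/bert-base-turkish-cased")

# Hypothetical Turkish medical context/question pair in SQuAD style.
context = (
    "Diyabetin en yaygın belirtileri arasında sık idrara çıkma, "
    "aşırı susama ve açıklanamayan kilo kaybı yer alır."
)
prediction = qa(
    question="Diyabetin yaygın belirtileri nelerdir?",
    context=context,
)

gold = "sık idrara çıkma, aşırı susama ve açıklanamayan kilo kaybı"
print("EM:", exact_match(prediction["answer"], gold))
print("F1:", token_f1(prediction["answer"], gold))
```

Swapping the model name for dbmdz/bert-base-turkish-128k-cased would correspond to the paper's second variant; averaging EM and token F1 over all held-out question-answer pairs yields scores comparable in kind to the 51.097/74.148 and 55.121/77.187 figures reported above.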