
Chemical detection and indexing in PubMed full text articles using deep learning and rule-based methods

Sérgio Matos, João Rafael Almeida, João Figueira Silva, Rui Antunes, Tiago Almeida
Abstract

Identifying chemicals in biomedical scientific literature is a crucial task for drug development research. The BioCreative NLM-Chem challenge promoted the development of automatic systems that can identify chemicals in full-text articles and decide which chemical concepts are relevant for indexing. This work describes the participation of the BIT.UA team from the University of Aveiro, in which we propose a three-stage automatic pipeline that separately tackles (i) chemical mention detection, (ii) entity normalization, and (iii) indexing. For chemical identification, we adopted a deep learning solution based on a biomedical BERT variant. For normalization, we used a rule-based approach and a hybrid version that also uses a dense retrieval mechanism. Similarly, for indexing we followed two distinct approaches: a rule-based method and a TF-IDF-based method. Our best official results are consistently above the official median and benchmark in all three subtasks, with F1-scores of 0.8454, 0.8136, and 0.4664, respectively.
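To make the TF-IDF-based indexing idea concrete, the following is a minimal sketch of how normalized concept identifiers could be ranked per article. It is not the BIT.UA implementation: the function name `tfidf_index`, the `top_k` cutoff, the smoothed IDF formula, and the placeholder concept identifiers (`"A"`, `"B"`, `"C"`) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_index(doc_concepts, top_k=1):
    """Rank each document's concept IDs by TF-IDF and keep the top_k.

    doc_concepts maps a document ID to the list of concept IDs produced
    by a (hypothetical) normalization step, one entry per mention.
    """
    n_docs = len(doc_concepts)

    # Document frequency: number of documents in which each concept occurs.
    df = Counter()
    for concepts in doc_concepts.values():
        df.update(set(concepts))

    indexed = {}
    for doc_id, concepts in doc_concepts.items():
        if not concepts:
            indexed[doc_id] = []
            continue
        tf = Counter(concepts)
        total = len(concepts)
        # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1.
        scores = {
            c: (count / total) * (math.log((1 + n_docs) / (1 + df[c])) + 1)
            for c, count in tf.items()
        }
        # Sort by descending score, breaking ties by concept ID.
        ranked = sorted(scores, key=lambda c: (-scores[c], c))
        indexed[doc_id] = ranked[:top_k]
    return indexed


# Placeholder concept IDs; real systems would use identifiers such as MeSH codes.
docs = {
    "doc1": ["A", "A", "A", "B"],
    "doc2": ["B", "C"],
    "doc3": ["B", "C", "C"],
}
result = tfidf_index(docs, top_k=1)
```

In this toy collection, concept `"B"` occurs in every document and therefore receives the lowest IDF, so it is never selected as the single top-ranked concept. A real system would likely replace the fixed `top_k` with a tuned score threshold.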