Article

Adverse Drug Reaction extraction: Tolerance to entity recognition errors and sub-domain variants

Santiso, Sara; Perez, Alicia; Casillas, Arantza

Computer Science

COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE

2021

Volume 199

Abstract

Background and Objective: This work tackles Adverse Drug Reaction (ADR) extraction from Electronic Health Records (EHRs) written in Spanish, a task within the framework of natural language processing. It consists of extracting relations between drug-disease pairs in which the drug is the causing agent of the reaction. To this end, a pipeline is employed: first, relevant clinical entities are recognized (e.g. drugs, active ingredients, findings, symptoms); next, drug-disease candidate pairs are judged as either ADR or non-ADR. The task poses several challenges. First, EHRs show high lexical variability. Second, EHRs are scarce because they contain sensitive information. Third, the ADR detection stage has to cope with errors propagated from the entity recognition stage.

Methods: ADR detection was implemented with a deep neural network approach. To assess its tolerance to external variations, the system was exposed to different levels of noise. First, it was evaluated on three corpora that differ in source hospital, size, and class imbalance ratio, and it was also exposed to cross-corpus relation extraction. Second, we assessed the sensitivity of the ADR detection stage to the noise introduced by automatic Medical Entity Recognition (MER).

Results: The system can cope with cross-hospital predictions provided that it was trained with a large corpus. In the most challenging situation an f-measure of 75.2 was achieved. Regarding tolerance to errors derived from the entity recognition step, with a medical entity recognizer that missed 20% of the entities, the f-measure in the ADR detection stage decreased to 68.6.

Conclusions: ADR extraction is tackled as a cause-effect relation extraction task between drugs and diseases. It is advisable to employ as many EHRs as possible in order to make ADR extraction more robust. Despite the entities missed in the MER step, the drop in performance with the proposed system is not large.

(C) 2020 Elsevier B.V. All rights reserved.
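
The two-stage pipeline and the MER noise experiment described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Entity` type, the function names, and the example entities are hypothetical, and the deep neural classifier that labels each pair as ADR or non-ADR is left out. The sketch only shows how candidate pairs are formed from recognized entities, and why an entity missed by the recognizer puts an upper bound on the recall that the ADR detection stage can reach.

```python
from dataclasses import dataclass
from itertools import product
import random


@dataclass(frozen=True)
class Entity:
    """A recognized clinical entity (hypothetical, simplified type)."""
    text: str
    kind: str  # "drug" or "disease" in this sketch


def candidate_pairs(entities):
    """Pair every recognized drug with every recognized disease."""
    drugs = [e for e in entities if e.kind == "drug"]
    diseases = [e for e in entities if e.kind == "disease"]
    return list(product(drugs, diseases))


def drop_entities(entities, miss_rate, rng):
    """Simulate a recognizer that misses each entity independently
    with probability miss_rate (an assumption of this sketch)."""
    return [e for e in entities if rng.random() >= miss_rate]


def reachable_recall(gold_adr_pairs, recognized_entities):
    """Fraction of gold ADR pairs that still appear among the candidates.
    A pair whose drug or disease was missed can never be classified."""
    if not gold_adr_pairs:
        return 1.0
    found = set(candidate_pairs(recognized_entities))
    return sum(pair in found for pair in gold_adr_pairs) / len(gold_adr_pairs)


# Invented example entities, not taken from any corpus in the paper.
amoxicillin = Entity("amoxicillin", "drug")
ibuprofen = Entity("ibuprofen", "drug")
rash = Entity("rash", "disease")
gastritis = Entity("gastritis", "disease")

entities = [amoxicillin, ibuprofen, rash, gastritis]
gold = [(amoxicillin, rash), (ibuprofen, gastritis)]

# Recognizer with a 20% miss rate, seeded so the run is repeatable.
noisy = drop_entities(entities, miss_rate=0.2, rng=random.Random(0))
print(len(candidate_pairs(noisy)), reachable_recall(gold, noisy))
```

In this simplified setting, losing a single drug removes every candidate pair that contains it, so errors in the first stage propagate directly to the second regardless of how good the pair classifier is.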