Concept Recognition in French Biomedical Text Using Automatic Translation

Muhammad Afzal, Saber Ahmad Akhondi, Herman Haagen, Erik van Mulligen, Jan Kors

Research output: Contribution to journal › Article › Academic



We describe the development of a concept recognition system for French documents and its application in task 1b of the 2015 CLEF eHealth challenge. This community challenge comprised three subtasks: recognition of entities in a French medical corpus, normalization of the recognized entities, and normalization of entity mentions that had been manually annotated. Normalization had to be based on the Unified Medical Language System (UMLS). We addressed all three subtasks with a dictionary-based approach using Peregrine, our open-source indexing engine. To increase the coverage of our initial French terminology, we explored the use of two automatic translators, Google Translate and Microsoft Translator, to translate English UMLS terms into French. The corpus consisted of 1665 titles of French Medline abstracts and 6 French drug labels from the European Medicines Agency (EMEA). It was manually annotated with concepts from the UMLS and split into equally sized training and test sets. The best performance on the training set was obtained with a terminology that contained the intersection of the translated terms, combined with several post-processing steps to reduce the number of false-positive detections. When evaluated on the test set, our system achieved F-scores of 0.756 and 0.665 for entity recognition on the EMEA documents and Medline titles, respectively. For subsequent entity normalization, the F-scores were 0.711 and 0.587, respectively. Entity normalization given the manually annotated entity mentions resulted in F-scores of 0.872 and 0.671. Our system obtained the highest F-scores among the systems that participated in the challenge.
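As a rough illustration of the terminology-building step, the sketch below keeps only those French terms that two translators both produce for the same concept, and computes an F-score from detection counts. It is not the authors' implementation: it assumes "intersection" means agreement between the two translators, and it assumes the reported F-score is the balanced F1 measure. The function names, the normalization rule, and the concept identifiers and sample terms in the usage note are all hypothetical.

```python
from __future__ import annotations


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so equivalent terms compare equal.

    Assumption: this simple normalization is for illustration only.
    """
    return " ".join(term.lower().split())


def intersect_translations(
    translations_a: dict[str, set[str]],
    translations_b: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Per concept ID, keep only the terms that both translators produced."""
    terminology: dict[str, set[str]] = {}
    # Only concepts translated by both systems can have shared terms.
    for concept_id in translations_a.keys() & translations_b.keys():
        terms_a = {normalize(t) for t in translations_a[concept_id]}
        terms_b = {normalize(t) for t in translations_b[concept_id]}
        shared = terms_a & terms_b
        if shared:
            terminology[concept_id] = shared
    return terminology


def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)
```

For example, with invented placeholder data, `intersect_translations({"CUI_1": {"Infarctus du myocarde", "crise cardiaque"}, "CUI_2": {"mal de tête"}}, {"CUI_1": {"infarctus du myocarde"}, "CUI_2": {"céphalée"}})` keeps a single term for `CUI_1` and drops `CUI_2`, since the two translations of the latter disagree. Requiring agreement trades coverage for precision, which is consistent with the abstract's stated aim of reducing false-positive detections.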
Original language: Undefined/Unknown
Pages (from-to): 162-173
Number of pages: 12
Journal: Lecture Notes in Computer Science
Publication status: Published - 2016

Research programs

  • EMC NIHES-03-77-01
