Abstract
Corpora annotated with specific entities and relationships are essential for training and evaluating text-mining systems that extract structured information from large text collections. In this paper we describe an approach in which a named-entity recognition system produces an initial annotation, which annotators then revise using a web-based interface. The agreement figures show that inter-annotator agreement is much higher than agreement with the system-provided annotations. The corpus has been annotated for drugs, disorders, genes and their inter-relationships. For each of the drug-disorder, drug-target and target-disorder relations, three experts annotated a set of 100 abstracts. These annotated relationships will be used to train and evaluate text-mining software that captures such relationships in text. (C) 2012 Elsevier Inc. All rights reserved.
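The abstract does not say which agreement measure was used. As a hedged illustration only, one common choice for span annotation is pairwise F1 over exactly matching annotations. The sketch below assumes each annotation is a hypothetical `(doc_id, start, end, label)` tuple. The same function can compare two annotators with each other, or an annotator with the system pre-annotation.

```python
from itertools import combinations

# Hypothetical annotation record: (doc_id, start_offset, end_offset, label).
# Two annotations agree only if all four fields match exactly.
Annotation = tuple[str, int, int, str]


def pairwise_f1(a: set[Annotation], b: set[Annotation]) -> float:
    """F1 between two annotation sets.

    With m matches, precision = m/|b| and recall = m/|a|, so
    F1 = 2m / (|a| + |b|). The result is symmetric in a and b.
    """
    if not a and not b:
        return 1.0  # both empty: treat as perfect agreement
    matched = len(a & b)
    return 2 * matched / (len(a) + len(b))


def mean_inter_annotator_f1(sets: list[set[Annotation]]) -> float:
    """Average pairwise F1 over every pair of annotators."""
    pairs = list(combinations(sets, 2))
    if not pairs:
        raise ValueError("need at least two annotation sets")
    return sum(pairwise_f1(x, y) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy data for three hypothetical annotators and a system pre-annotation.
    ann1 = {("d1", 0, 7, "DRUG"), ("d1", 20, 28, "DISORDER")}
    ann2 = {("d1", 0, 7, "DRUG"), ("d1", 20, 30, "DISORDER")}
    ann3 = {("d1", 0, 7, "DRUG"), ("d1", 20, 28, "DISORDER")}
    system = {("d1", 0, 7, "DRUG")}

    print(pairwise_f1(ann1, ann2))  # 0.5
    print(round(mean_inter_annotator_f1([ann1, ann2, ann3]), 3))  # 0.667
    print(round(pairwise_f1(ann1, system), 3))  # 0.667
```

Exact matching is the strictest option; partial-overlap matching is a frequent alternative, and the choice can change agreement figures noticeably.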
| Field | Value |
|---|---|
| Original language | Undefined/Unknown |
| Pages (from-to) | 879-884 |
| Number of pages | 6 |
| Journal | Journal of Biomedical Informatics |
| Volume | 45 |
| Issue number | 5 |
| DOIs | |
| Publication status | Published - 2012 |
Research programs
- EMC NIHES-03-77-01