Event extraction from biomedical texts using trimmed dependency graphs
This thesis explores the automatic extraction of information from biomedical publications. Such techniques are urgently needed because the biosciences are publishing continually increasing numbers of texts. The focus of this work is on events. Information about events is currently manually curated from the literature by biocurators. Biocuration, however, is time-consuming and costly so automatic methods are needed for information extraction from the literature. This thesis is dedicated to modeling, implementing and evaluating an advanced event extraction approach based on the analysis of syntactic dependency graphs. This work presents the event extraction approach proposed and its implementation, the JReX (Jena Relation eXtraction) system. This system was used by the University of Jena (JULIE Lab) team in the "BioNLP 2009 Shared Task on Event Extraction" competition and was ranked second among 24 competing teams. Thereafter JReX was the highest scorer on the worldwide shared U-Compare event extraction server, outperforming the competing systems from the challenge. This success was made possible, among other things, by extensive research on event extraction solutions carried out during this thesis, e.g., exploring the effects of syntactic and semantic processing procedures on solving the event extraction task. The evaluations executed on standard and community-wide accepted competition data were complemented by real-life evaluation of large-scale biomedical database reconstruction. This work showed that considerable parts of manually curated databases can be automatically re-created with the help of the event extraction approach developed. Successful re-creation was possible for parts of RegulonDB, the world's largest database for E. coli. In summary, the event extraction approach justified, developed and implemented in this thesis meets the needs of a large community of human curators and thus helps in the acquisition of new knowledge in the biosciences.
Use and reproduction:
All rights reserved