Using Textpresso for Information Retrieval, Fact Extraction
Correspondence:
- WormBase, Caltech, 1200 E. California Blvd, Pasadena, CA 91125
- Document Type:
- Presentation
- Date:
- Received 01 June 2009 17:44 UTC; Posted 02 June 2009
- Subjects:
- Genetics & Genomics, Bioinformatics
- Abstract:
Ten years ago WormBase [1] started as a repository for sequence data for the model organism Caenorhabditis elegans, and it has since striven to curate all genetic and molecular data published for this nematode. With approximately 800 papers published per year in the C. elegans field, WormBase (WB) has the opportunity to include information from every paper published. Currently ~11,000 full-text research papers (mid-1970s to the present) have been downloaded into the WB curation database, from which curators extract over 27 data types (e.g. genetic interactions, transgene objects, gene expression patterns, and mutant phenotypes). Textpresso [2] is an open-source text-mining tool capable of rapid searches for keywords, as well as concepts, in the full text of research papers. Curators at WB use Textpresso daily for many aspects of literature curation, from simple keyword searches to semi- or fully automated entity and fact extraction, the results of which feed into curation pipelines or directly into the curation database itself. In addition, Textpresso greatly aids the prioritization of literature curation by retrieving papers based on their full contents rather than solely on their abstracts. Such retrievable contents range from the very particular (such as a gene simply being mentioned in the Materials and Methods section of a paper) to the complex (such as molecular functions that involve cellular components). As WB expands to incorporate the genomes of other nematodes, we will work with the Textpresso developers to set up a literature library for related nematodes. We expect Textpresso to be crucial for directing our literature curation efforts efficiently and for providing data quickly to users searching the literature. In this workshop we will show how we use Textpresso in our curation pipeline to help with literature queries, to prioritize our workflow, and to automate data and fact extraction.
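The two kinds of retrieval described in the abstract (a keyword restricted to one section of a paper, and a concept query that requires several categories to co-occur in a sentence) can be illustrated with a minimal Python sketch. This is not Textpresso's implementation or API: the `Paper` class, the `CATEGORIES` patterns, the `search` function, and the two toy papers are all hypothetical and exist only to show the idea of sentence-level matching over full text.

```python
import re
from dataclasses import dataclass

@dataclass
class Paper:
    paper_id: str
    sections: dict[str, str]  # section name -> full text of that section

# Hypothetical concept categories, each reduced to one regular expression.
CATEGORIES = {
    # Lowercase C. elegans-style gene names such as unc-22 or daf-16.
    "gene": re.compile(r"\b[a-z]{3,4}-\d+\b"),
    "cellular_component": re.compile(r"\b(?:nucleus|synapse|mitochondria)\b", re.I),
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())

def search(papers, keyword=None, categories=(), section=None):
    """Return (paper_id, section, sentence) for every sentence that contains
    the keyword (if given) and matches all requested categories, optionally
    looking only at one named section."""
    hits = []
    for paper in papers:
        for name, text in paper.sections.items():
            if section is not None and name != section:
                continue
            for sentence in split_sentences(text):
                if keyword and keyword.lower() not in sentence.lower():
                    continue
                if all(CATEGORIES[c].search(sentence) for c in categories):
                    hits.append((paper.paper_id, name, sentence))
    return hits

# Toy corpus, invented for illustration only.
PAPERS = [
    Paper("paper-A", {
        "abstract": "We study locomotion.",
        "materials and methods": "The unc-22 mutant strain was grown at 20 C. "
                                 "Worms were scored daily.",
    }),
    Paper("paper-B", {
        "abstract": "In daf-16 mutants the reporter accumulates in the nucleus.",
        "results": "The daf-16 animals lived shorter.",
    }),
]

if __name__ == "__main__":
    # A gene merely mentioned in Materials and Methods.
    print(search(PAPERS, keyword="unc-22", section="materials and methods"))
    # A sentence in which a gene and a cellular component co-occur.
    print(search(PAPERS, categories=("gene", "cellular_component")))
```

Matching at the sentence level rather than the document level is what lets a query distinguish a passing mention from a statement that relates two concepts; a real system would replace the regular expressions with curated lexicons.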
1 WormBase
2 Textpresso
- Collection:
- 3rd International Biocuration Conference
- Presented at:
- 3rd International Biocuration Conference, 16 April 2009
Additional information
- License:
- This document is licensed to the public under the Creative Commons Attribution 3.0 License
- How to cite this document:
- Yook, Karen, Van Auken, Kimberly, Sternberg, Paul, and The WormBase Consortium. Using Textpresso for Information Retrieval, Fact Extraction. Available from Nature Precedings <http://dx.doi.org/10.1038/npre.2009.3302.1> (2009)
