<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/" version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Nature Precedings - Tag feed for curation</title>
    <link>http://precedings.nature.com/tags/curation</link>
    <description>Recently posted documents tagged with 'curation'</description>
    <dc:publisher>Nature Publishing Group</dc:publisher>
    <dc:language>en</dc:language>
    <prism:publicationName>Nature Precedings</prism:publicationName>
    <image>
      <title>Nature Precedings</title>
      <url>http://precedings.nature.com/images/header_logo.gif</url>
      <link>http://precedings.nature.com</link>
    </image>
    <atom:link type="application/rss+xml" rel="self" href="http://precedings.nature.com/tags/curation/feed"/>
    <item>
      <title>The Eukaryote Genome Annotation Platform at Genoscope</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3457.1</link>
      <description>The Genoscope annotation workflow for eukaryote genomes relies on evidence from ab initio gene model predictions combined with homology searches, using collections of expressed sequences &#8211; full-length cDNAs, ESTs or massive-scale mRNA sequences from the same or closely related organisms &#8211; proteins or other genomic sequences. Global analysis of these draft or complete sequences then combines both approaches through gene prediction data integration using GAZE, which is capable of identifying a majority of the existing gene features. Although of very good quality, gene modelling remains tentative at the end of the process. Even though computational predictors are useful in large-scale annotation for global genomics analysis, there is no complete genome for which all gene structures, in terms of exons, introns and coding regions, have been experimentally confirmed. Finished genomes can provide exciting insights into genome organization and evolution. Additional experimental data generated by genome sequencing projects assist genome annotation, aiming at a better understanding of the biology of the organism. Therefore, gene models and annotation can be improved by human curation to find errors or to resolve incongruous evidence in the automatic annotation of the genome. In addition to our automated gene prediction pipeline, we now provide collaborators carrying out sequencing projects with a distributed annotation platform allowing expert evaluation of the annotation. To maximize the participation of the scientific community, an annotation tool for revising annotations has been set up using components of the Generic Model Organism Database toolkit, which provides tools for managing organism databases. A CHADO database, linked to an Apollo graphical interface, permits users to correct gene structures and store them in a dedicated organism database, as we will show with a few examples.
Such a tool would facilitate connecting and comparing predicted annotations with existing biological data, becoming the repository of the complete annotated finished genome sequence.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3457.1</guid>
      <pubDate>Fri, 24 Jul 2009 15:28:24 UTC</pubDate>
      <dc:title>The Eukaryote Genome Annotation Platform at Genoscope</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3457.1</dc:identifier>
      <dc:date>2009-07-24</dc:date>
      <dc:creator>Betina M. Porcel</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-07-24T15:28:24Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3457/version/1/files/npre20093457-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Online Training of New Curators</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3420.1</link>
      <description>The basic information in Reactome is provided by bench biologists who are experts on a particular pathway, and the Reactome team works continually to drive engagement. This engagement between experts, curators, editors and reviewers requires maintenance and improvement, and in this sense Reactome is itself a model for large biocuration projects that are driven by community engagement. This tutorial will highlight issues from the perspectives of both sets of online training participants: the trainer and the audience. From the audience perspective, the tutorial will introduce the concepts that drive the Reactome data model, cover the basic steps that a researcher would have to follow in order to break down a biological pathway into its &#8220;reaction-based&#8221; Reactome representation, and introduce the user to the tools used by authors (the &#8220;authortool&#8221;) and by curators (the &#8220;curatortool&#8221;) to move that data into the Reactome database. From the trainer perspective, the tutorial will focus on the essential role that a clear explanation of a resource&#8217;s data model plays in priming the audience for the technical aspects of biocuration. Technical challenges and online delivery methods will be discussed, and examples of systems used will be presented with discussion of their negative and positive aspects. Pedagogical models for enhancing audience participation will be briefly presented. The Reactome project is a collaboration among Cold Spring Harbor Laboratory, The European Bioinformatics Institute, and The Gene Ontology Consortium to develop a curated resource of core pathways and reactions in human biology.
The information in this database is authored by biological researchers with expertise in their fields, maintained by the Reactome editorial staff, and cross-referenced with the sequence databases at NCBI, Ensembl and UniProt, the UCSC Genome Browser, KEGG (Gene and Compound), ChEBI, PubMed and GO. The information is then managed by groups of curators at CSHL and EBI, peer-reviewed by other researchers and published on the web. While Reactome is targeted at human pathways, it also includes many individual biochemical reactions from non-human systems such as rat, mouse, pufferfish and zebrafish. This makes the database relevant to the many researchers who work on model organisms. All the information in Reactome is backed up by its provenance: either a literature citation or an electronic inference based on sequence similarity. Reactome is a free online resource, and Reactome software is open-source.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3420.1</guid>
      <pubDate>Mon, 13 Jul 2009 09:06:34 UTC</pubDate>
      <dc:title>Online Training of New Curators</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3420.1</dc:identifier>
      <dc:date>2009-07-13</dc:date>
      <dc:creator>Marc E. Gillespie</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-07-13T09:06:34Z</prism:publicationDate>
      <prism:category>Presentation</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3420/version/1/files/npre20093420-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Using Textpresso for Information Retrieval, Fact Extraction</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3302.1</link>
      <description>Ten years ago WormBase [1] started as a repository for sequence data for the model organism Caenorhabditis elegans and has since striven to include the curation of all genetic and molecular data published for this nematode. With a publication rate in the C. elegans field of approximately 800 papers per year, WormBase (WB) has the opportunity to include information from every paper published. Currently there are ~11,000 full-text research papers (mid-1970s to the present) downloaded into the WB curation database, from which over 27 data types (e.g. genetic interactions, transgene objects, gene expression patterns and mutant phenotypes) are extracted by curators. Textpresso [2] is an open source text-mining tool capable of rapid searches for keywords, as well as concepts, from the full text of research papers. Curators at WB use Textpresso on a daily basis for many aspects of literature curation, from simple keyword searches to semi- or fully automated entity and fact extraction, which feed into curation pipelines or directly into the curation database itself. In addition, Textpresso greatly aids prioritization of literature curation by retrieving papers based on their full contents rather than solely on their abstracts. Such retrievable contents can range from the very particular (such as a gene simply being mentioned in the Materials and Methods section of a paper) to the complex (such as molecular functions that involve cellular components). As WB expands to incorporate the genomes of other nematodes, we will be working with Textpresso developers to set up a library of literature for related nematodes. We expect Textpresso to be crucial for most efficiently directing our efforts in literature curation, and for most quickly providing data to users searching the literature.
In this workshop we will show how we use Textpresso in our curation pipeline to help with literature queries, to prioritize our workflow, and to automate data and fact extraction. [1] WormBase. [2] Textpresso.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3302.1</guid>
      <pubDate>Tue, 02 Jun 2009 14:49:35 UTC</pubDate>
      <dc:title>Using Textpresso for Information Retrieval, Fact Extraction</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3302.1</dc:identifier>
      <dc:date>2009-06-02</dc:date>
      <dc:creator>Kimberly Van Auken</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-06-02T14:49:35Z</prism:publicationDate>
      <prism:category>Presentation</prism:category>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3302/version/1/files/npre20093302-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Integrating Text Mining into the MGI Biocuration Workflow</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3262.1</link>
      <description>A major challenge for the development of resources for functional and comparative genomics is the extraction of data from the biomedical literature. Although text retrieval and extraction for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases. In September 2008, Mouse Genome Informatics (MGI) at The Jackson Lab initiated a search for dictionary-based text mining tools that we could integrate into our curation workflow. MGI has rigorous document triage and annotation procedures designed to identify articles about mouse genome biology and determine whether those articles should be curated. We currently screen approximately 1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we don&#8217;t foresee that human curation tasks can be fully automated in the near future, we are eager to implement entity name recognition and gene tagging tools that can help streamline our curation workflow and simplify gene indexing tasks in the MGI system. In this presentation, we discuss our search process and the steps we took to identify a short list of potential tools for further evaluation. We present our performance metrics and success criteria, and pilot projects in progress. The primary applications under current review are Fraunhofer SCAI&#8217;s ProMiner and NCBO&#8217;s Open-Biomedical Annotator.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3262.1</guid>
      <pubDate>Wed, 20 May 2009 21:16:19 UTC</pubDate>
      <dc:title>Integrating Text Mining into the MGI Biocuration Workflow</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3262.1</dc:identifier>
      <dc:date>2009-05-20</dc:date>
      <dc:creator>Karen G. Dowell</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-05-20T21:16:19Z</prism:publicationDate>
      <prism:category>Presentation</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3262/version/1/files/npre20093262-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Data Curation in Biology &#8211; Past, Present and Future</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3225.1</link>
      <description>Data curation has been critical in the development of biology: from Darwin and Linnaeus to UniProt, the careful collection and organisation of data has been the spring from which new hypotheses and understanding have emerged. In this presentation, I will describe how we have used data curation in my own research group and present an overview of curation at the EBI. With new technical developments and the move towards the semantic web, the role of curation needs to develop to take advantage of these new opportunities; this will also be discussed.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3225.1</guid>
      <pubDate>Tue, 12 May 2009 13:19:11 UTC</pubDate>
      <dc:title>Data Curation in Biology &#8211; Past, Present and Future</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3225.1</dc:identifier>
      <dc:date>2009-05-12</dc:date>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-05-12T13:19:11Z</prism:publicationDate>
      <prism:category>Presentation</prism:category>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Molecular Cell Biology</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3225/version/1/files/npre20093225-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Present and future of proteomics data curation at the PRIDE database</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3222.1</link>
      <description>Significant progress has been made in improving the accessibility and utility of the large amounts of high-throughput proteomics data being generated, through the introduction of publicly available proteomics repositories. One such repository is PRIDE (the &#8216;PRoteomics IDEntifications&#8217; database, http://www.ebi.ac.uk/pride). PRIDE stores mass spectrometry related data, including peptide and protein identifications, mass spectra and valuable additional metadata. At present, data curation in PRIDE is limited to data submission support. All submissions must be made in the PRIDE XML format. Mass spectrometry derived data is very heterogeneous in terms of experimental approaches, instrumentation, data formats, etc. This is why converting all this different data to PRIDE XML is far from trivial and can be very time consuming, since tailored submission pipelines must often be constructed. However, the situation has now improved, since new tools such as PRIDE Converter (http://code.google.com/p/pride-converter) are available for submitters to convert their data to PRIDE XML. In the near future, data curation in PRIDE will be significantly extended. High-quality data will be included in a new repository called PRIDE-plus. First of all, it will be necessary to create a set of minimal requirement rules to decide which datasets can be included in PRIDE-plus. Then, the design and implementation of new curation tools to perform data quality assessment will be essential. It will also be necessary to research the automation of these new curation and annotation tasks.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3222.1</guid>
      <pubDate>Wed, 06 May 2009 20:45:07 UTC</pubDate>
      <dc:title>Present and future of proteomics data curation at the PRIDE database</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3222.1</dc:identifier>
      <dc:date>2009-05-06</dc:date>
      <dc:creator>Juan Antonio Vizcaino</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-05-06T20:45:07Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Biotechnology</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3222/version/1/files/npre20093222-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>DDBJ Activities: Contribution to the Research in Information Biology</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3211.1</link>
      <description>DDBJ (DNA Data Bank of Japan; http://www.ddbj.nig.ac.jp/) began its database activities in 1986. From the beginning, DDBJ has been a member of the INSDC (International Nucleotide Sequence Database Collaboration; http://www.insdc.org/), a tripartite collaboration with EMBL-Bank/EBI and GenBank/NCBI. The total number of bases of primary nucleotide sequence data collected and distributed by INSDC exceeded 100 Gbases in August 2005. It then took only three years for that total to double (200 Gbases). Now, the collaboration is being expanded to Traces (DNA sequence chromatograms) and Short Reads (raw read data from 454, Solexa, SOLiD etc.). DDBJ is also collecting and releasing gene expression data at CIBEX (Center for Information Biology gene EXpression database; http://cibex.nig.ac.jp/). Furthermore, DDBJ has contributed to international annotation jamborees such as FANTOM (mouse), H-Inv (human), RAP (rice) and E. coli K12. DDBJ provides many services for research in information biology or bioinformatics. They include Web-API for Biology (WABI; http://www.xml.nig.ac.jp/) and All-round Retrieval of Sequence and Annotation (ARSA; http://arsa.ddbj.nig.ac.jp/). These activities are presented together with the perspective of DDBJ for the coming years.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3211.1</guid>
      <pubDate>Wed, 06 May 2009 20:37:55 UTC</pubDate>
      <dc:title>DDBJ Activities: Contribution to the Research in Information Biology</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3211.1</dc:identifier>
      <dc:date>2009-05-06</dc:date>
      <dc:creator>Jun Mashima</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-05-06T20:37:55Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3211/version/1/files/npre20093211-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Reflect: Augmented Browsing for the Life Scientist</title>
      <link>http://precedings.nature.com/documents/3212/version/1</link>
      <description>Anyone who regularly reads life science literature often comes across names of genes, proteins, or small molecules that they would like to know more about. To make this process easier, we have developed a new, free service called Reflect (http://reflect.ws) that can be installed as a plug-in to Firefox or Internet Explorer. Reflect tags gene, protein, and small molecule names in any web page, typically within a few seconds, and without affecting document layout. Clicking on a tagged gene or protein name opens a popup showing a concise summary that includes synonyms, database identifiers, sequence, domains, 3D structure, interaction partners, subcellular location, and related literature. Clicking on a tagged small molecule name opens a popup showing 2D structure and interaction partners. The popups also allow navigation to commonly used databases. In the future we plan to add further entity types to Reflect, including outside the life sciences.</description>
      <guid>http://precedings.nature.com/documents/3212/version/1</guid>
      <pubDate>Mon, 04 May 2009 14:50:53 UTC</pubDate>
      <dc:title>Reflect: Augmented Browsing for the Life Scientist</dc:title>
      <dc:identifier>hdl:10101/npre.2009.3212.1</dc:identifier>
      <dc:date>2009-05-04</dc:date>
      <dc:creator>Se&#225;n I. O'Donoghue</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-05-04T14:50:53Z</prism:publicationDate>
      <prism:category>Manuscript</prism:category>
      <prism:section>Biotechnology</prism:section>
      <prism:section>Chemistry</prism:section>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Molecular Cell Biology</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3212/version/1/files/npre20093212-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Curating Genetic Association Literature for Common Diseases</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3174.1</link>
      <description>Papers describing genetic associations with common diseases are currently being published at a rapid rate. These new papers add to an already large body of literature which includes candidate gene studies, genome-wide association studies, review papers, and meta-analyses. Related papers describe the basic epidemiology of these common diseases, as well as interactions between genes and environment, other genes, and drugs (pharmacogenomics), all of which may affect disease predisposition and management. As the field of personalized genomics continues to grow and mature, this body of literature is being synthesized in various ways. One aspect of utmost importance is to track gene-disease associations over time to see if they are replicated in different populations by different authors. This has historically been a limitation of genetic association studies and, if not addressed, can be a major barrier to the adoption of personalized genomics. Another key need is to systematically collect data on the magnitude of the gene-disease effect (typically an odds ratio), variant identifier, allele frequency, risk and non-risk alleles, and other key information from the papers. This is often made more complicated by the way that the data is presented by authors (in particular, it is often difficult to tell which is the risk allele), as well as by DNA strand issues. We have built a literature curation database that addresses these two key needs, using Ruby on Rails with MySQL. The lack of consistent standards for the reporting of gene-disease associations, either by journal editors or other consortia or agencies, makes automated computer curation infeasible at this time. Thus, Navigenics does all curation manually, employing a team of epidemiologists and human geneticists. To minimize human error, we have incorporated quality control measures, including two independent literature reviews, into the data collection system.
Methods for the collection, interpretation, storage, and retrieval of genetic association data from large numbers of papers will be discussed.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3174.1</guid>
      <pubDate>Tue, 28 Apr 2009 18:39:06 UTC</pubDate>
      <dc:title>Curating Genetic Association Literature for Common Diseases</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3174.1</dc:identifier>
      <dc:date>2009-04-28</dc:date>
      <dc:creator>Elana Silver</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-04-28T18:39:06Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3174/version/1/files/npre20093174-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Broadening Pfam Protein Sequence Annotations</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3194.1</link>
      <description>Pfam is a database of conserved protein families or domains commonly used for genome annotation and sequence classification. It comprises two parts: (1) Pfam-A families, which are fully annotated and consist of a representative seed alignment, HMMs, and a full alignment comprising all sequences that score above the curated threshold; (2) Pfam-B families, which are automatically generated clusters of domains not matched by Pfam-A but that often indicate conserved sequence regions. Pfam release 23.0 predicts at least one Pfam-A domain on 74% of the sequences in UniProtKB, and predicts either a Pfam-A or Pfam-B domain on 93% of the sequences in UniProtKB. With the ever-increasing rate of deposition of new proteins of all qualities into the underlying repositories, it is essential that Pfam continues to grow in order to maintain its coverage. We have used a number of strategies to improve the annotation provided by Pfam, and these include both building new families and expanding existing ones. Pfam has also greatly benefited from contributions from its user community. New family and functional annotation submissions from an S. pombe curator have ensured that Pfam has a high coverage &#8211; 83% &#8211; of the S. pombe proteome. Many of the early Pfam-A models have not been altered since they were first deposited. As the diversity of the sequence databases grows, the diversity within a Pfam seed alignment can become too narrow to represent the breadth of sequences that should belong to that family. The result is that some of the early Pfam-A HMMs fail to detect remote homologues. To address this problem we have rebuilt a large proportion of Pfam-A families, which has increased the Pfam-A coverage by 1-2%. Another strategy we have used is targeted building, where a particular system or complex is examined in detail to ensure families exist for all components and annotation is consistent.
In terms of building new Pfam-A families, the two major starting points are Pfam-B clusters and novel structures. From these we have built ~1000 families between releases 22.0 and 23.0, and a further 800 families since release 23.0. Between Pfam releases 22.0 and 23.0 we have changed the way in which Pfam-B families are generated. Previously, Pfam-B families were created from PRODOM clusters that were based on a much smaller sequence database than the one upon which Pfam was built. We now use the ADDA algorithm, which generates clusters from the same underlying sequence database that Pfam is based on, resulting in a more comprehensive Pfam-B contribution. This has increased the sequence coverage contributed by Pfam-B substantially, from 3.9% to 11.8%. In a further drive to improve coverage, Pfam is currently evaluating a new release of the HMMER software (HMMER3) used to construct and search the Pfam HMMs. Early results show that HMMER3 is ~100-fold faster and has increased specificity and sensitivity compared with HMMER2.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3194.1</guid>
      <pubDate>Tue, 28 Apr 2009 18:38:09 UTC</pubDate>
      <dc:title>Broadening Pfam Protein Sequence Annotations</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3194.1</dc:identifier>
      <dc:date>2009-04-28</dc:date>
      <dc:creator>Jaina Mistry</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-04-28T18:38:09Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3194/version/1/files/npre20093194-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
  </channel>
</rss>
