<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/" version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Nature Precedings - Tag feed for databases</title>
    <link>http://precedings.nature.com/tags/databases</link>
    <description>Recently posted documents tagged with 'databases'</description>
    <dc:publisher>Nature Publishing Group</dc:publisher>
    <dc:language>en</dc:language>
    <prism:publicationName>Nature Precedings</prism:publicationName>
    <image>
      <title>Nature Precedings</title>
      <url>http://precedings.nature.com/images/header_logo.gif</url>
      <link>http://precedings.nature.com</link>
    </image>
    <atom:link type="application/rss+xml" rel="self" href="http://precedings.nature.com/tags/databases/feed"/>
    <item>
      <title>Biocuration Workflow Catalogue</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3250.1</link>
      <description>As the first phase of a knowledge engineering study of biocuration workflows, we performed a preliminary task-modeling exercise on seven separate bioinformatics systems. This involved constructing UML activity diagrams from detailed interviews with curators in order to understand the organization of the process the biocurators used to populate their system. The objective of this work was to identify common patterns within the workflows where we might apply text mining methods to accelerate curation. We compiled a number of workflows in a common format but were largely unable to consolidate these structures into a formal structure that facilitated comparison across workflows. We presented this work as a slideshow and publish this account of the catalog as supplementary information.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3250.1</guid>
      <pubDate>Wed, 13 May 2009 19:12:22 UTC</pubDate>
      <dc:title>Biocuration Workflow Catalogue</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3250.1</dc:identifier>
      <dc:date>2009-05-13</dc:date>
      <dc:creator>Gully Burns</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-05-13T19:12:22Z</prism:publicationDate>
      <prism:category>Manuscript</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3250/version/1/files/npre20093250-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>DDBJ Activities: Contribution to the Research in Information Biology</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3211.1</link>
      <description>DDBJ (DNA Data Bank of Japan; http://www.ddbj.nig.ac.jp/) started its database activities in 1986. From the beginning, DDBJ has been a member of INSDC (International Nucleotide Sequence Database Collaboration; http://www.insdc.org/), a tripartite collaboration with EMBL-Bank/EBI and GenBank/NCBI. The total base number of the primary nucleotide sequence data collected and distributed by INSDC exceeded 100 Gbases in August 2005. Since then, it took only three years for the total base number to double (200 Gbases). Now, the collaboration is being expanded to Traces (DNA sequence chromatograms) and Short Reads (raw read data from 454, Solexa, SOLiD, etc.). DDBJ is also collecting and releasing gene expression data at CIBEX (Center for Information Biology gene EXpression database; http://cibex.nig.ac.jp/). Furthermore, DDBJ contributed to international annotation jamborees such as FANTOM (mouse), H-Inv (human), RAP (rice) and E. coli K12. DDBJ provides many services for research in information biology or bioinformatics. They include Web-API for Biology (WABI) http://www.xml.nig.ac.jp/ and All-round Retrieval of Sequence and Annotation (ARSA) http://arsa.ddbj.nig.ac.jp/. These activities are presented with the perspective of DDBJ in the coming years.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3211.1</guid>
      <pubDate>Wed, 06 May 2009 20:37:55 UTC</pubDate>
      <dc:title>DDBJ Activities: Contribution to the Research in Information Biology</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3211.1</dc:identifier>
      <dc:date>2009-05-06</dc:date>
      <dc:creator>Jun Mashima</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-05-06T20:37:55Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3211/version/1/files/npre20093211-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>HGNC: The Why and How of Standardised Gene Nomenclature</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3182.1</link>
      <description>The HUGO Gene Nomenclature Committee (HGNC) aims to approve a unique gene symbol and gene name for every human gene.  Standardisation of gene symbols is necessary to allow researchers and curators to refer to the same gene without ambiguity.  Consistent use of gene symbols in publications and across different websites makes it easy for researchers to find all relevant information for a particular gene and facilitates data mining and retrieval.  For each gene that we name we curate relevant information including symbol aliases, chromosomal location, locus type, sequence accessions and links to relevant databases.  Therefore, our website is a central resource for human genetics. We endeavour to approve gene symbols that are acceptable to researchers to encourage widespread use of our symbols.  In order to achieve this, we contact researchers that work on particular genes for advice before approving symbols and allow researchers to submit gene symbols to us directly for our consideration.  We attend conferences to discuss difficult nomenclature matters and to gain community agreement.  We interact with annotators of genes and proteins to provide symbols and names that accurately reflect the nature of each gene and its products.  We also work with the gene nomenclature committees for other organisms, and aim to approve equivalent gene symbols for orthologous genes in human and other vertebrate species, especially mouse and rat. We will demonstrate the steps that are required to name a gene, and will show how and where the nomenclature of a particular gene is used.  We will also explain the nature of our collaborations with particular journals and other databases in striving to achieve the use of a common gene nomenclature by all.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3182.1</guid>
      <pubDate>Tue, 28 Apr 2009 18:42:54 UTC</pubDate>
      <dc:title>HGNC: The Why and How of Standardised Gene Nomenclature</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3182.1</dc:identifier>
      <dc:date>2009-04-30</dc:date>
      <dc:creator>Ruth Seal</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-04-28T18:42:54Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3182/version/1/files/npre20093182-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Broadening Pfam Protein Sequence Annotations</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3194.1</link>
      <description>Pfam is a database of conserved protein families or domains commonly used for genome annotation and sequence classification. It comprises two parts: (1) Pfam-A families, which are fully annotated and consist of a representative seed alignment, HMMs, and a full alignment comprising all sequences that score above the curated threshold; (2) Pfam-B families, which are automatically generated clusters of domains not matched by Pfam-A but that often indicate conserved sequence regions. Pfam release 23.0 predicts at least one Pfam-A domain on 74% of the sequences in UniProtKB, and predicts either a Pfam-A or Pfam-B domain on 93% of the sequences in UniProtKB. With the ever increasing rate of deposition of new proteins of all qualities into the underlying repositories, it is essential that Pfam continues to grow in order to maintain its coverage. We have used a number of strategies to improve the annotation provided by Pfam, and these include both building new families and expanding existing ones. Pfam has also greatly benefited from contributions from its user community. New family and functional annotation submissions from an S. pombe curator have ensured that Pfam has a high coverage &#8211; 83% &#8211; of the S. pombe proteome. Many of the early Pfam-A models have not been altered since they were first deposited. As the diversity of the sequence databases grows, the diversity within a Pfam seed alignment can become too narrow for representing the breadth of sequences that should belong to that family. The result is that some of the early Pfam-A HMMs fail to detect remote homologues. To address this problem we have rebuilt a large proportion of Pfam-A families, which has increased the Pfam-A coverage by 1-2%. Another strategy we have used has been that of targeted building, where a particular system or complex is examined in detail to ensure families exist for all components and annotation is consistent.
In terms of building new Pfam-A families, the two major starting points are Pfam-B clusters and novel structures. From these we have built ~1000 families between releases 22.0 and 23.0, and a further 800 families since release 23.0. Between Pfam releases 22.0 and 23.0 we have changed the way in which Pfam-B families are generated. Previously, Pfam-B families were created from PRODOM clusters that were based on a much smaller sequence database than the one upon which Pfam was built. We now use the ADDA algorithm that generates clusters from the same underlying sequence database as Pfam is based on, thus resulting in a more comprehensive Pfam-B contribution. This has increased the sequence coverage contributed by Pfam-B substantially from 3.9% to 11.8%. In a further drive to improve coverage, Pfam is currently evaluating a new release of the HMMER software (HMMER3) used to construct and search the Pfam HMMs. Early results show that HMMER3 is ~100-fold faster and has increased specificity and sensitivity compared with HMMER2.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3194.1</guid>
      <pubDate>Tue, 28 Apr 2009 18:38:09 UTC</pubDate>
      <dc:title>Broadening Pfam Protein Sequence Annotations</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3194.1</dc:identifier>
      <dc:date>2009-04-28</dc:date>
      <dc:creator>Jaina Mistry</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-04-28T18:38:09Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3194/version/1/files/npre20093194-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Increasing Access To Bioinformatics Resources: Increase Community Curation by Increasing Your Community</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3163.1</link>
      <description>The amount of biological data has risen exponentially over the last decade. Along with this rise, the number and types of bioinformatics resources have risen such that the sheer number of bioinformatics resources is overwhelming. For these resources to attain their full potential, they must be efficiently and extensively utilized. But in such a plethora of resources, how does a researcher new to a field find the resources that will meet their needs? Once a resource is found, how does the researcher quickly learn to utilize that resource fully? There are resource lists such as those provided by the journal Nucleic Acids Research (NAR), BioMed Central and the Univ. of Pittsburgh Health Sciences Library, and most resources include their own documentation. But lists and site documentation don&#8217;t always cast a wide enough net to catch all users. For a resource to truly maximize its user community, it often takes multiple different outreach approaches. OpenHelix specializes in providing customized outreach services to bioinformatics resources, including those featured on this poster. Based on the conclusions from our Phase I SBIR grant, which tested the efficiency of several methods for training researchers on the use of genomic resources, OpenHelix (www.openhelix.com) has developed, and currently provides, up-to-date online training materials on a large number of bioinformatics resources, covering major providers and research areas. Through a Phase II SBIR grant from NHGRI and other funding, we are developing a search portal for online bioinformatics resources.  With the search portal, researchers will be able to find the bioinformatics and genomics online databases and resources most relevant to their needs. The search portal will contain an index of hundreds of the most popular and powerful resources, as well as the content of over 100 OpenHelix tutorials.
Using various ranking techniques, the portal will be able to provide more relevant results than a simple keyword search.  Additionally, as part of an extension of our Phase II grant, we are providing a sponsored suite of Model Organism Database trainings that include GBrowse, FlyBase, MGI, RGD, SGD, WormBase and ZFIN. We are also developing, and will be offering, sponsored live trainings on resources at NCBI as a way of filling the void left by the demise of the NCBI Field Guide outreach programs. Conclusion: OpenHelix aims to increase community curation by helping resources increase their user community.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3163.1</guid>
      <pubDate>Fri, 24 Apr 2009 15:45:30 UTC</pubDate>
      <dc:title>Increasing Access To Bioinformatics Resources: Increase Community Curation by Increasing Your Community</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3163.1</dc:identifier>
      <dc:date>2009-04-24</dc:date>
      <dc:creator>Jennifer M. Williams</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-04-24T15:45:30Z</prism:publicationDate>
      <prism:category>Presentation</prism:category>
      <prism:section>Biotechnology</prism:section>
      <prism:section>Cancer</prism:section>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Molecular Cell Biology</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3163/version/1/files/npre20093163-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>A Framework for BioCuration Workflows (part II)</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3126.1</link>
      <description>This is the second part of the talk &#8216;A Framework for BioCuration Workflows&#8217;, given by Martin Krallinger from the Spanish National Cancer Research Centre at the &#8216;Text Mining for the BioCuration Workflow&#8217; workshop. The first part was given by Gully APC Burns (USC Information Sciences Institute, USA). This presentation covered some of the main general tasks often shared by existing literature biocuration workflows: identification of relevant articles, identification and normalization of the actual bio-entities, extraction of the annotation event and identification of some evidential support (e.g. experimental evidence). Some of the important aspects, bottlenecks and potential text mining approaches were briefly introduced for each general workflow task. This talk provided a short overview of some of the important aspects of integrating text mining systems into a given biocuration workflow and showed how heterogeneous workflows can be, even for a relatively straightforward task such as the identification of curation-relevant literature. A more detailed example of the biocuration of protein-protein interactions through the workflow followed by the BioGRID database was presented.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3126.1</guid>
      <pubDate>Wed, 22 Apr 2009 21:14:40 UTC</pubDate>
      <dc:title>A Framework for BioCuration Workflows (part II)</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3126.1</dc:identifier>
      <dc:date>2009-04-22</dc:date>
      <dc:creator>Martin Krallinger</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-04-22T21:14:40Z</prism:publicationDate>
      <prism:category>Presentation</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3126/version/1/files/npre20093126-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Curation and annotation for BioModels Database, a resource of published quantitative kinetic models</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3124.1</link>
      <description>BioModels Database (http://www.ebi.ac.uk/biomodels/) is a free resource for storing, viewing and retrieving published, peer-reviewed, quantitative models of biochemical and cellular systems. As a storage format, BioModels Database uses the Systems Biology Markup Language (SBML), but also allows submission and export of models in various other commonly used formats. To offer scientists reliable information, models are curated to comply with the MIRIAM (Minimal Information Requested In the Annotation of biochemical Models) standard. This curation process involves verification of the model structure, the parameter and variable values and its mathematical relations. Furthermore, reproduction of results in the reference publication is checked. The different elements of the models are extensively annotated with references to controlled vocabularies and links to other databases, to allow for identification and search. Those references and links are provided in the exported SBML files as a URN (Uniform Resource Name), identifying the data-type and the data-set, and a qualifier, indicating the relation between the element and the referenced data-set. The URNs follow the MIRIAM scheme and are resolved, for instance to URLs, using the Web Services of MIRIAM Resources (http://www.ebi.ac.uk/miriam/).</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3124.1</guid>
      <pubDate>Wed, 22 Apr 2009 12:58:44 UTC</pubDate>
      <dc:title>Curation and annotation for BioModels Database, a resource of published quantitative kinetic models</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3124.1</dc:identifier>
      <dc:date>2009-05-06</dc:date>
      <dc:creator>Lukas Endler</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-04-22T12:58:44Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3124/version/1/files/npre20093124-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Data, Databases, and Communities</title>
      <link>http://dx.doi.org/10.1038/npre.2009.2884.1</link>
      <description>Overview: publishing landscape and challenges; Nature gateways and databases; NCI-Nature Pathway Interaction Database; a few other communication tools.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.2884.1</guid>
      <pubDate>Fri, 20 Feb 2009 16:58:29 UTC</pubDate>
      <dc:title>Data, Databases, and Communities</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.2884.1</dc:identifier>
      <dc:date>2009-02-20</dc:date>
      <dc:creator>Matthew Day</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-02-20T16:58:29Z</prism:publicationDate>
      <prism:category>Presentation</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/2884/version/1/files/npre20092884-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Identifying Data Sharing in Biomedical Literature</title>
      <link>http://precedings.nature.com/documents/1721/version/2</link>
      <description>Many policies and projects now encourage investigators to share their raw research data with other scientists. Unfortunately, it is difficult to measure the effectiveness of these initiatives because data can be shared in such a variety of mechanisms and locations. We propose a novel approach to find shared datasets: using NLP techniques to identify declarations of dataset sharing within the full text of primary research articles. Using regular expression patterns and machine learning algorithms on open access biomedical literature, our system was able to identify 61% of articles with shared datasets with 80% precision. A simpler version of our classifier achieved higher recall (86%), though lower precision (49%). We believe our results demonstrate the feasibility of this approach and hope to inspire further study of dataset retrieval techniques and policy evaluation.</description>
      <guid>http://precedings.nature.com/documents/1721/version/2</guid>
      <pubDate>Mon, 04 Aug 2008 20:32:00 UTC</pubDate>
      <dc:title>Identifying Data Sharing in Biomedical Literature</dc:title>
      <dc:identifier>hdl:10101/npre.2008.1721.2</dc:identifier>
      <dc:date>2008-08-04</dc:date>
      <dc:creator>Heather Piwowar</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2008-08-04T20:32:00Z</prism:publicationDate>
      <prism:category>Manuscript</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/1721/version/2/files/npre20081721-2.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Envisioning a data reuse registry</title>
      <link>http://dx.doi.org/10.1038/npre.2008.2152.1</link>
      <description>Repurposing research data holds many benefits for the advancement of biomedicine, yet is very difficult to measure and evaluate. We propose a data reuse registry to maintain links between primary research datasets and studies that reuse this data. Such a resource could help recognize investigators whose work is reused, illuminate aspects of reusability, and evaluate policies designed to encourage data sharing and reuse.</description>
      <guid>http://dx.doi.org/10.1038/npre.2008.2152.1</guid>
      <pubDate>Mon, 04 Aug 2008 20:13:00 UTC</pubDate>
      <dc:title>Envisioning a data reuse registry</dc:title>
      <dc:identifier>doi:10.1038/npre.2008.2152.1</dc:identifier>
      <dc:date>2008-08-04</dc:date>
      <dc:creator>Heather A. Piwowar</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2008-08-04T20:13:00Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/2152/version/1/files/npre20082152-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
  </channel>
</rss>
