<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/" version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Nature Precedings - Tag feed for database curation</title>
    <link>http://precedings.nature.com/tags/database%20curation</link>
    <description>Recently posted documents tagged with 'database curation'</description>
    <dc:publisher>Nature Publishing Group</dc:publisher>
    <dc:language>en</dc:language>
    <prism:publicationName>Nature Precedings</prism:publicationName>
    <image>
      <title>Nature Precedings</title>
      <url>http://precedings.nature.com/images/header_logo.gif</url>
      <link>http://precedings.nature.com</link>
    </image>
    <atom:link type="application/rss+xml" rel="self" href="http://precedings.nature.com/tags/database%20curation/feed"/>
    <item>
      <title>H-InvDB release 6, a comprehensive annotation resource for human genes and transcripts</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3251.1</link>
      <description>H-Invitational Database (H-InvDB; http://www.h-invitational.jp/) is an integrated database of human genes and transcripts. Through extensive analyses of all human transcripts, we provide curated annotations of human genes and transcripts that include gene structures, alternative splicing isoforms, non-coding functional RNAs, protein functions, functional domains, sub-cellular localizations, metabolic pathways, protein 3D structures, genetic polymorphisms, relations with diseases, gene expression profiles, molecular evolutionary features, protein-protein interactions (PPIs) and gene families/groups. The latest release of H-InvDB (release 6.0) provides annotations for 219,765 human transcripts in 43,159 human gene clusters based on human full-length cDNAs (FLcDNAs) and mRNAs. H-InvDB consists of two main views, the Transcript view and the Locus view, and six auxiliary databases with web-based viewers: G-integra, H-ANGEL, DiseaseInfo Viewer, Evola, PPI view and Gene Family/Group view. We also provide several data mining tools, such as &#8220;Navi search&#8221;, which consists of 16 search categories, each of which includes items for specifying the search condition (http://www.h-invitational.jp/hinv/c-search/hinvNaviTop.jsp), and &#8220;PANDA&#8221;, the Priority ANalysis for Disease Association system (http://www.h-invitational.jp/panda/app). H-InvDB now also provides SOAP and REST web service APIs for using H-InvDB data in programs (http://www.h-invitational.jp/hinv/hws/doc/).</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3251.1</guid>
      <pubDate>Thu, 14 May 2009 21:30:10 UTC</pubDate>
      <dc:title>H-InvDB release 6, a comprehensive annotation resource for human genes and transcripts</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3251.1</dc:identifier>
      <dc:date>2009-05-14</dc:date>
      <dc:creator>Chisato Yamasaki</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-05-14T21:30:10Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <prism:section>Evolutionary Biology</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3251/version/1/files/npre20093251-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Literature Triage and Indexing in the Mouse Genome Informatics (MGI) Group</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3246.1</link>
      <description>The Mouse Genome Informatics (MGI; http://www.informatics.jax.org) group comprises several collaborating projects, including the Mouse Genome Database (MGD) Project, the Gene Expression Database (GXD) Project, the Mouse Tumor Biology (MTB) Database Project, and the Gene Ontology (GO) Project. Literature identification and collection are performed cooperatively among the groups. In recent years, many institutional libraries have shifted from a focus largely on print holdings to one on electronic access to journals. This change has required the MGI curatorial group to adapt: whereas most journals covered by the group used to be surveyed in paper form, they are now surveyed electronically. Approximately 160 journals have been identified as the most relevant to the various database groups. Each curator in the group is responsible for scanning several journals for articles relevant to any of the database projects. Articles chosen via this process are marked according to their potential significance for the various projects. Each article is catalogued in a Master Bibliography section of the MGI database system and annotated to the database sections for which it has been identified as relevant. A secondary triage process allows curators from each group to scan the chosen articles and mark those relevant to their project if such annotation was missed on the initial scan. Once articles have been identified for each database project, a variety of processes are implemented to further categorize and index data from those articles. For example, the Alleles and Phenotype section of the MGD database indexes each article marked for MGD, and in this indexing process curators identify each mouse gene and allele examined in the article. The GXD indexing process has a different focus: articles are indexed with regard to the developmental stage used in the study as well as the assay technique used.
In each case, the indexing gives an overview of the data held in the article and assists in the more extensive curation performed in the following step of the curation process. Indexing also provides each group with valuable information used to prioritize and streamline the overall curation process. The MGI projects are supported by NHGRI grants HG000330, HG00273, and HG003622, NICHD grant HD033745, and NCI grant CA089713.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3246.1</guid>
      <pubDate>Thu, 14 May 2009 21:29:18 UTC</pubDate>
      <dc:title>Literature Triage and Indexing in the Mouse Genome Informatics (MGI) Group</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3246.1</dc:identifier>
      <dc:date>2009-05-14</dc:date>
      <dc:creator>Debra M. Krupke</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-05-14T21:29:18Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Cancer</prism:section>
      <prism:section>Developmental Biology</prism:section>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Immunology</prism:section>
      <prism:section>Molecular Cell Biology</prism:section>
      <prism:section>Neuroscience</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3246/version/1/files/npre20093246-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Reactome &#8211; a knowledgebase of human biological pathways</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3200.1</link>
      <description>Reactome (http://www.reactome.org) is an expert-authored, peer-reviewed knowledgebase of human reactions and pathways that functions as a data mining resource and electronic textbook. Its current release includes 2921 human proteins, 2871 reactions and 4167 literature citations. This curated dataset is integrated with a functional interaction network assembled computationally from non-curated sources of information, including protein-protein interactions, gene co-expression, and Gene Ontology annotations, providing access. A new entity-level pathway viewer and improved search and data mining tools facilitate searching and visualizing pathway data and analyzing user-supplied high-throughput data sets. Reactome has increased its utility to the model organism communities through improved orthology prediction methods, which allow pathway inference for 22 species, and through collaborations to create manually curated Reactome pathway datasets for species including Arabidopsis, Oryza sativa (rice), Drosophila and Gallus gallus (chicken). Reactome&#8217;s data content and software can all be freely used and redistributed under open source terms. Reactome instances are cross-referenced to corresponding entries in databases including EntrezGene, OMIM, Ensembl, UniProt, the UCSC Genome Browser, KEGG, ChEBI, and Gene Ontology.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3200.1</guid>
      <pubDate>Sun, 03 May 2009 15:44:21 UTC</pubDate>
      <dc:title>Reactome &#8211; a knowledgebase of human biological pathways</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3200.1</dc:identifier>
      <dc:date>2009-05-07</dc:date>
      <dc:creator>Bijay Jassal</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-05-03T15:44:21Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Chemistry</prism:section>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Immunology</prism:section>
      <prism:section>Molecular Cell Biology</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3200/version/1/files/npre20093200-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Curation of NISEED, an integrative framework for the digital representation of embryonic development</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3168.1</link>
      <description>NISEED (Network for In situ Expression and Embryological Data) is a generic infrastructure for the creation, maintenance and integration of molecular and anatomical information on model organisms. We applied it to ascidians, which are marine invertebrate chordates. These animals are model organisms of choice for developmental biology because their embryos develop with a small number of cells and an invariant lineage, allowing them to be studied at cellular resolution. In ANISEED (Ascidian NISEED), ascidian embryogenesis is represented at the level of the genome, via functional gene annotations, cis-regulatory elements and gene expression data; at the level of the cell, by representing its morphology, fates, lineage, and relations with its neighbors; and at the level of the whole embryo, by representing its anatomy and morphogenesis at successive developmental stages. The system also provides tools and standards to enter, annotate, curate and manage data. All results can be accessed through the ANISEED website at http://aniseed-ibdm.univ-mrs.fr</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3168.1</guid>
      <pubDate>Fri, 24 Apr 2009 16:14:03 UTC</pubDate>
      <dc:title>Curation of NISEED, an integrative framework for the digital representation of embryonic development</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3168.1</dc:identifier>
      <dc:date>2009-04-24</dc:date>
      <dc:creator>Delphine Dauga</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-04-24T16:14:03Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Developmental Biology</prism:section>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3168/version/1/files/npre20093168-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>Text mining for Swiss-Prot curation: A story of success and failure</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3166.1</link>
      <description>A text mining group has been set up at the Swiss Institute of Bioinformatics with the objective of developing and adapting information retrieval and extraction tools to help Swiss-Prot curators in their daily annotation work. After more than 7 years of activity, this group has gathered a significant amount of experience about the needs for text mining in biocuration. Our first observation is that there is no &#8220;in-a-box&#8221; solution that can satisfy every need. Each curator has his or her own strategy for finding information in the literature, and none of the existing information retrieval systems can compete with it, more for reasons of habit than of performance. Our second observation is that, to be fully operational, an information retrieval system should be embedded in the annotation platform. For instance, it should be possible to copy and paste information, such as the article reference or some relevant sentences, directly into the database format. Most of the existing online programs can hardly be adapted for this task, and their use usually results in additional editing effort for the curators. It follows that integrating text mining services is usually more costly than expected, since wrappers and user interfaces require significant development that is sometimes fairly user-specific. After noticing these problems in the design and use of a generic information retrieval system for the Swiss-Prot curators, we focused our effort on text mining applications for database updates. Following the literature is essential to database maintenance, and there is a need for automatic information extraction tools covering a wide range of topics.
We developed several information extraction (IE) applications in the following fields: PTM information (phosphorylation, glycosylation, disulfide bridges); subcellular localization; variant/mutation detection and characterization; new sequences with enzymatic activities; and new characterization of enzymes. These tools are integrated into pipelines that monitor the daily PubMed output and generate lists of selected abstracts with the relevant sentences highlighted. These procedures run independently of the usual annotation workflow, so that curators can mine the preselected data whenever they work on database entry updates. To conclude, after discussion with the curators, we have identified major challenges for text mining services. One of them is the detection of novel information, especially information related to a new function or a new characterization of a protein or one of its close homologues. We are currently working on this task in the framework of the collaborative project &#8220;EAGL&#8221;. Another challenge is the large-scale screening of newly published full-text papers to complement the often incomplete information in abstracts. This is becoming increasingly indispensable, not so much for annotating widely studied &#8220;hot&#8221; proteins as for finding new data on uncharacterized ones. For instance, when no gene name has been attributed to a sequence, the only way to retrieve information is to use the ORF names, which are never provided in abstracts. Finally, we stress that many of these information retrieval and extraction tasks could be greatly simplified by requiring metadata at article submission time, such as an official HGNC gene name or a UniProt reference.</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3166.1</guid>
      <pubDate>Fri, 24 Apr 2009 14:31:04 UTC</pubDate>
      <dc:title>Text mining for Swiss-Prot curation: A story of success and failure</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3166.1</dc:identifier>
      <dc:date>2009-04-24</dc:date>
      <dc:creator>Anne-Lise Veuthey</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-04-24T14:31:04Z</prism:publicationDate>
      <prism:category>Presentation</prism:category>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3166/version/1/files/npre20093166-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
    <item>
      <title>IMGT/GENE-DB: genomic reference sequences for human and mouse IG and TR genes and alleles</title>
      <link>http://dx.doi.org/10.1038/npre.2009.3158.1</link>
      <description>The immunoglobulin (IG) and T cell receptor (TR) major loci span about 6 megabases (Mb) of the human genome on chromosomes 2, 7, 14 and 22, and 9 Mb of the mouse genome on chromosomes 6, 12, 13, 14 and 16. There are seven major loci: three IG loci (IGH, IGK, IGL) and four TR loci (TRA, TRB, TRG, TRD), each with a distinct distribution of variable (V), diversity (D), joining (J) and constant (C) genes. Depending on the haplotype, the human genome comprises a total of 608-665 IG and TR genes (371-422 IG and 237-243 TR) per haploid genome [1, 2], of which 531-588 genes are located in the major loci (distributed in 369-418 V, 32 D, 105-109 J and 25-29 C genes). There are also 77 orphons (68 IG and 9 TR), including two processed IG genes, located outside the major loci. The number of functional IG and TR genes is 308-356 (136-171 IG and 172-185 TR) per haploid genome. The mouse genome comprises approximately 876 IG and TR genes (624 IG and 252 TR). All these genomic data are available in the IMGT&#174; gene database, IMGT/GENE-DB [3]. The major contribution of IMGT/GENE-DB has been to establish, for the first time, a standardized nomenclature of the IG and TR genes and alleles of humans and other vertebrates. As of April 2009, IMGT/GENE-DB manages 1999 genes and 3026 alleles. [1] Lefranc M.-P. and Lefranc G., The Immunoglobulin FactsBook, Academic Press, London, 458 pages (2001). [2] Lefranc M.-P. and Lefranc G., The T cell receptor FactsBook, Academic Press, London, 398 pages (2001). [3] Giudicelli V. et al., Nucleic Acids Res., 33, D256-261 (2005).</description>
      <guid>http://dx.doi.org/10.1038/npre.2009.3158.1</guid>
      <pubDate>Thu, 23 Apr 2009 17:57:23 UTC</pubDate>
      <dc:title>IMGT/GENE-DB: genomic reference sequences for human and mouse IG and TR genes and alleles</dc:title>
      <dc:identifier>doi:10.1038/npre.2009.3158.1</dc:identifier>
      <dc:date>2009-04-23</dc:date>
      <dc:creator>Fatena Bellahcene</dc:creator>
      <prism:publicationName>Nature Precedings</prism:publicationName>
      <prism:publicationDate>2009-04-23T17:57:23Z</prism:publicationDate>
      <prism:category>Poster</prism:category>
      <prism:section>Genetics &amp; Genomics</prism:section>
      <prism:section>Immunology</prism:section>
      <prism:section>Bioinformatics</prism:section>
      <media:thumbnail url="http://precedings.nature.com/documents/3158/version/1/files/npre20093158-1.pdf.thumb.png"/>
      <creativeCommons:license>http://creativecommons.org/licenses/by/3.0/</creativeCommons:license>
    </item>
  </channel>
</rss>
