doi:10.1038/npre.2009.3155.1

GENCODE: Creating a Validated Manually Annotated Geneset for the Whole Human Genome

A. Bignell (1), A. Frankish (1), B. Aken (1), M. Diekhans (2), F. Kokocinski (1), M. Lin (3), M. Tress (4), J. Van Baren (5), I. Barnes (1), T. Hunt (1), D. Carvalho-Silva (1), C. Davidson (1), S. Donaldson (1), J. Gilbert (1), E. Hart (1), M. Kay (1), R. Kinsella (1), D. Lloyd (1), J. Loveland (1), J. E. Mudge (1), C. Snow (1), J. Vamathevan (1), L. Wilming (1), M. Brent (5), M. Gerstein (6), R. Guigó (7), R. Harte (2), M. Kellis (3), S. Searle (1), J. Harrow (1) & T. Hubbard (1)

  1. Wellcome Trust Sanger Institute
  2. Center for Biomolecular Science and Engineering, University of California, Santa Cruz
  3. MIT Computer Science and Artificial Intelligence Laboratory, Broad Institute of MIT and Harvard
  4. Spanish National Cancer Research Centre (CNIO)
  5. Laboratory for Computational Genomics and Department of Computer Science, Washington University, St. Louis
  6. Department of Molecular Biophysics and Biochemistry, Yale University
  7. Centre for Genomic Regulation, Barcelona, Spain

Document Type: Poster
Date: Received 23 April 2009 10:51 UTC; Posted 23 April 2009
Subjects: Genetics & Genomics, Bioinformatics

Abstract:

The Human and Vertebrate Analysis and Annotation (HAVANA) group at the Wellcome Trust Sanger Institute produced the manually annotated geneset for the Encyclopedia of DNA Elements (ENCODE) pilot project and, as part of the GENCODE subgroup, is reprising this role in the scale-up to cover the whole human genome. Our manual annotation is checked computationally and validated experimentally. Loci and transcripts missing from the initial annotation are identified by comparison with the predictions of a number of state-of-the-art algorithms for identifying exons, splice sites, transcripts and pseudogenes. Where novel features are confirmed, the annotation is updated. The coding potential of annotated coding transcripts is assessed by investigating patterns of conservation within the coding sequence (CDS) and by comparing predicted secondary structures of annotated CDSs to similar proteins with solved structures. Annotated coding transcripts are also compared with the current set of human Consensus CDSs (CCDS) to confirm agreement with the other participating centres (EBI, NCBI and UCSC).

An initial round of annotation and analysis of chromosomes 21 and 22 has shown that while HAVANA annotation is both comprehensive and robust, it has benefitted from computational review. The review identified 13 novel non-coding loci, 27 novel splice variants and 6 extensions to existing variants, many of which were found using supporting EST/mRNA sequences that were not available at the time of initial annotation. Fewer than 10 annotated CDSs required reclassification, no CCDS sequences required updating and 26 novel pseudogenes were added. The annotation of human chromosome 2 is complete and we are currently annotating chromosomes 3 and 7. Data from all members of GENCODE is distributed via DAS and is now visible in our Zmap annotation interface, allowing computational predictions to be assessed at the same time as first-pass gene annotation.
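
As an illustration only, the computational review step described in the abstract (comparing predicted transcripts against the existing manual annotation to flag candidate novel loci and splice variants) might be sketched as follows. This is not the GENCODE pipeline: the `Transcript` class, the `classify` function and the classification labels are hypothetical, and the sketch assumes that two transcripts are the same variant when their intron chains are identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript model used only for this sketch."""
    chrom: str
    strand: str
    exons: tuple  # sorted, non-overlapping (start, end) pairs, 1-based inclusive

    @property
    def span(self):
        # Genomic extent from the first exon start to the last exon end.
        return self.exons[0][0], self.exons[-1][1]

    @property
    def introns(self):
        # Intron chain: the gaps between consecutive exons.
        return tuple(
            (left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(self.exons, self.exons[1:])
        )


def overlaps(a, b):
    """True if two transcripts share a chromosome, strand and genomic span."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return False
    (a_start, a_end), (b_start, b_end) = a.span, b.span
    return a_start <= b_end and b_start <= a_end


def classify(prediction, annotated):
    """Label a predicted transcript relative to the manual annotation.

    Assumption: an identical intron chain means the variant is already
    annotated. Flagged candidates would still need manual confirmation.
    """
    overlapping = [t for t in annotated if overlaps(prediction, t)]
    if not overlapping:
        return "candidate novel locus"
    if any(t.introns == prediction.introns for t in overlapping):
        return "already annotated"
    return "candidate novel splice variant"
```

A prediction that overlaps no annotated transcript on the same strand is reported as a candidate novel locus, while one that overlaps an annotated transcript but has a different intron chain is reported as a candidate novel splice variant. Single-exon transcripts have an empty intron chain, so this simple rule cannot distinguish exon extensions; a fuller comparison would also need to examine exon boundaries.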

Collection: 3rd International Biocuration Conference
Presented at: 3rd International Biocuration Conference, 16 April 2009

Additional information

License: This document is licensed to the public under the Creative Commons Attribution 3.0 License
How to cite this document:

Bignell, A., Frankish, A., Aken, B., Diekhans, M., Kokocinski, F., Lin, M., Tress, M., Van Baren, J., Barnes, I., Hunt, T., Carvalho-Silva, D., Davidson, C., Donaldson, S., Gilbert, J., Hart, E., Kay, M., Kinsella, R., Lloyd, D., Loveland, J., Mudge, J., Snow, C., Vamathevan, J., Wilming, L., Brent, M., Gerstein, M., Guigó, R., Harte, R., Kellis, M., Searle, S., Harrow, J., and Hubbard, T. GENCODE: Creating a Validated Manually Annotated Geneset for the Whole Human Genome. Available from Nature Precedings <http://dx.doi.org/10.1038/npre.2009.3155.1> (2009)
