GENCODE: Creating a Validated Manually Annotated Geneset for the Whole Human Genome
- Wellcome Trust Sanger Institute
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz
- MIT Computer Science and Artificial Intelligence Laboratory, Broad Institute of MIT and Harvard
- Spanish National Cancer Research Centre (CNIO)
- Laboratory for Computational Genomics and Department of Computer Science, Washington University, St. Louis
- Department of Molecular Biophysics and Biochemistry, Yale University
- Centre for Genomic Regulation, Barcelona, Spain
- Document Type:
- Poster
- Date:
- Received 23 April 2009 10:51 UTC; Posted 23 April 2009
- Subjects:
- Genetics & Genomics, Bioinformatics
- Abstract:
The Human and Vertebrate Analysis and Annotation (HAVANA) group at the Wellcome Trust Sanger Institute produced the manually annotated geneset for the Encyclopedia of DNA Elements (ENCODE) pilot project and, as part of the Gencode subgroup, is reprising this role in the scale-up to cover the whole human genome. Our manual annotation is checked computationally and validated experimentally. Loci and transcripts that may be missing from the initial annotation are identified by comparing it with the output of a number of state-of-the-art algorithms for identifying exons, splice sites, transcripts and pseudogenes. Where novel features are confirmed, the annotation is updated. The coding potential of annotated coding transcripts is assessed by investigating patterns of conservation within the coding sequence (CDS) and by comparing predicted secondary structures of annotated CDSs to similar proteins with solved structures. Annotated coding transcripts are also compared against the current set of human Consensus CDSs (CCDS) to confirm agreement with the other participating centres (EBI, NCBI and UCSC).
An initial round of annotation and analysis of chromosomes 21 and 22 has shown that while HAVANA annotation is both comprehensive and robust, it has benefitted from computational review. Thirteen novel non-coding loci, 27 novel splice variants and 6 extensions to existing variants were identified, many of which were found using supporting EST/mRNA sequences that were not available at the time of initial annotation. Fewer than 10 annotated CDSs required reclassification, no CCDS sequences required updating, and 26 novel pseudogenes were added. The annotation of human chromosome 2 is complete and we are currently annotating chromosomes 3 and 7. Data from all members of Gencode are distributed via DAS and are now visible in our Zmap annotation interface, allowing computational predictions to be assessed at the same time as first-pass gene annotation.
- Collection:
- 3rd International Biocuration Conference
- Presented at:
- 3rd International Biocuration Conference, 16 April 2009
Additional information
- License:
- This document is licensed to the public under the Creative Commons Attribution 3.0 License
- How to cite this document:
-
Bignell, A., Frankish, A., Aken, B., Diekhans, M., Kokocinski, F., Lin, M., Tress, M., Van Baren, J., Barnes, I., Hunt, T., Carvalho-Silva, D., Davidson, C., Donaldson, S., Gilbert, J., Hart, E., Kay, M., Kinsella, R., Lloyd, D., Loveland, J., Mudge, J., Snow, C., Vamathevan, J., Wilming, L., Brent, M., Gerstein, M., Guigó, R., Harte, R., Kellis, M., Searle, S., Harrow, J., and Hubbard, T. GENCODE: Creating a Validated Manually Annotated Geneset for the Whole Human Genome. Available from Nature Precedings <http://dx.doi.org/10.1038/npre.2009.3155.1> (2009)
- Version info:
-
Other versions of this document in Nature Precedings
None.
Other versions of this document elsewhere on the web
None known.