The Genomic Threading Database
Liam J. McGuffin & David T. Jones
Description


The Genomic Threading Database (GTD) contains structural annotations of proteomes, translated from the genomes of key organisms. Annotations are made using a modified version of our recently developed GenTHREADER software.

The GTD is part of the e-Protein project.

Number of annotated genomes: 261

Number of annotated sequences: 1,219,063

Number of aligned residues: 265,673,588


For queries regarding the GTD: l.mcguffin@cs.ucl.ac.uk



Please cite the following references:

The GTD
  • McGuffin, L. J., Bryson, K., Street, S. A., Sorensen, S. A. & Jones, D. T. (2004) The Genomic Threading Database: a comprehensive resource for structural annotations of the genomes from key organisms. Nucleic Acids Res., 32, D196-D199.
  • McGuffin, L. J., Street, S. A., Sorensen, S. A. & Jones, D. T. (2004) The Genomic Threading Database. Bioinformatics, 20, 131-132.

GenTHREADER
  • McGuffin, L. J. & Jones, D. T. (2003) Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics, 19, 874-881.
  • Jones, D. T. (1999) GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol., 287, 797-815.


News:
  • 05/12/05: JYDE system used to carry out multi-site deep fold recognition on the Human proteome in under 24 hours. Over 500 CPUs were used from three independent domains. Results have been uploaded to the GTD and are ready for comparison with other databases.
  • 11/11/05: The latest version of the Human proteome (NCBI35.nov version from ENSEMBL) has been successfully annotated using the JYDE system incorporating clusters from Imperial and UCL. The latest profile-profile version of mGenTHREADER was used for the annotation to provide high-quality sequence-to-structure alignments.
  • 24/08/05: Tetraodon annotation uploaded.
  • 18/08/05: UniProt PDAS services (http://bioinf.cs.ucl.ac.uk:8000/servlet/pdas.pdasServlet2/das) have been combined into one manageable datasource, which has now been validated and registered with the DAS registry (http://das.sanger.ac.uk/registry/).
  • 16/08/05: Chimp, dog, bee and chicken annotations added to the DSN lists of PDAS servlets. PDAS searches have also been improved and should be about 10 times faster.
  • 28/07/05: Chicken annotation uploaded.
  • 05/07/05: Chimp annotation uploaded.
  • 27/06/05: Annotations of a further 16 bacterial genomes have been uploaded. More soon...
  • 14/06/05: Annotations of a further 19 bacterial genomes have been uploaded. More soon...
  • 04/03/05: Due to popular demand, machine-readable output is now available through "Keyword search". Tick the "Machine readable" checkbox in "Output Options".
  • 28/01/05: Annotations of a further 12 bacterial genomes have been uploaded. More soon...
  • 28/01/05: An alternative Protein DAS servlet which uses UniProt Accession IDs has been implemented (http://bioinf.cs.ucl.ac.uk:8000/servlet/pdas.pdasServlet2/das)
  • 05/10/04: Annotations of a further 6 bacterial genomes have been uploaded.
  • 01/10/04: Annotations of a further 26 bacterial genomes have been uploaded. More on the way soon...
  • 17/08/04: Protein DAS servlet integrated into the GTD (http://bioinf.cs.ucl.ac.uk:8000/servlet/pdas.pdasServlet/das) - annotations for human, rat, mouse, mosquito and zebrafish are now viewable through ENSEMBL.
  • 29/07/04: BLAST search of sequences within GTD implemented.
  • 12/03/04: Annotations for the following organisms have been updated: Drosophila, budding yeast, fission yeast, worm and rice.
  • 12/03/04: Danio rerio and new version of Human annotation uploaded.
  • 11/03/04: Rattus norvegicus annotation uploaded.
  • 08/03/04: Annotations of a further 12 bacterial genomes have been uploaded.
  • 10/02/04: Guillardia theta annotation uploaded.
  • 10/02/04: Plasmodium yoelii annotation uploaded.
  • 09/02/04: Neurospora crassa annotation uploaded.
  • 05/02/04: Caenorhabditis briggsae annotation uploaded.
  • 04/02/04: Ciona intestinalis annotation uploaded.
  • 03/02/04: Annotations of a further 22 bacterial genomes have been uploaded.
  • 23/01/04: Arabidopsis thaliana annotation uploaded.
  • 21/01/04: Annotations of 18 new bacterial genomes have been uploaded and 25 have been updated. Please check the new summary pages for dates when annotations have been carried out.
  • 20/01/04: The whole database has now been transferred to a new faster server.
  • 09/01/04: Downloadable, machine readable GTD lists are now available - click on the link above.
  • 24/11/03: mGenTHREADER test runs are now complete for both Plasmodium falciparum and Bacillus anthracis. Annotations have been uploaded for testing.
  • 10/09/03: Models are now generated from alignments - see help page for further details.
  • 31/07/03: Summary view now includes fold frequencies.
  • 09/07/03: Updated summary view.
  • 17/06/03: Recalculation of p-values and reassignment of confidence categories.
  • 15/05/03: Annotations of 100 bacterial proteomes have been uploaded.
  • 10/03/03: Budding yeast annotation uploaded.
  • 06/03/03: Fission yeast annotation uploaded.
  • 03/03/03: Puffer fish annotation uploaded.
  • 25/02/03: Worm, fruitfly and rice annotations have now been uploaded.
  • Feb 2003: New site now online. Human, mouse and mosquito annotations have been uploaded for testing.

Overview of GenTHREADER, e-Protein and the GTD:

GenTHREADER is a fast and powerful protein fold recognition method, which can be applied to either whole, translated genomic sequences (proteomes), as in the case of the GTD, or individual protein sequences, as in the case of the PSIPRED server.

The GenTHREADER protocol has recently been updated so that it now makes use of FSSP structural alignment profiles, PSI-BLAST profiles and predicted secondary structure (using PSIPRED). This both increases the sensitivity of the method and enhances the accuracy of alignments, but it also makes the updated protocol slower than the standard GenTHREADER method.

However, together with colleagues in computer science, we are prototyping grid technology to speed up the computationally intensive process of reliably annotating whole proteomes.

We are also working in conjunction with Imperial College and the European Bioinformatics Institute as part of the e-Protein project. This project aims to develop a distributed pipeline for proteome annotation using grid middleware and in-house software developed at each site. The resulting annotations will be deposited in MySQL-based relational databases, such as the GTD, hosted at each site. These databases will eventually be connected through a single interface using DAS technology.
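As a rough illustration of how a client might address a DAS source, the sketch below builds request URLs following the general DAS 1.x URL pattern (a `dsn` command that lists data sources, and a `features` command that takes a `segment` parameter). The base URL is the UniProt PDAS servlet mentioned in the news items; the data source name `example_source` and the accession `P12345` are hypothetical placeholders, not names confirmed for the GTD, and the server may no longer be reachable. The code only constructs URLs and makes no network requests.

```python
from urllib.parse import quote, urlencode

# Base URL quoted from the news items; assumed here, and possibly no longer live.
PDAS_BASE = "http://bioinf.cs.ucl.ac.uk:8000/servlet/pdas.pdasServlet2/das"


def das_dsn_url(base=PDAS_BASE):
    """Build the URL for the DAS 'dsn' command, which lists a server's data sources."""
    return f"{base.rstrip('/')}/dsn"


def das_features_url(source, segment, start=None, stop=None, base=PDAS_BASE):
    """Build the URL for the DAS 'features' command on one segment.

    'source' is a data source name (hypothetical in the example below) and
    'segment' is a reference ID, e.g. a UniProt accession for a protein DAS.
    """
    seg = segment
    if start is not None and stop is not None:
        # DAS expresses a sub-range as ID:start,stop
        seg = f"{segment}:{start},{stop}"
    # Keep ':' and ',' unescaped so the range stays readable in the URL.
    query = urlencode({"segment": seg}, safe=":,")
    return f"{base.rstrip('/')}/{quote(source)}/features?{query}"


if __name__ == "__main__":
    print(das_dsn_url())
    # 'example_source' and 'P12345' are placeholders for illustration only.
    print(das_features_url("example_source", "P12345", 1, 100))
```

A client would fetch the `dsn` URL first to discover the real data source names, then substitute one of them for the placeholder before requesting features.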

NOTE that the GTD is freely accessible to academic and non-profitmaking sites ONLY. Commercial users wishing to access the database should contact us for further information.

