Overview of Prediction Methods

PSIPRED: Secondary Structure Prediction

PSIPRED is a simple and accurate secondary structure prediction method. It uses two feed-forward neural networks to analyse the output of PSI-BLAST (Position-Specific Iterated BLAST). Evaluated with a very stringent cross-validation procedure, PSIPRED 3.2 achieves an average Q3 score of 81.6%.

Predictions produced by PSIPRED were also submitted to the CASP4 evaluation and assessed at the CASP4 meeting, held in December 2000 at Asilomar. PSIPRED 2.0 achieved an average Q3 score of 80.6% across all 40 submitted target domains with no obvious sequence similarity to structures in the PDB, which ranked it first out of 20 evaluated methods (an earlier version of PSIPRED was also ranked first at CASP3 in 1998). It is important to realise, however, that because of the small sample sizes the CASP results are not statistically significant, although they do give a rough guide to the current "state of the art". For a more reliable assessment, the EVA web site at Columbia University offers continuous evaluation, although at the time of writing it is no longer being updated.

Downloads: The PSIPRED V3.2 software can be downloaded here. Please read the license terms in the README file if you wish to incorporate PSIPRED into another program or web server. Older releases of PSIPRED can also be downloaded here.
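The Q3 score quoted above is the percentage of residues whose three-state secondary structure (helix H, strand E, coil C) is predicted correctly. The minimal sketch below computes Q3 for one chain and an "average Q3" as the mean of per-chain scores. It assumes the observed states have already been reduced to the H/E/C alphabet (for example from DSSP assignments) and that averaging is done per chain; the helper names `q3_score` and `average_q3` are illustrative and are not part of PSIPRED.

```python
from statistics import mean
from typing import Iterable, Tuple


def q3_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def average_q3(chains: Iterable[Tuple[str, str]]) -> float:
    """Mean of per-chain Q3 scores over (predicted, observed) pairs (assumed averaging scheme)."""
    return mean(q3_score(pred, obs) for pred, obs in chains)


print(q3_score("CHHHHHCCEE", "CHHHHCCCEC"))  # 80.0 (8 of 10 residues agree)
print(average_q3([("HHHH", "HHHH"), ("HHCC", "HHHH")]))  # 75.0
```

Pooling all residues from all chains before dividing would give a per-residue Q3 instead, which weights long chains more heavily.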

MEMSAT3: Transmembrane Topology Prediction

MEMSAT3 is a widely used method for predicting the topology of all-helical membrane proteins. It was benchmarked on a test set of transmembrane proteins of known topology and, using sequence data alone, was estimated to be over 78% accurate at predicting the topology of all-helical transmembrane proteins and the locations of their helical segments within the membrane. Academic users can download the MEMSAT3 code here.

MEMSATSVM: Transmembrane Helix Prediction

MEMSATSVM is a highly accurate predictor of transmembrane helix topology. It can discriminate signal peptides from transmembrane helices and identify the cytosolic and extracellular loops. Users can download MEMSATSVM here.

MEMPACK: Transmembrane Helix Contact Prediction

MEMPACK predicts how transmembrane helices pack together. It uses MEMSATSVM predictions as the basis for predicting likely interactions between helices. In the final step, it produces a helix packing that orients the helices so that the greatest number of predicted interactions face one another. Users can download MEMPACK here.

GenTHREADER: Fold Recognition

GenTHREADER is a fast and relatively powerful fold recognition method that can be applied either to whole translated genomic sequences (proteomes), as in the GTD (Genomic Threading Database), or to individual protein sequences, as on the PSIPRED server. It is not as sensitive as mGenTHREADER but is much faster.

pGenTHREADER: Fold Recognition

This is now our recommended method for fold recognition and for identifying distant homologues. It is based on the original GenTHREADER method but uses profile-profile alignments and secondary structure predicted by PSIPRED as inputs. This increases the sensitivity of the method and improves the accuracy of its alignments, but it also makes it much slower than standard GenTHREADER, because PSI-BLAST must be run on the target sequence before the search can begin.

pDomTHREADER: Domain Recognition

pDomTHREADER is an accurate and sensitive superfamily discrimination method that combines sequence and structure information to produce highly accurate domain alignments. It uses the same underlying threading algorithm as pGenTHREADER but aligns sequences to a library of domain templates rather than whole-chain templates. Because the templates are smaller regions of structure, different alignment features are needed for optimal scoring. The final prediction score is produced by an SVM trained on five feature inputs: template coverage, alignment score, template length, solvation potential and pairwise potential. Compared with other superfamily discrimination methods based on Hidden Markov Models and PSI-BLAST profile alignments, we found that pDomTHREADER gave higher coverage of the CATH S35 superfamilies. pDomTHREADER also produced more accurate alignments, which can be used to predict domain boundaries more reliably. For more information about the method, please consult the reference above. Note that pDomTHREADER is tuned for fine-grained superfamily discrimination; for fold recognition or structural annotation of very distant sequences, pGenTHREADER should be used.
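To illustrate how the five alignment features can be combined into one score, the sketch below assumes a linear SVM, whose decision value is simply w · x + b. The kernel, feature scaling and trained weights actually used by pDomTHREADER are not given here, so `WEIGHTS`, `BIAS` and the example feature values are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AlignmentFeatures:
    """The five inputs named in the text, assumed to be pre-scaled to comparable ranges."""
    template_coverage: float
    alignment_score: float
    template_length: float
    solvation_potential: float
    pairwise_potential: float

    def as_vector(self) -> List[float]:
        return [self.template_coverage, self.alignment_score, self.template_length,
                self.solvation_potential, self.pairwise_potential]


def linear_svm_score(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear SVM decision value w . x + b; larger means more likely the same superfamily."""
    if len(x) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# HYPOTHETICAL parameters for illustration only (not trained pDomTHREADER values).
# Negative weights on the potentials encode the assumption that lower energies are better.
WEIGHTS = [1.5, 2.0, -0.5, -1.0, -1.0]
BIAS = -1.0

hit = AlignmentFeatures(template_coverage=0.9, alignment_score=0.8, template_length=0.5,
                        solvation_potential=-0.6, pairwise_potential=-0.4)
print(linear_svm_score(hit.as_vector(), WEIGHTS, BIAS))  # approximately 2.7
```

Each candidate domain template would be scored this way and the hits ranked by decision value; a non-linear kernel would replace the dot product with a weighted sum of kernel evaluations against the support vectors.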

DomPred & DOMSSEA: Domain Boundary Prediction

DomPred is a protein structural domain boundary predictor. The DomPred server runs two independent domain boundary predictors, DomPred and DOMSSEA. DomPred first uses PSI-BLAST to search the query sequence against a database of Pfam-A domains; where no clear domain matches are found, it then searches the nrdb90 sequence database with PSI-BLAST. The final prediction is produced by analysing the locations of the N- and C-terminal boundaries of all the hits. DOMSSEA, by contrast, matches the predicted secondary structure pattern of the query sequence against a library of secondary structure patterns derived from SCOP domains.
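The final DomPred step can be pictured as looking for query positions where many hits start or stop. The sketch below is one simple way to do this and is not DomPred's actual algorithm: it collects the internal N- and C-terminal boundaries of all hits, groups boundaries lying within `window` residues of each other, and reports the lower median of every group supported by at least `min_support` boundaries. The hit ranges, `window` and `min_support` values are hypothetical.

```python
from statistics import median_low
from typing import List, Sequence, Tuple


def predict_boundaries(hits: Sequence[Tuple[int, int]], seq_len: int,
                       window: int = 10, min_support: int = 3) -> List[int]:
    """Predict domain boundaries from the ends of PSI-BLAST hits.

    hits: (start, end) query ranges, 1-based and inclusive, one per hit.
    A boundary is reported as the last residue of the N-terminal domain.
    """
    boundaries = []
    for start, end in hits:
        if start > 1:          # hit begins inside the query: boundary just before it
            boundaries.append(start - 1)
        if end < seq_len:      # hit ends inside the query: boundary at its last residue
            boundaries.append(end)
    boundaries.sort()

    # Group boundaries lying within `window` residues of the previous one.
    clusters: List[List[int]] = []
    for pos in boundaries:
        if clusters and pos - clusters[-1][-1] <= window:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])

    # Keep well-supported groups; report an observed position (the lower median).
    return [median_low(c) for c in clusters if len(c) >= min_support]


# Hypothetical hits on a 300-residue query with a boundary near residue 100.
hits = [(1, 98), (1, 102), (1, 100), (101, 300), (99, 300), (103, 250), (5, 60)]
print(predict_boundaries(hits, seq_len=300))  # [100]
```

This single-linkage grouping can chain neighbouring groups together when boundaries are dense, so a production method would need a more careful peak-finding scheme.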

DISOPRED3: Protein Intrinsic Disorder Prediction

DISOPRED3 is the latest release of our machine-learning approach to detecting intrinsically disordered regions. The original method was trained on evolutionarily conserved sequence features of disordered regions, defined as residues missing from high-resolution X-ray structures. DISOPRED2 used SVMs mainly to address the marked class imbalance between ordered and disordered amino acids, as well as the different sequence patterns associated with terminal and internal disordered regions. DISOPRED3 extends this architecture with two independent predictors of intrinsic disorder, a neural network and a nearest-neighbour classifier, which were trained to identify long intrinsically disordered regions using data from the PDB and DisProt databases. Their intermediate results are integrated by an additional neural network. DISOPRED3 was blindly tested during the ninth and tenth rounds of the worldwide CASP experiments, where it achieved very high specificity (about 99%) and, because few ordered residues were wrongly flagged as disordered, good precision (about 75%). The official assessment teams ranked DISOPRED3 at or near the top across a number of tests and evaluation measures.

To provide insight into the biological roles of proteins, DISOPRED3 also predicts protein binding sites within disordered regions, using an SVM that examines patterns of evolutionary sequence conservation, positional information and the amino acid composition of putative disordered regions. On a stringent test set, DISOPRED3's binding-site predictions improved on existing methods, achieving approximately 20% precision and 30% recall. These modest figures highlight the need for further work in this area.
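The link between specificity and precision depends on class imbalance: ordered residues vastly outnumber disordered ones, so even a small false-positive rate on ordered residues produces many false positives. The sketch below uses invented confusion-matrix counts (10,000 residues, 6% of them disordered), not CASP data, to show that 99% specificity is compatible with 75% precision at 47% recall. The function names are illustrative.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Per-residue metrics, treating 'disordered' as the positive class."""
    return {
        "specificity": tn / (tn + fp),  # ordered residues correctly left unflagged
        "precision": tp / (tp + fp),    # flagged residues that really are disordered
        "recall": tp / (tp + fn),       # disordered residues that were found
    }


def precision_from_rates(recall: float, specificity: float, prevalence: float) -> float:
    """Precision implied by recall, specificity and the fraction of disordered residues."""
    true_pos = recall * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Invented counts: 600 disordered and 9,400 ordered residues.
m = classification_metrics(tp=282, fp=94, tn=9306, fn=318)
print(m)  # specificity 0.99, precision 0.75, recall 0.47
print(precision_from_rates(recall=0.47, specificity=0.99, prevalence=0.06))  # approximately 0.75
```

With the same recall, lowering specificity to 98% doubles the false positives to 188 and drops precision to 282/470, i.e. 60%, which is why very high specificity matters so much for this task.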

FFPred 3: GO Term Prediction

FFPred 3 is a Support Vector Machine (SVM) based predictor of Gene Ontology (GO) term annotations. It is trained specifically to predict terms for human and other eukaryotic sequences, for cases where GO terms cannot be predicted by other means. A query sequence is first analysed by a large suite of protein physico-chemical property predictors covering features such as signal peptides, transmembrane helices, secondary structure and intrinsic disorder. These features are then used as inputs to a large set of SVM models, one for each GO term to be predicted. Finally, all high-scoring SVMs are collected in the FFPred tab of the results page. FFPred 3 is a more recent version of the tool described in the FFPred 2 publication (see here) and provides predictions for all three Gene Ontology domains (Biological Process, Molecular Function and Cellular Component).
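The "one model per GO term, then collect the high scorers" design can be sketched as below. The stand-in scoring functions are hypothetical linear formulas, not trained FFPred SVMs; the feature names, the 0 to 1 score range and the 0.5 threshold are assumptions. The GO identifiers are real terms (GO:0005576 extracellular region, GO:0016020 membrane, GO:0003677 DNA binding), but their mapping to these features is invented.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
Model = Callable[[Features], float]


def predict_go_terms(features: Features, models: Dict[str, Model],
                     threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Score every GO term with its own model; return terms at or above threshold, best first."""
    scored = [(go_id, model(features)) for go_id, model in models.items()]
    kept = [(go_id, score) for go_id, score in scored if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# HYPOTHETICAL per-term models standing in for trained SVMs (scores assumed in [0, 1]).
MODELS: Dict[str, Model] = {
    "GO:0005576": lambda f: 0.8 * f["signal_peptide"] + 0.1 * (1.0 - min(f["tm_helices"], 1.0)),
    "GO:0016020": lambda f: min(f["tm_helices"] / 2.0, 1.0),
    "GO:0003677": lambda f: 0.6 * f["disorder_fraction"] + 0.2 * f["coil_fraction"],
}

# Hypothetical outputs of the feature predictors for one query sequence.
query = {"signal_peptide": 0.9, "tm_helices": 0.0, "disorder_fraction": 0.1, "coil_fraction": 0.4}
for go_id, score in predict_go_terms(query, MODELS):
    print(go_id, round(score, 2))  # GO:0005576 0.82
```

Keeping one independent model per term means terms can be added or retrained without touching the others, at the cost of ignoring dependencies between related GO terms.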

BioSerf: Automated Homology Modelling

BioSerf is a fully automated homology modelling server. It runs three template selection methods: PSI-BLAST against the PDB FASTA sequences, pGenTHREADER, and HHblits against the PDB. The best-scoring matches that pass conservative cut-offs are then pooled. All-by-all TM-scores are calculated for this full set of putative templates, and the resulting score matrix is analysed to remove any outlying templates whose structures are too dissimilar from the rest. Finally, the 10 best of the remaining templates are selected and a model is built with MODELLER.
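The outlier-removal and selection steps can be sketched as follows. BioSerf's exact outlier criterion is not stated here, so this sketch assumes a simple rule: drop any template whose mean TM-score to the other candidates falls below `min_mean_tm` (0.5 here, a value commonly used as a rough same-fold threshold), then keep the `keep` survivors with the highest selection scores. Template names, selection scores and TM-scores are invented, and the MODELLER step is not shown.

```python
from typing import List, Sequence


def select_templates(names: Sequence[str], selection_scores: Sequence[float],
                     tm: Sequence[Sequence[float]], min_mean_tm: float = 0.5,
                     keep: int = 10) -> List[str]:
    """Remove structural outliers using an all-by-all TM-score matrix, then keep the best templates."""
    n = len(names)
    if len(selection_scores) != n or len(tm) != n or any(len(row) != n for row in tm):
        raise ValueError("names, scores and the TM-score matrix must describe the same templates")
    if n < 2:
        return list(names)[:keep]  # nothing to compare against
    survivors = []
    for i in range(n):
        # TM-score is normalised by one structure's length, so tm[i][j] and tm[j][i]
        # can differ; average the two directions before comparing templates.
        sims = [(tm[i][j] + tm[j][i]) / 2.0 for j in range(n) if j != i]
        if sum(sims) / len(sims) >= min_mean_tm:
            survivors.append(i)
    # Rank the remaining templates by selection score (higher is better) and keep the top ones.
    survivors.sort(key=lambda i: selection_scores[i], reverse=True)
    return [names[i] for i in survivors[:keep]]


names = ["t1", "t2", "t3", "t4"]
scores = [50.0, 80.0, 65.0, 90.0]
tm = [
    [1.00, 0.80, 0.70, 0.30],
    [0.80, 1.00, 0.75, 0.25],
    [0.70, 0.75, 1.00, 0.30],
    [0.30, 0.25, 0.30, 1.00],
]
print(select_templates(names, scores, tm))  # ['t2', 't3', 't1']
```

Here `t4` has the highest selection score but is discarded as a structural outlier. A mean is pulled down when several outliers are present at once; a median, or removing the worst template iteratively, would be more robust.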