PSIPRED HELP & TUTORIALS

The PSIPRED Protein Structure Analysis Workbench aggregates several UCL structure prediction methods into one location, allowing users to run a number of analyses simultaneously. This document briefly describes the services and how to use them, and summarises the results that each analysis produces.

This guide is divided into three main sections. The first two sections explain the Input Form and the Results pages; the last section redirects to our Tutorials page, where a few cases are examined in more detail. You can view the input form at the main web page for the PSIPRED Server. You can also click here to view a fully interactive mock version of a typical results page.

PSIPRED INPUT

 

The input form allows users to select the analyses they wish to perform and input their query sequence. There are a number of mandatory fields.

Choose Method

You must choose at least one method to run. If no method is chosen, PSIPRED secondary structure prediction will run by default.

Input Sequence

Type your AMINO ACID sequence here. Please do not try to enter a nucleic acid sequence. We recommend that you enter your sequence as a plain single-letter string like this:

    ALGSNLNTPVEQLHAALKAISQLSNTHLVTTSSFYKSKPLGPQDQPDYVNAVAKIETEL

Alternatively, you can enter your sequence in FASTA format, but the description text will be ignored by the server.

Note that there is an upper limit to the length of sequences which can be submitted. For mGenTHREADER that limit is 1000 residues. For the other methods, the limit is 1500 residues. If your sequence is longer than this, try breaking it into likely domains before submitting it. Our DomPred server can help you in doing this.
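
If you want to check a sequence before submitting it, the rules above can be sketched in a few lines of Python. This is only an illustration and not the server's own validation code: the function names, the restriction to the 20 standard amino acid letters and the simple nucleic acid heuristic are all assumptions made for this example. Only the two length limits come from this guide.

```python
# Hypothetical pre-submission check; not the server's own validation code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard residues only
NUCLEOTIDES = set("ACGTUN")                # assumption: crude nucleic acid heuristic
MAX_LEN_MGENTHREADER = 1000                # limit stated in this guide
MAX_LEN_DEFAULT = 1500                     # limit stated in this guide


def clean_sequence(text):
    """Drop any FASTA description line and all whitespace; return upper case."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    residues = "".join(ln for ln in lines if not ln.startswith(">"))
    return "".join(residues.split()).upper()


def check_sequence(text, method="psipred"):
    """Return a list of problems; an empty list means the input looks fine."""
    seq = clean_sequence(text)
    problems = []
    if not seq:
        problems.append("empty sequence")
    unknown = sorted(set(seq) - AMINO_ACIDS)
    if unknown:
        problems.append("unexpected characters: " + "".join(unknown))
    if seq and set(seq) <= NUCLEOTIDES:
        problems.append("looks like a nucleic acid sequence")
    if method.lower() == "mgenthreader":
        limit = MAX_LEN_MGENTHREADER
    else:
        limit = MAX_LEN_DEFAULT
    if len(seq) > limit:
        problems.append(f"{len(seq)} residues exceeds the limit of {limit}")
    return problems
```

Calling check_sequence on a plain single-letter string or on FASTA text returns an empty list when no problem is found. Very short peptides made up only of A, C, G, T or N would be wrongly flagged by the nucleic acid heuristic, so treat that message as a hint rather than an error.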

You can also input a Multiple Sequence Alignment (MSA) in FASTA format; all sequences in the alignment must be the same length. Please be aware that not every method will run with MSA input.
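
As a rough illustration, the following Python sketch checks that a FASTA-formatted alignment contains at least two sequences and that every aligned row has the same length, which is what the server expects of MSA input. The function names are hypothetical and the parser is deliberately minimal; it is not the code the server runs.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line  # sequences may span several lines
        else:
            raise ValueError("sequence data found before the first '>' line")
    return [(desc, seq) for desc, seq in records]


def is_valid_msa(text):
    """True if there are two or more rows and all rows share one non-zero length."""
    records = parse_fasta(text)
    lengths = {len(seq) for _, seq in records}
    return len(records) >= 2 and len(lengths) == 1 and 0 not in lengths
```

Gap characters are counted as part of each row, so an alignment only passes if the gapped rows line up exactly.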

Submission Details

Email Address

Enter your e-mail address here. Results will be returned as soon as they are available, usually within 40 minutes, though sometimes longer depending on the server load. Bear in mind that if you enter an incorrect e-mail address, or do not provide one, there is no way the server can contact you! Also check that your anti-spam software isn't rejecting the messages from our server. You are not required to enter your e-mail address, but we recommend that you provide one.

Password

This field should be ignored if you are accessing the server from an academic site (i.e. a university). If you are a commercial user who has a current licence to use the PSIPRED server then you should enter your password here. Please contact us if your password does not work for some reason. Note that if your e-mail address is commercial (e.g. it ends in .com or .co.uk) then you must enter your PSIPRED password in order to use the server. This applies even if you are an academic user who is using a private e-mail account. PSIPRED passwords are only granted to licensed users or commercial collaborators.

Short Identifier

Use this field to assign a short memorable name to your prediction job. This is useful so that you can identify particular jobs in your mailbox. This is particularly important because PSIPRED will not necessarily return your results in the order you submitted them! Generally speaking, shorter jobs will be returned first. The name you specify will be included in the subject line of the e-mail messages sent to you from the server. For example, here is a possible message header for a job called "MySeq":


        From: psipred@cs.ucl.ac.uk
        Date: Fri, 14 Jan 2002 14:55:39 GMT
        To: Some.User@somesite.somewhere.edu
        Subject: PSIPRED Sequence analysis results for job ID:dfec480c-01fc-11e4-883f-00163e110593/MySeq
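
If you sort result e-mails with a script, the job ID and short identifier can be pulled out of the subject line. The sketch below assumes that every subject follows the pattern of the single example above (a 36-character job ID, a slash, then the short identifier); that pattern is inferred from the example and is not a documented guarantee.

```python
import re

# Assumed layout, inferred from the example header: "... job ID:<id>/<name>"
SUBJECT_PATTERN = re.compile(r"job ID:\s*([0-9a-fA-F-]{36})/(.+)$")


def parse_subject(subject):
    """Return (job_id, short_identifier), or None if the subject does not match."""
    match = SUBJECT_PATTERN.search(subject.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)
```

For the example header this yields the job ID dfec480c-01fc-11e4-883f-00163e110593 and the short identifier MySeq.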

Filtering Options

Once you have filled in the main form you can switch tabs to select any filtering options. To reduce the false positive rate of fold recognition methods, particularly when applied to long sequences, it is important that biased regions of the target sequence are filtered out before the prediction is carried out. The PSIPRED server uses the PFILT program to perform the masking and has 3 filtering options, which will filter out low complexity regions, likely transmembrane segments and coiled-coil regions. The default setting is for just low-complexity regions of the sequence to be masked out. Regions which are masked out will be replaced with 'X' (unknown) residues.

Obviously, if you filter out transmembrane helices and then try to use MEMSAT3 to predict the transmembrane topology, you will not get sensible results. For GenTHREADER and mGenTHREADER we recommend turning on all filtering if you are expecting matches to globular proteins.
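
To make the effect of masking concrete, here is a minimal Python sketch that replaces given regions of a sequence with 'X'. It does not reproduce how PFILT detects low-complexity, transmembrane or coiled-coil regions; the regions are simply passed in by the caller, and the function name and the 1-based inclusive coordinate convention are assumptions for this example.

```python
def mask_regions(sequence, regions):
    """Replace residues in each 1-based inclusive (start, end) region with 'X'."""
    residues = list(sequence)
    for start, end in regions:
        if start < 1 or end > len(residues) or start > end:
            raise ValueError(f"invalid region: {start}-{end}")
        for i in range(start - 1, end):
            residues[i] = "X"
    return "".join(residues)
```

The masked sequence keeps its original length, so residue numbering in the results still matches the sequence you submitted.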

DOMPRED Options

If you have selected a DOMPRED job, the DOMPRED tab will appear in the input form. DOMPRED runs 2 independent protein structural domain prediction algorithms, DOMPRED and DomSSEA. This tab allows you to control options for both methods.

PSI-BLAST sequence alignment domain prediction

The PSI-BLAST sequence alignment domain prediction searches the query sequence against a large database of sequences (nrdb90), including sequences from Pfam-A.

Pfam-A search
Domain sequences from Pfam-A are searched against the query sequence, and if significant sequence matches are found (as defined by the chosen E-value cut-off), this is indicated on the DomPred results page. A separate table displaying such hits is accessible from the results page.

Query vs sequence database
In cases where no clear homology exists to known domain sequences, such as Pfam-A domain sequences, a different strategy is required. Here, the query sequence is searched against a non-redundant sequence database (nrdb90), using the parameters specified in the input form to identify significantly matching sequence homologues. These matching sequences are then used to identify possible domain boundaries within the query sequence (and therefore to predict whether it is single- or multi-domain).
The domain boundary prediction procedure uses an algorithm to identify the residue positions in the query sequence to which the N and C termini of matching database hits are aligned. The positions of the N and C termini from all the PSI-BLAST database matches are simply summed along the query sequence. Cases where both N and C termini hits are found in similar regions along the query sequence are given a higher weighting.
The summed profile is then smoothed using a window of 15 residues, and Z-scores are calculated over this profile. Significant peaks (Z-score > 1.5) over the mean termini value of the query are assigned as putative domain boundaries. Termini hits to the first and last 50 residues of the alignment profile are not considered, as these regions often contain a large number of alignment termini that correspond to the true termini of the query sequence.
The alignment profile generated by the PSI-BLAST alignments (and drawn by gnuplot) is shown at the top of the results page. Putative domain boundaries are indicated by peaks in the plot. Peaks considered to be significant by the algorithm are indicated.
In cases where significant peaks are found, and the query sequence is predicted to be multi-domain, multi-domain predictions given by DomSSEA are given higher significance.
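
The aligned-termini procedure described above can be sketched in Python as follows. This is an approximation written for this guide, not the DomPred source code: the function names are hypothetical, the extra weighting for co-located N and C termini is omitted because its form is not given here, and the way the first and last 50 residues are excluded (zeroing them before smoothing) is an assumption. The window of 15, the Z-score cut-off of 1.5 and the 50-residue margin are the values quoted in the text.

```python
from statistics import mean, pstdev

WINDOW = 15      # smoothing window quoted in the text
Z_CUTOFF = 1.5   # significance threshold quoted in the text
END_MARGIN = 50  # residues ignored at each end of the profile


def termini_profile(query_length, hits):
    """Sum aligned N and C termini at each query position.

    hits holds 1-based inclusive (start, end) query coordinates, one pair
    per database match; coordinates are assumed to lie within the query.
    """
    profile = [0.0] * query_length
    for start, end in hits:
        profile[start - 1] += 1.0
        profile[end - 1] += 1.0
    return profile


def smooth(profile, window=WINDOW):
    """Centred moving average; the window is truncated at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        chunk = profile[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def putative_boundaries(query_length, hits):
    """Return 1-based positions of significant peaks in the smoothed profile."""
    raw = termini_profile(query_length, hits)
    for i in range(query_length):
        if i < END_MARGIN or i >= query_length - END_MARGIN:
            raw[i] = 0.0  # assumption: drop termini near the sequence ends
    smoothed = smooth(raw)
    if not smoothed:
        return []
    mu, sigma = mean(smoothed), pstdev(smoothed)
    if sigma == 0:
        return []  # flat profile, no peaks to report
    z = [(v - mu) / sigma for v in smoothed]
    boundaries, run = [], []
    for i, score in enumerate(z + [float("-inf")]):  # sentinel closes a final run
        if score > Z_CUTOFF:
            run.append(i)
        elif run:
            top = max(z[j] for j in run)
            tied = [j for j in run if abs(z[j] - top) < 1e-9]
            boundaries.append(tied[len(tied) // 2] + 1)  # middle of the peak
            run = []
    return boundaries
```

Each contiguous stretch above the cut-off is reported as one boundary at the middle of its highest point, which is one simple way of turning a smoothed plateau into a single position. A query shorter than 100 residues yields no boundaries here, because every position falls inside the excluded margins.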

Input E-value cut-off (default 0.01)
Optimisation of the PSI-BLAST sequence alignment domain prediction showed an E-value cut-off of 0.01 to give the best trade-off between the sensitivity and selectivity of domain boundary prediction. Decreasing the E-value (i.e. reducing the number of 'significant' aligned sequences) was found to reduce sensitivity but increase the selectivity of domain boundary prediction.

Input number of PSI-BLAST iterations (default 5)
The default number of PSI-BLAST iterations used is 5. Decreasing the iteration number may increase the speed of the PSI-BLAST search, but may also result in the failure to identify more distant homologues. The user should be aware that the higher the iteration value, the higher the risk of introducing profile wander into the PSI-BLAST sequence search.

DomSSEA Prediction

DomSSEA prediction is always turned on.

DOMPRED PSIPRED options

You can also select whether the DOMPRED analysis performs a PSIPRED secondary structure prediction and displays those results.

DISOPRED Options

The DISOPRED options allow the user to control the underlying sensitivity by controlling the False Positive Rate and also whether a PSIPRED secondary structure prediction should be included.

Additionally, users can choose whether the underlying PSI-BLAST output is made available for download.

 

 

BioSerf Options

BioSerf is a fully automated homology modelling pipeline which uses MODELLER to construct a final homology model. Because of the licence terms, if you select a BioSerf job you are required to provide the MODELLER key available from the Sali Lab.

 

 

RESULTS

 

The PSIPRED server produces a large number of different results pages. Here we briefly describe these outputs. At any point you can follow this link to the static example results to explore the functionality of the results pages.

Sequence Summary Page

The results summary page is the main output page for PSIPRED server sequence results. It gives a brief summary of the results returned, annotated on the sequence you submitted to the server. At the top of the page the Job ID details are listed, including the short identifier you provided for the job and the unique private ID assigned by our server. Below this, a series of tabs allows you to view the specific outputs for each analysis that was run. The Summary page is then divided into 3 sections.

Secondary Structure Map/TM Helix Map

The first region lays out the query sequence and annotates the residues as per the key. If you have run a PSIPRED job, residues will be annotated with the predicted secondary structure. If you have run a MEMSAT, MEMSATSVM or MEMPACK job, residues will be annotated with the location of predicted TM helices. If you have run both types of analysis, you can toggle between these annotations with the appropriate buttons. Also note that if a DISOPRED or DOMPRED job has been run, predicted disordered residues and any putative domain boundaries will be marked. Please note that all domain boundaries will be annotated; this does not imply that they are all simultaneously applicable.

Sequence Resubmission

This section of the summary page allows you to resubmit your sequence, or a subsequence of it, for further analysis. First use the slider to select the sequence region you wish to resubmit (or input the linear coordinates in the Start and Stop boxes). Next click the 'Select Methods' button. This will bring up a panel that allows you to select new analysis methods for your sequence or subsequence. Finally click the new 'Resubmit' button to submit a new job to the server. One obvious use would be to resubmit domain subsequences after running a DOMPRED job.

GenTHREADER, pDomTHREADER or pGenTHREADER Summary

The final, lower section of the Summary Page presents a simple alignment cartoon of any hits found if you also ran a GenTHREADER, pDomTHREADER or pGenTHREADER analysis. Each hit is laid out according to the region of your query sequence that it matched, with the left-hand side of the cartoon being the 1st residue and the right-hand border being the final residue. Each row represents one structural hit calculated by one of the GenTHREADER methods. The PDB chain ID or CATH domain ID appears at the left. Each bar is coloured according to the GenTHREADER confidence regions. If you mouse over any of the hits, a further summary of the alignment is given. On the right-hand side of each row you can choose to have a simple homology model built for that structural alignment with your query sequence. This only works if you provide a valid MODELLER key.

BioSerf Output

If you provide a valid MODELLER key you will be able to run a BioSerf job. BioSerf is a fully automated homology modelling service which integrates PSI-BLAST, HHBlits, PSIPRED, GenTHREADER and MODELLER. The final output is a PDB file which can be viewed by clicking the BioSerf tab on the results page. The file is viewed using the Jmol plugin, which requires that your web browser has Java installed and enabled. All standard Jmol commands can be used to explore the structure.

 

 

 

DISOPRED Output

If you asked for disordered region predictions, the DISOPRED tab will be available with the disorder profile plot. The graph shows the DISOPRED3 disorder confidence levels against the sequence positions as a solid blue line. The grey dashed horizontal line marks the threshold above which amino acids are regarded as disordered. For disordered residues, the orange line shows the confidence of disordered residues being involved in protein-protein interactions. The Summary Tab annotates this information on the query sequence.

 

 

DOMPRED Output

Clicking the DOMPRED tab brings up the DOMPRED output. This output is divided into 2 sections: the DOMPRED output and the DomSSEA output. The DOMPRED output shows the graph produced by the PSI-BLAST aligned termini algorithm. The graph annotates secondary structure regions; peaks in the aligned termini profile indicate regions that may form a structural domain boundary. The putative domain boundaries are listed in the summary statistics immediately below the graph.

Below the PSI-BLAST summary is the DomSSEA table. In this method SCOP structural domains are matched to the query sequence. Where more than one domain matches sequentially on the query sequence, it may be possible to predict a domain boundary.

All the possible domain boundaries are annotated on the query sequence available via the Summary Tab.

FFPred Output

The FFPred tab gives a summary of the FFPred output. FFPred attempts to predict GO terms for eukaryotic proteins using a series of SVMs. The top of the page gives two tables which summarise these predictions. The first table, labelled Strict, gives only terms which were predicted using SVMs trained by the strictest methodology. In this case the training data for the SVMs included only UniProt proteins whose GO terms were confirmed using the IDA evidence code (Inferred from Direct Assay). The table then summarises the scoring for each term, giving the ontology (Molecular Function or Biological Process), the probability and the reliability of the SVM. SVMs are regarded as reliable when their selectivity, sensitivity and correlation coefficients all lie above strict critical thresholds.

The second table gives predictions based on more broadly trained SVMs. In this prediction all annotations in UniProt, with any evidence code, were included. This table provides a broader set of predictions but may be considered less reliable. Below the tables are summaries of the features that were calculated for the incoming query sequence and classified by the SVMs.

GenTHREADER Outputs

The GenTHREADER, pDomTHREADER and pGenTHREADER tabs all link to tables of the output statistics for each GenTHREADER job. Each table shows the structural hits for the query sequence. These are full PDB chains for GenTHREADER and pGenTHREADER, and CATH domains for pDomTHREADER. For each structure the first portion of the table gives summary statistics:

  • Conf. : The hit confidence category based on p-value; GUESS (<1), LOW (<=0.1), MEDIUM (<=0.01), HIGH (<=0.001), CERT (<=0.0001)
  • Net Score: The GenTHREADER raw score
  • P-Value : The p-value
  • Pair E: The Pairwise Energy
  • Solv E: The solvation Energy
  • Aln Score: The Pairwise alignment score
  • Aln Len: The length of the alignment
  • Str Len: The length of the structural hit
  • Seq Len: The length of the query sequence
  • Domain Start: The start of the domain (pDomTHREADER only)
  • Domain End: The end of the domain (pDomTHREADER only)
  • Domain Code: The CATH code for the domain hit (pDomTHREADER only)
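
The confidence categories in the Conf. column map directly onto p-value ranges. The following Python sketch simply restates the thresholds listed above; the function name is hypothetical, and treating each range as "at or below the cut-off, checked from most to least stringent" is our reading of that list.

```python
# Cut-offs copied from the Conf. entry above, most stringent first.
THRESHOLDS = [
    ("CERT", 0.0001),
    ("HIGH", 0.001),
    ("MEDIUM", 0.01),
    ("LOW", 0.1),
]


def confidence_category(p_value):
    """Map a GenTHREADER p-value to its confidence label."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    for label, cutoff in THRESHOLDS:
        if p_value <= cutoff:
            return label
    return "GUESS"  # anything above the LOW cut-off
```

For example, a hit with a p-value of 0.0005 falls in the HIGH category, while one with a p-value of 0.05 is LOW.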

The latter portion of the table links out to other resources and has the following columns:

  • View Alignment: A button that opens JalView to view an annotated alignment. Known ligand binding residues are annotated on the hit
  • SCOP Codes: A link that searches SCOP for the PDB chain (GenTHREADER and pGenTHREADER only)
  • CATH Codes: A link that searches CATH for the PDB chain (GenTHREADER and pGenTHREADER only)
  • Structure: A thumbnail image of the hit; clicking the link will take you to PDBSum
  • CATH Entry: A link that searches CATH web services to summarise the hit.

MEMSAT-SVM Output

In the MEMSATSVM tab there are several diagrams and reports which summarise the MEMSAT-SVM output. Importantly, MEMSAT-SVM jobs also run MEMSAT3, which allows you to compare the predictions of both methods. The first diagram shows a cartoon of the MEMSATSVM and MEMSAT3 TM helix predictions. MEMSATSVM predictions now include a prediction of pore-lining helices. The key for the schematic can be found at the bottom of the diagram. Below the schematic are the traces for the assorted SVM outputs that the MEMSATSVM prediction was based on. Further down the page are a series of cartoon diagrams of the membrane topology annotated with the predicted helix coordinates. Finally, at the bottom of the page are the output reports from both the MEMSAT3 and MEMSATSVM methods.

MEMPACK Output

If you select a MEMPACK job, the MEMPACK tab will take you to the diagram of transmembrane helix packing which MEMPACK outputs. Running a MEMPACK job will also run a MEMSATSVM job. The MEMPACK output shows a top-down diagram of the possible packing of the predicted transmembrane helices. Possible residue contacts are predicted between each pair of helices; the helices are then arranged and oriented to maximise the number of helix contacts that face one another.

 

PSIPRED Output

The last analysis page gives the PSIPRED diagrammatic output. These diagrams annotate the query sequence with secondary structure cartoons and a confidence value at each position in the alignment. The confidence is given as a series of blue bar graphs.

Downloads

The final tab offers any plain text and ancillary downloads for each of the methods you have chosen. These are broken up in sections as per each analysis method.

 

TUTORIALS

Finally, you can find examples of use of the PSIPRED server at our Tutorials page.