The FFSearch tool for feature-based sorting of proteins


FFSearch aggregates several tools that can be used to sort proteins present in its database according to their similarity to a given protein, or list of proteins, submitted by the user.

Please see the 'Overview' tab below for a more detailed description.


Current contributors: David T. Jones, Federico Minneci

For queries regarding FFSearch: email psipred@cs.ucl.ac.uk and include 'FFSearch' in the subject.


INPUT - Single Input Sequence (a single amino acid sequence, in raw or FASTA format)


Any valid amino acid sequence between 15 and 1500 aa in length can be inserted here:

If you wish to test these services, follow this link to retrieve a test FASTA sequence.
 

INPUT - Alternatively: Input List of protein identifiers (UniProt AC numbers, one per line)


Only proteins from an organism present in our databases can be used:
Model Organism for this list of input proteins:
Human (Homo sapiens)
Mouse (Mus musculus)
    
If you wish to test these services, follow this link to retrieve a test list (select 'Human' above).
 

Choose Analysis Method

Distance-based ranking:
BLAST distance (using bl2seq)
Feature-based Euclidean distance
Feature-based Manhattan distance

Ranking based on GO term probability:
Probability given by FFPred v2.0's SVMs

Classification of query sequence:
Naive Bayesian classification
 

OUTPUT - Choose Model Organism and Maximum Number of proteins displayed in the output

    
Model Organism for displayed proteins:
Human (Homo sapiens)
Mouse (Mus musculus)
Maximum number of proteins to be displayed:
 

Submission Details

 
Email Address for job completion alert (optional):
 
Short identifier for submission:
 

Select the GO term whose Support Vector Machine needs to be used to sort the outputs.

Insert your positive and negative examples here (UniProt AC numbers only, one per line).


Positives: Negatives:

Using FFSearch.


To use FFSearch, simply click on the 'Main Input' tab above and fill in the form according to the following instructions.

First, please enter either a protein sequence (one amino acid sequence only, either as a raw sequence or in FASTA format) or a list of UniProtKB/Swiss-Prot AC numbers corresponding to example proteins from one of our supported proteomes (see 'NOTE' below). This is your 'INPUT': the protein(s) will be compared to the target proteome you later specify in the 'OUTPUT' options.
Then, please choose the desired analysis type. Note that if you choose the analysis that uses the BLAST-based distance, you can specify only one protein (either a sequence or one valid UniProtKB/Swiss-Prot AC number) in your 'INPUT' section. This is because BLAST does not use our feature-based method: it requires a single amino acid sequence, which is aligned against the target proteome.
Finally, you can select which proteome you want to screen your protein(s) against, and how many proteins you want to display on the results page ('OUTPUT' options).
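
The input rules above can be sketched as a small pre-submission check. This is only an illustration, not the server's own validation code: the function name, the method labels ('blast', 'euclidean') and the error messages are hypothetical, and the two accession numbers are used purely as example strings.

```python
def check_submission(method, sequence=None, accessions=None):
    """Return a list of problems found in a submission (empty if none).

    Hypothetical helper: 'method' labels such as 'blast' are assumptions
    made for this sketch and are not taken from the FFSearch form.
    """
    accessions = accessions or []
    problems = []
    # The form expects either one sequence or a list of AC numbers.
    if bool(sequence) == bool(accessions):
        problems.append("provide either one sequence or a list of AC numbers")
    # The BLAST-based distance works on a single protein only.
    if method == "blast" and len(accessions) > 1:
        problems.append("BLAST distance accepts a single protein only")
    return problems


print(check_submission("blast", accessions=["P04637", "P38398"]))
print(check_submission("euclidean", accessions=["P04637", "P38398"]))
```

The first call reports the BLAST restriction, while the second returns an empty list because the feature-based methods accept several example proteins.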

A short identifier is needed to label your submission - please keep it short and simple (any spaces will be replaced by underscore characters '_'). If you also provide your email address, you will receive an automated email as soon as your job has completed, containing a link to the appropriate results page (this is not required, but highly recommended).
At the bottom of the form you will find the button for submitting your job, together with the button you can use to reset the form (including any tabs opened during the submission).


NOTE:

When submitting an input list of identifiers, only proteins that are included in our database (for the chosen organism) can be used as example proteins. Any other proteins present in the input list will be discarded, and the analysis will be performed using only those input proteins that are found in the database.
See the 'Overview' tab for lists of Swiss-Prot identifiers of all proteins in the database.
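
As a rough illustration of the behaviour described above, the sketch below splits an input list into the AC numbers found in a database and those that would be discarded. The function name and the in-memory set standing in for the database are assumptions; trimming whitespace, upper-casing and dropping repeated entries are also choices made for this example, not documented server behaviour.

```python
def filter_known(accessions, database):
    """Split AC numbers into (kept, discarded) according to 'database'.

    'database' is any container of known AC numbers for the chosen
    organism (a hypothetical stand-in for the FFSearch database).
    """
    seen = set()
    kept, discarded = [], []
    for raw in accessions:
        acc = raw.strip().upper()
        if not acc or acc in seen:
            continue  # skip blank lines and repeated entries
        seen.add(acc)
        (kept if acc in database else discarded).append(acc)
    return kept, discarded


known = {"P04637", "P38398"}  # illustrative content only
kept, discarded = filter_known(["P04637", " p38398 ", "X00000", ""], known)
print(kept, discarded)
```

Checking a list this way before submission shows which example proteins would actually contribute to the analysis.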

On the other hand, when the 'single raw sequence' input is used, any valid amino acid sequence can be used for the analysis. If the sequence is not found in our database, its features will be calculated by FFSearch at runtime - note that this will increase the job's total running time.
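
A single-sequence input can be checked locally before submission along the following lines. This is a minimal sketch: the function names are hypothetical, and restricting the alphabet to the 20 standard residues is an assumption, since the page does not state which characters the server accepts. The length limits (15 to 1500 aa) are those given on the input form.

```python
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: 20 standard residues only


def parse_single_sequence(text):
    """Return the sequence from raw or single-record FASTA input."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if lines and lines[0].startswith(">"):
        lines = lines[1:]  # drop the FASTA header line
    if any(ln.startswith(">") for ln in lines):
        raise ValueError("only one sequence is allowed")
    return "".join(lines).replace(" ", "").upper()


def is_valid_query(seq, min_len=15, max_len=1500):
    """Check the length limits and the (assumed) residue alphabet."""
    return min_len <= len(seq) <= max_len and set(seq) <= VALID_AA


seq = parse_single_sequence(">example\nACDEFGHIKL\nMNPQRSTVWY")
print(len(seq), is_valid_query(seq))
```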

FFSearch: server overview and database content.


FFSearch aggregates several tools that can be used to sort proteins present in its database according to their similarity to a given protein, or list of proteins, submitted by the user.

Its database includes proteins for two model organisms, human (Homo sapiens) and mouse (Mus musculus). There are currently 19281 UniProtKB/Swiss-Prot proteins for human, and 15838 UniProtKB/Swiss-Prot proteins for mouse.

The sorting is based on protein sequences only. It can be done either simply using a BLAST-based distance between proteins, or using one of several methods that rely on the description of proteins in terms of their sequence features. The features used here are the same as those employed by FFPred version 2.0.

The feature-based methods for sorting the proteins fall into three groups: distance-based ranking (using the Euclidean or Manhattan distance between proteins, calculated in the 235-dimensional space of protein features); ranking based on the output of one of FFPred2's Support Vector Machines (there is one for each GO term that appears in the vocabulary of FFPred version 2.0); and classification with a naive Bayesian classifier. In the last case, a list of 'negatives' needs to be provided as well, using the appropriate input tab.
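
The distance-based ranking can be illustrated with a short sketch. It is not FFSearch's implementation: the function names are hypothetical, the toy vectors have 3 dimensions rather than 235, and combining several example proteins into their centroid is an assumption made here, since this page does not say how a list of input proteins is reduced to a single reference point. Any scaling of the features is likewise left out.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def centroid(vectors):
    """Component-wise mean of the example vectors (assumed strategy)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rank_by_distance(query_vectors, database, distance=euclidean, top_n=10):
    """Return the top_n (accession, distance) pairs, nearest first.

    'database' maps an accession to its feature vector.
    """
    ref = centroid(query_vectors)
    scored = [(acc, distance(ref, vec)) for acc, vec in database.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:top_n]


toy_db = {"A": [0, 0, 0], "B": [3, 4, 0], "C": [1, 1, 1]}
print(rank_by_distance([[0, 0, 0]], toy_db, distance=manhattan, top_n=2))
```

Swapping the `distance` argument switches between the two metrics; the two can order proteins differently, because the Euclidean distance gives more weight to large differences in individual features.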

The database

Plain text files listing all proteins currently included in FFSearch's database (UniProtKB/Swiss-Prot AC numbers and Gene Symbols) can be downloaded here for human and mouse, respectively.