Phosphorylation Site Prediction
We used the sites identified in our large-scale studies of eukaryotic
phosphorylation to construct organism-specific phosphorylation site predictors
based on support vector machines (SVMs). To create a negative set of the same
size, we randomly chose sites from proteins of the same species that were not present in the
phosphoset. SVMs attempt to separate true from false sites in a
high-dimensional vector space with the help of hyperplanes and kernel
functions. As features, we used the primary sequence comprising the site and
its twelve surrounding residues.
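The feature construction described above can be illustrated with a minimal sketch. It is not the published implementation. It assumes that the twelve surrounding residues are split symmetrically (six on each side of the site), that candidate sites are serine, threonine, and tyrosine residues, and that each residue is one-hot encoded; the padding symbol and all function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical padding symbol for windows that run past the sequence ends
ALPHABET = AMINO_ACIDS + PAD
FLANK = 6  # assumption: six residues on each side of the site (twelve in total)


def candidate_sites(sequence, residues="STY"):
    """Return the 0-based positions of all S, T, and Y residues."""
    return [i for i, aa in enumerate(sequence) if aa in residues]


def site_window(sequence, position, flank=FLANK):
    """Return the residue at `position` plus `flank` residues on each side."""
    left = max(0, position - flank)
    right = min(len(sequence), position + flank + 1)
    window = sequence[left:right]
    # Pad on both ends so the candidate site always sits in the middle.
    window = PAD * (flank - (position - left)) + window
    window = window + PAD * (2 * flank + 1 - len(window))
    return window


def one_hot(window):
    """Concatenate one indicator vector per residue (21 entries each)."""
    vector = []
    for residue in window:
        index = ALPHABET.find(residue)
        if index == -1:
            # Unknown symbols (for example X) are treated like padding.
            index = ALPHABET.index(PAD)
        column = [0] * len(ALPHABET)
        column[index] = 1
        vector.extend(column)
    return vector


if __name__ == "__main__":
    protein = "MKRRASVEDLT"  # made-up sequence, for illustration only
    for pos in candidate_sites(protein):
        window = site_window(protein, pos)
        print(pos, window, len(one_hot(window)))
```

Vectors of this kind, labelled as positive (from the phosphoset) or negative (randomly chosen sites), are what an SVM library would receive as training input.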
The accuracy of the predictions based on primary sequence is very high,
for example in the case of human:
You can submit any sequence of interest and set a cutoff for the prediction
directly on the precision-recall curve.
Recall is the proportion of true positives among the sum of true positives and false negatives, whereas precision is the proportion of true positives among all predicted positives. Sites that are predicted to be phosphorylated are automatically matched to annotated kinase motifs.
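To make the effect of the cutoff concrete, the following sketch computes precision and recall for a given score threshold. The scores and labels are made-up illustration values, not PHOSIDA results, and the function name is hypothetical. It assumes that a site is predicted as phosphorylated when its score is at or above the cutoff.

```python
def precision_recall(scores, labels, cutoff):
    """Precision and recall when sites scoring >= cutoff are called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= cutoff and y)      # true positives
    fp = sum(1 for s, y in pairs if s >= cutoff and not y)  # false positives
    fn = sum(1 for s, y in pairs if s < cutoff and y)       # false negatives
    # Convention used here: precision is 1.0 when nothing is predicted
    # positive, and recall is 0.0 when there are no true sites at all.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.6, 0.4, 0.2]          # illustrative SVM scores
    labels = [True, True, False, True, False]   # True = known phosphosite
    for cutoff in (0.3, 0.5, 0.7):
        p, r = precision_recall(scores, labels, cutoff)
        print(f"cutoff={cutoff}: precision={p:.2f}, recall={r:.2f}")
```

In this toy example, raising the cutoff from 0.3 to 0.7 increases precision and lowers recall, which is the trade-off that a point on the precision-recall curve represents.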
The following paper describes the design and results of the SVM in detail:
'PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites',
Florian Gnad, Shubin Ren, Juergen Cox, Jesper V Olsen, Boris Macek, Mario Oroshi, Matthias Mann (2007). Genome Biology, 8:R250.