HOW TO PREDICT ACETYLATION SITES
Note: The acetylation site predictor will be published in the near future. The corresponding article will describe methods, accuracy and application in more detail.
Our support-vector-machine-based acetylation predictor was trained and tested on around 3600 human in vivo acetylation sites. The mass-spectrometry-based identification of these sites is described in detail by Choudhary et al. (Science, 14 August 2009).
To train the predictor, we used the identified acetylation sites (positive set) and randomly selected lysines (negative set) along with their surrounding sequences. The negative set contained as many sites as the positive set.
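The construction of such a balanced training set can be sketched as follows. This is only an illustration, not the code used for the published predictor: the function names are hypothetical, the window of 6 residues on each side of the lysine is an assumption (the actual window size is not stated here), and excluding known acetylation sites from the random negatives is likewise an assumption.

```python
import random


def lysine_windows(sequence, flank=6):
    """Map the 1-based position of each lysine to its surrounding window.

    Lysines closer than `flank` residues to either end are skipped,
    because no complete window exists for them (assumed behaviour).
    """
    seq = sequence.upper()
    windows = {}
    for i, residue in enumerate(seq):
        if residue == "K" and i - flank >= 0 and i + flank < len(seq):
            windows[i + 1] = seq[i - flank:i + flank + 1]
    return windows


def build_training_set(sequence, acetylated_positions, flank=6, seed=0):
    """Return (positives, negatives) as lists of sequence windows.

    Positives are windows around known acetylation sites; negatives are
    randomly drawn windows around other lysines, at most as many as
    there are positives.
    """
    windows = lysine_windows(sequence, flank)
    positives = [windows[p] for p in sorted(acetylated_positions) if p in windows]
    candidates = [w for p, w in sorted(windows.items())
                  if p not in acetylated_positions]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = min(len(positives), len(candidates))
    negatives = rng.sample(candidates, n)
    return positives, negatives
```

In practice the negatives would be drawn across many proteins, so that enough lysines are available to match the size of the positive set.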
To set the stringency of the prediction, move the cursor over the precision-recall diagram and choose your own cutoff. On the result page, each predicted acetylation site is listed with its score and the true positive rate that this score achieved during training.
INPUT FORMAT
A) SINGLE PROTEIN PREDICTION
If you intend to predict the occurrence of acetylation sites on a single protein, you can insert a single protein sequence either as a plain sequence without further description or in FASTA format.

The protein sequence:
- should only contain characters that represent amino acids; however, spaces, new lines and tabs are allowed and will be deleted automatically
- should contain at least 13 amino acids (there is no upper limit on sequence length)
- can contain upper and/or lower case letters

The FASTA header (if present) can contain any characters, but must be separated from the protein sequence by a line break.
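If you want to check a sequence before submitting it, the rules above can be approximated with a short script. This is a minimal sketch and not the server's own validation code: the function name is hypothetical, the accepted alphabet is assumed to be the 20 standard one-letter amino acid codes, and converting the sequence to upper case is an assumption made for convenience.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # assumed: 20 standard residues only
MIN_LENGTH = 13  # minimum sequence length stated above


def parse_single_input(text):
    """Return (header, sequence) for a plain or FASTA-formatted sequence.

    header is None when the input has no FASTA header line.
    Raises ValueError if the sequence breaks one of the rules above.
    """
    lines = text.strip().splitlines()
    header = None
    if lines and lines[0].startswith(">"):
        header = lines[0][1:].strip()
        lines = lines[1:]
    # Remove spaces, tabs and line breaks, then normalise the case.
    sequence = "".join("".join(lines).split()).upper()
    invalid = sorted(set(sequence) - AMINO_ACIDS)
    if invalid:
        raise ValueError("invalid characters: " + ", ".join(invalid))
    if len(sequence) < MIN_LENGTH:
        raise ValueError("sequence must contain at least 13 amino acids")
    return header, sequence
```

For example, `parse_single_input(">my protein\nmkt ayi\takqr\nqisfv")` returns the header `my protein` together with the cleaned sequence `MKTAYIAKQRQISFV`.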
B) MULTIPLE PROTEIN PREDICTION
If you intend to predict the occurrence of acetylation sites on multiple proteins, you have to submit the corresponding sequences in FASTA format. The restrictions on protein sequence and FASTA header are the same as for single protein submissions, as listed above.
You can submit a maximum of 20 protein sequences.
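A multi-sequence submission can be checked locally in a similar way. The following is a rough sketch under stated assumptions, not the server's implementation: the function name is hypothetical, and rejecting input that exceeds the limit (rather than truncating it) is an assumed behaviour.

```python
MAX_SEQUENCES = 20  # submission limit stated above


def parse_multi_fasta(text, max_sequences=MAX_SEQUENCES):
    """Split FASTA text into a list of (header, sequence) tuples.

    Raises ValueError if sequence data precedes the first header or if
    the input contains more than max_sequences records.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first FASTA header")
        else:
            # Drop spaces and tabs inside a sequence line.
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    if len(records) > max_sequences:
        raise ValueError("at most %d sequences can be submitted" % max_sequences)
    return records
```

Each returned sequence still needs to satisfy the per-sequence rules listed for single protein submissions.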