| SWISS-2DPAGE method | RMSD | % | Outliers | PIP-DB method | RMSD | % | Outliers |
|---|---|---|---|---|---|---|---|
| IPC_protein | 0.476 | 0 | 10 | IPC_protein | 1.019 | 0 | 141 |
| Toseland | 0.521 | 10.9 | 18 | Toseland | 1.086 | 16.7 | 153 |
| Bjellqvist | 0.590 | 30.0 | 31 | Bjellqvist | 1.085 | 16.3 | 150 |
| ProMoST | 0.597 | 32.1 | 29 | Dawson | 1.081 | 15.3 | 161 |
| Dawson | 0.599 | 32.5 | 37 | Wikipedia | 1.087 | 16.9 | 163 |
| Wikipedia | 0.619 | 39.0 | 35 | Rodwell | 1.095 | 19.1 | 167 |
| Rodwell | 0.628 | 41.7 | 37 | Grimsley | 1.121 | 26.6 | 170 |
| Grimsley | 0.572 | 24.5 | 21 | Solomons | 1.103 | 21.4 | 159 |
| Solomons | 0.635 | 44.2 | 44 | Lehninger | 1.102 | 21.1 | 161 |
| Lehninger | 0.640 | 45.8 | 44 | ProMoST | 1.111 | 23.5 | 150 |
| Nozaki | 0.679 | 59.4 | 43 | pIR | 1.152 | 35.8 | 184 |
| Thurlkill | 0.691 | 63.9 | 39 | Nozaki | 1.165 | 39.9 | 170 |
| DTASelect | 0.677 | 58.8 | 35 | Thurlkill | 1.180 | 44.9 | 176 |
| EMBOSS | 0.724 | 76.9 | 49 | DTASelect | 1.186 | 47.1 | 173 |
| Sillero | 0.721 | 75.5 | 50 | pIPredict | 1.195 | 50.0 | 182 |
| pIR | 0.761 | 92.4 | 37 | EMBOSS | 1.198 | 51.2 | 191 |
| pIPredict | 0.768 | 95.9 | 33 | Sillero | 1.202 | 52.4 | 187 |
| Patrickios | 1.600 | 1227.9 | 243 | Patrickios | 2.623 | 3918 | 604 |
| Avg_pI^a | 0.614 | 37.1 | 32 | Avg_pI^a | 1.101 | 20.9 | 160 |
- ^a Average over all pKa sets except Patrickios (a highly simplified pKa set) and the IPC sets. Note that the average pI is calculated at the level of each individual protein or peptide.
- Both SWISS-2DPAGE and PIP-DB were cleaned of outliers (MSE > 3 between the experimental pI and the average predicted pI) and clustered with CD-HIT at a 99 % sequence identity threshold, as described in Materials and Methods (982 and 1,307 proteins, respectively), but they were not divided into training and testing datasets. The results for the IPC sets are therefore slightly overestimated; the comparison of Tables 1 and 2 shows that this bias is not relevant.
- Outliers: the number of predictions for which the squared difference between the experimental and predicted pI (the MSE of a single prediction) exceeded the threshold of 3 used for the protein datasets.
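The footnotes describe three computations: the per-protein average pI (excluding Patrickios and the IPC sets), the dataset cleaning filter (squared error against that average greater than 3), and the RMSD and outlier counts reported per method. A minimal Python sketch of these steps follows, assuming a hypothetical input layout (a list of dicts with `exp_pi` and a `pred` mapping from pKa-set name to predicted pI); the function names and the `IPC_peptide` label are illustrative only, and CD-HIT clustering is not reproduced.

```python
import math
from statistics import mean

# Squared-error threshold from the footnotes, used both for dataset cleaning
# and for counting outliers.
SQ_ERR_THRESHOLD = 3.0

# pKa sets left out of the per-protein average (Avg_pI).
# "IPC_peptide" is a hypothetical label for the second IPC set.
EXCLUDED_FROM_AVG = {"Patrickios", "IPC_protein", "IPC_peptide"}


def average_pi(predictions: dict) -> float:
    """Average predicted pI of ONE protein over the non-excluded pKa sets."""
    values = [pi for name, pi in predictions.items() if name not in EXCLUDED_FROM_AVG]
    return mean(values)


def clean(proteins: list) -> list:
    """Drop proteins whose squared error vs. the average prediction exceeds 3."""
    return [
        p for p in proteins
        if (p["exp_pi"] - average_pi(p["pred"])) ** 2 <= SQ_ERR_THRESHOLD
    ]


def rmsd(experimental: list, predicted: list) -> float:
    """Root-mean-square deviation between experimental and predicted pI."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("need two non-empty lists of equal length")
    sq_errors = [(e - p) ** 2 for e, p in zip(experimental, predicted)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def count_outliers(experimental: list, predicted: list) -> int:
    """Number of predictions whose squared error exceeds the threshold."""
    return sum((e - p) ** 2 > SQ_ERR_THRESHOLD for e, p in zip(experimental, predicted))


def evaluate(proteins: list, method: str) -> tuple:
    """RMSD and outlier count for one pKa set on an already-cleaned dataset."""
    exp = [p["exp_pi"] for p in proteins]
    pred = [p["pred"][method] for p in proteins]
    return rmsd(exp, pred), count_outliers(exp, pred)


# Illustrative toy data, not values from the table.
proteins = [
    {"exp_pi": 5.0, "pred": {"Toseland": 5.2, "Bjellqvist": 4.8, "Patrickios": 7.0}},
    {"exp_pi": 7.0, "pred": {"Toseland": 6.5, "Bjellqvist": 6.9, "Patrickios": 9.5}},
    {"exp_pi": 4.0, "pred": {"Toseland": 6.5, "Bjellqvist": 6.3, "Patrickios": 8.0}},
]

cleaned = clean(proteins)  # third protein is dropped: (4.0 - 6.4)**2 = 5.76 > 3
for method in ("Toseland", "Bjellqvist", "Patrickios"):
    print(method, evaluate(cleaned, method))
```

The "%" column is not reproduced here because its definition is not given in this section.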