IPC – Isoelectric Point Calculator
© The Author(s). 2016
Received: 13 August 2016
Accepted: 10 October 2016
Published: 21 October 2016
Accurate estimation of the isoelectric point (pI) based on the amino acid sequence is useful for many analytical biochemistry and proteomics techniques such as 2-D polyacrylamide gel electrophoresis, or capillary isoelectric focusing used in combination with high-throughput mass spectrometry. Additionally, pI estimation can be helpful during protein crystallization trials.
Here, I present the Isoelectric Point Calculator (IPC), a web service and a standalone program for the accurate estimation of protein and peptide pI using different sets of dissociation constant (pKa) values, including two new computationally optimized pKa sets. According to the presented benchmarks, the newly developed IPC pKa sets outperform previous algorithms by at least 14.9 % for proteins and 0.9 % for peptides (on average, 22.1 % and 59.6 %, respectively), which corresponds to an average error of the pI estimation equal to 0.87 and 0.25 pH units for proteins and peptides, respectively. Moreover, the prediction of pI using the IPC pKa’s leads to fewer outliers, i.e., predictions affected by errors greater than a given threshold.
The IPC service is freely available at http://isoelectric.ovh.org. Peptide and protein datasets used in the study, as well as the precalculated pI for the PDB and some of the most frequently used proteomes, are available for large-scale analysis and future development.
This article was reviewed by Frank Eisenhaber and Zoltán Gáspári.
Keywords: Isoelectric point, Proteomics, pKa dissociation constant
Analysis of proteins starts from a heterogeneous mixture (lysate), from which the protein fraction needs to be isolated. Next, individual proteins are separated and finally identified. The procedure relies on physicochemical properties of amino acids such as molecular mass or charge. Over the years, many techniques have been introduced to accomplish this task. One of the oldest, but still widely used, techniques is 2-D polyacrylamide gel electrophoresis (2D-PAGE) [1, 2], in which proteins are separated in two dimensions on a gel and identified using the estimated molecular weight and isoelectric point (pI, the pH value at which the net charge of a macromolecule is zero and its electrophoretic mobility therefore vanishes). Unfortunately, 2D-PAGE suffers from several intrinsic technical problems (e.g., it performs poorly for very large, very small, extremely acidic or extremely basic proteins). Therefore, 2D-PAGE has in many cases been replaced by gel-free techniques such as high-throughput mass spectrometry (MS) [3, 4]. Nevertheless, before mass spectrometry is applied, the sample is digested by trypsin into short peptides and then separated by isoelectric focusing into fractions, which reduces the complexity of the MS analysis. Although the molecular techniques for protein analysis have changed, the interpretation of their results in many cases still relies on accurate estimates of pI for reference polypeptides.
For polypeptides, pI depends mostly on the acid dissociation constants (pKa) of the ionizable groups of seven charged amino acids: glutamate (γ-carboxyl group), aspartate (β-carboxyl group), cysteine (thiol group), tyrosine (phenol group), histidine (imidazole side chain), lysine (ε-ammonium group) and arginine (guanidinium group). Additionally, the charges of the amine and carboxyl terminal groups contribute to pI and can greatly affect the pI of short peptides. Overall, the net charge of the protein or peptide is strongly related to the solution (buffer) pH and can be approximated using the Henderson-Hasselbalch equation. It should be kept in mind that the values of dissociation constants used in the calculations are usually derived empirically and can vary substantially depending on the experimental setup, such as temperature or buffer ionic strength (the method presented here, the Isoelectric Point Calculator, is compared to 15 such pKa sets). Alternatively, pKa values or pI can be derived computationally, given large sets of proteins or peptides for which the pI is known. This is the approach presented in this study. The problem of computational prediction of pI has already been addressed by two other research groups using artificial neural networks (ANN) and support vector machines (SVM) [8, 9]. Here, I present the IPC program, which is based on optimization using a basin-hopping procedure. The presented results show that IPC outperforms all currently available algorithms.
Comparison to other algorithms
Prediction of isoelectric point on the 25 % testing datasets
Prediction of isoelectric point on the 75 % training datasets
Prediction of isoelectric points for SWISS-2DPAGE and PIP-DB databases
It should be stressed that the relative difference between the performance of different pKa sets is often small and statistically insignificant (e.g., pI calculated with the Bjellqvist vs. Dawson pKa sets on protein datasets); yet even general knowledge of which pKa sets perform better, and which should be used for a particular type of data (e.g., proteins versus peptides), is not commonly applied (Fig. 3, bottom two panels). Furthermore, the presented results demonstrate that prediction of pI is easier for short peptides than for proteins, as the former contain fewer charged and modified amino acids (e.g., compare the RMSD values between the peptide and protein datasets). Similarly, the dataset on which methods are trained and/or evaluated can lead to different estimates of the RMSD. For example, Fig. 1 shows that PIP-DB contains more outliers and duplicates than SWISS-2DPAGE. This noise in the data leads to almost a doubling of the RMSD (Table 3). Nevertheless, the ranking of the methods is usually preserved.
As mentioned earlier, one of the main limitations of IPC is that it uses a nine-parameter model, which is a highly simplified approximation and does not take into account many aspects of proteins, such as posttranslational modifications. It should be noted that posttranslational modifications occur much more frequently in eukaryotic proteins than in prokaryotic ones; thus, it is interesting to investigate how accurately pI can be predicted in these two groups separately. As illustrated in Additional file 1: Table S1, all pI prediction methods perform better on prokaryotic proteins. This suggests that, when working with eukaryotic proteins, one should keep in mind that pI prediction accuracy may be lower due to possible posttranslational modifications. In such cases, more specialized programs such as ProMoST can be used when the researcher has detailed knowledge about the posttranslational modifications.
An additional source of bias may come from the fact that some proteins have more than one splicing variant, while in this study only the first, major isoform of each protein was used. Thus, the sequences in the dataset may differ from those of the proteins for which pI was actually measured. As illustrated in Additional file 1: Table S2, most of the analyzed proteins possess only one isoform (2,106 out of 2,254 in the protein dataset, 93.6 % of cases), and when the data are reanalyzed using only those proteins, the results are virtually identical. It should be stressed that, even for proteins with more than one splicing isoform, it is highly unlikely that the authors worked with, and then reported pI for, the less abundant minor isoforms.
Moreover, it is easy to notice that the new pKa values presented here differ from those derived earlier experimentally. One should remember that even the experimental setup can have a strong impact on the results. For instance, the pKa values obtained by Thurlkill et al. were measured using alanine pentapeptides with the charged residue in the center. This was done to minimize the contribution from neighboring residues, but this setup is far from the real situation in proteins (contributions from the side groups of surrounding non-alanine residues, posttranslational modifications, etc.). Thus, the optimized pKa values can be seen as more precise, as they indirectly take such complexity into account. Additional file 1: Table S3 lists the average pKa values from previously used scales compared to the IPC values. On the peptide dataset, most of the differences are due to the terminal residues, which could be expected because in peptides the terminal charges can constitute a large proportion of the overall charge; the N-terminus pKa value in previous studies was underestimated, while the C-terminus pKa was overestimated, in comparison to the IPC values. For proteins, on the other hand, the main differences are observed for cysteine, reflecting a possible contribution from disulfide bridges, and for lysine, histidine, and tyrosine, which are frequently posttranslationally modified. This effect is less pronounced for arginine (also frequently modified), but it should be noted that arginine is bigger and contains more charged groups; thus, the effect of modification (if it exists) is most likely smaller.
The new, computationally optimized pKa sets presented here can be considered an important improvement in isoelectric point estimation based only on sequence information. IPC has been compared to numerous methods, including 15 other pKa sets, two machine learning approaches and the consensus. The datasets used in the study were cross-validated during training, and performance was additionally measured on 25 % subsets not used during training. In all cases, IPC produced superior results. For instance, the performance of the isoelectric point prediction algorithms measured on proteins derived from different databases (Table 3) differs in absolute value (the measurements were done on different proteins), but the overall order of methods in the benchmark stays almost the same, with IPC leading in all cases. The same is true if the datasets are divided according to the organism from which the proteins come (for details see Additional file 1: Figure S1). As expected, for all methods the prediction accuracy is lower for eukaryotic proteins, as they are frequently posttranslationally modified, in contrast to prokaryotic proteins, in which posttranslational modifications are less abundant (Additional file 1: Table S1). As there is no information about posttranslational modifications in the databases used (SWISS-2DPAGE and PIP-DB), it was not possible to investigate this issue in more detail. Yet, both the separation of proteins into eukaryotic vs. prokaryotic and the detailed analysis of the new pKa values show that the potential bias coming from posttranslational modifications was partially incorporated during the optimization procedure, which changed the pKa values mostly for frequently modified amino acids.
To the author’s knowledge, the IPC web server is the only website on which the protein isoelectric point can be predicted using so many different pKa sets, including the two new ones presented here. Accurate estimation of the isoelectric point is frequently used for the identification of proteins during 2D-PAGE and mass spectrometry. Moreover, knowledge of the isoelectric point can be useful during crystallization trials.
Isoelectric point, Henderson–Hasselbalch equation, pKa values for the ionizable groups of proteins
The isoelectric point (pI) is the pH at which the net charge of a protein is zero. For polypeptides, the isoelectric point depends primarily on the dissociation constants (pKa) of the ionizable groups of seven charged amino acids: glutamate (γ-carboxyl group), aspartate (β-carboxyl group), cysteine (thiol group), tyrosine (phenol group), histidine (imidazole side chain), lysine (ε-ammonium group) and arginine (guanidinium group). Moreover, the charge of the terminal groups (NH2 and COOH) can greatly affect the pI of short peptides. Generally, the Glu, Asp, Cys, and Tyr ionizable groups are uncharged below their pKa and negatively charged above their pKa. Similarly, the His, Lys, and Arg ionizable groups are positively charged below their pKa and uncharged above their pKa. This has certain implications. For example, during electrophoresis, the direction of protein migration on the gel depends on the charge. If the buffer pH (and as a result, the gel pH) is higher than the protein isoelectric point, the protein is negatively charged and migrates to the anode (positive electrode); if the buffer pH is lower than the isoelectric point, the protein is positively charged and migrates to the cathode (negative electrode). When the gel pH and the protein isoelectric point are equal, the protein stops migrating.
Overall, the net charge of the protein or peptide is related to the solution (buffer) pH. We can use the Henderson-Hasselbalch equation  to calculate the charge at a certain pH:
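In the standard Henderson-Hasselbalch form, the two partial charges can be written as follows. This is a reconstruction under the assumption that Eq. 1 covers the negatively charged groups and Eq. 2 the positively charged ones; the symbols pK_n and pK_p are labels chosen here for illustration.

```latex
% Eq. 1: negatively charged groups (C-terminus, Asp, Glu, Cys, Tyr)
\begin{equation}
  Q_{-}(\mathrm{pH}) = \sum_{i=1}^{n} \frac{-1}{1 + 10^{\,pK_{n_i} - \mathrm{pH}}}
\end{equation}
% Eq. 2: positively charged groups (N-terminus, His, Lys, Arg)
\begin{equation}
  Q_{+}(\mathrm{pH}) = \sum_{j=1}^{m} \frac{1}{1 + 10^{\,\mathrm{pH} - pK_{p_j}}}
\end{equation}
```

With this sign convention, a negative group approaches a charge of -1 well above its pKa and 0 well below it, while a positive group approaches +1 well below its pKa and 0 well above it, in agreement with the behaviour described for the individual residues.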
The charge of a macromolecule at a given pH is the sum of the positive and negative charges of the individual amino acids given by Eqs. 1 and 2. When the pKa values are set, the only variable in the equations is the pH of the buffer, and by iteratively changing the pH, we can easily calculate the isoelectric point. The result will almost certainly differ from the real isoelectric point because many proteins are chemically modified (e.g., amino acids can be phosphorylated, methylated, acetylated), which can change their charge. The occurrence of cysteines (negative charge), which may oxidize and lose their charge when they form disulfide bonds in the protein, is also problematic. Moreover, one must consider the exposure of charged residues to solvent, dehydration (Born effect), charge-dipole interactions (hydrogen bonds), and charge-charge interactions.
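The summation described above can be sketched in a few lines of Python. This is a minimal illustration, not the IPC source code: it assumes the standard Henderson-Hasselbalch form (negative groups contribute -1/(1 + 10^(pKa - pH)), positive groups +1/(1 + 10^(pH - pKa))), and the pKa values are illustrative textbook-style numbers rather than the optimized IPC sets. The function name `net_charge` and the dictionary layout are hypothetical.

```python
# Illustrative pKa values (assumed for this sketch; NOT the optimized IPC sets).
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.0}
PKA_POSITIVE = {"Nterm": 9.69, "H": 6.0, "K": 10.5, "R": 12.4}


def net_charge(sequence, ph, pka_neg=PKA_NEGATIVE, pka_pos=PKA_POSITIVE):
    """Net charge of a peptide at a given pH (sum of the negative and positive parts)."""
    # Terminal groups: every chain has exactly one C-terminus and one N-terminus.
    charge = -1.0 / (1.0 + 10 ** (pka_neg["Cterm"] - ph))
    charge += 1.0 / (1.0 + 10 ** (ph - pka_pos["Nterm"]))
    for residue in sequence.upper():
        if residue in pka_neg:      # Asp, Glu, Cys, Tyr side chains
            charge += -1.0 / (1.0 + 10 ** (pka_neg[residue] - ph))
        elif residue in pka_pos:    # His, Lys, Arg side chains
            charge += 1.0 / (1.0 + 10 ** (ph - pka_pos[residue]))
    return charge


if __name__ == "__main__":
    for ph in (2.0, 7.0, 12.0):
        print(ph, round(net_charge("ACDEHKRY", ph), 3))
```

Because each term decreases monotonically with pH, the net charge crosses zero exactly once, which is what makes a simple iterative search for the pI possible.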
Most commonly used pKa values for the ionizable groups of proteins. Note that Bjellqvist and ProMoST use different amounts of additional pKa values (not shown), which take into account the relative position of the ionized group (whether it is located on the N- or C- terminus or in the middle). For more details, see References 4 and 5 and the “Theory” section on the IPC web site
The IPC peptide pKa set was optimized using peptides from three high-throughput experiments:
5,758 unmodified peptides from Gauci et al. – peptides from zebrafish lysate fractionated using isoelectric focusing
PHENYX dataset (7,582 peptides) – peptides from the Drosophila Kc167 cell line fractionated using isoelectric focusing on an off-gel electrophoresis device
SEQUEST dataset (7,629 peptides) – peptides from the Drosophila Kc167 cell line fractionated using isoelectric focusing on an off-gel electrophoresis device
The IPC protein pKa set was optimized using proteins from two databases:
SWISS-2DPAGE, release 19.2 (2,530 proteins) – based on literature data about pI linked to UniProt accession numbers
PIP-DB (4,947 entries) – based on literature data; provides pI and sequence information for about half of the records (for details see Table 5)
Table 5. Detailed statistics for the available datasets
Initial no. entries
No. entries with sequence and pI
No. entries after removing outliers
No. entries after removing duplicates
Gauci et al.
16,882  
2,324  
First, the raw data from the individual datasets were parsed into a unified FASTA format, with information about the isoelectric point stored in the headers. Next, the protein datasets and the peptide datasets were merged into two datasets (IPC_protein and IPC_peptide, respectively). The data were carefully validated; e.g., if multiple experimental pI values were reported, the average was used. For SWISS-2DPAGE, the first, major splicing form of the protein (the most widely expressed), taken from UniProt, was used. No information about the experimental methods used to obtain the isoelectric points, or about their specificity, was used directly during this study. Similarly, as information about posttranslational modifications (PTMs) is not included directly in SWISS-2DPAGE and PIP-DB, it was not possible to investigate the contribution of PTMs to pI in detail, and they were assumed to be absent. Outliers representing possible annotation errors in the databases were removed (proteins with a mean standard error (MSE) > 3 between the experimental isoelectric point and the average predicted pI; note that under this cutoff no peptides were removed; it should be stressed that the removed outliers do not differ from other proteins with respect to amino acid content, predicted protein disorder and secondary structure; for details see Additional file 1: Table S4). Next, redundant data were removed using CD-HIT (a 0.99 sequence identity threshold was used; such a high threshold is adequate here because even single mutations of charged residues can lead to dramatic changes in pI; moreover, other sequence identity thresholds gave similar results; data not shown). This step also removed duplicates (multiple entries assigned to the same sequence coming from two different databases). Finally, a randomly chosen 25 % of the proteins and peptides were set aside for final testing, and the remaining 75 % were used for 10-fold cross-validated training.
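Two of the preparation steps above, averaging multiple pI measurements and splitting the data into training and testing parts, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `(sequence, pI)` record layout, the fixed random seed and the toy values are hypothetical, and the outlier filtering and CD-HIT clustering steps are not reproduced here.

```python
import random
from collections import defaultdict


def average_pi(records):
    """Collapse repeated (sequence, pI) measurements into one mean pI per sequence."""
    grouped = defaultdict(list)
    for sequence, pi in records:
        grouped[sequence].append(pi)
    return {seq: sum(values) / len(values) for seq, values in grouped.items()}


def split_train_test(items, test_fraction=0.25, seed=0):
    """Randomly set aside a fraction of the items for final testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]   # (training, testing)


def k_folds(items, k=10):
    """Partition the training items into k folds for cross-validation."""
    return [items[i::k] for i in range(k)]


if __name__ == "__main__":
    toy = [("AAAK", 5.17), ("AAAK", 5.27), ("DDEE", 3.9)]   # made-up values
    print(average_pi(toy))
    train, test = split_train_test(range(100))
    print(len(train), len(test), [len(f) for f in k_folds(train)])
```

Each fold in turn serves as the held-out partition while the remaining k-1 folds are used for optimization.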
Calculation of the isoelectric point
As noted before, the isoelectric point is determined by iteratively calculating the sum of Eqs. 1 and 2 for the individual charged groups at a given pH. The calculation can be performed exhaustively, but this would not be practical. Instead, the bisection algorithm is used, which halves the search space in each iteration: initially, the pH is set to 7 and then moved higher or lower by 3.5 (half of 7), depending on the sign of the charge. In the next iteration, the pH is changed by 1.75 (half of 3.5), and so on. This process is repeated until the algorithm reaches the desired precision. Bisection improves the speed by 3–4 orders of magnitude, and after approximately a dozen iterations, the algorithm converges with 0.001 precision. A further speed improvement can be obtained by starting the search from a rough approximation of the solution rather than from 7 (in this case, a pH of 6.68 was used, which is the average isoelectric point of approximately 318,000 proteins taken from the SwissProt database, clustered at a 90 % sequence identity threshold).
To measure the performance, two metrics were used, i.e., the root-mean-square deviation (RMSD) and the number of outliers, defined as pI predictions with a mean standard error (MSE) larger than a given threshold in comparison with the experimental pI. To remove potential outliers, an MSE threshold of 3 was used for the protein datasets and 0.25 for the peptide datasets. Moreover, for the preliminary analysis, the Pearson correlation was used.
The optimization procedure was designed to obtain nine optimal pKa values (corresponding to the N- and C-termini and the C, D, E, H, K, R, and Y side chains). The cost function was defined as the root-mean-square deviation (RMSD) between the experimental isoelectric points from the available datasets and those calculated using the new pKa set(s). Optimization was performed using a basin-hopping procedure, which uses a standard Monte Carlo algorithm with the Metropolis criterion to decide whether to accept a new solution. The previously published pKa values were used as the initial seeds. To limit the search space, a truncated Newton algorithm was used, with bounds of 2 pH units on the pKa variables (e.g., if the starting point for the Cys pKa was 8.5, the solution was allowed in the interval [6.5, 10.5]). The optimization was run iteratively multiple times using intermediate pKa sets until the algorithm converged and no better solutions could be found. To avoid overfitting, both the IPC_protein and IPC_peptide datasets were randomly divided into 75 % training datasets (used for pKa optimization) and 25 % testing datasets (not used during optimization). During training, nested 10-fold cross-validation was used. Thus, IPC was optimized on k-1 partitions and tested on the remaining partition. The training was repeated ten times, once for each held-out partition. The resulting pKa sets were averaged. In general, this process resulted in slower convergence of the algorithm and a longer training time but prevented overfitting. Apart from the nine-parameter model (nine pKa values for the charged residues and termini), more advanced models similar to Bjellqvist and ProMoST were also tested. Their performance was at a similar level; thus, the simpler, nine-parameter model was used in the final version of IPC.
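The step-halving search described above can be sketched as follows. This is a minimal illustration rather than the IPC implementation: it assumes the standard Henderson-Hasselbalch charge terms, uses illustrative textbook-style pKa values (not the optimized IPC sets), and the names `net_charge` and `isoelectric_point` are hypothetical.

```python
# Illustrative pKa values (assumed for this sketch; NOT the optimized IPC sets).
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.0}
PKA_POSITIVE = {"Nterm": 9.69, "H": 6.0, "K": 10.5, "R": 12.4}


def net_charge(sequence, ph):
    """Net charge at a given pH: one C-terminus, one N-terminus, plus side chains."""
    charge = -1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    for residue in sequence.upper():
        if residue in PKA_NEGATIVE:
            charge += -1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
        elif residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
    return charge


def isoelectric_point(sequence, start_ph=7.0, first_step=3.5, precision=0.001):
    """Move the pH up or down by a step that is halved after every iteration."""
    ph, step = start_ph, first_step
    iterations = 0
    while step > precision:          # stop once the step drops below the precision
        if net_charge(sequence, ph) > 0:
            ph += step               # still positive: the pI lies at a higher pH
        else:
            ph -= step               # negative (or zero): the pI lies at a lower pH
        step /= 2.0
        iterations += 1
    return ph, iterations


if __name__ == "__main__":
    for peptide in ("DDDD", "ACDEKR", "KKKK"):
        print(peptide, isoelectric_point(peptide))
```

With the defaults above the loop runs 12 times, in line with the "approximately a dozen iterations" mentioned in the text. Passing `start_ph=6.68` would mimic the faster starting point described above.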
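Both metrics are simple to compute once predicted and experimental values are paired. The sketch below is an illustration only; in particular, it assumes that an outlier is a prediction whose absolute deviation from the experimental pI exceeds the threshold (in pH units), which is one possible reading of the criterion described above. The function names and the toy values are hypothetical.

```python
import math


def rmsd(predicted, experimental):
    """Root-mean-square deviation between predicted and experimental pI values."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def count_outliers(predicted, experimental, threshold):
    """Number of predictions deviating from experiment by more than `threshold`.

    Assumption: deviation is measured as the absolute difference in pH units.
    """
    return sum(1 for p, e in zip(predicted, experimental) if abs(p - e) > threshold)


if __name__ == "__main__":
    predicted = [5.0, 6.5, 9.0]        # made-up values
    experimental = [5.2, 6.0, 5.5]     # made-up values
    print(round(rmsd(predicted, experimental), 3))
    print(count_outliers(predicted, experimental, threshold=3.0))
```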
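The overall shape of such an optimization can be sketched in pure Python. This is a toy illustration under several loud assumptions, not the procedure actually used for IPC: the local minimizer is a simple bounded coordinate search standing in for the truncated Newton algorithm, the seed pKa values are illustrative textbook-style numbers, the peptide data are made up, and all function names are hypothetical.

```python
import math
import random

NEG = ("Cterm", "D", "E", "C", "Y")    # negatively charged groups
POS = ("Nterm", "H", "K", "R")         # positively charged groups
ORDER = NEG + POS                      # the nine optimized parameters


def predict_pi(seq, pka, precision=0.01):
    """pI by step-halving search, using Henderson-Hasselbalch charge terms."""
    counts = {g: seq.count(g) for g in ORDER}
    counts["Cterm"] = counts["Nterm"] = 1

    def charge(ph):
        neg = sum(-counts[g] / (1 + 10 ** (pka[g] - ph)) for g in NEG)
        pos = sum(counts[g] / (1 + 10 ** (ph - pka[g])) for g in POS)
        return neg + pos

    ph, step = 7.0, 3.5
    while step > precision:
        ph += step if charge(ph) > 0 else -step
        step /= 2
    return ph


def cost(values, data):
    """RMSD between predicted and experimental pI for one candidate pKa set."""
    pka = dict(zip(ORDER, values))
    return math.sqrt(sum((predict_pi(s, pka) - pi) ** 2 for s, pi in data) / len(data))


def local_search(values, data, lower, upper, step=0.25, rounds=3):
    """Bounded coordinate search (stand-in for a truncated Newton minimizer)."""
    best, best_cost = list(values), cost(values, data)
    for _ in range(rounds):
        for i in range(len(best)):
            for delta in (-step, step):
                trial = list(best)
                trial[i] = min(upper[i], max(lower[i], trial[i] + delta))
                trial_cost = cost(trial, data)
                if trial_cost < best_cost:
                    best, best_cost = trial, trial_cost
        step /= 2
    return best, best_cost


def basin_hopping(seed, data, n_iter=20, temperature=0.05, jump=0.5, bound=2.0):
    """Random hop, local minimization, then Metropolis acceptance."""
    rng = random.Random(0)
    lower = [v - bound for v in seed]
    upper = [v + bound for v in seed]
    current, current_cost = local_search(seed, data, lower, upper)
    best, best_cost = current, current_cost
    for _ in range(n_iter):
        hopped = [min(u, max(l, v + rng.uniform(-jump, jump)))
                  for v, l, u in zip(current, lower, upper)]
        cand, cand_cost = local_search(hopped, data, lower, upper)
        accept = cand_cost < current_cost or \
            rng.random() < math.exp((current_cost - cand_cost) / temperature)
        if accept:
            current, current_cost = cand, cand_cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost


# Illustrative seed pKa values and made-up (sequence, pI) pairs.
SEED = [2.34, 3.86, 4.25, 8.33, 10.0, 9.69, 6.0, 10.5, 12.4]
TOY_DATA = [("ACDEK", 4.2), ("KKRH", 11.0), ("DDEY", 3.3), ("GHKC", 8.5)]

if __name__ == "__main__":
    optimized, final_cost = basin_hopping(SEED, TOY_DATA)
    print(round(cost(SEED, TOY_DATA), 3), "->", round(final_cost, 3))
```

For real use, SciPy's `scipy.optimize.basinhopping` accepts a bounded local minimizer through `minimizer_kwargs` (for example `{"method": "TNC", "bounds": ...}`), which matches the combination of basin-hopping and a truncated Newton algorithm described above more closely than this coordinate search.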
Reviewer’s report 1
Frank Eisenhaber, Bioinformatics, A*STAR’s Biomedical Sciences Institute
The author reviews the state of the art in the pI computation from protein sequence, provides an improved software tool and presents a WWW site with lots of related information, a WWW server and the software download.
Reviewer recommendations to authors
This is a very carefully prepared MS that can be published as is.
Authors’ response: I thank the reviewer for highlighting the general interest of the presented tool and for his positive reaction to the manuscript.
Reviewer’s report 2
Zoltán Gáspári, Pazmany University, Budapest
The manuscript describes a novel set of pKa values for peptides and proteins. The set can be used to estimate the isoelectric point of these macromolecules. The problem is of importance in protein/peptide studies and improvements in the pKa data sets used can be useful.
Authors’ response: I thank the reviewer for his supportive comments of the study and for highlighting the general interest of presented findings. I have made a concerted effort to address all of his concerns.
Reviewer recommendations to authors
It is really interesting that the prediction works better for prokaryotic than for eukaryotic proteins. Can the author perform a bit more detailed analysis on this topic besides pointing out the role of PTMs? Do the worst outliers exhibit characteristic amino acid distributions? For example, eukaryotic proteomes are abundant in intrinsically disordered proteins, for which the peptide data set might yield better results in some cases.
It could be interesting if the author could give any further insights into the variations of the pKa values in the sets and especially the divergence of the newly suggested values relative to those in the literature. There is already a discussion of this in the manuscript just before the Conclusions section but as it is both an important and an interesting aspect, the manuscript might benefit from a more detailed analysis of this question.
Authors’ response: I decided that a longer discussion of this topic would be too technical and too speculative; it would not change the results, and I doubt that it would be interesting for a broad readership. Additionally, it would not improve the flow of the manuscript (it is rather off topic). Nevertheless, I also think that it is an interesting aspect; thus, I added this information to Additional file 1: Table S3, highlighting the most divergent values and briefly discussing their possible source.
A short description of the origin of the data sets used could also be helpful for the reader.
Authors’ response: The requested information can be found in lines 288–299, in which the Reviewer can read about the organism and technique used for the generation of the peptide sets, and find references to the original studies from which the data were taken. Moreover, all original dataset files are available as hyperlinks from the first column of Table 5 and also from http://isoelectric.ovh.org/datasets.html, in case they are no longer available from their source URLs in the future. For proteins, the information about the experimental technique and the organism is only partially available (see e.g. http://isoelectric.ovh.org/datasets/ch2d19_2.dat). In any case, those data were not used directly during dataset construction or optimization, so as not to favor any technique or organism. For instance, in the protein dataset most proteins come from eukaryotic organisms: 1455 sequences versus 837 sequences from prokaryotes. More detailed data about the organism distribution can be seen in the pie plots in the supplement (Additional file 1: Figure S1). In a nutshell, most of the protein sequences come from human, E. coli, S. aureus, R. norvegicus, M. musculus and yeast. Moreover, PIP-DB is more diverse in this respect, containing data from multiple organisms. Unfortunately, a similar analysis for the methods tag is not possible, as this tag is not very informative (for SWISS-2DPAGE, 2124/2186 entries are tagged as “MAPPING ON GEL”, and for PIP-DB, 2007/2427 entries are tagged as different versions of isoelectric focusing).
I think that the current, brief description that the Reviewer can find in lines 288–299 is sufficient; more detailed descriptions of the methods from the original studies are out of the scope of the presented manuscript and would extend it unnecessarily with minor benefit for the readers.
The author states that when multiple data were available for the isoelectric point, the average was taken. It would be nice to know how divergent these data were and whether the author has any hints on whether this affects the performance in any detectable way.
Authors’ response: The information about the divergence is available in the headers of the fasta files e.g. http://ipc.netmark.pl/datasets/pip_ch2d19_2_1st_isoform_outliers_3units_cleaned_0.99.fasta contains:
This record comes from the SWISS-2DPAGE database, and the header means that two pI measurements are known: 5.17 and 5.27. Moreover, the reported molecular weights (55.1 and 54.8 kDa) differ from the predicted 53.9 kDa, which could indicate that this sequence contains posttranslational modifications that may or may not influence the isoelectric point (neither SWISS-2DPAGE nor PIP-DB contains information about the modifications, but indirectly they can be seen as a molecular weight increase). Another possible bias may come from the technique used for measuring pI and molecular weight, or from random factors between measurements.
include as many measurements as possible preferably coming from different databases
use the average of the measurements
as even after averaging the pI for some sequences deviates strongly from the average predicted pI (Fig. 1), I decided to investigate how much of this could be explained by possible annotation errors in the databases. I re-checked randomly selected records with the biggest deviation between experimental and theoretical pI, together with their source publications, until I stopped finding obvious annotation errors (in this way I set the threshold of MSE > 3 for removing outliers).
In the 10-fold cross-validation process, how divergent were the resulting pKa sets that were averaged? What is the relation of this divergence to the diversity in the other data sets?
Authors’ response: From the observed divergence I would rather speculate that the landscape of the search space is quite flat, with multiple local minima. There are many possible sets of nine pKa values which produce only slightly worse results. Therefore, the optimization was run 2,000 times to explore different regions of the search space, and each local minimum was refined by basin-hopping.
Authors’ response: Done as suggested; in both cases, adding extra pKa values not included in the original studies improved the results. Given the initial results from the six-parameter Patrickios model, it was obvious that skipping Arg or the terminal charges would have a detrimental effect on performance; thus, I decided to add them ad hoc. These values were taken as the average of a few scales, or from the most similar scale I knew at the time (initially only 6–7 scales were used, but over the years I implemented more and more scales).
The language of the manuscript needs careful revision. Most of the concepts can be deduced from the present version but the phrasing should be done with more care. So, although I think that the paper can be understood in its present form, I strongly recommend extensive language editing before final publication. Some examples: - “nine parametric model” for me would mean nine distinct models which are all parametric. Maybe the term “nine-parameter model” would be more appropriate (meaning a single model with 9 parameters). - “Basin-Hopping”: as this does not refer to names, simply “basin-hopping” can be written. - page 8, lines 203 and 233: instead of positively and negatively charged macromolecules, the author means residues here?
For the additional FASTA files some explanation of the information in the headers would be welcome.
Authors’ response: As requested, I added to the beginning of all FASTA files more information about the content of the headers and how they should be interpreted (available as hyperlinks from the three right-hand columns of Table 5 and also from http://isoelectric.ovh.org/datasets.html; 19 files in total). Although the headers could be simplified, and in the current version they may take different forms depending on the source they come from, I decided to leave them as they are (even if they are sometimes hard to understand immediately), as this makes it easy to check the correctness of the parsing against the original files.
2D-PAGE: 2-D polyacrylamide gel electrophoresis
ANN: Artificial neural networks
IPC: Isoelectric point calculator
MSE: Mean standard error
PDB: Protein data bank
pI: Isoelectric point
pKa: Dissociation constant
R2:
SVM: Support vector machines
LPK acknowledges all authors of previous works related to different pKa sets and datasets, especially the developers of the SWISS-2DPAGE database. The author also thanks Yasset Perez-Riverol for assistance with the pIR package and Vladlen Skvortsov for assistance with the pIPredict program. Additionally, LPK would like to thank all members of the Soeding lab for fruitful discussions.
Availability of data and materials
LPK conceived and developed the study, analyzed and interpreted the experiments, and wrote the article.
IPC usage is limited to academic and non-profit users as described in http://isoelectric.ovh.org/license.txt.
Consent for publication
Ethics approval and consent to participate
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- O’Farrell PH. High resolution two-dimensional electrophoresis of proteins. J Biol Chem. 1975;250(10):4007–21.PubMedPubMed CentralGoogle Scholar
- Klose J. Protein mapping by combined isoelectric focusing and electrophoresis of mouse tissues. Humangenetik. 1975;26(3):231–43.PubMedGoogle Scholar
- Righetti PG, Castagna A, Herbert B, Reymond F, Rossier JS. Prefractionation techniques in proteome analysis. Proteomics. 2003;3(8):1397–407.View ArticlePubMedGoogle Scholar
- Heller M, Ye M, Michel PE, Morier P, Stalder D, Jünger MA, Aebersold R, Reymond F, Rossier JS. Added value for tandem mass spectrometry shotgun proteomics data validation through isoelectric focusing of peptides. J Proteome Res. 2005;4(6):2273–82.View ArticlePubMedGoogle Scholar
- Pace CN, Grimsley GR, Scholtz JM. Protein ionizable groups: pK values and their contribution to protein stability and solubility. J Biol Chem. 2009;284(20):13285–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Po HN, Senozan NM. The Henderson-Hasselbalch Equation: Its History and Limitations. J Chem Educ. 2001;78(11):1499.View ArticleGoogle Scholar
- Skvortsov VS, Alekseytchuk NN, Khudyakov DV, Romero Reyes IV. pIPredict: a computer tool for prediction of isoelectric points of peptides and proteins. Biochem (Mosc) Suppl Series B: Biomed Chem. 2015;9(3):296–303.View ArticleGoogle Scholar
- Perez-Riverol Y, Audain E, Millan A, Ramos Y, Sanchez A, Vizcaino JA, Wang R, Muller M, Machado YJ, Betancourt LH, et al. Isoelectric point optimization using peptide descriptors and support vector machines. J Proteome. 2012;75(7):2269–74.View ArticleGoogle Scholar
- Audain E, Ramos Y, Hermjakob H, Flower DR, Perez-Riverol Y. Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences. Bioinformatics. 2016;32(6):821–7.
- Wales DJ, Doye JP. Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms. J Phys Chem A. 1997;101(28):5111–6.
- Kiraga J, Mackiewicz P, Mackiewicz D, Kowalczuk M, Biecek P, Polak N, Smolarczyk K, Dudek MR, Cebrat S. The relationships between the isoelectric point and: length of proteins, taxonomy and ecology of organisms. BMC Genomics. 2007;8(1):163.
- Weiller GF, Caraux G, Sylvester N. The modal distribution of protein isoelectric points reflects amino acid properties rather than sequence evolution. Proteomics. 2004;4(4):943–9.
- Oren A. Microbial life at high salt concentrations: phylogenetic and metabolic diversity. Saline Syst. 2008;4(1):1–13.
- Kirkwood J, Hargreaves D, O’Keefe S, Wilson J. Using isoelectric point to determine the pH for initial protein crystallization trials. Bioinformatics. 2015;31(9):1444–51.
- Grimsley GR, Scholtz JM, Pace CN. A summary of the measured pK values of the ionizable groups in folded proteins. Protein Sci. 2009;18(1):247–51.
- Bjellqvist B, Basse B, Olsen E, Celis JE. Reference points for comparisons of two‐dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions. Electrophoresis. 1994;15(1):529–39.
- Halligan BD, Ruotti V, Jin W, Laffoon S, Twigger SN, Dratz EA. ProMoST (Protein Modification Screening Tool): a web-based tool for mapping protein modifications on two-dimensional gels. Nucleic Acids Res. 2004;32 suppl 2:W638–44.
- Gauci S, van Breukelen B, Lemeer SM, Krijgsveld J, Heck AJ. A versatile peptide pI calculator for phosphorylated and N-terminal acetylated peptides experimentally tested using peptide isoelectric focusing. Proteomics. 2008;8(23-24):4898–906.
- Hoogland C, Mostaguir K, Sanchez JC, Hochstrasser DF, Appel RD. SWISS‐2DPAGE, ten years later. Proteomics. 2004;4(8):2352–6.
- Bunkute E, Cummins C, Crofts FJ, Bunce G, Nabney IT, Flower DR. PIP-DB: the protein isoelectric point database. Bioinformatics. 2015;31(2):295–6.
- The UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43(D1):D204–12.
- Yang ZR, Thomson R, McNeil P, Esnouf RM. RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics. 2005;21(16):3369–76.
- McGuffin LJ, Bryson K, Jones DT. The PSIPRED protein structure prediction server. Bioinformatics. 2000;16(4):404–5.
- Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9.
- Chapra SC, Canale RP. Numerical methods for engineers. New York: McGraw-Hill Companies, Inc; 2007. http://www.mheducation.com/highered/product/numerical-methods-engineers-chapra-canale/M007339792X.html.
- The UniProt Consortium. The universal protein resource (UniProt) in 2010. Nucleic Acids Res. 2010;38 suppl 1:D142–8.
- Byrd RH, Lu P, Nocedal J, Zhu C. A limited memory algorithm for bound constrained optimization. SIAM J Sci Comput. 1995;16(5):1190–208.
- Bengio Y, Grandvalet Y. No unbiased estimator of the variance of K-Fold cross-validation. J Mach Learn Res. 2004;5:1089–105.
- Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000;16(6):276–7.
- Tabb DL, McDonald WH, Yates JR. DTASelect and contrast: tools for assembling and comparing protein identifications from shotgun proteomics. J Proteome Res. 2002;1(1):21–6.
- Solomons TG. Organic chemistry. USA: John Wiley & Sons; 1992. http://eu.wiley.com/WileyCDA/WileyTitle/productCd-EHEP003468.html.
- Sillero A, Ribeiro JM. Isoelectric points of proteins: theoretical determination. Anal Biochem. 1989;179(2):319–25.
- Rodwell JD. Heterogeneity of component bands in isoelectric focusing patterns. Anal Biochem. 1982;119(2):440–9.
- Patrickios CS, Yamasaki EN. Polypeptide amino acid composition and isoelectric point. II. Comparison between experiment and theory. Anal Biochem. 1995;231(1):82–91.
- Nelson DL, Lehninger AL, Cox MM. Lehninger principles of biochemistry. New York: Macmillan Learning; 2008. http://www.macmillanlearning.com/Catalog/product/lehningerprinciplesofbiochemistry-sixthedition-nelson#tab.
- Toseland CP, McSparron H, Davies MN, Flower DR. PPD v1.0 - an integrated, web-accessible database of experimentally determined protein pK(a) values. Nucleic Acids Res. 2006;34(Database issue):D199–203.
- Thurlkill RL, Grimsley GR, Scholtz JM, Pace CN. pK values of the ionizable groups of proteins. Protein Sci. 2006;15(5):1214–8.
- Nozaki Y, Tanford C. The solubility of amino acids and two glycine peptides in aqueous ethanol and dioxane solutions: establishment of a hydrophobicity scale. J Biol Chem. 1971;246(7):2211–7.
- Dawson RMC. Data for biochemical research. Oxford: Clarendon Press; 1989. https://global.oup.com/academic/product/data-for-biochemical-research-9780198552994?cc=de&lang=en&.
- Cherry JM, Hong EL, Amundsen C, Balakrishnan R, Binkley G, Chan ET, Christie KR, Costanzo MC, Dwight SS, Engel SR, et al. Saccharomyces genome database: the genomics resource of budding yeast. Nucleic Acids Res. 2012;40(D1):D700–5.