Table 2 Phase I cross-validation results

From: Diverse approaches to predicting drug-induced liver injury using gene-expression profiles

Algorithm                 Accuracy      Sensitivity   Specificity   MCC
                          PC3    MCF7   PC3    MCF7   PC3    MCF7   PC3    MCF7
Multilayer Perceptron     0.63   0.65   0.69   0.69   0.32   0.35   0.01   0.03
Gradient Boosting         0.67   0.60   0.69   0.67   0.39   0.27   0.04   −0.05
K-nearest Neighbor        0.68   0.64   0.70   0.72   0.50   0.41   0.11   0.12
Logistic Regression       0.70   0.62   0.72   0.68   0.57   0.27   0.20   −0.04
Gaussian Naïve Bayes      0.35   0.35   0.71   0.73   0.32   0.32   0.02   0.03
Random Forest             0.66   0.70   0.69   0.72   0.33   0.54   0.01   0.19
Support Vector Machines   0.68   0.68   1.00   1.00   n/a    n/a    n/a    n/a
Voting-based Ensemble     0.68   0.67   0.69   0.69   0.44   0.33   0.06   0.01
These results indicate how each classification algorithm performed on the training set after hyperparameter tuning. Overall, the Logistic Regression and Random Forest algorithms performed best, so we selected these for submission to the challenge. The voting-based ensemble never outperformed all of the individual algorithms, yet it never performed worse than all of them either. Thus we also constructed a submission for the challenge based on this classifier. PC3 and MCF7 are the names of prostate- and breast-cancer cell lines, respectively. Bolded values indicate relatively strong performance for the three algorithms we selected in Phase I. MCC = Matthews Correlation Coefficient. We were unable to calculate specificity or MCC for the Support Vector Machines algorithm because it predicted all cell lines to have the same class label.
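
For readers who want to see how the four reported metrics relate to one another, the following is a minimal sketch that computes them from the counts of a binary confusion matrix. It is not the authors' code: the function name `binary_metrics` and the example counts are hypothetical, chosen only for illustration. A metric is returned as `None` when its denominator is zero.

```python
from math import sqrt
from typing import Dict, Optional


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """Compute accuracy, sensitivity, specificity and MCC from confusion-matrix counts.

    Hypothetical helper for illustration; a metric is None when its
    denominator is zero and the value is therefore undefined.
    """

    def ratio(numerator: float, denominator: float) -> Optional[float]:
        # Guard against division by zero instead of raising an error.
        return numerator / denominator if denominator else None

    total = tp + fp + tn + fn
    # The MCC denominator is zero whenever any row or column of the
    # confusion matrix sums to zero.
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {
        "accuracy": ratio(tp + tn, total),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Illustrative counts only (not taken from the study).
mixed = binary_metrics(tp=40, fp=10, tn=30, fn=20)

# A classifier that assigns every sample to the positive class.
single_class = binary_metrics(tp=68, fp=32, tn=0, fn=0)
```

In the second call the MCC denominator is zero, so `single_class["mcc"]` is `None`. This is consistent with the missing MCC entries for Support Vector Machines, which assigned every sample the same class label.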