Table 2 Phase I cross-validation results

From: Diverse approaches to predicting drug-induced liver injury using gene-expression profiles

 

                          Accuracy       Sensitivity    Specificity    MCC
                          PC3    MCF7    PC3    MCF7    PC3    MCF7    PC3    MCF7
Multilayer Perceptron     0.63   0.65    0.69   0.69    0.32   0.35    0.01   0.03
Gradient Boosting         0.67   0.60    0.69   0.67    0.39   0.27    0.04   −0.05
K-nearest Neighbor        0.68   0.64    0.70   0.72    0.50   0.41    0.11   0.12
Logistic Regression       0.70   0.62    0.72   0.68    0.57   0.27    0.20   −0.04
Gaussian Naïve Bayes      0.35   0.35    0.71   0.73    0.32   0.32    0.02   0.03
Random Forest             0.66   0.70    0.69   0.72    0.33   0.54    0.01   0.19
Support Vector Machines   0.68   0.68    1.00   1.00    –      –       –      –
Voting-based Ensemble     0.68   0.67    0.69   0.69    0.44   0.33    0.06   0.01

  1. These results indicate how each classification algorithm performed on the training set after hyperparameter tuning. Overall, the Logistic Regression and Random Forest algorithms performed best, so we selected these for submission to the challenge. The voting-based ensemble never outperformed all of the individual algorithms, yet it also never performed worse than all of them; we therefore constructed an additional submission to the challenge based on this classifier. PC3 and MCF7 are prostate- and breast-cancer cell lines, respectively. Bolded values indicate relatively strong performance for the three algorithms we selected in Phase I. MCC = Matthews Correlation Coefficient. We were unable to calculate specificity or MCC for the Support Vector Machines algorithm because it predicted all cell lines to have the same class label.
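The MCC caveat in the note can be made concrete with a short sketch of the standard confusion-matrix formulas for the four metrics reported above. This is illustrative only, not the authors' evaluation code; the helper name and the toy labels are hypothetical. The MCC denominator contains the factor (TN + FN), which is zero whenever a model assigns every sample to a single class, so the coefficient cannot be reported for such a predictor.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and MCC for binary labels.

    Hypothetical helper for illustration only (not the authors' code).
    Returns None for any metric whose denominator is zero.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))    # true positives
    tn = int(np.sum(~y_true & ~y_pred))  # true negatives
    fp = int(np.sum(~y_true & y_pred))   # false positives
    fn = int(np.sum(y_true & ~y_pred))   # false negatives

    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None

    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN));
    # the denominator is zero when every prediction is the same class.
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else None
    return accuracy, sensitivity, specificity, mcc

# Toy example: a classifier that labels every compound DILI-positive
# reaches sensitivity 1.0 but has an undefined (None) MCC.
print(binary_metrics([1, 0, 1, 1, 0], [1, 1, 1, 1, 1]))
```

The toy call at the bottom mirrors the degenerate situation described in the note: sensitivity comes out as 1.0 while the MCC denominator collapses to zero and no coefficient can be reported.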