Table 1 Summary of classification algorithms evaluated on the training set

From: Diverse approaches to predicting drug-induced liver injury using gene-expression profiles

Classification algorithm | scikit-learn implementation | Parameters selected after optimization

Multilayer Perceptron | sklearn.neural_network.MLPClassifier | activation = 'relu', alpha = 0.0001, batch_size = 'auto', beta_1 = 0.9, beta_2 = 0.999, early_stopping = False, epsilon = 1e-08, hidden_layer_sizes = (30, 30, 30, 30, 30, 30, 30, 30, 30, 30), learning_rate = 'constant', learning_rate_init = 0.0376, max_iter = 200, momentum = 0.9, nesterovs_momentum = True, power_t = 0.5, random_state = None, shuffle = True, solver = 'adam', tol = 0.0001, validation_fraction = 0.1, warm_start = False

Gradient Boosting | sklearn.ensemble.GradientBoostingClassifier | criterion = 'friedman_mse', init = None, learning_rate = 0.31, loss = 'deviance', max_depth = 3, max_features = None, max_leaf_nodes = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, presort = 'auto', subsample = 1.0, warm_start = False

K-nearest Neighbor | sklearn.neighbors.KNeighborsClassifier | algorithm = 'auto', leaf_size = 30, metric = 'minkowski', metric_params = None, n_neighbors = 8, p = 2, weights = 'distance'

Logistic Regression | sklearn.linear_model.LogisticRegression | C = 1.0, class_weight = None, dual = False, fit_intercept = True, intercept_scaling = 1, max_iter = 100, multi_class = 'ovr', penalty = 'l2', solver = 'lbfgs', tol = 0.0001, warm_start = False

Gaussian Naïve Bayes | sklearn.naive_bayes.GaussianNB | priors = None

Random Forest | sklearn.ensemble.RandomForestClassifier | bootstrap = False, class_weight = None, criterion = 'gini', max_depth = 9, min_samples_split = 2, min_samples_leaf = 1, min_weight_fraction_leaf = 0.0, max_features = 'auto', max_leaf_nodes = 25, min_impurity_decrease = 0.0, min_impurity_split = None, n_estimators = 25, oob_score = False, warm_start = False

Support Vector Machines | sklearn.svm.SVC | C = 1.0, class_weight = None, coef0 = 0.0, decision_function_shape = 'ovr', degree = 3, gamma = 'auto', kernel = 'rbf', max_iter = -1, probability = False, shrinking = True, tol = 0.001

Voting-based Ensemble | sklearn.ensemble.VotingClassifier | flatten_transform = True, voting = 'soft', weights = None

In Phase I, we employed seven classification algorithms and a voting-based method that integrated the predictions of the individual classifiers. The first two columns give the name of each algorithm and the scikit-learn implementation we used. Using an ad hoc approach, we evaluated many hyperparameter combinations via cross-validation on the training set and selected the best-performing combination for each algorithm. Non-default parameter values are shown in bold. Hyperparameters that do not fundamentally affect algorithm behavior, such as the number of parallel jobs, are not shown.
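To make the table concrete, the following is a minimal sketch, not the authors' released code, of how the Phase I classifiers could be instantiated in scikit-learn with the hyperparameters listed above. Only values that differ from the library defaults are passed explicitly, and the defaults assumed here are those of a scikit-learn version contemporaneous with the study.

```python
# Minimal sketch (assumed, not the authors' code): instantiate the Phase I
# classifiers from Table 1, passing only non-default values explicitly.
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

classifiers = [
    ("mlp", MLPClassifier(hidden_layer_sizes=(30,) * 10,  # ten layers of 30 units
                          learning_rate_init=0.0376)),
    ("gb", GradientBoostingClassifier(learning_rate=0.31)),
    ("knn", KNeighborsClassifier(n_neighbors=8, weights="distance")),
    ("lr", LogisticRegression(solver="lbfgs")),
    ("gnb", GaussianNB()),
    ("rf", RandomForestClassifier(n_estimators=25, max_depth=9,
                                  max_leaf_nodes=25, bootstrap=False)),
    # Table 1 lists probability=False for the SVC, but soft voting requires
    # predict_proba, so probability estimation is enabled for the ensemble.
    ("svm", SVC(kernel="rbf", gamma="auto", probability=True)),
]

# The voting-based ensemble averages the base classifiers' predicted class
# probabilities (voting='soft') with equal weights (weights=None).
ensemble = VotingClassifier(estimators=classifiers, voting="soft", weights=None)
```

Calling ensemble.fit on the training data trains every base classifier, and predict_proba then returns the average of their class-probability estimates.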
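The footnote describes selecting each hyperparameter combination by cross-validation on the training set. As an illustration only, since the paper describes an ad hoc search rather than a grid search, scikit-learn's GridSearchCV performs the same kind of cross-validated selection; the candidate grid below is hypothetical, not the authors' search space.

```python
# Illustrative only: cross-validated hyperparameter selection via grid
# search. The candidate grid is hypothetical, not the authors' search space.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {
    "n_neighbors": [3, 5, 8, 10],        # hypothetical candidate values
    "weights": ["uniform", "distance"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5,
                      scoring="roc_auc")
# After search.fit(X_train, y_train), search.best_params_ holds the winning
# combination, e.g. {'n_neighbors': 8, 'weights': 'distance'} as in Table 1.
```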