Table 1 Summary of classification algorithms evaluated on the training set

From: Diverse approaches to predicting drug-induced liver injury using gene-expression profiles

Classification algorithm | scikit-learn implementation | Parameters selected after optimization

Multilayer Perceptron | sklearn.neural_network.MLPClassifier | activation = 'relu', alpha = 0.0001, batch_size = 'auto', beta_1 = 0.9, beta_2 = 0.999, early_stopping = False, epsilon = 1e-08, hidden_layer_sizes = (30, 30, 30, 30, 30, 30, 30, 30, 30, 30), learning_rate = 'constant', learning_rate_init = 0.0376, max_iter = 200, momentum = 0.9, nesterovs_momentum = True, power_t = 0.5, random_state = None, shuffle = True, solver = 'adam', tol = 0.0001, validation_fraction = 0.1, warm_start = False

Gradient Boosting | sklearn.ensemble.GradientBoostingClassifier | criterion = 'friedman_mse', init = None, learning_rate = 0.31, loss = 'deviance', max_depth = 3, max_features = None, max_leaf_nodes = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, presort = 'auto', subsample = 1.0, warm_start = False

K-nearest Neighbor | sklearn.neighbors.KNeighborsClassifier | algorithm = 'auto', leaf_size = 30, metric = 'minkowski', metric_params = None, n_neighbors = 8, p = 2, weights = 'distance'

Logistic Regression | sklearn.linear_model.LogisticRegression | C = 1.0, class_weight = None, dual = False, fit_intercept = True, intercept_scaling = 1, max_iter = 100, multi_class = 'ovr', penalty = 'l2', solver = 'lbfgs', tol = 0.0001, warm_start = False

Gaussian Naïve Bayes | sklearn.naive_bayes.GaussianNB | priors = None

Random Forest | sklearn.ensemble.RandomForestClassifier | bootstrap = False, class_weight = None, criterion = 'gini', max_depth = 9, min_samples_split = 2, min_samples_leaf = 1, min_weight_fraction_leaf = 0.0, max_features = 'auto', max_leaf_nodes = 25, min_impurity_decrease = 0.0, min_impurity_split = None, n_estimators = 25, oob_score = False, warm_start = False

Support Vector Machines | sklearn.svm.SVC | C = 1.0, class_weight = None, coef0 = 0.0, decision_function_shape = 'ovr', degree = 3, gamma = 'auto', kernel = 'rbf', max_iter = -1, probability = False, shrinking = True, tol = 0.001

Voting-based Ensemble | sklearn.ensemble.VotingClassifier | flatten_transform = True, voting = 'soft', weights = None

In Phase I, we employed seven classification algorithms and a voting-based method that integrated the predictions of the individual classifiers. The first two columns give the name of each algorithm and the scikit-learn implementation we used. Using an ad hoc approach, we evaluated many hyperparameter combinations via cross-validation on the training set and selected the best-performing combination for each algorithm. Non-default parameter values are shown in bold. Hyperparameters that do not fundamentally affect algorithm behavior, such as the number of parallel jobs, are not shown.
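To make the table concrete, the following is a minimal sketch, not the authors' released code, of how the Phase I classifiers could be instantiated in scikit-learn with the hyperparameters listed above. Only values that differ from the library defaults are passed explicitly, and the defaults assumed here are those of a scikit-learn version contemporaneous with the study.

```python
# Minimal sketch (assumed, not the authors' code): instantiate the Phase I
# classifiers from Table 1, passing only non-default values explicitly.
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

classifiers = [
    ("mlp", MLPClassifier(hidden_layer_sizes=(30,) * 10,  # ten layers of 30 units
                          learning_rate_init=0.0376)),
    ("gb", GradientBoostingClassifier(learning_rate=0.31)),
    ("knn", KNeighborsClassifier(n_neighbors=8, weights="distance")),
    ("lr", LogisticRegression(solver="lbfgs")),
    ("gnb", GaussianNB()),
    ("rf", RandomForestClassifier(n_estimators=25, max_depth=9,
                                  max_leaf_nodes=25, bootstrap=False)),
    # Table 1 lists probability=False for the SVC, but soft voting requires
    # predict_proba, so probability estimation is enabled for the ensemble.
    ("svm", SVC(kernel="rbf", gamma="auto", probability=True)),
]

# The voting-based ensemble averages the base classifiers' predicted class
# probabilities (voting='soft') with equal weights (weights=None).
ensemble = VotingClassifier(estimators=classifiers, voting="soft", weights=None)
```

Calling ensemble.fit on the training data trains every base classifier, and predict_proba then returns the average of their class-probability estimates.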
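The footnote describes selecting each hyperparameter combination by cross-validation on the training set. As an illustration only, since the paper describes an ad hoc search rather than a grid search, scikit-learn's GridSearchCV performs the same kind of cross-validated selection; the candidate grid below is hypothetical, not the authors' search space.

```python
# Illustrative only: cross-validated hyperparameter selection via grid
# search. The candidate grid is hypothetical, not the authors' search space.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {
    "n_neighbors": [3, 5, 8, 10],        # hypothetical candidate values
    "weights": ["uniform", "distance"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5,
                      scoring="roc_auc")
# After search.fit(X_train, y_train), search.best_params_ holds the winning
# combination, e.g. {'n_neighbors': 8, 'weights': 'distance'} as in Table 1.
```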