Fig. 1

From: Diverse approaches to predicting drug-induced liver injury using gene-expression profiles

Workflow diagram illustrating the analysis approach. In Phase I, we used a single-sample normalization method and gene-level summarization to preprocess the data. Using cross-validation on the training set, we evaluated 7 classification algorithms and a soft-voting-based ensemble classifier. After receiving class labels for the test set, we performed additional analyses in Phase II. These included using a multi-sample normalization method, batch-effect correction, feature scaling, feature selection, and dimensionality reduction. We also evaluated “hard” voting (treating individual predictions as discrete values), “scaled” voting (using predictions for multiple hyperparameter combinations as input to the voting classifiers), and class weighting (assigning a higher or lower weight to each class label). GBM = Gradient Boosting Machines; LR = Logistic Regression; KNN = K-Nearest Neighbors; RF = Random Forests; MLP = Multilayer Perceptron; SVM = Support Vector Machines; GNB = Gaussian Naïve Bayes
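The caption contrasts the soft-voting ensemble used in Phase I with “hard” voting evaluated in Phase II. The following is a minimal sketch of that general distinction, not the authors' implementation. It assumes each classifier outputs a list of class probabilities. The function names `soft_vote` and `hard_vote` and the example probabilities are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def soft_vote(probas: Sequence[Sequence[float]]) -> int:
    """Soft voting: average each class's probability across classifiers,
    then return the index of the class with the highest average."""
    n_models = len(probas)
    n_classes = len(probas[0])
    avg: List[float] = [
        sum(p[c] for p in probas) / n_models for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: avg[c])


def hard_vote(probas: Sequence[Sequence[float]]) -> int:
    """Hard voting: turn each classifier's probabilities into a discrete
    label (its argmax), then return the majority label."""
    labels = [max(range(len(p)), key=lambda c: p[c]) for p in probas]
    counts = Counter(labels)
    best = max(counts.values())
    # Tie-breaking rule (an assumption of this sketch): lowest class index wins.
    return min(label for label, n in counts.items() if n == best)


# Hypothetical probabilities from three classifiers over two classes (0, 1).
probas = [[0.45, 0.55], [0.40, 0.60], [0.95, 0.05]]
print(hard_vote(probas), soft_vote(probas))
```

In this toy input, two classifiers lean weakly toward class 1 and one strongly favors class 0. Hard voting follows the two-to-one majority and returns class 1. Soft voting averages the probabilities (0.60 versus 0.40) and returns class 0. Discarding confidence information is exactly what distinguishes the two schemes.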