
Table 3 Error rates from leave-one-out cross-validation under different feature-selection rules. The number of selected features is given in brackets

From: Unraveling city-specific signature and identifying sample origin locations for the data from CAMDA MetaSUB challenge

Methods: Random Forest (RF), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA). Each cell gives the error rate, with the number of features in brackets.

The main dataset

| Rules | RF Species | RF Family | RF Order | SVM Species | SVM Family | SVM Order | LDA Species | LDA Family | LDA Order |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Common features | 0.588 (7) | 0.306 (9) | 0.253 (9) | 0.571 (7) | 0.296 (9) | 0.270 (9) | 0.615 (7) | 0.323 (9) | 0.340 (9) |
| i) Features existing in at least N cities (Top features with the highest ubiquity across all the cities) | | | | | | | | | |
| N = 15 | 0.459 (13) | 0.365 (23) | 0.375 (17) | 0.463 (13) | 0.365 (23) | 0.355 (17) | 0.512 (13) | 0.372 (23) | 0.379 (17) |
| N = 14 | 0.394 (26) | 0.332 (31) | 0.342 (19) | 0.363 (26) | 0.319 (31) | 0.355 (19) | 0.370 (26) | 0.302 (31) | 0.382 (19) |
| N = 13 | 0.359 (52) | 0.292 (43) | 0.295 (23) | 0.356 (52) | 0.302 (43) | 0.295 (23) | 0.353 (52) | 0.286 (43) | 0.294 (23) |
| N = 12 | 0.365 (75) | 0.309 (54) | 0.285 (29) | 0.348 (75) | 0.289 (54) | 0.295 (29) | 0.321 (75) | 0.249 (54) | 0.242 (29) |
| N = 11 | 0.360 (110) | 0.296 (64) | 0.295 (33) | 0.333 (110) | 0.282 (64) | 0.291 (33) | 0.323 (110) | 0.256 (64) | 0.219 (33) |
| N = 10 | 0.357 (150) | 0.299 (73) | 0.285 (36) | 0.340 (150) | 0.289 (73) | 0.271 (36) | 0.357 (150) | 0.282 (73) | 0.212 (36) |
| N = 9 | 0.317 (188) | 0.292 (86) | 0.311 (43) | 0.317 (188) | 0.302 (86) | 0.281 (43) | 0.393 (188) | 0.262 (86) | 0.199 (43) |
| N = 8 | 0.337 (234) | 0.302 (97) | 0.201 (48) | 0.327 (234) | 0.316 (97) | 0.275 (48) | 0.503 (234) | 0.279 (97) | 0.195 (48) |
| ii) Top M features with the highest ubiquity across all the samples | | | | | | | | | |
| M = 10 | 0.486 (10) | 0.421 (10) | 0.425 (10) | 0.500 (10) | 0.435 (10) | 0.439 (10) | 0.524 (10) | 0.475 (10) | 0.455 (10) |
| M = 20 | 0.385 (20) | 0.328 (20) | 0.341 (20) | 0.381 (20) | 0.351 (20) | 0.338 (20) | 0.388 (20) | 0.318 (20) | 0.321 (20) |
| M = 30 | 0.371 (30) | 0.285 (30) | 0.288 (30) | 0.350 (30) | 0.312 (30) | 0.285 (30) | 0.347 (30) | 0.292 (30) | 0.235 (30) |
| M = 50 | 0.291 (50) | 0.309 (50) | 0.271 (50) | 0.288 (50) | 0.286 (50) | 0.265 (50) | 0.271 (50) | 0.256 (50) | 0.195 (50) |
| M = 100 | 0.284 (100) | 0.309 (100) | 0.301 (100) | 0.304 (100) | 0.317 (100) | 0.305 (100) | 0.241 (100) | 0.256 (100) | 0.281 (100) |
| M = 150 | 0.283 (150) | 0.312 (150) | 0.308 (150) | 0.297 (150) | 0.336 (150) | 0.348 (150) | 0.303 (150) | 0.292 (150) | 0.411 (150) |

iii) Combination of the common features (one value per method, across the combined taxonomic ranks)

| Rules | RF | SVM | LDA |
| --- | --- | --- | --- |
| 7 species, 9 families, 9 orders | 0.120 (25) | 0.115 (25) | 0.123 (25) |
| 7 species, 9 families | 0.289 (16) | 0.215 (16) | 0.259 (16) |
| 7 species, 9 orders | 0.210 (16) | 0.189 (16) | 0.237 (16) |
| 9 families, 9 orders | 0.140 (18) | 0.118 (18) | 0.137 (18) |

The mystery dataset

| Rules | RF Species | RF Family | RF Order | SVM Species | SVM Family | SVM Order | LDA Species | LDA Family | LDA Order |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Common features | 0.582 (8) | 0.339 (18) | 0.304 (15) | 0.618 (8) | 0.429 (18) | 0.339 (15) | 0.655 (8) | 0.304 (18) | 0.321 (15) |

iii) Combination of the common features (one value per method, across the combined taxonomic ranks)

| Rules | RF | SVM | LDA |
| --- | --- | --- | --- |
| 8 species, 18 families, 15 orders | 0.268 (41) | 0.339 (41) | 0.446 (41) |
| 8 species, 18 families | 0.375 (26) | 0.464 (26) | 0.411 (26) |
| 8 species, 15 orders | 0.304 (23) | 0.321 (23) | 0.286 (23) |
| 18 families, 15 orders | 0.250 (33) | 0.339 (33) | 0.339 (33) |
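
The selection rules i) and ii) and the leave-one-out error rate reported in the table can be illustrated with a minimal sketch. This is not the authors' code. It assumes that a feature "exists" in a city when it has nonzero abundance in at least one sample from that city, and that ubiquity across samples is the count of samples with nonzero abundance. The function names are hypothetical, and a simple nearest-centroid classifier stands in for Random Forest, SVM, and LDA so that the example needs only the Python standard library.

```python
from collections import defaultdict


def features_in_at_least_n_cities(samples, cities, n):
    """Rule i) sketch: keep features present in at least n distinct cities.

    samples: list of dicts mapping feature name -> abundance.
    cities:  list of city labels, one per sample.
    Assumption: present in a city = nonzero abundance in >= 1 of its samples.
    """
    cities_with = defaultdict(set)
    for sample, city in zip(samples, cities):
        for feature, abundance in sample.items():
            if abundance > 0:
                cities_with[feature].add(city)
    return sorted(f for f, seen in cities_with.items() if len(seen) >= n)


def top_m_by_sample_ubiquity(samples, m):
    """Rule ii) sketch: the m features found in the most samples.

    Assumption: ubiquity = number of samples with nonzero abundance.
    Ties are broken alphabetically to keep the result deterministic.
    """
    counts = defaultdict(int)
    for sample in samples:
        for feature, abundance in sample.items():
            if abundance > 0:
                counts[feature] += 1
    ranked = sorted(counts, key=lambda f: (-counts[f], f))
    return ranked[:m]


def loocv_error_rate(samples, cities, features):
    """Leave-one-out error rate with a nearest-centroid stand-in classifier.

    Each sample is held out in turn, city centroids are computed from the
    remaining samples, and the held-out sample is assigned to the city with
    the closest centroid (squared Euclidean distance).
    """
    def to_vector(sample):
        return [sample.get(f, 0.0) for f in features]

    errors = 0
    for held_out in range(len(samples)):
        sums, counts = {}, defaultdict(int)
        for j, (sample, city) in enumerate(zip(samples, cities)):
            if j == held_out:
                continue
            vec = to_vector(sample)
            prev = sums.get(city, [0.0] * len(features))
            sums[city] = [a + b for a, b in zip(prev, vec)]
            counts[city] += 1

        x = to_vector(samples[held_out])

        def sq_dist(city):
            return sum((xi - si / counts[city]) ** 2
                       for xi, si in zip(x, sums[city]))

        predicted = min(sorted(sums), key=sq_dist)
        if predicted != cities[held_out]:
            errors += 1
    return errors / len(samples)
```

Under these assumptions, one table cell would correspond to calling a selection function to get the feature list and then passing that list to `loocv_error_rate`. The bracketed number in the table is the length of the feature list.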