Table 3 Leave-one-out cross-validation error rates under different feature selection rules. The number of features selected is given in brackets

From: Unraveling city-specific signature and identifying sample origin locations for the data from CAMDA MetaSUB challenge

The main dataset

| Rules | Random Forest: Species | Random Forest: Family | Random Forest: Order | Support Vector Machine: Species | Support Vector Machine: Family | Support Vector Machine: Order | Linear Discriminant Analysis: Species | Linear Discriminant Analysis: Family | Linear Discriminant Analysis: Order |
|---|---|---|---|---|---|---|---|---|---|
| Common features | 0.588 (7) | 0.306 (9) | 0.253 (9) | 0.571 (7) | 0.296 (9) | 0.270 (9) | 0.615 (7) | 0.323 (9) | 0.340 (9) |
| i) Features existing in at least N cities (Top features with the highest ubiquity across all the cities) | | | | | | | | | |
| N = 15 | 0.459 (13) | 0.365 (23) | 0.375 (17) | 0.463 (13) | 0.365 (23) | 0.355 (17) | 0.512 (13) | 0.372 (23) | 0.379 (17) |
| N = 14 | 0.394 (26) | 0.332 (31) | 0.342 (19) | 0.363 (26) | 0.319 (31) | 0.355 (19) | 0.370 (26) | 0.302 (31) | 0.382 (19) |
| N = 13 | 0.359 (52) | 0.292 (43) | 0.295 (23) | 0.356 (52) | 0.302 (43) | 0.295 (23) | 0.353 (52) | 0.286 (43) | 0.294 (23) |
| N = 12 | 0.365 (75) | 0.309 (54) | 0.285 (29) | 0.348 (75) | 0.289 (54) | 0.295 (29) | 0.321 (75) | 0.249 (54) | 0.242 (29) |
| N = 11 | 0.360 (110) | 0.296 (64) | 0.295 (33) | 0.333 (110) | 0.282 (64) | 0.291 (33) | 0.323 (110) | 0.256 (64) | 0.219 (33) |
| N = 10 | 0.357 (150) | 0.299 (73) | 0.285 (36) | 0.340 (150) | 0.289 (73) | 0.271 (36) | 0.357 (150) | 0.282 (73) | 0.212 (36) |
| N = 9 | 0.317 (188) | 0.292 (86) | 0.311 (43) | 0.317 (188) | 0.302 (86) | 0.281 (43) | 0.393 (188) | 0.262 (86) | 0.199 (43) |
| N = 8 | 0.337 (234) | 0.302 (97) | 0.201 (48) | 0.327 (234) | 0.316 (97) | 0.275 (48) | 0.503 (234) | 0.279 (97) | 0.195 (48) |
| ii) Top M features with the highest ubiquity across all the samples | | | | | | | | | |
| M = 10 | 0.486 (10) | 0.421 (10) | 0.425 (10) | 0.500 (10) | 0.435 (10) | 0.439 (10) | 0.524 (10) | 0.475 (10) | 0.455 (10) |
| M = 20 | 0.385 (20) | 0.328 (20) | 0.341 (20) | 0.381 (20) | 0.351 (20) | 0.338 (20) | 0.388 (20) | 0.318 (20) | 0.321 (20) |
| M = 30 | 0.371 (30) | 0.285 (30) | 0.288 (30) | 0.350 (30) | 0.312 (30) | 0.285 (30) | 0.347 (30) | 0.292 (30) | 0.235 (30) |
| M = 50 | 0.291 (50) | 0.309 (50) | 0.271 (50) | 0.288 (50) | 0.286 (50) | 0.265 (50) | 0.271 (50) | 0.256 (50) | 0.195 (50) |
| M = 100 | 0.284 (100) | 0.309 (100) | 0.301 (100) | 0.304 (100) | 0.317 (100) | 0.305 (100) | 0.241 (100) | 0.256 (100) | 0.281 (100) |
| M = 150 | 0.283 (150) | 0.312 (150) | 0.308 (150) | 0.297 (150) | 0.336 (150) | 0.348 (150) | 0.303 (150) | 0.292 (150) | 0.411 (150) |

| iii) Combination of the common features | Random Forest | Support Vector Machine | Linear Discriminant Analysis |
|---|---|---|---|
| 7 species, 9 families, 9 orders | 0.120 (25) | 0.115 (25) | 0.123 (25) |
| 7 species, 9 families | 0.289 (16) | 0.215 (16) | 0.259 (16) |
| 7 species, 9 orders | 0.210 (16) | 0.189 (16) | 0.237 (16) |
| 9 families, 9 orders | 0.140 (18) | 0.118 (18) | 0.137 (18) |

The mystery dataset

| Rules | Random Forest: Species | Random Forest: Family | Random Forest: Order | Support Vector Machine: Species | Support Vector Machine: Family | Support Vector Machine: Order | Linear Discriminant Analysis: Species | Linear Discriminant Analysis: Family | Linear Discriminant Analysis: Order |
|---|---|---|---|---|---|---|---|---|---|
| Common features | 0.582 (8) | 0.339 (18) | 0.304 (15) | 0.618 (8) | 0.429 (18) | 0.339 (15) | 0.655 (8) | 0.304 (18) | 0.321 (15) |

| iii) Combination of the common features | Random Forest | Support Vector Machine | Linear Discriminant Analysis |
|---|---|---|---|
| 8 species, 18 families, 15 orders | 0.268 (41) | 0.339 (41) | 0.446 (41) |
| 8 species, 18 families | 0.375 (26) | 0.464 (26) | 0.411 (26) |
| 8 species, 15 orders | 0.304 (23) | 0.321 (23) | 0.286 (23) |
| 18 families, 15 orders | 0.250 (33) | 0.339 (33) | 0.339 (33) |
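
The two ubiquity-based selection rules and the leave-one-out error rate reported in this table can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' pipeline: the function names, the toy abundance profiles, and the definition of "present" as abundance above zero are hypothetical, and a simple nearest-centroid classifier stands in for Random Forest, Support Vector Machine, and Linear Discriminant Analysis so that the example needs only the Python standard library.

```python
from collections import defaultdict

def select_min_cities(samples, cities, n):
    """Rule i (sketch): keep features present in at least n distinct cities."""
    seen = defaultdict(set)
    for profile, city in zip(samples, cities):
        for feature, abundance in profile.items():
            if abundance > 0:  # assumption: "present" means abundance > 0
                seen[feature].add(city)
    return sorted(f for f, where in seen.items() if len(where) >= n)

def select_top_m(samples, m):
    """Rule ii (sketch): keep the m features present in the most samples."""
    counts = defaultdict(int)
    for profile in samples:
        for feature, abundance in profile.items():
            if abundance > 0:
                counts[feature] += 1
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(counts, key=lambda f: (-counts[f], f))[:m]

def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier: predict the label whose mean vector is closest."""
    sums, sizes = {}, defaultdict(int)
    for row, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        sizes[label] += 1

    def squared_distance(label):
        return sum((s / sizes[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sorted(sums), key=squared_distance)

def loocv_error(samples, labels, features):
    """Leave-one-out error: hold out each sample once, train on the rest."""
    x = [[profile.get(f, 0.0) for f in features] for profile in samples]
    errors = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) != labels[i]:
            errors += 1
    return errors / len(x)

# Hypothetical toy data: four samples from two cities.
samples = [
    {"a": 5.0, "b": 1.0},
    {"a": 4.0, "c": 2.0},
    {"a": 1.0, "b": 6.0},
    {"a": 0.5, "b": 5.0, "d": 1.0},
]
cities = ["X", "X", "Y", "Y"]

rule_i = select_min_cities(samples, cities, n=2)
rule_ii = select_top_m(samples, m=3)
print(rule_i, rule_ii, loocv_error(samples, cities, rule_i))
```

Each cell of the table pairs an error rate of this kind with the number of features the rule retained, that is, `len(features)` in the sketch.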