Unraveling city-specific signature and identifying sample origin locations for the data from CAMDA MetaSUB challenge

Zhang, Runzhi; Walker, Alejandro R.; Datta, Susmita

doi:10.1186/s13062-020-00284-1

Table 3 Leave-one-out cross-validation error rates under different feature-selection rules. The number of features selected is given in parentheses

Methods	Random Forest			Support Vector Machine			Linear Discriminant Analysis
Rules	Species	Family	Order	Species	Family	Order	Species	Family	Order
The main dataset
Common features	0.588 (7)	0.306 (9)	0.253 (9)	0.571 (7)	0.296 (9)	0.270 (9)	0.615 (7)	0.323 (9)	0.340 (9)
i) Features existing in at least N cities (Top features with the highest ubiquity across all the cities)
N = 15	0.459 (13)	0.365 (23)	0.375 (17)	0.463 (13)	0.365 (23)	0.355 (17)	0.512 (13)	0.372 (23)	0.379 (17)
N = 14	0.394 (26)	0.332 (31)	0.342 (19)	0.363 (26)	0.319 (31)	0.355 (19)	0.370 (26)	0.302 (31)	0.382 (19)
N = 13	0.359 (52)	0.292 (43)	0.295 (23)	0.356 (52)	0.302 (43)	0.295 (23)	0.353 (52)	0.286 (43)	0.294 (23)
N = 12	0.365 (75)	0.309 (54)	0.285 (29)	0.348 (75)	0.289 (54)	0.295 (29)	0.321 (75)	0.249 (54)	0.242 (29)
N = 11	0.360 (110)	0.296 (64)	0.295 (33)	0.333 (110)	0.282 (64)	0.291 (33)	0.323 (110)	0.256 (64)	0.219 (33)
N = 10	0.357 (150)	0.299 (73)	0.285 (36)	0.340 (150)	0.289 (73)	0.271 (36)	0.357 (150)	0.282 (73)	0.212 (36)
N = 9	0.317 (188)	0.292 (86)	0.311 (43)	0.317 (188)	0.302 (86)	0.281 (43)	0.393 (188)	0.262 (86)	0.199 (43)
N = 8	0.337 (234)	0.302 (97)	0.201 (48)	0.327 (234)	0.316 (97)	0.275 (48)	0.503 (234)	0.279 (97)	0.195 (48)
ii) Top M features with the highest ubiquity across all the samples
M = 10	0.486 (10)	0.421 (10)	0.425 (10)	0.500 (10)	0.435 (10)	0.439 (10)	0.524 (10)	0.475 (10)	0.455 (10)
M = 20	0.385 (20)	0.328 (20)	0.341 (20)	0.381 (20)	0.351 (20)	0.338 (20)	0.388 (20)	0.318 (20)	0.321 (20)
M = 30	0.371 (30)	0.285 (30)	0.288 (30)	0.350 (30)	0.312 (30)	0.285 (30)	0.347 (30)	0.292 (30)	0.235 (30)
M = 50	0.291 (50)	0.309 (50)	0.271 (50)	0.288 (50)	0.286 (50)	0.265 (50)	0.271 (50)	0.256 (50)	0.195 (50)
M = 100	0.284 (100)	0.309 (100)	0.301 (100)	0.304 (100)	0.317 (100)	0.305 (100)	0.241 (100)	0.256 (100)	0.281 (100)
M = 150	0.283 (150)	0.312 (150)	0.308 (150)	0.297 (150)	0.336 (150)	0.348 (150)	0.303 (150)	0.292 (150)	0.411 (150)
iii) Combination of the common features
7 species, 9 families, 9 orders	0.120 (25)			0.115 (25)			0.123 (25)
7 species, 9 families	0.289 (16)			0.215 (16)			0.259 (16)
7 species, 9 orders	0.210 (16)			0.189 (16)			0.237 (16)
9 families, 9 orders	0.140 (18)			0.118 (18)			0.137 (18)
The mystery dataset
Common features	0.582 (8)	0.339 (18)	0.304 (15)	0.618 (8)	0.429 (18)	0.339 (15)	0.655 (8)	0.304 (18)	0.321 (15)
iii) Combination of the common features
8 species, 18 families, 15 orders	0.268 (41)			0.339 (41)			0.446 (41)
8 species, 18 families	0.375 (26)			0.464 (26)			0.411 (26)
8 species, 15 orders	0.304 (23)			0.321 (23)			0.286 (23)
18 families, 15 orders	0.250 (33)			0.339 (33)			0.339 (33)
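
The selection rules and the error metric in the table can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes a feature counts as present in a sample when its abundance is greater than zero, and it uses a simple nearest-centroid classifier as a stand-in for the Random Forest, Support Vector Machine, and Linear Discriminant Analysis models named in the table. All function names, taxon names, and toy values are hypothetical.

```python
from collections import defaultdict


def features_in_at_least_n_cities(samples, cities, n):
    """Rule (i): keep features detected (abundance > 0, an assumption) in >= n cities."""
    seen = defaultdict(set)
    for sample, city in zip(samples, cities):
        for feature, abundance in sample.items():
            if abundance > 0:
                seen[feature].add(city)
    return sorted(f for f, where in seen.items() if len(where) >= n)


def top_m_ubiquitous_features(samples, m):
    """Rule (ii): keep the m features detected in the largest number of samples."""
    counts = defaultdict(int)
    for sample in samples:
        for feature, abundance in sample.items():
            if abundance > 0:
                counts[feature] += 1
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(counts, key=lambda f: (-counts[f], f))[:m]


def to_vectors(samples, features):
    """Turn each sample dict into a numeric vector over the selected features."""
    return [[sample.get(f, 0.0) for f in features] for sample in samples]


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier: predict the label whose class mean is closest to x."""
    groups = defaultdict(list)
    for row, label in zip(train_x, train_y):
        groups[label].append(row)
    best_label, best_dist = None, float("inf")
    for label in sorted(groups):
        rows = groups[label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_error_rate(x, y):
    """Hold out each sample once, train on the rest, and return the fraction misclassified."""
    errors = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) != y[i]:
            errors += 1
    return errors / len(x)


# Hypothetical toy data: four samples from two cities, three taxa.
toy_samples = [
    {"taxA": 5.0, "taxB": 0.0, "taxC": 1.0},
    {"taxA": 4.0, "taxB": 0.0, "taxC": 0.0},
    {"taxA": 0.5, "taxB": 3.0, "taxC": 0.0},
    {"taxA": 1.0, "taxB": 4.0, "taxC": 0.0},
]
toy_cities = ["city1", "city1", "city2", "city2"]

selected = top_m_ubiquitous_features(toy_samples, 2)
error = loocv_error_rate(to_vectors(toy_samples, selected), toy_cities)
print(selected, error)
```

Each table entry corresponds to one such run: a rule picks the feature set, whose size is the number in parentheses, and the leave-one-out loop yields the error rate for one classifier.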

ISSN: 1745-6150

Biology Direct