Methods | Random Forest | | | Support Vector Machine | | | Linear Discriminant Analysis | | |
---|---|---|---|---|---|---|---|---|---|
Rules | Species | Family | Order | Species | Family | Order | Species | Family | Order |
The main dataset | |||||||||
Common features | 0.588 (7) | 0.306 (9) | 0.253 (9) | 0.571 (7) | 0.296 (9) | 0.270 (9) | 0.615 (7) | 0.323 (9) | 0.340 (9) |
i) Features present in at least N cities (top features with the highest ubiquity across all cities) | |||||||||
N = 15 | 0.459 (13) | 0.365 (23) | 0.375 (17) | 0.463 (13) | 0.365 (23) | 0.355 (17) | 0.512 (13) | 0.372 (23) | 0.379 (17) |
N = 14 | 0.394 (26) | 0.332 (31) | 0.342 (19) | 0.363 (26) | 0.319 (31) | 0.355 (19) | 0.370 (26) | 0.302 (31) | 0.382 (19) |
N = 13 | 0.359 (52) | 0.292 (43) | 0.295 (23) | 0.356 (52) | 0.302 (43) | 0.295 (23) | 0.353 (52) | 0.286 (43) | 0.294 (23) |
N = 12 | 0.365 (75) | 0.309 (54) | 0.285 (29) | 0.348 (75) | 0.289 (54) | 0.295 (29) | 0.321 (75) | 0.249 (54) | 0.242 (29) |
N = 11 | 0.360 (110) | 0.296 (64) | 0.295 (33) | 0.333 (110) | 0.282 (64) | 0.291 (33) | 0.323 (110) | 0.256 (64) | 0.219 (33) |
N = 10 | 0.357 (150) | 0.299 (73) | 0.285 (36) | 0.340 (150) | 0.289 (73) | 0.271 (36) | 0.357 (150) | 0.282 (73) | 0.212 (36) |
N = 9 | 0.317 (188) | 0.292 (86) | 0.311 (43) | 0.317 (188) | 0.302 (86) | 0.281 (43) | 0.393 (188) | 0.262 (86) | 0.199 (43) |
N = 8 | 0.337 (234) | 0.302 (97) | 0.201 (48) | 0.327 (234) | 0.316 (97) | 0.275 (48) | 0.503 (234) | 0.279 (97) | 0.195 (48) |
ii) Top M features with the highest ubiquity across all samples | |||||||||
M = 10 | 0.486 (10) | 0.421 (10) | 0.425 (10) | 0.500 (10) | 0.435 (10) | 0.439 (10) | 0.524 (10) | 0.475 (10) | 0.455 (10) |
M = 20 | 0.385 (20) | 0.328 (20) | 0.341 (20) | 0.381 (20) | 0.351 (20) | 0.338 (20) | 0.388 (20) | 0.318 (20) | 0.321 (20) |
M = 30 | 0.371 (30) | 0.285 (30) | 0.288 (30) | 0.350 (30) | 0.312 (30) | 0.285 (30) | 0.347 (30) | 0.292 (30) | 0.235 (30) |
M = 50 | 0.291 (50) | 0.309 (50) | 0.271 (50) | 0.288 (50) | 0.286 (50) | 0.265 (50) | 0.271 (50) | 0.256 (50) | 0.195 (50) |
M = 100 | 0.284 (100) | 0.309 (100) | 0.301 (100) | 0.304 (100) | 0.317 (100) | 0.305 (100) | 0.241 (100) | 0.256 (100) | 0.281 (100) |
M = 150 | 0.283 (150) | 0.312 (150) | 0.308 (150) | 0.297 (150) | 0.336 (150) | 0.348 (150) | 0.303 (150) | 0.292 (150) | 0.411 (150) |
iii) Combination of the common features | |||||||||
7 species, 9 families, 9 orders | 0.120 (25) | | | 0.115 (25) | | | 0.123 (25) | | |
7 species, 9 families | 0.289 (16) | | | 0.215 (16) | | | 0.259 (16) | | |
7 species, 9 orders | 0.210 (16) | | | 0.189 (16) | | | 0.237 (16) | | |
9 families, 9 orders | 0.140 (18) | | | 0.118 (18) | | | 0.137 (18) | | |
The mystery dataset | |||||||||
Common features | 0.582 (8) | 0.339 (18) | 0.304 (15) | 0.618 (8) | 0.429 (18) | 0.339 (15) | 0.655 (8) | 0.304 (18) | 0.321 (15) |
iii) Combination of the common features | |||||||||
8 species, 18 families, 15 orders | 0.268 (41) | | | 0.339 (41) | | | 0.446 (41) | | |
8 species, 18 families | 0.375 (26) | | | 0.464 (26) | | | 0.411 (26) | | |
8 species, 15 orders | 0.304 (23) | | | 0.321 (23) | | | 0.286 (23) | | |
18 families, 15 orders | 0.250 (33) | | | 0.339 (33) | | | 0.339 (33) | | |
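
The two ubiquity-based selection rules in the table, (i) and (ii), can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a feature counts as "present" in a sample when its abundance is greater than zero, and that ubiquity means the number of cities (rule i) or samples (rule ii) in which the feature is present. The function names, the data layout, and the toy taxa are all hypothetical.

```python
from collections import defaultdict


def features_in_at_least_n_cities(abundance, city_of, n):
    """Rule (i) sketch: keep features present in at least n distinct cities.

    abundance: dict mapping sample id -> {feature name: abundance}
    city_of:   dict mapping sample id -> city label
    Assumption: "present" means abundance > 0.
    """
    cities_with = defaultdict(set)
    for sample, profile in abundance.items():
        for feature, value in profile.items():
            if value > 0:
                cities_with[feature].add(city_of[sample])
    return sorted(f for f, cities in cities_with.items() if len(cities) >= n)


def top_m_ubiquitous_features(abundance, m):
    """Rule (ii) sketch: keep the m features present in the most samples.

    Ties are broken alphabetically so the result is deterministic.
    """
    sample_counts = defaultdict(int)
    for profile in abundance.values():
        for feature, value in profile.items():
            if value > 0:
                sample_counts[feature] += 1
    ranked = sorted(sample_counts, key=lambda f: (-sample_counts[f], f))
    return ranked[:m]


# Hypothetical toy data: three samples from two cities.
abundance = {
    "s1": {"taxon_a": 5, "taxon_b": 0, "taxon_c": 2},
    "s2": {"taxon_a": 1, "taxon_b": 3, "taxon_c": 0},
    "s3": {"taxon_a": 4, "taxon_b": 0, "taxon_c": 0},
}
city_of = {"s1": "city_x", "s2": "city_y", "s3": "city_y"}

print(features_in_at_least_n_cities(abundance, city_of, 2))  # ['taxon_a']
print(top_m_ubiquitous_features(abundance, 2))  # ['taxon_a', 'taxon_b']
```

Under these assumptions, lowering N or raising M admits more features, which matches the growing feature counts shown in parentheses in the table.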