Table 2 Number of features selected under the additional rules

From: Unraveling city-specific signature and identifying sample origin locations for the data from CAMDA MetaSUB challenge

| Rules | Setting | Features selected: Species | Features selected: Family | Features selected: Order |
|---|---|---|---|---|
| i) Features existing in at least N cities (top features with the highest ubiquity across all the cities) | The main dataset | | | |
| | N = 15 | 13 | 23 | 17 |
| | N = 14 | 26 | 31 | 19 |
| | N = 13 | 52 | 43 | 23 |
| | N = 12 | 75 | 54 | 29 |
| | N = 11 | 110 | 64 | 33 |
| | N = 10 | 150 | 73 | 36 |
| | N = 9 | 188 | 86 | 43 |
| | N = 8 | 234 | 97 | 48 |
| ii) Top M features with the highest ubiquity across all the samples | M = 10 | 10 | 10 | 10 |
| | M = 20 | 20 | 20 | 20 |
| | M = 30 | 30 | 30 | 30 |
| | M = 50 | 50 | 50 | 50 |
| | M = 100 | 100 | 100 | 100 |
| | M = 150 | 150 | 150 | 150 |

| Rules | Combination | Number of features selected |
|---|---|---|
| iii) Combination of the common features | “species”, “family” and “order” | 25 (7 species, 9 families, 9 orders) |
| | “species” and “family” | 16 (7 species, 9 families) |
| | “species” and “order” | 16 (7 species, 9 orders) |
| | “family” and “order” | 18 (9 families, 9 orders) |
| | The mystery dataset | |
| | “species”, “family” and “order” | 41 (8 species, 18 families, 15 orders) |
| | “species” and “family” | 26 (8 species, 18 families) |
| | “species” and “order” | 23 (8 species, 15 orders) |
| | “family” and “order” | 33 (18 families, 15 orders) |
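
As a rough illustration of how rules (i) and (ii) in the table could be applied, the following Python sketch filters features by ubiquity. It is not the authors' code. The function names, the toy taxa and cities, and the definition of presence (a feature counts as existing in a city when at least one sample from that city has a non-zero abundance for it) are all assumptions made for this example.

```python
from collections import defaultdict


def features_in_at_least_n_cities(abundance, sample_city, n):
    """Rule (i) sketch: keep features detected in at least n cities.

    ASSUMPTION: a feature "exists" in a city if any sample from that
    city has a non-zero abundance for it.
    """
    cities_per_feature = defaultdict(set)
    for sample, counts in abundance.items():
        for feature, value in counts.items():
            if value > 0:
                cities_per_feature[feature].add(sample_city[sample])
    return sorted(
        f for f, cities in cities_per_feature.items() if len(cities) >= n
    )


def top_m_by_sample_ubiquity(abundance, m):
    """Rule (ii) sketch: keep the m features present in the most samples."""
    samples_per_feature = defaultdict(int)
    for counts in abundance.values():
        for feature, value in counts.items():
            if value > 0:
                samples_per_feature[feature] += 1
    # Rank by descending sample count; ties are broken alphabetically
    # (an arbitrary choice made here only to keep the result deterministic).
    ranked = sorted(samples_per_feature, key=lambda f: (-samples_per_feature[f], f))
    return ranked[:m]


# Toy data: hypothetical taxa and cities, not taken from the paper.
abundance = {
    "s1": {"taxonA": 5, "taxonB": 0, "taxonC": 2},
    "s2": {"taxonA": 1, "taxonB": 3, "taxonC": 0},
    "s3": {"taxonA": 4, "taxonB": 0, "taxonC": 1},
}
sample_city = {"s1": "city1", "s2": "city2", "s3": "city3"}

print(features_in_at_least_n_cities(abundance, sample_city, 2))
print(top_m_by_sample_ubiquity(abundance, 2))
```

Lowering `n` admits more features, which matches the trend in the table where the selected counts grow as N decreases from 15 to 8. Under rule (ii) the number of selected features equals M by construction, as long as at least M features are present.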