Table 2 The number of features selected based on additional rules

From: Unraveling city-specific signature and identifying sample origin locations for the data from CAMDA MetaSUB challenge

| Rules | Dataset | Setting | No. of species | No. of families | No. of orders |
|---|---|---|---|---|---|
| i) Features existing in at least N cities (top features with the highest ubiquity across all the cities) | The main dataset | N = 15 | 13 | 23 | 17 |
| | | N = 14 | 26 | 31 | 19 |
| | | N = 13 | 52 | 43 | 23 |
| | | N = 12 | 75 | 54 | 29 |
| | | N = 11 | 110 | 64 | 33 |
| | | N = 10 | 150 | 73 | 36 |
| | | N = 9 | 188 | 86 | 43 |
| | | N = 8 | 234 | 97 | 48 |
| ii) Top M features with the highest ubiquity across all the samples | The main dataset | M = 10 | 10 | 10 | 10 |
| | | M = 20 | 20 | 20 | 20 |
| | | M = 30 | 30 | 30 | 30 |
| | | M = 50 | 50 | 50 | 50 |
| | | M = 100 | 100 | 100 | 100 |
| | | M = 150 | 150 | 150 | 150 |
| iii) Combination of the common features | The main dataset | “species”, “family” and “order” (25 in total) | 7 | 9 | 9 |
| | | “species” and “family” (16 in total) | 7 | 9 | – |
| | | “species” and “order” (16 in total) | 7 | – | 9 |
| | | “family” and “order” (18 in total) | – | 9 | 9 |
| | The mystery dataset | “species”, “family” and “order” (41 in total) | 8 | 18 | 15 |
| | | “species” and “family” (26 in total) | 8 | 18 | – |
| | | “species” and “order” (23 in total) | 8 | – | 15 |
| | | “family” and “order” (33 in total) | – | 18 | 15 |
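The three selection rules in the table can be illustrated with a minimal Python sketch. It assumes a hypothetical input format (a dict mapping each sample ID to a {feature: abundance} profile, plus a sample-to-city mapping) and assumes a feature "exists" in a city when its abundance is nonzero in at least one of that city's samples. The function names, the tie-breaking rule, and the toy data are illustrative only and are not taken from the paper's pipeline.

```python
from collections import defaultdict


def city_ubiquity(abundance, cities):
    """Number of distinct cities in which each feature is detected.

    Assumption of this sketch: a feature counts as present in a city if
    its abundance is > 0 in at least one sample from that city.
    """
    present_in = defaultdict(set)
    for sample, profile in abundance.items():
        for feature, value in profile.items():
            if value > 0:
                present_in[feature].add(cities[sample])
    return {feature: len(cs) for feature, cs in present_in.items()}


def select_by_city_ubiquity(abundance, cities, n):
    """Rule (i): keep features detected in at least n cities."""
    counts = city_ubiquity(abundance, cities)
    return sorted(f for f, k in counts.items() if k >= n)


def select_top_m_by_sample_ubiquity(abundance, m):
    """Rule (ii): keep the m features detected in the most samples.

    Ties are broken alphabetically, an arbitrary choice for this sketch.
    """
    counts = defaultdict(int)
    for profile in abundance.values():
        for feature, value in profile.items():
            if value > 0:
                counts[feature] += 1
    ranked = sorted(counts, key=lambda f: (-counts[f], f))
    return ranked[:m]


def combine_levels(*selections):
    """Rule (iii), as assumed here: pool per-rank selections.

    Each argument is a (rank, features) pair; features are tagged with
    their rank so identical names at different ranks stay distinct.
    """
    return [(rank, f) for rank, features in selections for f in features]


if __name__ == "__main__":
    # Made-up toy data: four samples from three cities.
    abundance = {
        "s1": {"A": 3, "B": 0, "C": 1},
        "s2": {"A": 1, "B": 2, "C": 0},
        "s3": {"A": 0, "B": 5, "C": 0},
        "s4": {"A": 2, "B": 0, "C": 0},
    }
    cities = {"s1": "X", "s2": "Y", "s3": "Z", "s4": "Z"}

    print(select_by_city_ubiquity(abundance, cities, 2))  # ['A', 'B']
    print(select_top_m_by_sample_ubiquity(abundance, 1))  # ['A']
    combined = combine_levels(("species", ["A"]), ("family", ["F1", "F2"]))
    print(len(combined))  # 3
```

If rule (iii) simply pools the per-rank selections, the combined counts are sums of the per-rank counts, which matches the table (for example, 7 + 9 + 9 = 25 for the main dataset and 8 + 18 + 15 = 41 for the mystery dataset).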