Unraveling city-specific signature and identifying sample origin locations for the data from CAMDA MetaSUB challenge

Table 2 Number of features selected under the additional rules

Rules		Number of features selected
		Species	Family	Order
i) Features present in at least N cities (top features with the highest ubiquity across all the cities)	The main dataset
	N = 15	13	23	17
	N = 14	26	31	19
	N = 13	52	43	23
	N = 12	75	54	29
	N = 11	110	64	33
	N = 10	150	73	36
	N = 9	188	86	43
	N = 8	234	97	48
ii) Top M features with the highest ubiquity across all the samples	M = 10	10	10	10
	M = 20	20	20	20
	M = 30	30	30	30
	M = 50	50	50	50
	M = 100	100	100	100
	M = 150	150	150	150
iii) Combination of the common features	“species”, “family” and “order”	25 (7 species, 9 families, 9 orders)
	“species” and “family”	16 (7 species, 9 families)
	“species” and “order”	16 (7 species, 9 orders)
	“family” and “order”	18 (9 families, 9 orders)
	The mystery dataset
	“species”, “family” and “order”	41 (8 species, 18 families, 15 orders)
	“species” and “family”	26 (8 species, 18 families)
	“species” and “order”	23 (8 species, 15 orders)
	“family” and “order”	33 (18 families, 15 orders)
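
Rules i) and ii) in Table 2 can be illustrated with a minimal sketch. It assumes that a feature "exists" in a sample when its abundance is greater than zero, that a feature exists in a city when it exists in at least one sample from that city, and that ties in ubiquity are broken alphabetically. These choices, the function names, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def features_in_at_least_n_cities(abundance, sample_city, n):
    """Rule i (sketch): keep features detected in at least n distinct cities.

    abundance   -- dict: sample id -> dict of feature name -> abundance
    sample_city -- dict: sample id -> city label
    Assumption: a feature is "detected" when its abundance is > 0.
    """
    cities_per_feature = defaultdict(set)
    for sample, feats in abundance.items():
        for feat, value in feats.items():
            if value > 0:
                cities_per_feature[feat].add(sample_city[sample])
    return sorted(f for f, cities in cities_per_feature.items() if len(cities) >= n)


def top_m_by_sample_ubiquity(abundance, m):
    """Rule ii (sketch): keep the m features detected in the most samples.

    Assumption: ties are broken alphabetically by feature name.
    """
    counts = defaultdict(int)
    for feats in abundance.values():
        for feat, value in feats.items():
            if value > 0:
                counts[feat] += 1
    ranked = sorted(counts, key=lambda f: (-counts[f], f))
    return ranked[:m]


# Hypothetical toy data: four samples from three cities, three taxa.
toy_abundance = {
    "s1": {"taxon_a": 5, "taxon_b": 0, "taxon_c": 2},
    "s2": {"taxon_a": 1, "taxon_b": 3, "taxon_c": 0},
    "s3": {"taxon_a": 4, "taxon_b": 0, "taxon_c": 0},
    "s4": {"taxon_a": 2, "taxon_b": 1, "taxon_c": 0},
}
toy_city = {"s1": "city_x", "s2": "city_x", "s3": "city_y", "s4": "city_z"}

rule_i = features_in_at_least_n_cities(toy_abundance, toy_city, n=2)
rule_ii = top_m_by_sample_ubiquity(toy_abundance, m=2)
print(rule_i)
print(rule_ii)
```

Under rule i) the number of selected features depends on the data and grows as N decreases, whereas rule ii) always returns exactly M features as long as at least M features are detected. This matches the pattern of the counts in Table 2.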

ISSN: 1745-6150