Identification of city specific important bacterial signature for the MetaSUB CAMDA challenge microbiome data

Biology Direct

Table 1 Number of samples included in the analyses and their corresponding city and country of provenance

Set	City (code)	New Zealand	U.S.A.	Nigeria	Portugal	Chile	Japan	Colombia
Training (Main)	Auckland (AKL)	15
	Hamilton (HAM)	16
	New York (NYC)		26 + 0 = 26
	Offa (OFA)			20
	Porto (PXO)				60
	Sacramento (SAC)		16 + 18 = 34
	Santiago (SCL)					20
	Tokyo (TKO)						20
Testing (Mystery-1)	Various (C1)		10 (NYC)	5	10	5
Training (Mystery-2)	Ilorin (C2)			12
	Lisbon (C3)				12
	Boston (C4)		12
	Bogota							0 (no samples in the training set)
Testing (Mystery-3)	Various (C5)		3 (Boston)	4	4			5 (Bogota)

The table also shows the mystery sets and the internal codes used in this work for each city and set. In the U.S.A. column, the New York City and Sacramento counts include additional samples from the pilot analysis, but in the present setting only the Sacramento pilot samples yielded OTUs (hence 26 + 0 for New York and 16 + 18 for Sacramento). The rows for the testing sets (Mystery-1 and Mystery-3) are multi-city groups for which the city of provenance of every sample was predicted using the corresponding training model (main or mystery-2). All cities in the testing sets have a counterpart in the corresponding training set, with the exception of Bogota, which has 5 samples in the testing set (mystery-3) but no samples in the training set (mystery-2).
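The sample counts in Table 1 can be tallied, and the counterpart statement checked, with a short script. This is a minimal sketch, assuming that cities are aggregated to their country (so Boston and New York both count as U.S.A.); the dictionaries and the helper functions `totals` and `missing_in_training` are hypothetical illustrations, not part of the authors' pipeline.

```python
# Hypothetical encoding of Table 1: sample counts per set, aggregated by country.
# Assumption: country-level matching stands in for city-level matching.
TRAINING = {
    "main": {
        "New Zealand": 15 + 16,  # Auckland + Hamilton
        "U.S.A.": 26 + 34,       # New York + Sacramento (pilot samples included)
        "Nigeria": 20,           # Offa
        "Portugal": 60,          # Porto
        "Chile": 20,             # Santiago
        "Japan": 20,             # Tokyo
    },
    "mystery-2": {
        "Nigeria": 12,   # Ilorin (C2)
        "Portugal": 12,  # Lisbon (C3)
        "U.S.A.": 12,    # Boston (C4)
        "Colombia": 0,   # Bogota: no training samples
    },
}
TESTING = {
    "mystery-1": {"U.S.A.": 10, "Nigeria": 5, "Portugal": 10, "Chile": 5},
    "mystery-3": {"U.S.A.": 3, "Nigeria": 4, "Portugal": 4, "Colombia": 5},
}
# Training model used to predict each testing set.
MODEL_FOR = {"mystery-1": "main", "mystery-3": "mystery-2"}


def totals(sets: dict[str, dict[str, int]]) -> dict[str, int]:
    """Total number of samples in each set."""
    return {name: sum(counts.values()) for name, counts in sets.items()}


def missing_in_training(test_set: str) -> list[str]:
    """Countries with testing samples but no samples in the matching training set."""
    train = TRAINING[MODEL_FOR[test_set]]
    return sorted(
        country
        for country, n in TESTING[test_set].items()
        if n > 0 and train.get(country, 0) == 0
    )


if __name__ == "__main__":
    print(totals(TRAINING))                  # {'main': 211, 'mystery-2': 36}
    print(totals(TESTING))                   # {'mystery-1': 30, 'mystery-3': 16}
    print(missing_in_training("mystery-1"))  # []
    print(missing_in_training("mystery-3"))  # ['Colombia']
```

Aggregating by country is a simplification: it cannot distinguish, for example, Boston test samples from New York training samples, but it is enough to show that Colombia (Bogota) is the only testing origin without training data.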

ISSN: 1745-6150