Transcriptional diversity in specific synaptic gene sets discriminates cortical neuronal identity
Biology Direct volume 18, Article number: 22 (2023)
Synapse diversity has been described from different perspectives, ranging from the specific neurotransmitters released to their diverse biophysical properties and proteome profiles. However, synapse diversity at the transcriptional level has not been systematically identified across all synapse populations in the brain. To quantify and identify specific synaptic features of neuronal cell types, we combined the SynGO (Synaptic Gene Ontology) database with single-cell RNA sequencing data of the mouse neocortex. We show that cell types can be discriminated by synaptic genes alone with the same power as all genes. The cell type discriminatory power is not equally distributed across synaptic genes, as we could identify functional categories and synaptic compartments with greater cell type-specific expression. Synaptic genes, and specific SynGO categories, belonged to three different types of gene modules: gradient expression over all cell types, gradient expression in selected cell types and cell class- or type-specific profiles. These data provide a deeper understanding of synapse diversity in the neocortex and identify potential markers to selectively identify synapses from specific neuronal populations.
Synapses are the information processing units of the brain and function in a use-dependent manner [1, 2]. Synapses are diverse with regard to subcellular targets and physiological properties and are central in information processing and storage theories. Moreover, brain regions involved in higher cognitive functions, such as the hippocampus and neocortex, contain greater synapse diversity. Synapse diversity also overlaps with the connectivity patterns (connectome) between brain areas associated with different functions. Thus, understanding synapse diversity is crucial for gaining insights into the mechanisms of information processing in the brain.
Neurotransmitters and biophysical properties have traditionally been used for classification of synapses [2, 4]. New technologies, including diverse ‘omics’, are now uncovering additional layers of complexity and diversity in the molecular signatures of synapses. Proteome differences between synapse types correlate with functional diversity (strength, kinetics, or synaptic plasticity). These different molecular profiles include, for example, the scaffold proteins PSD95 and SAP102 [5, 7] and AMPA-type glutamate receptors (AMPARs) for postsynaptic terminals of excitatory cells, and the scaffold proteins Gephyrin (GPHN) and Collybistin (ARHGEF9) and the GABAA receptors (GABAAR) for inhibitory postsynaptic sites. On the presynaptic side, synaptotagmins 1 and 2, involved in calcium-dependent vesicle exocytosis, are differentially expressed between synapse types. While most previous work has centered on inhibitory and excitatory synapses, recent studies have pointed out that the expression of synaptic genes correlates with neuronal diversity in subpopulations of transcriptomic cell types [10, 11].
Here, we aim to systematically identify how synaptic gene expression specifies the diversity of neuronal cell types in mouse neocortex using single cell transcriptomics data. Combining the expert-curated, evidence-based SynGO synaptic ontology and single cell expression data, we observed that the expression of synaptic genes presents a striking diversity. Remarkably, specific biological function and synaptic component gene categories contained significantly high diversity, and discrete modules of synaptic genes exhibited different modes of variability, revealing that synapse diversity is organized at different levels.
The single cell RNA-sequencing dataset published by Tasic et al. and the information available in the SynGO database were retrieved to study the expression of synaptic genes in the neocortex. The scRNA-seq dataset was generated using the smart-seq RNA-sequencing technique. The tissue used was mouse primary visual cortex and anterior lateral motor cortex, in which up to 133 transcriptomic cell types and 16 cell classes were identified. The described cell classes include glutamatergic neurons, labelled according to their preferential layer of residence (for example, Layer 4 neurons are labelled as L4) and their projection pattern (intratelencephalic, IT; pyramidal tract, PT; near-projecting, NP; and corticothalamic, CT), and GABAergic neurons, labelled with their predominantly expressed gene: Sst, Pvalb, Vip, Lamp5, Sncg and Serpinf1. The first release of the expert-curated synaptic gene ontology, SynGO 1.0, was used. SynGO 1.0 contains 1112 unique human genes annotated to 2918 terms that are hierarchically organized and divided into ‘Cellular Components’ and ‘Biological Processes’ related to the synapse. These annotated synaptic genes encode proteins for which there is evidence that they localize to synaptic compartments and contribute to synaptic functions.
Pre-processing and visualization
The scRNA-seq data were filtered to keep only the cells belonging to the classes ‘GABAergic’ and ‘Glutamatergic’ (n = 22,439 cells) and the expression data of the synaptic genes included in SynGO (1049 genes). Three subsets of the data were analyzed separately downstream, obtained by filtering the genes in the original dataset according to the following gene sets: all synaptic genes present in the SynGO database, SynGO presynaptic genes and SynGO postsynaptic genes.
Each of the filtered datasets was pre-processed with the standard protocol used by the Seurat R package as follows: log-normalization and scaling (scaling factor = 10,000) of the raw count data, identification of highly variable genes, PCA dimensionality reduction, selection of significant principal components (PCs) by the JackStraw procedure, and tSNE (t-distributed Stochastic Neighbor Embedding) dimensionality reduction/visualization (perplexity = 50). The significant PCs determined for each data subset were: 60 PCs for the full dataset, 42 PCs for the dataset with all synaptic genes and 21 PCs for each of the datasets with the pre- and postsynaptic genes. No quality filtering was performed on the cells since they had already passed the quality criteria of Tasic et al. The tSNE embeddings of the data were color coded with the cluster identities determined by Tasic et al.
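As a minimal illustration of the log-normalization step, the sketch below rescales the raw counts of each cell to a total of 10,000 and then applies log(1 + x), which is how Seurat documents its default `LogNormalize` method. The function name `log_normalize` and the toy counts are hypothetical; the study itself used the Seurat R package, not this code.

```python
import math

def log_normalize(counts_per_cell, scale_factor=10_000):
    """Return log1p(count / cell_total * scale_factor) for every gene in every cell.

    counts_per_cell: list of cells, each a list of raw gene counts (toy layout).
    """
    normalized = []
    for counts in counts_per_cell:
        total = sum(counts)
        if total == 0:
            # A cell without any counts stays at zero everywhere.
            normalized.append([0.0 for _ in counts])
            continue
        normalized.append(
            [math.log1p(c / total * scale_factor) for c in counts]
        )
    return normalized

# Hypothetical toy data: two cells, three genes.
toy_counts = [[10, 0, 90], [5, 5, 0]]
normalized = log_normalize(toy_counts)
```

Because each cell is divided by its own total, cells sequenced at different depths become comparable before the highly variable genes are selected.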
Synaptic function and localisation (SynGO annotations) underlying diversity
MetaNeighbor was used to measure the power of each of the SynGO annotations to discriminate between different cell types. For each gene set (or SynGO annotated term), AUROC (Area Under the Receiver Operating Characteristic curve) scores were calculated for each of the 16 described cell types. To do so, random samples of the dataset were taken to train the algorithm (2/3) and to test the gene sets (1/3). Only gene sets with at least 2 genes were used in this analysis. The result is an AUROC score for each gene set that can be interpreted as the performance of the gene set for the task of identifying each cell type, with 0.5 being equivalent to a random guess.
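The AUROC for one cell type can be read as the probability that a randomly chosen cell of that type receives a higher score than a randomly chosen cell of any other type. The sketch below computes it from scores and binary labels through the rank-sum (Mann-Whitney) identity. It illustrates the metric only, not MetaNeighbor's neighbor-voting procedure, and the function name and inputs are hypothetical.

```python
def auroc(scores, labels):
    """AUROC from prediction scores and binary labels (1 = target type, 0 = other)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # Mann-Whitney U statistic divided by the number of positive/negative pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A gene set whose scores carry no information about the cell type gives a value near 0.5, matching the random-guess baseline described above.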
To calculate the statistical significance of the AUROC scores, the performance of the SynGO annotations in MetaNeighbor was compared to that of random gene sets of the same size. For each gene set size in SynGO, 10,000 gene sets were generated by sampling from all the genes expressed in the original dataset, as well as from all genes found in SynGO. For each of these randomly generated gene sets the ‘fast_version’ of MetaNeighbor was used. Firstly, the AUROC scores were used to compare the average performance of random gene sets and random synaptic gene sets of each size. Secondly, the random synaptic gene sets were used to calculate the statistical significance of the SynGO annotation scores by calculating an empirical p value. As indicated in Eq. 1, this is done by calculating the fraction of scores from the randomized gene sets that are higher than the scores of the SynGO annotations. To obtain an overall score across all cell types, the score used for the p value calculation was the sum of the AUROC scores in the 16 cell types. Likewise, we calculated the empirical p value of SynGO annotations performing significantly worse than random gene sets (the fraction of AUROC scores from randomized synaptic gene sets that are lower than the scores of the SynGO annotations).
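$$p = \frac{\sum_{i=1}^{N} \mathbb{1}\left(S_i^{\mathrm{random}} > S^{\mathrm{SynGO}}\right)}{N} \qquad (1)$$
where N = total number of random permutations (10,000), $S^{\mathrm{SynGO}}$ is the summed AUROC score of the SynGO annotation and $S_i^{\mathrm{random}}$ is the summed AUROC score of the i-th random synaptic gene set of the same size.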
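The empirical p value described above is a simple counting exercise, sketched here under the assumption that the summed AUROC scores have already been computed. The function name, the `tail` argument and the toy numbers are hypothetical and only mirror the two comparisons in the text (random sets scoring higher, or lower, than the annotation).

```python
def empirical_p_value(observed_score, random_scores, tail="higher"):
    """Fraction of random gene set scores beyond the observed score.

    tail="higher": random scores above the observed one (annotation better than random).
    tail="lower":  random scores below the observed one (annotation worse than random).
    """
    n = len(random_scores)
    if n == 0:
        raise ValueError("at least one random score is required")
    if tail == "higher":
        count = sum(1 for s in random_scores if s > observed_score)
    elif tail == "lower":
        count = sum(1 for s in random_scores if s < observed_score)
    else:
        raise ValueError("tail must be 'higher' or 'lower'")
    return count / n

# Hypothetical example: per-cell-type AUROCs are summed into one overall score.
observed = sum([0.8, 0.7, 0.9])          # summed AUROC of one annotation
random_sums = [2.0, 2.1, 2.5, 2.6]       # summed AUROCs of random gene sets
p_better = empirical_p_value(observed, random_sums, tail="higher")
```

With 10,000 random gene sets the smallest non-zero p value that can be resolved this way is 1/10,000.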
To calculate the specificity per gene in Additional file 1: Table S1, we ranked the synaptic genes according to the cell type specificity of their expression by calculating the proportion of expression of each gene in each cell type, similarly to Skene et al. To do so, we first normalized the gene expression in each cell type by aggregating the counts for each gene across cells belonging to the same cell type, scaling to 1 million counts and then dividing by the total counts in all cells in that cell type. Then the specificity score was calculated by dividing the normalized expression of each gene in every cell type by the total expression of that gene across all cell types. The list of synaptic genes was then ranked by the maximum score of each gene, so that top-ranked genes have a higher cell type specificity.
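The specificity calculation can be sketched as follows, assuming that counts have already been aggregated per cell type and reading the normalization as counts per million within each cell type. The function names, the dictionary layout and the gene and cell type labels are hypothetical illustrations, not the published scripts.

```python
def specificity_scores(counts_by_cell_type):
    """counts_by_cell_type: {cell_type: {gene: aggregated raw counts}}.

    Returns {gene: {cell_type: share of the gene's normalized expression}}.
    """
    # Step 1: counts per million within each cell type.
    cpm = {}
    for cell_type, gene_counts in counts_by_cell_type.items():
        total = sum(gene_counts.values())
        cpm[cell_type] = {
            gene: (count / total * 1e6 if total else 0.0)
            for gene, count in gene_counts.items()
        }

    # Step 2: divide by the gene's summed normalized expression over all cell types.
    genes = set().union(*(values.keys() for values in cpm.values()))
    scores = {}
    for gene in genes:
        gene_total = sum(cpm[ct].get(gene, 0.0) for ct in cpm)
        scores[gene] = {
            ct: (cpm[ct].get(gene, 0.0) / gene_total if gene_total else 0.0)
            for ct in cpm
        }
    return scores

def rank_by_max_specificity(scores):
    """Genes ordered from most to least cell type specific."""
    return sorted(scores, key=lambda g: max(scores[g].values()), reverse=True)

# Hypothetical toy input with two cell types and two genes.
toy = {
    "TypeX": {"GeneA": 80, "GeneB": 20},
    "TypeY": {"GeneA": 0, "GeneB": 100},
}
toy_scores = specificity_scores(toy)
toy_ranking = rank_by_max_specificity(toy_scores)
```

A score of 1 means the gene is detected in a single cell type only, while a gene expressed evenly across k cell types scores 1/k in each of them.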
Quantification of cell type diversity encoded by synaptic genes
To measure and compare the cell type diversity observed with different gene sets, MetaNeighbor analysis was performed as described in the previous section. The quantified gene sets included the most highly variable genes among: all genes in the dataset, non-synaptic genes (defined as all genes excluding the genes in SynGO), all synaptic genes, presynaptic genes, postsynaptic genes and mitochondrial genes (all genes included in the dataset and annotated in MitoCarta). AUROC scores were calculated for each gene set and cell type, as well as for every cell class. Wilcoxon rank test (followed by false discovery rate [FDR] correction) was used to determine statistically different performance of each pair of gene sets.
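The text does not name the FDR procedure that was applied. As an illustration only, the sketch below implements the Benjamini-Hochberg adjustment, a common choice for pairwise comparisons of this kind; treating it as the method used here is an assumption, and the function name and p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p value so that the adjusted
    # values never decrease with increasing rank.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical raw p values from four pairwise gene set comparisons.
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
```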
Synapse gene correlation network analysis
Weighted gene correlation network analysis (WGCNA) was used to investigate modules of synaptic genes in the transcriptional network of the dataset. In brief, the standard pipeline from the WGCNA R package was used to perform hierarchical clustering on the distance between every gene pair, calculated as 1 − TOM (topological overlap matrix). To generate the TOM, the co-expression similarity matrix is raised to a soft thresholding power (giving the adjacency) that approximates a scale-free topology while preserving the mean connectivity of the network (β = 4). Finally, the clustered genes are grouped into modules of highly interconnected genes using a dynamic branch-cutting algorithm. We used the dynamic tree cutting function (maximum height 0.9, 0.95, 0.98) and the modules were selected from a consensus of the results.
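The two matrix steps that precede clustering can be sketched in a few lines. The code assumes an unsigned network (adjacency equal to the absolute correlation raised to β) and the standard topological overlap formula; whether the study used a signed or unsigned network is not stated, so that choice, the function names and the toy correlation matrix are all assumptions. The analysis itself relied on the WGCNA R package.

```python
def soft_threshold_adjacency(correlation, beta=4):
    """Unsigned adjacency |cor|^beta with a zero diagonal."""
    n = len(correlation)
    return [
        [abs(correlation[i][j]) ** beta if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]

def tom_dissimilarity(adjacency):
    """1 - TOM, the distance used for hierarchical clustering of genes."""
    n = len(adjacency)
    connectivity = [sum(row) for row in adjacency]  # diagonal is zero
    dissimilarity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Connection strength shared through every other gene u.
            shared = sum(
                adjacency[i][u] * adjacency[u][j]
                for u in range(n)
                if u != i and u != j
            )
            tom = (shared + adjacency[i][j]) / (
                min(connectivity[i], connectivity[j]) + 1 - adjacency[i][j]
            )
            dissimilarity[i][j] = 1 - tom
    return dissimilarity

# Hypothetical correlations between three genes.
toy_correlation = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
toy_adjacency = soft_threshold_adjacency(toy_correlation, beta=4)
toy_distance = tom_dissimilarity(toy_adjacency)
```

Raising correlations to the fourth power suppresses weak correlations much more strongly than strong ones, so the strongly co-expressed pair in the toy example ends up far closer in TOM distance than the weakly correlated pairs.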
The gene modules were classified using K-means clustering on the eigenvector that explains the variance of gene expression in each cell type (80.3% variability explained). To do so, the average gene expression of each module in each cell was used to calculate the variance of expression for each gene module in each cluster. Next, the cell type identity information was removed, and the variance matrix ordered. The eigenvector explaining the maximum variability in the data (PC1) was used to cluster the modules in groups of similar variances of gene expression.
The individual gene modules were characterized using two approaches: mapping the average expression of the module to the synaptic types, and annotating the function or cellular compartment they are related to by gene ontology enrichment. The former was done by mapping the average expression of all genes in each module, normalized to the average expression of random genes, to the transcriptomic cell types and visualizing it in the tSNE generated using only synaptic genes. To map the function and cellular component most related to each gene module, hypergeometric gene set enrichment was used. The background used for this analysis (universe) comprised all the genes annotated in SynGO that were present in the Tasic et al. dataset. The significance scores (p values) from the hypergeometric tests were adjusted for multiple hypothesis testing using the Bonferroni correction method. Lastly, visualization of the test results for every gene module was produced with the sunburst custom color-coding tool of SynGO ontologies.
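The enrichment test can be illustrated with a short sketch: given the size of the universe, of a SynGO term, of a gene module and of their overlap, the upper tail of the hypergeometric distribution gives the probability of seeing at least that overlap by chance, and Bonferroni correction multiplies each p value by the number of tests. The function names and the module, term and overlap sizes below are hypothetical; only the universe of 1049 genes is taken from the text.

```python
from math import comb

def hypergeom_enrichment_p(universe_size, term_size, module_size, overlap):
    """P(X >= overlap) when module_size genes are drawn from the universe
    and term_size genes of the universe carry the annotation."""
    total = comb(universe_size, module_size)
    upper = min(term_size, module_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def bonferroni(p_values):
    """Bonferroni-adjusted p values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical module of 50 genes tested against two terms of 40 genes each.
raw = [
    hypergeom_enrichment_p(1049, 40, 50, 10),
    hypergeom_enrichment_p(1049, 40, 50, 2),
]
adjusted = bonferroni(raw)
```

Restricting the universe to SynGO genes detected in the dataset, as described above, keeps the test from rewarding a module merely for containing synaptic genes.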
Synapse genes contain cell identity information
To evaluate transcriptional diversity of synaptic genes in neuronal cell types, we filtered the expression data of cell types identified by Tasic et al. using the genes in SynGO. We analyzed four gene sets: all genes in the original dataset (Fig. 1A), all synaptic genes in SynGO (Fig. 1B), presynaptic genes in SynGO (Fig. 1C) and postsynaptic genes in SynGO (Fig. 1D). We then compared the cell class and cell type diversity across the four subsets. Here, we refer to the 133 transcriptomic neuronal types described in Tasic et al. as cell types (different colors in Fig. 1) and to 16 merged groups of these cell types as cell classes. Distinct classes and cell types could be discerned to a similar extent using SynGO genes only or all genes in the dataset (Fig. 1A, B). Additionally, the observed transcriptomic diversity of presynaptic (Fig. 1C) and postsynaptic (Fig. 1D) genes showed similar levels of cell type specification. The class and cell type discriminatory power of synaptic gene expression was quantified as the cell classification performance of each gene set using MetaNeighbor. We included a similar-sized gene set from MitoCarta as a comparison. Synaptic genes had a similar power in discerning classes and cell types in comparison to all genes or to the genes remaining after removing SynGO genes (Fig. 1E, F). We observed no difference in the discriminatory power between presynaptic and postsynaptic gene sets (Fig. 1E, F). Similarly, no difference was observed when measuring pre- and postsynaptic genes in excitatory and inhibitory neurons independently (Additional file 3: Fig. S1A). However, there is a considerable overlap between the genes annotated as presynaptic and those annotated as postsynaptic. Comparing only the genes specific to each category revealed a significantly higher score of postsynaptic genes for inhibitory neurons (Additional file 3: Fig. S1B, C). This suggests that postsynaptic diversity is larger among GABAergic cells.
These results indicate that the diversity of the synapse transcriptome across cell types is similar to that of the full transcriptome. In Additional file 1: Table S1 we provide individual specificity scores (see methods) for each of the synaptic genes in the analysis.
Specific SynGO annotations underlie synapse diversity
To identify whether genes contributing to synapse diversity belong to specific functional sets or are expressed in specific synaptic compartments, we analyzed the cell type discriminatory power of annotated SynGO terms. To test this, we used MetaNeighbor to score the performance of each SynGO term on the task of discriminating different cell types (AUROC scores) and compared it to random sets (of equivalent size) of genes drawn from SynGO and from all expressed genes in the dataset (Fig. 2, Additional file 4: Fig. S2A). Several SynGO terms in both biological processes (BP, Fig. 2A, B) and cellular components (CC, Fig. 2C, D) discriminated cell types significantly better than random gene sets. Among the top biological process annotations are elements of the postsynaptic density organization, synaptic signaling, modulation of presynaptic chemical transmission and synaptic vesicle exocytosis. For cellular localisation, both presynaptic and postsynaptic membranes, as well as the presynaptic cytosol and active zone membrane, were significant. Conversely, a few categories performed worse than random, including ribosomal genes and genes involved in metabolism (Additional file 4: Fig. S2B, C). Analysis of average expression per category could not explain this result (Additional file 4: Fig. S2D, E). This analysis confirmed that synaptic genes perform better, on average, in cell type identification than gene sets composed of any gene expressed in the data, even when controlling for the number of genes. These results show that synapse diversity among different neuronal types accumulates in specific functions and cellular components.
Gene network analysis reveals different levels of synaptic organisation
WGCNA analysis and hierarchical clustering of the gene co-expression network revealed a high level of modularity of synaptic genes (Fig. 3A, Additional file 2: Table S2). Classification of the gene modules according to the eigenvector calculated from the variance of gene expression across cell types (Fig. 3B) showed that synaptic gene modules can be clustered into three types: modules with specific expression in cell types or cell classes (discrete modules), modules showing a gradient of expression in a specific cell class (intermediate gradients) and modules with a similar gradient of expression in all cell types (pure gradients). Notably, these different classes of diversity were similarly found in pre- and postsynaptic gene modules. The results from this analysis were mapped to cell types using the average expression of the gene module in each cell (Fig. 3E, G, I; Additional file 5: Fig. S3). Interestingly, we observed modules with cell type-specific expression in Vip cells, sometimes shared with other cell types including Sncg cells (pink; Fig. 3I) and near-projecting cells (dark green; Additional file 5: Fig. S3). This suggests that some synaptic specializations can be re-used between GABAergic cell types and across GABAergic and excitatory cell types. In addition, gene set enrichment analysis of the obtained gene modules showed the biological processes and cell compartments (SynGO terms) to which each gene module is most related (Fig. 3D, F, H). Notably, none of the enriched SynGO terms overlap between the different groups of modules. Modules exhibiting gradients of expression included terms related to metabolism, post- and presynaptic ribosome, and protein translation (similar to the terms indicated in Additional file 4: Fig. S2B). We also observed two gradient modules with opposing expression patterns (Fig. 3C), suggesting that these are specific programs that are anti-regulated, perhaps in response to external signals or to each other. This included the genes CTBP1 and ARL8, involved in “presynapse to nucleus signaling pathway” and “regulation of anterograde synaptic vesicle transport”, respectively (yellow module), opposing the expression pattern of ribosomal and translational machinery genes (turquoise module). These results suggest that there are different types of synaptic organisation, ranging from cell type-specific to pan-neuronal programs, specified by distinct sets of genes at the transcriptome level, which also involve specific cellular functions.
In this study, synapse diversity was mapped to previously defined transcriptomic neuronal cell types. We found that synaptic genes contain considerable cell identity information at the transcriptome level. Among synaptic genes, certain groups of genes associated with specific synaptic functions and localisation, annotated as SynGO terms, underlie the observed synapse diversity. Moreover, we identified additional candidate modules of co-expressed genes that contribute to synaptic functional diversity. These gene modules suggest different types of synapse organisation or different hierarchies of synaptic specification.
These results agree with the proposed vast synapse diversity arising from the combination of the different proteins that have been described as part of the synapse proteome [4, 20]. Therefore, transcriptomic synapse diversity exists to a greater extent than that depicted by the classical classifications of synapse types, possibly integrating the anatomical and physiological features classically described, as proposed for GABAergic interneurons in previous studies.
We observed cell type-related diversity in both the pre- and postsynaptic genes. Our findings add gene-level resolution to the postsynaptic site diversity previously proposed in the brain based on protein expression of Dlg4 (PSD95) and Dlg3 (SAP102). Additionally, our data highlight the existence of such diversity at the presynaptic site as well, which shows a similar molecular diversity.
Our results show that synapse diversity, as well as similarity, between different cell types resides in specific synaptic functions and components. We identified cytoskeleton organisation, cell adhesion and synaptic signaling as important for synapse diversity. As expected, we observed gene modules specific to the excitatory/inhibitory synapse classification, but also gene modules specific to neuronal classes and neuronal types. An additional layer of diversity seems to be related to gradient-like expression of gene modules within each cell type and, surprisingly, to gene modules showing opposing expression, which is likely an indication of dynamic synapse regulation as proposed by Zhu et al.
Beyond the synapse diversity between neurons depicted here, recent studies have also described synapse diversity within a single neuron. Differential spatial distribution of synapse mRNA and proteins across the dendritic tree or between the cell body and synapses likely represents distinct functions within the same cell. It is our hope that our results broaden the understanding of synapse diversity and generate hypotheses for future single synapse research. Revealing the subcellular localization of these mRNAs and proteins can provide insights into the synapse diversity within one neuron and the dynamic processes that occur in response to activity, perhaps through local translation of proteins. As an example, gene modules showing gradient expression profiles within cell types could reflect different cell states of the same cell types, in which single synapse variability could have a role. Our study provides the opportunity to expand the knowledge of the specific synaptic profile of distinct cell types. Further work in this direction could be used to selectively identify populations of synapses derived from specific populations of neuronal cell types, in intact tissue as well as in disease models.
Availability of data and materials
The datasets analysed in this study were previously published and are available from the Gene Expression Omnibus (accession GSE115746) and from https://www.syngoportal.org/
Custom scripts used in this study can be found at: https://github.com/Hjerling-Leffler-Lab/SynGO_scRNAseq
1. Abbott LF, Regehr WG. Synaptic computation. Nature. 2004;431:796–803. https://doi.org/10.1038/nature03010.
2. Jackman SL, Regehr WG. The mechanisms and functions of synaptic facilitation. Neuron. 2017;94:447–64. https://doi.org/10.1016/j.neuron.2017.02.047.
3. Kubota Y, Karube F, Nomura M, Kawaguchi Y. The diversity of cortical inhibitory synapses. Front Neural Circuits. 2016;10:27. https://doi.org/10.3389/fncir.2016.00027.
4. O’Rourke NA, Weiler NC, Micheva KD, Smith SJ. Deep molecular diversity of mammalian synapses: why it matters and how to measure it. Nat Rev Neurosci. 2012;13:365–79. https://doi.org/10.1038/nrn3170.
5. Zhu F, Cizeron M, Qiu Z, et al. Architecture of the mouse brain synaptome. Neuron. 2018;99:781-799.e10. https://doi.org/10.1016/J.NEURON.2018.07.007.
6. Grant SGN, Fransén E. The synapse diversity dilemma: molecular heterogeneity confounds studies of synapse function. Front Synaptic Neurosci. 2020;12:45. https://doi.org/10.3389/fnsyn.2020.590403.
7. Broadhead MJ, Bonthron C, Arcinas L, et al. Nanostructural diversity of synapses in the mammalian spinal cord. Sci Rep. 2020;10:8189. https://doi.org/10.1038/s41598-020-64874-9.
8. Crosby KC, Gookin SE, Garcia JD, et al. Nanoscale subsynaptic domains underlie the organization of the inhibitory synapse. Cell Rep. 2019;26:3284-3297.e3. https://doi.org/10.1016/j.celrep.2019.02.070.
9. Südhof TC. The synaptic vesicle cycle. Annu Rev Neurosci. 2004;27:509–47. https://doi.org/10.1146/annurev.neuro.26.041002.131412.
10. Zeisel A, Hochgerner H, Lönnerberg P, et al. Molecular architecture of the mouse nervous system. Cell. 2018;174:999-1014.e22. https://doi.org/10.1016/J.CELL.2018.06.021.
11. Paul A, Crow M, Raudales R, et al. Transcriptional architecture of synaptic communication delineates GABAergic neuron identity. Cell. 2017;171:522-525.e20. https://doi.org/10.1016/j.cell.2017.08.032.
12. Koopmans F, van Nierop P, Andres-Alonso M, et al. SynGO: an evidence-based, expert-curated knowledge base for the synapse. Neuron. 2019;103:217-234.e4. https://doi.org/10.1016/j.neuron.2019.05.002.
13. Tasic B, Yao Z, Graybuck LT, et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature. 2018;563:72–8. https://doi.org/10.1038/s41586-018-0654-5.
14. Butler A, Hoffman P, Smibert P, et al. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36:411–20. https://doi.org/10.1038/nbt.4096.
15. Crow M, Paul A, Ballouz S, et al. Characterizing the replicability of cell types defined by single cell RNA-sequencing data using MetaNeighbor. Nat Commun. 2018;9:884. https://doi.org/10.1038/s41467-018-03282-0.
16. Skene NG, Bryois J, Bakken TE, et al. Genetic identification of brain cell types underlying schizophrenia. Nat Genet. 2018;50:825–33. https://doi.org/10.1038/s41588-018-0129-5.
17. Rath S, Sharma R, Gupta R, et al. MitoCarta3.0: an updated mitochondrial proteome now with sub-organelle localization and pathway annotations. Nucleic Acids Res. 2021;49:D1541–7. https://doi.org/10.1093/nar/gkaa1011.
18. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 2008;9:559. https://doi.org/10.1186/1471-2105-9-559.
19. Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol. 2005. https://doi.org/10.2202/1544-6115.1128.
20. Grant SGN. Toward a molecular catalogue of synapses. Brain Res Rev. 2007;55:445–9. https://doi.org/10.1016/J.BRAINRESREV.2007.05.003.
Open access funding provided by Karolinska Institute. J.H.-L. was funded by the Swedish Research Council (Vetenskapsrådet, award 2018-00799), European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 819540), and the Swedish Brain Foundation (Hjärnfonden). M.V., A.B.S. and A.R.A. were funded by the Broad Synapse 3 project (6910259-5500000759) and the Simons foundation SFARI director's grant (882976).
The authors declare no competing interests.
Ranked list of synaptic genes according to their maximum cell type specificity.
List of genes in each of the modules identified with WGCNA.
Comparison of cell type discriminatory power of pre- and postsynaptic genes within cell classes. Quantification of the cell type identity encoded in all genes annotated in SynGO, all pre- and postsynaptic genes, and non-overlapping pre- and postsynaptic genes; the tSNE embeddings resulting from only exclusive pre- and postsynaptic genes are also shown. Wilcoxon rank test was used to determine statistically different performance of each pair of gene sets. The colour code of each cell type is the same as that used in Tasic et al. and Fig. 1A.
All MetaNeighbor AUROC scores obtained in the random gene sets used for bootstrap analysis and annotated SynGO categories for each cell class in the dataset. Mean AUROC score across the 16 cell types is shown for biological functions and cellular compartments annotated in SynGO. Some SynGO terms score significantly worse than the random performance expected for their respective gene set size. The sunburst plots show the SynGO biological processes and cellular compartments where less variability than expected lies across all neuronal subclasses. The colour code indicates SynGO terms that perform significantly worse than random synaptic gene sets of the same size. Average expression of all genes in each annotated SynGO term.
tSNE representation of the synaptic cell types, colour coded with the average expression of the genes in each module found with WGCNA.
Cite this article
Roig Adam, A., Martínez-López, J.A., van der Spek, S.J.F. et al. Transcriptional diversity in specific synaptic gene sets discriminates cortical neuronal identity. Biol Direct 18, 22 (2023). https://doi.org/10.1186/s13062-023-00372-y