scNetViz: Cytoscape networks for scRNA-seq analysis
scNetViz is a Cytoscape app for identifying differentially expressed genes from single-cell RNA sequencing data and displaying networks of the corresponding proteins for further analysis. Several ways of plotting the cells and gene expression data are also available. This app enables scientists who may not be experts in scRNA-seq to explore the data and to develop biological hypotheses.
scNetViz works with other Cytoscape apps, namely stringApp, the enhancedGraphics app, cyBrowser, and cyPlot, as well as web services hosted by the RBVI, to provide:
- differential expression analysis based on clusterings or other categorizations of the cells in the scRNA-seq dataset
- cell plots: t-SNE, UMAP, etc.
- gene-expression plots: violin, heatmap
- protein networks from STRING corresponding to the sets of top differentially expressed genes
- annotating network nodes with GO terms, pathways, etc. that are enriched vs. the whole genome
Contents
- Installation
- Main Menu
- Loading an Experiment
- Experiment Table
- Plotting Cells
- Adding Categories
- Differential Expression Analysis
- Loading Protein Networks
- Results Panel
- Data Cleaning
Installation
To download and install scNetViz, start Cytoscape and bring up the App Manager (Apps→App Manager). You can search for scNetViz directly by name or by any of its tags: automation, integrated analysis, enrichment analysis, gene expression, and PPI-network. Select the scNetViz app and click Install.
An alternative approach is to navigate to the Cytoscape App Store using a web browser, search for and select scNetViz as above, and download the jar file. In Cytoscape, the app can be installed from file using the App Manager (Apps→App Manager).
Source code is available from https://github.com/RBVI/scNetViz/
Main Menu
scNetViz adds entries to the Cytoscape main Apps menu:
- scNetViz
  - Load Experiment
    - From Single Cell Expression Atlas – browse and load an experiment from the Single Cell Expression Atlas
    - From the Human Cell Atlas – browse and load an experiment from the Human Cell Atlas
    - Import from file – load scRNA-seq quantification from file (MatrixMarket zip, tar.gz, tgz, or gzip)
  - Add Category – read in or calculate additional classifications of the cells (see Adding Categories)
    - Import from file
    - Louvain clustering
    - Leiden clustering
  - New Cell Plot – plot cells in 2D (see Plotting Cells)
    - t-SNE (local)
    - UMAP
    - Graph layout
    - t-SNE (on server)
  - Show Experiment Tables – show the table for a previously loaded experiment
  - Show Results Panel – show a panel in the main Cytoscape window for adjusting network parameters and performing enrichment analyses for a previously loaded experiment
  - Remove Experiment – delete a previously loaded experiment (closing an experiment table merely hides it)
  - Settings – set default cutoffs for differential expression analysis and network display
Loading an Experiment
To browse and load an experiment from an online repository:
- Single Cell Expression Atlas (SCEA): Click the icon in the Cytoscape toolbar or use main menu: Apps→scNetViz→Load Experiment→From Single Cell Expression Atlas
- Human Cell Atlas (HCA): Click the icon in the Cytoscape toolbar or use main menu: Apps→scNetViz→Load Experiment→From the Human Cell Atlas
The resulting experiment browser lists the available datasets along with their accession codes, brief descriptions, numbers of cells, and other information. Clicking a column header sorts by the contents of that column. Searching with a term of interest highlights all rows with matching text in the accession, experiment (SCEA), description (HCA), or organisms column.
Figure: SCEA Experiment Browser
Clicking a row to highlight it chooses an experiment. If multiple rows are highlighted, only the first is treated as the chosen experiment for the following actions:
- Clicking View Data loads the following for the chosen experiment:
  - Normalized scRNA-seq quantification in transcripts per million (TPM) per gene per cell, to be shown in the TPM tab of the experiment table
  - Metadata such as sample characteristics and experimental variables, to be shown in the Categories tab of the experiment table
  - From SCEA only, the results of clustering the filtered cells at different values of k with Single-cell Consensus Clustering (SC3), also shown in the Categories tab of the experiment table
  Subsequent steps of analysis can be performed interactively using buttons and menus on the experiment table.
- For SCEA only, clicking Create Networks loads data for the chosen experiment as described above, but also automatically performs differential expression analysis and loads networks, one for each cluster in the best clustering plus a network that is the union of the cluster-specific networks. The best clustering has sel.K value true (clustering information can be viewed in the Categories tab of the experiment table). If none is true, the first clustering in the list will be used instead. The experiment table is not shown automatically, but can be invoked by choosing Apps→scNetViz→Show Experiment Tables from the main menu.
Clicking Create Networks does not work for HCA data because there is no default category for differential expression analysis.
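The rule for choosing the "best" clustering can be sketched in a few lines of Python. This is only an illustration of the logic described above, not scNetViz code: the dictionary keys k and sel_k are hypothetical names chosen for the example.

```python
from typing import Optional, Sequence


def choose_default_clustering(clusterings: Sequence[dict]) -> Optional[dict]:
    """Return the clustering flagged sel.K = true, else the first one listed.

    Each clustering is a dict with hypothetical keys "k" and "sel_k";
    these names are illustrative, not scNetViz identifiers.
    """
    for clustering in clusterings:
        if clustering.get("sel_k"):
            return clustering
    # No clustering is flagged: fall back to the first in the list, if any.
    return clusterings[0] if clusterings else None


clusterings = [
    {"k": 4, "sel_k": False},
    {"k": 7, "sel_k": True},
    {"k": 9, "sel_k": False},
]
chosen = choose_default_clustering(clusterings)
print(chosen["k"])
```

With no clustering flagged, the same function returns the first entry, which mirrors the fallback described for Create Networks.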
Whether the Double-Click Action (the result of double-clicking a row in the experiment browser) should be View Data or Create Networks is specified in the settings, along with the default parameters for differential expression analysis and loading networks.
The Settings dialog can be shown by choosing Apps→scNetViz→Settings from the main menu or by clicking the icon near the upper right corner of either atlas browser or any experiment table.
To load an experiment from file:
Choose Apps→scNetViz→Load Experiment→Import from file from the menu, then browse to locate and open a zip, tar.gz, tgz, or gzip file of the three MatrixMarket files (.mtx, .mtx_cols, .mtx_rows) comprising a normalized scRNA-seq quantification dataset. The species must also be specified in the dialog to enable the later step of loading networks for the corresponding proteins.
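For readers preparing their own input, the following sketch shows one way to bundle the three MatrixMarket files into a zip using only the Python standard library. The file stem (matrix), the toy values, and the one-identifier-per-line layout of the .mtx_rows and .mtx_cols files are assumptions made for illustration; check them against a dataset that is known to load before relying on this layout.

```python
import io
import zipfile


def matrix_market_text(entries, n_rows, n_cols):
    """Format sparse (row, col, value) entries, 1-indexed, as MatrixMarket coordinate text."""
    lines = ["%%MatrixMarket matrix coordinate real general",
             f"{n_rows} {n_cols} {len(entries)}"]
    lines += [f"{r} {c} {v}" for r, c, v in entries]
    return "\n".join(lines) + "\n"


def build_experiment_zip(genes, cells, entries, stem="matrix"):
    """Return the bytes of a zip holding the .mtx, .mtx_rows, and .mtx_cols files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{stem}.mtx", matrix_market_text(entries, len(genes), len(cells)))
        archive.writestr(f"{stem}.mtx_rows", "\n".join(genes) + "\n")  # genes are rows
        archive.writestr(f"{stem}.mtx_cols", "\n".join(cells) + "\n")  # cells are columns
    return buffer.getvalue()


# Toy data: two genes, three cells, three nonzero values (made up for the example).
genes = ["ENSG00000000003", "ENSG00000000005"]
cells = ["cell_A", "cell_B", "cell_C"]
entries = [(1, 1, 12.5), (1, 3, 3.0), (2, 2, 40.25)]
zip_bytes = build_experiment_zip(genes, cells, entries)

with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    print(sorted(archive.namelist()))
```

Writing zip_bytes to a file gives an archive that can be selected in the Import from file dialog, subject to the layout assumptions noted above.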
Experiment Table
The experiment table contains all of the information loaded for an experiment, as well as analysis results. It has three tabbed sections:
Figure: Experiment Table: TPM
- TPM – RNA quantification in transcripts per million, with genes as rows and cells as columns. Double-clicking a gene name sorts the columns by the values in that row. Standard column sorting by clicking a column header (in this case, a cell identifier) can also be done. Menus and buttons across the top:
  - New Cell Plot – plot cells in 2D (see Plotting Cells) with coloring by TPM values for the currently chosen gene (the row highlighted in the table)
    - t-SNE (local)
    - UMAP
    - Graph layout
    - t-SNE (on server)
  - View <cell-plot-type>, for example, View tSNE or View UMAP – re-show the most recently calculated cell plot, but with coloring by TPM values for the currently chosen gene (the row highlighted in the table)
  - Add Category – read in or compute additional classifications of the cells (see Adding Categories)
    - Import from file
    - Louvain clustering
    - Leiden clustering
  - Export CSV – export table as a text file with comma- or tab-separated values
- Categories – sets of labels such as cluster numbers or cell-type assignments, with categories as rows and cells as columns. Each row defines a grouping that could be used for differential expression analysis. Within a given category, some cells may lack a label (group assignment). Categories can be added from input files or clustering calculations, and the menu under Available categories can be used to switch between the resulting sets of categories. Clicking a row chooses that category, and cutoff criteria for which genes to include can be adjusted before Calculate Diff Exp is clicked to launch the analysis.
  - New Cell Plot – plot cells in 2D (see Plotting Cells) with coloring by the currently chosen category and hiding cells without labels for that category (e.g., cells not assigned to any cluster)
    - t-SNE (local)
    - UMAP
    - Graph layout
    - t-SNE (on server)
  - View <cell-plot-type>, for example, View tSNE or View UMAP – re-show the most recently calculated cell plot, but with coloring by the currently chosen category and hiding cells without labels for that category (e.g., cells not assigned to any cluster)
  The other controls are as described for the TPM tab above.
- DiffExp – results (if any) of differential expression analysis, with genes as rows and DE statistics as columns. In this section, double-clicking a gene name opens a browser window showing information for that gene at the Ensembl website.
The table for a previously loaded experiment can be shown by choosing Apps→scNetViz→Show Experiment Tables from the main menu.
Plotting Cells
The New Cell Plot menu is available from the experiment table or under Apps→scNetViz in the main menu, with choice of method and adjustable parameters. Parameters are explained in more detail in the balloon help from mousing over the dialogs.
Figure: UMAP Colored by Cluster
- t-SNE (local) – t-SNE (t-Distributed Stochastic Neighbor Embedding) calculated locally after data cleaning
  - Initial Dimensions (initial default 10)
  - Perplexity (initial default 20)
  - Number of iterations (initial default 1000)
  - Use Barnes-Hut approximation (initial default off)
  - Theta value for Barnes-Hut (min: 0, max: 2) (initial default 0.001)
  - Log normalize the data (initial default on)
  - Center and scale the data (initial default off)
  Even with unchanged parameters, t-SNE results may vary between runs due to randomization inherent in the method.
- UMAP – Uniform Manifold Approximation and Projection calculated on web server
  - Number of neighbors (initial default 10)
  - Minimum distance (initial default 0.5)
  - Advanced preprocessing parameters – see data cleaning
- Graph layout – force-directed graph drawing as implemented in scanpy, calculated on web server
  - Graph layout algorithm
    - fa (ForceAtlas2) (initial default)
    - kk (Kamada Kawai)
    - fr (Fruchterman Reingold)
    - lgl (Large Graph)
    - drl (Distributed Recursive Layout)
    - rt (Reingold Tilford tree layout)
  - Advanced preprocessing parameters – see data cleaning
- t-SNE (on server) – t-SNE (t-Distributed Stochastic Neighbor Embedding) calculated on web server
  - Perplexity (initial default 20)
  - Initial dimensions (initial default 0, meaning not to use principal components analysis)
  - Early exaggeration (initial default 12)
  - Learning rate (initial default 1000)
  - Advanced preprocessing parameters – see data cleaning
Clicking a cell in the plot scrolls to the corresponding column in the Categories tab of the experiment table. With the magnifying-glass icon chosen (initial default) in the plot window, click-dragging to select a rectangle automatically enlarges that region. Clicking the house icon resets to showing the whole plot.
Adding Categories
For the purposes of scNetViz, a “category” is any classification or labeling of the cells. Within a given category, the cells in an experiment might all have the same label (for example, species = Homo sapiens) or different labels (for example, cluster number = 1, 2, ...). Categories can be viewed and sorted in the Categories tab of the experiment table.
A category in which the cells have at least two different labels is required for differential expression analysis.
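The two-label requirement can be checked with a short Python sketch. Representing a cell without a label as None or an empty string is an assumption made for this example; the function names are likewise illustrative.

```python
def distinct_labels(labels):
    """Return the set of labels in a category, ignoring cells without a label."""
    return {label for label in labels if label not in (None, "")}


def usable_for_diff_exp(labels):
    """A category can drive differential expression only with two or more distinct labels."""
    return len(distinct_labels(labels)) >= 2


species = ["Homo sapiens", "Homo sapiens", "Homo sapiens"]
cluster = ["1", "2", None, "1"]  # None marks a cell with no cluster assignment
print(usable_for_diff_exp(species), usable_for_diff_exp(cluster))
```

Here the species category has a single label and cannot define a comparison, whereas the cluster category can, even though one cell is unassigned.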
The Add Category menu is available from the experiment table or under Apps→scNetViz in the main menu, with options:
- Import from file – the file can be comma- or tab-separated (CSV or TSV), with categories and cells as rows and columns or vice versa. If the columns are categories, the File needs to be pivoted option should be checked on. The number of header lines and the data type should be indicated.
- Louvain clustering – Louvain clustering as implemented in scanpy, calculated on web server
  - Number of neighbors (initial default 15)
  - Advanced preprocessing parameters – see data cleaning
- Leiden clustering – Leiden clustering as implemented in scanpy, calculated on web server
  - Number of neighbors (initial default 15)
  - Advanced preprocessing parameters – see data cleaning
Figure: Experiment Table: Categories
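What "pivoting" means for an imported category file can be illustrated as follows. This is a sketch under stated assumptions, not the scNetViz parser: it assumes exactly one header line and a first column of row names, and the function name read_categories is hypothetical.

```python
import csv
import io


def read_categories(text, delimiter=",", pivot=False):
    """Parse a category file into {category: {cell: label}}.

    With pivot=False, rows are categories and columns are cells.
    With pivot=True, rows are cells and columns are categories, so the
    table is transposed first. One header line is assumed.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if pivot:
        rows = [list(column) for column in zip(*rows)]
    header, body = rows[0], rows[1:]
    cells = header[1:]
    return {row[0]: dict(zip(cells, row[1:])) for row in body}


# Columns are categories here, so the file needs to be pivoted.
cells_as_rows = "cell,cluster,cell_type\nc1,1,T cell\nc2,2,B cell\nc3,1,T cell\n"
categories = read_categories(cells_as_rows, pivot=True)
print(categories["cluster"])
```

A file that already has categories as rows yields the same result with pivot=False, which corresponds to leaving the File needs to be pivoted option unchecked.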
Differential Expression Analysis
In the Categories tab of the experiment table, each row defines a category or grouping that could be used for differential expression (DE) analysis. Cutoffs indicate which genes should be included, with factory defaults:
- absolute magnitude of Log2FC (log2 fold change) at least 0.5
- gene detected in at least Min.pct 10% of cells in either comparison set
The values can be edited directly, and defaults (the values shown initially) adjusted in the settings. Genes not meeting the criteria will still be listed in the results, but without significance values.
The default grouping (category) for analysis of SCEA data is the clustering with sel.K value true, if any, or else the first clustering listed. A different category can be chosen by clicking its row. Clicking the Calculate Diff Exp button performs the analysis with the current settings. Not all cells may be assigned to a cluster, and more generally, some cells may lack a label (may not be assigned to a group) within the chosen category; these cells are excluded from analysis.
Expression is compared between each group and the set of all other groups in that category. With the default cutoffs, a gene is omitted from the calculation if the absolute magnitude of its log2 fold change (log2 of the ratio of mean expression levels for the two sets of cells) is less than 0.5 or the gene is detected in fewer than 10% of the cells in each of the two sets.
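The two cutoffs combine as in the sketch below. The function and parameter names are illustrative only; the default values are the factory defaults quoted above.

```python
def passes_de_cutoffs(log2fc, pct_group, pct_rest, min_abs_log2fc=0.5, min_pct=10.0):
    """Apply the two gene-inclusion cutoffs described above.

    A gene is kept when |log2FC| is at least min_abs_log2fc and it is
    detected in at least min_pct percent of cells in the group or in the
    comparison set (either one suffices).
    """
    enough_change = abs(log2fc) >= min_abs_log2fc
    enough_detection = max(pct_group, pct_rest) >= min_pct
    return enough_change and enough_detection


print(passes_de_cutoffs(1.2, pct_group=35.0, pct_rest=4.0))   # kept
print(passes_de_cutoffs(0.3, pct_group=80.0, pct_rest=75.0))  # fold change too small
print(passes_de_cutoffs(-2.0, pct_group=6.0, pct_rest=8.0))   # detected in too few cells
```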
In the DiffExp tab of the experiment table, the rows are genes, and result columns for each group (e.g. cluster) include:
- MTC – mean transcript count (in TPM), i.e., average over all cells in the group
- Min.pct – percent of cells with gene detected in the group or in the comparison set, whichever is more
- MDTC – mean (data-available) transcript count, i.e., average over cells in the group with the gene detected
- log2FC – log2 of the fold change (FC). FC = (MTC of the group) ÷ (MTC of the comparison set)
- pValue – p-value for expression difference, group vs. comparison set, from the Mann-Whitney U (Wilcoxon rank-sum) test
- FDR – false discovery rate according to the Benjamini-Hochberg procedure
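The per-gene statistics listed above, apart from the Mann-Whitney U test itself, can be sketched in plain Python. This is a simplified illustration, not the scNetViz implementation: it assumes that a TPM value above zero counts as "detected" and leaves log2FC undefined (None) when either mean is zero, since the handling of that case is not described here.

```python
import math


def group_statistics(group, rest):
    """Compute MTC, MDTC, Min.pct, and log2FC for one gene.

    group and rest are lists of TPM values for the cells in the group and
    in the comparison set. A value above zero counts as "detected" here,
    which is an assumption of this sketch.
    """
    def mean(values):
        return sum(values) / len(values) if values else 0.0

    def pct_detected(values):
        return 100.0 * sum(v > 0 for v in values) / len(values)

    mtc, mtc_rest = mean(group), mean(rest)
    detected = [v for v in group if v > 0]
    # Zero means give an undefined ratio; None marks that case (assumption).
    log2fc = math.log2(mtc / mtc_rest) if mtc > 0 and mtc_rest > 0 else None
    return {
        "MTC": mtc,
        "MDTC": mean(detected),
        "Min.pct": max(pct_detected(group), pct_detected(rest)),
        "log2FC": log2fc,
    }


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


stats = group_statistics(group=[8.0, 0.0, 16.0, 8.0], rest=[2.0, 0.0, 0.0, 2.0])
print(stats)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In the toy input, the group mean is 8 TPM and the comparison-set mean is 1 TPM, so the fold change is 8 and log2FC is 3.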
The menus under Comparison can be used to show the results for different clusterings (different values of k) or categories. Menus and buttons on the top right:
Figure: Heatmap
- View Plots – plot differential or absolute expression:
  - Heatmap – show a heat map of genes colored by log2 fold change for each comparison in the current category (e.g., each cluster vs. the set of all others); placing the cursor over the map reports the corresponding gene and value. There is a column for each comparison, and the rows are the top differentially expressed genes from each comparison, but shown for all columns so that the same gene may appear more than once. For example, if three clusters are each compared to all others, and the top 10 positive and top 10 negative log2FC genes are shown for each comparison, the heat map will have three columns and 60 rows. The number of genes to show for each comparison is specified in the settings (Heat map count limit, initial default 20 for the 10 largest-magnitude log2FC of each sign after applying the same FDR and Log2FC cutoffs as for network generation). As shown in the color key, coloring is from red for the largest positive values, through white for zero, to blue for the largest negative values. Clicking within the heat map highlights the corresponding row (gene) in the TPM and DiffExp tabs of the experiment table, and if networks are loaded, the corresponding nodes.
  - Violin (diff exp) – plot the log2 fold-change distribution of genes for each comparison in the current category; placing the cursor over a plot shows its overall statistics as well as the kernel density estimation (KDE) value for the point under the cursor
  - Violin (gene) – for the gene chosen in the table by clicking a row, plot the expression-level distribution in each group (e.g., each cluster) in the current category
Experiment Table: DiffExp
- Export CSV – export table as a text file with comma- or tab-separated values
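The row count in the heat map example (three comparisons, 10 genes of each sign, 60 rows) follows from simple concatenation, as this sketch shows. It uses synthetic log2FC values and hypothetical function names, and it omits the FDR and Log2FC cutoffs that are applied before the top genes are chosen.

```python
def top_genes_by_sign(log2fc_by_gene, per_sign=10):
    """Pick the per_sign most positive and per_sign most negative log2FC genes."""
    ranked = sorted(log2fc_by_gene.items(), key=lambda item: item[1])
    negative = [gene for gene, value in ranked if value < 0][:per_sign]
    positive = [gene for gene, value in reversed(ranked) if value > 0][:per_sign]
    return positive + negative


def heatmap_rows(comparisons, per_sign=10):
    """Concatenate the top genes of every comparison; a gene may appear more than once."""
    rows = []
    for name in comparisons:
        rows.extend(top_genes_by_sign(comparisons[name], per_sign))
    return rows


# Synthetic data: 30 genes per comparison, 15 with negative and 15 with positive log2FC.
comparisons = {
    f"cluster {c}": {f"gene{i}": (i - 14.5) * 0.1 * c for i in range(30)}
    for c in (1, 2, 3)
}
rows = heatmap_rows(comparisons)
print(len(rows), len(set(rows)))
```

Because the synthetic comparisons share their top genes, the 60 rows contain only 20 distinct gene names, which illustrates why a gene can appear in more than one row.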
The top differentially expressed genes as shown in the heatmaps and networks may be considered putative markers, but their biological relevance cannot be assessed by statistics alone. Important factors include the specific experiment, its scope and conditions, and the category groupings used for differential expression analysis.
Loading Protein Networks
Protein networks are fetched from the STRING database, either automatically or when the Create Networks button in the DiffExp tab of the experiment table is clicked.
Network analysis cutoffs indicate which proteins should be loaded as a network for each comparison in the differential expression analysis. Factory defaults are to include only the proteins for genes with:
- FDR (false discovery rate) no greater than 0.05
- Log2FC (log2 fold change) absolute magnitude at least 1.0
Entering a value for Max genes further limits the set of proteins to no more than the specified number of top hits ranked by log2FC (factory default 200). The Positive only option indicates whether only genes with positive log2FC values (higher expression than in the comparison set) should be included, with factory default off, meaning to include genes with both higher and lower expression than the comparison set. The values can be edited directly, and defaults (the values shown initially) adjusted in the settings. Networks are generated according to the current criteria when the Create Networks button is clicked.
Assuming some genes meet the criteria for each comparison, the number of networks loaded will be the same as the number of groups in the category, plus one network that is the union of the others. Network node coloring is by log2FC, from red for most positive to blue for most negative, as in the heatmap.
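The selection of genes for each network, and the union network, can be sketched as below. This is an illustration rather than the scNetViz implementation: the gene names and values are made up, and ranking by absolute log2FC is an assumption based on the magnitude rank shown in the Node Table.

```python
def select_network_genes(results, max_fdr=0.05, min_abs_log2fc=1.0,
                         max_genes=200, positive_only=False):
    """Return gene names passing the network cutoffs, ranked by |log2FC|.

    results maps gene name -> (log2FC, FDR) for one comparison.
    """
    kept = []
    for gene, (log2fc, fdr) in results.items():
        if fdr > max_fdr or abs(log2fc) < min_abs_log2fc:
            continue
        if positive_only and log2fc <= 0:
            continue
        kept.append((abs(log2fc), gene))
    kept.sort(reverse=True)  # largest magnitude first
    return [gene for _, gene in kept[:max_genes]]


# Made-up results for two comparisons (each cluster vs. all others).
cluster_results = {
    "cluster 1": {"geneA": (2.4, 0.001), "geneB": (-1.8, 0.01), "geneC": (0.2, 0.5)},
    "cluster 2": {"geneB": (3.1, 0.0005), "geneA": (-1.1, 0.2), "geneD": (1.5, 0.03)},
}
networks = {name: select_network_genes(res) for name, res in cluster_results.items()}
union = sorted(set().union(*networks.values()))
print(networks)
print(union)
```

Two groups give two cluster-specific gene sets plus their union, matching the count of networks described above.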
Some of the top-ranked genes as shown in the heatmap may be missing from the network because there is no corresponding protein in STRING (for example, noncoding RNA).
The Cytoscape Node Table lists attributes of each node (protein) including its log2FC magnitude rank in the network that is being viewed. Sorting on the rank column gives the top genes from differential expression analysis, essentially the protein version of Seurat FindMarkers results. For example, the scNetViz Cluster 5 Rank column gives the top putative markers for the comparison of cluster 5 vs. all others.
Figure: Networks from scNetViz
Results Panel
The Results Panel within the main Cytoscape window includes options similar to those appearing elsewhere in scNetViz, plus controls for performing enrichment analyses of terms (annotations) in the network vs. the whole genome of the organism. The panel can be shown by choosing Apps→scNetViz→Show Results Panel from the main menu.
To view data and plots:
- Tables – the respective sections of the experiment table:
  - TPM Table
  - Category Table
  - DE Table
- Plots – differential expression plots:
  - Heatmap – a heat map of genes colored by log2FC for each comparison in the current category, as above; if nodes are selected in the network, include only the corresponding genes
  - Violin – the log2FC distribution of genes for each comparison in the current category, as above; if nodes are selected in the network, include only the corresponding genes
- The menus under Comparison specify what category or clustering (different values of k) should be used to group the cells for differential expression analysis, and which pairwise comparisons should be made.
- Several cutoffs indicate which proteins should be included in the networks based on the differential expression of their genes:
  - FDR – maximum false discovery rate for expression difference
  - Log2FC – least magnitude of log2 fold change
  - Max genes – include no more than the specified number of top hits ranked by log2 fold change
  - Positive only – whether to include only those with higher expression than the comparison set
Clicking Create Networks generates the networks according to the current criteria. Default values can be adjusted in the settings.
Figure: Network Showing Enrichment
- Entire network
- Positive only – only use the proteins with higher expression than the comparison set
- Negative only – only use the proteins with lower expression than the comparison set
- Selected only – only use the proteins corresponding to the currently selected nodes
- FDR cutoff – false discovery rate above which to exclude proteins from enrichment analysis
Clicking Retrieve Table performs the enrichment analysis and loads the STRING Enrichment table of results, with redundancies removed. The terms found to be enriched for a network are displayed as colored segments (a “donut chart”) encircling the corresponding nodes, with color-coding as shown in the table.
Enrichment analysis is described in more detail in the stringApp paper:
Cytoscape stringApp: Network analysis and visualization of proteomics data. Doncheva NT, Morris J, Gorodkin J, Jensen LJ. J Proteome Res. 2019 Feb 1;18(2):623-632.
Data Cleaning
For cell-plot and clustering calculations on the web server, data cleaning settings can be adjusted in the Advanced preprocessing parameters section of the respective dialogs:
- Minimum number of genes/cell (initial default 100)
- Minimum number of cells/gene (initial default 1)
- Normalize (initial default on)
- Log transform (initial default on)
- Highly variable genes (initial default on)
- Scale the final matrix (initial default on)
For local t-SNE calculations, data cleaning entails:
- removing all genes for which expression was not detected in any cells and all cells in which no genes were found to be expressed
- log-normalizing expression values (TPMs)
- limiting the gene set to the top variable genes using a reimplementation of the Seurat “find variable genes” routine
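The first two local cleaning steps can be sketched in plain Python as follows. This is a minimal illustration, not the scNetViz code: it assumes log(1 + x) as the log transform, since the exact formula is not stated here, and it omits the variable-gene selection step.

```python
import math


def clean_for_local_tsne(matrix):
    """Drop all-zero genes and cells, then log-transform the remaining values.

    matrix is a list of rows (genes), each a list of TPM values per cell.
    log(1 + x) is assumed as the transform; variable-gene selection is omitted.
    """
    # Keep genes detected in at least one cell.
    gene_idx = [g for g, row in enumerate(matrix) if any(v > 0 for v in row)]
    # Keep cells with at least one detected gene.
    n_cells = len(matrix[0]) if matrix else 0
    cell_idx = [c for c in range(n_cells) if any(matrix[g][c] > 0 for g in gene_idx)]
    cleaned = [[math.log1p(matrix[g][c]) for c in cell_idx] for g in gene_idx]
    return cleaned, gene_idx, cell_idx


tpm = [
    [0.0, 0.0, 0.0],  # gene never detected: removed
    [5.0, 0.0, 1.0],
    [0.0, 0.0, 9.0],
]  # the middle cell (column 1) has no detected genes and is removed
cleaned, kept_genes, kept_cells = clean_for_local_tsne(tpm)
print(kept_genes, kept_cells)
```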
Last updated on November 21, 2019
Copyright 2021 Regents of the University of California. All rights reserved.