Comparing steps of Scanpy for scRNQ-seq and totalvi for CITE-seq

taopeng1100 · August 2, 2021, 6:44pm

Hi I have spent a few days to learn how totalvi analyze CITE-seq data and I am bit confused by the contrasting steps between Scanpy and totalvi:

I have used Scanpy for 10x scRNA-seq for over 2 years now and I love it.
The typical steps are as follows:

Read in the data.
Preprocessing.
3.normalize, log transformation.
Generate highly variable genes.
regress out, scale, PCA, neighbor , UMAP and Leiden.

For TOTALVI, I realize it is quite different:

Read in the data.
Preprocessing
normalize, log transformation.
Generate highly variable genes. The method in this step is very different! Why?
setup_anndata
vae = scvi.model.TOTALVI, vae.train. This is unique to scvi/TOTALVI.

Please, advise if this is appropriate steps to analyze CITE-seq using scvi/TOALVI?

adamgayoso · August 2, 2021, 7:56pm

totalVI replaces the regress out, scale, and PCA steps of Scanpy. How you select genes is up to you (we use the seurat v3 flavor of the function in scanpy, which requires the counts and not the normalized data). All the other differences you see are there to ensure that the anndata object has data in the correct slots.

taopeng1100 · August 2, 2021, 10:09pm

Thx for the explanation. I have CITE-seq data from donor 1 to 3 and each donor has control or treated. How can I set up batch so I can see the clustering result in donor 1, 2 and 3 or control versus treated?

Valentine_Svensson · August 4, 2021, 1:28am

I would set this up as six batches of [donor] x [treatment condition]. This means the latent representation will describe cell-to-cell variation not explained due to donor-to-donor variation or differences due to treatment. This way you can cluster your data and define cell types. Once you have annotated the cells into e.g. 10 cell types you can do differential expression per cell type between treated and control.

This is a very effective way to learn how various types of cells in a context react differently to a perturbation and in what way they may be communicating with each other by investigating the DE genes.

mdmanurung · September 21, 2021, 11:50am

Hi @Valentine_Svensson,

I am also interested to do DE analysis, but I am not sure which values and methods to use. So far, I have exported raw gene counts and denoised protein values to R for DE analysis using edgeR and limma, respectively, on pseudobulked samples.

Does this approach make sense?

Valentine_Svensson · September 29, 2021, 12:25am

What I do depends on how complex the design of the experiment is. In straightforward settings I use scVI. If there is nested or hierarchical design I pseudobulk each sample and use GLMMs.

I don’t think you should do statistics on denoised values (without incorporating uncertainty about the denoising). The confidence/credible intervals will no longer be related to your observed data.

taopeng1100 · October 8, 2021, 3:28pm

For this step: sc.pp.highly_variable_genes, I used to do like this:
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)

What are the best practice to select highly_variable_genes?

I appreciate your insights!

Tao

Topic		Replies	Views
Running TOTALVI data in which subset of cells do not have citeseq data scvi-tools integration , totalvi	8	504	March 25, 2021
Batch correction in reconstructed gene space scvi-tools	4	655	March 24, 2023
Thoughts on a more ~realistic tutorial? scvi-tools tutorials	14	982	February 26, 2022
Differentially expressed genes between two Leiden clusters scvi-tools diff-exp , totalvi	8	865	October 15, 2021
Dataset integration and analysis scvi-tools integration , cellassign , scvi , multivi , totalvi	3	581	May 3, 2023

Comparing steps of Scanpy for scRNQ-seq and totalvi for CITE-seq

Related Topics