Differential expression and differential accessibility with MultiVI

Hi,

I used MultiVI

https://docs.scvi-tools.org/en/stable/tutorials/notebooks/MultiVI_tutorial.html

to integrate a dataset that includes sc/snATAC-seq, sc/snRNA-seq and 10x Multiome samples (Single Cell Multiome ATAC + Gene Expression, 10x Genomics).

I would like to calculate differential expression and differential accessibility and I found the routines

scvi.model.MULTIVI.differential_expression

scvi.model.MULTIVI.differential_accessibility

in the documentation.

It is not very clear to me at what point of the workflow reported in the first link I should use them, though.

Is there a tutorial available?

Thanks :)

Ciao Daniele,
To answer your question: you need to train the model on the paired and single-modality data, and once the model is trained you can call the routines to calculate differential accessibility/expression.
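A minimal sketch of that order of operations. It assumes a MULTIVI model (called `mvi` in the tutorial) that has already been built on the combined AnnData `adata_mvi`, and a `leiden` column in `adata_mvi.obs` computed from the MultiVI latent space. The helper name `run_differential_tests` is hypothetical; `train`, `differential_expression` and `differential_accessibility` are the MULTIVI methods mentioned above. The model is passed in as an argument, so the sketch itself has no import-time dependency on scvi-tools.

```python
def run_differential_tests(model, adata, groupby="leiden", train_first=True):
    """Hypothetical helper: train a MULTIVI model, then run DE and DA.

    `model` is assumed to be an scvi.model.MULTIVI instance built on `adata`
    (paired + RNA-only + ATAC-only cells); `groupby` must name a column of
    adata.obs (e.g. Leiden clusters computed on the MultiVI latent space).
    """
    if groupby not in adata.obs:
        raise KeyError(f"{groupby!r} is not a column of adata.obs")
    if train_first:
        # Step 1: fit the model on the paired and single-modality cells.
        model.train()
    # Step 2: only after training, call the methods on the trained *instance*.
    de_df = model.differential_expression(adata=adata, groupby=groupby)
    da_df = model.differential_accessibility(adata=adata, groupby=groupby)
    return de_df, da_df
```

With the tutorial's names this would be called as `run_differential_tests(mvi, adata_mvi, groupby="leiden")`; pass `train_first=False` if you loaded an already trained model.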

We will be updating the tutorial soon, as we have many new features now. Thanks for asking this question! We will try to clarify this in the tutorial.

Please let me know if it is clear now!

Thanks,

Mariano

Ciao Mariano,

Thank you so much for your reply.

I tried to run the tutorial reported at

to try and use the routine

scvi.model.MULTIVI.differential_expression

I am not sure if this command is correct

dge_df = scvi.model.MULTIVI.differential_expression(adata=adata_mvi, groupby='leiden')

since I am getting the error

TypeError: differential_expression() missing 1 required positional argument: 'self'

Here is a PDF of the notebook I ran:

I have a few other questions as well:

  1. Training took a few hours (about 4.5 to 5 hours) on a cluster node (in the PDF I shared I am loading the trained model I obtained with the same code). Is this amount of time normal?

  2. The UMAP looks different from the one in the tutorial at the GitHub link above. I noticed that in the GitHub notebook, training was stopped at about 42%. Could this be the reason?

All the best,

Daniele

Hi Daniele,

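You need to call the method on the instance of the model you trained, not on the class. scvi.model.MULTIVI is the class and differential_expression is an instance method, so calling it on the class leaves self unfilled, which is exactly the TypeError you see. In the tutorial the trained model is called mvi (adata_mvi is the AnnData object it was trained on). So you need to replace the

dge_df = scvi.model.MULTIVI.differential_expression(adata=adata_mvi, groupby='leiden')

call with

dge_df = mvi.differential_expression(adata=adata_mvi, groupby='leiden')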
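The error can be reproduced without scvi-tools. The toy class below is hypothetical and only mimics the shape of the call: going through the class leaves `self` unbound, while going through an instance binds it automatically.

```python
class ToyModel:
    """Hypothetical stand-in for a model class such as MULTIVI."""

    def differential_expression(self, adata=None, groupby=None):
        # A real model would compare groups here; the toy just echoes its inputs.
        return f"DE on {adata} grouped by {groupby}"


def call_on_class():
    # Mirrors scvi.model.MULTIVI.differential_expression(...): no `self` is supplied.
    try:
        ToyModel.differential_expression(adata="adata_mvi", groupby="leiden")
    except TypeError as err:
        return str(err)
    return "no error"


def call_on_instance():
    # Mirrors calling the method on the trained model instance: it becomes `self`.
    model = ToyModel()
    return model.differential_expression(adata="adata_mvi", groupby="leiden")


print(call_on_class())     # ...missing 1 required positional argument: 'self'
print(call_on_instance())  # DE on adata_mvi grouped by leiden
```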

Regarding your other questions: I haven’t used this particular model, but the tutorial takes 36 minutes to train on 12k cells with a GPU. For a fixed number of epochs, training time grows roughly linearly with the number of cells, so if you have ~120k cells and are also using a GPU, your training time sounds reasonable. Without a GPU, training will be slower (I don’t know how much slower).
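As a back-of-the-envelope check, here is the tutorial's figure extrapolated under the assumption that training time scales linearly with the number of cells at a fixed number of epochs on the same GPU. The function name is hypothetical, and the ~120k cell count is the same guess as above.

```python
def estimate_training_minutes(ref_minutes, ref_cells, n_cells):
    """Scale a reference training time linearly with the number of cells.

    Assumes the same hardware and the same number of epochs, so that each
    epoch (one pass over the data) costs time proportional to the cell count.
    """
    if ref_cells <= 0:
        raise ValueError("ref_cells must be positive")
    return ref_minutes * n_cells / ref_cells


minutes = estimate_training_minutes(ref_minutes=36, ref_cells=12_000, n_cells=120_000)
print(f"{minutes:.0f} min = {minutes / 60:.1f} h")  # 360 min = 6.0 h
```

This is only a rough scale: early stopping (if enabled), batch size and hardware can move the real number in either direction.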

With the scVI models for RNA-seq, my experience is that you can use fewer epochs when you have more cells, which cuts down training time. My typical workflow is to run quick training with fewer epochs while I explore hyperparameters/settings; then, once I think I understand the variation in the data, I start a longer training run and save the model, so I can just load it when I want to use it for analysis later.
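A sketch of that workflow. The epoch rule is modelled on the default scvi-tools applies to scVI when `max_epochs` is not given (roughly `min(round(20000 / n_cells * 400), 400)`). Treat it as an assumption and check it against your installed version, since MULTIVI's own `train` has its own default. `pick_max_epochs` and `train_and_save` are hypothetical helpers; `model.train(max_epochs=...)` and `model.save(...)` are standard scvi-tools model methods, and a saved model can be reloaded later with `scvi.model.MULTIVI.load(dir_path, adata=adata_mvi)`.

```python
def pick_max_epochs(n_cells, epochs_cap=400, decay_at_n_cells=20_000):
    """Fewer epochs for larger datasets: the full cap up to ~20k cells, then ~1/n decay."""
    epochs = round(decay_at_n_cells / n_cells * epochs_cap)
    return max(1, min(epochs, epochs_cap))


def train_and_save(model, dir_path, max_epochs, save=True):
    """Train `model` (assumed to be an scvi-tools model) and optionally save it."""
    model.train(max_epochs=max_epochs)
    if save:
        # Persist the trained model so later analyses can load it instead of retraining.
        model.save(dir_path, overwrite=True)
    return model
```

For example, a quick exploratory run might use a fraction of `pick_max_epochs(adata_mvi.n_obs)` with `save=False`, and the final run a freshly constructed model trained with the full budget and saved.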

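Regarding the UMAP: the UMAP layout is stochastic. In the tutorial a manual random seed is set for scvi, but no explicit seed seems to be passed for the UMAP step. In addition, the UMAP is computed from the MultiVI latent space, so a model trained for a different number of epochs (like the tutorial run that stopped at about 42%) gives a different latent space and therefore a different layout. The resulting UMAP will look different, but the general structure of the plot (number of clusters, overlap between labels, etc.) should be consistent between runs.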
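To make reruns on the same latent space reproducible, the seeds can be passed explicitly. A minimal sketch, assuming scanpy is installed and the latent representation was stored in `adata.obsm` under `MultiVI_latent` (the tutorial's naming); the helper name is hypothetical and `min_dist=0.2` is only an illustrative value.

```python
def multivi_umap(adata, latent_key="MultiVI_latent", seed=0, min_dist=0.2):
    """Neighbors + UMAP on a MultiVI latent space with explicit random seeds.

    Assumes the latent was stored beforehand, e.g.
    adata.obsm["MultiVI_latent"] = mvi.get_latent_representation().
    """
    if latent_key not in adata.obsm:
        raise KeyError(f"{latent_key!r} not found in adata.obsm")
    import scanpy as sc  # imported lazily so the check above runs without scanpy

    # Both steps are stochastic; fixing random_state makes reruns on the same
    # latent space reproducible on the same machine and software versions.
    sc.pp.neighbors(adata, use_rep=latent_key, random_state=seed)
    sc.tl.umap(adata, min_dist=min_dist, random_state=seed)
    return adata
```

Fixing these seeds does not make two differently trained models agree; it only removes the variation coming from the UMAP step itself.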

Best,
/Valentine