scVI imputation confusion

gdewael · June 16, 2021, 12:07pm

Hi scvi-tools team,

Great piece of software.
I'm looking to benchmark some models/datasets with respect to imputation performance.

In your documentation, it is not immediately clear how to properly impute gene expression values using scVI.
From the scVI paper:

This mapping goes through intermediate values ρ^n_g, which provide a batch-corrected, normalized estimate of the percentage of transcripts in each cell n that originate from each gene g. We used these estimates for differential expression analysis and its scaled version (multiplying ρ^n_g by the estimated library size ℓ_n) for imputation.
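Written out, the quantity used for imputation in the quoted passage is the following. The symbol \hat{x}_{ng} for the imputed value is notation introduced here for illustration; the sum-to-one constraint follows from ρ^n_g being a proportion of the transcripts in cell n.

```latex
\hat{x}_{ng} = \ell_n \, \rho^n_g, \qquad \sum_{g} \rho^n_g = 1
\quad\Longrightarrow\quad \sum_{g} \hat{x}_{ng} = \ell_n .
% Using a common library size c for every cell instead of \ell_n:
\hat{x}^{(c)}_{ng} = c \, \rho^n_g, \qquad \sum_{g} \hat{x}^{(c)}_{ng} = c .
```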

I have surmised that ρ^n and ℓ_n can be obtained through the functions get_normalized_expression and get_latent_representation, respectively.

My question concerns the library_size argument of the former function. In your user guide, you use a common library size. Hence my question: to benchmark imputation performance, should the expression frequencies be scaled to the latent library sizes or to a common library size?

Thanks in advance!

adamgayoso · June 16, 2021, 4:56pm

So with this function you get ℓ_n ρ_n, where you can use the library_size parameter to set ℓ_n = 1 (the default, I believe).
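To make the effect of that argument concrete, here is a small sketch. In scvi-tools, get_normalized_expression controls the scaling through library_size (default 1; to my knowledge recent versions also accept "latent" to use each cell's ℓ_n, but check the API reference for your installed version). The code below does not call scvi-tools: scale_frequencies is a hypothetical NumPy stand-in, and the values of ρ and ℓ are made up, purely to show what each setting does to the per-cell totals.

```python
import numpy as np

def scale_frequencies(rho, library_size=1.0, latent_library=None):
    """Hypothetical stand-in for a library_size option (not the scvi-tools code).

    rho: (n_cells, n_genes) array whose rows sum to 1, i.e. rho_n per cell.
    library_size: a number for a common scale, or "latent" for each cell's ell_n.
    latent_library: (n_cells,) array of ell_n, needed when library_size == "latent".
    """
    rho = np.asarray(rho, dtype=float)
    if library_size == "latent":
        if latent_library is None:
            raise ValueError('library_size="latent" requires latent_library')
        # Per-cell scaling ell_n * rho_n: values live on each cell's own count scale.
        return rho * np.asarray(latent_library, dtype=float)[:, None]
    # Common scaling: every cell is put on the same total (e.g. 1 or 1e4).
    return rho * float(library_size)


# Toy values standing in for model outputs (made up for illustration).
rho = np.array([[0.5, 0.3, 0.2],
                [0.1, 0.1, 0.8]])
ell = np.array([1000.0, 250.0])

freqs = scale_frequencies(rho)                    # rows sum to 1
common = scale_frequencies(rho, 1e4)              # rows sum to 10,000
latent = scale_frequencies(rho, "latent", ell)    # rows sum to ell_n
print(freqs.sum(axis=1), common.sum(axis=1), latent.sum(axis=1))
```

The latent scale keeps imputed values comparable to that cell's raw counts, while a common scale removes sequencing-depth differences between cells.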

It's not easy to give a straightforward answer to this, as it depends on the setup of your benchmarking experiment. I assume most people would tend to look at the data on a common library scale.
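As a rough illustration of how the choice plays out in a benchmark, here is a NumPy sketch of one common setup: hold out some entries, then compare imputed and observed values on just those entries. Everything here is an assumption for illustration: the helper names masked_median_l1 and to_common_scale, the target of 10,000, and the toy arrays. The toy model gets each cell's proportions exactly right but overestimates the first cell's library size, so it is penalized on the latent scale and not on the common scale.

```python
import numpy as np

def masked_median_l1(observed, imputed, mask):
    """Median absolute error over held-out entries only (mask == True)."""
    observed = np.asarray(observed, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    return float(np.median(np.abs(observed[mask] - imputed[mask])))

def to_common_scale(counts, target=1e4):
    """Rescale raw counts so that every cell sums to `target`."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True) * target

# Toy data (made up): original counts, model outputs, and held-out entries.
counts = np.array([[5, 3, 2],
                   [1, 1, 8]])
rho = np.array([[0.5, 0.3, 0.2],
                [0.1, 0.1, 0.8]])   # proportions match the counts exactly
ell = np.array([20.0, 10.0])         # first cell's library size is overestimated
mask = np.array([[True, False, False],
                 [False, False, True]])

# Latent scale: compare ell_n * rho_n against the raw counts.
latent_err = masked_median_l1(counts, ell[:, None] * rho, mask)
# Common scale: compare c * rho_n against counts normalized to the same c.
common_err = masked_median_l1(to_common_scale(counts), 1e4 * rho, mask)
print(latent_err, common_err)
```

In a real benchmark the held-out entries would be corrupted before training and the original values kept for scoring; this sketch skips training and only shows the scoring step.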
