Posterior Variance for totalVI Estimates

Hello, I have been using totalVI for protein imputation for CITE-seq data. I am interested in possibly constructing confidence intervals for the protein estimates in my data and was wondering if the user is able to output the estimated posterior variance using totalVI?

Thank you!

You can construct credible intervals by setting return_mean=False and n_samples>1 in get_normalized_expression. Note that this returns a 3-dimensional array of samples by cells by proteins rather than a single cells by proteins matrix.

Using the snippet from the tutorial:

_, protein_means_samples = model.get_normalized_expression(
    n_samples=25,
    transform_batch="PBMC10k",
    include_protein_background=True,
    sample_protein_mixing=False,
    return_mean=False,
)
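
To turn such an array of samples into a posterior variance and a credible interval, something like the following NumPy sketch may help. It is only an illustration: the helper name summarize_posterior_samples is made up for this post, the array here is synthetic stand-in data rather than real totalVI output, and the sample axis is an assumption (0 here, following the description above), so it is worth checking protein_means_samples.shape and adjusting sample_axis before using it.

```python
import numpy as np


def summarize_posterior_samples(samples, sample_axis=0, level=0.95):
    """Summarize posterior samples along one axis.

    Returns the mean, the sample variance, and the lower and upper bounds
    of an equal-tailed credible interval, each with the sample axis removed.
    `sample_axis` is an assumption: check the shape of your own array.
    """
    samples = np.asarray(samples, dtype=float)
    alpha = 1.0 - level
    mean = samples.mean(axis=sample_axis)
    # ddof=1 gives the unbiased sample variance across posterior draws
    var = samples.var(axis=sample_axis, ddof=1)
    lower, upper = np.quantile(
        samples, [alpha / 2.0, 1.0 - alpha / 2.0], axis=sample_axis
    )
    return mean, var, lower, upper


# Synthetic stand-in for protein_means_samples: 25 samples, 4 cells, 3 proteins.
rng = np.random.default_rng(0)
fake_samples = rng.gamma(shape=2.0, scale=1.5, size=(25, 4, 3))

mean, var, lower, upper = summarize_posterior_samples(fake_samples, sample_axis=0)
print(mean.shape, var.shape, lower.shape, upper.shape)
```

With real output you would pass protein_means_samples in place of fake_samples. Tail quantiles estimated from only 25 draws are noisy, so a larger n_samples gives more stable interval bounds.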

Also note that this credible interval is constructed over posterior samples of the latent variables. In other words, all of the variation comes from sampling the latent variables, since the imputed values are a deterministic function of them. This can be seen around Equation 16 in the Nature Methods version of the manuscript: the credible interval is over the mean of the zero-inflated gamma distribution, as opposed to the random variable itself. You could also sample from the likelihood p(y | ...) to get counts, which would be more like a counterfactual posterior predictive sample.
