Input to scvi encoder

I have a vague memory that in the original version of scvi, the counts were normalized somehow before being passed into the encoder, perhaps as log(1+CPM). My recollection is this was more numerically stable than passing raw counts directly. Is that still the case? I couldn’t find anything in the documentation or in the code itself.

Just a log(1+x) transform. It’s here:

And yes, more numerically stable!
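To make the stability point concrete, here is a minimal NumPy sketch of what a log(1+x) transform does to raw counts before they reach an encoder. It is an illustration, not scVI's actual code: the count values and the names `raw_counts` and `encoder_input` are made up for the example.

```python
import numpy as np

# Hypothetical raw UMI counts for one cell (values invented for illustration).
raw_counts = np.array([0, 1, 10, 1_000, 50_000], dtype=np.float32)

# Natural-log transform log(1 + x). np.log1p maps 0 to exactly 0, so zero
# counts stay zero, and large counts are compressed to a small range.
encoder_input = np.log1p(raw_counts)

# Compare the spread of values a network would see with and without the transform.
print("raw range:  ", raw_counts.max() - raw_counts.min())
print("log1p range:", encoder_input.max() - encoder_input.min())
```

The raw values span several orders of magnitude, while the transformed values stay roughly between 0 and 11, which is a much friendlier input scale for the first linear layer of a neural network.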

Excellent, thanks a ton Adam!

Hi Adam, I can see that log(1+X) keeps part of the information in the raw data while improving numerical stability, as you mentioned. Is there a specific reason that scVI does not use the more common log2(1+CPM) or log2(1+TPM)?

Hi, we just use log(1 + x) for simplicity as we generally only care about numerical stability of the model.
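For anyone comparing the two options, here is a small sketch of how log(1+x) on raw counts differs from log2(1+CPM). The toy matrix and all variable names are assumptions for illustration only. Nothing here is taken from the scVI code base.

```python
import numpy as np

# Toy count matrix (cells x genes), invented for illustration: both cells have
# the same gene proportions, but the second cell is sequenced 10x deeper.
counts = np.array([[0.0, 3.0, 7.0],
                   [0.0, 30.0, 70.0]])

# Option 1: plain log(1 + x) on the raw counts.
log1p_counts = np.log1p(counts)

# Option 2: log2(1 + CPM), where each cell is first rescaled to one million counts.
library_size = counts.sum(axis=1, keepdims=True)
cpm = counts / library_size * 1e6
log2_cpm = np.log2(1.0 + cpm)

print(log1p_counts)  # rows differ: sequencing depth is still visible
print(log2_cpm)      # rows identical: depth has been normalized away
```

In this toy case the log(1+x) rows differ between the two cells because depth information is retained, while the CPM-based rows are identical. Both transforms compress the dynamic range, which is the property the reply above says matters here.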