First of all, thanks so much for creating this amazing set of tools! I’m a new user, so my apologies if I’ve missed relevant docs that would answer this question.
I’m interested in creating a large “atlas” from multiple (~20) independent cohorts which collectively span ~1000 donors. I have two related questions:
- Is there a recommended minimum number of cells per donor, i.e., per batch (I am using the donor as the “batch” identifier)? I understand the general suggestion to keep the number of cells greater than the number of genes, but I’m not sure whether this applies within individual batches as well.
- There appear to be systematic technical differences in expression between the cohorts. Is there a way to include the cohort as an additional “batch” covariate for the model fitting? Does that even make sense, given that each donor belongs to exactly one cohort, so the model already has the freedom to fit those differences on a per-donor basis?
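To make the second question concrete, here is the kind of setup I have in mind, assuming I’m reading the `setup_anndata` API correctly (the `"donor"` and `"cohort"` column names are just placeholders for my own metadata):

```python
import scvi

# Donor as the batch, cohort as an extra categorical covariate (is this sensible?)
scvi.model.SCVI.setup_anndata(
    adata,
    batch_key="donor",
    categorical_covariate_keys=["cohort"],
)
model = scvi.model.SCVI(adata)
model.train()
```

I’m unsure whether passing cohort via `categorical_covariate_keys` is redundant here, since donor membership already determines cohort membership.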
Any other thoughts/suggestions you might have on integrating this many batches would be much appreciated! For example, would it be better to integrate a few big batches first and then map the rest into that latent space using something like scArches? Are there SCVI flags (`use_layer_norm`, `use_batch_norm`, etc.?) that might be appropriate for handling many cells/batches?
Thanks in advance for any advice you can offer!