Best practice for detecting differential gene expression between two types of samples

Dear community,

My datasets come from WT and KO animals, each with multiple batches. They contain roughly the same cell types, although the KO samples have very few cells in certain cell types. The goal is to match all datasets by cell type and then detect gene expression dysregulation in KO cells within each cell type. The classic Seurat pipeline would first run CCA to align all datasets, and then use the uncorrected (but normalized) counts for differential expression between WT and KO.

I was wondering how to approach this within the scvi-tools framework. Should I use batch-corrected data for DEG detection, or uncorrected values? Also, is there a way to identify a KO-cell-specific gene expression signature without running DEG detection? I’m asking because there are cases where the genes in an entire pathway are up- or downregulated in a concerted manner, but each only to a small extent. These genes wouldn’t be called DEGs, although the shift is meaningful. Is there an existing framework that directly compares KO and WT cells at the pathway level without running gene-wise comparisons first?
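As a rough illustration of what a pathway-level comparison could look like (a minimal sketch on simulated data, not an existing framework or a scvi-tools feature): each cell gets a pathway score, taken here as the mean of per-gene z-scores over the pathway's genes, and the KO-vs-WT difference in mean score is assessed with a label permutation test. The function names, the toy gene set, and the simulated shift of 0.2 are all assumptions made up for the example.

```python
import numpy as np


def pathway_scores(log_expr, gene_names, pathway_genes):
    """Per-cell pathway score: mean of per-gene z-scores over the pathway genes.

    log_expr is a (cells x genes) array of normalized, log-scale expression.
    """
    wanted = set(pathway_genes)
    idx = [i for i, g in enumerate(gene_names) if g in wanted]
    sub = log_expr[:, idx]
    mu = sub.mean(axis=0)
    sd = sub.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    return ((sub - mu) / sd).mean(axis=1)


def permutation_test(scores, is_ko, n_perm=2000, seed=0):
    """Two-sided permutation test on the KO-minus-WT difference in mean score."""
    rng = np.random.default_rng(seed)
    observed = scores[is_ko].mean() - scores[~is_ko].mean()
    null = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(is_ko)  # shuffle the KO/WT labels
        null[k] = scores[perm].mean() - scores[~perm].mean()
    p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + n_perm)
    return observed, p_value


# Simulated data (hypothetical): 200 WT and 200 KO cells, 50 genes.
rng = np.random.default_rng(1)
n_wt, n_ko, n_genes = 200, 200, 50
gene_names = [f"gene{i}" for i in range(n_genes)]
pathway = gene_names[:10]  # toy gene set

log_expr = rng.normal(size=(n_wt + n_ko, n_genes))
is_ko = np.arange(n_wt + n_ko) >= n_wt
log_expr[is_ko, :10] += 0.2  # small, concerted shift of the pathway genes in KO

scores = pathway_scores(log_expr, gene_names, pathway)
observed, p_value = permutation_test(scores, is_ko)
print(f"KO - WT mean pathway score: {observed:.3f}, permutation p = {p_value:.4f}")
```

One caveat with this sketch: it permutes labels across individual cells, so cells from the same animal or batch are treated as independent. With real data it would likely be safer to aggregate scores per animal or batch and compare at that level.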

Thank you so much!!

Generally, the scvi-tools approach would be the same as the Seurat workflow you describe, except that you would use SCVI in place of CCA to integrate the datasets. @Valentine_Svensson might have better insight on your DE questions.
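The integration step could be sketched roughly as follows. This assumes an AnnData object with raw counts stored in a layer and a batch label in `obs`; the layer name `"counts"`, the column name `"batch"`, the `"X_scVI"` key, and the helper name `integrate_with_scvi` are placeholders chosen for illustration, not anything taken from the question.

```python
def integrate_with_scvi(adata, batch_key="batch", counts_layer="counts"):
    """Fit SCVI on raw counts and store the integrated latent space in adata.

    Assumes adata.layers[counts_layer] holds raw counts and
    adata.obs[batch_key] holds the batch label (placeholder names).
    """
    # Imported here so the function can be defined without scvi-tools installed.
    import scvi

    # Register the count layer and the batch covariate with scvi-tools.
    scvi.model.SCVI.setup_anndata(adata, layer=counts_layer, batch_key=batch_key)

    model = scvi.model.SCVI(adata)
    model.train()

    # Batch-corrected embedding, used for clustering and cell type matching.
    adata.obsm["X_scVI"] = model.get_latent_representation()
    return model
```

After this step, cell types would be matched on the `X_scVI` embedding (for example by clustering and annotation), and WT vs KO would then be compared within each cell type, mirroring the Seurat-style workflow of testing on the uncorrected, normalized counts. scvi-tools models also expose a `differential_expression` method; whether it suits this design is a separate question.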