Hey everyone. Historically I have done my cell annotations either manually or with SingleR and databases (I come from the Bioconductor world originally).
I wanted to give CellAssign a try for a recent project because some of the cell types aren't in the databases, and this could be a great way to “automate” the manual annotation since I get to assign my own markers with it.
I followed the “Annotation with CellAssign” guide in the scvi-tools docs and get no errors, but cells get dumped into incorrect categories. The default 400 epochs puts everything into the last two categories, and when I try 50 epochs everything gets dumped into the “other” category… (I have 210,000 cells, 10X platform, 5’ kits)
I have an RTX GPU, so re-running model.train() takes trivial time, and I am happy to try other approaches if people have suggestions…
My two ideas are:
After bdata = adata[:, marker_gene_mat.index].copy() I am going to have a lot of empty cells. Are the all-zero cells confusing the model? Those T cells are tricky because the CD4 and CD8A transcripts won't be detected in all of them. If I remove them, how hard is it to extrapolate them later from the detected ones? EDIT: I attempted to control for this (see first comment); it didn't fix my problem.
The tutorial doesn't want log data, but perhaps I need to format the counts differently than I am. I have tried raw and normalized raw counts…
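To make both ideas concrete, here is a minimal sketch of the checks I have in mind, written against a plain NumPy cells x genes matrix so it runs on its own. The function name, the toy matrix, and the marker_idx argument are all made up for illustration. The size factor is the library size over all genes scaled to mean 1, which is my assumption about what the guide expects; with AnnData, full_counts would correspond to adata.X taken before subsetting to the marker genes.

```python
import numpy as np

def summarize_marker_counts(full_counts, marker_idx):
    """Sanity-check a cells x genes count matrix (hypothetical helper).

    full_counts : 2-D array, cells in rows, ALL genes in columns
    marker_idx  : column indices of the marker genes
    """
    full_counts = np.asarray(full_counts)
    marker_counts = full_counts[:, marker_idx]

    # Idea 1: cells whose marker-gene counts sum to zero carry no marker signal.
    empty_cells = marker_counts.sum(axis=1) == 0

    # Idea 2: raw counts should be non-negative integers;
    # log-transformed or scaled values will fail this check.
    looks_raw = bool(
        np.all(full_counts >= 0)
        and np.all(full_counts == np.round(full_counts))
    )

    # Size factor from the library size over all genes, scaled to mean 1
    # (assumed convention, computed before subsetting to markers).
    lib_size = full_counts.sum(axis=1)
    size_factor = lib_size / lib_size.mean()

    return {
        "n_empty": int(empty_cells.sum()),
        "frac_empty": float(empty_cells.mean()),
        "looks_raw": looks_raw,
        "size_factor": size_factor,
        "keep": ~empty_cells,  # boolean mask of cells with any marker count
    }

# Toy data: 4 cells x 4 genes, with genes 0 and 1 acting as the markers.
counts = np.array([
    [5, 0, 2, 1],
    [0, 0, 7, 3],
    [1, 4, 0, 0],
    [0, 0, 0, 10],
])
report = summarize_marker_counts(counts, marker_idx=[0, 1])
print(report["n_empty"], report["frac_empty"], report["looks_raw"])
```

The keep mask could be used to drop the all-zero cells before training, and frac_empty gives a quick sense of how much of the data that would remove. For 210,000 cells the matrix is probably sparse, so the same sums would need to be done on the sparse matrix rather than a dense array.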
There also seems to be a VERY old r-cellassign package that uses TensorFlow. Is it the same thing as this? Given my background, I think I would be better at troubleshooting an sce object than AnnData, but I have no idea whether the projects are linked or just share a name.
Oh, and 41 genes are being used in the celltype.csv.
Thanks in advance for troubleshooting help!