I have some datasets I would like to integrate, then select a few cell types that interest me and recluster them. However, I think something may be going wrong the second time I select variable genes and train the model, because I'm not sure whether starting from the normalized data is appropriate.
I ran this to normalize the expression, save the normalized values for all genes, select variable genes, and cluster downstream:
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
adata.raw = adata  # keep full dimension safe
sc.pp.highly_variable_genes(
    adata,
    flavor="seurat",
    n_top_genes=3000,
    layer="counts",
    batch_key="Sample",
    subset=True,
)
Then I selected the clusters that interested me, to cluster them again, with:
adata2=adata[adata.obs['leiden_0.6'].isin(['1', '5', '4'])]
Because I probably need a new set of variable genes, I used the line below to get all genes back.
adata2 = adata2.raw.to_adata()
The values for these genes are normalized:
print(adata2.X)
(0, 18)    2.37902
(0, 20)    2.37902
(0, 47)    2.37902
(0, 68)    2.37902
(0, 84)    3.0247393
Finally, I ran this block to cluster these cells of interest again. I commented out the normalization steps, as the values are already normalized.
adata2.layers["counts"] = adata2.X.copy()
adata2.raw = adata2  # keep full dimension safe
#sc.pp.normalize_total(adata2, target_sum=1e4)
#sc.pp.log1p(adata2)
sc.pp.highly_variable_genes(
    adata2,
    flavor="seurat",
    n_top_genes=3000,
    layer="counts",
    batch_key="Sample",
    subset=True,
)
There are 2 reasons I think something went wrong.
1 - all cells overlap too much after reclustering
2 - I get this warning:
UserWarning: Make sure the registered X field in anndata contains unnormalized count data.
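For context on the warning: it seems to be raised when the matrix handed to the model does not look like raw integer counts. Below is a minimal sketch of such a check, assuming only NumPy. `looks_like_counts` is a hypothetical helper written for illustration; it is not part of scanpy or of the library that emits the warning.

```python
import numpy as np


def looks_like_counts(X, n_check=10000):
    """Heuristic: True if the inspected values are non-negative integers.

    Hypothetical helper for illustration only; it inspects at most
    `n_check` stored values, so it is a spot check rather than a proof.
    """
    # scipy.sparse matrices expose `.nnz` and keep their non-zero entries
    # in `.data`; dense arrays are simply flattened.
    values = X.data if hasattr(X, "nnz") else np.asarray(X).ravel()
    values = np.asarray(values)[:n_check]
    non_negative = np.all(values >= 0)
    whole_numbers = np.all(np.mod(values, 1) == 0)
    return bool(non_negative and whole_numbers)


raw_like = np.array([[0, 3], [1, 0]])
log_normalized_like = np.array([[2.37902, 0.0], [0.0, 3.0247393]])

print(looks_like_counts(raw_like))
print(looks_like_counts(log_normalized_like))
```

Values such as 2.37902 from the printout above fail this check, which would be consistent with `adata2.layers["counts"]` holding log-normalized values rather than counts in the second block.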
I assumed the normalization should be performed with all cells present, which is why I decided to save the normalized values instead of the counts. On the other hand, when I run this code saving the raw counts instead, by running
adata.raw = adata # keep full dimension safe
before the normalization, the cells still overlap too much (they do not overlap in the first clustering step).
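On the assumption that normalization needs all cells present: as far as I understand, total-count normalization with a fixed `target_sum` followed by `log1p` scales each cell by its own total, so the result for a cell should not depend on which other cells are in the object. A minimal NumPy sketch to check this, where `normalize_log1p` is a hypothetical stand-in that only mimics the arithmetic and is not scanpy's implementation:

```python
import numpy as np


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * target_sum)


rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(6, 5))  # toy matrix: 6 cells x 5 genes
keep = np.array([True, False, True, True, False, False])  # cells of interest

# Normalize all cells and then subset, versus subset and then normalize.
all_then_subset = normalize_log1p(counts)[keep]
subset_then_all = normalize_log1p(counts[keep])

same = np.allclose(all_then_subset, subset_then_all)
print(same)
```

If this holds for the real data as well, normalizing the raw counts again after subsetting should give the same per-cell values, and the choice of what to store in `.raw` would matter mainly for which matrix ends up in the "counts" layer.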
Is there anything I am missing?