Merging identical genes from 10x fixed scRNA

Hi,

I have a perhaps unusual use-case.
I am working with the output of cellranger multi for the new probe-based fixed single cell kit.
The unfiltered raw_feature_bc_matrix.h5 which I want to utilize with Cellbender and Co contains probes and not transcript species.

That means when I load it in with sc.read_10x_h5() there will be duplicate entries in adata.var and in the columns of adata.X, as some genes have multiple probes targeting them. These entries have identical var_names.

What would be the most graceful way to merge these entries?

Do you have both the same targets and the same expression levels?

Otherwise, maybe you’d want the “better” probe? I don’t know how you’d decide that though.

Hi Isaac, the expression levels of the probes are different.
They seem to be targeting different forms of the genes.

However, I see now that 10x simply removes blacklisted probes which results in unique .var in the end.

So not relevant after all.

Out of curiosity how would one merge genes? Extract the index number and then operate on anndata.X?

When I’ve done this with microarray data in the past, it was something like:

  • DataFrame where each row is a probe
  • Group by probe target
  • Some aggregation (max, mean, etc)

If you are okay with densifying the matrix, this should be straightforward. Maybe with flox or numpy-groupies.
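
For the dense route, here is a minimal sketch using plain pandas rather than flox or numpy-groupies. The toy X and var_names stand in for a densified adata.X and adata.var_names, and summing is only one possible aggregation (an assumption on my part, max or mean plug in the same way):

```python
import numpy as np
import pandas as pd

# Toy stand-ins (hypothetical data): X plays the role of a dense adata.X
# (cells x probes), var_names the role of adata.var_names with duplicates.
X = np.array([
    [1, 2, 0, 5],
    [0, 3, 4, 1],
    [2, 0, 1, 0],
])
var_names = pd.Index(["GeneA", "GeneB", "GeneA", "GeneC"])

# One row per probe, one column per cell, indexed by the probe's target gene.
probes = pd.DataFrame(X.T, index=var_names)

# Group rows by target gene and aggregate. sort=False keeps genes in order of
# first appearance. Swap .sum() for .max() or .mean() if that fits better.
merged = probes.groupby(level=0, sort=False).sum()

X_merged = merged.to_numpy().T  # back to cells x genes
genes = merged.index            # unique gene names, one per merged column
```

With a real sparse adata.X you would have to call .toarray() first, which can be very memory hungry on an unfiltered raw matrix.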

For sparse, it’s a little more complicated. But this would be a good extension of the new sc.get.aggregate and I’ve opened an issue to track it:
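
In the meantime, one way to do it on a sparse matrix without densifying is to multiply by a probe-to-gene indicator matrix. This is only a sketch: merge_duplicate_columns is a hypothetical helper, not a scanpy function, and it only covers summing (max or mean would need a different approach):

```python
import numpy as np
import pandas as pd
from scipy import sparse


def merge_duplicate_columns(X, var_names):
    """Sum columns of a sparse cells x probes matrix that share a name.

    Hypothetical helper for illustration, not part of scanpy or anndata.
    """
    # codes[i] is the index of the gene that probe i targets;
    # genes holds the unique names in order of first appearance.
    codes, genes = pd.factorize(np.asarray(var_names))
    n_probes, n_genes = len(codes), len(genes)

    # Indicator matrix (probes x genes) with a 1 where probe i maps to gene j.
    indicator = sparse.csr_matrix(
        (np.ones(n_probes, dtype=X.dtype), (np.arange(n_probes), codes)),
        shape=(n_probes, n_genes),
    )

    # (cells x probes) @ (probes x genes) sums the probes of each gene.
    return (X @ indicator).tocsr(), genes


# Toy example (hypothetical data) standing in for adata.X and adata.var_names.
X = sparse.csr_matrix(np.array([
    [1, 2, 0, 5],
    [0, 3, 4, 1],
    [2, 0, 1, 0],
]))
var_names = ["GeneA", "GeneB", "GeneA", "GeneC"]

X_merged, genes = merge_duplicate_columns(X, var_names)
```

The result could then go into a new AnnData, e.g. anndata.AnnData(X_merged, obs=adata.obs, var=pd.DataFrame(index=genes)), assuming you don't need to carry over the per-probe columns of adata.var.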
