How to filter a concatenated AnnData object?

Hello,

I concatenated several samples into one AnnData object like this:

```python
S1.obs["sample"] = "S1"
S2.obs["sample"] = "S2"
S3.obs["sample"] = "S3"

# merge into one object
adata = S1.concatenate(S2, S3)
```

Then I ran the QC steps and made violin plots to decide how to filter the object. Since I have 3 samples, I am using a different threshold for each. Does the code below look correct?

```python
# keep cells below their own sample's n_genes_by_counts threshold
keep_S1 = (adata.obs["n_genes_by_counts"] < 5000) & (adata.obs["sample"] == "S1")
keep_S2 = (adata.obs["n_genes_by_counts"] < 6000) & (adata.obs["sample"] == "S2")
keep_S3 = (adata.obs["n_genes_by_counts"] < 4000) & (adata.obs["sample"] == "S3")

# keep all three sets of cells
keep = keep_S1 | keep_S2 | keep_S3
adata = adata[keep, :]

print("Remaining cells %d" % adata.n_obs)
```

Does that seem correct?

Thank you

I think it's a little strange to combine the data and then filter each sample separately, but otherwise I think this does what you want.

I would also suggest doing:

```python
import scanpy as sc

adata = sc.concat([S1, S2, S3], keys=["S1", "S2", "S3"], label="sample")
```

instead of setting `S{n}.obs["sample"] = "S{n}"` and calling `S1.concatenate`.

Is there a reason you think this code may be incorrect?

Thank you for replying. I am relatively new to scanpy and am still learning how to handle AnnData objects.
I combined them first because that way I could run the QC and make the violin plots on a single object. Is it strange because you think I should use the same filtering criteria for all samples?
Is the code you included for concatenating just faster than what I did, or does it produce different results?

I don't think there's anything wrong with what you did, and I can see how it would make some plotting easier. It's mostly that the QC plots I typically make don't facet by sample well.

> Is the code that you included for concatenating the data just faster than what I did or does it produce different results?

It's a little faster, but more importantly, `AnnData.concatenate` will be deprecated in favor of the `concat` function.

There are slight differences, IIRC mostly when you use `join="outer"`.

When I did it like this:

```python
adata = sc.concat([S1, S2, S3], keys=["S1", "S2", "S3"], label="sample")
```

I actually lost all the .var information.

Ah, that's right: to control which `.var` columns are kept, you'll need to specify the `merge` argument, probably `merge="same"` or `merge="unique"` here.

See also:
