How to filter concatenated anndata object?

st4302 · March 14, 2024, 6:15pm

Hello,

I concatenated several samples in one anndata object as shown here:
S1.obs["sample"] = "S1"
S2.obs["sample"] = "S2"
S3.obs["sample"] = "S3"

# merge into one object
adata = S1.concatenate(S2, S3)

Then I ran the QC steps and made violin plots to decide how to filter the object. Since I have 3 samples, I am using a different threshold for each. Does the code below look correct?

keep_S1 = (adata.obs["n_genes_by_counts"] < 5000) & (adata.obs["sample"] == "S1")
keep_S2 = (adata.obs["n_genes_by_counts"] < 6000) & (adata.obs["sample"] == "S2")
keep_S3 = (adata.obs["n_genes_by_counts"] < 4000) & (adata.obs["sample"] == "S3")

# keep all three sets of cells
keep = keep_S1 | keep_S2 | keep_S3
adata = adata[keep, :].copy()  # .copy() turns the filtered view into an independent object

print("Remaining cells %d" % adata.n_obs)

Does that seem correct?

Thank you

ivirshup · March 15, 2024, 1:20pm

I think it’s a little strange to combine the data, then filter them separately, but otherwise I think this is doing what you want.
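The three explicit masks in the question can also be built from one dictionary of per-sample limits, which scales more easily to more samples. A minimal sketch with pandas, assuming `adata.obs` has the `sample` and `n_genes_by_counts` columns used above; `MAX_GENES` and `per_sample_keep_mask` are hypothetical names and the toy `obs` table is made up:

```python
import pandas as pd

# Upper limits on n_genes_by_counts per sample, copied from the question.
MAX_GENES = {"S1": 5000, "S2": 6000, "S3": 4000}


def per_sample_keep_mask(obs: pd.DataFrame, limits: dict) -> pd.Series:
    """True for cells whose n_genes_by_counts is below their sample's limit.

    The sample column is cast to str first so a categorical column maps to
    plain floats. Samples missing from `limits` map to NaN, and comparisons
    with NaN are False, so those cells are dropped rather than kept.
    """
    threshold = obs["sample"].astype(str).map(limits).astype(float)
    return obs["n_genes_by_counts"] < threshold


# Toy table standing in for adata.obs (values are made up).
obs = pd.DataFrame(
    {
        "sample": pd.Categorical(["S1", "S1", "S2", "S2", "S3", "S3"]),
        "n_genes_by_counts": [4000, 5500, 5500, 6500, 3000, 4500],
    }
)

keep = per_sample_keep_mask(obs, MAX_GENES)

# The explicit three-mask version from the question, for comparison.
explicit = (
    ((obs["n_genes_by_counts"] < 5000) & (obs["sample"] == "S1"))
    | ((obs["n_genes_by_counts"] < 6000) & (obs["sample"] == "S2"))
    | ((obs["n_genes_by_counts"] < 4000) & (obs["sample"] == "S3"))
)

print(keep.tolist())  # [True, False, True, False, True, False]
print(bool((keep == explicit).all()))  # True
```

On the real object this would be applied as `adata = adata[per_sample_keep_mask(adata.obs, MAX_GENES)].copy()`. The cast to `str` matters because the `sample` column may be categorical (the `label` column created by `concat` is).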

I would also suggest doing:

adata = sc.concat([S1, S2, S3], keys=["S1", "S2", "S3"], label = "sample")

instead of setting S{n}.obs["sample"] = "S{n}" and calling S1.concatenate.

Is there a reason you think this code may be incorrect?

st4302 · March 15, 2024, 1:44pm

Thank you for replying. I am just relatively new to scanpy and I am still trying to understand how to handle the anndata object.
I combined them first because that way I could run the QC and make the violin plots on a single object. Is it strange because you think I should use the same filtering criteria for all the samples?
Is the code you included for concatenating the data just faster than what I did, or does it produce different results?

ivirshup · March 15, 2024, 2:06pm

I don’t think there’s anything wrong with what you did, and I see how it would make some plotting easier. I think it’s mostly that the QC plots I typically make don’t facet well by sample.
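If faceted plots are awkward, a per-sample numeric summary of the QC metrics on the combined object is one way to choose thresholds like those in the question. A minimal pandas sketch, assuming `obs` stands in for `adata.obs` with a `sample` column and the `n_genes_by_counts` column from QC (for example from `sc.pp.calculate_qc_metrics`); the values are made up:

```python
import pandas as pd

# Toy table standing in for adata.obs after QC (values are made up).
obs = pd.DataFrame(
    {
        "sample": pd.Categorical(["S1", "S1", "S2", "S2", "S2", "S3"]),
        "n_genes_by_counts": [4000, 5500, 5500, 6500, 3000, 4500],
    }
)

# One row per sample: count, mean, std, min, quartiles and max.
# observed=True keeps only categories that actually occur.
summary = obs.groupby("sample", observed=True)["n_genes_by_counts"].describe()
print(summary[["count", "50%", "max"]])
```

For a plot of the same thing, `sc.pl.violin(adata, "n_genes_by_counts", groupby="sample")` draws one violin per sample.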

Is the code that you included for concatenating the data just faster than what I did or does it produce different results?

It’s a little faster, but, more importantly, AnnData.concatenate will be deprecated in favor of the concat function.

There are slight differences, IIRC mostly when you use join="outer".

st4302 · March 15, 2024, 6:00pm

When I did it like this:

adata = sc.concat([S1, S2, S3], keys=["S1", "S2", "S3"], label="sample")

I actually lost all the .var information.

ivirshup · March 18, 2024, 1:25pm

Ah, that’s right. To control which .var columns are kept, you’ll need to specify the merge argument: probably merge="same" or merge="unique" here.
