Unexpected chain pairing status while converting AirrCells to AnnData

Hi,

I am running into unexpected behavior when converting an AirrCells object to AnnData.

Some cells seem to be lost when converting airr_cells_new to adata_new at the end:

# Convert to AnnData
adata_new = ir.io.from_airr_cells(airr_cells_new)
# Get updated chain_pairing status, with no filtering
ir.pp.index_chains(adata_new, filter=(lambda x: True, lambda x: True))
ir.tl.chain_qc(adata_new)

## Counting chains in AirrCell objects:
pd.DataFrame([len(cell.chains) for cell in airr_cells_new]).value_counts()

> 2    3433
> 1     146
> Name: count, dtype: int64

## Chain pairing status of the AnnData object:
adata_new.obs['chain_pairing'].value_counts()

> chain_pairing
> single pair    3432
> orphan VJ       133
> orphan VDJ       14
> Name: count, dtype: int64

This means that one cell with 2 chains ends up as either “orphan VJ” or “orphan VDJ”.
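
One way to pin down the offending cell is to line up the per-cell chain counts with the pairing labels and list the positions where a two-chain cell is not labeled "single pair". Below is a minimal sketch that uses plain Python lists as stand-ins for the real data. The function name `find_mismatches` and the toy values are hypothetical, and it assumes both sequences describe the same cells in the same order.

```python
def find_mismatches(chain_counts, pairing_labels):
    """Return positions of two-chain cells that are not labeled 'single pair'."""
    if len(chain_counts) != len(pairing_labels):
        raise ValueError("inputs must describe the same cells in the same order")
    return [
        i
        for i, (n_chains, label) in enumerate(zip(chain_counts, pairing_labels))
        if n_chains == 2 and label != "single pair"
    ]


# Toy stand-ins for [len(c.chains) for c in airr_cells_new] and
# adata_new.obs['chain_pairing'] (assumed to be aligned by position).
counts = [2, 1, 2, 2]
labels = ["single pair", "orphan VJ", "orphan VDJ", "single pair"]
mismatches = find_mismatches(counts, labels)
print(mismatches)
```

The returned positions can then be used to look up the corresponding cells and inspect their chains and cell IDs.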

Am I missing something?


More context:

The AirrCells object is built by hand. Here is the core code that defines airr_cells_new.
The script’s purpose is to manually remove secondary chains from an AnnData object.

airr_cells = ir.io.to_airr_cells(adata)

# Initialize an empty list of AirrCell objects to populate
airr_cells_new = []

# Convert additional chains into new cells
for ind, cell in enumerate(airr_cells):
    if len(cell.chains) <= 2:
        new_cell = cell
    elif len(cell.chains) > 2:
        ## Check that there are at most 4 chains, in concordance with the dual IR model
        assert len(cell.chains) <= 4
        
        chain_indices = adata.obsm['chain_indices'][ind]
        new_cell = ir.io.AirrCell(cell_id=f'{cell.cell_id}_prim')
        
        ## Select primary VJ chain if it exists
        if chain_indices.tolist()['VJ'][0] is not None:
            prim_vj_chain = cell.chains[chain_indices.tolist()['VJ'][0]]
            ## Add it
            new_cell.add_chain(prim_vj_chain)
        
        ## Select primary VDJ chain if it exists
        if chain_indices.tolist()['VDJ'][0] is not None:
            prim_vdj_chain = cell.chains[chain_indices.tolist()['VDJ'][0]]
            ## Add it
            new_cell.add_chain(prim_vdj_chain)
        
    # Add new cell to list
    airr_cells_new.append(new_cell)

Yes, if there were two VJ or two VDJ chains, the cell would still be an orphan. A “single pair” is always VJ+VDJ. Does that make sense?
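
To make the rule above concrete, here is a simplified sketch of how a pairing label could be derived from a cell's loci. It is an illustration only, not scirpy's actual implementation: the function name `pairing_status` is hypothetical, anything beyond one VJ plus one VDJ chain is lumped into "other", and the locus grouping (TRA/TRG/IGK/IGL as VJ, TRB/TRD/IGH as VDJ) is stated as an assumption.

```python
# Assumed grouping of loci into VJ and VDJ chain types.
VJ_LOCI = {"TRA", "TRG", "IGK", "IGL"}
VDJ_LOCI = {"TRB", "TRD", "IGH"}


def pairing_status(loci):
    """Simplified pairing label for one cell, given the loci of its chains."""
    n_vj = sum(locus in VJ_LOCI for locus in loci)
    n_vdj = sum(locus in VDJ_LOCI for locus in loci)
    if n_vj == 0 and n_vdj == 0:
        return "no IR"
    if n_vdj == 0:
        return "orphan VJ"   # one or more VJ chains, no VDJ partner
    if n_vj == 0:
        return "orphan VDJ"  # one or more VDJ chains, no VJ partner
    if n_vj == 1 and n_vdj == 1:
        return "single pair"
    return "other"           # extra chains are not modeled in this sketch


print(pairing_status(["TRA", "TRB"]))  # one VJ + one VDJ chain
print(pairing_status(["TRA", "TRA"]))  # two chains, but both VJ
```

The second call shows the point made above: two chains alone do not make a pair unless one is VJ and the other VDJ.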

I agree, and the case of two VJ or two VDJ chains can happen.
However, because of how airr_cells_new is computed, this is not the case for any cell:

for cell_new in airr_cells_new:
        assert set([chain['locus'] for chain in new_cell.chains]) == {"TRA", "TRB"}

Hi, sorry for the late reply.

Given that you filter with filter=(lambda x: True, lambda x: True) and

assert set([chain['locus'] for chain in new_cell.chains]) == {"TRA", "TRB"}

for all cells, I don’t immediately see why this would happen.

Can you share the AirrCell objects of the offending cells (e.g. using pickle.dump()) so that I can investigate? Alternatively, the h5ad file would work too. If you can’t share the data publicly, you can PM me via Discourse or find my email on my GitHub profile.

Hi,
In turn, sorry for my late reply.

This was actually due to a mistake on my part. In the above code, the line:

new_cell = ir.io.AirrCell(cell_id=f'{cell.cell_id}_prim')

creates a cell with a different cell_id, while for len(cell.chains) <= 2 the new_cell object keeps the original cell_id.

I removed this and it all works as it should. Thanks.
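
For anyone hitting something similar: a quick guard against this kind of slip is to compare the cell IDs before and after the rewrite step. Here is a minimal sketch with plain strings as stand-ins; the helper name `changed_cell_ids` and the example barcodes are made up, and with the real objects the IDs would presumably come from `cell.cell_id`.

```python
def changed_cell_ids(original_ids, new_ids):
    """Return (missing, unexpected) cell IDs after a rewrite step."""
    original, new = set(original_ids), set(new_ids)
    return sorted(original - new), sorted(new - original)


# Made-up barcodes; the second cell was renamed by accident.
original_ids = ["AAAC-1", "AAAG-1", "AACT-1"]
new_ids = ["AAAC-1", "AAAG-1_prim", "AACT-1"]

missing, unexpected = changed_cell_ids(original_ids, new_ids)
print("missing:", missing)
print("unexpected:", unexpected)
```

If both lists come back empty, the rewrite kept every cell_id intact.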
