Posterior probability of being assigned to a specific label

Hi, many thanks for your great tool!

I tried seed labeling on the Tusi 2018 data, using their marker genes from Supplementary Table 1. I strictly followed your tutorial (https://docs.scvi-tools.org/en/stable/user_guide/notebooks/seed_labeling.html) and the first results look promising. However, I would like to get the posterior probability score that you used in your original publication (e.g. Figure 6D).

On that note, is there an implemented way to dismiss labels if they have no support based on the trained model?

Is it possible to extract the posterior probability from the model? I tried dir(scvi.model._scanvi.SCANVI) but could not find it.

Many thanks,
Florian

I think you just want to pass soft=True to the predict method of SCANVI. However, we currently have a bug where it does not correctly output a DataFrame.


Hello Adam, many thanks for your reply! So I added the following lines:

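import pandas as pd

# soft=True returns per-label probabilities instead of hard label calls
y_pred = scanvi_model.predict(adata, soft=True)
pred = pd.DataFrame(y_pred)  # rows: cells, columns: labels
pred_score = pred.max(axis=1).to_numpy()  # highest probability per cell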

I assume that pred_score is now the maximum score across all labels for each cell, which should correspond to the label assigned to that cell.
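
To check that assumption without a trained model, here is a minimal sketch on a toy probability table. The table `probs` and its label names are made up for illustration; it only assumes that the soft prediction has one row per cell and one column per label. The row-wise maximum gives the score, and the column holding that maximum gives the matching label.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a soft prediction (hypothetical values and label names):
# one row per cell, one column per label, each row sums to 1.
probs = pd.DataFrame(
    [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6],
     [0.4, 0.5, 0.1]],
    index=["cell_0", "cell_1", "cell_2"],
    columns=["Erythroid", "Monocyte", "Basophil"],
)

# Highest probability per cell (the score).
pred_score = probs.max(axis=1).to_numpy()

# Label whose column holds that highest probability.
pred_label = probs.idxmax(axis=1).to_numpy()

for cell, label, score in zip(probs.index, pred_label, pred_score):
    print(cell, label, score)
```

If the columns are plain integers rather than label names, idxmax returns the column position instead, which then has to be mapped back to the label names separately.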

Background: I annotated progenitor cells with SingleR and used the top 10 SingleR labels as seed labels with scanvi. The aim is to compare the scvi score against the SingleR score.

Your process is correct. We just released a new version that fixes the issue with the soft prediction, so I recommend updating to it.
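
On the earlier question about dismissing labels with little support: one possible approach, sketched below, is to threshold the maximum probability yourself after prediction. This is plain pandas post-processing, not a built-in scvi-tools feature. The cutoff `MIN_PROB`, the "Unassigned" placeholder, and the toy table `probs` are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for a soft prediction (hypothetical values and label names).
probs = pd.DataFrame(
    [[0.90, 0.05, 0.05],
     [0.40, 0.35, 0.25],
     [0.10, 0.20, 0.70]],
    index=["cell_0", "cell_1", "cell_2"],
    columns=["A", "B", "C"],
)

MIN_PROB = 0.6  # hypothetical cutoff; choose one that suits your data

best_label = probs.idxmax(axis=1)  # most probable label per cell
best_score = probs.max(axis=1)     # its probability

# Keep the label only where the model is confident enough.
final_label = best_label.where(best_score >= MIN_PROB, other="Unassigned")
print(final_label)
```

Cells below the cutoff keep their scores, so they can still be inspected or compared with another annotation afterwards.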


Great, many thanks for your tools and dedication!