My dataset contains two distinct groups of cells. During the analysis, I noticed that three cells from group 1 cluster with group 2, and two cells from group 2 sit within group 1. What is the best way to remove these five cells from the dataset? What methods or criteria can I use to confidently exclude them while preserving the integrity of the overall dataset?
That’s an interesting question!
You can exclude specific cells from your analysis by creating custom cell sets in the Data Exploration module and then subsetting the cell sets you want to keep into a new analysis. See our user guide for more details on how to do this.
For example, you could ‘hide’ one of the Louvain clusters and then use the lasso tool on the UMAP embedding to select the cells you want to keep from the remaining cluster. Then hide the other cluster and repeat. This is likely the simplest way to exclude the 5 cells. You can then select your two new custom cell sets and ‘subset’ them into a new analysis so that the 5 cells are completely excluded.
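If you would rather list the mismatched cells before lassoing anything, the idea can be sketched outside the platform in plain Python. This is only an illustration, not something the platform provides: the barcodes, group names, and cluster labels below are made up, and it assumes you have a per-cell table of sample group and cluster label to hand.

```python
from collections import Counter, defaultdict

# Hypothetical per-cell metadata: barcode -> (sample group, cluster label).
# All names and values here are illustrative, not exported from any tool.
cells = {
    "cell_01": ("group1", "cluster_A"),
    "cell_02": ("group1", "cluster_A"),
    "cell_03": ("group1", "cluster_A"),
    "cell_04": ("group1", "cluster_A"),
    "cell_05": ("group1", "cluster_B"),  # group 1 cell sitting in cluster B
    "cell_06": ("group2", "cluster_B"),
    "cell_07": ("group2", "cluster_B"),
    "cell_08": ("group2", "cluster_B"),
    "cell_09": ("group2", "cluster_B"),
    "cell_10": ("group2", "cluster_A"),  # group 2 cell sitting in cluster A
}


def majority_cluster_per_group(cells):
    """Return the most common cluster label for each sample group."""
    counts = defaultdict(Counter)
    for group, cluster in cells.values():
        counts[group][cluster] += 1
    return {group: c.most_common(1)[0][0] for group, c in counts.items()}


def flag_mismatched(cells):
    """Return barcodes whose cluster differs from their group's majority cluster."""
    expected = majority_cluster_per_group(cells)
    return sorted(
        barcode
        for barcode, (group, cluster) in cells.items()
        if cluster != expected[group]
    )


flagged = flag_mismatched(cells)
# Barcodes that would remain if the flagged cells were excluded.
kept = sorted(set(cells) - set(flagged))
print("Flagged for inspection:", flagged)
```

The flagged list is a starting point for inspection (for example, checking QC metrics for those barcodes), not an automatic exclusion rule.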
However, a bigger question here is whether you should exclude these cells at all. Personally, I would advise against excluding cells from an analysis unless you have a compelling reason to do so. If these cells have passed QC (i.e. they are good quality, not empty droplets, dead cells, or doublets) then they might actually be telling you something interesting about the biology of your dataset!
Hope this helps!