dynamo.pp.filter_genes

dynamo.pp.filter_genes(adata, filter_bool=None, layer='all', min_cell_s=1, min_cell_u=1, min_cell_p=1, min_avg_exp_s=1e-10, min_avg_exp_u=0, min_avg_exp_p=0, max_avg_exp=inf, min_count_s=0, min_count_u=0, min_count_p=0, shared_count=30, inplace=False)

Basic filtering of genes based on a collection of expression filters.

Parameters
  • adata (AnnData) – AnnData object.

  • filter_bool (ndarray (default: None)) – A boolean array from the user to select genes for downstream analysis.

  • layer (str (default: 'all')) – The layer (including X) whose data is used for feature selection.

  • min_cell_s (int (default: 1)) – Minimal number of cells with expression for the data in the spliced layer (also used for X).

  • min_cell_u (int (default: 1)) – Minimal number of cells with expression for the data in the unspliced layer.

  • min_cell_p (int (default: 1)) – Minimal number of cells with expression for the data in the protein layer.

  • min_avg_exp_s (float (default: 1e-10)) – Minimal average expression across cells for the data in the spliced layer (also used for X).

  • min_avg_exp_u (float (default: 0)) – Minimal average expression across cells for the data in the unspliced layer.

  • min_avg_exp_p (float (default: 0)) – Minimal average expression across cells for the data in the protein layer.

  • max_avg_exp (float (default: inf)) – Maximal average expression across cells for the data in all layers (also used for X).

  • min_count_s (int (default: 0)) – Minimal number of counts (UMI/expression) for the data in the spliced layer (also used for X).

  • min_count_u (int (default: 0)) – Minimal number of counts (UMI/expression) for the data in the unspliced layer.

  • min_count_p (int (default: 0)) – Minimal number of counts (UMI/expression) for the data in the protein layer.

  • shared_count (int (default: 30)) – The minimal shared number of counts for each gene across cells between layers.

Returns

adata – An updated AnnData object with use_for_pca as a new column in the .var attribute, indicating which genes are selected for downstream analysis. adata is subset to only the genes that pass the filter if keep_unflitered is set to False.

Return type

AnnData
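
Example

The per-layer thresholds described above (number of expressing cells, average expression, total counts) can be pictured as boolean masks that are combined with a logical AND. The following is a minimal NumPy sketch of that idea, not dynamo's actual implementation: the helper name basic_gene_mask is hypothetical, the input is assumed to be a dense cells x genes count matrix, and the use of inclusive comparisons (>= and <=) is an assumption. The shared_count filter is not covered here.

```python
import numpy as np


def basic_gene_mask(counts, min_cell=1, min_avg_exp=1e-10,
                    max_avg_exp=np.inf, min_count=0):
    """Hypothetical helper: return a boolean mask over genes.

    counts is assumed to be a dense (cells x genes) array.
    Inclusive comparisons are an assumption, not confirmed library behavior.
    """
    counts = np.asarray(counts, dtype=float)
    n_cells_expressing = (counts > 0).sum(axis=0)  # cells with nonzero expression, per gene
    avg_exp = counts.mean(axis=0)                  # average expression across cells, per gene
    total = counts.sum(axis=0)                     # total counts, per gene
    return (
        (n_cells_expressing >= min_cell)
        & (avg_exp >= min_avg_exp)
        & (avg_exp <= max_avg_exp)
        & (total >= min_count)
    )


# Toy spliced matrix: 3 cells (rows) x 3 genes (columns).
spliced = np.array([
    [0, 3, 1],
    [0, 2, 0],
    [0, 5, 0],
])

# Require expression in at least 2 cells and at least 5 total counts.
mask = basic_gene_mask(spliced, min_cell=2, min_count=5)
print(mask)  # only the second gene passes both thresholds
```

In this toy matrix the first gene is never expressed and the third is expressed in a single cell, so only the second gene survives the stricter thresholds. With the signature's default values for the spliced layer, the third gene would pass as well.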