Prepare data

Prepare an AnnData object for training the model.

utils.get_training_data(adata, remove_clusters=None, cells_per_cluster=100, cluster_column='clusters', min_shared_counts=10, n_var_genes=2000)

Reduces an AnnData object to the most relevant cells and genes for understanding the differentiation trajectories in the data.

Parameters:

adata – AnnData object to be reduced
remove_clusters – Names of clusters to be removed
cells_per_cluster – How many cells to keep per cluster. For Louvain clustering with resolution = 1, keeping more than 300 cells per cluster does not provide much extra information.
cluster_column – Name of the column in adata.obs that contains cluster names
min_shared_counts – Minimum number of spliced+unspliced counts across all cells for a gene to be retained
n_var_genes – Number of top variable genes to retain

Returns:

AnnData object reduced to the most informative cells and genes

Return type:

AnnData
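
The effect of remove_clusters and cells_per_cluster can be illustrated with a small standalone sketch. This is not the library's implementation: the helper name subsample_per_cluster, the seed argument, and the use of uniform random sampling within each cluster are assumptions made purely for illustration, and plain Python lists stand in for the cluster column of adata.obs.

```python
import random
from collections import defaultdict


def subsample_per_cluster(labels, cells_per_cluster=100, remove_clusters=None, seed=0):
    """Return sorted indices of the cells kept after per-cluster downsampling.

    Hypothetical helper for illustration only; the real function may select
    cells differently.
    """
    removed = set(remove_clusters or [])
    rng = random.Random(seed)  # assumed: seeded uniform sampling

    # Group cell indices by cluster label, skipping clusters marked for removal.
    by_cluster = defaultdict(list)
    for idx, label in enumerate(labels):
        if label not in removed:
            by_cluster[label].append(idx)

    kept = []
    for indices in by_cluster.values():
        # Clusters at or below the cap are kept whole; larger ones are sampled.
        if len(indices) > cells_per_cluster:
            indices = rng.sample(indices, cells_per_cluster)
        kept.extend(indices)
    return sorted(kept)


# Toy cluster labels standing in for adata.obs['clusters'].
labels = ["A"] * 250 + ["B"] * 40 + ["C"] * 120
kept = subsample_per_cluster(labels, cells_per_cluster=100, remove_clusters=["C"])
counts = {c: sum(labels[i] == c for i in kept) for c in sorted(set(labels))}
print(counts)  # {'A': 100, 'B': 40, 'C': 0}
```

In this toy example, cluster A is capped at 100 cells, cluster B is smaller than the cap and is kept in full, and cluster C is dropped entirely.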