scib_metrics.benchmark.Benchmarker

class scib_metrics.benchmark.Benchmarker(adata, batch_key, label_key, embedding_obsm_keys, bio_conservation_metrics=None, batch_correction_metrics=None, pre_integrated_embedding_obsm_key=None, n_jobs=1)

Benchmarking pipeline for the single-cell integration task.

Parameters:
  • adata (AnnData) – AnnData object containing the data to benchmark, with the integrated embeddings stored in adata.obsm. See the notes below for what adata.X should hold.

  • batch_key (str) – Key in adata.obs that contains the batch information.

  • label_key (str) – Key in adata.obs that contains the cell type labels.

  • embedding_obsm_keys (list[str]) – List of obsm keys that contain the embeddings to be benchmarked.

  • bio_conservation_metrics (Optional[BioConservation] (default: None)) – Specification of which bio conservation metrics to run in the pipeline.

  • batch_correction_metrics (Optional[BatchCorrection] (default: None)) – Specification of which batch correction metrics to run in the pipeline.

  • pre_integrated_embedding_obsm_key (Optional[str] (default: None)) – Obsm key containing a non-integrated embedding of the data. If None, the embedding will be computed in the prepare step. See the notes below for more information.

  • n_jobs (int (default: 1)) – Number of jobs to use for parallelization of neighbor search.
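A minimal sketch of how these parameters fit together, written as a hypothetical helper. The helper name run_benchmark and the obs column names "batch" and "cell_type" are placeholders, and BioConservation(isolated_labels=False) is only one possible metric selection; the field name is assumed from the package tutorial and may differ between versions.

```python
def run_benchmark(adata, embedding_obsm_keys, batch_key="batch", label_key="cell_type", n_jobs=1):
    """Hypothetical helper: benchmark the given embeddings and return the raw scores."""
    # Imported lazily so that defining this helper does not require scib-metrics.
    from scib_metrics.benchmark import BatchCorrection, Benchmarker, BioConservation

    bm = Benchmarker(
        adata,
        batch_key=batch_key,                    # column in adata.obs with batch labels
        label_key=label_key,                    # column in adata.obs with cell type labels
        embedding_obsm_keys=list(embedding_obsm_keys),
        # Assumed field name: skip the isolated-labels metric, keep the other defaults.
        bio_conservation_metrics=BioConservation(isolated_labels=False),
        batch_correction_metrics=BatchCorrection(),
        n_jobs=n_jobs,                          # parallelism for the neighbor search
    )
    bm.prepare()    # neighbor graphs (and PCA if no pre-integrated embedding is given)
    bm.benchmark()  # run the selected metrics
    return bm.get_results(min_max_scale=False)
```

With an AnnData object holding embeddings under, say, adata.obsm["X_scVI"] and adata.obsm["X_harmony"] (hypothetical keys), the helper would be called as run_benchmark(adata, ["X_scVI", "X_harmony"]).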

Notes

adata.X should contain data that is normalized but not integrated. When pre_integrated_embedding_obsm_key is None, the prepare method runs PCA on adata.X via pca(), using only the features flagged in adata.var['highly_variable'].
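One way to get adata.X into that state is sketched below. It assumes that adata.X initially holds raw counts and that scanpy is used for preprocessing; neither is required by Benchmarker. The helper name prepare_unintegrated, the "batch" column name, and n_top_genes=2000 are illustrative choices only.

```python
def prepare_unintegrated(adata, batch_key="batch", n_top_genes=2000):
    """Hypothetical helper: return a normalized, non-integrated copy of adata."""
    # Imported lazily so that defining this helper does not require scanpy.
    import scanpy as sc

    adata = adata.copy()          # leave the caller's object untouched
    sc.pp.normalize_total(adata)  # library-size normalization of the raw counts
    sc.pp.log1p(adata)            # log-transform the normalized counts
    # Writes the boolean mask adata.var['highly_variable'].
    sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes, batch_key=batch_key)
    return adata
```

The returned object has a normalized adata.X and a populated adata.var['highly_variable'] column, which is what the note above asks for.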

See further usage examples in the following tutorial:

  1. Benchmarking lung integration

Methods table

benchmark()

Run the pipeline.

get_results([min_max_scale, clean_names])

Return the benchmarking results.

plot_results_table([min_max_scale, show, ...])

Plot the benchmarking results.

prepare([neighbor_computer])

Prepare the data for benchmarking.

Methods

Benchmarker.benchmark()

Run the pipeline.

Return type:

None

Benchmarker.get_results(min_max_scale=True, clean_names=True)

Return the benchmarking results.

Parameters:
  • min_max_scale (bool (default: True)) – Whether to min-max scale the results.

  • clean_names (bool (default: True)) – Whether to clean the metric names.

Return type:

DataFrame

Returns:

The benchmarking results.
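To make min_max_scale concrete, the sketch below applies the usual min-max formula, (x - min) / (max - min), to one metric across several embeddings. It is an illustration in plain Python, not the library's code; it assumes each metric is rescaled independently across the benchmarked embeddings, and the embedding names and scores are made up.

```python
def min_max_scale(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one metric's scores across embeddings to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        # All embeddings tie on this metric; map every score to 0.
        return {name: 0.0 for name in scores}
    return {name: (value - lo) / span for name, value in scores.items()}


# Made-up scores for a single metric.
raw = {"Unintegrated": 0.25, "Harmony": 0.5, "scVI": 0.75}
scaled = min_max_scale(raw)
```

After scaling, the best embedding on a metric scores 1 and the worst scores 0, so scaled values describe relative rank within this benchmark rather than absolute quality. Pass min_max_scale=False to compare raw scores across separate runs.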

Benchmarker.plot_results_table(min_max_scale=True, show=True, save_dir=None)

Plot the benchmarking results.

Parameters:
  • min_max_scale (bool (default: True)) – Whether to min-max scale the results.

  • show (bool (default: True)) – Whether to show the plot.

  • save_dir (Optional[str] (default: None)) – The directory to save the plot to. If None, the plot is not saved.

Return type:

Table
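A small sketch of saving the table without displaying it, for example in a batch job. The helper name save_results_table is hypothetical, and creating the output directory first is a precaution taken here, on the assumption that the method does not create it itself.

```python
import os


def save_results_table(bm, out_dir):
    """Hypothetical helper: write the results table to out_dir without showing it."""
    os.makedirs(out_dir, exist_ok=True)  # make sure the target directory exists
    # bm is expected to be a Benchmarker on which benchmark() has already run.
    return bm.plot_results_table(min_max_scale=True, show=False, save_dir=out_dir)
```

The return value is passed through unchanged, so the caller still receives the Table object.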

Benchmarker.prepare(neighbor_computer=None)

Prepare the data for benchmarking.

Parameters:
  • neighbor_computer (Optional[Callable[[ndarray, int], NeighborsResults]] (default: None)) – Function that computes the neighbors of the data. If None, the neighbors will be computed with pynndescent(). The function should take as input the data and the number of neighbors to compute and return a NeighborsResults object.

Return type:

None
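A sketch of a custom neighbor_computer that swaps in exact brute-force search from scikit-learn. It assumes NeighborsResults can be imported from scib_metrics.nearest_neighbors and constructed from indices and distances arrays; check both against the installed version. The function name sklearn_brute_force_nn is hypothetical.

```python
def sklearn_brute_force_nn(X, k):
    """Exact k-nearest-neighbor search matching the (data, n_neighbors) signature."""
    # Imported lazily so that defining this function does not require either package.
    from scib_metrics.nearest_neighbors import NeighborsResults  # assumed import path
    from sklearn.neighbors import NearestNeighbors

    nn = NearestNeighbors(n_neighbors=k, algorithm="brute").fit(X)
    # Querying with X itself means each point is returned as its own nearest neighbor.
    distances, indices = nn.kneighbors(X)
    return NeighborsResults(indices=indices, distances=distances)
```

It would be passed as bm.prepare(neighbor_computer=sklearn_brute_force_nn). Brute-force search is exact but scales quadratically with the number of cells, so it is mainly useful for small datasets or for checking the approximate default.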