scib_metrics.benchmark.Benchmarker
- class scib_metrics.benchmark.Benchmarker(adata, batch_key, label_key, embedding_obsm_keys, bio_conservation_metrics=BioConservation(isolated_labels=True, nmi_ari_cluster_labels_leiden=False, nmi_ari_cluster_labels_kmeans=True, silhouette_label=True, clisi_knn=True), batch_correction_metrics=BatchCorrection(bras=True, ilisi_knn=True, kbet_per_label=True, graph_connectivity=True, pcr_comparison=True), pre_integrated_embedding_obsm_key=None, n_jobs=1, progress_bar=True, solver='arpack')
Benchmarking pipeline for the single-cell integration task.
- Parameters:
  - adata (AnnData) – AnnData object containing the raw count data and integrated embeddings as obsm keys.
  - batch_key (str) – Key in adata.obs that contains the batch information.
  - label_key (str) – Key in adata.obs that contains the cell type labels.
  - embedding_obsm_keys (list[str]) – List of obsm keys that contain the embeddings to be benchmarked.
  - bio_conservation_metrics (BioConservation | None (default: BioConservation(isolated_labels=True, nmi_ari_cluster_labels_leiden=False, nmi_ari_cluster_labels_kmeans=True, silhouette_label=True, clisi_knn=True))) – Specification of which bio conservation metrics to run in the pipeline.
  - batch_correction_metrics (BatchCorrection | None (default: BatchCorrection(bras=True, ilisi_knn=True, kbet_per_label=True, graph_connectivity=True, pcr_comparison=True))) – Specification of which batch correction metrics to run in the pipeline.
  - pre_integrated_embedding_obsm_key (str | None (default: None)) – Obsm key containing a non-integrated embedding of the data. If None, the embedding will be computed in the prepare step. See the notes below for more information.
  - n_jobs (int (default: 1)) – Number of jobs to use for parallelization of neighbor search.
  - progress_bar (bool (default: True)) – Whether to show a progress bar for prepare() and benchmark().
  - solver (str (default: 'arpack')) – SVD solver to use during PCA. Changing the solver can help with stability issues. Choose from "arpack", "randomized", or "auto".
Notes
adata.X should contain a form of the data that is not integrated, but is normalized. The prepare method will use adata.X for PCA via pca(), which also only uses features masked via adata.var['highly_variable'].
See the accompanying tutorial for further usage examples.
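
As a minimal sketch of the workflow these notes describe (not taken from the tutorial), the helper below builds a Benchmarker, prepares the data, runs the metrics, and returns the results. The helper name `run_benchmark` and the default column names "batch" and "cell_type" are illustrative assumptions; the AnnData object is assumed to hold normalized, non-integrated data in adata.X and a adata.var['highly_variable'] mask.

```python
def run_benchmark(adata, embedding_obsm_keys, batch_key="batch", label_key="cell_type"):
    """Score the integrated embeddings stored in ``adata.obsm``.

    Assumes ``adata.X`` is normalized but not integrated and that
    ``adata.var['highly_variable']`` marks the features to use for PCA.
    The default column names are placeholders, not library defaults.
    """
    # Imported lazily so that defining this helper does not require the package.
    from scib_metrics.benchmark import BatchCorrection, Benchmarker, BioConservation

    bm = Benchmarker(
        adata,
        batch_key=batch_key,
        label_key=label_key,
        embedding_obsm_keys=list(embedding_obsm_keys),
        # Enable Leiden-based NMI/ARI in addition to the default bio metrics.
        bio_conservation_metrics=BioConservation(nmi_ari_cluster_labels_leiden=True),
        # Keep the default batch correction metrics.
        batch_correction_metrics=BatchCorrection(),
        n_jobs=1,
        progress_bar=False,
    )
    bm.prepare()    # prepare the data (PCA on adata.X when no pre-integrated key is given)
    bm.benchmark()  # run the pipeline
    return bm.get_results(min_max_scale=False)
```

A call might look like `run_benchmark(adata, ["X_method_a", "X_method_b"])`, where the two obsm keys are hypothetical names for embeddings already stored in adata.obsm.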
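Methods table
benchmark() | Run the pipeline.
get_results([min_max_scale, clean_names]) | Return the benchmarking results.
plot_results_table([min_max_scale, show, save_dir]) | Plot the benchmarking results.
prepare([neighbor_computer]) | Prepare the data for benchmarking.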
Methods
- Benchmarker.get_results(min_max_scale=False, clean_names=True)
Return the benchmarking results.
- Benchmarker.plot_results_table(min_max_scale=False, show=True, save_dir=None)
Plot the benchmarking results.
- Benchmarker.prepare(neighbor_computer=None)
Prepare the data for benchmarking.
- Parameters:
  - neighbor_computer (Callable[[ndarray, int], NeighborsResults] | None (default: None)) – Function that computes the neighbors of the data. If None, the neighbors will be computed with pynndescent(). The function should take as input the data and the number of neighbors to compute and return a NeighborsResults object.
- Return type:
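
A custom neighbor_computer can be sketched as follows. This is an illustration under stated assumptions, not the library's default: it swaps in an exact brute-force Euclidean search written in NumPy, counts each point as its own first neighbor, and assumes NeighborsResults can be imported from scib_metrics.nearest_neighbors and built from `indices` and `distances` arrays. The names `brute_force_knn` and `exact_neighbor_computer` are hypothetical.

```python
import numpy as np


def brute_force_knn(X: np.ndarray, k: int):
    """Exact k-nearest neighbors by Euclidean distance (each point includes itself)."""
    sq_norms = (X ** 2).sum(axis=1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at zero
    # to guard against small negative values from floating-point rounding.
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    # Indices of the k smallest distances per row, in ascending order.
    indices = np.argsort(sq_dists, axis=1)[:, :k]
    distances = np.sqrt(np.take_along_axis(sq_dists, indices, axis=1))
    return indices, distances


def exact_neighbor_computer(X: np.ndarray, k: int):
    """Adapter with the (data, number of neighbors) signature described for prepare()."""
    # Lazy import; the module path and constructor arguments are assumptions.
    from scib_metrics.nearest_neighbors import NeighborsResults

    indices, distances = brute_force_knn(X, k)
    return NeighborsResults(indices=indices, distances=distances)
```

It would then be passed as `bm.prepare(neighbor_computer=exact_neighbor_computer)`. Because the sketch materializes the full pairwise distance matrix, it only suits small datasets; an approximate index is the more practical choice at scale.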