Data preparation APIs

Dataset wrapper classes provide functionality for adding in-memory or local data objects to datasets when rendering Vitessce as a Jupyter widget.

We provide default wrapper class implementations for data formats used by popular single-cell and imaging packages.

To write your own custom wrapper class, create a subclass of the AbstractWrapper class, implementing the getter functions for the data types that can be derived from your object.

vitessce.wrappers

class vitessce.wrappers.AbstractWrapper(**kwargs)

An abstract class that can be extended when implementing custom dataset object wrapper classes.

Abstract constructor to be inherited by dataset wrapper classes.

Parameters
  • out_dir (str) – The path to a local directory used for data processing outputs. By default, uses a temporary directory.

  • request_init (dict) – Options to pass along with every fetch request from the browser, like { "headers": { "Authorization": "Bearer dsfjalsdfa1431" } }
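The request_init value follows the shape of the init object of the browser's fetch API, where HTTP headers go under a "headers" key. A minimal sketch of building such a dict for bearer-token authentication; the helper make_bearer_request_init and the token string are hypothetical, and the commented-out wrapper call assumes a wrapper that forwards request_init through **kwargs.

```python
def make_bearer_request_init(token):
    """Build a fetch-style init dict that sends an Authorization header.

    Hypothetical helper for illustration; it is not part of vitessce.
    """
    return {"headers": {"Authorization": f"Bearer {token}"}}


request_init = make_bearer_request_init("my-secret-token")
print(request_init)

# Hypothetical usage (requires vitessce and a real remote store):
# AnnDataWrapper(adata_url="https://example.com/data.zarr", request_init=request_init)
```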

auto_view_config(vc)

Auto view configuration is intended for internal use by the VitessceConfig.from_object method. Each subclass of AbstractWrapper may implement this method, which takes a VitessceConfig instance and modifies it by adding datasets, visualization components, and view coordinations. Implementations may create an opinionated view config based on inferred use cases.

Parameters

vc (VitessceConfig) – The view config instance.

convert_and_save(dataset_uid, obj_i, base_dir=None)

Fill in the file_def_creators list. Each function added to this list should take a base URL and return a Vitessce file definition. If this wrapper wraps local data, also create routes and add them to the routes list. This method does not return anything.

Parameters
  • dataset_uid (str) – A unique identifier for this dataset.

  • obj_i (int) – Within the dataset, the index of this data wrapper object.

get_file_defs(base_url)

Obtain the file definitions for this wrapper class.

Parameters

base_url (str) – A base URL to prepend to relative URLs.

Returns

A list of file definitions.

Return type

list[dict]
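The file_def_creators pattern described above can be sketched with a small stand-in class: convert_and_save registers a function that maps a base URL to a file definition, and get_file_defs calls every registered function with the same base URL. SketchWrapper, the URL layout, and the "obsEmbedding.csv" file type string are assumptions for illustration; this is not the vitessce AbstractWrapper implementation, and route creation for local data is omitted.

```python
from typing import Callable, Dict, List


class SketchWrapper:
    """Simplified stand-in illustrating the file_def_creators pattern."""

    def __init__(self, file_type: str, rel_path: str):
        self.file_type = file_type
        self.rel_path = rel_path
        # Functions that take a base URL and return a file definition dict.
        self.file_def_creators: List[Callable[[str], Dict]] = []

    def convert_and_save(self, dataset_uid: str, obj_i: int) -> None:
        # Register a creator; the URL layout here is a hypothetical choice.
        def create_file_def(base_url: str) -> Dict:
            return {
                "fileType": self.file_type,
                "url": f"{base_url}/{dataset_uid}/{obj_i}/{self.rel_path}",
            }

        self.file_def_creators.append(create_file_def)

    def get_file_defs(self, base_url: str) -> List[Dict]:
        # Call every registered creator with the same base URL.
        return [create(base_url) for create in self.file_def_creators]


w = SketchWrapper("obsEmbedding.csv", "embedding.csv")
w.convert_and_save("A", 0)
print(w.get_file_defs("http://localhost:8000"))
```

Deferring URL construction to a function lets the same wrapper produce correct file definitions whether the data is later served from a local server or exported to a remote location.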

get_local_dir_route(dataset_uid, obj_i, local_dir_path, local_dir_uid)

Obtain the Mount for a local directory.

Parameters
  • dataset_uid (str) – A dataset unique identifier for the Mount

  • obj_i (int) – The index of the current vitessce.wrappers.AbstractWrapper among all other wrappers in the view config

  • local_dir_path (str) – The path to the local directory to serve.

  • local_dir_uid (str) – The UID to include as the route path suffix.

Returns

A list containing a starlette Mount for local_dir_path.

Return type

list[starlette.routing.Mount]

get_out_dir_route(dataset_uid, obj_i)

Obtain the Mount for the out_dir

Parameters
  • dataset_uid (str) – A dataset unique identifier for the Mount

  • obj_i (int) – The index of the current vitessce.wrappers.AbstractWrapper among all other wrappers in the view config

Returns

A list containing a starlette Mount for out_dir.

Return type

list[starlette.routing.Mount]

get_routes()

Obtain the routes that have been created for this wrapper class.

Returns

A list of server routes.

Return type

list[starlette.routing.Route]

get_stores(base_url)

Obtain the stores that have been created for this wrapper class.

Parameters

base_url (str) – A base URL to prepend to relative URLs.

Returns

A dictionary that maps file URLs to Zarr Store objects.

Return type

dict[str, zarr.Store]

class vitessce.wrappers.AnnDataWrapper(adata_path=None, adata_url=None, adata_store=None, obs_feature_matrix_path=None, feature_filter_path=None, initial_feature_filter_path=None, obs_set_paths=None, obs_set_names=None, obs_locations_path=None, obs_segmentations_path=None, obs_embedding_paths=None, obs_embedding_names=None, obs_embedding_dims=None, obs_spots_path=None, obs_points_path=None, feature_labels_path=None, obs_labels_path=None, convert_to_dense=True, coordination_values=None, obs_labels_paths=None, obs_labels_names=None, **kwargs)

Wrap an AnnData object by creating an instance of the AnnDataWrapper class.

Parameters
  • adata_path (str) – A path to an AnnData object written to a Zarr store containing single-cell experiment data.

  • adata_url (str) – A remote URL pointing to a Zarr-backed AnnData store.

  • adata_store (str or zarr.Storage) – A path to pass to zarr.DirectoryStore, or an existing store instance.

  • obs_feature_matrix_path (str) – Location of the expression (cell x gene) matrix, like X or obsm/highly_variable_genes_subset.

  • feature_filter_path (str) – A path like var/highly_variable, used in conjunction with obs_feature_matrix_path when obs_feature_matrix_path points to a subset of X rather than the full var list.

  • initial_feature_filter_path (str) – A path like var/highly_variable, used in conjunction with obs_feature_matrix_path when obs_feature_matrix_path points to a subset of X rather than the full var list.

  • obs_set_paths (list[str]) – Paths to obs columns, like [‘obs/louvain’, ‘obs/cellType’], used for showing cell sets.

  • obs_set_names (list[str]) – Names to display in place of those in obs_set_paths, like [‘Louvain’, ‘Cell Type’].

  • obs_locations_path (str) – Path to the obsm array that contains centroid coordinates for displaying centroids in the spatial viewer.

  • obs_segmentations_path (str) – Path to the obsm array that contains polygon coordinates for displaying outlines in the spatial viewer.

  • obs_embedding_paths (list[str]) – Paths to obsm arrays, like [‘obsm/X_umap’, ‘obsm/X_pca’], for showing scatterplots.

  • obs_embedding_names (list[str]) – Names to display above the scatterplots, like [‘UMAP’, ‘PCA’], overriding the defaults.

  • obs_embedding_dims (list[list[int]]) – Zero-based column indices along which to get data for each scatterplot, like [[0, 1], [4, 5]], where [0, 1] is just the normal x and y, but [4, 5] could compare the fifth and sixth principal components, for example.

  • obs_spots_path (str) – Path to the obsm array that contains centroid coordinates for displaying spots in the spatial viewer.

  • obs_points_path (str) – Path to the obsm array that contains centroid coordinates for displaying points in the spatial viewer.

  • feature_labels_path (str) – The path to a var column containing feature labels (e.g., alternate gene symbols) to use instead of the default var index of the AnnData store.

  • obs_labels_path (str) – (DEPRECATED) The path to an obs column containing observation labels (e.g., alternate cell IDs) to use instead of the default obs index of the AnnData store. Use obs_labels_paths and obs_labels_names instead. This argument will be removed in a future release.

  • obs_labels_paths (list[str]) – The paths to obs columns containing observation labels (e.g., alternate cell IDs) to use instead of the default obs index of the AnnData store.

  • obs_labels_names (list[str]) – Optional display names for the observation label columns listed in obs_labels_paths.

  • convert_to_dense (bool) – Whether to convert X to a dense array in the Zarr store (dense is faster but takes more disk space).

  • coordination_values (dict or None) – Coordination values for the file definition.

  • **kwargs – Keyword arguments inherited from AbstractWrapper
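Because obs_embedding_dims uses zero-based column indices, [4, 5] selects the fifth and sixth columns of the embedding array. A pure-Python sketch of that selection on a made-up two-observation embedding; the helper select_dims is hypothetical and not part of vitessce.

```python
# Toy embedding: each row is one observation, each column one component.
# Values are made up for illustration.
embedding = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
]


def select_dims(rows, dims):
    """Return the (x, y) pairs picked out by a pair of zero-based column indices."""
    x_i, y_i = dims
    return [(row[x_i], row[y_i]) for row in rows]


print(select_dims(embedding, [0, 1]))  # first and second components
print(select_dims(embedding, [4, 5]))  # fifth and sixth components
```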

auto_view_config(vc)

Auto view configuration is intended for internal use by the VitessceConfig.from_object method. Each subclass of AbstractWrapper may implement this method, which takes a VitessceConfig instance and modifies it by adding datasets, visualization components, and view coordinations. Implementations may create an opinionated view config based on inferred use cases.

Parameters

vc (VitessceConfig) – The view config instance.

convert_and_save(dataset_uid, obj_i, base_dir=None)

Fill in the file_def_creators list. Each function added to this list should take a base URL and return a Vitessce file definition. If this wrapper wraps local data, also create routes and add them to the routes list. This method does not return anything.

Parameters
  • dataset_uid (str) – A unique identifier for this dataset.

  • obj_i (int) – Within the dataset, the index of this data wrapper object.

class vitessce.wrappers.CsvWrapper(csv_path=None, csv_url=None, data_type=None, options=None, coordination_values=None, **kwargs)

Wrap a CSV file by creating an instance of the CsvWrapper class.

Parameters
  • data_type (str) – The data type of the information contained in the file.

  • csv_path (str) – A local filepath to a CSV file.

  • csv_url (str) – A remote URL of a CSV file.

  • options (dict) – The file options.

  • coordination_values (dict) – The coordination values.

  • **kwargs – Keyword arguments inherited from AbstractWrapper

convert_and_save(dataset_uid, obj_i, base_dir=None)

Fill in the file_def_creators list. Each function added to this list should take a base URL and return a Vitessce file definition. If this wrapper wraps local data, also create routes and add them to the routes list. This method does not return anything.

Parameters
  • dataset_uid (str) – A unique identifier for this dataset.

  • obj_i (int) – Within the dataset, the index of this data wrapper object.

class vitessce.wrappers.ImageOmeTiffWrapper(img_path=None, offsets_path=None, img_url=None, offsets_url=None, coordinate_transformations=None, coordination_values=None, **kwargs)

Wrap an OME-TIFF file by creating an instance of the ImageOmeTiffWrapper class. Intended to be used with the spatialBeta and layerControllerBeta views.

Parameters
  • img_path (str) – A local filepath to an OME-TIFF file.

  • offsets_path (str) – A local filepath to an offsets.json file.

  • img_url (str) – A remote URL of an OME-TIFF file.

  • offsets_url (str) – A remote URL of an offsets.json file.

  • coordinate_transformations (list) – A column-major ordered matrix for transforming this image (see http://www.opengl-tutorial.org/beginners-tutorials/tutorial-3-matrices/#homogeneous-coordinates for more information).

  • coordination_values (dict) – Optional coordinationValues to be passed in the file definition.

  • **kwargs – Keyword arguments inherited from AbstractWrapper

convert_and_save(dataset_uid, obj_i, base_dir=None)

Fill in the file_def_creators list. Each function added to this list should take a base URL and return a Vitessce file definition. If this wrapper wraps local data, also create routes and add them to the routes list. This method does not return anything.

Parameters
  • dataset_uid (str) – A unique identifier for this dataset.

  • obj_i (int) – Within the dataset, the index of this data wrapper object.

class vitessce.wrappers.ImageOmeZarrWrapper(img_path=None, img_url=None, coordinate_transformations=None, coordination_values=None, **kwargs)

Wrap an OME-NGFF Zarr store by creating an instance of the ImageOmeZarrWrapper class. Intended to be used with the spatialBeta and layerControllerBeta views.

Parameters
  • img_path (str) – A local filepath to an OME-NGFF Zarr store.

  • img_url (str) – A remote URL of an OME-NGFF Zarr store.

  • coordinate_transformations (list) – Coordinate transformations to apply to this image.

  • coordination_values (dict) – Optional coordinationValues to be passed in the file definition.

  • **kwargs – Keyword arguments inherited from AbstractWrapper

convert_and_save(dataset_uid, obj_i, base_dir=None)

Fill in the file_def_creators list. Each function added to this list should take a base URL and return a Vitessce file definition. If this wrapper wraps local data, also create routes and add them to the routes list. This method does not return anything.

Parameters
  • dataset_uid (str) – A unique identifier for this dataset.

  • obj_i (int) – Within the dataset, the index of this data wrapper object.

class vitessce.wrappers.MultiImageWrapper(image_wrappers, use_physical_size_scaling=False, **kwargs)

Wrap multiple imaging datasets by creating an instance of the MultiImageWrapper class.

Parameters
  • image_wrappers (list) – A list of imaging wrapper objects to combine.

  • use_physical_size_scaling (bool) – Whether to scale the images according to their physical size metadata. By default, False.

  • **kwargs – Keyword arguments inherited from AbstractWrapper

convert_and_save(dataset_uid, obj_i, base_dir=None)

Fill in the file_def_creators list. Each function added to this list should take a base URL and return a Vitessce file definition. If this wrapper wraps local data, also create routes and add them to the routes list. This method does not return anything.

Parameters
  • dataset_uid (str) – A unique identifier for this dataset.

  • obj_i (int) – Within the dataset, the index of this data wrapper object.

class vitessce.wrappers.MultivecZarrWrapper(zarr_path=None, zarr_url=None, **kwargs)

Wrap a multivec Zarr store by creating an instance of the MultivecZarrWrapper class.

Parameters
  • zarr_path (str) – A local filepath to a multivec Zarr store.

  • zarr_url (str) – A remote URL of a multivec Zarr store.

  • **kwargs – Keyword arguments inherited from AbstractWrapper

convert_and_save(dataset_uid, obj_i, base_dir=None)

Fill in the file_def_creators list. Each function added to this list should take a base URL and return a Vitessce file definition. If this wrapper wraps local data, also create routes and add them to the routes list. This method does not return anything.

Parameters
  • dataset_uid (str) – A unique identifier for this dataset.

  • obj_i (int) – Within the dataset, the index of this data wrapper object.

class vitessce.wrappers.ObsSegmentationsOmeTiffWrapper(img_path=None, offsets_path=None, img_url=None, offsets_url=None, coordinate_transformations=None, obs_types_from_channel_names=None, coordination_values=None, **kwargs)

Wrap an OME-TIFF file by creating an instance of the ObsSegmentationsOmeTiffWrapper class. Intended to be used with the spatialBeta and layerControllerBeta views.

Parameters
  • img_path (str) – A local filepath to an OME-TIFF file.

  • offsets_path (str) – A local filepath to an offsets.json file.

  • img_url (str) – A remote URL of an OME-TIFF file.

  • offsets_url (str) – A remote URL of an offsets.json file.

  • coordinate_transformations (list) – A column-major ordered matrix for transforming this image (see http://www.opengl-tutorial.org/beginners-tutorials/tutorial-3-matrices/#homogeneous-coordinates for more information).

  • obs_types_from_channel_names (bool) – Whether to use the channel names to determine the obs types. Optional.

  • coordination_values (dict) – Optional coordinationValues to be passed in the file definition.

  • **kwargs – Keyword arguments inherited from AbstractWrapper

convert_and_save(dataset_uid, obj_i, base_dir=None)

Fill in the file_def_creators list. Each function added to this list should take a base URL and return a Vitessce file definition. If this wrapper wraps local data, also create routes and add them to the routes list. This method does not return anything.

Parameters
  • dataset_uid (str) – A unique identifier for this dataset.

  • obj_i (int) – Within the dataset, the index of this data wrapper object.

class vitessce.wrappers.ObsSegmentationsOmeZarrWrapper(img_path=None, img_url=None, coordinate_transformations=None, coordination_values=None, obs_types_from_channel_names=None, **kwargs)

Wrap an OME-NGFF Zarr store by creating an instance of the ObsSegmentationsOmeZarrWrapper class. Intended to be used with the spatialBeta and layerControllerBeta views.

Parameters
  • img_path (str) – A local filepath to an OME-NGFF Zarr store.

  • img_url (str) – A remote URL of an OME-NGFF Zarr store.

  • coordinate_transformations (list) – Coordinate transformations to apply to this image.

  • coordination_values (dict) – Optional coordinationValues to be passed in the file definition.

  • obs_types_from_channel_names (bool) – Whether to use the channel names to determine the obs types. Optional.

  • **kwargs – Keyword arguments inherited from AbstractWrapper

convert_and_save(dataset_uid, obj_i, base_dir=None)

Fill in the file_def_creators list. Each function added to this list should take a base URL and return a Vitessce file definition. If this wrapper wraps local data, also create routes and add them to the routes list. This method does not return anything.

Parameters
  • dataset_uid (str) – A unique identifier for this dataset.

  • obj_i (int) – Within the dataset, the index of this data wrapper object.

class vitessce.wrappers.OmeTiffWrapper(img_path=None, offsets_path=None, img_url=None, offsets_url=None, name='', transformation_matrix=None, is_bitmask=False, **kwargs)

Wrap an OME-TIFF file by creating an instance of the OmeTiffWrapper class.

Parameters
  • img_path (str) – A local filepath to an OME-TIFF file.

  • offsets_path (str) – A local filepath to an offsets.json file.

  • img_url (str) – A remote URL of an OME-TIFF file.

  • offsets_url (str) – A remote URL of an offsets.json file.

  • name (str) – The display name for this image in Vitessce.

  • transformation_matrix (list) – A column-major ordered matrix for transforming this image (see http://www.opengl-tutorial.org/beginners-tutorials/tutorial-3-matrices/#homogeneous-coordinates for more information).

  • is_bitmask (bool) – Whether this image is a bitmask. By default, False.

  • **kwargs – Keyword arguments inherited from AbstractWrapper

convert_and_save(dataset_uid, obj_i, base_dir=None)

Fill in the file_def_creators list. Each function added to this list should take a base URL and return a Vitessce file definition. If this wrapper wraps local data, also create routes and add them to the routes list. This method does not return anything.

Parameters
  • dataset_uid (str) – A unique identifier for this dataset.

  • obj_i (int) – Within the dataset, the index of this data wrapper object.

class vitessce.wrappers.OmeZarrWrapper(img_path=None, img_url=None, name='', is_bitmask=False, **kwargs)

Wrap an OME-NGFF Zarr store by creating an instance of the OmeZarrWrapper class.

Parameters
  • img_path (str) – A local filepath to an OME-NGFF Zarr store.

  • img_url (str) – A remote URL of an OME-NGFF Zarr store.

  • **kwargs – Keyword arguments inherited from AbstractWrapper

convert_and_save(dataset_uid, obj_i, base_dir=None)

Fill in the file_def_creators list. Each function added to this list should take a base URL and return a Vitessce file definition. If this wrapper wraps local data, also create routes and add them to the routes list. This method does not return anything.

Parameters
  • dataset_uid (str) – A unique identifier for this dataset.

  • obj_i (int) – Within the dataset, the index of this data wrapper object.

vitessce.export

vitessce.export.export_to_files(config, base_url, out_dir='.')

Export the data referenced by a view config to files in a local directory.

Parameters
  • config (VitessceConfig) – The Vitessce view config to export to files.

  • base_url (str) – The URL on which the files will be served.

  • out_dir (str) – The path to the output directory. By default, the current directory.

Returns

The config as a dict, with URLs filled in.

Return type

dict

vitessce.export.export_to_s3(config, s3, bucket_name, prefix='')

Upload the data referenced by a view config to an S3 bucket.

Parameters
  • config (VitessceConfig) – The Vitessce view config to export to S3.

  • s3 (boto3.resource) – A boto3 S3 resource object with permission to upload to the specified bucket.

  • bucket_name (str) – The name of the bucket to which to upload.

  • prefix (str) – The prefix path for the bucket keys (similar to a subdirectory).

Returns

The config as a dict, with S3 URLs filled in.

Return type

dict

vitessce.data_utils

vitessce.data_utils.ome.multiplex_img_to_ome_tiff(img_arr, channel_names, output_path, axes='CYX')

Convert a multiplexed image to OME-TIFF.

Parameters
  • img_arr (np.array) – The image as a 3D, 4D, or 5D array.

  • channel_names (list[str]) – A list of channel names to include in the OME-TIFF metadata.

  • output_path (str) – The path at which to save the OME-TIFF file.

  • axes (str) – The array axis ordering. By default, “CYX”

vitessce.data_utils.ome.multiplex_img_to_ome_zarr(img_arr, channel_names, output_path, img_name='Image', chunks=(1, 256, 256), axes='cyx', channel_colors=None)

Convert a multiplexed image to OME-Zarr v0.3.

Parameters
  • img_arr (np.array) – The image as a 3D, 4D, or 5D array.

  • channel_names (list[str]) – A list of channel names to include in the omero.channels[].label NGFF metadata field.

  • output_path (str) – The path to save the Zarr store.

  • img_name (str) – The name of the image to include in the omero.name NGFF metadata field.

  • chunks (tuple[int]) – The chunk sizes of each axis. By default, (1, 256, 256).

  • axes (str) – The array axis ordering. By default, “cyx”

  • channel_colors (dict or None) – Dict mapping channel names to color strings to use for the omero.channels[].color NGFF metadata field. If provided, keys should match channel_names. By default, None, in which case “FFFFFF” is used for all channels.
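The channel_names and channel_colors arguments end up in the omero.channels[].label and omero.channels[].color fields. A minimal sketch of assembling those two fields per channel; build_omero_channels is a hypothetical helper, it shows only the two fields named above (real OME-NGFF omero metadata usually carries more, such as rendering windows), and falling back to “FFFFFF” for a channel missing from channel_colors is an assumption of this sketch.

```python
def build_omero_channels(channel_names, channel_colors=None):
    """Sketch of omero.channels entries carrying only label and color.

    Hypothetical helper; not vitessce's implementation.
    """
    colors = channel_colors or {}
    # Assumption: any channel without an explicit color falls back to white.
    return [
        {"label": name, "color": colors.get(name, "FFFFFF")}
        for name in channel_names
    ]


channels = build_omero_channels(["DAPI", "CD45"], {"DAPI": "0000FF"})
print(channels)
```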

vitessce.data_utils.ome.rgb_img_to_ome_tiff(img_arr, output_path, img_name='Image', axes='CYX')

Convert an RGB image to OME-TIFF.

Parameters
  • img_arr (np.array) – The image as a 3D array.

  • output_path (str) – The path at which to save the OME-TIFF file.

  • img_name (str) – The name of the image to include in the omero.name NGFF metadata field.

  • axes (str) – The array axis ordering. By default, “CYX”

vitessce.data_utils.ome.rgb_img_to_ome_zarr(img_arr, output_path, img_name='Image', chunks=(1, 256, 256), axes='cyx', **kwargs)

Convert an RGB image to OME-Zarr v0.3.

Parameters
  • img_arr (np.array) – The image as a 3D array.

  • output_path (str) – The path to save the Zarr store.

  • img_name (str) – The name of the image to include in the omero.name NGFF metadata field.

  • chunks (tuple[int]) – The chunk sizes of each axis. By default, (1, 256, 256).

  • axes (str) – The array axis ordering. By default, “cyx”

vitessce.data_utils.anndata.cast_arr(arr)

Try to cast an array to a dtype that takes up less space.

Parameters

arr (np.array) – The array to cast.

Returns

The new array.

Return type

np.array
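One common way to pick a space-saving dtype is to find the narrowest integer type whose range covers every value. A stdlib-only sketch of that idea, returning NumPy-style dtype names as strings; smallest_int_dtype is hypothetical, it ignores floating-point data, and it is not necessarily the rule vitessce's cast_arr applies.

```python
def smallest_int_dtype(values):
    """Return the narrowest NumPy-style integer dtype name that holds all values.

    Illustrative only; unsigned types are tried first for non-negative data.
    """
    lo, hi = min(values), max(values)
    candidates = [
        ("uint8", 0, 2**8 - 1),
        ("uint16", 0, 2**16 - 1),
        ("uint32", 0, 2**32 - 1),
        ("int8", -(2**7), 2**7 - 1),
        ("int16", -(2**15), 2**15 - 1),
        ("int32", -(2**31), 2**31 - 1),
    ]
    for name, dtype_min, dtype_max in candidates:
        if dtype_min <= lo and hi <= dtype_max:
            return name
    return "int64"


print(smallest_int_dtype([0, 255]))    # fits in one unsigned byte
print(smallest_int_dtype([-5, 100]))   # needs a signed type
```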

vitessce.data_utils.anndata.optimize_adata(adata, obs_cols=None, obsm_keys=None, var_cols=None, varm_keys=None, layer_keys=None, remove_X=False, optimize_X=False, to_dense_X=False, to_sparse_X=False)

Given an AnnData object, optimize for usage with Vitessce and return a new object.

Parameters
  • adata (anndata.AnnData) – The AnnData object to optimize.

  • obs_cols (list[str] or None) – Columns of adata.obs to optimize. Columns not specified will not be included in the returned object.

  • var_cols (list[str] or None) – Columns of adata.var to optimize. Columns not specified will not be included in the returned object.

  • obsm_keys (list[str] or None) – Arrays within adata.obsm to optimize. Keys not specified will not be included in the returned object.

  • varm_keys (list[str] or None) – Arrays within adata.varm to optimize. Keys not specified will not be included in the returned object.

  • layer_keys (list[str] or None) – Arrays within adata.layers to optimize. Keys not specified will not be included in the returned object.

  • remove_X (bool) – Whether to set the X matrix of the returned object to None. By default, False.

  • optimize_X (bool) – Whether to run optimize_arr on adata.X for the returned object. By default, False.

  • to_dense_X (bool) – Whether to cast adata.X to a dense array in the returned object. By default, False.

  • to_sparse_X (bool) – Whether to cast adata.X to a sparse array in the returned object. By default, False.

Returns

The new AnnData object.

Return type

anndata.AnnData

vitessce.data_utils.anndata.optimize_arr(arr)

Try to cast an array to a dtype that takes up less space, and convert to dense.

Parameters

arr (np.array) – The array to cast and convert.

Returns

The new array.

Return type

np.array

vitessce.data_utils.anndata.sort_var_axis(adata_X, orig_var_index, full_var_index=None)

Sort the var index by performing hierarchical clustering.

Parameters
  • adata_X (np.array) – The matrix to use for clustering. For example, adata.X

  • orig_var_index (pandas.Index) – The original var index. For example, adata.var.index

  • full_var_index (pandas.Index or None) – Pass the full adata.var.index to append the var values excluded from sorting, if adata_X and orig_var_index are a subset of the full adata.X matrix. By default, None.

Returns

The sorted elements of the var index.

Return type

list[str]
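A pure-Python sketch of ordering var entries by hierarchical clustering: average-linkage agglomerative clustering on each var's column of values, where every cluster keeps its leaf order and merging two clusters concatenates their orders. The second helper appends the entries left out of clustering in their original order, mirroring what full_var_index is described as doing. Both helpers are hypothetical and only suitable for toy sizes; vitessce's sort_var_axis may use a different linkage method, distance metric, or library such as SciPy.

```python
import math


def cluster_order(profiles):
    """Order var names so that similar profiles end up adjacent.

    profiles maps a var name to its vector of values across observations
    (one column of X). Uses average linkage with Euclidean distance.
    """
    def dist(a, b):
        return math.dist(profiles[a], profiles[b])

    clusters = [[name] for name in profiles]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(dist(a, b) for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
        clusters[i] = merged
    return clusters[0] if clusters else []


def sort_with_remainder(sorted_subset, full_index):
    """Append entries of full_index that were excluded from sorting."""
    seen = set(sorted_subset)
    return list(sorted_subset) + [v for v in full_index if v not in seen]


profiles = {
    "g1": [1.0, 0.0, 0.0],
    "g2": [0.0, 5.0, 5.0],
    "g3": [1.1, 0.0, 0.0],
    "g4": [0.0, 5.0, 5.1],
}
order = cluster_order(profiles)
print(order)
print(sort_with_remainder(order, ["g1", "g2", "g3", "g4", "g5"]))
```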

vitessce.data_utils.anndata.to_dense(arr)

Convert a sparse array to dense.

Parameters

arr (np.array or sparse matrix) – The array to convert.

Returns

The converted array (or the original array if it was already dense).

Return type

np.array
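For intuition about what the sparse-to-dense conversion does, here is a stdlib-only sketch for a matrix stored in compressed sparse row (CSR) layout, the layout used by SciPy's csr_matrix. csr_to_dense is a hypothetical helper; in practice a SciPy sparse matrix exposes toarray() for this, and vitessce's to_dense may be implemented differently.

```python
def csr_to_dense(data, indices, indptr, n_cols):
    """Expand a CSR matrix (data, indices, indptr) into a dense list of rows."""
    dense = []
    for row in range(len(indptr) - 1):
        values = [0] * n_cols
        # Non-zero entries for this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            values[indices[k]] = data[k]
        dense.append(values)
    return dense


# The 2 x 3 matrix [[1, 0, 2], [0, 0, 3]] in CSR form.
print(csr_to_dense([1, 2, 3], [0, 2, 2], [0, 2, 3], 3))
```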

vitessce.data_utils.anndata.to_diamond(x, y, r)

Convert an (x, y) coordinate to a polygon (diamond) with a given radius.

Parameters
  • x (int or float) – The x coordinate.

  • y (int or float) – The y coordinate.

  • r (int or float) – The radius.

Returns

The polygon vertices as an array of coordinate pairs, like [[x1, y1], [x2, y2], …]

Return type

np.array
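The diamond geometry can be sketched in a few lines: with the radius taken as the distance from the center to each vertex, the four vertices sit directly above, right of, below, and left of (x, y). The helper diamond is hypothetical; it returns plain lists rather than an np.array, and the vertex order is an assumption that may differ from vitessce's to_diamond.

```python
def diamond(x, y, r):
    """Return the four vertices of a diamond centered at (x, y) with radius r."""
    # Top, right, bottom, left (assumed order).
    return [[x, y + r], [x + r, y], [x, y - r], [x - r, y]]


print(diamond(10, 20, 2))
```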

vitessce.data_utils.anndata.to_memory(arr)

Try to load a backed AnnData array into memory.

Parameters

arr (np.array) – The array to load.

Returns

The loaded array.

Return type

np.array

vitessce.data_utils.anndata.to_uint8(arr, norm_along=None)

Convert an array to uint8 dtype.

Parameters
  • arr (np.array) – The array to convert.

  • norm_along (str or None) – How to normalize the array values. By default, None. Valid values are “global”, “var”, “obs”.

Returns

The converted array.

Return type

np.array
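A minimal sketch of what a “global” normalization to the uint8 range could look like, assuming plain min-max scaling over all values followed by truncation to integers in 0..255; “var” or “obs” would presumably apply the same scaling per column or per row. scale_to_uint8 is hypothetical, and both the scaling formula and the all-zeros result for constant input are assumptions, not vitessce's documented behavior.

```python
def scale_to_uint8(values):
    """Linearly map values onto 0..255 using the global min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: constant input maps to all zeros.
        return [0 for _ in values]
    return [int((v - lo) / (hi - lo) * 255) for v in values]


print(scale_to_uint8([2.0, 4.0, 6.0]))
```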