{ "cells": [ { "cell_type": "markdown", "metadata": { "nbsphinx": "hidden" }, "source": [ "# Vitessce Data Preparation Tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Export data to AWS S3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Import dependencies\n", "\n", "Import the classes and functions used in this notebook from their packages." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import boto3\n", "import json\n", "from urllib.parse import quote_plus\n", "from os.path import join, isfile, isdir\n", "from urllib.request import urlretrieve\n", "from anndata import read_h5ad\n", "import scanpy as sc\n", "\n", "from vitessce import (\n", "    VitessceWidget,\n", "    VitessceConfig,\n", "    Component as cm,\n", "    CoordinationType as ct,\n", "    AnnDataWrapper,\n", ")\n", "from vitessce.data_utils import (\n", "    optimize_adata,\n", "    VAR_CHUNK_SIZE,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Download and process data\n", "\n", "For this example, we download the `habib17` dataset from the [COVID-19 Cell Atlas](https://www.covid19cellatlas.org/index.healthy.html#habib17)."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "adata_filepath = join(\"data\", \"habib17.processed.h5ad\")\n", "if not isfile(adata_filepath):\n", "    os.makedirs(\"data\", exist_ok=True)\n", "    urlretrieve('https://covid19.cog.sanger.ac.uk/habib17.processed.h5ad', adata_filepath)\n", "\n", "adata = read_h5ad(adata_filepath)\n", "# The cutoff is the 51st-largest normalized dispersion, so the strict\n", "# comparison below flags the genes ranked above it (the top 50, barring ties).\n", "# .iloc makes the positional lookups explicit, since var has a label index.\n", "top_dispersion = adata.var[\"dispersions_norm\"].iloc[\n", "    sorted(\n", "        range(len(adata.var[\"dispersions_norm\"])),\n", "        key=lambda k: adata.var[\"dispersions_norm\"].iloc[k],\n", "    )[-51:][0]\n", "]\n", "adata.var[\"top_highly_variable\"] = (\n", "    adata.var[\"dispersions_norm\"] > top_dispersion\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "zarr_filepath = join(\"data\", \"habib17.processed.zarr\")\n", "if not isdir(zarr_filepath):\n", "    # Keep only the columns the visualization uses.\n", "    adata = optimize_adata(\n", "        adata,\n", "        obs_cols=[\"CellType\"],\n", "        obsm_keys=[\"X_umap\"],\n", "        var_cols=[\"top_highly_variable\"],\n", "        optimize_X=True,\n", "    )\n", "    # Each chunk spans all cells and VAR_CHUNK_SIZE genes.\n", "    adata.write_zarr(zarr_filepath, chunks=[adata.shape[0], VAR_CHUNK_SIZE])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Create the Vitessce configuration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set up the configuration by adding the views and datasets of interest."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vc = VitessceConfig(schema_version=\"1.0.15\", name='Habib et al', description='COVID-19 Healthy Donor Brain')\n", "dataset = vc.add_dataset(name='Brain').add_object(AnnDataWrapper(\n", "    adata_path=zarr_filepath,\n", "    obs_embedding_paths=[\"obsm/X_umap\"],\n", "    obs_embedding_names=[\"UMAP\"],\n", "    obs_set_paths=[\"obs/CellType\"],\n", "    obs_set_names=[\"Cell Type\"],\n", "    obs_feature_matrix_path=\"X\",\n", "    feature_filter_path=\"var/top_highly_variable\"\n", "))\n", "scatterplot = vc.add_view(cm.SCATTERPLOT, dataset=dataset, mapping=\"UMAP\")\n", "cell_sets = vc.add_view(cm.OBS_SETS, dataset=dataset)\n", "genes = vc.add_view(cm.FEATURE_LIST, dataset=dataset)\n", "heatmap = vc.add_view(cm.HEATMAP, dataset=dataset)\n", "vc.layout((scatterplot | (cell_sets / genes)) / heatmap);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Create a `boto3` resource with S3 credentials" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s3 = boto3.resource(\n", "    service_name='s3',\n", "    aws_access_key_id=os.environ['VITESSCE_S3_ACCESS_KEY_ID'],\n", "    aws_secret_access_key=os.environ['VITESSCE_S3_SECRET_ACCESS_KEY'],\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Upload files to S3\n", "\n", "The `.export(to='S3')` method on the view config instance will upload all data objects to the specified bucket. Then, the processed view config will be returned as a `dict`, with the file URLs filled in, pointing to the S3 bucket files. For more information about configuring the S3 bucket so that files are accessible over the internet, visit the \"Hosting Data\" page of our core documentation site."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "config_dict = vc.export(to='S3', s3=s3, bucket_name='vitessce-export-examples', prefix='test')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. View on vitessce.io\n", "\n", "The returned view config dict can be encoded into a URL, which you can share with colleagues to give them access to the interactive visualization." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import webbrowser\n", "\n", "# Embed the JSON config in the URL as a percent-encoded data URI.\n", "vitessce_url = \"https://vitessce.io/?url=data:,\" + quote_plus(json.dumps(config_dict))\n", "webbrowser.open(vitessce_url)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.0" } }, "nbformat": 4, "nbformat_minor": 4 }