{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "# Vitessce Widget Tutorial"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Visualization of genomic profiles"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Import dependencies\n",
    "\n",
    "We import the classes and functions used in this notebook from their respective packages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from vitessce import (\n",
    "    VitessceConfig,\n",
    "    ViewType as vt,\n",
    "    CoordinationType as ct,\n",
    "    AnnDataWrapper,\n",
    "    MultivecZarrWrapper,\n",
    ")\n",
    "from vitessce.data_utils import (\n",
    "    adata_to_multivec_zarr,\n",
    ")\n",
    "from os.path import join\n",
    "from scipy.io import mmread\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from anndata import AnnData"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Load the data\n",
    "\n",
    "In this step, we load the raw data, which was downloaded from the HuBMAP portal: https://portal.hubmapconsortium.org/browse/dataset/210d118a14c8624b6bb9610a9062656e"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mtx = mmread(join('data', 'snapatac', 'filtered_cell_by_bin.mtx')).toarray()\n",
    "barcodes_df = pd.read_csv(join('data', 'snapatac', 'barcodes.txt'), header=None)\n",
    "bins_df = pd.read_csv(join('data', 'snapatac', 'bins.txt'), header=None, names=[\"interval\"])\n",
    "clusters_df = pd.read_csv(join('data', 'snapatac', 'umap_coords_clusters.csv'), index_col=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Convert the data to Vitessce-compatible formats\n",
    "\n",
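    "Vitessce can efficiently load AnnData objects saved in the Zarr format. Here we build an AnnData object from the loaded files, then save the genomic profiles to a multivec-Zarr store and the AnnData object to an AnnData-Zarr store."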
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The genome assembly is GRCh38, but the chromosome names in the bin names do not start with the \"chr\" prefix.\n",
    "# This is incompatible with the chromosome names from `negspy`, so we need to prepend the prefix.\n",
    "bins_df[\"interval\"] = bins_df[\"interval\"].apply(lambda x: \"chr\" + x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
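    "# Copy the column so that the assignment below does not modify a view of `clusters_df`.\n",
    "obs = clusters_df[[\"cluster\"]].copy()\n",
    "obs[\"cluster\"] = obs[\"cluster\"].astype(str)\n",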
    "obsm = { \"X_umap\": clusters_df[[\"umap.1\", \"umap.2\"]].values }\n",
    "adata = AnnData(X=mtx, obs=obs, var=bins_df, obsm=obsm)\n",
    "adata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "multivec_zarr_path = join(\"data\", \"HBM485.TBWH.322.multivec.zarr\")\n",
    "adata_zarr_path = join(\"data\", \"HBM485.TBWH.322.adata.zarr\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sort cluster IDs numerically (they are stored as strings).\n",
    "cluster_ids = obs[\"cluster\"].unique().tolist()\n",
    "cluster_ids.sort(key=int)\n",
    "# Save genomic profiles to multivec-zarr format.\n",
    "adata_to_multivec_zarr(adata, multivec_zarr_path, obs_set_col=\"cluster\", obs_set_name=\"Cluster\", obs_set_vals=cluster_ids)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save anndata object to AnnData-Zarr format.\n",
    "adata.write_zarr(adata_zarr_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 4. Make a Vitessce configuration\n",
    "\n",
    "We need to tell Vitessce about the data that we want to load and the visualization components that we want to include in the widget.\n",
    "For this dataset, we want to add the `GENOMIC_PROFILES` component, which renders genome browser tracks with [HiGlass](http://higlass.io)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vc = VitessceConfig(schema_version=\"1.0.15\", name='HuBMAP snATAC-seq')\n",
    "dataset = vc.add_dataset(name='HBM485.TBWH.322').add_object(MultivecZarrWrapper(\n",
    "    zarr_path=multivec_zarr_path\n",
    ")).add_object(AnnDataWrapper(\n",
    "    adata_path=adata_zarr_path,\n",
    "    obs_embedding_paths=[\"obsm/X_umap\"],\n",
    "    obs_embedding_names=[\"UMAP\"],\n",
    "    obs_set_paths=[\"obs/cluster\"],\n",
    "    obs_set_names=[\"Cluster\"],\n",
    "))\n",
    "\n",
    "genomic_profiles = vc.add_view(vt.GENOMIC_PROFILES, dataset=dataset)\n",
    "scatter = vc.add_view(vt.SCATTERPLOT, dataset=dataset, mapping=\"UMAP\")\n",
    "cell_sets = vc.add_view(vt.OBS_SETS, dataset=dataset)\n",
    "\n",
    "vc.layout(genomic_profiles / (scatter | cell_sets));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Create the widget"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vw = vc.widget(height=800)\n",
    "vw"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}