{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "source": [
    "# Vitessce Data Preparation Tutorial"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Convert data manually"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When running the Vitessce widget, data is converted on-the-fly into formats that the Vitessce JavaScript component can render.\n",
    "The converted data is stored in temporary files, preventing long-term use of the converted files.\n",
    "\n",
    "However, the data conversion utilities used by the widget are exposed so that their outputs can be saved to regular files.\n",
    "This allows the files to be saved for future use with the Vitessce web application by serving the files locally or moving the files onto an object storage system such as AWS S3 (for long-term storage and data sharing).\n",
    "\n",
    "This notebook demonstrates how to save the processed outputs of the `AnnDataWrapper` and `SnapWrapper` classes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from vitessce import SnapWrapper\n",
    "from os.path import join\n",
    "from scipy.io import mmread\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mtx = mmread(join('data', 'snapatac', 'filtered_cell_by_bin.mtx'))\n",
    "barcodes_df = pd.read_csv(join('data', 'snapatac', 'barcodes.txt'), header=None)\n",
    "bins_df = pd.read_csv(join('data', 'snapatac', 'bins.txt'), header=None)\n",
    "clusters_df = pd.read_csv(join('data', 'snapatac', 'umap_coords_clusters.csv'), index_col=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "zarr_filepath = join('data', 'snapatac', 'out.snap.multires.zarr')\n",
    "\n",
    "w = SnapWrapper(mtx, barcodes_df, bins_df, clusters_df)\n",
    "\n",
    "cells_json = w.create_cells_json()\n",
    "cell_sets_json = w.create_cell_sets_json()\n",
    "\n",
    "with open(join('data', 'snapatac', 'out.cells.json'), 'w') as f:\n",
    "    json.dump(cells_json, f)\n",
    "\n",
    "with open(join('data', 'snapatac', 'out.cell-sets.json'), 'w') as f:\n",
    "    json.dump(cell_sets_json, f)\n",
    "\n",
    "\n",
    "w.create_genomic_multivec_zarr(zarr_filepath)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}