Trend Analysis in Cell Differentiation¶
In this tutorial, we’ll provide a guide on how to use the Mellon library in conjunction with Palantir to analyze trends in cell differentiation trajectories. This will involve loading a scRNA-seq dataset, selecting cell differentiation branches, calculating gene trends along these branches, and finally, visualizing these trends.
Please refer to the Density estimator single-cell analysis for set up and data download instructions
Firstly, let’s import the necessary libraries:
[1]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import palantir
import mellon
import scanpy as sc
import warnings
from numba.core.errors import NumbaDeprecationWarning
We’ll enable inline plotting for the notebook and suppress the NumbaDeprecationWarning for cleaner output:
[2]:
%matplotlib inline
warnings.simplefilter("ignore", category=NumbaDeprecationWarning)
Step 1: Reading and Displaying the Dataset¶
We start by loading the scRNA-seq dataset. For this demonstration, we will use a publicly available dataset of T-cell depleted bone marrow:
[3]:
ad_url = "https://fh-pi-setty-m-eco-public.s3.amazonaws.com/mellon-tutorial/preprocessed_t-cell-depleted-bm-rna.h5ad"
ad = sc.read("data/preprocessed_t-cell-depleted-bm-rna.h5ad", backup_url=ad_url)
ad
[3]:
AnnData object with n_obs × n_vars = 8627 × 17226
obs: 'sample', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'batch', 'DoubletScores', 'n_counts', 'leiden', 'phenograph', 'log_n_counts', 'celltype', 'palantir_pseudotime', 'selection', 'NaiveB_lineage', 'mellon_log_density', 'mellon_log_density_clipped'
var: 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'PeakCounts'
uns: 'DMEigenValues', 'DM_EigenValues', 'NaiveB_lineage_colors', 'celltype_colors', 'custom_branch_mask_columns', 'hvg', 'leiden', 'mellon_log_density_predictor', 'neighbors', 'pca', 'sample_colors', 'umap'
obsm: 'DM_EigenVectors', 'X_FDL', 'X_pca', 'X_umap', 'branch_masks', 'chromVAR_deviations', 'palantir_branch_probs', 'palantir_fate_probabilities', 'palantir_lineage_cells'
varm: 'PCs', 'geneXTF'
layers: 'Bcells_lineage_specific', 'Bcells_primed', 'MAGIC_imputed_data'
obsp: 'DM_Kernel', 'DM_Similarity', 'connectivities', 'distances', 'knn'
Note: The annData object ad
we loaded already has been processed from raw gene counts according to the following notebook, and comes with cell-type annotations, PCA, Mellon densities, and UMAP representation. The anndata object also contains primed and lineage specific accessibility scores.
Step 2: Branch selection¶
To evaluate density along a differentiation branch, we need to select all cells that we consider to represent different states along this branch.
For manual selection, we write a pandas DataFrame with boolean values to a ad.obsm
array and name the columns according to the selected branches:
ad.obsm["custom_branch_mask"] = pd.DataFrame(
{"NaiveB": ad.obs["NaiveB_lineage"] == "True"}
)
Lineage cells used in the Mellon manuscript are available at ad.obsm["palantir_lineage_cells"]
We can inspect this selection using Palantir:
[4]:
palantir.plot.plot_branch_selection(ad, masks_key="palantir_lineage_cells", s=1)
plt.show()
Alternatively, to automate such a selection, we can use the Palantir fate probabilities and the select_branch_cells
tool. Palantir stores these results in ad.obsm["branch_masks"]
by default. See Palantir tutorial for more details.
Step 3: Calculate gene trends¶
Next, we’ll use Mellon’s FunctionEstimator to compute the trend of gene expressions throughout the differentiation trajectories. This requires the cell selection, log-transformed gene expressions of the selected cells, pseudot imes of the selected cells, and a grid of pseudotime values at which we want to evaluate the gene trends:
[5]:
selection = ad.obsm["palantir_lineage_cells"]["NaiveB"]
pseudotimes = ad[selection, :].obs["palantir_pseudotime"]
expressions = ad[selection, :].X
pseudotim_grid = np.linspace(pseudotimes.min(), pseudotimes.max(), 200)
Gene trends are unique to each gene (ad.var
) and each differentiation branch. To store these trends within the AnnData structure, we write a DataFrame to an ad.varm
array. Additionally, to link these trends to the specific pseudotimes at which they’ve been evaluated, we store the pseudotimes in ad.uns["<trend key>_pseudotime"]
. This enables the Palantir library to access these results for plotting purposes:
[6]:
%%time
model = mellon.FunctionEstimator(ls=5, sigma=1)
trends = model.fit_predict(pseudotimes, expressions, pseudotim_grid)
ad.varm["custom_mellon_trends_NaiveB"] = trends.T
ad.uns["custom_mellon_trends_NaiveB_pseudotime"] = pseudotim_grid
[2023-08-14 14:43:16,870] [INFO ] Using covariance function Matern52(ls=5.0).
CPU times: user 6.52 s, sys: 4.22 s, total: 10.7 s
Wall time: 2.04 s
Palantir also offers a wrapper for the `mellon.FunctionEstimator
<https://mellon.readthedocs.io/en/uncertainty/model.html#mellon.model.FunctionEstimator>`__ to run the trend computation on all branches simultaneously and store them in ad.varm["gene_trends_<fate>"]
:
[7]:
%%time
trends = palantir.presults.compute_gene_trends(
ad, masks_key="palantir_lineage_cells"
)
NaiveB
[2023-08-14 14:43:19,169] [INFO ] Using covariance function Matern52(ls=1.0).
Ery
[2023-08-14 14:43:20,712] [INFO ] Using covariance function Matern52(ls=1.0).
pDC
[2023-08-14 14:43:21,562] [INFO ] Using covariance function Matern52(ls=1.0).
Mono
[2023-08-14 14:43:22,664] [INFO ] Using covariance function Matern52(ls=1.0).
CPU times: user 27.5 s, sys: 20.6 s, total: 48.1 s
Wall time: 5.11 s
Consider setting the length scale parameter ls
to tailor the smoothness of the inferred trends over pseudotime to match the variability within your dataset.
Step 4: Plotting Trends¶
Finally, we can use Palantir’s plotting functions to conveniently inspect the results:
[8]:
palantir.plot.plot_trend(ad, "NaiveB", "EBF1", masks_key="palantir_lineage_cells")
plt.show()
We can customize the colorings of the cells based on different genes or other cellular characteristics stored in the ad.obs
dataframe.
[9]:
palantir.plot.plot_trend(
ad, "NaiveB", "EBF1", masks_key="palantir_lineage_cells", color="celltype"
)
plt.show()
In addition to customizing the cell colors, we can also modify the y-position of the cells in the scatter plot, allowing for visualization based on different genes or other cellular characteristics stored in the ad.obs
dataframe. For we can plot the cells according to the Mellon log-density:
[10]:
palantir.plot.plot_trend(
ad,
"NaiveB",
"EBF1",
masks_key="palantir_lineage_cells",
color="celltype",
position="mellon_log_density",
)
plt.show()
Another insightful characteristic is the local variability of genes at different stages of the differentiation trajectory. These variabilities can be computed using Palantir and are stored in ad.layers["local_variability"]
. To visualize this variability alongside the gene expression trends, we need to specify a layer from which this information will be retrieved for the gene:
[11]:
palantir.utils.run_local_variability(ad)
palantir.plot.plot_trend(
ad,
"NaiveB",
"EBF1",
masks_key="palantir_lineage_cells",
color="EBF1",
position="mellon_log_density",
color_layer="local_variability",
)
plt.show()
This tutorial has demonstrated how to use Mellon and Palantir to perform trend analysis in cell differentiation trajectories. This includes selecting branches, calculating gene trends along these branches, and visualizing the results.