Trend Analysis in Cell Differentiation

In this tutorial, we show how to use the Mellon library together with Palantir to analyze gene expression trends along cell differentiation trajectories. We will load a scRNA-seq dataset, select cell differentiation branches, compute gene trends along these branches, and visualize the results.

Please refer to the Density estimator single-cell analysis tutorial for setup and data download instructions.

Firstly, let’s import the necessary libraries:

[1]:
import numpy as np
import pandas as pd

import matplotlib
import matplotlib.pyplot as plt

import palantir
import mellon
import scanpy as sc

import warnings
from numba.core.errors import NumbaDeprecationWarning

We’ll enable inline plotting for the notebook and suppress the NumbaDeprecationWarning for cleaner output:

[2]:
%matplotlib inline
warnings.simplefilter("ignore", category=NumbaDeprecationWarning)
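To illustrate what a category-specific filter does, here is a minimal sketch that uses only the standard library. `FakeDeprecationWarning` is a hypothetical stand-in for `NumbaDeprecationWarning`, so the example runs without numba installed. It shows that only the named category is silenced while other warnings still get through:

```python
import warnings


class FakeDeprecationWarning(Warning):
    """Hypothetical stand-in for NumbaDeprecationWarning."""


def emit_warnings():
    # Emit one warning of the silenced category and one of another category.
    warnings.warn("old API", FakeDeprecationWarning)
    warnings.warn("something else", UserWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning by default
    # Filters added later take precedence, so this one wins for its category.
    warnings.simplefilter("ignore", category=FakeDeprecationWarning)
    emit_warnings()

caught_categories = [w.category.__name__ for w in caught]
print(caught_categories)
```

Filtering by category rather than ignoring all warnings keeps unrelated problems visible during the analysis.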

Step 1: Reading and Displaying the Dataset

We start by loading the scRNA-seq dataset. For this demonstration, we will use a publicly available dataset of T-cell depleted bone marrow:

[3]:
ad_url = "https://fh-pi-setty-m-eco-public.s3.amazonaws.com/mellon-tutorial/preprocessed_t-cell-depleted-bm-rna.h5ad"
ad = sc.read("data/preprocessed_t-cell-depleted-bm-rna.h5ad", backup_url=ad_url)
ad
[3]:
AnnData object with n_obs × n_vars = 8627 × 17226
    obs: 'sample', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'batch', 'DoubletScores', 'n_counts', 'leiden', 'phenograph', 'log_n_counts', 'celltype', 'palantir_pseudotime', 'selection', 'NaiveB_lineage', 'mellon_log_density', 'mellon_log_density_clipped'
    var: 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'PeakCounts'
    uns: 'DMEigenValues', 'DM_EigenValues', 'NaiveB_lineage_colors', 'celltype_colors', 'custom_branch_mask_columns', 'hvg', 'leiden', 'mellon_log_density_predictor', 'neighbors', 'pca', 'sample_colors', 'umap'
    obsm: 'DM_EigenVectors', 'X_FDL', 'X_pca', 'X_umap', 'branch_masks', 'chromVAR_deviations', 'palantir_branch_probs', 'palantir_fate_probabilities', 'palantir_lineage_cells'
    varm: 'PCs', 'geneXTF'
    layers: 'Bcells_lineage_specific', 'Bcells_primed', 'MAGIC_imputed_data'
    obsp: 'DM_Kernel', 'DM_Similarity', 'connectivities', 'distances', 'knn'

Note: The AnnData object ad we loaded has already been processed from raw gene counts in a separate preprocessing notebook. It comes with cell-type annotations, PCA, Mellon densities, and a UMAP representation, and it also contains primed and lineage-specific accessibility scores.

Step 2: Branch Selection

To evaluate density along a differentiation branch, we need to select all cells that we consider to represent different states along this branch.

For manual selection, we write a pandas DataFrame of boolean values to ad.obsm and name its columns after the selected branches:

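# Cast to str so the comparison works whether the column stores booleans
# or the strings "True"/"False".
ad.obsm["custom_branch_mask"] = pd.DataFrame(
    {"NaiveB": ad.obs["NaiveB_lineage"].astype(str) == "True"}
)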
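The same pattern extends to several branches at once. The following is a small sketch on a toy table standing in for ad.obs; the cell-type labels, the branch membership lists, and the variable names are made up for illustration and are not taken from the dataset. It checks the two properties the mask needs: boolean columns and an index that matches the cell names.

```python
import pandas as pd

# Toy stand-in for ad.obs; all values are invented for illustration.
obs = pd.DataFrame(
    {"celltype": ["HSC", "proB", "NaiveB", "Mono", "NaiveB"]},
    index=[f"cell_{i}" for i in range(5)],
)

# Hypothetical branch definitions: which cell types belong to which branch.
b_lineage = ["HSC", "proB", "NaiveB"]
mono_lineage = ["HSC", "Mono"]

# One boolean column per branch, indexed by cell name like ad.obs.
branch_mask = pd.DataFrame(
    {
        "NaiveB": obs["celltype"].isin(b_lineage),
        "Mono": obs["celltype"].isin(mono_lineage),
    }
)

# Each column must be boolean, and the index must match the cell names.
assert (branch_mask.dtypes == bool).all()
assert branch_mask.index.equals(obs.index)

# Number of selected cells per branch.
cells_per_branch = branch_mask.sum()
print(cells_per_branch)
```

A cell may belong to more than one branch (here the toy "HSC" cell is in both), which is expected for early progenitors shared between lineages.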

The lineage cells used in the Mellon manuscript are available in ad.obsm["palantir_lineage_cells"].

We can inspect this selection using Palantir:

[4]:
palantir.plot.plot_branch_selection(ad, masks_key="palantir_lineage_cells", s=1)
plt.show()
[Figure: branch selection plot for the masks in ad.obsm["palantir_lineage_cells"]]

Alternatively, to automate the selection, we can use the Palantir fate probabilities and the select_branch_cells tool. Palantir stores these results in ad.obsm["branch_masks"] by default. See the Palantir tutorial for more details.
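To give an intuition for how fate probabilities can be turned into branch masks, here is a deliberately simplified sketch on toy data. It is not Palantir's algorithm: it applies a fixed probability threshold and keeps all early cells in every branch, whereas select_branch_cells uses its own selection rule. The helper `simple_branch_masks`, its parameters `early` and `min_prob`, and all numbers are hypothetical.

```python
import pandas as pd

# Toy fate probabilities (rows: cells, columns: terminal fates); values are invented.
fate_probs = pd.DataFrame(
    {
        "NaiveB": [0.50, 0.90, 0.98, 0.05],
        "Mono": [0.50, 0.10, 0.02, 0.95],
    },
    index=["cell_0", "cell_1", "cell_2", "cell_3"],
)
# Toy pseudotime, aligned to the same cells.
pseudotime = pd.Series([0.0, 0.5, 1.0, 1.0], index=fate_probs.index)


def simple_branch_masks(fate_probs, pseudotime, early=0.1, min_prob=0.8):
    """Hypothetical helper: early cells join every branch; later cells
    join a branch only if their fate probability reaches min_prob."""
    is_early = pseudotime.reindex(fate_probs.index) <= early
    committed = fate_probs >= min_prob
    # Broadcast the per-cell "early" flag across all branch columns.
    values = committed.to_numpy() | is_early.to_numpy()[:, None]
    return pd.DataFrame(values, index=fate_probs.index, columns=fate_probs.columns)


masks = simple_branch_masks(fate_probs, pseudotime)
print(masks)
```

The resulting DataFrame has the same layout as the manual masks above (one boolean column per branch), so it could be stored in ad.obsm in the same way.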