This function performs dimensionality reduction via PCA, constructs a k-nearest neighbor graph, and applies the Leiden community detection algorithm to identify cell populations. It can handle both standard copy number data and haplotype-specific copy number (HSCN) data.
leiden_clustering(
CNbins,
field = "copy",
n_pcs = 50,
k = 15,
resolution = 0.7,
z_clip = 10,
seed = NULL,
hscn = FALSE,
objective_function = "modularity",
tree_type = "centroid"
)

Arguments:

- `CNbins`: A data frame containing copy number data. Must include a `cell_id` column and the column named by `field`.
- `field`: Character. The column name in `CNbins` to use for copy number values. Default is "copy".
- `n_pcs`: Integer. The number of principal components to compute. Default is 50.
- `k`: Integer. The number of nearest neighbors used to build the graph. Default is 15.
- `resolution`: Numeric. Resolution parameter for the Leiden algorithm; higher values produce more clusters. Default is 0.7.
- `z_clip`: Numeric. Maximum absolute z-score used to clip the scaled data. Default is 10.
- `seed`: Integer or NULL. Random seed for reproducibility. Default is NULL.
- `hscn`: Logical. Whether to use haplotype-specific copy number data. Default is FALSE.
- `objective_function`: Character. Leiden objective function, either "modularity" or "CPM". Default is "modularity".
- `tree_type`: Character. Type of phylogenetic tree to generate: "centroid" (flat clusters) or "cell" (hierarchical within clusters). Default is "centroid".
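For reference, the two objective functions accepted by `objective_function` are usually written in the standard forms below. These come from the community detection literature; the exact normalization used by the underlying Leiden implementation may differ. Here $A_{ij}$ is the adjacency matrix of the kNN graph, $k_i$ the degree of cell $i$, $m$ the number of edges, $c_i$ the cluster of cell $i$, $\gamma$ the `resolution`, and, for a cluster $c$, $e_c$ its number of internal edges and $n_c$ its number of cells.

```latex
Q_{\mathrm{modularity}} = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \gamma \, \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)

H_{\mathrm{CPM}} = \sum_{c} \left( e_c - \gamma \binom{n_c}{2} \right)
```

In both forms, a larger $\gamma$ increases the penalty on large clusters. This is why higher `resolution` values tend to yield more, smaller clusters.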
Value:

A list containing:

- A data frame with `cell_id` and `clone_id` columns (cluster assignments).
- The igraph communities object produced by Leiden clustering.
- The `prcomp` object from the PCA.
- A phylogenetic tree object (cluster-level or cell-level, depending on `tree_type`).
Inspired by community detection approaches developed by Sohrab Salehi. TODO: Add reference to Salehi et al. paper on community detection in single-cell genomics.
Details:

The function performs the following steps:

1. Creates a copy number matrix from the input data.
2. Applies z-score standardization, clipping values to the range [-`z_clip`, `z_clip`] to limit the influence of outliers.
3. Performs PCA, computing `n_pcs` principal components.
4. Constructs a symmetric `k`-nearest neighbor graph in PCA space.
5. Applies the Leiden community detection algorithm, controlled by `resolution` and `objective_function`.
6. Generates a phylogenetic tree (either from cluster centroids or as a full cell hierarchy, depending on `tree_type`).
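The steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation. It makes the following assumptions:

- The input is a cells × bins numeric matrix with cell IDs as row names.
- Zero-variance bins are set to 0 after scaling.
- The kNN graph is symmetrized by union, so an edge exists if either cell lists the other as a neighbor.
- `k` is capped at the number of cells minus one.

The helpers `zscore_clip`, `knn_adjacency`, and `sketch_graph` are hypothetical. The Leiden step runs only if the igraph package is installed.

```r
# Minimal sketch of steps 2-5 (hypothetical helpers, not the package's code).
# Assumes `mat` is a cells x bins numeric matrix with cell IDs as row names.

zscore_clip <- function(mat, z_clip = 10) {
  z <- scale(mat)                        # z-score each bin (column)
  attr(z, "scaled:center") <- NULL       # drop scale() metadata so later
  attr(z, "scaled:scale") <- NULL        # functions see a plain matrix
  z[is.na(z)] <- 0                       # zero-variance bins give NaN; set to 0
  pmin(pmax(z, -z_clip), z_clip)         # clip to [-z_clip, z_clip]
}

knn_adjacency <- function(pcs, k) {
  n <- nrow(pcs)
  k <- min(k, n - 1)                     # assumed adjustment when cells are few
  d <- as.matrix(dist(pcs))              # Euclidean distances in PCA space
  diag(d) <- Inf                         # a cell is never its own neighbor
  adj <- matrix(0, n, n, dimnames = list(rownames(pcs), rownames(pcs)))
  for (i in seq_len(n)) {
    adj[i, order(d[i, ])[seq_len(k)]] <- 1   # mark the k closest cells
  }
  pmax(adj, t(adj))                      # union symmetrization (assumption)
}

sketch_graph <- function(mat, n_pcs = 50, k = 15, z_clip = 10) {
  z <- zscore_clip(mat, z_clip)
  n_pcs <- min(n_pcs, nrow(z) - 1, ncol(z))  # centered data has rank <= n - 1
  pca <- prcomp(z, center = FALSE, scale. = FALSE, rank. = n_pcs)
  list(pca = pca, adj = knn_adjacency(pca$x, k))
}

# Toy data: 20 cells x 30 bins drawn from two copy number regimes
set.seed(1)                              # also fixes the stochastic Leiden step
mat <- rbind(matrix(rpois(300, 2), nrow = 10), matrix(rpois(300, 4), nrow = 10))
rownames(mat) <- paste0("cell", 1:20)
res <- sketch_graph(mat, n_pcs = 50, k = 5)

if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(res$adj, mode = "undirected")
  comm <- igraph::cluster_leiden(g, objective_function = "modularity",
                                 resolution = 0.7)
  print(table(igraph::membership(comm)))  # cluster sizes
}
```

Union symmetrization is one common choice. Keeping only mutual neighbors, for example with `adj * t(adj)`, gives a sparser graph.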
If `hscn` is TRUE, the function expects `copy` and `BAF` columns in `CNbins` and creates separate matrices for the A and B alleles.
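The allele split can be illustrated with a small sketch. The documentation does not say how the A and B matrices are derived, so this example makes two assumptions:

- BAF is the B-allele fraction, B / (A + B). Under that assumption, B = `copy` × `BAF` and A = `copy` - B.
- Bins are identified by hypothetical `chr` and `start` columns.

The helper `hscn_matrices` is also hypothetical.

```r
# Hypothetical sketch: allele-specific matrices from total copy number and BAF.
# Assumes BAF = B / (A + B) and that bins are keyed by `chr` and `start`
# (both assumptions, not confirmed by the documentation).
hscn_matrices <- function(CNbins) {
  B <- CNbins$copy * CNbins$BAF          # B-allele copies
  A <- CNbins$copy - B                   # A-allele copies
  bin <- paste(CNbins$chr, CNbins$start, sep = "_")
  # Reshape long (cell, bin) values into a cells x bins matrix;
  # mean() assumes one value per cell and bin
  to_matrix <- function(v) tapply(v, list(CNbins$cell_id, bin), mean)
  list(A = to_matrix(A), B = to_matrix(B))
}

CNbins <- data.frame(
  cell_id = rep(c("c1", "c2"), each = 2),
  chr     = "1",
  start   = rep(c(1, 500001), times = 2),
  copy    = c(2, 3, 4, 2),
  BAF     = c(0.5, 1 / 3, 0.25, 0)
)
m <- hscn_matrices(CNbins)
m$A
m$B
```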
The function automatically adjusts `k` when there are too few cells for the requested number of neighbors.

Unlike HDBSCAN (used in `umap_clustering`), Leiden produces flat cluster assignments. Tree generation therefore uses hierarchical clustering on cluster centroids to create a backbone and then grafts cell subtrees onto it. Both tree types preserve clone blocks in the tree structure:

- `tree_type = "centroid"`: cells within each cluster form a flat star (polytomy), with no within-cluster hierarchy.
- `tree_type = "cell"`: cells within each cluster are organized hierarchically via `hclust`.
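The backbone-and-graft construction can be sketched as a Newick string in base R. This is a hedged illustration, not the package's implementation. It makes the following assumptions:

- The helpers `hclust_to_newick` and `clone_tree_newick` are hypothetical.
- Average linkage is used.
- Branch lengths are omitted.
- The input is a matrix of PCA coordinates with cells as named rows, plus a vector of cluster labels.

```r
# Convert an hclust object to a Newick topology (no branch lengths).
# leaf_text(j) returns the Newick text for observation j.
hclust_to_newick <- function(hc, leaf_text) {
  node <- function(i) {
    if (i < 0) return(leaf_text(-i))     # negative: an original observation
    # positive: a cluster formed at an earlier merge step
    paste0("(", node(hc$merge[i, 1]), ",", node(hc$merge[i, 2]), ")")
  }
  node(nrow(hc$merge))                   # the last merge is the root
}

# Hypothetical sketch: centroid backbone with clone subtrees grafted on.
clone_tree_newick <- function(pcs, clusters, tree_type = c("centroid", "cell")) {
  tree_type <- match.arg(tree_type)
  clusters <- as.character(clusters)
  cells <- rownames(pcs)
  clone_subtree <- function(cl) {
    in_cl <- clusters == cl
    members <- cells[in_cl]
    if (length(members) == 1) return(members)
    if (tree_type == "centroid") {
      return(paste0("(", paste(members, collapse = ","), ")"))  # star polytomy
    }
    hc <- hclust(dist(pcs[in_cl, , drop = FALSE]), method = "average")
    hclust_to_newick(hc, function(j) members[j])  # within-clone hierarchy
  }
  # Cluster centroids in PCA space (mean of member cells)
  centroids <- rowsum(pcs, clusters)
  centroids <- centroids / as.vector(table(clusters)[rownames(centroids)])
  ids <- rownames(centroids)
  if (length(ids) == 1) return(paste0(clone_subtree(ids), ";"))
  backbone <- hclust(dist(centroids), method = "average")
  paste0(hclust_to_newick(backbone, function(j) clone_subtree(ids[j])), ";")
}

pcs <- rbind(a = c(0, 0), b = c(0.1, 0), f = c(0.5, 0),
             c = c(2, 0), d = c(2.1, 0), e = c(10, 10))
clusters <- c(1, 1, 1, 2, 2, 3)
clone_tree_newick(pcs, clusters, "centroid")
clone_tree_newick(pcs, clusters, "cell")
```

In both modes each clone stays a contiguous block of leaves, because subtrees are only ever attached at a backbone leaf. If the ape package is available, a string like this can be parsed with `ape::read.tree(text = ...)`.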