umap_clustering.Rd
This function takes copy number data, performs UMAP dimensionality reduction, and then applies HDBSCAN clustering to identify cell populations. It can handle both standard copy number data and haplotype-specific copy number (HSCN) data.
umap_clustering(
  CNbins,
  n_neighbors = 10,
  min_dist = 0.1,
  minPts = 30,
  seed = NULL,
  field = "copy",
  umapmetric = "correlation",
  hscn = FALSE,
  pca = NULL
)
`CNbins`: A data frame containing copy number data. Must include a 'cell_id' column and the column named by `field`.
`n_neighbors`: Integer. The number of nearest neighbors UMAP uses when building its neighborhood graph. Default is 10.
`min_dist`: Numeric. The minimum distance between points in the UMAP embedding; smaller values pack similar cells more tightly. Default is 0.1.
`minPts`: Integer. The minimum number of points required to form a cluster in HDBSCAN. Default is 30.
`seed`: Integer or NULL. Random seed for reproducibility. Default is NULL.
`field`: Character. The column in `CNbins` that holds the copy number values. Default is "copy".
`umapmetric`: Character. The distance metric used by UMAP. Default is "correlation".
`hscn`: Logical. Whether to use haplotype-specific copy number data. Default is FALSE.
`pca`: Integer or NULL. The number of principal components to use as input to UMAP. If NULL (the default), PCA is not applied.
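A minimal sketch of what the `pca` argument implies, assuming the copy number matrix has one row per cell and one column per bin: when `pca` is an integer, the matrix is projected onto its first `pca` principal components before UMAP. The helper `reduce_with_pca` is hypothetical and uses base R `prcomp()`; the function itself may instead delegate this step to its UMAP backend.

```r
# Hypothetical helper: optionally project a cells x bins matrix onto its
# first `pca` principal components, mirroring the meaning of the `pca` argument.
reduce_with_pca <- function(cn_mat, pca = NULL) {
  if (is.null(pca)) {
    return(cn_mat)  # pca = NULL (default): the matrix is used as-is
  }
  # prcomp() centres each bin; rank. limits how many components are returned
  pcs <- stats::prcomp(cn_mat, center = TRUE, scale. = FALSE, rank. = pca)
  pcs$x  # cells x min(pca, n_cells, n_bins) matrix of PC scores
}

set.seed(1)
cn_mat <- matrix(rpois(20 * 50, lambda = 2), nrow = 20)  # toy data: 20 cells, 50 bins
dim(reduce_with_pca(cn_mat, pca = 5))  # 20 5
```

Reducing to a handful of components first is a common way to denoise high-dimensional input and speed up the neighbor search, at the cost of discarding low-variance signal.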
A list containing:
A data frame with UMAP coordinates and cluster assignments for each cell.
The results of the HDBSCAN clustering.
The results of the UMAP dimensionality reduction.
A phylogenetic tree object representing the hierarchical structure of the clusters.
The function performs the following steps:
1. Creates a copy number matrix from the input data.
2. Applies UMAP dimensionality reduction.
3. Performs HDBSCAN clustering on the UMAP embedding.
4. Generates a phylogenetic tree from the clustering results.
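Step 1 can be sketched in base R as a pivot from long format (one row per cell and bin) to a matrix with one row per cell. This assumes `CNbins` identifies bins by `chr`, `start` and `end` columns, which the text above does not state; the helper name `make_cn_matrix` is hypothetical, and dropping incomplete bins is an assumed policy rather than the documented one.

```r
# Hypothetical sketch of step 1: pivot long-format bins into a matrix with
# one row per cell and one column per genomic bin.
# Assumes CNbins has chr, start, end, cell_id and the column named by `field`.
make_cn_matrix <- function(CNbins, field = "copy") {
  cell_id <- as.character(CNbins$cell_id)
  bin_id <- paste(CNbins$chr, CNbins$start, CNbins$end, sep = "_")
  cells <- unique(cell_id)
  bins <- unique(bin_id)
  mat <- matrix(NA_real_, nrow = length(cells), ncol = length(bins),
                dimnames = list(cells, bins))
  # Fill entries using (row, column) index pairs
  mat[cbind(match(cell_id, cells), match(bin_id, bins))] <- CNbins[[field]]
  # Keep only bins observed in every cell, since UMAP needs complete rows
  mat[, colSums(is.na(mat)) == 0, drop = FALSE]
}

CNbins <- data.frame(
  chr = "1",
  start = c(1L, 500001L, 1L, 500001L),
  end = c(500000L, 1000000L, 500000L, 1000000L),
  cell_id = c("cellA", "cellA", "cellB", "cellB"),
  copy = c(2, 3, 2, 1)
)
make_cn_matrix(CNbins)
```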
If `hscn` is TRUE, the function expects columns 'copy' and 'BAF' in `CNbins` and creates separate matrices for the A and B alleles.
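One plausible way to derive the two allele matrices, sketched under assumptions not stated above: the B-allele copy number is taken as `BAF * copy` and the A-allele copy number as the remainder, bins are identified by `chr`, `start` and `end`, and the two matrices are concatenated column-wise before UMAP. The helper `make_hscn_matrices` is hypothetical.

```r
# Hypothetical sketch: build A- and B-allele matrices (one row per cell)
# from total copy number and BAF.
# Assumed decomposition: B = BAF * copy and A = copy - B.
make_hscn_matrices <- function(CNbins) {
  cell_id <- as.character(CNbins$cell_id)
  bin_id <- paste(CNbins$chr, CNbins$start, CNbins$end, sep = "_")
  cells <- unique(cell_id)
  bins <- unique(bin_id)
  idx <- cbind(match(cell_id, cells), match(bin_id, bins))
  b_copy <- CNbins$BAF * CNbins$copy
  a_copy <- CNbins$copy - b_copy
  # Build one cells x bins matrix, prefixing column names with the allele
  fill <- function(values, allele) {
    m <- matrix(NA_real_, nrow = length(cells), ncol = length(bins),
                dimnames = list(cells, paste(allele, bins, sep = "_")))
    m[idx] <- values
    m
  }
  list(A = fill(a_copy, "A"), B = fill(b_copy, "B"))
}

CNbins <- data.frame(
  chr = "1", start = c(1L, 1L), end = c(500000L, 500000L),
  cell_id = c("cellA", "cellB"), copy = c(2, 3), BAF = c(0.5, 0)
)
mats <- make_hscn_matrices(CNbins)
# Assumption: the allele matrices are concatenated so each cell row carries both profiles
umap_input <- cbind(mats$A, mats$B)
```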
The function includes several fallbacks: it automatically reduces `n_neighbors` when there are too few cells, reruns UMAP with a small jitter added to the data if the first attempt fails, and reduces `minPts` if HDBSCAN initially finds only one cluster.
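The three fallbacks can be sketched as small, independent helpers. This is a minimal illustration under stated assumptions, not the package's code: the cap of `n_cells - 1` neighbors, the jitter size, and halving `minPts` are all assumed rules, and `embed_fn` and `cluster_fn` are hypothetical stand-ins for the real UMAP and HDBSCAN calls. Noise is treated as label 0, following the `dbscan::hdbscan` convention.

```r
# Hypothetical sketches of the three fallbacks. `embed_fn` and `cluster_fn`
# stand in for the real UMAP and HDBSCAN calls, which are not reproduced here.

# 1. Cap n_neighbors for small inputs (assumed rule: at most n_cells - 1)
adjust_n_neighbors <- function(n_cells, n_neighbors) {
  min(n_neighbors, n_cells - 1)
}

# 2. Retry the embedding once with tiny uniform noise if the first call errors
embed_with_retry <- function(x, embed_fn, amount = 1e-6) {
  tryCatch(
    embed_fn(x),
    error = function(e) embed_fn(jitter(x, amount = amount))
  )
}

# 3. Halve minPts until more than one cluster is found (label 0 = noise)
#    or a lower bound is reached
cluster_with_fallback <- function(coords, cluster_fn, minPts, min_allowed = 2) {
  n_clusters <- function(labels) length(setdiff(unique(labels), 0))
  labels <- cluster_fn(coords, minPts)
  while (n_clusters(labels) < 2 && minPts > min_allowed) {
    minPts <- max(min_allowed, minPts %/% 2)
    labels <- cluster_fn(coords, minPts)
  }
  list(labels = labels, minPts = minPts)
}

adjust_n_neighbors(n_cells = 6, n_neighbors = 10)  # 5

# Toy embedding that fails on duplicated rows, so the jittered retry is used
fussy_embed <- function(m) {
  if (anyDuplicated(m)) stop("duplicated rows")
  m
}
emb <- embed_with_retry(matrix(1, nrow = 3, ncol = 2), fussy_embed)

# Toy clusterer that only splits the data once minPts drops to 10 or less
toy_cluster <- function(coords, minPts) {
  if (minPts > 10) rep(1L, nrow(coords)) else rep(1:2, length.out = nrow(coords))
}
cluster_with_fallback(emb, toy_cluster, minPts = 30)$minPts  # 7
```

Identical rows are a typical reason for a neighbor-graph method to fail, which is why a jitter far smaller than any real copy number difference can rescue the run without changing the result meaningfully.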