The signals package provides powerful heatmap
visualization capabilities for copy number data.
These features work with any copy number data, not just haplotype-specific or allele-specific calls. This makes them useful for visualizing total copy number profiles from any single-cell DNA sequencing technology.
We’ll use the included CNbins test data, which contains
copy number calls for 250 cells across all human chromosomes. First,
let’s load the data and create a clustering for organized heatmap
visualization.
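# Load signals for the test data and plotting; dplyr supplies %>% and filter()
library(signals)
library(dplyr)
data(CNbins)
CNbins <- CNbins %>% filter(chr != "Y")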
head(CNbins)
#> chr start end reads copy state cell_id
#> 1 1 2000001 2500000 536 2.960868 2 SA921-A90554A-R03-C44
#> 2 1 3000001 3500000 499 2.603584 2 SA921-A90554A-R03-C44
#> 3 1 4000001 4500000 569 2.216388 2 SA921-A90554A-R03-C44
#> 4 1 4500001 5000000 524 2.039672 2 SA921-A90554A-R03-C44
#> 5 1 5000001 5500000 576 2.210792 2 SA921-A90554A-R03-C44
#> 6 1 5500001 6000000 516 2.028686 2 SA921-A90554A-R03-C44
clustering <- umap_clustering(CNbins, field = "copy")
The umap_clustering() function performs UMAP
dimensionality reduction followed by HDBSCAN clustering. This produces a
clustering tree and cluster assignments that organize cells by
similarity.
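To make the idea of organizing cells by similarity concrete, the sketch below builds a small cells-by-bins matrix from long-format data and clusters it. It uses only base R and swaps in plain hierarchical clustering on Euclidean distances for the UMAP and HDBSCAN steps, so it illustrates the concept rather than what umap_clustering() does internally. The toy data frame and all of its values are invented for illustration.

```r
# Toy long-format data with the same kind of columns as CNbins (values invented).
toy <- data.frame(
  cell_id = rep(c("cellA", "cellB", "cellC", "cellD"), each = 3),
  chr     = "1",
  start   = rep(c(1, 500001, 1000001), times = 4),
  copy    = c(2.0, 2.1, 1.9,   # cellA: roughly diploid
              2.1, 1.9, 2.0,   # cellB: roughly diploid
              3.9, 4.1, 4.0,   # cellC: gained
              4.0, 3.8, 4.2)   # cellD: gained
)

# Label each bin, keeping genomic order rather than alphabetical order.
bin_label <- paste(toy$chr, toy$start, sep = "_")
toy$bin <- factor(bin_label, levels = unique(bin_label))

# Reshape to a cells x bins matrix (rows = cells, columns = bins).
mat <- tapply(toy$copy, list(toy$cell_id, toy$bin), mean)

# Hierarchical clustering stands in for UMAP + HDBSCAN in this sketch.
hc <- hclust(dist(mat), method = "average")
groups <- cutree(hc, k = 2)

# Cells grouped by cluster label.
split(names(groups), groups)
```

The tree stored in hc plays the role of the tree passed to plotHeatmap(), and groups plays the role of the cluster assignments.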
Let’s start with a standard heatmap.
plotHeatmap(CNbins,
tree = clustering$tree,
clusters = clustering$clustering)
This heatmap shows copy number states across all genomic bins (columns) and all cells (rows). Cells are organized by the hierarchical clustering tree (displayed on the left). The color scheme represents copy number states, with blue indicating losses and red indicating gains.
The gene_annotations parameter highlights specific genes
of interest on the heatmap. This is useful for focusing on known
cancer-associated genes or other loci of biological significance.
Let’s highlight the TP53 tumor suppressor gene:
plotHeatmap(CNbins,
gene_annotations = "TP53",
tree = clustering$tree,
clusters = clustering$clustering)
A line connecting the gene label to its genomic position appears above the heatmap. This makes it easy to identify where specific genes are located relative to the copy number changes.
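As a rough illustration of how a gene position relates to the heatmap columns, the sketch below finds the 0.5 Mb bin that contains a gene. The bin size is read off the head(CNbins) output shown earlier. The TP53 coordinate is an approximate hg19 position (about 7.57 Mb on chromosome 17) used only for illustration, and how the package itself places the label is not shown here.

```r
# Bin size taken from the head(CNbins) output (bins such as 2000001-2500000).
bin_size <- 500000

# Approximate hg19 position of TP53; an illustrative value, not taken from
# the package's own annotation.
gene <- data.frame(name = "TP53", chr = "17", pos = 7570000)

# Bins start at k * bin_size + 1, so the containing bin is:
bin_start <- floor((gene$pos - 1) / bin_size) * bin_size + 1
bin_end   <- bin_start + bin_size - 1

c(start = bin_start, end = bin_end)
```

The gene label would then point at the column for the chromosome 17 bin with that start coordinate.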
We can highlight multiple genes simultaneously:
plotHeatmap(CNbins,
gene_annotations = c("TP53", "MYC", "KRAS", "BRCA1"),
tree = clustering$tree,
clusters = clustering$clustering)
You can adjust the appearance of gene annotations:
plotHeatmap(CNbins,
gene_annotations = c("TP53", "MYC"),
gene_annotation_fontsize = 12,
gene_link_height = 8,
tree = clustering$tree,
clusters = clustering$clustering)
Key parameters:
- gene_annotation_fontsize: Controls the size of gene labels (default uses annofontsize)
- gene_link_height: Height of connector lines in mm (default is 5 mm)
Gene annotations can be combined with other annotations like the frequency plot, which shows how many cells have copy number changes at each genomic position:
plotHeatmap(CNbins,
gene_annotations = c("TP53", "MYC", "KRAS"),
plotfrequency = TRUE,
frequencycutoff = 2,
tree = clustering$tree,
clusters = clustering$clustering)
In this visualization:
- Gene labels appear at the very top
- Below that is the frequency plot showing recurrence of copy number changes
- The heatmap shows individual cell profiles
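The per-bin recurrence shown in a frequency plot can be sketched in base R. This example assumes that frequencycutoff separates gains (state above the cutoff) from losses (state below the cutoff); the text above does not spell this out, so treat it as an assumption. The state matrix is invented toy data.

```r
# Toy state matrix: rows = cells, columns = bins (values invented).
states <- matrix(
  c(2, 3, 1,
    2, 3, 2,
    2, 4, 1,
    2, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cell", 1:4), paste0("bin", 1:3))
)

# Assumed meaning: states above the cutoff are gains, below are losses.
cutoff <- 2

# Fraction of cells with a gain or a loss in each bin.
gain_freq <- colMeans(states > cutoff)
loss_freq <- colMeans(states < cutoff)

rbind(gain = gain_freq, loss = loss_freq)
```

With a cutoff of 2, a diploid state counts as neither gain nor loss, which suits mostly diploid samples; a higher cutoff such as 3 may suit samples with a higher baseline ploidy.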
The plotideogram parameter adds a visual representation
of chromosome structure at the bottom of the heatmap, including cytoband
patterns and centromeric regions.
⚠️ Important: The plotideogram
parameter requires plotallbins = TRUE to display all
genomic bins including centromeric regions.
plotHeatmap(CNbins,
plotallbins = TRUE,
plotideogram = TRUE,
tree = clustering$tree,
clusters = clustering$clustering)
The ideogram shows:
- Chromosome bands: Different cytogenetic regions (displayed as colored bands)
- Centromeric regions: Light grey areas where coverage is typically low or absent
- Chromosome structure: Provides genomic context for interpreting copy number changes
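The head(CNbins) output near the top already shows gaps between consecutive bins on chromosome 1, which is consistent with some bins being absent from the data. The sketch below lists those gaps using the bin starts copied from that output. Whether plotallbins = TRUE fills exactly these bins is an assumption, not something confirmed here.

```r
# Bin starts for chromosome 1, copied from the head(CNbins) output.
observed <- c(2000001, 3000001, 4000001, 4500001, 5000001, 5500001)
bin_size <- 500000

# Full grid of bin starts spanning the observed range.
expected <- seq(min(observed), max(observed), by = bin_size)

# Bins present in the grid but absent from the data.
missing_bins <- setdiff(expected, observed)
missing_bins
```

Running the same check per chromosome on the full data is one way to see where low-coverage regions, such as those around centromeres, have been dropped.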
Here’s the complete visualization combining all new features:
plotHeatmap(CNbins,
gene_annotations = c("TP53", "MYC", "KRAS"),
plotallbins = TRUE,
plotideogram = TRUE,
plotfrequency = TRUE,
frequencycutoff = 2,
tree = clustering$tree,
clusters = clustering$clustering)
This comprehensive visualization shows:
1. Top: Gene labels for key cancer-associated genes
2. Upper middle: Frequency plot showing recurrent copy number changes
3. Center: Individual cell copy number heatmap
4. Bottom: Chromosome ideogram with cytoband and centromere information
This arrangement provides multiple levels of information: individual cell profiles, population-level recurrence patterns, genomic context, and functional annotation (genes).
By default, the package uses hg19 (GRCh37) for gene positions and cytoband information. You can switch to hg38 (GRCh38):
plotHeatmap(CNbins,
gene_annotations = "TP53",
genome = "hg38",
tree = clustering$tree,
clusters = clustering$clustering)
Adjust the height of the ideogram annotation:
plotHeatmap(CNbins,
plotallbins = TRUE,
plotideogram = TRUE,
ideogram_height = 0.5,
tree = clustering$tree,
clusters = clustering$clustering)
The ideogram_height parameter specifies height in cm
(default is 0.3 cm).
Customize the appearance of centromeric regions:
plotHeatmap(CNbins,
plotallbins = TRUE,
plotideogram = TRUE,
centromere_col = "#E0E0E0",
tree = clustering$tree,
clusters = clustering$clustering)
Highlight common tumor suppressors and oncogenes:
cancer_genes <- c("TP53", "RB1", "PTEN", "BRCA1", "BRCA2", "MYC", "KRAS", "PIK3CA")
plotHeatmap(CNbins,
gene_annotations = cancer_genes,
plotfrequency = TRUE,
frequencycutoff = 3,
tree = clustering$tree,
clusters = clustering$clustering)
Display a specific chromosome with gene annotations:
# Filter to chromosome 17
CNbins_chr17 <- CNbins[CNbins$chr == "17", ]
clustering_chr17 <- umap_clustering(CNbins_chr17, field = "copy")
plotHeatmap(CNbins_chr17,
gene_annotations = c("TP53", "BRCA1"),
tree = clustering_chr17$tree,
clusters = clustering_chr17$clustering)