Joins copy number bin data with phased haplotype counts to produce a combined data.frame with copy number states and B-allele frequency (BAF) for each bin.

combineBAFCN(
  haplotypes,
  CNbins,
  filtern = 0,
  phased_haplotypes = NULL,
  minbins = 100,
  minbinschr = 10,
  phasing_method = "distribution",
  ...
)

Arguments

haplotypes

A data.frame with haplotype allele counts. Required columns: `cell_id`, `chr`, `start`, `end`, `hap_label`, plus either `allele_id` and `readcount` (raw format) or `allele1`, `allele0`, and `totalcounts` (formatted).

CNbins

A data.frame with copy number bin data. Required columns: `cell_id`, `chr`, `start`, `end`, `state`, `copy`.

filtern

Minimum total read count per bin to include. Default 0.

phased_haplotypes

Optional pre-computed phased haplotypes from `computehaplotypecounts()`. If NULL, phasing is performed automatically.

minbins

Minimum number of bins a cell must have to be retained; cells with fewer bins are removed. Default 100.

minbinschr

Minimum number of bins required per chromosome within each cell. Default 10.

phasing_method

Method for phasing haplotypes. Either "distribution" (default), which phases based on the distribution across all cells, or an alternative that phases using the top N most imbalanced cells.

...

Additional arguments passed to `format_haplotypes()`.
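The per-cell bin threshold described for `minbins` can be pictured with a small base R sketch. This is not the package's implementation; the toy data.frame `bins`, the threshold value, and the helper variables are hypothetical, and the only behaviour taken from the documentation is that cells with fewer than `minbins` bins are removed.

```r
# Hypothetical toy input: cell "a" has 3 bins, cell "b" has 1 bin
bins <- data.frame(
  cell_id = c("a", "a", "a", "b"),
  chr     = c("1", "1", "2", "1"),
  start   = c(1, 500001, 1, 1)
)

minbins <- 2  # illustrative threshold; the documented default is 100

# Count bins per cell and keep only cells that reach the threshold
bins_per_cell <- table(bins$cell_id)
keep_cells <- names(bins_per_cell)[bins_per_cell >= minbins]
filtered <- bins[bins$cell_id %in% keep_cells, ]
```

With the real function, the same effect is controlled through the `minbins` argument rather than by filtering manually.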

Value

A data.frame with columns from `CNbins` plus:

* `alleleA`: Read counts for A allele
* `alleleB`: Read counts for B allele
* `totalcounts`: Total read counts
* `BAF`: B-allele frequency (alleleB / totalcounts)
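As a worked illustration of the BAF definition, the base R sketch below computes it on a toy data.frame. The values are hypothetical rather than output from `combineBAFCN()`, and it assumes that `totalcounts` is the sum of `alleleA` and `alleleB`, which the documentation does not state explicitly.

```r
# Hypothetical per-bin allele counts (not real combineBAFCN() output)
toy <- data.frame(
  cell_id = c("cell1", "cell1", "cell2"),
  chr     = c("1", "1", "1"),
  start   = c(1, 500001, 1),
  end     = c(500000, 1000000, 500000),
  alleleA = c(30, 10, 0),
  alleleB = c(10, 10, 0)
)

# Assumption: total counts are the sum of the two allele counts
toy$totalcounts <- toy$alleleA + toy$alleleB

# BAF as documented: alleleB / totalcounts
# Bins with zero total counts yield NaN in R (0 / 0)
toy$BAF <- toy$alleleB / toy$totalcounts
```

Bins with no allele counts have an undefined BAF under this definition, which is one reason a read count filter such as `filtern` can be useful.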

Examples

data(CNbins)
data(haplotypes)
# Format haplotypes first
haps_formatted <- format_haplotypes_dlp(haplotypes, CNbins)
#> Number of distinct bins in copy number data: 4375
#> Number of distinct bins in haplotype data: 5728
#> Number of distinct bins in formatted haplotype data: 4360
# Combine with CNbins
cnbaf <- combineBAFCN(haps_formatted, CNbins)
#> Finding overlapping cell IDs between CN data and haplotype data...
#> Total number of cells in both CN and haplotypes: 250
#> Number of cells in CN data: 250
#> Number of cells in haplotype data: 250
#> Joining bins and haplotypes...
#> Phase haplotypes...
#> Phasing based on distribution across all cells
#> Join phased haplotypes...
#> Reorder haplotypes based on phase...
#> Total number of cells after removing cells with < 100 bins: 250