Converts raw haplotype data from DLP+ sequencing to the format required by signals. This includes binning haplotypes to match copy number bin coordinates and converting from long to wide format.

format_haplotypes_dlp(haplotypes, CNbins, hmmcopybinsize = 5e+05)

Arguments

haplotypes

A data.frame with raw haplotype allele counts. Required columns: `cell_id`, `chr`, `start`, `end`, `hap_label`, `allele_id`, `readcount`.

CNbins

A data.frame with copy number bin coordinates. Used to align haplotype bins. Required columns: `chr`, `start`, `end`.

hmmcopybinsize

Bin size, in base pairs, used by HMMcopy for copy number calling. Default 5e+05 (500 kb).
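Both data.frame arguments list required columns, so it can help to check inputs before calling the function. The following is a minimal base R sketch; `missing_columns()` is a hypothetical helper written for this illustration and is not part of signals, and the toy data frames are made up.

```r
# Hypothetical helper (not part of signals):
# returns the names of required columns that are absent from df.
missing_columns <- function(df, required) {
  setdiff(required, names(df))
}

# Required columns as listed in the argument descriptions above
haplotype_cols <- c("cell_id", "chr", "start", "end",
                    "hap_label", "allele_id", "readcount")
cnbin_cols <- c("chr", "start", "end")

# Toy inputs for illustration only
toy_haplotypes <- data.frame(
  cell_id = "cellA", chr = "1", start = 1200, end = 1200,
  hap_label = 1, allele_id = 0, readcount = 3
)
toy_bins <- data.frame(chr = "1", start = 1)  # deliberately lacks `end`

missing_hap <- missing_columns(toy_haplotypes, haplotype_cols)
missing_bin <- missing_columns(toy_bins, cnbin_cols)
```

Here `missing_hap` is empty and `missing_bin` names the absent `end` column, so only the second input would need fixing.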

Value

A data.frame of formatted haplotypes with the following columns:

* `cell_id`: Cell identifier
* `chr`: Chromosome
* `start`, `end`: Bin coordinates (aligned to CNbins)
* `hap_label`: Haplotype block identifier
* `allele1`, `allele0`: Read counts for each allele
* `totalcounts`: Total read counts (allele1 + allele0)
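To illustrate how the output columns relate to the input, here is a rough base R sketch of the two steps named in the description: binning, then long-to-wide conversion. It is not the package's implementation. It assumes fixed-width, 1-based bins (1 to 500000, 500001 to 1000000, and so on) and that `allele_id` takes the values 0 and 1; the toy data and the intermediate names (`bin_start`, `bin_end`, `agg`, `wide`) are made up for this example.

```r
binsize <- 5e5

# Toy long-format input: one row per cell, position, haplotype block and allele
haps_long <- data.frame(
  cell_id   = "cellA",
  chr       = "1",
  start     = c(1200, 1200, 40000, 620000, 620000),
  end       = c(1200, 1200, 40000, 620000, 620000),
  hap_label = c(1, 1, 1, 2, 2),
  allele_id = c(0, 1, 1, 0, 1),
  readcount = c(3, 2, 4, 1, 5),
  stringsAsFactors = FALSE
)

# Step 1: snap each record to the fixed-width bin containing its start
# (assumption: bins are 1-based and binsize wide)
haps_long$bin_start <- floor((haps_long$start - 1) / binsize) * binsize + 1
haps_long$bin_end   <- haps_long$bin_start + binsize - 1

# Sum read counts per cell, bin, haplotype block and allele
agg <- aggregate(
  readcount ~ cell_id + chr + bin_start + bin_end + hap_label + allele_id,
  data = haps_long, FUN = sum
)

# Step 2: long to wide, one column of counts per allele
keys <- c("cell_id", "chr", "bin_start", "bin_end", "hap_label")
a0 <- agg[agg$allele_id == 0, c(keys, "readcount")]
a1 <- agg[agg$allele_id == 1, c(keys, "readcount")]
names(a0)[names(a0) == "readcount"] <- "allele0"
names(a1)[names(a1) == "readcount"] <- "allele1"
wide <- merge(a1, a0, by = keys, all = TRUE)

# A block seen for only one allele gets a count of zero for the other
wide$allele0[is.na(wide$allele0)] <- 0
wide$allele1[is.na(wide$allele1)] <- 0
wide$totalcounts <- wide$allele1 + wide$allele0
```

In this toy case the five input rows collapse to two output rows, one per bin and haplotype block. The real function additionally restricts bins to those present in `CNbins`, which this sketch does not attempt.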

See also

`format_haplotypes()` for adding phasing information.

Examples

data(CNbins)
data(haplotypes)
haps_formatted <- format_haplotypes_dlp(haplotypes, CNbins)
#> Number of distinct bins in copy number data: 4375
#> Number of distinct bins in haplotype data: 5728
#> Number of distinct bins in formatted haplotype data: 4360
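The messages above report distinct bin counts. One plausible way to compute such a count for any data.frame with `chr`, `start` and `end` columns is sketched below; `count_distinct_bins()` is a hypothetical helper for illustration, and whether the package counts bins in exactly this way is an assumption.

```r
# Hypothetical helper (not part of signals):
# number of unique (chr, start, end) combinations in df.
count_distinct_bins <- function(df) {
  nrow(unique(df[, c("chr", "start", "end")]))
}

# Toy bins for illustration: the first two rows describe the same bin
toy_bins <- data.frame(
  chr   = c("1", "1", "1", "2"),
  start = c(1, 1, 500001, 1),
  end   = c(500000, 500000, 1000000, 500000),
  stringsAsFactors = FALSE
)

n_bins <- count_distinct_bins(toy_bins)
```

Applying the same helper to the input and output data frames gives a quick way to compare bin coverage before and after formatting.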