Calculate dissimilarity or Euclidean distance for genlight objects

This function calculates both dissimilarity and Euclidean distances for genlight or snpclone objects.

bitwise.dist(
  x,
  percent = TRUE,
  mat = FALSE,
  missing_match = TRUE,
  scale_missing = FALSE,
  euclidean = FALSE,
  differences_only = FALSE,
  threads = 0L
)

Arguments

x	a genlight or snpclone object.
percent	`logical`. Should the distance be represented from 0 to 1? Default set to `TRUE`. `FALSE` will return the distance represented as integers from 0 to n, where n is the number of loci. This option has no effect if `euclidean = TRUE`.
mat	`logical`. Should the result be returned as a matrix? Default set to `FALSE`, which returns a dist object; `TRUE` returns a matrix object.
missing_match	`logical`. Determines whether two samples differing by missing data in a location should be counted as matching at that location. Default set to `TRUE`, which forces missing data to match with anything. `FALSE` forces missing data to not match with any other information, including other missing data.
scale_missing	`logical`. If `TRUE`, comparisons with missing data are scaled up proportionally to the number of columns used by multiplying the value by `m / (m - x)`, where m is the number of loci and x is the number of missing sites. This option matches the behavior of base R's `dist()` function. Defaults to `FALSE`.
euclidean	`logical`. If `TRUE`, the Euclidean distance will be calculated. Default set to `FALSE`.
differences_only	`logical`. When `differences_only = TRUE`, the output will reflect the number of different loci. The default setting, `differences_only = FALSE`, reflects the number of different alleles. Note: this has no effect on haploid organisms since 1 locus = 1 allele. This option is NOT recommended.
threads	The maximum number of parallel threads to be used within this function. A value of 0 (default) will attempt to use as many threads as there are available cores/CPUs. In most cases this is ideal. A value of 1 will force the function to run serially, which may increase stability on some systems. Other values may be specified, but should be used with caution.
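The following base R sketch shows how `missing_match` and `scale_missing` change a single pairwise comparison. The helper `haploid_hamming()` is hypothetical and is not poppr's implementation. It assumes haploid samples coded 0/1 with `NA` for missing data, so one locus equals one allele. It also assumes that the "missing sites" in the scaling factor are the loci where either sample is missing.

```r
# Hypothetical helper (not part of poppr): Hamming distance between two
# haploid SNP vectors coded 0/1, with NA marking missing data.
haploid_hamming <- function(a, b, missing_match = TRUE, scale_missing = FALSE) {
  stopifnot(length(a) == length(b))
  m <- length(a)                      # number of loci
  miss <- is.na(a) | is.na(b)         # assumption: a site is missing if either sample is NA
  x <- sum(miss)                      # number of missing sites
  d <- sum(a[!miss] != b[!miss])      # differences at fully observed sites
  if (!missing_match) {
    d <- d + x                        # missing never matches, not even NA vs NA
  }
  if (scale_missing && x > 0 && x < m) {
    d <- d * m / (m - x)              # scale up by m / (m - x); skipped if all sites are missing
  }
  d
}

a <- c(0, 1, 1, NA, 0)
b <- c(0, 0, 1, 1, NA)
haploid_hamming(a, b)                          # 1: missing sites count as matches
haploid_hamming(a, b, missing_match = FALSE)   # 3: both missing sites count as differences
haploid_hamming(a, b, scale_missing = TRUE)    # 1 * 5 / 3
dist(rbind(a, b), method = "manhattan")        # base R drops NA columns and rescales the same way
```

The final line compares the result with base R's `dist()`. That function excludes columns with missing values and scales the sum up by the proportion of columns used, which is the behavior `scale_missing` is described as matching.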

Value

A dist object containing pairwise distances between samples, or a matrix if `mat = TRUE`.

Details

The default distance calculated here is quite simple and goes by many names depending on its application. The most familiar name might be the Hamming distance, or the number of differences between two strings.
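The sketch below illustrates the default distance and the effect of `differences_only`. It counts differences between diploid samples coded as the number of copies of the second allele (0, 1, or 2), which is how genlight objects store SNP genotypes. The function `hamming_dist()` is hypothetical and ignores missing data. It also assumes that `percent = TRUE` divides allele differences by 2n (two alleles per locus) and locus differences by n. Those denominators are assumptions for illustration, not documented poppr behavior.

```r
# Hypothetical sketch, not poppr's code. X is a samples x loci matrix of
# second-allele counts (0, 1, 2 for diploids) with no missing data.
hamming_dist <- function(X, percent = TRUE, differences_only = FALSE, ploidy = 2) {
  n <- nrow(X)
  nloc <- ncol(X)
  D <- matrix(0, n, n, dimnames = list(rownames(X), rownames(X)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      d <- if (differences_only) {
        sum(X[i, ] != X[j, ])          # number of loci that differ
      } else {
        sum(abs(X[i, ] - X[j, ]))      # number of alleles that differ
      }
      D[i, j] <- D[j, i] <- d
    }
  }
  if (percent) {
    # Assumed denominators: loci for locus differences, alleles otherwise
    denom <- if (differences_only) nloc else nloc * ploidy
    D <- D / denom
  }
  as.dist(D)
}

X <- rbind(s1 = c(0, 1, 2, 2, 0),
           s2 = c(0, 2, 0, 2, 1),
           s3 = c(0, 1, 2, 2, 0))
hamming_dist(X, percent = FALSE)                           # s1-s2: 4 alleles differ
hamming_dist(X, percent = FALSE, differences_only = TRUE)  # s1-s2: 3 loci differ
hamming_dist(X)                                            # s1-s2: 4 / 10 = 0.4
```

The comparison of s1 and s2 shows why the two settings differ on diploid data. At the third locus, 2 versus 0 is a single differing locus but two differing alleles.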

As of poppr version 2.8.0, this function also calculates Euclidean distance, and it is considerably faster and more memory-efficient than the standard dist() function.
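The sketch below computes a Euclidean distance on the same allele-count coding. It uses the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so all pairs come from one matrix product instead of an explicit loop. The helper `euclid_dist()` is hypothetical and handles no missing data. It does not describe how poppr computes its result; it only shows that the quantity agrees with base R's `dist()`.

```r
# Hypothetical sketch, not poppr's implementation: Euclidean distances between
# rows of a numeric samples x loci matrix with no missing data.
euclid_dist <- function(X) {
  G <- tcrossprod(X)                  # n x n matrix of row dot products, X %*% t(X)
  sq <- diag(G)                       # squared norm of each sample
  D2 <- outer(sq, sq, "+") - 2 * G    # squared distances via the identity above
  D2[D2 < 0] <- 0                     # guard against tiny negative rounding error
  as.dist(sqrt(D2))
}

X <- rbind(s1 = c(0, 1, 2, 2, 0),
           s2 = c(0, 2, 0, 2, 1),
           s3 = c(0, 1, 2, 2, 0))
euclid_dist(X)   # s1-s2 is sqrt(6); s1-s3 is 0
dist(X)          # base R result for comparison
```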

Note

This function is optimized for genlight and snpclone objects. This does not mean that it is a catch-all optimization for SNP data. Three assumptions must be met for this function to work:

SNPs are bi-allelic
Samples are haploid or diploid
All samples have the same ploidy
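The next sketch checks these assumptions before the data reach a function like this one. The helper `check_snp_assumptions()` is hypothetical and is not part of poppr. It assumes a samples x loci matrix of second-allele counts and a per-sample ploidy vector. Under that coding, bi-allelism cannot be confirmed directly. The range test only catches values that could not come from a bi-allelic locus.

```r
# Hypothetical pre-flight check (not a poppr function). X holds second-allele
# counts (NA = missing); ploidy has one entry per sample (row of X).
check_snp_assumptions <- function(X, ploidy) {
  stopifnot(is.matrix(X), length(ploidy) == nrow(X))
  same_ploidy <- length(unique(ploidy)) == 1      # all samples share one ploidy
  hap_or_dip <- all(ploidy %in% c(1, 2))          # haploid or diploid only
  # A bi-allelic locus can only carry 0..ploidy copies of the second allele.
  # ploidy has length nrow(X), so column-wise recycling compares row i to ploidy[i].
  in_range <- all(is.na(X) | (X >= 0 & X <= ploidy & X == round(X)))
  c(same_ploidy = same_ploidy,
    haploid_or_diploid = hap_or_dip,
    counts_in_range = in_range)
}

Y <- rbind(c(0, 1, 2),
           c(2, NA, 0))
check_snp_assumptions(Y, ploidy = c(2, 2))   # every check passes
check_snp_assumptions(Y, ploidy = c(2, 1))   # mixed ploidy, and a count of 2 in a haploid
```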

If the user supplies a genind or genclone object, prevosti.dist() will be used for calculation.

Examples