R/bitwise.r
bitwise.dist.Rd
This function calculates both dissimilarity and Euclidean distances for genlight or snpclone objects.
bitwise.dist(x, percent = TRUE, mat = FALSE, missing_match = TRUE, scale_missing = FALSE, euclidean = FALSE, differences_only = FALSE, threads = 0L)
Argument | Description |
---|---|
x | a genlight or snpclone object. |
percent | |
mat | |
missing_match | |
scale_missing | A logical. If |
euclidean | |
differences_only | |
threads | The maximum number of parallel threads to be used within this function. A value of 0 (default) will attempt to use as many threads as there are available cores/CPUs. In most cases this is ideal. A value of 1 will force the function to run serially, which may increase stability on some systems. Other values may be specified, but should be used with caution. |
A dist object containing pairwise distances between samples.
The default distance calculated here is quite simple and goes by many names depending on its application. The most familiar name might be the Hamming distance, or the number of differences between two strings.
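As a minimal illustration of the Hamming distance idea, the base R sketch below counts differing sites between two haploid samples and expresses the count as a fraction of all sites. It does not use poppr's bit-level implementation; the 0/1 coding, the vectors `a` and `b`, and the helper `hamming()` are assumptions made up for this example.

```r
# Two hypothetical haploid samples scored at six bi-allelic SNPs
# (0 = first allele, 1 = second allele)
a <- c(0, 1, 1, 0, 0, 1)
b <- c(0, 1, 0, 0, 1, 1)

# Hypothetical helper: number of positions at which two vectors differ
hamming <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x != y)
}

n_diff <- hamming(a, b)       # count of differing sites
p_diff <- n_diff / length(a)  # the same distance as a fraction of all sites
```

The count and the fraction carry the same information; the fraction is simply easier to compare across data sets with different numbers of SNPs.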
As of poppr version 2.8.0, this function now also calculates Euclidean distance and is considerably faster and more memory-efficient than the standard dist() function.
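For context on what the standard dist() function returns: it computes Euclidean distance by default, and when a pair of rows contains missing values it scales the sum of squares up in proportion to the number of columns actually used. The base R sketch below reproduces that by hand. It does not call bitwise.dist(); the small matrix `geno` and the other object names are made up for illustration.

```r
# Hypothetical allele-count matrix: two samples, five SNPs, one missing call
geno <- rbind(
  s1 = c(0, 1, 2, 0, NA),
  s2 = c(2, 1, 0, 0, 1)
)

# Base R: Euclidean distance is the default method
d_base <- dist(geno)

# The same value by hand: sum squared differences over complete columns,
# scale by (total columns / columns used), then take the square root
ok <- !is.na(geno["s1", ]) & !is.na(geno["s2", ])
ss <- sum((geno["s1", ok] - geno["s2", ok])^2)
d_manual <- sqrt(ss * ncol(geno) / sum(ok))
```

Here four of the five columns are usable, so the sum of squares is multiplied by 5/4 before the square root is taken.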
This function is optimized for genlight and snpclone objects. This does not mean that it is a catch-all optimization for SNP data. Three assumptions must be met for this function to work:
SNPs are bi-allelic
Samples are haploid or diploid
All samples have the same ploidy
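The three assumptions can be checked before computing distances. The sketch below works on a plain samples-by-SNPs matrix of second-allele counts and a per-sample ploidy vector, not on a genlight object. The helper `meets_assumptions()` and this data layout are hypothetical, and the bi-allelic check only confirms that the counts are consistent with two alleles per SNP.

```r
# Hypothetical helper: TRUE when the inputs are consistent with all three
# assumptions, FALSE otherwise.
#   counts: numeric matrix, one row per sample, one column per SNP,
#           holding the count of the second allele (NA = missing)
#   ploidy: integer vector, one value per sample (row)
meets_assumptions <- function(counts, ploidy) {
  stopifnot(is.matrix(counts), length(ploidy) == nrow(counts))
  same_ploidy <- length(unique(ploidy)) == 1L
  hap_or_dip  <- all(ploidy %in% c(1L, 2L))
  # With two alleles per SNP, a sample's second-allele count lies between 0
  # and its ploidy; ploidy recycles down each column, so row i is compared
  # with ploidy[i]
  counts_ok   <- all(is.na(counts) | (counts >= 0 & counts <= ploidy))
  same_ploidy && hap_or_dip && counts_ok
}

diploids <- matrix(c(0, 1, 2, NA), nrow = 2)
ok_same  <- meets_assumptions(diploids, c(2L, 2L))  # both samples diploid
ok_mixed <- meets_assumptions(diploids, c(1L, 2L))  # mixed ploidy
```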
If the user supplies a genind or genclone object, prevosti.dist() will be used for calculation.
diss.dist(), snpclone, genlight, win.ia(), samp.ia()
#>  /// GENLIGHT OBJECT /////////
#>
#>  // 10 genotypes, 1,000 binary SNPs, size: 20.6 Kb
#>  0 (0 %) missing data
#>
#>  // Basic content
#>    @gen: list of 10 SNPbin
#>    @ploidy: ploidy of each individual (range: 2-2)
#>
#>  // Optional content
#>    @pop: population of each individual (group size range: 4-6)
#>    @other: a list containing: ancestral.pops
#>
#>    user  system elapsed
#>   0.000   0.000   0.001
xd
#>         1      2      3      4      5      6      7      8      9
#> 2  0.2230
#> 3  0.2260 0.2280
#> 4  0.2250 0.2170 0.2040
#> 5  0.3795 0.3835 0.3795 0.3805
#> 6  0.4035 0.3985 0.4055 0.3975 0.2100
#> 7  0.4005 0.3955 0.3935 0.3885 0.2000 0.2200
#> 8  0.3880 0.3860 0.3870 0.3960 0.2035 0.2205 0.2135
#> 9  0.3920 0.4030 0.4080 0.3970 0.2135 0.2125 0.2005 0.2150
#> 10 0.3935 0.3905 0.4015 0.3885 0.2230 0.2160 0.2120 0.2265 0.2195

# Calculate Euclidean distance
system.time(xdt <- bitwise.dist(x, euclidean = TRUE, scale_missing = TRUE, threads = 1L))
#>    user  system elapsed
#>   0.000   0.000   0.001
xdt
#>           1        2        3        4        5        6        7        8
#> 2  23.40940
#> 3  23.74868 24.04163
#> 4  23.36664 23.10844 22.31591
#> 5  34.36568 34.82815 34.71311 34.56877
#> 6  35.59494 35.45420 36.09709 35.76311 22.93469
#> 7  35.53871 35.17101 35.79106 35.00000 22.31591 23.57965
#> 8  34.75629 34.98571 35.12834 35.49648 22.24860 23.08679 23.13007
#> 9  35.49648 35.88872 36.19392 35.41186 23.04344 23.13007 22.24860 22.93469
#> 10 34.71311 34.68429 35.81899 35.05710 23.57965 23.10844 22.80351 23.81176
#>           9
#> 2
#> 3
#> 4
#> 5
#> 6
#> 7
#> 8
#> 9
#> 10 23.51595

# \dontrun{

# This function is more efficient in both memory and speed than [dist()] for
# calculating Euclidean distance on genlight objects. For example, we can
# observe a clear speed increase when we attempt a calculation on 100k SNPs
# with 10% missing data:

set.seed(999)
mat <- matrix(sample(c(0:2, NA), 100000 * 50, replace = TRUE, prob = c(0.3, 0.3, 0.3, 0.1)),
              nrow = 50)
glite <- new("genlight", mat, ploidy = 2)

# Default Euclidean distance
system.time(dist(glite))
#>    user  system elapsed
#>   2.432   0.074   2.508
#>    user  system elapsed
#>   0.849   0.004   0.872

# }