Pairwise statistics on uncertain data collections

These estimators operate on pairs of uncertain value collections. Each element of such a collection can be an uncertain value of any type, such as populations, theoretical distributions, KDE distributions or fitted distributions.

The methods compute the statistic in question by drawing a length-L realisation of each of the L-element collections. Realisations are drawn by sampling each uncertain point in the collections independently. The statistic is then computed either on a single pair of such realisations (yielding a single value for the statistic) or on multiple pairs of realisations (yielding a distribution of the statistic).

Within each collection, points are always sampled independently according to their furnishing distributions, unless sampling constraints are provided (not yet implemented).

Syntax

The syntax for estimating a statistic f on uncertain value collections x and y is

  • f(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, args..., n::Int; kwargs...), which draws n independent pairs of realisations of x and y, then estimates the statistic f for each pair, yielding an n-member distribution of the statistic (see the sketch below).
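
For example, a minimal sketch in Julia, assuming the UncertainValue constructor from UncertainData.jl (not documented in this section) and plain vectors of uncertain values as the collections:

    using UncertainData, Distributions, Statistics

    # Two hypothetical collections of uncertain values, each furnished by a
    # theoretical distribution.
    x = [UncertainValue(Normal, 0.0, 0.5),
         UncertainValue(Normal, 1.0, 0.3),
         UncertainValue(Uniform, -1.0, 1.0),
         UncertainValue(Normal, 2.0, 0.4)]
    y = [UncertainValue(Normal, 0.2, 0.4),
         UncertainValue(Gamma, 2.0, 1.0),
         UncertainValue(Normal, -0.5, 0.2),
         UncertainValue(Uniform, 1.0, 3.0)]

    # 1000 independent pairs of realisations yield a 1000-member
    # distribution of correlation estimates, returned as a vector.
    cor_dist = cor(x, y, 1000)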

Methods

Covariance

Statistics.cov (Method)
cov(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int; corrected::Bool = true)

Obtain a distribution on the covariance between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the covariance between the two length-L draws.

This yields n estimates of the covariance between n independent pairs of realisations of x and y. The n-member distribution of covariance estimates is returned as a vector.

If corrected is true (the default), then for each pair of draws the sum is scaled with L - 1, whereas the sum is scaled with L if corrected is false, where L = length(x).

source
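
A hedged usage sketch for this method, assuming the UncertainValue constructor from UncertainData.jl and vectors of uncertain values as the collections:

    using UncertainData, Distributions, Statistics

    x = [UncertainValue(Normal, μ, 0.3) for μ in rand(10)]
    y = [UncertainValue(Normal, μ, 0.3) for μ in rand(10)]

    # 500-member distribution of covariance estimates (bias-corrected by default)
    cov_dist = cov(x, y, 500)

    # Uncorrected estimates: sums scaled by L instead of L - 1
    cov_dist_uncorrected = cov(x, y, 500, corrected = false)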

Correlation (Pearson)

Statistics.cor (Method)
cor(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Estimate a distribution on Pearson's correlation coefficient between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute Pearson's correlation coefficient between the two length-L draws.

This yields n estimates of Pearson's correlation coefficient between n independent pairs of realisations of x and y. The n-member distribution of correlation estimates is returned as a vector.

source
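
A sketch of how the returned distribution might be summarised, under the same assumptions about constructing the collections:

    using UncertainData, Distributions, Statistics

    x = [UncertainValue(Normal, μ, 0.3) for μ in rand(10)]
    y = [UncertainValue(Normal, μ, 0.3) for μ in rand(10)]

    r = cor(x, y, 2000)          # 2000-member distribution of Pearson correlations
    mean(r), std(r)              # summarise the sampling variability
    quantile(r, [0.025, 0.975])  # a 95% interval for the correlation estimate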

Correlation (Kendall)

StatsBase.corkendall (Method)
corkendall(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Estimate an n-member distribution on Kendall's rank correlation coefficient between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute Kendall's rank correlation coefficient between the two length-L draws.

This yields n computations of Kendall's rank correlation coefficient between n independent pairs of realisations of x and y. The n-member distribution of correlation estimates is returned as a vector.

source

Correlation (Spearman)

StatsBase.corspearman (Method)
corspearman(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Estimate an n-member distribution on Spearman's rank correlation coefficient between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute Spearman's rank correlation coefficient between the two length-L draws.

This yields n estimates of Spearman's rank correlation coefficient between n independent pairs of realisations of x and y. The n-member distribution of correlation estimates is returned as a vector.

source
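
A combined sketch for the two rank correlations above (corkendall and corspearman), under the same assumptions about constructing the collections:

    using UncertainData, Distributions, StatsBase

    # Monotonically related means, so both rank correlation distributions
    # should concentrate near 1.
    x = [UncertainValue(Normal, μ, 0.1) for μ in 1.0:10.0]
    y = [UncertainValue(Normal, μ^2, 0.1) for μ in 1.0:10.0]

    τ_dist = corkendall(x, y, 1000)
    ρ_dist = corspearman(x, y, 1000)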

Count non-equal

StatsBase.countne (Method)
countne(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Estimate an n-member distribution on the number of indices at which the elements of two collections of uncertain values are not equal.

This is done by repeating the following procedure n times:

  1. Draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Draw a length-L realisation of y in the same manner.
  3. Count the number of indices at which the elements of the two length-L draws are not equal.

This yields n counts of non-equal values between n pairs of independent realisations of x and y. The n-member distribution of nonequal-value counts is returned as a vector.

source

Count equal

StatsBase.counteq (Method)
counteq(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Estimate an n-member distribution on the number of indices at which the elements of two collections of uncertain values are equal.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Count the number of indices at which the elements of the two length-L draws are equal.

This yields n counts of equal values between n pairs of independent realisations of x and y. The n-member distribution of equal-value counts is returned as a vector.

source
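
A sketch for counteq and countne together, under the same assumptions; note the behaviour for continuous furnishing distributions:

    using UncertainData, Distributions, StatsBase

    x = [UncertainValue(Normal, μ, 0.2) for μ in rand(5)]
    y = [UncertainValue(Normal, μ, 0.2) for μ in rand(5)]

    # With continuous furnishing distributions, two independently sampled
    # values are almost surely never exactly equal, so counteq is typically
    # all zeros and countne typically equals the collection length (5).
    counteq(x, y, 100)
    countne(x, y, 100)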

Maximum absolute deviation

StatsBase.maxad (Method)
maxad(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution over the maximum absolute deviation between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the maximum absolute deviation between the two length-L draws.

This yields n estimates of the maximum absolute deviation between n independent pairs of realisations of x and y. The n-member distribution of maximum absolute deviation estimates is returned as a vector.

source

Mean absolute deviation

StatsBase.meanad (Method)
meanad(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution over the mean absolute deviation between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the mean absolute deviation between the two length-L draws.

This yields n estimates of the mean absolute deviation between n independent pairs of realisations of x and y. The n-member distribution of mean absolute deviation estimates is returned as a vector.

source

Mean squared deviation

StatsBase.msd (Method)
msd(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution over the mean squared deviation between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the mean squared deviation between the two length-L draws.

This yields n estimates of the mean squared deviation between n independent pairs of realisations of x and y. The n-member distribution of mean squared deviation estimates is returned as a vector.

source
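
A combined sketch for the three deviation measures above (maxad, meanad, msd), under the same assumptions about constructing the collections:

    using UncertainData, Distributions, StatsBase

    x = [UncertainValue(Normal, μ, 0.1) for μ in rand(10)]
    y = [UncertainValue(Normal, μ, 0.1) for μ in rand(10)]

    maxad_dist  = maxad(x, y, 1000)   # maximum absolute deviations
    meanad_dist = meanad(x, y, 1000)  # mean absolute deviations
    msd_dist    = msd(x, y, 1000)     # mean squared deviations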

Peak signal-to-noise ratio

StatsBase.psnr (Method)
psnr(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, maxv, n::Int)

Obtain a distribution over the peak signal-to-noise ratio (PSNR) between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the PSNR between the two length-L draws.

This yields n estimates of the peak signal-to-noise ratio between n independent pairs of realisations of x and y. The n-member distribution of PSNR estimates is returned as a vector.

The PSNR is computed as 10 * log10(maxv^2 / msd(x_draw, y_draw)), where maxv is the maximum possible value x or y can take.

source
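
A sketch, assuming values roughly confined to [0, 1] so that maxv = 1.0 is a reasonable choice, and the same assumptions about constructing the collections:

    using UncertainData, Distributions, StatsBase

    x = [UncertainValue(Normal, μ, 0.05) for μ in rand(10)]
    y = [UncertainValue(Normal, μ, 0.05) for μ in rand(10)]

    # maxv = 1.0, 1000 resampled pairs
    psnr_dist = psnr(x, y, 1.0, 1000)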

Root mean squared deviation

StatsBase.rmsd (Method)
rmsd(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int, normalize = false)

Obtain a distribution over the root mean squared deviation between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the root mean squared deviation between the two length-L draws.

This yields n estimates of the root mean squared deviation between n independent pairs of realisations of x and y. The n-member distribution of root mean squared deviation estimates is returned as a vector.

The root mean squared deviation is computed as sqrt(msd(x_draw, y_draw)) at each iteration. Optionally, the result may be normalised by setting normalize = true.

source
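
A sketch under the same assumptions; for any single pair of draws, the RMSD is simply the square root of the corresponding mean squared deviation:

    using UncertainData, Distributions, StatsBase

    x = [UncertainValue(Normal, μ, 0.1) for μ in rand(10)]
    y = [UncertainValue(Normal, μ, 0.1) for μ in rand(10)]

    # 1000-member distribution of RMSD estimates; normalisation is
    # controlled by the normalize argument documented above.
    rmsd_dist = rmsd(x, y, 1000)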

Squared L2 distance

StatsBase.sqL2dist (Method)
sqL2dist(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution over the squared L2 distance between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the squared L2 distance between the two length-L draws.

This yields n estimates of the squared L2 distance between n independent pairs of realisations of x and y. The n-member distribution of squared L2 distance estimates is returned as a vector.

The squared L2 distance between a pair of length-L draws is computed as $\sum_{i=1}^{L} |x_i - y_i|^2$.

source

Cross correlation

StatsBase.crosscor (Method)
crosscor(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, [lags], n::Int; demean = true)

Obtain a distribution over the cross correlation between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the cross correlation between the two length-L draws.

This yields n estimates of the cross correlation between n independent pairs of realisations of x and y. The n-member distribution of cross correlation estimates is returned as a vector.

demean specifies whether, at each iteration, the respective means of the draws should be subtracted from them before computing their cross correlation.

When left unspecified, the lags used are -min(lx - 1, 10*log10(lx)) to min(lx - 1, 10*log10(lx)), where lx is the length of the collections (and hence of each draw).

The output is normalized by sqrt(var(x_draw)*var(y_draw)). See crosscov for the unnormalized form.

source

Cross covariance

StatsBase.crosscov (Method)
crosscov(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, [lags], n::Int; demean = true)

Obtain a distribution over the cross covariance function (CCF) between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the CCF between the two length-L draws.

This yields n estimates of the CCF between n independent pairs of realisations of x and y. The n-member distribution of CCF estimates is returned as a vector.

demean specifies whether, at each iteration, the respective means of the draws should be subtracted from them before computing their CCF.

When left unspecified, the lags used are -min(lx - 1, 10*log10(lx)) to min(lx - 1, 10*log10(lx)), where lx is the length of the collections (and hence of each draw).

The output is not normalized. See crosscor for a function with normalization.

source
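
A combined sketch for crosscor and crosscov, under the same assumptions about constructing the collections, and additionally assuming that an integer range such as 0:5 is accepted for the documented [lags] argument:

    using UncertainData, Distributions, StatsBase

    # Uncertain values whose means trace out phase-shifted oscillations.
    x = [UncertainValue(Normal, μ, 0.1) for μ in sin.(0.0:0.5:10.0)]
    y = [UncertainValue(Normal, μ, 0.1) for μ in cos.(0.0:0.5:10.0)]

    # Normalised cross correlation and unnormalised cross covariance at
    # lags 0 to 5, for 1000 resampled pairs each.
    cc_dist  = crosscor(x, y, 0:5, 1000)
    ccv_dist = crosscov(x, y, 0:5, 1000, demean = true)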

Generalized Kullback-Leibler divergence

StatsBase.gkldiv (Method)
gkldiv(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution over the generalized Kullback-Leibler divergence between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the generalized Kullback-Leibler divergence between the two length-L draws.

This yields n estimates of the generalized Kullback-Leibler divergence between n independent pairs of realisations of x and y. The n-member distribution of generalized Kullback-Leibler divergence estimates is returned as a vector.

source

Kullback-Leibler divergence

StatsBase.kldivergence (Method)
kldivergence(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, [b], n::Int)

Obtain a distribution over the Kullback-Leibler divergence between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the Kullback-Leibler divergence between the two length-L draws.

This yields n estimates of the Kullback-Leibler divergence between n independent pairs of realisations of x and y. The n-member distribution of Kullback-Leibler divergence estimates is returned as a vector.

Optionally a real number b can be specified such that the divergence is scaled by 1/log(b).

source
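
A combined sketch for gkldiv and kldivergence, assuming positive-valued uncertain values (here furnished by Gamma distributions) so that logarithms of the draws are defined; the optional base b is passed as documented above:

    using UncertainData, Distributions, StatsBase

    # Positive-valued uncertain values.
    x = [UncertainValue(Gamma, 2.0, 1.0) for _ in 1:10]
    y = [UncertainValue(Gamma, 3.0, 1.0) for _ in 1:10]

    gkl_dist = gkldiv(x, y, 1000)          # generalized KL divergence estimates
    kl_dist  = kldivergence(x, y, 1000)    # KL divergence estimates (natural log)
    kl2_dist = kldivergence(x, y, 2, 1000) # scaled by 1/log(2), per the docstring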