Pairwise statistics on uncertain data collections

These estimators operate on pairs of uncertain value collections. Each element of such a collection can be an uncertain value of any type, such as populations, theoretical distributions, KDE distributions or fitted distributions.

The methods compute the statistic in question by drawing a length-L realisation of each of the L-element collections. Realisations are drawn by sampling each uncertain point in the collections independently. The statistic is then computed either on a single pair of such realisations (yielding a single value for the statistic) or on multiple pairs of realisations (yielding a distribution of the statistic).

Within each collection, points are always sampled independently according to their furnishing distributions, unless sampling constraints are provided (not yet implemented).

Syntax

The syntax for estimating a statistic f on uncertain value collections x and y is

  • f(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, args..., n::Int; kwargs...), which draws n independent pairs of realisations of x and y, then estimates the statistic f for each pair, yielding an n-member distribution of the statistic (see the sketch below).
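
For example, a minimal sketch in Julia, assuming the UncertainValue constructor from UncertainData.jl (not documented in this section) and plain vectors of uncertain values as the collections:

    using UncertainData, Distributions, Statistics

    # Two hypothetical collections of uncertain values, each furnished by a
    # theoretical distribution.
    x = [UncertainValue(Normal, 0.0, 0.5),
         UncertainValue(Normal, 1.0, 0.3),
         UncertainValue(Uniform, -1.0, 1.0),
         UncertainValue(Normal, 2.0, 0.4)]
    y = [UncertainValue(Normal, 0.2, 0.4),
         UncertainValue(Gamma, 2.0, 1.0),
         UncertainValue(Normal, -0.5, 0.2),
         UncertainValue(Uniform, 1.0, 3.0)]

    # 1000 independent pairs of realisations yield a 1000-member
    # distribution of correlation estimates, returned as a vector.
    cor_dist = cor(x, y, 1000)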

Methods

Covariance

Statistics.cov (Method)
cov(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int; corrected::Bool = true)

Obtain a distribution on the covariance between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the covariance between the two length-L draws.

This yields n estimates of the covariance between n independent pairs of realisations of x and y. The n-member distribution of covariance estimates is returned as a vector.

If corrected is true (the default), then for each pair of draws the sum is scaled with L - 1, whereas the sum is scaled with L if corrected is false, where L = length(x).

source
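
A hedged usage sketch for this method, assuming the UncertainValue constructor from UncertainData.jl and vectors of uncertain values as the collections:

    using UncertainData, Distributions, Statistics

    x = [UncertainValue(Normal, μ, 0.3) for μ in rand(10)]
    y = [UncertainValue(Normal, μ, 0.3) for μ in rand(10)]

    # 500-member distribution of covariance estimates (bias-corrected by default)
    cov_dist = cov(x, y, 500)

    # Uncorrected estimates: sums scaled by L instead of L - 1
    cov_dist_uncorrected = cov(x, y, 500, corrected = false)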

Correlation (Pearson)

Statistics.cor (Method)
cor(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Estimate a distribution on Pearson's correlation coefficient between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute Pearson's correlation coefficient between the two length-L draws.

This yields n estimates of Pearson's correlation coefficient between n independent pairs of realisations of x and y. The n-member distribution of correlation estimates is returned as a vector.

source
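
A sketch of how the returned distribution might be summarised, under the same assumptions about constructing the collections:

    using UncertainData, Distributions, Statistics

    x = [UncertainValue(Normal, μ, 0.3) for μ in rand(10)]
    y = [UncertainValue(Normal, μ, 0.3) for μ in rand(10)]

    r = cor(x, y, 2000)          # 2000-member distribution of Pearson correlations
    mean(r), std(r)              # summarise the sampling variability
    quantile(r, [0.025, 0.975])  # a 95% interval for the correlation estimate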

Correlation (Kendall)

StatsBase.corkendall (Method)
corkendall(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Estimate an n-member distribution on Kendall's rank correlation coefficient between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute Kendall's rank correlation coefficient between the two length-L draws.

This yields n computations of Kendall's rank correlation coefficient between n independent pairs of realisations of x and y. The n-member distribution of correlation estimates is returned as a vector.

source

Correlation (Spearman)

StatsBase.corspearman (Method)
corspearman(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Estimate an n-member distribution on Spearman's rank correlation coefficient between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute Spearman's rank correlation coefficient between the two length-L draws.

This yields n estimates of Spearman's rank correlation coefficient between n independent pairs of realisations of x and y. The n-member distribution of correlation estimates is returned as a vector.

source
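
A combined sketch for the two rank correlations above (corkendall and corspearman), under the same assumptions about constructing the collections:

    using UncertainData, Distributions, StatsBase

    # Monotonically related means, so both rank correlation distributions
    # should concentrate near 1.
    x = [UncertainValue(Normal, μ, 0.1) for μ in 1.0:10.0]
    y = [UncertainValue(Normal, μ^2, 0.1) for μ in 1.0:10.0]

    τ_dist = corkendall(x, y, 1000)
    ρ_dist = corspearman(x, y, 1000)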

Count non-equal

StatsBase.countne (Method)
countne(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Estimate an n-member distribution on the number of indices at which the elements of two collections of uncertain values are not equal.

This is done by repeating the following procedure n times:

  1. Draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Draw a length-L realisation of y in the same manner.
  3. Count the number of indices at which the elements of the two length-L draws are not equal.

This yields n counts of non-equal values between n pairs of independent realisations of x and y. The n-member distribution of nonequal-value counts is returned as a vector.

source

Count equal

StatsBase.counteq (Method)
counteq(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Estimate an n-member distribution on the number of indices at which the elements of two collections of uncertain values are equal.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Count the number of indices at which the elements of the two length-L draws are equal.

This yields n counts of equal values between n pairs of independent realisations of x and y. The n-member distribution of equal-value counts is returned as a vector.

source
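
A sketch for counteq and countne together, under the same assumptions; note the behaviour for continuous furnishing distributions:

    using UncertainData, Distributions, StatsBase

    x = [UncertainValue(Normal, μ, 0.2) for μ in rand(5)]
    y = [UncertainValue(Normal, μ, 0.2) for μ in rand(5)]

    # With continuous furnishing distributions, two independently sampled
    # values are almost surely never exactly equal, so counteq is typically
    # all zeros and countne typically equals the collection length (5).
    counteq(x, y, 100)
    countne(x, y, 100)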

Maximum absolute deviation

StatsBase.maxad (Method)
maxad(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution over the maximum absolute deviation between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the maximum absolute deviation between the two length-L draws.

This yields n estimates of the maximum absolute deviation between n independent pairs of realisations of x and y. The n-member distribution of maximum absolute deviation estimates is returned as a vector.

source

Mean absolute deviation

StatsBase.meanad (Method)
meanad(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution over the mean absolute deviation between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the mean absolute deviation between the two length-L draws.

This yields n estimates of the mean absolute deviation between n independent pairs of realisations of x and y. The n-member distribution of mean absolute deviation estimates is returned as a vector.

source

Mean squared deviation

StatsBase.msd (Method)
msd(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution over the mean squared deviation between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the mean squared deviation between the two length-L draws.

This yields n estimates of the mean squared deviation between n independent pairs of realisations of x and y. The n-member distribution of mean squared deviation estimates is returned as a vector.

source
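
A combined sketch for the three deviation measures above (maxad, meanad, msd), under the same assumptions about constructing the collections:

    using UncertainData, Distributions, StatsBase

    x = [UncertainValue(Normal, μ, 0.1) for μ in rand(10)]
    y = [UncertainValue(Normal, μ, 0.1) for μ in rand(10)]

    maxad_dist  = maxad(x, y, 1000)   # maximum absolute deviations
    meanad_dist = meanad(x, y, 1000)  # mean absolute deviations
    msd_dist    = msd(x, y, 1000)     # mean squared deviations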

Peak signal-to-noise ratio

StatsBase.psnr (Method)
psnr(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, maxv, n::Int)

Obtain a distribution over the peak signal-to-noise ratio (PSNR) between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the PSNR between the two length-L draws.

This yields n estimates of the peak signal-to-noise ratio between n independent pairs of realisations of x and y. The n-member distribution of PSNR estimates is returned as a vector.

The PSNR is computed as 10 * log10(maxv^2 / msd(x_draw, y_draw)), where maxv is the maximum possible value x or y can take.

source
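
A sketch, assuming values roughly confined to [0, 1] so that maxv = 1.0 is a reasonable choice, and the same assumptions about constructing the collections:

    using UncertainData, Distributions, StatsBase

    x = [UncertainValue(Normal, μ, 0.05) for μ in rand(10)]
    y = [UncertainValue(Normal, μ, 0.05) for μ in rand(10)]

    # maxv = 1.0, 1000 resampled pairs
    psnr_dist = psnr(x, y, 1.0, 1000)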

Root mean squared deviation

StatsBase.rmsd (Method)
rmsd(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int, normalize = false)

Obtain a distribution over the root mean squared deviation between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the root mean squared deviation between the two length-L draws.

This yields n estimates of the root mean squared deviation between n independent pairs of realisations of x and y. The n-member distribution of root mean squared deviation estimates is returned as a vector.

The root mean squared deviation is computed as sqrt(msd(x_draw, y_draw)) at each iteration. Optionally, the result may be normalised by setting normalize = true.

source
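
A sketch under the same assumptions; for any single pair of draws, the RMSD is simply the square root of the corresponding mean squared deviation:

    using UncertainData, Distributions, StatsBase

    x = [UncertainValue(Normal, μ, 0.1) for μ in rand(10)]
    y = [UncertainValue(Normal, μ, 0.1) for μ in rand(10)]

    # 1000-member distribution of RMSD estimates; normalisation is
    # controlled by the normalize argument documented above.
    rmsd_dist = rmsd(x, y, 1000)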

Squared L2 distance

StatsBase.sqL2dist (Method)
sqL2dist(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution over the squared L2 distance between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the squared L2 distance between the two length-L draws.

This yields n estimates of the squared L2 distance between n independent pairs of realisations of x and y. The n-member distribution of squared L2 distance estimates is returned as a vector.

The squared L2 distance between a pair of length-L draws is computed as $\sum_{i=1}^{L} |x_i - y_i|^2$.

source

Cross correlation

StatsBase.crosscor (Method)
crosscor(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, [lags], n::Int; demean = true)

Obtain a distribution over the cross correlation between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the cross correlation between the two length-L draws.

This yields n estimates of the cross correlation between n independent pairs of realisations of x and y. The n-member distribution of cross correlation estimates is returned as a vector.

demean specifies whether, at each iteration, the respective means of the draws should be subtracted from them before computing their cross correlation.

When left unspecified, the lags used are -min(lx - 1, 10*log10(lx)) to min(lx - 1, 10*log10(lx)), where lx is the length of the collections (and hence of each draw).

The output is normalized by sqrt(var(x_draw)*var(y_draw)). See crosscov for the unnormalized form.

source

Cross covariance

StatsBase.crosscov (Method)
crosscov(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, [lags], n::Int; demean = true)

Obtain a distribution over the cross covariance function (CCF) between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the CCF between the two length-L draws.

This yields n estimates of the CCF between n independent pairs of realisations of x and y. The n-member distribution of CCF estimates is returned as a vector.

demean specifies whether, at each iteration, the respective means of the draws should be subtracted from them before computing their CCF.

When left unspecified, the lags used are -min(lx - 1, 10*log10(lx)) to min(lx - 1, 10*log10(lx)), where lx is the length of the collections (and hence of each draw).

The output is not normalized. See crosscor for a function with normalization.

source
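
A combined sketch for crosscor and crosscov, under the same assumptions about constructing the collections, and additionally assuming that an integer range such as 0:5 is accepted for the documented [lags] argument:

    using UncertainData, Distributions, StatsBase

    # Uncertain values whose means trace out phase-shifted oscillations.
    x = [UncertainValue(Normal, μ, 0.1) for μ in sin.(0.0:0.5:10.0)]
    y = [UncertainValue(Normal, μ, 0.1) for μ in cos.(0.0:0.5:10.0)]

    # Normalised cross correlation and unnormalised cross covariance at
    # lags 0 to 5, for 1000 resampled pairs each.
    cc_dist  = crosscor(x, y, 0:5, 1000)
    ccv_dist = crosscov(x, y, 0:5, 1000, demean = true)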

Generalized Kullback-Leibler divergence

StatsBase.gkldiv (Method)
gkldiv(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution over the generalized Kullback-Leibler divergence between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the generalized Kullback-Leibler divergence between the two length-L draws.

This yields n estimates of the generalized Kullback-Leibler divergence between n independent pairs of realisations of x and y. The n-member distribution of generalized Kullback-Leibler divergence estimates is returned as a vector.

source

Kullback-Leibler divergence

StatsBase.kldivergence (Method)
kldivergence(x::UVAL_COLLECTION_TYPES, y::UVAL_COLLECTION_TYPES, [b], n::Int)

Obtain a distribution over the Kullback-Leibler divergence between two collections of uncertain values.

This is done by repeating the following procedure n times:

  1. First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
  2. Second, draw a length-L realisation of y in the same manner.
  3. Compute the Kullback-Leibler divergence between the two length-L draws.

This yields n estimates of the Kullback-Leibler divergence between n independent pairs of realisations of x and y. The n-member distribution of Kullback-Leibler divergence estimates is returned as a vector.

Optionally a real number b can be specified such that the divergence is scaled by 1/log(b).

source
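
A combined sketch for gkldiv and kldivergence, assuming positive-valued uncertain values (here furnished by Gamma distributions) so that logarithms of the draws are defined; the optional base b is passed as documented above:

    using UncertainData, Distributions, StatsBase

    # Positive-valued uncertain values.
    x = [UncertainValue(Gamma, 2.0, 1.0) for _ in 1:10]
    y = [UncertainValue(Gamma, 3.0, 1.0) for _ in 1:10]

    gkl_dist = gkldiv(x, y, 1000)          # generalized KL divergence estimates
    kl_dist  = kldivergence(x, y, 1000)    # KL divergence estimates (natural log)
    kl2_dist = kldivergence(x, y, 2, 1000) # scaled by 1/log(2), per the docstring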