Statistics on single collections of uncertain data

These estimators operate on collections of uncertain values. Each element of such a collection can be an uncertain value of any type, such as populations, theoretical distributions, KDE distributions or fitted distributions.

The methods compute the statistic in question by drawing a length-k realisation of the k-element collection. Realisations are drawn by sampling each uncertain point in the collection independently. The statistic is then computed on either a single such realisation (yielding a single value for the statistic) or over multiple realisations (yielding a distribution of the statistic).

Syntax

The syntax for computing a statistic f on a single uncertain value collection is

f(x::UVAL_COLLECTION_TYPES), which resamples x once, assuming no element-wise dependence between the elements of x.
f(x::UVAL_COLLECTION_TYPES, n::Int, args...; kwargs...), which resamples x n times, assuming no element-wise dependence between the elements of x, then computes the statistic on each of those n independent draws. Returns a distribution of estimates of the statistic.
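The difference between the two call forms can be illustrated without the package itself. The sketch below is only an illustration under stated assumptions: each uncertain value is modelled as a zero-argument sampler closure, and `realise`, `stat_once` and `stat_many` are hypothetical helper names, not part of the library.

```julia
using Random, Statistics

rng = MersenneTwister(42)

# Assumption: a stand-in for an uncertain value collection, where each
# element is a zero-argument closure returning one random draw.
x = [() -> μ + 0.1 * randn(rng) for μ in 1.0:10.0]

# Draw one realisation by sampling every element independently.
realise(x) = [draw() for draw in x]

# Analogue of f(x): the statistic on a single realisation gives one value.
stat_once(f, x) = f(realise(x))

# Analogue of f(x, n): the statistic on n independent realisations
# gives a vector of n estimates.
stat_many(f, x, n::Int) = [f(realise(x)) for _ in 1:n]

single = stat_once(mean, x)
many = stat_many(mean, x, 100)
```

Here `single` is one number, while `many` is a length-100 vector that can be summarised or plotted as a distribution.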

Methods

Mean

Statistics.mean — Method

mean(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the mean of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the mean is computed for each of those length-L realisations, yielding a distribution of mean estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the mean for the realisation.
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the mean of x, which is returned as a vector.

Mode

StatsBase.mode — Method

mode(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the mode of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the mode is computed for each of those length-L realisations, yielding a distribution of mode estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the mode for the realisation (a vector of length L).
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the mode of x, which is returned as a vector.

Quantile

Statistics.quantile — Method

quantile(x::UVAL_COLLECTION_TYPES, q, n::Int)

Obtain a distribution for the quantile(s) q of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the quantile is computed for each of those length-L realisations, yielding a distribution of quantile estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the quantile for the realisation (a vector of length L).
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the quantile of x, which is returned as a vector.
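The steps above can be sketched with the `quantile` function from Julia's Statistics standard library. This is a minimal illustration, assuming uncertain values are represented by sampler closures; `realise` and `quantile_draws` are hypothetical names introduced for the example.

```julia
using Random, Statistics

rng = MersenneTwister(1)

# Assumption: zero-argument sampler closures stand in for uncertain values.
x = [() -> μ + 0.2 * randn(rng) for μ in 1.0:20.0]
realise(x) = [draw() for draw in x]

# Quantile(s) q of each of n independent realisations.
# q may be a single probability or a vector of probabilities.
quantile_draws(x, q, n::Int) = [quantile(realise(x), q) for _ in 1:n]

q90 = quantile_draws(x, 0.9, 50)             # 50 scalar estimates
qs = quantile_draws(x, [0.1, 0.5, 0.9], 50)  # 50 three-element vectors
```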

IQR

StatsBase.iqr — Method

iqr(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the interquartile range (IQR), i.e. the 75th percentile minus the 25th percentile, of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the IQR is computed for each of those length-L realisations, yielding a distribution of IQR estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the IQR for the realisation (a vector of length L).
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the IQR of x, which is returned as a vector.
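As a rough sketch of this procedure, the IQR of each realisation can be written as the difference of two quantiles from the Statistics standard library. The sampler closures and the helper names `realise`, `iqr_of` and `iqr_draws` are assumptions made for illustration only.

```julia
using Random, Statistics

rng = MersenneTwister(2)

# Assumption: zero-argument sampler closures stand in for uncertain values.
x = [() -> μ + 0.2 * randn(rng) for μ in 1.0:20.0]
realise(x) = [draw() for draw in x]

# IQR of one realisation: 75th percentile minus 25th percentile.
iqr_of(v) = quantile(v, 0.75) - quantile(v, 0.25)

# One IQR estimate per independent realisation.
iqr_draws(x, n::Int) = [iqr_of(realise(x)) for _ in 1:n]

iqrs = iqr_draws(x, 100)
```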

Median

Statistics.median — Method

median(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the median of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the median is computed for each of those length-L realisations, yielding a distribution of median estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the median for the realisation.
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the median of x, which is returned as a vector.

Middle

Statistics.middle — Method

middle(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the middle of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the middle is computed for each of those length-L realisations, yielding a distribution of middle estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the middle for the realisation (a vector of length L).
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the middle of x, which is returned as a vector.
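For an array, `Statistics.middle` returns the mean of the minimum and maximum. The sketch below applies it to repeated realisations, assuming sampler closures as stand-ins for uncertain values; `realise` and `middle_draws` are hypothetical helpers.

```julia
using Random, Statistics

rng = MersenneTwister(3)

# Assumption: zero-argument sampler closures stand in for uncertain values.
x = [() -> μ + 0.2 * randn(rng) for μ in 1.0:20.0]
realise(x) = [draw() for draw in x]

# Middle (mean of the extrema) of each of n independent realisations.
middle_draws(x, n::Int) = [middle(realise(x)) for _ in 1:n]

mids = middle_draws(x, 100)
```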

Standard deviation

Statistics.std — Method

std(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the standard deviation of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the standard deviation is computed for each of those length-L realisations, yielding a distribution of standard deviation estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the standard deviation for the realisation (a vector of length L).
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the standard deviation of x, which is returned as a vector.

Variance

Statistics.var — Method

var(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the variance of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the variance is computed for each of those length-L realisations, yielding a distribution of variance estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the variance for the realisation (a vector of length L).
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the variance of x, which is returned as a vector.

Generalized/power mean

StatsBase.genmean — Method

genmean(x::UVAL_COLLECTION_TYPES, p, n::Int)

Obtain a distribution for the generalized/power mean with exponent p of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the generalized mean is computed for each of those length-L realisations, yielding a distribution of generalized mean estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the generalized mean for the realisation (a vector of length L).
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the generalized mean of x, which is returned as a vector.
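The power mean with exponent p of values v_1, ..., v_L is (mean(v_i^p))^(1/p), with the geometric mean as the limiting case p = 0. The sketch below writes this out by hand rather than calling StatsBase, and assumes strictly positive sampler closures as stand-ins for uncertain values. The names `powermean`, `realise` and `genmean_draws` are hypothetical.

```julia
using Random, Statistics

rng = MersenneTwister(4)

# Assumption: sampler closures with values well above zero, so that
# powers and logarithms are well defined.
x = [() -> μ + 0.1 * randn(rng) for μ in 5.0:14.0]
realise(x) = [draw() for draw in x]

# Hand-written power mean; p == 0 is treated as the geometric mean.
powermean(v, p) = p == 0 ? exp(mean(log.(v))) : mean(v .^ p)^(1 / p)

# One power-mean estimate per independent realisation.
genmean_draws(x, p, n::Int) = [powermean(realise(x), p) for _ in 1:n]

quadratic_means = genmean_draws(x, 2, 100)
```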

Generalized variance

StatsBase.genvar — Method

genvar(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the generalized sample variance of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the generalized sample variance is computed for each of those length-L realisations, yielding a distribution of generalized sample variance estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the generalized sample variance for the realisation (a vector of length L).
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the generalized sample variance of x, which is returned as a vector.

Harmonic mean

StatsBase.harmmean — Method

harmmean(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the harmonic mean of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the harmonic mean is computed for each of those length-L realisations, yielding a distribution of harmonic mean estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the harmonic mean for the realisation.
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the harmonic mean of x, which is returned as a vector.
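The harmonic mean of a realisation is the reciprocal of the mean of the reciprocals. A minimal hand-written sketch follows, assuming positive-valued sampler closures as stand-ins for uncertain values; `harmonic`, `realise` and `harmmean_draws` are hypothetical names.

```julia
using Random, Statistics

rng = MersenneTwister(5)

# Assumption: sampler closures with strictly positive values.
x = [() -> μ + 0.1 * randn(rng) for μ in 5.0:14.0]
realise(x) = [draw() for draw in x]

# Harmonic mean: reciprocal of the mean of reciprocals.
harmonic(v) = 1 / mean(1 ./ v)

# One harmonic-mean estimate per independent realisation.
harmmean_draws(x, n::Int) = [harmonic(realise(x)) for _ in 1:n]

hms = harmmean_draws(x, 100)
```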

Geometric mean

StatsBase.geomean — Method

geomean(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the geometric mean of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the geometric mean is computed for each of those length-L realisations, yielding a distribution of geometric mean estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the geometric mean for the realisation.
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the geometric mean of x, which is returned as a vector.

Kurtosis

StatsBase.kurtosis — Method

kurtosis(x::UVAL_COLLECTION_TYPES, n::Int, f = StatsBase.mean)

Obtain a distribution for the kurtosis of a collection of uncertain values.

This is done by first drawing n length-L realisations of x, where L = length(x). Then, the kurtosis is computed for each of those length-L realisations, yielding a distribution of kurtosis estimates.

Optionally, a center function f can be specified. This function computes the center of each draw; that is, the estimate for the i-th draw is StatsBase.kurtosis(draw_i, f(draw_i)).

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the kurtosis for the realisation.
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the kurtosis of x, which is returned as a vector.
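The role of the center function f can be sketched as follows. The example uses a hand-written excess kurtosis (fourth central moment divided by the squared second central moment, minus 3) rather than the StatsBase implementation, and models uncertain values as sampler closures. `excess_kurtosis`, `realise` and `kurtosis_draws` are hypothetical names.

```julia
using Random, Statistics

rng = MersenneTwister(6)

# Assumption: zero-argument sampler closures stand in for uncertain values.
x = [() -> μ + 0.3 * randn(rng) for μ in 1.0:30.0]
realise(x) = [draw() for draw in x]

# Excess kurtosis of v around a given center m.
function excess_kurtosis(v, m)
    z = v .- m
    cm2 = mean(z .^ 2)   # second moment about m
    cm4 = mean(z .^ 4)   # fourth moment about m
    return cm4 / cm2^2 - 3
end

# For each draw, compute the center with f, then the kurtosis around it.
function kurtosis_draws(x, n::Int, f = mean)
    map(1:n) do _
        d = realise(x)
        excess_kurtosis(d, f(d))
    end
end

k_mean = kurtosis_draws(x, 100)            # centered on the mean
k_median = kurtosis_draws(x, 100, median)  # centered on the median
```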

k-th order moment

StatsBase.moment — Method

moment(x::UVAL_COLLECTION_TYPES, k, n::Int)

Obtain a distribution for the k-th order central moment of a collection of uncertain values.

This is done by first drawing n length-L realisations of x, where L = length(x). Then, the k-th order central moment is computed for each of those length-L realisations, yielding a distribution of k-th order central moment estimates.

The procedure is as follows.

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the k-th order central moment for the realisation.
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the k-th order central moment of x, which is returned as a vector.
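The k-th order central moment of a realisation v is the mean of (v_i - mean(v))^k. The following sketch computes it by hand for repeated realisations, assuming sampler closures as stand-ins for uncertain values; `central_moment`, `realise` and `moment_draws` are hypothetical helpers.

```julia
using Random, Statistics

rng = MersenneTwister(7)

# Assumption: zero-argument sampler closures stand in for uncertain values.
x = [() -> μ + 0.2 * randn(rng) for μ in 1.0:20.0]
realise(x) = [draw() for draw in x]

# k-th order central moment of one realisation.
central_moment(v, k) = mean((v .- mean(v)) .^ k)

# One moment estimate per independent realisation.
moment_draws(x, k, n::Int) = [central_moment(realise(x), k) for _ in 1:n]

third_moments = moment_draws(x, 3, 100)
```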

Percentile

StatsBase.percentile — Method

percentile(x::UVAL_COLLECTION_TYPES, p, n::Int)

Obtain a distribution for the percentile(s) p of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the percentile is computed for each of those length-L realisations, yielding a distribution of percentile estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the percentile for the realisation (a vector of length L).
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the percentile of x, which is returned as a vector.

Renyi entropy

StatsBase.renyientropy — Method

renyientropy(x::UVAL_COLLECTION_TYPES, α, n::Int)

Obtain a distribution for the Rényi (generalized) entropy of order α of a collection of uncertain values.

This is done by first drawing n length-L realisations of x, where L = length(x). Then, the generalized entropy is computed for each of those length-L realisations, yielding a distribution of generalized entropy estimates.

The procedure is as follows.

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the Rényi (generalized) entropy of order α for the realisation.
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the Rényi (generalized) entropy of order α of x, which is returned as a vector.

Run-length encoding

StatsBase.rle — Method

rle(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the run-length encoding of a collection of uncertain values.

This is done by first drawing n length-L realisations of x, where L = length(x). Then, the run-length encoding is computed for each of those length-L realisations, yielding a distribution of run-length encoding estimates.

Returns a vector of tuples of run-length encodings.

The procedure is as follows.

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the run-length encoding for the realisation. This gives a tuple, where the first element of the tuple is a vector of values of the input and the second is the number of consecutive occurrences of each element.
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the run-length encoding of x, which is returned as a vector of the run-length encoding tuples.
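A run-length encoding is only informative when consecutive draws can coincide, so the sketch below uses discrete-valued samplers. It implements the encoding by hand instead of calling StatsBase; `run_lengths`, `realise` and `rle_draws` are hypothetical names, and the sampler closures are an assumed stand-in for uncertain values.

```julia
using Random

rng = MersenneTwister(8)

# Assumption: each uncertain value is a sampler over the discrete set {1, 2}.
x = [() -> rand(rng, 1:2) for _ in 1:12]
realise(x) = [draw() for draw in x]

# Run-length encoding: returns (values, run lengths).
function run_lengths(v)
    vals = eltype(v)[]
    lens = Int[]
    for el in v
        if !isempty(vals) && vals[end] == el
            lens[end] += 1      # extend the current run
        else
            push!(vals, el)     # start a new run
            push!(lens, 1)
        end
    end
    return vals, lens
end

# One (values, lengths) tuple per independent realisation.
rle_draws(x, n::Int) = [run_lengths(realise(x)) for _ in 1:n]

encodings = rle_draws(x, 20)
```

The run lengths of every tuple sum to the length of the collection, which is a quick sanity check on the result.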

Standard error of the mean

StatsBase.sem — Method

sem(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the standard error of the mean of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the standard error of the mean is computed for each of those length-L realisations, yielding a distribution of standard error of the mean estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the standard error of the mean for the realisation (a vector of length L).
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the standard error of the mean of x, which is returned as a vector.
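The standard error of the mean of a realisation is its sample standard deviation divided by the square root of its length. A minimal sketch, assuming sampler closures as stand-ins for uncertain values and using hypothetical helper names `sem_of`, `realise` and `sem_draws`:

```julia
using Random, Statistics

rng = MersenneTwister(9)

# Assumption: zero-argument sampler closures stand in for uncertain values.
x = [() -> μ + 0.2 * randn(rng) for μ in 1.0:20.0]
realise(x) = [draw() for draw in x]

# Standard error of the mean of one realisation.
sem_of(v) = std(v) / sqrt(length(v))

# One estimate per independent realisation.
sem_draws(x, n::Int) = [sem_of(realise(x)) for _ in 1:n]

sems = sem_draws(x, 100)
```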

Skewness

StatsBase.skewness — Method

skewness(x::UVAL_COLLECTION_TYPES, n::Int, f = StatsBase.mean)

Obtain a distribution for the skewness of a collection of uncertain values.

This is done by first drawing n length-L realisations of x, where L = length(x). Then, the skewness is computed for each of those length-L realisations, yielding a distribution of skewness estimates.

Optionally, a center function f can be specified. This function computes the center of each draw; that is, the estimate for the i-th draw is StatsBase.skewness(draw_i, f(draw_i)).

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the skewness for the realisation.
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the skewness of x, which is returned as a vector.

Span

StatsBase.span — Method

span(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the span of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the span is computed for each of those length-L realisations, yielding a distribution of span estimates.

Returns a length-n vector of spans, where the i-th span is the range minimum(draw_x_i):maximum(draw_x_i).

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the span for the realisation (a vector of length L).
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the span of x, which is returned as a vector.
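Since each span is a range from the minimum to the maximum of a draw, the result is a vector of ranges rather than numbers. The sketch below builds the range by hand for integer-valued samplers; the closures and the names `span_of`, `realise` and `span_draws` are assumptions for illustration.

```julia
using Random

rng = MersenneTwister(10)

# Assumption: integer-valued sampler closures stand in for uncertain values;
# the k-th element takes values in k:k+2.
x = [() -> rand(rng, k:k+2) for k in 1:10]
realise(x) = [draw() for draw in x]

# Span of one realisation: the range minimum:maximum.
function span_of(v)
    lo, hi = extrema(v)
    return lo:hi
end

# One range per independent realisation.
span_draws(x, n::Int) = [span_of(realise(x)) for _ in 1:n]

spans = span_draws(x, 25)
```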

Summary statistics

StatsBase.summarystats — Method

summarystats(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the summary statistics of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the summary statistics are computed for each of those length-L realisations, yielding a distribution of summary statistics estimates.

Returns a length-n vector of SummaryStats objects, one per draw of x, each containing the mean, minimum, 25th percentile, median, 75th percentile, and maximum of that draw.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the summary statistics for the realisation (a vector of length L).
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the summary statistics of x, which is returned as a vector.

Total variance

StatsBase.totalvar — Method

totalvar(x::UVAL_COLLECTION_TYPES, n::Int)

Obtain a distribution for the total variance of a collection of uncertain values. This is done by first drawing n length-L realisations of x, where L = length(x). Then, the total variance is computed for each of those length-L realisations, yielding a distribution of total variance estimates.

Detailed steps:

First, draw a length-L realisation of x by drawing one random number from each uncertain value furnishing the dataset. The draws are independent, so that no element-wise dependencies (e.g. sequential correlations) that are not already present in the data are introduced in the realisation.
Compute the total variance for the realisation (a vector of length L).
Repeat the procedure n times, drawing n independent realisations of x. This yields n estimates of the total variance of x, which is returned as a vector.