

Applying resampling schemes

Resampling schemes

For some uncertain collections and datasets, special resampling types are available to make resampling easier.

Constrained resampling schemes

Constrained resampling

# UncertainData.Resampling.resample — Method.

resample(x::AbstractUncertainIndexValueDataset, resampling::ConstrainedIndexValueResampling)

Resample x by first constraining the supports of the distributions/populations furnishing the uncertain indices and values, then drawing samples from the limited supports.

Sampling is done without assuming any sequential dependence between the elements of x, so that the draws carry no dependence beyond what may already be present in the collection of values.

Example

using UncertainData, Distributions

# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)

time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)

X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);

###########################
# Define resampling scheme 
###########################

# Truncate each index of x at 0.8 times its standard deviation around the mean
constraints_x_inds = TruncateStd(0.8)

# Truncate each index of y at 1.5 times its standard deviation around the mean
constraints_y_inds = TruncateStd(1.5)

# Truncate each value of x to the central 20% of its distribution (40th to 60th percentile)
constraints_x_vals = [TruncateQuantiles(0.4, 0.6) for i = 1:N];

# Truncate each value of y to the central 80% of its distribution (10th to 90th percentile)
constraints_y_vals = [TruncateQuantiles(0.1, 0.9) for i = 1:N];

cs_x = (constraints_x_inds, constraints_x_vals)
cs_y = (constraints_y_inds, constraints_y_vals)

###########
# Resample 
###########
resample(X, ConstrainedIndexValueResampling(cs_x))
resample(Y, ConstrainedIndexValueResampling(cs_y))
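
The two steps described above (constrain the support, then draw) can be illustrated without the package. The following is a minimal sketch in plain Julia, assuming each uncertain value is Normal(μ, σ) and that the constraint keeps only the interval μ ± k·σ, which is the idea behind TruncateStd(k). The helper `draw_truncated_normal` is hypothetical and not part of UncertainData.jl, and rejection sampling is just one simple way to draw from a limited support.

```julia
using Random

# Hypothetical helper (not part of UncertainData.jl): draw one value from
# Normal(μ, σ) restricted to the interval [μ - k*σ, μ + k*σ] by rejection.
function draw_truncated_normal(rng::AbstractRNG, μ::Real, σ::Real, k::Real)
    lo, hi = μ - k * σ, μ + k * σ
    while true
        v = μ + σ * randn(rng)
        lo <= v <= hi && return v
    end
end

rng = MersenneTwister(1234)
μs = collect(1.0:5.0)      # means of five independent uncertain values
σs = fill(0.5, 5)          # their standard deviations
k = 0.8                    # keep only the part within 0.8 standard deviations

# Each element is drawn on its own, so no dependence between elements is introduced.
draws = [draw_truncated_normal(rng, μ, σ, k) for (μ, σ) in zip(μs, σs)]
```

Every entry of `draws` lies within 0.8 standard deviations of its mean, and the order in which the elements are drawn does not matter.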


Sequential resampling schemes

Sequential

# UncertainData.Resampling.resample — Method.

resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialResampling)

Resample x according to a sequential resampling constraint.

This way of resampling introduces some serial dependence between the elements of x, beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. StrictlyIncreasing) on the i-th value of the dataset restricts what can be sampled for the (i+1)-th value.

Example

using UncertainData, Distributions

# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)

time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)

X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);

# Resample 
seq_resampling = SequentialResampling(StrictlyIncreasing())
resample(X, seq_resampling)
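
To see where the serial dependence comes from, consider the following sketch in plain Julia. It assumes each element has a bounded, uniform support and draws the elements one after another, raising the lower bound of each support to the previous draw. The helper `draw_strictly_increasing` is hypothetical; this greedy scheme only illustrates the idea and is not necessarily how the package enforces StrictlyIncreasing.

```julia
using Random

# Hypothetical helper: draw one strictly increasing sequence, where the i-th
# element is uniform on [los[i], his[i]] before the constraint is applied.
function draw_strictly_increasing(rng::AbstractRNG, los::Vector{Float64}, his::Vector{Float64})
    draws = similar(los)
    prev = -Inf
    for i in eachindex(los)
        lo = max(los[i], prev)   # the previous draw raises the lower bound
        lo < his[i] || error("no admissible value left for element $i")
        v = lo + (his[i] - lo) * rand(rng)
        while v <= prev          # guard against an exact tie with the previous draw
            v = lo + (his[i] - lo) * rand(rng)
        end
        draws[i] = v
        prev = v
    end
    return draws
end

rng = MersenneTwister(42)
centers = collect(1.0:10.0)
los = centers .- 1.5      # overlapping supports: independent draws could come out of order
his = centers .+ 1.5
t = draw_strictly_increasing(rng, los, his)
```

Because the admissible range for element i+1 depends on the value drawn for element i, the draws are no longer independent, even though the original supports were specified independently.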


Sequential and interpolated

# UncertainData.Resampling.resample — Method.

resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialInterpolatedResampling)

Resample x according to a sequential resampling constraint, then interpolate the draw(s) to some specified grid.

This way of resampling introduces some serial dependence between the elements of x, beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. StrictlyIncreasing) on the i-th value of the dataset restricts what can be sampled for the (i+1)-th value.

Example

using UncertainData, Distributions

# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)

time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)

X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);

# Resample 
seqintp_resampling = SequentialInterpolatedResampling(StrictlyIncreasing(), RegularGrid(0:2:N))
resample(X, seqintp_resampling)
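
The interpolation step can be sketched separately from the sampling step. The example below is plain Julia and assumes linear interpolation of a single draw whose indices are already strictly increasing. The helper `interpolate_linear` is hypothetical, and returning NaN outside the range of the drawn indices is a choice made for this sketch, not a statement about what the package does.

```julia
# Hypothetical helper: linearly interpolate the points (t[i], v[i]), with t
# strictly increasing, onto `grid`. Grid points outside [t[1], t[end]] get NaN.
function interpolate_linear(t::Vector{Float64}, v::Vector{Float64}, grid)
    out = fill(NaN, length(grid))
    for (j, g) in enumerate(grid)
        (g < t[1] || g > t[end]) && continue
        i = searchsortedlast(t, g)        # largest i with t[i] <= g
        if i == length(t)
            out[j] = v[end]               # g coincides with the last index
        else
            w = (g - t[i]) / (t[i + 1] - t[i])
            out[j] = (1 - w) * v[i] + w * v[i + 1]
        end
    end
    return out
end

# One strictly increasing draw of the indices and the matching value draws
t = [0.7, 2.1, 2.9, 4.4, 5.2]
v = [1.0, 3.0, 2.0, 5.0, 4.0]
grid = 0:2:6                              # a regular grid with step 2
v_on_grid = interpolate_linear(t, v, grid)
```

The strictly increasing constraint matters here: interpolation between neighbouring points is only well defined when the drawn indices are in order.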


Binned resampling schemes

BinnedResampling

# UncertainData.Resampling.resample — Method.

resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedResampling; nan_threshold = 0.0)

Transform uncertain data that are irregularly spaced in index onto a regular index grid. Distributions in each index bin are obtained by resampling all index values in x resampling.n times and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. In total, length(x)*resampling.n draws are distributed among the bins to form the final KDEs.

Returns an UncertainIndexValueDataset. The distribution of values in the i-th bin is approximated by a kernel density estimate (KDE) over the draws falling in the i-th bin.

Assumes that the points in x are independent.

Example

vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

n_draws = 10000 # draws per uncertain value
time_grid = 0:50:1000
resampling = BinnedResampling(time_grid, n_draws)

# Resample both X and Y so that they are both at the same time indices.
# Separate names keep the second result from overwriting the first.
resampled_X = resample(X, resampling)
resampled_Y = resample(Y, resampling)
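
The binning procedure described above can be sketched in plain Julia as follows. The sketch assumes Normal indices and values, treats the grid as a set of left bin edges with equal width, and discards index draws that fall outside the grid. All of these are assumptions made for illustration. The KDE step is left out: the per-bin vectors in `bins` are what a density estimate would be fitted to, and a per-bin mean is shown as a simpler summary.

```julia
using Random, Statistics

rng = MersenneTwister(7)

# Toy dataset: each point has a Normal index (mean, std) and a Normal value (mean, std).
ind_μ = [5.0, 12.0, 18.0, 27.0, 33.0]
ind_σ = fill(2.0, 5)
val_μ = [1.0, 2.0, 3.0, 4.0, 5.0]
val_σ = fill(0.1, 5)

left_edges = 0:10:30          # bins [0,10), [10,20), [20,30), [30,40)
binwidth = step(left_edges)
n = 1000                      # draws per point, playing the role of resampling.n

# One vector of value draws per bin
bins = [Float64[] for _ in left_edges]
for p in eachindex(ind_μ), _ in 1:n
    t = ind_μ[p] + ind_σ[p] * randn(rng)     # index draw
    v = val_μ[p] + val_σ[p] * randn(rng)     # value draw
    b = floor(Int, (t - first(left_edges)) / binwidth) + 1
    1 <= b <= length(bins) && push!(bins[b], v)   # discard draws outside the grid
end

n_per_bin = length.(bins)
bin_means = [isempty(b) ? NaN : mean(b) for b in bins]
```

A point whose index distribution straddles a bin edge contributes draws to both neighbouring bins, which is how index uncertainty spreads into the binned values.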


BinnedMeanResampling

# UncertainData.Resampling.resample — Method.

resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanResampling)

Transform uncertain data that are irregularly spaced in index onto a regular index grid, then take the mean of the values in each bin.

Distributions in each index bin are obtained by resampling all index values in x resampling.n times, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, length(x)*resampling.n draws are distributed among the bins to form the final mean estimate.

Returns a vector of mean values, one for each bin.

Assumes that the points in x are independent.

Example

vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

n_draws = 10000 # draws per uncertain value
time_grid = 0:50:1000

# Resample both X and Y so that they are both at the same time indices,
# and take the mean of each bin. Separate names keep both results.
resampled_X = resample(X, BinnedMeanResampling(time_grid, n_draws))
resampled_Y = resample(Y, BinnedMeanResampling(time_grid, n_draws))


BinnedWeightedResampling

# UncertainData.Resampling.resample — Method.

resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedWeightedResampling; nan_threshold = 0.0)

Transform uncertain data that are irregularly spaced in index onto a regular index grid. Distributions in each index bin are obtained by resampling all index values in x resampling.n times, with points sampled according to the probabilities in resampling.weights, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. In total, length(x)*resampling.n draws are distributed among the bins to form the final KDEs.

Returns an UncertainIndexValueDataset. The distribution of values in the i-th bin is approximated by a kernel density estimate (KDE) over the draws falling in the i-th bin.

Assumes that the points in x are independent.

Example

vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

left_bin_edges = 0:50:1000
n_draws = 10000 # draws per uncertain value
wts = Weights(rand(length(X))) # some random weights
resampling = BinnedWeightedResampling(left_bin_edges, wts, n_draws)
resampled_dataset = resample(X, resampling)
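
The effect of the weights can be illustrated in isolation. The sketch below is plain Julia and reads "sampled according to probabilities resampling.weights" as meaning that, for each draw, a point is picked with probability proportional to its weight. That reading is an assumption, and the helper `pick_weighted` is hypothetical. After a point is picked, its index and value would be drawn and binned as in the unweighted case.

```julia
using Random

# Hypothetical helper: pick an index in 1:length(w) with probability w[i] / sum(w).
function pick_weighted(rng::AbstractRNG, w::Vector{Float64})
    u = rand(rng) * sum(w)
    acc = 0.0
    for i in eachindex(w)
        acc += w[i]
        u < acc && return i
    end
    return lastindex(w)    # only reached through floating-point round-off
end

rng = MersenneTwister(2024)
w = [0.1, 0.1, 0.8]        # the third point is eight times as likely as each of the others
n_total = 30_000           # stands in for length(x) * resampling.n

counts = zeros(Int, length(w))
for _ in 1:n_total
    counts[pick_weighted(rng, w)] += 1
end
fractions = counts ./ n_total
```

Points with larger weights contribute more draws to the bins, so they dominate the per-bin distributions in proportion to their weight.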


BinnedMeanWeightedResampling

# UncertainData.Resampling.resample — Method.

resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanWeightedResampling)

Transform uncertain data that are irregularly spaced in index onto a regular index grid, then take the mean of the values in each bin. The data points in x are resampled according to resampling.weights.

Distributions in each index bin are obtained by resampling all index values in x resampling.n times, in proportions given by resampling.weights, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, length(x)*resampling.n draws are distributed among the bins to form the final mean estimate.

Returns a vector of mean values, one for each bin.

Assumes that the points in x are independent.

Example

vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

n_draws = 10000 # draws per uncertain value
time_grid = 0:50:1000
wts = Weights(rand(length(X))) # some random weights

# Resample both X and Y so that they are both at the same time indices,
# and take the mean of each bin. Separate names keep both results.
resampled_X = resample(X, BinnedMeanWeightedResampling(time_grid, wts, n_draws))
resampled_Y = resample(Y, BinnedMeanWeightedResampling(time_grid, wts, n_draws))
