Resampling schemes

For some uncertain collections and datasets, special resampling types are available to make resampling easier.

Constrained resampling schemes

Constrained resampling

UncertainData.Resampling.resample - Method
resample(x::AbstractUncertainIndexValueDataset, resampling::ConstrainedIndexValueResampling)

Resample x by first constraining the supports of the distributions/populations furnishing the uncertain indices and values, then drawing samples from the limited supports.

Sampling is done without assuming any sequential dependence between the elements of x, such that no dependence is introduced in the draws beyond what is potentially already present in the collection of values.
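To make the idea concrete, here is a minimal sketch using only the Julia standard library. It is not the package's implementation: the helper draw_truncated_normal is hypothetical, and it assumes normally distributed indices truncated at k standard deviations around the mean (the kind of constraint TruncateStd expresses). Each element is drawn on its own, so no draw depends on any other.

```julia
using Random

# Hypothetical helper (not part of UncertainData.jl): draw one value from
# Normal(μ, σ) restricted to [μ - k*σ, μ + k*σ] using rejection sampling.
function draw_truncated_normal(rng::AbstractRNG, μ::Real, σ::Real, k::Real)
    while true
        z = randn(rng)                    # standard normal draw
        abs(z) <= k && return μ + σ * z   # accept only draws inside the limited support
    end
end

rng = MersenneTwister(1234)
μs = collect(1.0:5.0)   # means of five uncertain indices
σ, k = 1.0, 0.8         # common standard deviation and truncation width

# Every element is sampled independently of all the others.
draws = [draw_truncated_normal(rng, μ, σ, k) for μ in μs]
```

Because the supports are limited before sampling, every draw is guaranteed to lie within k*σ of its mean.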

Example

using UncertainData, Distributions

# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)

time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)

X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);

###########################
# Define resampling scheme 
###########################

# Truncate each of the indices for x at 0.8 times their standard deviation around the mean
constraints_x_inds = TruncateStd(0.8)

# Truncate each of the indices for y at 1.5 times their standard deviation around the mean
constraints_y_inds = TruncateStd(1.5)

# Truncate each of the values of x to the central 20% quantile range (0.4 to 0.6 quantiles)
constraints_x_vals = [TruncateQuantiles(0.4, 0.6) for i = 1:N];

# Truncate each of the values of y to the central 80% quantile range (0.1 to 0.9 quantiles)
constraints_y_vals = [TruncateQuantiles(0.1, 0.9) for i = 1:N];

cs_x = (constraints_x_inds, constraints_x_vals)
cs_y = (constraints_y_inds, constraints_y_vals)

###########
# Resample 
###########
resample(X, ConstrainedIndexValueResampling(cs_x))
resample(Y, ConstrainedIndexValueResampling(cs_y))

Sequential resampling schemes

Sequential

UncertainData.Resampling.resample - Method
resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialResampling)

Resample x according to a sequential resampling constraint.

This way of resampling introduces some serial dependence between the elements of x, beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. StrictlyIncreasing) on the i-th value of the dataset restricts what can be sampled for the (i+1)-th value.
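The serial dependence can be illustrated with a small standard-library sketch. The helper draw_strictly_increasing is hypothetical and uses naive rejection sampling on normally distributed elements; the package may enforce the constraint differently (for example by truncating supports beforehand), so treat this only as an illustration of why the i-th draw limits the (i+1)-th.

```julia
using Random

# Hypothetical helper (not part of UncertainData.jl): draw one realisation in
# which every element is strictly larger than the element before it.
function draw_strictly_increasing(rng::AbstractRNG, μs::AbstractVector{<:Real},
        σ::Real; maxtries::Int = 10_000)
    draws = Vector{Float64}(undef, length(μs))
    lower = -Inf                           # lower bound imposed by the previous draw
    for (i, μ) in enumerate(μs)
        accepted = false
        for _ in 1:maxtries
            candidate = μ + σ * randn(rng)
            if candidate > lower           # the previous draw restricts this one
                draws[i] = candidate
                lower = candidate
                accepted = true
                break
            end
        end
        accepted || error("no admissible draw found for element $i")
    end
    return draws
end

rng = MersenneTwister(42)
seq = draw_strictly_increasing(rng, 1:10, 1.0)
```

A large draw early in the sequence shrinks the admissible region for all later elements, which is exactly the extra dependence described above.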

Example

using UncertainData, Distributions

# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)

time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)

X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);

# Resample 
seq_resampling = SequentialResampling(StrictlyIncreasing())
resample(X, seq_resampling)

Sequential and interpolated

UncertainData.Resampling.resample - Method
resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialInterpolatedResampling)

Resample x according to a sequential resampling constraint, then interpolate the draw(s) to some specified grid.

This way of resampling introduces some serial dependence between the elements of x, beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. StrictlyIncreasing) on the i-th value of the dataset restricts what can be sampled for the (i+1)-th value.
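The interpolation step can be sketched as follows, assuming a draw whose indices are already strictly increasing. The helper interpolate_linear is hypothetical and written with the standard library only; leaving grid points outside the drawn index range as NaN is an assumption of this sketch, not a statement about the package's behaviour.

```julia
# Hypothetical helper (not part of UncertainData.jl): linearly interpolate the
# points (t, v) onto `grid`. Assumes `t` is strictly increasing.
function interpolate_linear(t::AbstractVector{<:Real}, v::AbstractVector{<:Real}, grid)
    out = fill(NaN, length(grid))
    for (j, g) in enumerate(grid)
        (g < first(t) || g > last(t)) && continue   # outside the draw: leave NaN
        hi = searchsortedfirst(t, g)                # first position with t[hi] >= g
        if t[hi] == g
            out[j] = v[hi]                          # grid point coincides with a drawn index
        else
            lo = hi - 1
            w = (g - t[lo]) / (t[hi] - t[lo])       # relative position between neighbours
            out[j] = (1 - w) * v[lo] + w * v[hi]
        end
    end
    return out
end

# One strictly increasing index draw with its corresponding value draw
t_draw = [0.5, 1.7, 3.2, 4.1, 6.0]
v_draw = [1.0, 2.0, 0.0, 4.0, 2.0]

grid = 0:2:6
v_grid = interpolate_linear(t_draw, v_draw, grid)
```

Repeating this for many draws gives an ensemble of realisations that all share the same regular index grid.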

Example

using UncertainData, Distributions

# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)

time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)

X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);

# Resample 
seqintp_resampling = SequentialInterpolatedResampling(StrictlyIncreasing(), RegularGrid(0:2:N))
resample(X, seqintp_resampling)

Binned resampling schemes

BinnedResampling

Missing docstring for resample(::AbstractUncertainIndexValueDataset, ::BinnedResampling).

BinnedMeanResampling

UncertainData.Resampling.resample - Method
resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanResampling)

Transform uncertain data that are irregularly spaced along the index axis onto a regular index grid and take the mean of the values in each bin.

Distributions in each index bin are obtained by resampling all index values in x resampling.n times, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, length(x)*resampling.n draws are distributed among the bins to form the final mean estimate.

Returns a vector of mean values, one for each bin.

Assumes that the points in x are independent.
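The procedure described above can be sketched with the standard library alone. The helper binned_means is hypothetical and assumes normally distributed indices and values with common standard deviations; bins are taken as half-open intervals between consecutive left edges, and empty bins are reported as NaN, both of which are assumptions of this sketch.

```julia
using Random, Statistics

# Hypothetical helper (not part of UncertainData.jl): draw every point n times,
# map each index draw to its bin, and average the value draws in each bin.
function binned_means(rng::AbstractRNG, index_μ, index_σ, value_μ, value_σ, left_edges, n)
    nbins = length(left_edges) - 1
    bins = [Float64[] for _ in 1:nbins]
    for p in eachindex(index_μ), _ in 1:n
        t = index_μ[p] + index_σ * randn(rng)    # index draw for point p
        v = value_μ[p] + value_σ * randn(rng)    # value draw for point p
        b = searchsortedlast(left_edges, t)      # bin whose left edge is the last one <= t
        1 <= b <= nbins && push!(bins[b], v)     # draws outside the grid are discarded
    end
    return [isempty(b) ? NaN : mean(b) for b in bins]
end

rng = MersenneTwister(7)
index_μ = collect(5.0:10.0:95.0)   # ten points along the index axis
value_μ = fill(2.0, 10)            # all values centred at 2.0
means = binned_means(rng, index_μ, 1.0, value_μ, 0.1, 0:20:100, 1000)
```

Here length(index_μ) * n = 10_000 draws are distributed among five bins, and one mean per bin is returned.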

Example

vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

n_draws = 10000 # draws per uncertain value
time_grid = 0:50:1000

# Resample both X and Y so that they are both at the same time indices, 
# and take the mean of each bin.
resampled_X = resample(X, BinnedMeanResampling(time_grid, n_draws))
resampled_Y = resample(Y, BinnedMeanResampling(time_grid, n_draws))

BinnedWeightedResampling

UncertainData.Resampling.resample - Method
resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedWeightedResampling;
    nan_threshold = 0.0)

Transform uncertain data that are irregularly spaced along the index axis onto a regular index grid.

Distributions in each index bin are obtained by resampling all index values in x resampling.n times, sampled according to probabilities resampling.weights, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. In total, length(x)*resampling.n draws are distributed among the bins to form the final KDEs.

Returns an UncertainIndexValueDataset. The distribution of values in the i-th bin is approximated by a kernel density estimate (KDE) over the draws falling in the i-th bin.

Assumes that the points in x are independent.
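One plausible reading of "sampled according to probabilities resampling.weights" is that points are selected with probability proportional to their weight. The sketch below illustrates only that selection step with the standard library; weighted_pick is a hypothetical helper, and how the package actually applies the weights is not confirmed here.

```julia
using Random

# Hypothetical helper (not part of UncertainData.jl): pick an element position
# with probability proportional to its weight (inverse-CDF sampling).
function weighted_pick(rng::AbstractRNG, weights::AbstractVector{<:Real})
    u = (1 - rand(rng)) * sum(weights)   # uniform on (0, sum(weights)]
    c = cumsum(weights)                  # cumulative weights
    return min(searchsortedfirst(c, u), length(weights))
end

rng = MersenneTwister(3)
weights = [0.0, 1.0, 3.0]                # the first point is never selected

picks = [weighted_pick(rng, weights) for _ in 1:10_000]
counts = [count(==(i), picks) for i in 1:3]
```

Points with larger weights contribute more draws to the bins, so they have more influence on the resulting per-bin distributions.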

Example

vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

left_bin_edges = 0:50:1000
n_draws = 10 # draws per uncertain value
wts = Weights(rand(length(X)))
resampling = BinnedWeightedResampling(left_bin_edges, wts, n_draws)
resampled_dataset = resample(X, resampling)

BinnedMeanWeightedResampling

UncertainData.Resampling.resample - Method
resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanWeightedResampling)

Transform uncertain data that are irregularly spaced along the index axis onto a regular index grid and take the mean of the values in each bin. Resamples the data points in x according to resampling.weights.

Distributions in each index bin are obtained by resampling all index values in x resampling.n times, in proportions obeying resampling.weights and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, length(x)*resampling.n draws are distributed among the bins to form the final mean estimate.

Returns a vector of mean values, one for each bin.

Assumes that the points in x are independent.

Example

vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

n_draws = 10000 # draws per uncertain value
time_grid = 0:50:1000
wts = Weights(rand(length(X))) # some random weights

# Resample both X and Y so that they are both at the same time indices, 
# and take the mean of each bin.
resampled_X = resample(X, BinnedMeanWeightedResampling(time_grid, wts, n_draws))
resampled_Y = resample(Y, BinnedMeanWeightedResampling(time_grid, wts, n_draws))

Interpolated-and-binned resampling

InterpolateAndBin resampling

UncertainData.Resampling.resample - Method
resample(udata::AbstractUncertainIndexValueDataset, regularization_scheme::InterpolateAndBin{Linear})

Draw a single realisation of udata and interpolate-and-bin the data according to the provided regularization scheme. Assumes points in udata are independent and sorts the draw according to the index values before interpolating. See also InterpolateAndBin.

Example

using UncertainData, Distributions, StatsBase, Statistics, Interpolations, Plots

npts = 50
y = rand(npts) 

N = Normal(0, 1)

for t in 3:npts
    y[t,1] = 0.7*y[t-1,1] - 0.35*y[t-2,1] + rand(N)
end

# Assume data are unevenly spaced 
time = sample(1.0:npts*5, npts, ordered = true, replace = false)

# Assign some uncertainties to both time indices and values and gather 
# in an UncertainIndexValueDataset
utime = UncertainValue.(Normal.(time, 2))
uy = UncertainValue.(Normal.(y, 0.1))
udata = UncertainIndexValueDataset(utime, uy)

# Interpolation-and-binning scheme. First interpolate to a very fine grid,
# then gather the points falling in each of the coarser bins and summarise 
# each bin using the mean of the points in each bin.
left_bin_edges = 0:10:npts*5
r = InterpolateAndBin(mean, left_bin_edges, Linear(), 0:0.1:1000, Flat(OnGrid()))

# The binned time axis:
time_binned = left_bin_edges[1:end-1] .+ step(left_bin_edges)/2

# Get a set of corresponding resampled (interpolated+binned) values
y_binned = resample(udata, r)

# Plot some interpolated+binned draws

p = plot(xlabel = "time", ylabel = "value")
for i = 1:100
    plot!(time_binned, resample(udata, r), lw = 0.3, α = 0.2, ms = 0.1, c = :red, 
        marker = stroke(0.1), label = "")
end
plot!(time, y, c = :black, lw = 1, ms = 2, marker = stroke(2.0, :black), label = "")
plot!(udata, c = :black, lw = 1, ms = 2, marker = stroke(0.1, :black), [0.05, 0.95], [0.05, 0.95])
vline!(left_bin_edges, c = :black, α = 0.3, lw = 0.3, label = "")