Resampling schemes
For some uncertain collections and datasets, special resampling types are available to make resampling easier.
Constrained resampling schemes
Constrained resampling
UncertainData.Resampling.resample — Method
resample(x::AbstractUncertainIndexValueDataset, resampling::ConstrainedIndexValueResampling)
Resample x by first constraining the supports of the distributions/populations furnishing the uncertain indices and values, then drawing samples from the limited supports.
Sampling is done without assuming any sequential dependence between the elements of x, such that no dependence is introduced in the draws beyond what is potentially already present in the collection of values.
Example
# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)
time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)
X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);
###########################
# Define resampling scheme
###########################
# Truncate each of the indices for x at 0.8 times their standard deviation around the mean
constraints_x_inds = TruncateStd(0.8)
# Truncate each of the indices for y at 1.5 times their standard deviation around the mean
constraints_y_inds = TruncateStd(1.5)
# Truncate each of the values of x to the central 20% of its distribution (40th to 60th percentile)
constraints_x_vals = [TruncateQuantiles(0.4, 0.6) for i = 1:N];
# Truncate each of the values of y to the central 80% of its distribution (10th to 90th percentile)
constraints_y_vals = [TruncateQuantiles(0.1, 0.9) for i = 1:N];
cs_x = (constraints_x_inds, constraints_x_vals)
cs_y = (constraints_y_inds, constraints_y_vals)
###########
# Resample
###########
resample(X, ConstrainedIndexValueResampling(cs_x))
resample(Y, ConstrainedIndexValueResampling(cs_y))
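To make the "constrain the support, then draw" idea concrete, here is a minimal sketch using only the Julia standard library. It is not how UncertainData.jl implements truncation; the helper draw_truncated_normal is hypothetical and uses plain rejection sampling to mimic what a TruncateStd(0.8) constraint means for normally distributed indices.

```julia
using Random

# Hypothetical helper (not part of UncertainData.jl): draw one value from a
# Normal(μ, σ) whose support is truncated to μ ± k*σ, using rejection sampling.
function draw_truncated_normal(rng::AbstractRNG, μ::Real, σ::Real, k::Real)
    lo, hi = μ - k * σ, μ + k * σ
    while true
        v = μ + σ * randn(rng)
        lo <= v <= hi && return v
    end
end

rng = MersenneTwister(1234)
μs = collect(1.0:10.0)   # means of ten uncertain time indices
σ, k = 1.0, 0.8          # unit standard deviation, truncation at 0.8σ

# Each element is drawn independently of all the others.
draw = [draw_truncated_normal(rng, μ, σ, k) for μ in μs]
```

Because the truncated supports of neighbouring indices still overlap here, an independent draw like this is not guaranteed to be ordered in time.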
Sequential resampling schemes
Sequential
UncertainData.Resampling.resample — Method
resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialResampling)
Resample x according to a sequential resampling constraint.
This way of resampling introduces some serial dependence between the elements of x beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. StrictlyIncreasing) on the i-th value of the dataset constrains what can be sampled from the (i+1)-th value.
Example
# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)
time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)
X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);
# Resample
seq_resampling = SequentialResampling(StrictlyIncreasing())
resample(X, seq_resampling)
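The serial dependence described above can be illustrated with a naive forward scheme, written with the standard library only. This is a sketch under stated assumptions, not the algorithm used by SequentialResampling: the helper draw_strictly_increasing is hypothetical, and each element is assumed to be uniformly distributed on an interval.

```julia
using Random

# Hypothetical helper: draw a strictly increasing sequence, where element i is
# uniform on [lo[i], hi[i]] restricted to values above the previous draw.
function draw_strictly_increasing(rng::AbstractRNG, lo::Vector{Float64}, hi::Vector{Float64})
    draws = similar(lo)
    prev = -Inf
    for i in eachindex(lo)
        a = max(lo[i], prev)   # lower bound tightened by the previous draw
        a < hi[i] || error("no admissible value left for element $i")
        draws[i] = a + (hi[i] - a) * (1 - rand(rng))   # value in (a, hi[i]]
        prev = draws[i]
    end
    return draws
end

rng = MersenneTwister(42)
centers = collect(1.0:20.0)
lo, hi = centers .- 1.5, centers .+ 1.5   # overlapping supports
seq = draw_strictly_increasing(rng, lo, hi)
```

The line computing a is where the dependence enters: the admissible range for element i shrinks whenever the previous draw landed inside its support.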
Sequential and interpolated
UncertainData.Resampling.resample — Method
resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialInterpolatedResampling)
Resample x according to a sequential resampling constraint, then interpolate the draw(s) to some specified grid.
This way of resampling introduces some serial dependence between the elements of x beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. StrictlyIncreasing) on the i-th value of the dataset constrains what can be sampled from the (i+1)-th value.
Example
# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)
time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)
X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);
# Resample
seqintp_resampling = SequentialInterpolatedResampling(StrictlyIncreasing(), RegularGrid(0:2:N))
resample(X, seqintp_resampling)
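The interpolation step can be sketched as follows, assuming linear interpolation and a draw whose indices are already strictly increasing. The helper lininterp and the numbers are hypothetical and only illustrate why the sequential constraint is applied first: interpolation onto a grid needs monotone indices.

```julia
# Hypothetical helper: linearly interpolate the points (t[i], v[i]), with t
# strictly increasing, at the query point tq (which must lie in [t[1], t[end]]).
function lininterp(t::AbstractVector, v::AbstractVector, tq::Real)
    t[1] <= tq <= t[end] || throw(DomainError(tq, "outside the sampled index range"))
    j = clamp(searchsortedlast(t, tq), 1, length(t) - 1)
    w = (tq - t[j]) / (t[j+1] - t[j])
    return (1 - w) * v[j] + w * v[j+1]
end

# One sequential draw: strictly increasing indices with their values.
t_draw = [0.3, 1.9, 4.2, 5.1, 8.0]
v_draw = [1.0, 3.0, 2.0, 4.0, 0.0]

grid = 1:2:7   # regular grid inside the range covered by the draw
v_on_grid = [lininterp(t_draw, v_draw, g) for g in grid]
```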
Binned resampling schemes
BinnedResampling
Missing docstring for resample(::AbstractUncertainIndexValueDataset, ::BinnedResampling).
BinnedMeanResampling
UncertainData.Resampling.resample — Method
resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanResampling)
Transform index-irregularly spaced uncertain data onto a regular index-grid and take the mean of the values in each bin.
Distributions in each index bin are obtained by resampling all index values in x resampling.n times, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, length(x)*resampling.n draws are distributed among the bins to form the final mean estimate.
Returns a vector of mean values, one for each bin.
Assumes that the points in x are independent.
Example
vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)
X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);
n_draws = 10000 # draws per uncertain value
time_grid = 0:50:1000
# Resample both X and Y onto the same time grid and take the mean of each bin.
# Separate variable names keep the second result from overwriting the first.
binned_means_X = resample(X, BinnedMeanResampling(time_grid, n_draws))
binned_means_Y = resample(Y, BinnedMeanResampling(time_grid, n_draws))
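The pooling-and-binning procedure described in the docstring can be sketched with the standard library alone. This is an illustration under assumptions, not the package's implementation: binned_means is a hypothetical helper, the uncertain indices and values are assumed normal, bins are taken as left-closed and right-open, and draws falling outside the grid are simply dropped.

```julia
using Random

# Hypothetical helper: assign pooled (index, value) draws to bins defined by
# `edges` (left-closed, right-open) and return the mean value in each bin.
function binned_means(idx_draws::Vector{Float64}, val_draws::Vector{Float64}, edges::AbstractRange)
    nbins = length(edges) - 1
    sums, counts = zeros(nbins), zeros(Int, nbins)
    for (t, v) in zip(idx_draws, val_draws)
        first(edges) <= t < last(edges) || continue   # drop draws outside the grid
        b = searchsortedlast(edges, t)
        sums[b] += v
        counts[b] += 1
    end
    return [c == 0 ? NaN : s / c for (s, c) in zip(sums, counts)]
end

rng = MersenneTwister(7)
npts, n = 100, 1_000
idx_means = 10.0 .* (1:npts)          # nominal time indices 10, 20, ..., 1000
val_means = sin.(idx_means ./ 100)

# n draws per point: Normal(index, 5) for indices, Normal(value, 0.1) for values.
idx_draws = reduce(vcat, [m .+ 5.0 .* randn(rng, n) for m in idx_means])
val_draws = reduce(vcat, [m .+ 0.1 .* randn(rng, n) for m in val_means])

bin_means = binned_means(idx_draws, val_draws, 0:50:1000)
```

Here npts * n draws are pooled in total, matching the length(x)*resampling.n count mentioned above.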
BinnedWeightedResampling
UncertainData.Resampling.resample
— Method
resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedWeightedResampling; nan_threshold = 0.0)
Transform index-irregularly spaced uncertain data onto a regular index-grid.
Distributions in each index bin are obtained by resampling all index values in x resampling.n times, sampled according to probabilities resampling.weights, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. In total, length(x)*resampling.n draws are distributed among the bins to form the final KDEs.
Returns an UncertainIndexValueDataset. The distribution of values in the i-th bin is approximated by a kernel density estimate (KDE) over the draws falling in the i-th bin.
Assumes that the points in x are independent.
Example
vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)
X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);
left_bin_edges = 0:50:1000
n_draws = 10000
wts = Weights(rand(length(X)))
resampling = BinnedWeightedResampling(left_bin_edges, wts, 10)
resampled_dataset = resample(X, resampling)
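One way to picture the weighted variant is that each draw first selects a data point with probability proportional to its weight, then samples that point's index and value. The sketch below assumes exactly that reading; weighted_pick is a hypothetical helper, and the raw draws collected per bin stand in for the KDE that the package fits.

```julia
using Random

# Hypothetical helper: pick an element with probability proportional to its
# weight by inverting the cumulative weight sum `cumw`.
function weighted_pick(rng::AbstractRNG, cumw::Vector{Float64})
    u = rand(rng) * cumw[end]
    return min(searchsortedlast(cumw, u) + 1, length(cumw))
end

rng = MersenneTwister(3)
idx_means = [10.0, 20.0, 30.0, 40.0]
weights = [0.0, 1.0, 1.0, 2.0]        # point 1 is never selected
cumw = cumsum(weights)

edges = 0:25:50                        # two bins: [0, 25) and [25, 50)
bins = [Float64[] for _ in 1:length(edges)-1]
for _ in 1:4_000
    p = weighted_pick(rng, cumw)
    t = idx_means[p] + randn(rng)              # index draw, Normal(mean, 1)
    v = sin(idx_means[p]) + 0.1 * randn(rng)   # value draw
    first(edges) <= t < last(edges) || continue
    push!(bins[searchsortedlast(edges, t)], v)
end
```

With these weights, roughly a quarter of the draws come from the point at index 20 and land in the first bin, while the rest come from the points at 30 and 40 and land in the second.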
BinnedMeanWeightedResampling
UncertainData.Resampling.resample
— Method
resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanWeightedResampling)
Transform index-irregularly spaced uncertain data onto a regular index-grid and take the mean of the values in each bin. Resamples the data points in x according to resampling.weights.
Distributions in each index bin are obtained by resampling all index values in x resampling.n times, in proportions obeying resampling.weights, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, length(x)*resampling.n draws are distributed among the bins to form the final mean estimate.
Returns a vector of mean values, one for each bin.
Assumes that the points in x are independent.
Example
vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)
X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);
n_draws = 10000 # draws per uncertain value
time_grid = 0:50:1000
wts = Weights(rand(length(X))) # some random weights
# Resample both X and Y so that they are both at the same time indices,
# and take the mean of each bin.
binned_means_X = resample(X, BinnedMeanWeightedResampling(time_grid, wts, n_draws))
binned_means_Y = resample(Y, BinnedMeanWeightedResampling(time_grid, wts, n_draws))
Interpolated-and-binned resampling
InterpolateAndBin resampling
UncertainData.Resampling.resample
— Method
resample(udata::AbstractUncertainIndexValueDataset, regularization_scheme::InterpolateAndBin{Linear})
Draw a single realisation of udata and interpolate-and-bin the data according to the provided regularization scheme. Assumes points in udata are independent and sorts the draw according to the index values before interpolating. See also InterpolateAndBin.
Example
npts = 50
y = rand(npts)
N = Normal(0, 1)
for t in 3:npts
    y[t,1] = 0.7*y[t-1,1] - 0.35*y[t-2,1] + rand(N)
end
# Assume data are unevenly spaced
time = sample(1.0:npts*5, npts, ordered = true, replace = false)
# Assign some uncertainties to both time indices and values and gather
# in an UncertainIndexValueDataset
utime = UncertainValue.(Normal.(time, 2))
uy = UncertainValue.(Normal.(y, 0.1))
udata = UncertainIndexValueDataset(utime, uy)
# Interpolation-and-binning scheme. First interpolate to a very fine grid,
# then gather the points falling in each of the coarser bins and summarise
# each bin using the mean of the points in each bin.
left_bin_edges = 0:10:npts*5
r = InterpolateAndBin(mean, left_bin_edges, Linear(), 0:0.1:1000, Flat(OnGrid()))
# The binned time axis:
time_binned = left_bin_edges[1:end-1] .+ step(left_bin_edges)/2
# Get a set of corresponding resampled (interpolated+binned) values
y_binned = resample(udata, r)
# Plot some interpolated+binned draws
time_binned = left_bin_edges[1:end-1] .+ step(left_bin_edges)/2
p = plot(xlabel = "time", ylabel = "value")
for i = 1:100
    plot!(time_binned, resample(udata, r), lw = 0.3, α = 0.2, ms = 0.1, c = :red,
        marker = stroke(0.1), label = "")
end
plot!(time, y, c = :black, lw = 1, ms = 2, marker = stroke(2.0, :black), label = "")
plot!(udata, c = :black, lw = 1, ms = 2, marker = stroke(0.1, :black), [0.05, 0.95], [0.05, 0.95])
vline!(left_bin_edges, c = :black, α = 0.3, lw = 0.3, label = "")
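The draw, sort, interpolate, bin pipeline can be sketched without the package as follows. Everything here is an assumption made for illustration: lininterp is a hypothetical helper, the index and value uncertainties are taken as normal, the fine grid is restricted to the range covered by the draw, and bins are left-closed with empty bins reported as NaN.

```julia
using Random, Statistics

# Hypothetical helper: linear interpolation of (t, v) at tq; t must be sorted.
function lininterp(t, v, tq)
    j = clamp(searchsortedlast(t, tq), 1, length(t) - 1)
    w = (tq - t[j]) / (t[j+1] - t[j])
    return (1 - w) * v[j] + w * v[j+1]
end

rng = MersenneTwister(11)
t_nominal = collect(5.0:5.0:250.0)              # 50 nominal time indices
t_draw = t_nominal .+ 2.0 .* randn(rng, 50)     # one draw of the indices
v_draw = sin.(t_nominal ./ 20) .+ 0.1 .* randn(rng, 50)

# Sort the draw by index so that interpolation is well defined.
order = sortperm(t_draw)
t_sorted, v_sorted = t_draw[order], v_draw[order]

# Interpolate to a fine grid restricted to the range covered by the draw.
fine = filter(g -> t_sorted[1] <= g <= t_sorted[end], 0:0.1:250)
v_fine = [lininterp(t_sorted, v_sorted, g) for g in fine]

# Summarise each coarse bin [edge, edge + 10) by the mean of its fine-grid points.
left_bin_edges = 0:10:250
y_binned = map(left_bin_edges[1:end-1]) do e
    inbin = [v for (g, v) in zip(fine, v_fine) if e <= g < e + 10]
    isempty(inbin) ? NaN : mean(inbin)
end
```

Repeating this for many independent draws gives a family of binned curves like the red lines in the plot above.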