Resampling schemes¶
For some uncertain collections and datasets, special resampling types are available to make resampling easier.
Constrained resampling schemes¶
Constrained resampling¶
`UncertainData.Resampling.resample` (Method)

    resample(x::AbstractUncertainIndexValueDataset, resampling::ConstrainedIndexValueResampling)

Resample `x` by first constraining the supports of the distributions/populations furnishing the uncertain indices and values, then drawing samples from the constrained supports.
Sampling is done without assuming any sequential dependence between the elements of `x`, so that no dependence is introduced in the draws beyond what may already be present in the collection of values.
Example

    using UncertainData, Distributions

    # Some example data
    N = 50
    x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
    y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
    x = UncertainValueDataset(x_uncertain)
    y = UncertainValueDataset(y_uncertain)

    time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
    time_certain = [CertainValue(i) for i = 1:length(x)];
    timeinds_x = UncertainIndexDataset(time_uncertain)
    timeinds_y = UncertainIndexDataset(time_certain)

    X = UncertainIndexValueDataset(timeinds_x, x)
    Y = UncertainIndexValueDataset(timeinds_y, y);

    ###########################
    # Define resampling scheme
    ###########################

    # Truncate each of the indices for x at 0.8 times their standard deviation around the mean
    constraints_x_inds = TruncateStd(0.8)

    # Truncate each of the indices for y at 1.5 times their standard deviation around the mean
    constraints_y_inds = TruncateStd(1.5)

    # Truncate each of the values of x to their central 20% (40th to 60th percentile)
    constraints_x_vals = [TruncateQuantiles(0.4, 0.6) for i = 1:N];

    # Truncate each of the values of y to their central 80% (10th to 90th percentile)
    constraints_y_vals = [TruncateQuantiles(0.1, 0.9) for i = 1:N];

    cs_x = (constraints_x_inds, constraints_x_vals)
    cs_y = (constraints_y_inds, constraints_y_vals)

    ###########
    # Resample
    ###########
    resample(X, ConstrainedIndexValueResampling(cs_x))
    resample(Y, ConstrainedIndexValueResampling(cs_y))

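To make the idea of "constrain the support, then sample" concrete, the following is a minimal sketch in plain Julia that does not use UncertainData.jl at all. It assumes normally distributed points and uses rejection sampling to draw from a normal distribution truncated at `k` standard deviations around the mean. The helper name `draw_truncated_normal` is hypothetical, and the package's own implementation may well work differently.

```julia
using Random

# Hypothetical helper (not part of UncertainData.jl): draw one value from
# Normal(μ, σ) restricted to [μ - k*σ, μ + k*σ] by rejection sampling.
function draw_truncated_normal(rng::AbstractRNG, μ::Real, σ::Real, k::Real)
    while true
        z = randn(rng)                    # standard normal draw
        abs(z) <= k && return μ + σ * z   # accept only draws inside the support
    end
end

rng = MersenneTwister(1234)
μs = collect(1.0:10.0)   # means of ten independent uncertain points
σ, k = 1.0, 0.8          # common standard deviation; truncate at 0.8σ

# Each element is drawn independently of all the others, so the draw
# introduces no dependence between neighbouring points.
draws = [draw_truncated_normal(rng, μ, σ, k) for μ in μs]
```

Every draw is guaranteed to lie within `k*σ` of its mean, which mirrors the role a constraint such as `TruncateStd(0.8)` plays in the example above.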
Sequential resampling schemes¶
Sequential¶
`UncertainData.Resampling.resample` (Method)

    resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialResampling)

Resample `x` according to a sequential resampling constraint.
This way of resampling introduces some serial dependence between the elements of `x`, beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. `StrictlyIncreasing`) on the `i`-th value of the dataset restricts what can be sampled for the `(i+1)`-th value.
Example

    using UncertainData, Distributions

    # Some example data
    N = 50
    x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
    y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
    x = UncertainValueDataset(x_uncertain)
    y = UncertainValueDataset(y_uncertain)

    time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
    time_certain = [CertainValue(i) for i = 1:length(x)];
    timeinds_x = UncertainIndexDataset(time_uncertain)
    timeinds_y = UncertainIndexDataset(time_certain)

    X = UncertainIndexValueDataset(timeinds_x, x)
    Y = UncertainIndexValueDataset(timeinds_y, y);

    # Resample
    seq_resampling = SequentialResampling(StrictlyIncreasing())
    resample(X, seq_resampling)

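The serial dependence described above can be illustrated with a small sketch in plain Julia. It assumes normally distributed indices and enforces a strictly increasing sequence by redrawing each element until it exceeds the previous one. The helper `draw_strictly_increasing` is hypothetical; this is only one simple way to satisfy such a constraint, and not necessarily the algorithm UncertainData.jl uses.

```julia
using Random

# Hypothetical helper: draw one value per Normal(μs[i], σ) such that the
# resulting sequence is strictly increasing. Element i is redrawn until it
# exceeds element i-1, so each draw depends on the one before it.
function draw_strictly_increasing(rng::AbstractRNG, μs::AbstractVector{<:Real}, σ::Real)
    draws = Vector{Float64}(undef, length(μs))
    prev = -Inf
    for (i, μ) in enumerate(μs)
        x = μ + σ * randn(rng)
        while x <= prev
            x = μ + σ * randn(rng)   # reject draws that violate the ordering
        end
        draws[i] = x
        prev = x
    end
    return draws
end

rng = MersenneTwister(42)
t = draw_strictly_increasing(rng, 1:50, 1.0)
```

Because the accepted value for element `i` sets the lower bound for element `i+1`, the draws are no longer independent even though the underlying distributions are.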
Sequential and interpolated¶
`UncertainData.Resampling.resample` (Method)

    resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialInterpolatedResampling)

Resample `x` according to a sequential resampling constraint, then interpolate the draw(s) to some specified grid.
This way of resampling introduces some serial dependence between the elements of `x`, beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. `StrictlyIncreasing`) on the `i`-th value of the dataset restricts what can be sampled for the `(i+1)`-th value.
Example

    using UncertainData, Distributions

    # Some example data
    N = 50
    x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
    y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
    x = UncertainValueDataset(x_uncertain)
    y = UncertainValueDataset(y_uncertain)

    time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
    time_certain = [CertainValue(i) for i = 1:length(x)];
    timeinds_x = UncertainIndexDataset(time_uncertain)
    timeinds_y = UncertainIndexDataset(time_certain)

    X = UncertainIndexValueDataset(timeinds_x, x)
    Y = UncertainIndexValueDataset(timeinds_y, y);

    # Resample, then interpolate the draw to a regular grid with spacing 2
    seqintp_resampling = SequentialInterpolatedResampling(StrictlyIncreasing(), RegularGrid(0:2:N))
    resample(X, seqintp_resampling)

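A strictly increasing index draw is what makes the interpolation step well defined. The sketch below shows that step on its own in plain Julia: it linearly interpolates one (already ordered) draw onto a regular grid. The helper `interpolate_to_grid` is hypothetical, and returning `NaN` outside the drawn index range is an assumption made for this illustration, not documented package behaviour.

```julia
# Hypothetical helper: linearly interpolate the points (t[i], y[i]), with t
# strictly increasing, onto `grid`. Grid points outside [t[1], t[end]] get NaN.
function interpolate_to_grid(t::AbstractVector{<:Real}, y::AbstractVector{<:Real}, grid)
    out = fill(NaN, length(grid))
    for (k, g) in enumerate(grid)
        (g < t[1] || g > t[end]) && continue
        j = searchsortedlast(t, g)      # largest j with t[j] <= g
        if j == length(t)
            out[k] = y[end]             # g coincides with the last index
        else
            w = (g - t[j]) / (t[j+1] - t[j])
            out[k] = (1 - w) * y[j] + w * y[j+1]
        end
    end
    return out
end

t = [0.5, 1.7, 3.2, 4.1, 6.0]    # one strictly increasing index draw
y = [1.0, 2.0, 0.0, 1.0, 3.0]    # the corresponding value draw
grid = 0:2:6
y_grid = interpolate_to_grid(t, y, grid)
```

If the index draw were not ordered, the bracketing step (`searchsortedlast`) would not be meaningful, which is why the sequential constraint is applied first.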
Binned resampling schemes¶
BinnedResampling¶
Missing docstring for `resample(::AbstractUncertainIndexValueDataset, ::BinnedResampling)`. Check Documenter's build log for details.
BinnedMeanResampling¶
`UncertainData.Resampling.resample` (Method)

    resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanResampling)

Transform uncertain data that are irregularly spaced in index onto a regular index grid and take the mean of the values in each bin.
Distributions in each index bin are obtained by resampling all index values in `x` `resampling.n` times and mapping those index draws to the bins. Simultaneously, the values in `x` are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, `length(x)*resampling.n` draws are distributed among the bins to form the final mean estimate.
Returns a vector of mean values, one for each bin.
Assumes that the points in `x` are independent.
Example

    using UncertainData, Distributions

    # Assumes `ar1_unidir` is in scope; it is not defined in this section.
    vars = (1, 2)
    npts, tstep = 100, 10
    d_xind = Uniform(2.5, 15.5)
    d_yind = Uniform(2.5, 15.5)
    d_xval = Uniform(0.01, 0.2)
    d_yval = Uniform(0.01, 0.2)

    X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
        d_xind = d_xind, d_yind = d_yind,
        d_xval = d_xval, d_yval = d_yval);

    n_draws = 10000 # draws per uncertain value
    time_grid = 0:50:1000

    # Resample both X and Y so that they are both at the same time indices,
    # and take the mean of each bin.
    resampled_X = resample(X, BinnedMeanResampling(time_grid, n_draws))
    resampled_Y = resample(Y, BinnedMeanResampling(time_grid, n_draws))

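The procedure described above (draw indices and values, map draws to bins, average within bins) can be sketched in plain Julia as follows. The sketch assumes normally distributed indices and values with common standard deviations, and the helper `binned_means` is hypothetical. It is meant to illustrate the bookkeeping, not to reproduce the package's implementation.

```julia
using Random, Statistics

# Hypothetical helper: draw `n` (index, value) pairs from every point, place
# each value in the bin its index draw falls into, and average within bins.
function binned_means(rng, index_μ, value_μ, σ_index, σ_value, left_edges, n)
    nbins = length(left_edges) - 1
    bins = [Float64[] for _ in 1:nbins]
    for (μi, μv) in zip(index_μ, value_μ), _ in 1:n
        idx = μi + σ_index * randn(rng)          # index draw
        val = μv + σ_value * randn(rng)          # value draw
        k = searchsortedlast(left_edges, idx)    # bin whose left edge is <= idx
        1 <= k <= nbins && push!(bins[k], val)   # ignore draws outside the grid
    end
    return [isempty(b) ? NaN : mean(b) for b in bins]
end

rng = MersenneTwister(1)
index_μ = 5:10:95               # ten points with nominal indices 5, 15, ..., 95
value_μ = index_μ ./ 100        # values grow linearly with the index
means = binned_means(rng, index_μ, value_μ, 1.0, 0.1, 0:20:100, 1000)
```

Here `length(index_μ) * n = 10_000` draws are distributed among five bins, and each bin mean lands close to the average of the two points whose indices fall in that bin.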
BinnedWeightedResampling¶
`UncertainData.Resampling.resample` (Method)

    resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedWeightedResampling;
        nan_threshold = 0.0)

Transform uncertain data that are irregularly spaced in index onto a regular index grid.
Distributions in each index bin are obtained by resampling all index values in `x` `resampling.n` times, sampled according to the probabilities `resampling.weights`, and mapping those index draws to the bins. Simultaneously, the values in `x` are resampled and placed in the corresponding bins. In total, `length(x)*resampling.n` draws are distributed among the bins to form the final KDEs.
Returns an `UncertainIndexValueDataset`. The distribution of values in the `i`-th bin is approximated by a kernel density estimate (KDE) over the draws falling in the `i`-th bin.
Assumes that the points in `x` are independent.
Example

    using UncertainData, Distributions, StatsBase

    # Assumes `ar1_unidir` is in scope; it is not defined in this section.
    vars = (1, 2)
    npts, tstep = 100, 10
    d_xind = Uniform(2.5, 15.5)
    d_yind = Uniform(2.5, 15.5)
    d_xval = Uniform(0.01, 0.2)
    d_yval = Uniform(0.01, 0.2)

    X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
        d_xind = d_xind, d_yind = d_yind,
        d_xval = d_xval, d_yval = d_yval);

    left_bin_edges = 0:50:1000
    n_draws = 10 # draws per uncertain value

    wts = Weights(rand(length(X))) # some random weights
    resampling = BinnedWeightedResampling(left_bin_edges, wts, n_draws)
    resampled_dataset = resample(X, resampling)

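To show what "approximated by a kernel density estimate over the draws falling in the bin" means in practice, here is a minimal Gaussian KDE in plain Julia. The kernel, the normal-reference bandwidth rule, and the helper name `gaussian_kde` are all assumptions made for illustration; the package may use a different kernel or bandwidth selection.

```julia
using Random, Statistics

# Hypothetical helper: Gaussian kernel density estimate at `x`, built from
# `draws` with bandwidth `h`.
gaussian_kde(x, draws, h) =
    mean(exp(-0.5 * ((x - d) / h)^2) / (h * sqrt(2π)) for d in draws)

rng = MersenneTwister(7)
bin_draws = 0.5 .+ 0.1 .* randn(rng, 2000)   # value draws that fell in one bin

# Normal-reference rule-of-thumb bandwidth (an assumption; other choices exist).
h = 1.06 * std(bin_draws) * length(bin_draws)^(-1/5)

xs = range(0.0, 1.0; length = 201)
density = [gaussian_kde(x, bin_draws, h) for x in xs]
```

The resulting `density` is a smooth, non-negative curve that integrates to approximately one over the range covered by the draws, and it can stand in for the unknown distribution of values in that bin.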
BinnedMeanWeightedResampling¶
`UncertainData.Resampling.resample` (Method)

    resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanWeightedResampling)

Transform uncertain data that are irregularly spaced in index onto a regular index grid and take the mean of the values in each bin. Resamples the data points in `x` according to `resampling.weights`.
Distributions in each index bin are obtained by resampling all index values in `x` `resampling.n` times, in proportions obeying `resampling.weights`, and mapping those index draws to the bins. Simultaneously, the values in `x` are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, `length(x)*resampling.n` draws are distributed among the bins to form the final mean estimate.
Returns a vector of mean values, one for each bin.
Assumes that the points in `x` are independent.
Example

    using UncertainData, Distributions, StatsBase

    # Assumes `ar1_unidir` is in scope; it is not defined in this section.
    vars = (1, 2)
    npts, tstep = 100, 10
    d_xind = Uniform(2.5, 15.5)
    d_yind = Uniform(2.5, 15.5)
    d_xval = Uniform(0.01, 0.2)
    d_yval = Uniform(0.01, 0.2)

    X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
        d_xind = d_xind, d_yind = d_yind,
        d_xval = d_xval, d_yval = d_yval);

    n_draws = 10000 # draws per uncertain value
    time_grid = 0:50:1000
    wts = Weights(rand(length(X))) # some random weights

    # Resample both X and Y so that they are both at the same time indices,
    # and take the mean of each bin.
    resampled_X = resample(X, BinnedMeanWeightedResampling(time_grid, wts, n_draws))
    resampled_Y = resample(Y, BinnedMeanWeightedResampling(time_grid, wts, n_draws))

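The phrase "in proportions obeying `resampling.weights`" can be illustrated with a short sketch of weighted point selection in plain Julia. It picks which point to draw from with probability proportional to its weight, using the cumulative weight sums. The helper `weighted_pick` is hypothetical, and the package may distribute the draws among points differently.

```julia
using Random

# Hypothetical helper: pick an element index with probability proportional to
# its weight, by inverting the cumulative weight sums.
function weighted_pick(rng::AbstractRNG, cumw::AbstractVector{<:Real})
    u = rand(rng) * cumw[end]
    return min(searchsortedfirst(cumw, u), length(cumw))
end

rng = MersenneTwister(2024)
wts = [1.0, 1.0, 8.0]      # the third point is weighted most heavily
cumw = cumsum(wts)
n = 10_000                 # plays the role of `resampling.n`

# Distribute length(wts) * n draws among the points according to the weights.
counts = zeros(Int, length(wts))
for _ in 1:(length(wts) * n)
    counts[weighted_pick(rng, cumw)] += 1
end
proportions = counts ./ sum(counts)
```

With these weights, roughly 80% of the draws come from the third point, so it dominates the mean of whichever bins its index draws fall into.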
Interpolated-and-binned resampling¶
InterpolateAndBin resampling¶
`UncertainData.Resampling.resample` (Method)

    resample(udata::AbstractUncertainIndexValueDataset, regularization_scheme::InterpolateAndBin{Linear})

Draw a single realisation of `udata` and interpolate-and-bin the data according to the provided regularization scheme. Assumes that the points in `udata` are independent, and sorts the draw according to the index values before interpolating. See also `InterpolateAndBin`.
Example

    using UncertainData, Distributions, StatsBase, Statistics, Interpolations, Plots

    npts = 50
    y = rand(npts)
    N = Normal(0, 1)
    for t in 3:npts
        y[t] = 0.7*y[t-1] - 0.35*y[t-2] + rand(N)
    end

    # Assume data are unevenly spaced
    time = sample(1.0:npts*5, npts, ordered = true, replace = false)

    # Assign some uncertainties to both time indices and values and gather
    # in an UncertainIndexValueDataset
    utime = UncertainValue.(Normal.(time, 2))
    uy = UncertainValue.(Normal.(y, 0.1))
    udata = UncertainIndexValueDataset(utime, uy)

    # Interpolation-and-binning scheme. First interpolate to a very fine grid,
    # then gather the points falling in each of the coarser bins and summarise
    # each bin using the mean of the points in each bin.
    left_bin_edges = 0:10:npts*5
    r = InterpolateAndBin(mean, left_bin_edges, Linear(), 0:0.1:1000, Flat(OnGrid()))

    # The binned time axis (bin midpoints):
    time_binned = left_bin_edges[1:end-1] .+ step(left_bin_edges)/2

    # Get a set of corresponding resampled (interpolated+binned) values
    y_binned = resample(udata, r)

    # Plot some interpolated+binned draws
    p = plot(xlabel = "time", ylabel = "value")
    for i = 1:100
        plot!(time_binned, resample(udata, r),
            lw = 0.3, α = 0.2, ms = 0.1, c = :red,
            marker = stroke(0.1), label = "")
    end

    # Overlay the underlying data, the uncertain dataset, and the bin edges
    plot!(time, y, c = :black, lw = 1, ms = 2, marker = stroke(2.0, :black), label = "")
    plot!(udata, c = :black, lw = 1, ms = 2, marker = stroke(0.1, :black), [0.05, 0.95], [0.05, 0.95])
    vline!(left_bin_edges, c = :black, α = 0.3, lw = 0.3, label = "")

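The three steps named above (sort one draw by index, interpolate to a fine grid, summarise the fine-grid points in each coarse bin) can be sketched without any packages beyond the standard library. The helper `interpolate_and_bin` below is hypothetical and uses the mean as the bin summary. Restricting the fine grid to the range of the drawn indices is an assumption of this sketch, not documented behaviour of `InterpolateAndBin`.

```julia
using Random

# Hypothetical helper: sort one draw by index, interpolate it linearly onto a
# fine grid spanning the drawn index range, then average the interpolated
# values falling in each coarse bin defined by `left_edges`.
function interpolate_and_bin(t, y, left_edges; nfine = 1000)
    order = sortperm(t)                   # index draws may come out unordered
    ts, ys = t[order], y[order]
    fine = range(ts[1], ts[end]; length = nfine)
    sums = zeros(length(left_edges) - 1)
    counts = zeros(Int, length(left_edges) - 1)
    for g in fine
        j = clamp(searchsortedlast(ts, g), 1, length(ts) - 1)
        w = (g - ts[j]) / (ts[j+1] - ts[j])
        v = (1 - w) * ys[j] + w * ys[j+1]       # linear interpolation at g
        b = searchsortedlast(left_edges, g)     # coarse bin containing g
        if 1 <= b <= length(sums)
            sums[b] += v
            counts[b] += 1
        end
    end
    return [c == 0 ? NaN : s / c for (s, c) in zip(sums, counts)]
end

rng = MersenneTwister(3)
time_μ = collect(5.0:10.0:95.0)                                # nominal indices
t_draw = time_μ .+ 2.0 .* randn(rng, length(time_μ))           # one index draw
y_draw = time_μ ./ 100 .+ 0.1 .* randn(rng, length(time_μ))    # one value draw
y_binned = interpolate_and_bin(t_draw, y_draw, 0:20:100)
```

Calling the function repeatedly with fresh draws gives an ensemble of binned series, analogous to the 100 red curves plotted in the example above.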