resample
Resampling schemes
For some uncertain collections and datasets, special resampling types are available to make resampling easier.
Constrained resampling schemes
Constrained resampling
UncertainData.Resampling.resample — Method.
```julia
resample(x::AbstractUncertainIndexValueDataset, resampling::ConstrainedIndexValueResampling)
```
Resample x by first constraining the supports of the distributions/populations furnishing the uncertain indices and values, then drawing samples from the limited supports.
Sampling is done without assuming any sequential dependence between the elements of x, such that no dependence is introduced in the draws beyond what is potentially already present in the collection of values.
Example
```julia
using UncertainData, Distributions

# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)

time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)

X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);

###########################
# Define resampling scheme
###########################

# Truncate each of the indices for x at 0.8 times their standard deviation around the mean
constraints_x_inds = TruncateStd(0.8)

# Truncate each of the indices for y at 1.5 times their standard deviation around the mean
constraints_y_inds = TruncateStd(1.5)

# Truncate each of the values of x to its central 20 % (40th to 60th percentile)
constraints_x_vals = [TruncateQuantiles(0.4, 0.6) for i = 1:N];

# Truncate each of the values of y to its central 80 % (10th to 90th percentile)
constraints_y_vals = [TruncateQuantiles(0.1, 0.9) for i = 1:N];

cs_x = (constraints_x_inds, constraints_x_vals)
cs_y = (constraints_y_inds, constraints_y_vals)

###########
# Resample
###########
resample(X, ConstrainedIndexValueResampling(cs_x))
resample(Y, ConstrainedIndexValueResampling(cs_y))
```
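To make the idea of "constrain the support, then draw independently" concrete, here is a minimal sketch that uses only Julia's standard library and does not call UncertainData at all. It assumes each uncertain value is represented by an empirical population of draws; the helper constrain_population is hypothetical and is not part of the package.

```julia
using Random, Statistics

# Hypothetical helper (not part of UncertainData): keep only the members of
# an empirical population that fall inside the [lo, hi] quantile range.
function constrain_population(pop::AbstractVector{<:Real}, lo::Real, hi::Real)
    qlo, qhi = quantile(pop, lo), quantile(pop, hi)
    return filter(v -> qlo <= v <= qhi, pop)
end

rng = MersenneTwister(1234)

# Three uncertain values, each represented by 1000 draws from a normal
# distribution centred on 1, 2 and 3 with standard deviation 0.5.
populations = [m .+ 0.5 .* randn(rng, 1000) for m in 1:3]

# Constrain every population to its central 20 % (40th to 60th percentile).
constrained = [constrain_population(p, 0.4, 0.6) for p in populations]

# Draw one value per element, each independently of the others.
draw = [rand(rng, c) for c in constrained]
```

Because every element is drawn on its own, the only dependence between the entries of draw is whatever was already present in the populations themselves.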
Sequential resampling schemes
Sequential
UncertainData.Resampling.resample — Method.
```julia
resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialResampling)
```
Resample x according to a sequential resampling constraint.
This way of resampling introduces some serial dependence between the elements of x, beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. StrictlyIncreasing) on the i-th value of the dataset restricts what can be sampled from the (i+1)-th value.
Example
```julia
using UncertainData, Distributions

# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)

time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)

X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);

# Resample
seq_resampling = SequentialResampling(StrictlyIncreasing())
resample(X, seq_resampling)
```
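The serial dependence described above can be illustrated with a small standalone sketch. It assumes normally distributed indices and enforces a strictly increasing order by simple rejection: each element is redrawn until it exceeds the previous accepted draw. The helper draw_strictly_increasing is hypothetical, and this greedy scheme is only one possible way to satisfy the constraint; the package may well use a different algorithm.

```julia
using Random

# Hypothetical helper (not the package's algorithm): draw one value per
# element, in order, rejecting any draw that does not exceed the previous one.
function draw_strictly_increasing(rng, means, sds; max_tries = 10_000)
    draws = Vector{Float64}(undef, length(means))
    previous = -Inf
    for i in eachindex(means)
        accepted = false
        for _ in 1:max_tries
            candidate = means[i] + sds[i] * randn(rng)
            if candidate > previous
                draws[i] = candidate
                previous = candidate
                accepted = true
                break
            end
        end
        accepted || error("no admissible draw found for element $i")
    end
    return draws
end

rng = MersenneTwister(42)
means = collect(1.0:10.0)   # analogous to Normal(i, 1) time indices
sds = ones(10)
t = draw_strictly_increasing(rng, means, sds)
```

Each accepted draw raises the lower bound for the next element, which is exactly where the extra dependence between consecutive elements comes from.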
Sequential and interpolated
UncertainData.Resampling.resample — Method.
```julia
resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialInterpolatedResampling)
```
Resample x according to a sequential resampling constraint, then interpolate the draw(s) to some specified grid.
This way of resampling introduces some serial dependence between the elements of x, beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. StrictlyIncreasing) on the i-th value of the dataset restricts what can be sampled from the (i+1)-th value.
Example
```julia
using UncertainData, Distributions

# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)

time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)

X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);

# Resample
seqintp_resampling = SequentialInterpolatedResampling(StrictlyIncreasing(), RegularGrid(0:2:N))
resample(X, seqintp_resampling)
```
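The interpolation step can be sketched independently of the package. The snippet below takes one strictly increasing draw of indices with its values and evaluates a piecewise-linear interpolant on a regular grid. The helper interp_linear is hypothetical, and returning the nearest end value outside the data range is an assumption made here for simplicity, not a statement about what SequentialInterpolatedResampling does.

```julia
# Hypothetical helper: piecewise-linear interpolation of the points (t, v)
# at position g. `t` must be strictly increasing. Outside the range of `t`
# the nearest end value is returned (flat extrapolation, an assumption).
function interp_linear(t::AbstractVector, v::AbstractVector, g::Real)
    g <= first(t) && return float(first(v))
    g >= last(t) && return float(last(v))
    j = searchsortedlast(t, g)          # t[j] <= g < t[j + 1]
    w = (g - t[j]) / (t[j + 1] - t[j])  # relative position inside the segment
    return (1 - w) * v[j] + w * v[j + 1]
end

# One sequential draw: strictly increasing indices with their values.
t_draw = [0.7, 2.1, 2.9, 4.4, 5.2]
v_draw = [1.0, 3.0, 2.0, 5.0, 4.0]

# Evaluate the interpolant on a regular grid.
grid = 0:2:6
v_on_grid = [interp_linear(t_draw, v_draw, g) for g in grid]
```

The strictly increasing constraint matters here: with duplicated or unordered indices the segment width t[j + 1] - t[j] could be zero or negative and the interpolant would be ill-defined.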
Binned resampling schemes
BinnedResampling
Missing docstring for resample(::AbstractUncertainIndexValueDataset, ::BinnedResampling).
BinnedMeanResampling
UncertainData.Resampling.resample — Method.
```julia
resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanResampling)
```
Transform index-irregularly spaced uncertain data onto a regular index-grid and take the mean of the values in each bin.
Distributions in each index bin are obtained by resampling all index values in x resampling.n times, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, length(x)*resampling.n draws are distributed among the bins to form the final mean estimate.
Returns a vector of mean values, one for each bin.
Assumes that the points in x are independent.
Example
```julia
using UncertainData, Distributions

# ar1_unidir and example_uncertain_indexvalue_datasets are used as in the original
# example; load the package(s) providing them if they are not already in scope.
vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

n_draws = 10000 # draws per uncertain value
time_grid = 0:50:1000

# Resample both X and Y so that they are both at the same time indices,
# and take the mean of each bin.
resampled_X = resample(X, BinnedMeanResampling(time_grid, n_draws))
resampled_Y = resample(Y, BinnedMeanResampling(time_grid, n_draws))
```
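The binning procedure described above (pool the draws from all points, map each index draw to a bin, average the paired value draws per bin) can be sketched with the standard library alone. The helper binned_means is hypothetical; ignoring draws that fall outside the grid and returning NaN for empty bins are assumptions of this sketch, not documented behaviour of BinnedMeanResampling.

```julia
using Random, Statistics

# Hypothetical helper: assign each (index, value) draw to the bin whose left
# edge is the largest edge not exceeding the index, then average per bin.
# Draws outside the grid are ignored; empty bins give NaN.
function binned_means(index_draws, value_draws, left_bin_edges)
    nbins = length(left_bin_edges) - 1
    sums = zeros(nbins)
    counts = zeros(Int, nbins)
    for (t, v) in zip(index_draws, value_draws)
        b = searchsortedlast(left_bin_edges, t)
        if 1 <= b <= nbins
            sums[b] += v
            counts[b] += 1
        end
    end
    return [c == 0 ? NaN : s / c for (s, c) in zip(sums, counts)]
end

rng = MersenneTwister(7)
n = 1000                       # draws per uncertain point
index_means = [5.0, 15.0, 25.0]
value_means = [1.0, 2.0, 3.0]

# length(index_means) * n draws in total, pooled across all points.
index_draws = vcat([m .+ randn(rng, n) for m in index_means]...)
value_draws = vcat([m .+ 0.1 .* randn(rng, n) for m in value_means]...)

bin_means = binned_means(index_draws, value_draws, 0:10:30)
```

Since the draws of all points are pooled before averaging, the result only makes sense under the stated assumption that the points are independent.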
BinnedWeightedResampling
UncertainData.Resampling.resample — Method.
```julia
resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedWeightedResampling;
    nan_threshold = 0.0)
```
Transform index-irregularly spaced uncertain data onto a regular index-grid.
Distributions in each index bin are obtained by resampling all index values in x resampling.n times, sampled according to probabilities resampling.weights, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. In total, length(x)*resampling.n draws are distributed among the bins to form the final KDEs.
Returns an UncertainIndexValueDataset. The distribution of values in the i-th bin is approximated by a kernel density estimate (KDE) over the draws falling in the i-th bin.
Assumes that the points in x are independent.
Example
```julia
using UncertainData, Distributions, StatsBase

# ar1_unidir and example_uncertain_indexvalue_datasets are used as in the original
# example; load the package(s) providing them if they are not already in scope.
vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

left_bin_edges = 0:50:1000
n_draws = 10000 # draws per uncertain value

wts = Weights(rand(length(X))) # some random weights
resampling = BinnedWeightedResampling(left_bin_edges, wts, n_draws)

resampled_dataset = resample(X, resampling)
```
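What "sampled according to probabilities resampling.weights" means in practice can be sketched as follows. The snippet picks which point contributes each draw with probability proportional to its weight, using a cumulative sum and a uniform random number. The helper weighted_index is hypothetical and only illustrates weighted sampling in general; it is not taken from the package source.

```julia
using Random

# Hypothetical helper: pick an index in 1:length(weights) with probability
# proportional to its (non-negative) weight, using the cumulative sum.
function weighted_index(rng, weights)
    cumulative = cumsum(weights)
    u = rand(rng) * last(cumulative)
    # The min guards against rounding pushing the result past the last index.
    return min(searchsortedfirst(cumulative, u), length(weights))
end

rng = MersenneTwister(3)
weights = [0.1, 0.1, 0.8]      # the third point should dominate
n_total = 10_000

# Count how often each point is selected.
counts = zeros(Int, length(weights))
for _ in 1:n_total
    counts[weighted_index(rng, weights)] += 1
end

fractions = counts ./ n_total
```

With these weights, roughly 80 % of the draws come from the third point, so its distribution dominates the bins it falls into.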
BinnedMeanWeightedResampling
UncertainData.Resampling.resample — Method.
```julia
resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanWeightedResampling)
```
Transform index-irregularly spaced uncertain data onto a regular index-grid and take the mean of the values in each bin. Resamples the data points in x according to resampling.weights.
Distributions in each index bin are obtained by resampling all index values in x resampling.n times, in proportions obeying resampling.weights, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, length(x)*resampling.n draws are distributed among the bins to form the final mean estimate.
Returns a vector of mean values, one for each bin.
Assumes that the points in x are independent.
Example
```julia
using UncertainData, Distributions, StatsBase

# ar1_unidir and example_uncertain_indexvalue_datasets are used as in the original
# example; load the package(s) providing them if they are not already in scope.
vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

n_draws = 10000 # draws per uncertain value
time_grid = 0:50:1000
wts = Weights(rand(length(X))) # some random weights

# Resample both X and Y so that they are both at the same time indices,
# and take the mean of each bin.
resampled_X = resample(X, BinnedMeanWeightedResampling(time_grid, wts, n_draws))
resampled_Y = resample(Y, BinnedMeanWeightedResampling(time_grid, wts, n_draws))
```
Interpolated-and-binned resampling
InterpolateAndBin resampling
UncertainData.Resampling.resample — Method.
```julia
resample(udata::AbstractUncertainIndexValueDataset, regularization_scheme::InterpolateAndBin{Linear})
```
Draw a single realisation of udata and interpolate-and-bin the data according to the provided regularization scheme. Assumes points in udata are independent and sorts the draw according to the index values before interpolating. See also InterpolateAndBin.
Example
```julia
using UncertainData, Distributions, StatsBase, Interpolations, Plots

npts = 50
y = rand(npts)
N = Normal(0, 1)
for t in 3:npts
    y[t] = 0.7*y[t-1] - 0.35*y[t-2] + rand(N)
end

# Assume data are unevenly spaced
time = sample(1.0:npts*5, npts, ordered = true, replace = false)

# Assign some uncertainties to both time indices and values and gather
# in an UncertainIndexValueDataset
utime = UncertainValue.(Normal.(time, 2))
uy = UncertainValue.(Normal.(y, 0.1))
udata = UncertainIndexValueDataset(utime, uy)

# Interpolation-and-binning scheme. First interpolate to a very fine grid,
# then gather the points falling in each of the coarser bins and summarise
# each bin using the mean of the points in each bin.
left_bin_edges = 0:10:npts*5
r = InterpolateAndBin(mean, left_bin_edges, Linear(), 0:0.1:1000, Flat(OnGrid()))

# The binned time axis (bin midpoints):
time_binned = left_bin_edges[1:end-1] .+ step(left_bin_edges)/2

# Get a set of corresponding resampled (interpolated+binned) values
y_binned = resample(udata, r)

# Plot some interpolated+binned draws
p = plot(xlabel = "time", ylabel = "value")
for i = 1:100
    plot!(time_binned, resample(udata, r), lw = 0.3, α = 0.2, ms = 0.1,
        c = :red, marker = stroke(0.1), label = "")
end

# Overlay the underlying values, the uncertain dataset (5th to 95th
# percentile ranges for indices and values) and the bin edges.
plot!(time, y, c = :black, lw = 1, ms = 2, marker = stroke(2.0, :black), label = "")
plot!(udata, [0.05, 0.95], [0.05, 0.95], c = :black, lw = 1, ms = 2, marker = stroke(0.1, :black))
vline!(left_bin_edges, c = :black, α = 0.3, lw = 0.3, label = "")
```
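The three steps named in the docstring (sort a single realisation by index, interpolate to a fine grid, summarise each coarse bin) can be traced on a tiny hand-made draw without any packages beyond the standard library. The helper interp_flat is hypothetical, and the grids are chosen only for illustration; this is a sketch of the idea, not the package's implementation.

```julia
using Statistics

# Hypothetical helper: linear interpolation with flat extrapolation.
# `t` must be sorted in increasing order without duplicates.
function interp_flat(t, v, g)
    g <= first(t) && return float(first(v))
    g >= last(t) && return float(last(v))
    j = searchsortedlast(t, g)
    w = (g - t[j]) / (t[j + 1] - t[j])
    return (1 - w) * v[j] + w * v[j + 1]
end

# A single (unsorted) realisation of indices and values.
t_draw = [12.0, 3.0, 27.0, 18.0]
v_draw = [2.0, 1.0, 4.0, 3.0]

# 1. Sort the draw by index.
order = sortperm(t_draw)
t_sorted, v_sorted = t_draw[order], v_draw[order]

# 2. Interpolate onto a fine grid.
fine_grid = 0:0.5:29.5
v_fine = [interp_flat(t_sorted, v_sorted, g) for g in fine_grid]

# 3. Summarise the fine-grid points in each coarse bin with the mean.
left_bin_edges = 0:10:30
bin_of = [searchsortedlast(left_bin_edges, g) for g in fine_grid]
v_binned = [mean(v_fine[bin_of .== b]) for b in 1:length(left_bin_edges) - 1]
```

Sorting first is what allows an independent draw, whose indices may come out in any order, to be interpolated at all.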