Applying resampling schemes
Resampling schemes¶
For some uncertain collections and datasets, special resampling types are available to make resampling easier.
Constrained resampling schemes¶
Constrained resampling¶
UncertainData.Resampling.resample — Method.
```julia
resample(x::AbstractUncertainIndexValueDataset, resampling::ConstrainedIndexValueResampling)
```
Resample x by first constraining the supports of the distributions/populations furnishing the uncertain indices and values, then drawing samples from the limited supports.
Sampling is done without assuming any sequential dependence between the elements of x, so that no dependence is introduced in the draws beyond what may already be present in the collection of values.
Example
```julia
using UncertainData, Distributions

# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)

time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)

X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);

###########################
# Define resampling scheme
###########################

# Truncate each of the indices of x at 0.8 times its standard deviation around the mean
constraints_x_inds = TruncateStd(0.8)

# Truncate each of the indices of y at 1.5 times its standard deviation around the mean
constraints_y_inds = TruncateStd(1.5)

# Truncate each of the values of x to its central 20% (40th to 60th percentile)
constraints_x_vals = [TruncateQuantiles(0.4, 0.6) for i = 1:N];

# Truncate each of the values of y to its central 80% (10th to 90th percentile)
constraints_y_vals = [TruncateQuantiles(0.1, 0.9) for i = 1:N];

cs_x = (constraints_x_inds, constraints_x_vals)
cs_y = (constraints_y_inds, constraints_y_vals)

###########
# Resample
###########
resample(X, ConstrainedIndexValueResampling(cs_x))
resample(Y, ConstrainedIndexValueResampling(cs_y))
```
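To make the idea of "constraining the support, then drawing" concrete, the following is a minimal sketch in plain Julia that does not use UncertainData at all. It assumes normally distributed values and mimics a standard-deviation truncation by rejection sampling. The helper `draw_truncated_normal` is hypothetical and only illustrates the principle; the package's own implementation may work differently.

```julia
using Random

# Hypothetical helper (not part of UncertainData): draw one value from
# Normal(μ, σ) restricted to [μ - k*σ, μ + k*σ] by rejection sampling.
function draw_truncated_normal(rng::AbstractRNG, μ::Real, σ::Real, k::Real)
    while true
        z = randn(rng)                  # standard normal proposal
        abs(z) <= k && return μ + σ * z # accept only inside the truncated support
    end
end

rng = MersenneTwister(1234)
μs = collect(1.0:5.0)   # means of five uncertain values
σs = fill(0.5, 5)       # their standard deviations
k = 0.8                 # keep only ±0.8 standard deviations around each mean

# Each element is drawn on its own; no draw depends on any other draw.
draws = [draw_truncated_normal(rng, μ, σ, k) for (μ, σ) in zip(μs, σs)]
```

Because every element is sampled independently inside its own truncated support, the draws carry no dependence other than what the means and spreads already encode.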
Sequential resampling schemes¶
Sequential¶
UncertainData.Resampling.resample — Method.
```julia
resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialResampling)
```
Resample x according to a sequential resampling constraint.
This way of resampling introduces some serial dependence between the elements of x, beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. StrictlyIncreasing) on the i-th value of the dataset restricts what can be sampled for the (i+1)-th value.
Example
```julia
using UncertainData, Distributions

# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)

time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)

X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);

# Resample so that the drawn indices are strictly increasing
seq_resampling = SequentialResampling(StrictlyIncreasing())
resample(X, seq_resampling)
```
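The serial dependence described above can be seen in a deliberately simplified sketch written in plain Julia. It assumes normally distributed indices and enforces a strictly increasing sequence by redrawing each element until it exceeds the previous draw. The helpers `draw_above` and `draw_strictly_increasing` are hypothetical names for illustration; the package's actual algorithm is not shown in this section and may differ.

```julia
using Random

# Hypothetical helper: draw from Normal(μ, σ) until the value exceeds `lower`.
function draw_above(rng::AbstractRNG, μ::Real, σ::Real, lower::Real;
                    max_tries::Int = 100_000)
    for _ in 1:max_tries
        v = μ + σ * randn(rng)
        v > lower && return v
    end
    error("no admissible draw found above $lower")
end

# Draw one realisation in which every value is larger than the one before it.
function draw_strictly_increasing(rng::AbstractRNG, μs, σs)
    draws = Vector{Float64}(undef, length(μs))
    prev = -Inf                      # the first element is unconstrained
    for i in eachindex(μs)
        draws[i] = draw_above(rng, μs[i], σs[i], prev)
        prev = draws[i]              # the next draw is constrained by this one
    end
    return draws
end

rng = MersenneTwister(1234)
μs = collect(1.0:10.0)   # uncertain time indices centred on 1, 2, ..., 10
σs = fill(0.5, 10)
draws = draw_strictly_increasing(rng, μs, σs)
```

Each draw sets the lower bound for the next one, which is exactly where the extra dependence between neighbouring elements comes from.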
Sequential and interpolated¶
UncertainData.Resampling.resample — Method.
```julia
resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialInterpolatedResampling)
```
Resample x according to a sequential resampling constraint, then interpolate the draw(s) to some specified grid.
This way of resampling introduces some serial dependence between the elements of x, beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. StrictlyIncreasing) on the i-th value of the dataset restricts what can be sampled for the (i+1)-th value.
Example
```julia
using UncertainData, Distributions

# Some example data
N = 50
x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
x = UncertainValueDataset(x_uncertain)
y = UncertainValueDataset(y_uncertain)

time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
time_certain = [CertainValue(i) for i = 1:length(x)];
timeinds_x = UncertainIndexDataset(time_uncertain)
timeinds_y = UncertainIndexDataset(time_certain)

X = UncertainIndexValueDataset(timeinds_x, x)
Y = UncertainIndexValueDataset(timeinds_y, y);

# Resample with strictly increasing indices, then interpolate to a regular grid
seqintp_resampling = SequentialInterpolatedResampling(StrictlyIncreasing(), RegularGrid(0:2:N))
resample(X, seqintp_resampling)
```
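The interpolation step can be illustrated in isolation. The sketch below, in plain Julia, takes one already-drawn realisation with strictly increasing indices `t` and values `v` and interpolates it linearly onto a regular grid. The function `interpolate_linear` is a hypothetical helper, and both the choice of linear interpolation and the use of `NaN` outside the sampled range are assumptions of this sketch, not a statement about what the package does.

```julia
# Hypothetical helper: linear interpolation of (t, v) onto `grid`.
# `t` must be strictly increasing. Grid points outside [t[1], t[end]]
# are filled with NaN (a choice made for this sketch).
function interpolate_linear(t::AbstractVector, v::AbstractVector, grid)
    out = fill(NaN, length(grid))
    for (j, g) in enumerate(grid)
        (g < first(t) || g > last(t)) && continue
        i = searchsortedlast(t, g)          # largest i with t[i] <= g
        if i == length(t)
            out[j] = v[end]                 # g coincides with the last index
        else
            w = (g - t[i]) / (t[i + 1] - t[i])
            out[j] = (1 - w) * v[i] + w * v[i + 1]
        end
    end
    return out
end

t = [0.5, 1.5, 3.0]     # one strictly increasing draw of the indices
v = [1.0, 3.0, 0.0]     # the matching draw of the values
grid = 0:1:3
on_grid = interpolate_linear(t, v, grid)
```

The strictly increasing constraint matters here: without it, the drawn indices could be out of order and the interpolation would not be well defined.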
Binned resampling schemes¶
BinnedResampling¶
UncertainData.Resampling.resample — Method.
```julia
resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedResampling;
    nan_threshold = 0.0)
```
Transform index-irregularly spaced uncertain data onto a regular index-grid. Distributions in each index bin are obtained by resampling all index values in x resampling.n times, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. In total, length(x)*resampling.n draws are distributed among the bins to form the final KDEs.
Returns an UncertainIndexValueDataset. The distribution of values in the i-th bin is approximated by a kernel density estimate (KDE) over the draws falling in the i-th bin.
Assumes that the points in x are independent.
Example
```julia
using UncertainData, Distributions

# `ar1_unidir` is not defined on this page; it is assumed to be provided by
# another package in your environment (e.g. CausalityTools.jl).
vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

n_draws = 10000 # draws per uncertain value
time_grid = 0:50:1000
resampling = BinnedResampling(time_grid, n_draws)

# Resample both X and Y so that they are both at the same time indices.
resampled_X = resample(X, resampling)
resampled_Y = resample(Y, resampling)
```
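The last step, approximating the distribution in a bin by a kernel density estimate over the draws that fell into it, can be sketched without any packages beyond the standard library. The function `gaussian_kde`, the Gaussian kernel, and Silverman's rule-of-thumb bandwidth are all assumptions made for this illustration; the kernel and bandwidth the package actually uses are not stated in this section.

```julia
using Random, Statistics

# Hypothetical helper: Gaussian kernel density estimate evaluated at `x`.
# `draws` are the values that landed in one bin; `h` is the bandwidth.
gaussian_kde(x::Real, draws::AbstractVector, h::Real) =
    mean(exp.(-0.5 .* ((x .- draws) ./ h) .^ 2)) / (h * sqrt(2π))

rng = MersenneTwister(42)
draws = 2.0 .+ 0.5 .* randn(rng, 2000)   # stand-in for the draws in one bin

# Silverman's rule of thumb for the bandwidth (an assumption of this sketch).
h = 1.06 * std(draws) * length(draws)^(-1 / 5)

# Evaluate the estimated density on a grid covering the draws.
grid = range(-2.0, 6.0; length = 801)
density = [gaussian_kde(g, draws, h) for g in grid]
```

The resulting `density` is a smooth, non-negative curve that integrates to approximately one over the grid, which is what allows each bin to be treated as a new uncertain value.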
BinnedMeanResampling¶
UncertainData.Resampling.resample — Method.
```julia
resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanResampling)
```
Transform index-irregularly spaced uncertain data onto a regular index-grid and take the mean of the values in each bin.
Distributions in each index bin are obtained by resampling all index values in x resampling.n times, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, length(x)*resampling.n draws are distributed among the bins to form the final mean estimate.
Returns a vector of mean values, one for each bin.
Assumes that the points in x are independent.
Example
```julia
using UncertainData, Distributions

# `ar1_unidir` is not defined on this page; it is assumed to be provided by
# another package in your environment (e.g. CausalityTools.jl).
vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

n_draws = 10000 # draws per uncertain value
time_grid = 0:50:1000

# Resample both X and Y so that they are both at the same time indices,
# and take the mean of each bin.
resampled_X = resample(X, BinnedMeanResampling(time_grid, n_draws))
resampled_Y = resample(Y, BinnedMeanResampling(time_grid, n_draws))
```
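The procedure described above (draw index and value for every point n times, assign each value to the bin of its index draw, average per bin) can be written as a short plain-Julia sketch. It assumes normally distributed indices and values and left-closed bins defined by their edges. The function `binned_means` and the `NaN` returned for empty bins are choices of this sketch, not confirmed package behaviour.

```julia
using Random, Statistics

# Hypothetical helper: bin-wise means of resampled index/value pairs.
# `edges` are sorted bin edges; bin b covers [edges[b], edges[b+1]).
function binned_means(rng::AbstractRNG, t_mean, t_sd, v_mean, v_sd, edges, n::Int)
    nbins = length(edges) - 1
    bins = [Float64[] for _ in 1:nbins]
    for p in eachindex(t_mean), _ in 1:n
        t = t_mean[p] + t_sd[p] * randn(rng)   # one draw of the index
        v = v_mean[p] + v_sd[p] * randn(rng)   # one draw of the value
        b = searchsortedlast(edges, t)         # bin whose left edge is <= t
        1 <= b <= nbins && push!(bins[b], v)   # ignore draws outside the grid
    end
    # Empty bins get NaN (a choice made for this sketch).
    return [isempty(b) ? NaN : mean(b) for b in bins]
end

rng = MersenneTwister(1)
edges = collect(0.0:2.0:6.0)                      # three bins
t_mean, t_sd = [1.0, 3.0, 3.5], fill(0.01, 3)     # indices with small uncertainty
v_mean, v_sd = [10.0, 20.0, 40.0], fill(0.01, 3)  # values with small uncertainty
means = binned_means(rng, t_mean, t_sd, v_mean, v_sd, edges, 200)
```

With three points and 200 draws each, 600 draws are distributed among the bins, matching the length(x)*resampling.n count given in the description.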
BinnedWeightedResampling¶
UncertainData.Resampling.resample — Method.
```julia
resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedWeightedResampling;
    nan_threshold = 0.0)
```
Transform index-irregularly spaced uncertain data onto a regular index-grid. Distributions in each index bin are obtained by resampling all index values in x resampling.n times, sampled according to probabilities resampling.weights, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. In total, length(x)*resampling.n draws are distributed among the bins to form the final KDEs.
Returns an UncertainIndexValueDataset. The distribution of values in the i-th bin is approximated by a kernel density estimate (KDE) over the draws falling in the i-th bin.
Assumes that the points in x are independent.
Example
```julia
using UncertainData, Distributions, StatsBase

# `ar1_unidir` is not defined on this page; it is assumed to be provided by
# another package in your environment (e.g. CausalityTools.jl).
vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

left_bin_edges = 0:50:1000
n_draws = 10000 # draws per uncertain value
wts = Weights(rand(length(X))) # some random weights
resampling = BinnedWeightedResampling(left_bin_edges, wts, n_draws)
resampled_dataset = resample(X, resampling)
```
BinnedMeanWeightedResampling¶
UncertainData.Resampling.resample — Method.
```julia
resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanWeightedResampling)
```
Transform index-irregularly spaced uncertain data onto a regular index-grid and take the mean of the values in each bin. Resamples the data points in x according to resampling.weights.
Distributions in each index bin are obtained by resampling all index values in x resampling.n times, in proportions obeying resampling.weights and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, length(x)*resampling.n draws are distributed among the bins to form the final mean estimate.
Returns a vector of mean values, one for each bin.
Assumes that the points in x are independent.
Example
```julia
using UncertainData, Distributions, StatsBase

# `ar1_unidir` is not defined on this page; it is assumed to be provided by
# another package in your environment (e.g. CausalityTools.jl).
vars = (1, 2)
npts, tstep = 100, 10
d_xind = Uniform(2.5, 15.5)
d_yind = Uniform(2.5, 15.5)
d_xval = Uniform(0.01, 0.2)
d_yval = Uniform(0.01, 0.2)

X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
    d_xind = d_xind, d_yind = d_yind,
    d_xval = d_xval, d_yval = d_yval);

n_draws = 10000 # draws per uncertain value
time_grid = 0:50:1000
wts = Weights(rand(length(X))) # some random weights

# Resample both X and Y so that they are both at the same time indices,
# and take the mean of each bin.
resampled_X = resample(X, BinnedMeanWeightedResampling(time_grid, wts, n_draws))
resampled_Y = resample(Y, BinnedMeanWeightedResampling(time_grid, wts, n_draws))
```
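One way to read "in proportions obeying resampling.weights" is that each of the length(x)*n draws first picks a data point with probability proportional to its weight, and only then samples that point's index and value. The plain-Julia sketch below implements that reading. The helpers `pick_point` and `weighted_binned_means` are hypothetical, normal distributions are assumed throughout, and the package may apply the weights differently.

```julia
using Random, Statistics

# Hypothetical helper: pick a point with probability proportional to its
# weight, given the cumulative weights `cumw` (weights assumed positive).
function pick_point(rng::AbstractRNG, cumw::AbstractVector)
    u = rand(rng) * cumw[end]
    return min(searchsortedfirst(cumw, u), length(cumw))
end

# Hypothetical helper: bin-wise means where points are resampled by weight.
function weighted_binned_means(rng::AbstractRNG, t_mean, t_sd, v_mean, v_sd,
                               weights, edges, n::Int)
    nbins = length(edges) - 1
    bins = [Float64[] for _ in 1:nbins]
    cumw = cumsum(weights)
    for _ in 1:(length(t_mean) * n)            # total number of draws
        p = pick_point(rng, cumw)              # which point to sample from
        t = t_mean[p] + t_sd[p] * randn(rng)   # one draw of its index
        v = v_mean[p] + v_sd[p] * randn(rng)   # one draw of its value
        b = searchsortedlast(edges, t)
        1 <= b <= nbins && push!(bins[b], v)
    end
    return [isempty(b) ? NaN : mean(b) for b in bins]
end

rng = MersenneTwister(7)
edges = collect(0.0:2.0:4.0)               # two bins: [0, 2) and [2, 4)
t_mean, t_sd = [0.8, 1.2], [0.01, 0.01]    # both points fall in the first bin
v_mean, v_sd = [0.0, 100.0], [0.01, 0.01]
weights = [1.0, 3.0]                       # second point is three times as likely
means = weighted_binned_means(rng, t_mean, t_sd, v_mean, v_sd, weights, edges, 5000)
```

In this toy setup the first bin's mean lands near 75 rather than 50, since the second point contributes roughly three quarters of the draws.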