Applying resampling schemes
Resampling schemes¶
For some uncertain collections and datasets, special resampling types are available to make resampling easier.
Constrained resampling schemes¶
Constrained resampling¶
UncertainData.Resampling.resample
— Method.
    resample(x::AbstractUncertainIndexValueDataset, resampling::ConstrainedIndexValueResampling)

Resample x by first constraining the supports of the distributions/populations furnishing the uncertain indices and values, then drawing samples from the limited supports.
Sampling is done without assuming any sequential dependence between the elements of x, so that no dependence is introduced in the draws beyond what is potentially already present in the collection of values.
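The two steps (constrain the support, then draw independently) can be sketched in plain Julia without the package. The helper `draw_truncated_normal` is a hypothetical name, not part of UncertainData, and rejection sampling is used here only as a simple stand-in for however the library constrains and samples its distributions.

```julia
using Random

# Hypothetical helper (not part of UncertainData): draw one value from
# Normal(μ, σ) restricted to [μ - k*σ, μ + k*σ] by rejection sampling.
function draw_truncated_normal(rng::AbstractRNG, μ::Real, σ::Real, k::Real)
    while true
        z = randn(rng)                    # standard normal draw
        abs(z) <= k && return μ + σ * z   # accept only draws inside the limited support
    end
end

rng = MersenneTwister(1234)
μs = collect(1.0:5.0)   # means of five uncertain values
σs = fill(0.5, 5)       # their standard deviations
k = 0.8                 # truncate at 0.8 standard deviations around the mean

# Each element is drawn on its own; no draw depends on any other draw.
draws = [draw_truncated_normal(rng, μ, σ, k) for (μ, σ) in zip(μs, σs)]
```

Because every element is sampled separately, the draws carry no ordering constraint: neighbouring draws may well come out in a different order than their means.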
Example

    using UncertainData, Distributions

    # Some example data
    N = 50
    x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
    y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
    x = UncertainValueDataset(x_uncertain)
    y = UncertainValueDataset(y_uncertain)

    time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
    time_certain = [CertainValue(i) for i = 1:length(x)];
    timeinds_x = UncertainIndexDataset(time_uncertain)
    timeinds_y = UncertainIndexDataset(time_certain)

    X = UncertainIndexValueDataset(timeinds_x, x)
    Y = UncertainIndexValueDataset(timeinds_y, y);

    ###########################
    # Define resampling scheme
    ###########################

    # Truncate each of the indices for x at 0.8 times their standard deviation around the mean
    constraints_x_inds = TruncateStd(0.8)

    # Truncate each of the indices for y at 1.5 times their standard deviation around the mean
    constraints_y_inds = TruncateStd(1.5)

    # Truncate each of the values of x to the central 20% quantile range (0.4 to 0.6)
    constraints_x_vals = [TruncateQuantiles(0.4, 0.6) for i = 1:N];

    # Truncate each of the values of y to the central 80% quantile range (0.1 to 0.9)
    constraints_y_vals = [TruncateQuantiles(0.1, 0.9) for i = 1:N];

    cs_x = (constraints_x_inds, constraints_x_vals)
    cs_y = (constraints_y_inds, constraints_y_vals)

    ###########
    # Resample
    ###########
    resample(X, ConstrainedIndexValueResampling(cs_x))
    resample(Y, ConstrainedIndexValueResampling(cs_y))

Sequential resampling schemes¶
Sequential¶
UncertainData.Resampling.resample
— Method.
    resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialResampling)

Resample x according to a sequential resampling constraint.
This way of resampling introduces some serial dependence between the elements of x, beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. StrictlyIncreasing) on the i-th value of the dataset constrains what can be sampled from the (i+1)-th value.
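To see where the serial dependence comes from, here is a minimal plain-Julia sketch of one naive way to enforce a strictly increasing draw. The helper `draw_strictly_increasing` and its `maxtries` keyword are hypothetical, and element-wise rejection is an assumption made for illustration; the algorithm UncertainData actually uses may differ.

```julia
using Random

# Hypothetical helper (not part of UncertainData): draw element i from
# Normal(μs[i], σs[i]) and reject any candidate that is not larger than the
# previously accepted draw.
function draw_strictly_increasing(rng::AbstractRNG, μs, σs; maxtries = 10_000)
    draws = Vector{Float64}(undef, length(μs))
    prev = -Inf
    for i in eachindex(μs)
        accepted = false
        for _ in 1:maxtries
            candidate = μs[i] + σs[i] * randn(rng)
            if candidate > prev          # the constraint ties element i to element i-1
                draws[i] = candidate
                prev = candidate
                accepted = true
                break
            end
        end
        accepted || error("no admissible draw found for element $i")
    end
    return draws
end

rng = MersenneTwister(42)
μs = collect(1.0:10.0)   # index means, one time unit apart
σs = fill(1.0, 10)       # index uncertainties large enough to overlap
t = draw_strictly_increasing(rng, μs, σs)
```

A large accepted value at position i shrinks the region from which position i+1 can be drawn, which is exactly the dependence described above.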
Example

    using UncertainData, Distributions

    # Some example data
    N = 50
    x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
    y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
    x = UncertainValueDataset(x_uncertain)
    y = UncertainValueDataset(y_uncertain)

    time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
    time_certain = [CertainValue(i) for i = 1:length(x)];
    timeinds_x = UncertainIndexDataset(time_uncertain)
    timeinds_y = UncertainIndexDataset(time_certain)

    X = UncertainIndexValueDataset(timeinds_x, x)
    Y = UncertainIndexValueDataset(timeinds_y, y);

    # Resample
    seq_resampling = SequentialResampling(StrictlyIncreasing())
    resample(X, seq_resampling)

Sequential and interpolated¶
UncertainData.Resampling.resample
— Method.
    resample(x::AbstractUncertainIndexValueDataset, resampling::SequentialInterpolatedResampling)

Resample x according to a sequential resampling constraint, then interpolate the draw(s) to some specified grid.
This way of resampling introduces some serial dependence between the elements of x, beyond what might already be present in the dataset. This is because imposing a sequential constraint (e.g. StrictlyIncreasing) on the i-th value of the dataset constrains what can be sampled from the (i+1)-th value.
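Once a draw has strictly increasing indices, interpolating it onto a grid is well defined. The sketch below uses linear interpolation in plain Julia; `interpolate_linear` is a hypothetical helper, and both the choice of linear interpolation and the NaN fill outside the sampled range are assumptions of this sketch, not documented behaviour of the package.

```julia
# Hypothetical helper (not part of UncertainData): linearly interpolate the
# points (ts[i], vs[i]) at every grid point. ts must be strictly increasing.
# Grid points outside [ts[1], ts[end]] are left as NaN.
function interpolate_linear(ts::AbstractVector, vs::AbstractVector, grid)
    out = fill(NaN, length(grid))
    for (j, g) in enumerate(grid)
        (g < first(ts) || g > last(ts)) && continue
        hi = searchsortedfirst(ts, g)      # first index with ts[hi] >= g
        if ts[hi] == g
            out[j] = vs[hi]
        else
            lo = hi - 1
            w = (g - ts[lo]) / (ts[hi] - ts[lo])
            out[j] = (1 - w) * vs[lo] + w * vs[hi]
        end
    end
    return out
end

ts = [0.5, 1.7, 3.2, 4.1]   # one strictly increasing draw of the indices
vs = [1.0, 2.0, 0.0, 4.0]   # the values drawn alongside them
grid = 0:1:4
vals = interpolate_linear(ts, vs, grid)
```

The strictly increasing constraint matters here: without it, two draws could swap order and the interpolant would no longer be a function of the index.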
Example

    using UncertainData, Distributions

    # Some example data
    N = 50
    x_uncertain = [UncertainValue(Normal, x, rand(Uniform(0.1, 0.8))) for x in rand(N)]
    y_uncertain = [UncertainValue(Normal, y, rand(Uniform(0.1, 0.8))) for y in rand(N)]
    x = UncertainValueDataset(x_uncertain)
    y = UncertainValueDataset(y_uncertain)

    time_uncertain = [UncertainValue(Normal, i, 1) for i = 1:length(x)];
    time_certain = [CertainValue(i) for i = 1:length(x)];
    timeinds_x = UncertainIndexDataset(time_uncertain)
    timeinds_y = UncertainIndexDataset(time_certain)

    X = UncertainIndexValueDataset(timeinds_x, x)
    Y = UncertainIndexValueDataset(timeinds_y, y);

    # Resample
    seqintp_resampling = SequentialInterpolatedResampling(StrictlyIncreasing(), RegularGrid(0:2:N))
    resample(X, seqintp_resampling)

Binned resampling schemes¶
BinnedResampling¶
UncertainData.Resampling.resample
— Method.
    resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedResampling;
        nan_threshold = 0.0)

Transform uncertain data that are irregularly spaced in the index onto a regular index grid. Distributions in each index bin are obtained by resampling all index values in x resampling.n times and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. In total, length(x)*resampling.n draws are distributed among the bins to form the final KDEs.
Returns an UncertainIndexValueDataset. The distribution of values in the i-th bin is approximated by a kernel density estimate (KDE) over the draws falling in that bin.
Assumes that the points in x are independent.
Example

    using UncertainData, Distributions

    # `ar1_unidir` is assumed to be in scope (its source package is not shown here)
    vars = (1, 2)
    npts, tstep = 100, 10
    d_xind = Uniform(2.5, 15.5)
    d_yind = Uniform(2.5, 15.5)
    d_xval = Uniform(0.01, 0.2)
    d_yval = Uniform(0.01, 0.2)

    X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
        d_xind = d_xind, d_yind = d_yind,
        d_xval = d_xval, d_yval = d_yval);

    n_draws = 10000 # draws per uncertain value
    time_grid = 0:50:1000
    resampling = BinnedResampling(time_grid, n_draws)

    # Resample both X and Y so that they are both at the same time indices.
    resampled_X = resample(X, resampling)
    resampled_Y = resample(Y, resampling)

BinnedMeanResampling¶
UncertainData.Resampling.resample
— Method.
    resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanResampling)

Transform uncertain data that are irregularly spaced in the index onto a regular index grid and take the mean of the values in each bin.
Distributions in each index bin are obtained by resampling all index values in x resampling.n times and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, length(x)*resampling.n draws are distributed among the bins to form the final mean estimate.
Returns a vector of mean values, one for each bin.
Assumes that the points in x are independent.
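The binning procedure described above can be sketched in plain Julia as follows. The helper `binned_means` is hypothetical, normal distributions stand in for the uncertain indices and values, and the bins are assumed to be half-open intervals between consecutive edges with empty bins reported as NaN; none of these choices is confirmed by the package documentation.

```julia
using Random, Statistics

# Hypothetical helper (not part of UncertainData): draw every point n times,
# assign each value draw to the bin its index draw falls in, and average.
function binned_means(rng::AbstractRNG, ind_μ, ind_σ, val_μ, val_σ, edges, n::Int)
    nbins = length(edges) - 1
    bins = [Float64[] for _ in 1:nbins]
    for p in eachindex(ind_μ)
        for _ in 1:n
            t = ind_μ[p] + ind_σ[p] * randn(rng)   # index draw
            v = val_μ[p] + val_σ[p] * randn(rng)   # value draw, independent of t
            b = searchsortedlast(edges, t)         # edges[b] <= t < edges[b+1]
            1 <= b <= nbins && push!(bins[b], v)   # draws outside the grid are dropped
        end
    end
    return [isempty(b) ? NaN : mean(b) for b in bins]
end

rng = MersenneTwister(2024)
ind_μ = collect(10.0:10.0:100.0)   # ten points, nominally 10 index units apart
ind_σ = fill(2.0, 10)              # index uncertainty
val_μ = ind_μ ./ 10                # values grow with the index
val_σ = fill(0.05, 10)             # value uncertainty
edges = 0:25:125                   # five bins
means = binned_means(rng, ind_μ, ind_σ, val_μ, val_σ, edges, 1000)
```

In total `length(ind_μ) * n` draws are generated, matching the count stated above. A point whose index distribution straddles a bin edge contributes draws to both neighbouring bins.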
Example

    using UncertainData, Distributions

    # `ar1_unidir` is assumed to be in scope (its source package is not shown here)
    vars = (1, 2)
    npts, tstep = 100, 10
    d_xind = Uniform(2.5, 15.5)
    d_yind = Uniform(2.5, 15.5)
    d_xval = Uniform(0.01, 0.2)
    d_yval = Uniform(0.01, 0.2)

    X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
        d_xind = d_xind, d_yind = d_yind,
        d_xval = d_xval, d_yval = d_yval);

    n_draws = 10000 # draws per uncertain value
    time_grid = 0:50:1000

    # Resample both X and Y so that they are both at the same time indices,
    # and take the mean of each bin.
    resampled_X = resample(X, BinnedMeanResampling(time_grid, n_draws))
    resampled_Y = resample(Y, BinnedMeanResampling(time_grid, n_draws))

BinnedWeightedResampling¶
UncertainData.Resampling.resample
— Method.
    resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedWeightedResampling;
        nan_threshold = 0.0)

Transform uncertain data that are irregularly spaced in the index onto a regular index grid. Distributions in each index bin are obtained by resampling all index values in x resampling.n times, sampled according to the probabilities resampling.weights, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. In total, length(x)*resampling.n draws are distributed among the bins to form the final KDEs.
Returns an UncertainIndexValueDataset. The distribution of values in the i-th bin is approximated by a kernel density estimate (KDE) over the draws falling in that bin.
Assumes that the points in x are independent.
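One way to read "sampled according to probabilities" is that each of the length(x)*resampling.n draws first picks a data point with probability proportional to its weight. The plain-Julia sketch below illustrates only that reading; `weighted_index` is a hypothetical helper and the package may distribute its draws differently.

```julia
using Random

# Hypothetical helper (not part of UncertainData): pick an index in
# 1:length(w) with probability proportional to the non-negative weights w.
function weighted_index(rng::AbstractRNG, w::AbstractVector{<:Real})
    c = cumsum(w)
    u = rand(rng) * c[end]                             # uniform on [0, sum(w))
    return min(searchsortedlast(c, u) + 1, length(w))  # first index with c[i] > u
end

rng = MersenneTwister(7)
w = [0.0, 1.0, 3.0]              # one weight per data point
n = 1000                         # plays the role of resampling.n
counts = zeros(Int, length(w))   # how many draws each point contributes
for _ in 1:(length(w) * n)
    counts[weighted_index(rng, w)] += 1
end
```

Under this reading a point with zero weight contributes no draws at all, and a point with three times the weight of another contributes roughly three times as many draws to the bins.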
Example

    using UncertainData, Distributions
    using StatsBase # provides Weights

    # `ar1_unidir` is assumed to be in scope (its source package is not shown here)
    vars = (1, 2)
    npts, tstep = 100, 10
    d_xind = Uniform(2.5, 15.5)
    d_yind = Uniform(2.5, 15.5)
    d_xval = Uniform(0.01, 0.2)
    d_yval = Uniform(0.01, 0.2)

    X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
        d_xind = d_xind, d_yind = d_yind,
        d_xval = d_xval, d_yval = d_yval);

    left_bin_edges = 0:50:1000
    n_draws = 10000 # draws per uncertain value
    wts = Weights(rand(length(X))) # some random weights

    resampling = BinnedWeightedResampling(left_bin_edges, wts, n_draws)
    resampled_dataset = resample(X, resampling)

BinnedMeanWeightedResampling¶
UncertainData.Resampling.resample
— Method.
    resample(x::AbstractUncertainIndexValueDataset, resampling::BinnedMeanWeightedResampling)

Transform uncertain data that are irregularly spaced in the index onto a regular index grid and take the mean of the values in each bin. The data points in x are resampled according to resampling.weights.
Distributions in each index bin are obtained by resampling all index values in x resampling.n times, in proportions obeying resampling.weights, and mapping those index draws to the bins. Simultaneously, the values in x are resampled and placed in the corresponding bins. Finally, the mean in each bin is calculated. In total, length(x)*resampling.n draws are distributed among the bins to form the final mean estimate.
Returns a vector of mean values, one for each bin.
Assumes that the points in x are independent.
Example

    using UncertainData, Distributions
    using StatsBase # provides Weights

    # `ar1_unidir` is assumed to be in scope (its source package is not shown here)
    vars = (1, 2)
    npts, tstep = 100, 10
    d_xind = Uniform(2.5, 15.5)
    d_yind = Uniform(2.5, 15.5)
    d_xval = Uniform(0.01, 0.2)
    d_yval = Uniform(0.01, 0.2)

    X, Y = example_uncertain_indexvalue_datasets(ar1_unidir(c_xy = 0.5), npts, vars, tstep = tstep,
        d_xind = d_xind, d_yind = d_yind,
        d_xval = d_xval, d_yval = d_yval);

    n_draws = 10000 # draws per uncertain value
    time_grid = 0:50:1000
    wts = Weights(rand(length(X))) # some random weights

    # Resample both X and Y so that they are both at the same time indices,
    # and take the mean of each bin.
    resampled_X = resample(X, BinnedMeanWeightedResampling(time_grid, wts, n_draws))
    resampled_Y = resample(Y, BinnedMeanWeightedResampling(time_grid, wts, n_draws))
