List of resampling schemes and their purpose
For collections of uncertain data, sampling constraints can be represented using the `ConstrainedIndexValueResampling` type. This allows complicated sampling constraints to be passed as a single input argument to functions that accept uncertain value collections. Sequential constraints also make it possible to impose constraints on the indices of datasets while sampling.
Constrained
Constrained resampling
`UncertainData.Resampling.ConstrainedIndexValueResampling` (type)
```julia
ConstrainedIndexValueResampling(constraints::NTuple{N_DATASETS, NTuple{N_VARIABLES, Union{SamplingConstraint, Vector{<:SamplingConstraint}}}}, n::Int)
```
Indicates that resampling should be performed with constraints on a set of uncertain index-value datasets. See examples for usage.
Fields
- `constraints`. The constraints for the datasets, represented as a tuple of length `N_DATASETS`, where the `i`-th tuple element is itself an `N_VARIABLES`-length tuple containing the constraints for the `N_VARIABLES` different variables. See "Indexing" below for details. Constraints for each individual variable must be supplied as either a single sampling constraint, or as a vector of sampling constraints with length matching the length of the variable (`Union{SamplingConstraint, Vector{<:SamplingConstraint}}`). For example, if the `j`-th variable for the `i`-th dataset contains 352 observations, then `constraints[i, j]` must be either a single sampling constraint (e.g. `TruncateStd(1.1)`) or a vector of 352 different sampling constraints (e.g. `[TruncateStd(1.0 + rand()) for i = 1:352]`).
- `n::Int`. The number of draws.
Indexing
Assume `c` is an instance of `ConstrainedIndexValueResampling`. Then `c[i]` returns the `NTuple` of constraints for the `i`-th dataset, and `c[i, j]` returns the constraint(s) for the `j`-th variable of the `i`-th dataset.
Example
Defining `ConstrainedIndexValueResampling`s.
Assume we want to constrain three separate uncertain index-value datasets, with different sampling constraints for the indices and the values of each dataset.
```julia
# (index constraints, value constraints) for the 1st, 2nd and 3rd datasets
c1 = (TruncateStd(1), TruncateStd(1.1))
c2 = (TruncateStd(0.5), TruncateQuantiles(0.1, 0.8))
c3 = (TruncateQuantiles(0.05, 0.95), TruncateQuantiles(0.33, 0.67))
c = ConstrainedIndexValueResampling(c1, c2, c3)
```
Now, `c[2]` returns the `NTuple` of constraints for the 2nd dataset, and `c[1, 2]` returns the constraint(s) for the 2nd variable of the 1st dataset.
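The indexing rules above can be mimicked with a small stand-alone type. This is only a sketch of the behaviour, not the package's implementation: `ConstraintTable` is a hypothetical name, and plain strings stand in for real sampling constraints such as `TruncateStd(1)`.

```julia
# Hypothetical stand-in for the real type; it only illustrates the indexing rules.
struct ConstraintTable{T<:Tuple}
    constraints::T   # one inner tuple per dataset
    n::Int           # number of draws
end

# c[i] returns the tuple of constraints for the i-th dataset.
Base.getindex(c::ConstraintTable, i::Int) = c.constraints[i]

# c[i, j] returns the constraint(s) for the j-th variable of the i-th dataset.
Base.getindex(c::ConstraintTable, i::Int, j::Int) = c.constraints[i][j]

# Strings stand in for sampling constraints (index constraint, value constraint).
c = ConstraintTable((("idx-1", "val-1"), ("idx-2", "val-2")), 1)

c[2]      # ("idx-2", "val-2")
c[1, 2]   # "val-1"
```

Storing the constraints as a tuple of tuples keeps the dataset/variable layout explicit, so two-index access reduces to two nested tuple lookups.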
Controlling the number of draws
The number of draws defaults to 1 if not specified. To indicate that more than one draw should be performed, just input the number of draws before supplying the constraints to the constructor.
```julia
c1 = (TruncateStd(1), TruncateStd(1.1))
c2 = (TruncateStd(0.5), TruncateQuantiles(0.1, 0.8))

# A single draw
c_single = ConstrainedIndexValueResampling(c1, c2)

# Multiple (300) draws
c_multiple = ConstrainedIndexValueResampling(300, c1, c2)
```
Detailed example
Let's say we have two uncertain index-value datasets `x` and `y`. We want to constrain the furnishing distributions/populations for both the time indices and the values, for both `x` and `y`. For `x`, truncate the indices at `0.8` times the standard deviation around their mean, and for `y`, truncate the indices at `1.5` times the standard deviation around their mean. Next, truncate `x`'s values to roughly their central 20% quantile range (approximately the 40th to 60th percentiles), and truncate `y`'s values to roughly their central 80% quantile range (approximately the 10th to 90th percentiles).
All this information can be combined in a `ConstrainedIndexValueResampling` instance. This instance can be passed on to any function that accepts uncertain index-value datasets, to indicate that resampling should be performed on truncated versions of the distributions/populations furnishing the datasets.
```julia
using UncertainData, Distributions

# `x` and `y` are assumed to be existing uncertain index-value datasets.
# Some noise, so we don't truncate all furnishing distributions/populations at
# exactly the same quantiles.
r = Uniform(0, 0.01)

constraints_x_inds = TruncateStd(0.8)
constraints_y_inds = TruncateStd(1.5)

# One value constraint per point, so each vector matches the length of its dataset.
constraints_x_vals = [TruncateQuantiles(0.4 + rand(r), 0.6 + rand(r)) for i = 1:length(x)]
constraints_y_vals = [TruncateQuantiles(0.1 + rand(r), 0.9 + rand(r)) for i = 1:length(y)]

cs_x = (constraints_x_inds, constraints_x_vals)
cs_y = (constraints_y_inds, constraints_y_vals)

resampling = ConstrainedIndexValueResampling(cs_x, cs_y)
```
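To make the two kinds of truncation concrete, the following sketch applies them to a raw sample using only Julia's standard library. It is an illustration under assumptions: `population` is a hypothetical stand-in for one furnishing population, and filtering a sample is a simplification of what the package's `TruncateStd` and `TruncateQuantiles` constraints do to distributions.

```julia
using Statistics, Random

Random.seed!(1)
population = randn(10_000)   # hypothetical stand-in for one furnishing population

# Analogue of TruncateStd(0.8): keep members within 0.8 standard deviations of the mean.
μ, σ = mean(population), std(population)
within_std = filter(v -> abs(v - μ) <= 0.8σ, population)

# Analogue of TruncateQuantiles(0.4, 0.6): keep members between the 40th and 60th percentiles.
lo, hi = quantile(population, [0.4, 0.6])
within_quantiles = filter(v -> lo <= v <= hi, population)
```

The quantile-based subset retains about 20% of the members, which is the "central 20%" range used for the values of `x` in the example.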
Sequential
Sequential resampling
`UncertainData.Resampling.SequentialResampling` (type)
```julia
SequentialResampling{SequentialSamplingConstraint}
```
Indicates that resampling should be done by resampling sequentially.
Fields
- `sequential_constraint::SequentialSamplingConstraint`. The sequential sampling constraint, for example `StrictlyIncreasing()`.
Examples
```julia
SequentialResampling(StrictlyIncreasing())
```
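What a strictly increasing sequential draw means can be sketched in plain Julia. This is a deliberately simplified illustration, not the package's algorithm: each uncertain index is assumed to be a uniform interval `(lo, hi)`, and `draw_strictly_increasing` is a hypothetical helper name.

```julia
using Random

# Hypothetical simplification: each uncertain index is a uniform interval (lo, hi).
# Draw one value per interval so that the realisation is strictly increasing.
function draw_strictly_increasing(rng, intervals)
    draws = Float64[]
    prev = -Inf
    for (lo, hi) in intervals
        lower = max(lo, prev)              # never start below the previous draw
        lower < hi || error("no strictly increasing draw possible")
        # 1 - rand(rng) lies in (0, 1], so the draw lies in (lower, hi].
        x = lower + (hi - lower) * (1 - rand(rng))
        push!(draws, x)
        prev = x
    end
    return draws
end

intervals = [(0.0, 2.0), (1.0, 3.0), (2.5, 5.0), (4.0, 6.0)]
draws = draw_strictly_increasing(MersenneTwister(42), intervals)
```

Each draw is taken from the part of its interval that lies above the previous draw, so overlapping supports still produce an ordered realisation.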
Sequential and interpolated resampling
`UncertainData.Resampling.SequentialInterpolatedResampling` (type)
```julia
SequentialInterpolatedResampling{SequentialSamplingConstraint, InterpolationGrid}
```
Indicates that resampling should be done by first resampling sequentially, then interpolating the sample to an interpolation grid.
Fields
- `sequential_constraint::SequentialSamplingConstraint`. The sequential sampling constraint, for example `StrictlyIncreasing()`.
- `grid::InterpolationGrid`. The grid onto which the resampled draw (generated according to the sequential constraint) is interpolated, for example `RegularGrid(0, 100, 2.5)`.
Examples
For example, `SequentialInterpolatedResampling(StrictlyIncreasing(), RegularGrid(0:2:100))` indicates a sequential draw that is then interpolated to the grid `0:2:100`.
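The interpolation step can be sketched as follows. The sketch assumes linear interpolation, which may differ from what the package uses internally, and `interpolate_to_grid` is a hypothetical helper. It also shows why the sequential constraint matters: the lookup requires the drawn indices to be strictly increasing.

```julia
# Linear interpolation of (t, v) onto a grid; t must be strictly increasing.
function interpolate_to_grid(t, v, grid)
    out = fill(NaN, length(grid))          # NaN marks grid points outside [t[1], t[end]]
    for (k, g) in enumerate(grid)
        (g < first(t) || g > last(t)) && continue
        i = searchsortedlast(t, g)         # largest i with t[i] <= g
        if i == length(t)
            out[k] = v[end]                # g coincides with the last index
        else
            w = (g - t[i]) / (t[i + 1] - t[i])
            out[k] = (1 - w) * v[i] + w * v[i + 1]
        end
    end
    return out
end

t = [0.5, 3.0, 7.0, 10.0]     # a strictly increasing draw of the indices
v = [1.0, 2.0, 4.0, 1.0]      # the corresponding draw of the values
grid = 0:2:10
on_grid = interpolate_to_grid(t, v, grid)
```

For this input the result is approximately `[NaN, 1.6, 2.5, 3.5, 3.0, 1.0]`; the first grid point lies before the first drawn index and is therefore left undefined.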
Binned resampling
BinnedResampling
`UncertainData.Resampling.BinnedResampling` (type)
```julia
BinnedResampling
```
Indicates that binned resampling should be performed.
Fields
- `left_bin_edges`. The left edge points of the bins. Either a range or some custom type which implements `minimum` and `step` methods.
- `n`. The number of draws. Each point in the dataset is sampled `n` times. If there are `m` points in the dataset, then the total number of draws is `n*m`.
Examples
```julia
using UncertainData

# Resample on a grid from 0 to 200 in steps of 20
grid = 0:20:200

# The number of samples per point in the dataset
n_draws = 10000

# Create the resampling scheme
resampling = BinnedResampling(grid, n_draws)
```
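A plausible reason why `minimum` and `step` are the only methods required of the bin edges is that they suffice to locate the bin of any draw. The sketch below assumes equally wide, half-open bins `[edge, edge + step)`; `bin_index` is a hypothetical helper and the dataset size `m` is made up for illustration.

```julia
grid = 0:20:200                      # left bin edges, as in the example above

# Locate the bin containing x using only minimum(grid) and step(grid).
bin_index(grid, x) = floor(Int, (x - minimum(grid)) / step(grid)) + 1

n, m = 10_000, 50                    # draws per point, number of points (m is hypothetical)
total_draws = n * m                  # 500000 draws in total

bin_index(grid, 0.0)     # 1
bin_index(grid, 19.9)    # 1
bin_index(grid, 20.0)    # 2
bin_index(grid, 135.0)   # 7
```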
BinnedWeightedResampling
`UncertainData.Resampling.BinnedWeightedResampling` (type)
```julia
BinnedWeightedResampling
```
Indicates that binned resampling should be performed, but weighting each point in the dataset differently.
Fields
- `left_bin_edges`. The left edge points of the bins. Either a range or some custom type which implements `minimum` and `step` methods.
- `weights`. The relative probability weights assigned to each point.
- `n`. The total number of draws. These are distributed among the points of the dataset according to `weights`.
Examples
```julia
using UncertainData, StatsBase

# Resample on a grid from 0 to 200 in steps of 20
grid = 0:20:200

# Assume our dataset has 50 points. We'll assign random weights to them.
wts = Weights(rand(50))

# The total number of draws (on average 10000000/50 = 200000 draws per point
# if weights are equal)
n_draws = 10000000

# Create the resampling scheme
resampling = BinnedWeightedResampling(grid, wts, n_draws)
```
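How a total of `n` draws is distributed among points according to relative weights can be sketched without any packages. The weights and the inverse-CDF allocation below are illustrative assumptions, not the package's internals.

```julia
weights = [1.0, 3.0, 4.0, 2.0]         # hypothetical relative weights for 4 points
n = 1_000                              # total number of draws

# Normalise to probabilities, then compute the expected number of draws per point.
probs = weights ./ sum(weights)
expected_draws = n .* probs            # approximately [100, 300, 400, 200]

# One random allocation: pick a point for each draw via inverse-CDF sampling.
cdf = cumsum(probs)
pick() = min(searchsortedfirst(cdf, rand()), length(cdf))
counts = zeros(Int, length(weights))
for _ in 1:n
    counts[pick()] += 1
end
```

The realised `counts` always sum to `n`, while individual entries fluctuate around `expected_draws`.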
BinnedMeanResampling
`UncertainData.Resampling.BinnedMeanResampling` (type)
```julia
BinnedMeanResampling
```
Binned resampling where each bin is summarised using the mean of all draws falling in that bin.
Fields
- `left_bin_edges`. The left edge points of the bins. Either a range or some custom type which implements `minimum` and `step` methods.
- `n`. The number of draws. Each point in the dataset is sampled `n` times. If there are `m` points in the dataset, then the total number of draws is `n*m`.
Examples
```julia
using UncertainData

# Resample on a grid from 0 to 200 in steps of 20
grid = 0:20:200

# The number of samples per point in the dataset
n_draws = 10000

# Create the resampling scheme
resampling = BinnedMeanResampling(grid, n_draws)
```
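Summarising each bin by the mean of the draws falling in it can be sketched as below. The draws are made-up numbers and the half-open binning rule is an assumption; the package's own procedure may differ in detail.

```julia
using Statistics

grid = 0:20:40                                    # left bin edges of 3 bins
idx_draws = [3.0, 12.0, 25.0, 38.0, 41.0, 59.0]   # hypothetical index draws
val_draws = [1.0, 3.0, 10.0, 20.0, 7.0, 9.0]      # corresponding value draws

# Assign each draw to a bin, then average the values falling in each bin.
bins = [floor(Int, (t - minimum(grid)) / step(grid)) + 1 for t in idx_draws]
bin_means = [mean(val_draws[bins .== b]) for b in 1:length(grid)]   # [2.0, 15.0, 8.0]
```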
BinnedMeanWeightedResampling
`UncertainData.Resampling.BinnedMeanWeightedResampling` (type)
```julia
BinnedMeanWeightedResampling
```
Binned resampling where each bin is summarised using the mean of all draws falling in that bin. Points in the dataset are sampled with probabilities according to `weights`.
Fields
- `left_bin_edges`. The left edge points of the bins. Either a range or some custom type which implements `minimum` and `step` methods.
- `weights`. The relative probability weights assigned to each point.
- `n`. The total number of draws. These are distributed among the points of the dataset according to `weights`.
Examples
```julia
using UncertainData, StatsBase

# Resample on a grid from 0 to 200 in steps of 20
grid = 0:20:200

# Assume our dataset has 50 points. We'll assign random weights to them.
wts = Weights(rand(50))

# The total number of draws (on average 10000000/50 = 200000 draws per point
# if weights are equal)
n_draws = 10000000

# Create the resampling scheme
resampling = BinnedMeanWeightedResampling(grid, wts, n_draws)
```