
List of resampling schemes

List of resampling schemes and their purpose

For collections of uncertain data, sampling constraints can be represented using the ConstrainedValueResampling type. This allows for passing complicated sampling constraints as a single input argument to functions that accept uncertain value collections.

Constrained resampling

# UncertainData.Resampling.ConstrainedValueResampling (Type)

ConstrainedValueResampling{N_DATASETS}

Indicates that resampling should be done with constraints on the furnishing distributions/populations.

Fields

  • constraints. The constraints for the datasets, represented as a tuple of length N_DATASETS, where the i-th tuple element contains the constraints for the i-th dataset. Constraints for each dataset must be supplied either as a single sampling constraint, or as a vector of sampling constraints whose length matches the length of the dataset (Union{SamplingConstraint, Vector{<:SamplingConstraint}}). For example, if the i-th dataset contains 352 observations, then constraints[i] must be either a single sampling constraint (e.g. TruncateStd(1.1)) or a vector of 352 sampling constraints (e.g. [TruncateStd(1.0 + rand()) for i = 1:352]).
  • n::Int. The number of draws.
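
The "single constraint or one constraint per observation" rule can be sketched in plain Julia. The types DemoConstraint and DemoTruncateStd and the helper expand_constraints below are hypothetical stand-ins written for this illustration; they are not part of UncertainData.jl and say nothing about how the package validates its input.

```julia
# Stand-in types for illustration only; UncertainData.jl defines its own
# SamplingConstraint hierarchy.
abstract type DemoConstraint end

struct DemoTruncateStd <: DemoConstraint
    nsigma::Float64  # number of standard deviations to keep around the mean
end

# Hypothetical helper: a single constraint is repeated for every observation.
expand_constraints(c::DemoConstraint, n_obs::Int) = fill(c, n_obs)

# Hypothetical helper: a vector of constraints must match the dataset length.
function expand_constraints(cs::Vector{<:DemoConstraint}, n_obs::Int)
    length(cs) == n_obs ||
        throw(ArgumentError("expected $n_obs constraints, got $(length(cs))"))
    return cs
end

# A dataset with 352 observations accepts either form.
single  = expand_constraints(DemoTruncateStd(1.1), 352)
per_obs = expand_constraints([DemoTruncateStd(1.0 + rand()) for i = 1:352], 352)
```

Passing a vector of any other length, for instance a one-element vector for the same 352 observations, raises an ArgumentError in this sketch.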

Example

Assume we have three collections of uncertain values, each of length L = 50. These should be resampled 250 times. Before resampling, however, the distributions/populations furnishing the uncertain values should be truncated:

  • For the first collection, truncate each value at 1.5 times its standard deviation around its mean. This could simulate measurement errors from an instrument that yields stable measurements whose errors are normally distributed, but where we are not interested in outliers or values beyond 1.5 standard deviations for our analyses.
  • For the second collection, truncate each value to its central 80% range, that is, between its 10th and 90th percentiles. This could simulate measurement errors from an instrument that yields stable measurements whose errors are not normally distributed, so that percentile ranges describe the spread better than standard deviations. In this case, we're not interested in outliers, and therefore exclude values smaller than the 10th percentile and larger than the 90th percentile of the data.
  • For the third collection, truncate the i-th value at a fraction of its standard deviation around the mean that is slightly larger than for the (i-1)-th value, so that the truncation width grows from 0.5 + 1/100 to 0.5 + L/100 standard deviations. This could simulate, for example, an instrument whose measurement error increases over time.

L = 50
constraints_d1 = TruncateStd(1.5)
constraints_d2 = TruncateQuantiles(0.1, 0.9)
constraints_d3 = [TruncateStd(0.5 + i/100) for i = 1:L]
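
To make the three truncation schemes above concrete, here is a minimal sketch that computes the corresponding bounds on a stand-in population using only Julia's Statistics standard library. The helpers std_bounds, quantile_bounds and truncate_to are hypothetical names introduced for this illustration, and the normally distributed population is an assumption; the sketch does not reproduce how UncertainData.jl applies its constraints internally.

```julia
using Statistics  # standard library: mean, std, quantile

# Hypothetical helpers for illustration; not part of UncertainData.jl.
std_bounds(pop, nsigma) = (mean(pop) - nsigma * std(pop), mean(pop) + nsigma * std(pop))
quantile_bounds(pop, lo, hi) = (quantile(pop, lo), quantile(pop, hi))
truncate_to(pop, bounds) = filter(x -> bounds[1] <= x <= bounds[2], pop)

L = 50
pop = randn(10_000)  # stand-in population behind a single uncertain value

# First collection: keep values within 1.5 standard deviations of the mean.
b1 = std_bounds(pop, 1.5)

# Second collection: keep values between the 10th and 90th percentiles.
b2 = quantile_bounds(pop, 0.1, 0.9)

# Third collection: the i-th value gets a slightly wider band than the (i-1)-th.
b3 = [std_bounds(pop, 0.5 + i/100) for i = 1:L]

# Resample 250 times from the population truncated with the first scheme.
draws = rand(truncate_to(pop, b1), 250)
```

Every element of draws falls inside b1 by construction, and the bands in b3 widen monotonically with i, mirroring the "error increases over time" scenario.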
