UncertainData.jl

Motivation

UncertainData.jl was born to deal with uncertain data systematically, and to sample uncertain datasets more rigorously. It makes workflows involving uncertain data of different types and from different sources significantly easier.

Package philosophy

Too often in data analysis, the uncertainties in observational data are ignored or not dealt with in a systematic manner. The core concept of the package is that uncertain data should live in the probability domain, not as single-value representations of the data (e.g. the mean).

In this package, data values are stored as probability distributions. Individual uncertain observations may be collected in UncertainDatasets, which can be sampled according to user-provided sampling constraints. Likewise, indices (e.g. time, depth or any other index) of observations are also represented as probability distributions. Indices may also be sampled using constraints, for example enforcing strictly increasing values.
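As a minimal sketch of this idea, an observation like 2.2 ± 0.3 can be stored as a normal distribution and sampled, optionally after truncating its support. The constructor and constraint names below (`UncertainValue`, `resample`, `TruncateQuantiles`) are taken from the package's documented API to the best of my knowledge; treat them as assumptions and check the reference documentation.

```julia
# Sketch: an uncertain value as a probability distribution (names assumed
# from the UncertainData.jl API; verify against the package docs).
using UncertainData, Distributions

# An observation of 2.2 with standard uncertainty 0.3, as a Normal distribution
x = UncertainValue(Normal, 2.2, 0.3)

# Draw 1000 realizations from the underlying distribution
draws = resample(x, 1000)

# Restrict the support to the 5th-95th percentile range before sampling
constrained_draws = resample(x, TruncateQuantiles(0.05, 0.95), 1000)
```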

Basic workflow

  1. Define uncertain values by probability distributions.
  2. Define uncertain datasets by gathering uncertain values.
  3. Use sampling constraints to constrain the support of the distributions underlying the uncertain values (i.e. apply subjective criteria to decide what is acceptable data and what is not).
  4. Resample the uncertain values or uncertain datasets.
  5. Extend existing algorithms to accept uncertain values/datasets.
  6. Quantify the uncertainty in your dataset or on whatever measure your algorithm computes.
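Steps 1-4 of the workflow above might look as follows. This is a hedged sketch: the dataset types and constraints shown (`UncertainValueDataset`, `UncertainIndexDataset`, `TruncateStd`, `StrictlyIncreasing`) are assumed from the package's documented API and should be checked against the reference documentation.

```julia
# Sketch of the basic workflow (API names assumed; verify against the docs).
using UncertainData, Distributions

# 1-2. Define uncertain values and gather them into an uncertain dataset
vals = UncertainValueDataset([UncertainValue(Normal, i, 0.5) for i in 1:10])

# 3-4. Constrain each value to within 1 standard deviation of its mean,
# then draw one realization of the whole dataset
realization = resample(vals, TruncateStd(1.0))

# Indices (e.g. time) are also uncertain; enforce strictly increasing
# values when resampling them
inds = UncertainIndexDataset([UncertainValue(Normal, i, 0.3) for i in 1:10])
ordered_inds = resample(inds, StrictlyIncreasing())
```

Repeating the resampling step many times (step 6) yields an ensemble of realizations from which uncertainty in any downstream statistic can be quantified.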