UncertainData.jl

Motivation

UncertainData.jl was created to deal with uncertain data systematically and to sample from uncertain datasets more rigorously. It makes workflows involving uncertain data of different types and from different sources significantly easier.

Package philosophy

Too often in data analysis, uncertainties in observational data are ignored or not handled systematically. The core concept of the package is that uncertain data should live in the probability domain, not as single-value representations of the data (e.g. the mean).

In this package, uncertain data values are therefore stored as probability distributions or populations. Only when a computation or plot is performed are the uncertain values realized, by resampling the probability distributions furnishing them.
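To make the idea of "storing the distribution, not the number" concrete, here is a minimal Python sketch. It is a language-agnostic illustration only, not UncertainData.jl's Julia API: the names `UncertainValue`, `normal` and `population` are hypothetical. An uncertain value holds a sampler, and concrete numbers appear only when `resample` is called.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class UncertainValue:
    """Hypothetical uncertain value: stored as a sampler, not as one number.

    `draw` takes a random generator and returns a single realization.
    """
    draw: Callable[[random.Random], float]

    def resample(self, n: int, rng: random.Random) -> List[float]:
        # Values are realized only here, when a computation asks for them.
        return [self.draw(rng) for _ in range(n)]


def normal(mu: float, sigma: float) -> UncertainValue:
    # Furnishing distribution: a normal distribution with mean mu, std sigma.
    return UncertainValue(lambda rng: rng.gauss(mu, sigma))


def population(values: List[float]) -> UncertainValue:
    # Furnishing population: a finite set of equally likely values.
    return UncertainValue(lambda rng: rng.choice(values))


rng = random.Random(42)
x = normal(0.0, 0.5)
y = population([1.0, 2.0, 4.0])
draws = y.resample(5, rng)
```

Nothing is sampled when `x` and `y` are defined; each call to `resample` produces fresh realizations from the stored distribution or population.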

Organising uncertain data

Individual uncertain observations of different types can be mixed seamlessly and organised in collections of uncertain values.

Mathematical operations

Several elementary mathematical operations and trigonometric functions are supported for uncertain values. Computations are done using a resampling approach.
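The resampling approach to arithmetic can be sketched as follows: draw one realization from each input, apply the ordinary operation, and repeat many times. The result is a sample from the distribution of the output. This Python sketch assumes the inputs are independent; the helper `combine` is hypothetical and does not reproduce the package's own implementation.

```python
import math
import random
import statistics
from typing import Callable, List

# A sampler returns one realization of an uncertain value.
Sampler = Callable[[random.Random], float]


def combine(op: Callable[..., float], *samplers: Sampler,
            n: int, rng: random.Random) -> List[float]:
    """Apply `op` to n joint realizations of the input uncertain values.

    Each realization draws one value from every input independently, so the
    result is a sample of op(x, y, ...) under an independence assumption.
    """
    return [op(*(s(rng) for s in samplers)) for _ in range(n)]


rng = random.Random(1)
x = lambda r: r.gauss(2.0, 0.1)     # x ~ Normal(2, 0.1)
y = lambda r: r.uniform(0.0, 1.0)   # y ~ Uniform(0, 1)

sums = combine(lambda a, b: a + b, x, y, n=10_000, rng=rng)   # x + y
sines = combine(math.sin, x, n=10_000, rng=rng)              # sin(x)
sum_mean, sum_std = statistics.mean(sums), statistics.stdev(sums)
```

Because the output is itself a sample, any summary (mean, standard deviation, quantiles) can be computed from it afterwards.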

Statistics on uncertain datasets

Statistics on uncertain datasets are computed using a resampling approach.
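A statistic over an uncertain dataset becomes a distribution rather than a single number. In the Python sketch below, each draw realizes every value in the dataset once and evaluates the statistic on that realization. The helper `resampled_statistic` and the three-point dataset are hypothetical, and independence between the values is assumed.

```python
import random
import statistics
from typing import Callable, List, Sequence

Sampler = Callable[[random.Random], float]


def resampled_statistic(dataset: Sequence[Sampler],
                        stat: Callable[[List[float]], float],
                        n_draws: int,
                        rng: random.Random) -> List[float]:
    """Distribution of `stat` over an uncertain dataset.

    Each draw realizes every uncertain value once (independently) and
    evaluates `stat` on that realization of the whole dataset.
    """
    return [stat([s(rng) for s in dataset]) for _ in range(n_draws)]


rng = random.Random(7)
# Hypothetical dataset: three measurements with different uncertainty models.
dataset = [
    lambda r: r.gauss(10.0, 0.5),           # normally distributed
    lambda r: r.uniform(9.0, 11.0),         # uniformly distributed
    lambda r: r.choice([9.5, 10.0, 10.5]),  # a population of values
]
means = resampled_statistic(dataset, statistics.mean, 5_000, rng)
summary = (statistics.mean(means), statistics.stdev(means))
```

`summary` holds the centre and spread of the dataset mean; swapping `statistics.mean` for `statistics.median` or any other function gives the distribution of that statistic instead.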

Resampling

Resampling is done by drawing random numbers from the furnishing distributions/populations of the uncertain value(s), using one of the resample methods.

Individual uncertain values may be sampled as they are, or after first applying sampling constraints on the underlying distributions/populations.
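One simple way to apply a sampling constraint is rejection sampling: draws outside the accepted region are discarded and redrawn, which samples the original distribution conditioned on that region. The Python sketch below truncates a normal distribution to within one standard deviation of its mean. The helper `truncate` is hypothetical and only illustrates the general idea; the package's own constraint types are not reproduced here.

```python
import random
from typing import Callable

Sampler = Callable[[random.Random], float]


def truncate(sampler: Sampler, lo: float, hi: float,
             max_tries: int = 10_000) -> Sampler:
    """Constrain a sampler's support to [lo, hi] by rejection sampling.

    Draws outside the interval are discarded and redrawn, which yields the
    original distribution conditioned on lying inside [lo, hi].
    """
    def constrained(rng: random.Random) -> float:
        for _ in range(max_tries):
            v = sampler(rng)
            if lo <= v <= hi:
                return v
        raise ValueError("every draw was rejected; [lo, hi] may lie outside the support")
    return constrained


rng = random.Random(3)
mu, sigma = 5.0, 2.0
x = lambda r: r.gauss(mu, sigma)
# Keep only values within one standard deviation of the mean.
x_1sd = truncate(x, mu - sigma, mu + sigma)
draws = [x_1sd(rng) for _ in range(1_000)]
```

Rejection sampling is easy to reason about but becomes slow when the accepted region has low probability, which is why the sketch caps the number of attempts.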
Collections of uncertain values can be resampled either by assuming no sequential dependence in your data or by applying sequential sampling models. During this process, sampling constraints can be applied element-wise or to entire collections.
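The difference between independent resampling and a sequential model can be illustrated with a collection of overlapping uncertain time stamps that must come out in strictly increasing order. This Python sketch conditions whole realizations on being strictly increasing, again by rejection sampling. The helpers `resample_independent` and `resample_strictly_increasing` and the time-stamp data are hypothetical, and this is only one possible sequential model, not necessarily the one the package uses.

```python
import random
from typing import Callable, List, Sequence

Sampler = Callable[[random.Random], float]


def resample_independent(collection: Sequence[Sampler],
                         rng: random.Random) -> List[float]:
    # No sequential dependence: every element is drawn on its own.
    return [s(rng) for s in collection]


def resample_strictly_increasing(collection: Sequence[Sampler],
                                 rng: random.Random,
                                 max_tries: int = 100_000) -> List[float]:
    """One realization of the collection, conditioned on strict increase.

    Whole realizations are drawn independently and rejected until one
    satisfies x[0] < x[1] < ... < x[-1].
    """
    for _ in range(max_tries):
        draw = resample_independent(collection, rng)
        if all(a < b for a, b in zip(draw, draw[1:])):
            return draw
    raise ValueError("no strictly increasing realization found")


rng = random.Random(11)
# Hypothetical overlapping uncertain time stamps (m=m binds each mean).
times = [lambda r, m=m: r.gauss(m, 0.8) for m in (1.0, 2.0, 3.0, 4.0)]
unordered = resample_independent(times, rng)   # may be out of order
ordered = resample_strictly_increasing(times, rng)
```

Here the constraint acts on the entire collection rather than on each element separately, since "strictly increasing" is a property of the sequence as a whole.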

Basic workflow

Define uncertain values using probability distributions.
Define uncertain datasets by gathering uncertain values.
Use sampling constraints to constrain the support of the distributions furnishing the uncertain values (i.e. apply subjective criteria to decide what is acceptable data and what is not).
Resample the uncertain values or uncertain datasets.
Extend existing algorithms to accept uncertain values/datasets.
Quantify the uncertainty in your dataset or in whatever measure your algorithm computes.
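The workflow steps above can be strung together in a short Python sketch. It is an illustration under stated assumptions, not the package's API: the data are hypothetical, values are assumed independent, constraints use rejection sampling, and `truncate`, `slope` and `lift` are hypothetical helpers. The "existing algorithm" is an ordinary least-squares slope that only understands plain numbers; `lift` extends it to uncertain datasets by running it on many realizations.

```python
import random
import statistics
from typing import Callable, List, Sequence

Sampler = Callable[[random.Random], float]


def truncate(sampler: Sampler, lo: float, hi: float) -> Sampler:
    # Step 3: constrain the support by rejection sampling
    # (assumes [lo, hi] has positive probability under the sampler).
    def constrained(rng: random.Random) -> float:
        while True:
            v = sampler(rng)
            if lo <= v <= hi:
                return v
    return constrained


def slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    # An existing algorithm for plain numbers: ordinary least-squares slope.
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def lift(algorithm, *datasets: Sequence[Sampler],
         n: int, rng: random.Random) -> List[float]:
    # Steps 4-5: run the algorithm on n realizations of the uncertain datasets.
    return [algorithm(*([s(rng) for s in d] for d in datasets)) for _ in range(n)]


rng = random.Random(2024)
# Steps 1-2: uncertain values gathered into datasets (hypothetical data).
xs = [lambda r, m=m: r.gauss(m, 0.2) for m in range(5)]
ys = [lambda r, m=m: r.gauss(2.0 * m, 0.5) for m in range(5)]
# Step 3: keep y-values within two standard deviations (2 * 0.5) of their means.
ys = [truncate(s, 2.0 * m - 1.0, 2.0 * m + 1.0) for m, s in enumerate(ys)]
# Steps 4-5: resample and apply the algorithm.
slopes = lift(slope, xs, ys, n=2_000, rng=rng)
# Step 6: quantify the uncertainty in the computed measure.
q = statistics.quantiles(slopes, n=20)  # cut points at 5 %, 10 %, ..., 95 %
interval_90 = (q[0], q[-1])
```

The result is not one slope but a sample of slopes, from which an interval such as `interval_90` summarizes how the input uncertainties carry through the algorithm.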

A related package is Measurements.jl, which propagates errors analytically using linear error propagation theory and handles correlated uncertainties. However, Measurements.jl only accepts normally distributed values. This package serves a slightly different purpose: it provides an easy way of handling uncertainties of many different types, uses a resampling approach to obtain statistics when needed, and offers a rich set of sampling constraints that make it easy to reason about and plot uncertain data under different assumptions.

Depending on your needs, Measurements.jl may be a better (and faster) choice if your data satisfy its requirements (normally distributed values) and your uncertainties are correlated.

Contributing

If you have questions or a good idea for new functionality that could be useful in the package, please open an issue or, even better, submit a pull request.

Citing

If you use UncertainData.jl in any of your projects or scientific publications, please cite the following Journal of Open Source Software (JOSS) publication:

Haaga, (2019). UncertainData.jl: a Julia package for working with measurements and datasets with uncertainties. Journal of Open Source Software, 4(43), 1666, https://doi.org/10.21105/joss.01666