Uncertain value types
The core concept of UncertainData is to replace an uncertain data value with a probability distribution describing the point's uncertainty.
The following types of uncertain values are currently implemented:
- Theoretical distributions with known parameters.
- Theoretical distributions with parameters fitted to empirical data.
- Kernel density estimates of distributions, computed from empirical data.
- Weighted (nested) populations where the probability of drawing each value is already known, so you can skip kernel density estimation. Populations can be nested, and may contain numerical values, uncertain values, or both.
- Values without uncertainty have their own dedicated CertainValue type, so that you can mix uncertain values with certain values.
- Measurement instances from Measurements.jl are treated as normal distributions with known mean and standard deviation.
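To make the last point concrete, here is a minimal sketch of what "treated as a normal distribution" means. It uses only Measurements.jl and Distributions.jl and does not call UncertainData itself; the variable names are illustrative, and how UncertainData performs the conversion internally is not shown here.

```julia
using Measurements, Distributions

# A measurement with central value 2.2 and uncertainty 0.3
m = measurement(2.2, 0.3)

# Read the measurement as a normal distribution whose mean is the central
# value and whose standard deviation is the stated uncertainty.
d = Normal(Measurements.value(m), Measurements.uncertainty(m))

# The distribution carries the same two numbers as the measurement.
d_mean, d_std = mean(d), std(d)
```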
Some quick examples
See also the extended examples!
Kernel density estimation (KDE)
If the data doesn't follow an obvious theoretical distribution, the recommended course of action is to represent the uncertain value with a kernel density estimate of the distribution.
using Distributions, UncertainData, KernelDensity
# Generate some random data from a normal distribution, so that we get a
# histogram resembling a normal distribution.
some_sample = rand(Normal(), 1000)
# Uncertain value represented by a kernel density estimate (it is inferred
# that KDE is wanted when no distribution is provided to the constructor).
uv = UncertainValue(some_sample)
Equivalently, the KDE representation can be requested explicitly:
using Distributions, UncertainData, KernelDensity
# Generate some random data from a normal distribution, so that we get a
# histogram resembling a normal distribution.
some_sample = rand(Normal(), 1000)
# Specify that we want a kernel density estimate representation
uv = UncertainValue(UnivariateKDE, some_sample)
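For readers who want to see what a kernel density estimate looks like on its own, the following sketch calls KernelDensity.jl directly. It illustrates the technique only and is not necessarily how UncertainData builds its KDE-based values; the seed and variable names are arbitrary.

```julia
using Distributions, KernelDensity, Random

Random.seed!(1234)                  # fixed seed so the sketch is repeatable
some_sample = rand(Normal(), 1000)

# Kernel density estimate with the package's default kernel and bandwidth
k = kde(some_sample)

# `k.x` is the evaluation grid and `k.density` the estimated density on it.
# A Riemann sum over the grid should be close to 1.
total_mass = sum(k.density) * step(k.x)

# Evaluate the estimated density at a single point (interpolated on the grid)
density_at_zero = pdf(k, 0.0)
```

For a standard normal sample, `density_at_zero` should land near the true density at zero, which is about 0.4.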
Populations
If you have a population of values where each value has a probability assigned to it, you can construct an uncertain value by providing the values and their probabilities (weights) as two equal-length vectors to the constructor. Weights are normalized by default.
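using UncertainData
# 100 values, each with a (not yet normalized) weight
vals = rand(100)
weights = rand(100)
# Weighted population; the weights are normalized by default
p = UncertainValue(vals, weights)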
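As a rough illustration of what weight normalization and weighted drawing mean, the sketch below uses StatsBase.jl rather than UncertainData. The three-element population and the names `raw_weights`, `probs` and `share_of_first` are made up for the example, and UncertainData's own sampling may be implemented differently.

```julia
using StatsBase, Random

Random.seed!(1234)                  # fixed seed so the sketch is repeatable
vals = [1.0, 2.0, 3.0]
raw_weights = [2.0, 1.0, 1.0]

# Normalize so the weights sum to one and can be read as probabilities
probs = raw_weights ./ sum(raw_weights)

# Draw from the population in proportion to the weights
draws = sample(vals, Weights(probs), 10_000)

# The first value carries half the total weight, so it should make up
# roughly half of the draws.
share_of_first = count(==(1.0), draws) / length(draws)
```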
Fitting a theoretical distribution
If your data has a histogram closely resembling some theoretical distribution, the uncertain value may be represented by fitting such a distribution to the data.
using Distributions, UncertainData
# Generate some random data from a normal distribution, so that we get a
# histogram resembling a normal distribution.
some_sample = rand(Normal(), 1000)
# Uncertain value represented by a theoretical normal distribution with
# parameters fitted to the data.
uv = UncertainValue(Normal, some_sample)
using Distributions, UncertainData
# Generate some random data from a gamma distribution, so that we get a
# histogram resembling a gamma distribution.
some_sample = rand(Gamma(), 1000)
# Uncertain value represented by a theoretical gamma distribution with
# parameters fitted to the data.
uv = UncertainValue(Gamma, some_sample)
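To inspect what a fitted distribution looks like, one option is to call `fit` from Distributions.jl directly, as sketched below. Whether UncertainData performs exactly this fit internally is an assumption; the true parameters (2.0 and 3.0) are chosen only for the example.

```julia
using Distributions, Random

Random.seed!(1234)                  # fixed seed so the sketch is repeatable

# Sample from a gamma distribution with shape 2.0 and scale 3.0
some_sample = rand(Gamma(2.0, 3.0), 5000)

# Maximum-likelihood fit of a gamma distribution to the sample
fitted = fit(Gamma, some_sample)

# Recover the estimated shape and scale; they should be close to 2.0 and 3.0
shape_hat, scale_hat = params(fitted)
```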
Theoretical distribution with known parameters
When working with uncertain data from the scientific literature, data values are commonly stated to follow a distribution with given parameters. For example, a data value may be given as a normal distribution with mean μ = 2.2 and standard deviation σ = 0.3.
using Distributions, UncertainData
# Uncertain value represented by a theoretical normal distribution with
# known parameters μ = 2.2 and σ = 0.3
uv = UncertainValue(Normal, 2.2, 0.3)
# Uncertain value represented by a theoretical gamma distribution with
# known parameters α = 2.1 and θ = 3.1
uv = UncertainValue(Gamma, 2.1, 3.1)
# Uncertain value represented by a theoretical binomial distribution with
# known parameters n = 32 and p = 0.13
uv = UncertainValue(Binomial, 32, 0.13)
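Once a value is represented by a distribution, it can be used by drawing realisations from it. The sketch below assumes the `resample(uv, n)` function from UncertainData, which draws `n` values from an uncertain value; check the package's resampling documentation for the exact signature in your version.

```julia
using Distributions, UncertainData, Statistics

# Uncertain value with known parameters μ = 2.2 and σ = 0.3
uv = UncertainValue(Normal, 2.2, 0.3)

# Draw 10 000 realisations (assumes `resample(uv, n)` as described above)
draws = resample(uv, 10_000)

# Summary statistics of the draws should be close to the stated parameters
sample_mean = mean(draws)
sample_std = std(draws)
```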
Values with no uncertainty
Scalars with no uncertainty can also be represented; they use the dedicated CertainValue type.
c1, c2 = UncertainValue(2), UncertainValue(2.2)