Kernel density estimated distributions
When your data have an empirical distribution that doesn't follow any obvious theoretical distribution, the data may be represented by a kernel density estimate.
Generic constructor
UncertainData.UncertainValues.UncertainValue — Method

```julia
UncertainValue(values::Vector, probs::Union{Vector, AbstractWeights})
```

Construct a population whose members are given by `values` and whose sampling probabilities are given by `probs`. The elements of `values` can be either numeric or uncertain values of any type.
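For instance, a weighted population can be built directly from candidate values and their sampling probabilities. A minimal sketch (the values and weights here are made up for illustration):

```julia
using UncertainData

# Three candidate values with unequal sampling probabilities
vals = [1.0, 2.0, 3.0]
probs = [0.1, 0.3, 0.6]
pop = UncertainValue(vals, probs)

# Draws from `pop` respect the weights, so 3.0 appears most often
draws = rand(pop, 10_000)
```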
```julia
UncertainValue(data::Vector{T};
    kernel::Type{D} = Normal,
    npoints::Int = 2048) where {D <: Distributions.Distribution, T}
```

Construct an uncertain value by a kernel density estimate to `data`.

Fast Fourier transforms are used in the kernel density estimation, so the number of points should be a power of 2 (default = 2048).
```julia
UncertainValue(kerneldensity::Type{K}, data::Vector{T};
    kernel::Type{D} = Normal,
    npoints::Int = 2048) where {K <: UnivariateKDE, D <: Distribution, T}
```

Construct an uncertain value by a kernel density estimate to `data`.

Fast Fourier transforms are used in the kernel density estimation, so the number of points should be a power of 2 (default = 2048).
```julia
UncertainValue(empiricaldata::AbstractVector{T},
    d::Type{D}) where {D <: Distribution}
```

Constructor for empirical distributions. Fit a distribution of type `d` to the data and use that as the representation of the empirical distribution. Calls `Distributions.fit` behind the scenes.

Arguments

- `empiricaldata`: The data to which to fit the `distribution`.
- `distribution`: A valid univariate distribution from `Distributions.jl`.
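As a sketch of this constructor, one might fit a `Normal` distribution to a sample (the data here are synthetic, drawn from a known normal purely for illustration):

```julia
using Distributions, UncertainData

# Synthetic "empirical" data, drawn from a known normal for illustration
data = rand(Normal(2.0, 0.5), 1000)

# Fit a Normal distribution to the data; Distributions.fit is called internally
uv = UncertainValue(data, Normal)
```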
Type documentation
UncertainData.UncertainValues.UncertainScalarKDE — Type

```julia
UncertainScalarKDE(d::KernelDensity.UnivariateKDE, values::AbstractVector{T}, range, pdf)
```

An empirical value represented by a distribution estimated from actual data.

Fields

- `distribution`: The `UnivariateKDE` estimate for the distribution of `values`.
- `values`: The values from which `distribution` is estimated.
- `range`: The values for which the pdf is estimated.
- `pdf`: The values of the pdf at each point in `range`.
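Given the explicit KDE constructor described above, these fields can be inspected directly. A sketch (the sample here is synthetic):

```julia
using UncertainData, KernelDensity

# An UncertainScalarKDE built from a synthetic sample
uv = UncertainValue(UnivariateKDE, randn(1000))

# Its fields mirror the list above
uv.distribution  # the UnivariateKDE estimate
uv.range         # points at which the pdf is estimated
uv.pdf           # pdf values at each point in `range`
```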
Examples
```julia
using Distributions, UncertainData

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the implicit KDE constructor to create the uncertain value
uv = UncertainValue(some_sample)
```
```julia
using Distributions, UncertainData, KernelDensity

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the explicit KDE constructor to create the uncertain value.
# This constructor follows the same convention as when fitting distributions
# to empirical data, so this is the recommended way to construct KDE estimates.
uv = UncertainValue(UnivariateKDE, some_sample)
```
```julia
using Distributions, UncertainData, KernelDensity

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the explicit KDE constructor to create the uncertain value, specifying
# that we want to use normal distributions as the kernel. The kernel can be
# any valid kernel from Distributions.jl, and the default is to use normal
# distributions.
uv = UncertainValue(UnivariateKDE, some_sample; kernel = Normal)
```
```julia
using Distributions, UncertainData, KernelDensity

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the explicit KDE constructor to create the uncertain value, specifying
# the number of points we want to use for the kernel density estimate. Fast
# Fourier transforms are used behind the scenes, so the number of points
# should be a power of 2 (the default is 2048 points).
uv = UncertainValue(UnivariateKDE, some_sample; npoints = 1024)
```
Extended example
Let's create a bimodal distribution, then sample 10000 values from it.
```julia
using Distributions

n1 = Normal(-3.0, 1.2)
n2 = Normal(8.0, 1.2)
n3 = Normal(0.0, 2.5)

# Use a mixture model to create a bimodal distribution
M = MixtureModel([n1, n2, n3])

# Sample the mixture model.
samples_empirical = rand(M, Int(1e4));
```
It is not obvious which distribution to fit to such data.
A kernel density estimate, however, will always be a decent representation of the data, because it doesn't assume a specific distribution and adapts to the data values.

To create a kernel density estimate, simply call the `UncertainValue(v::Vector{Number})` constructor with a vector containing the sample:

```julia
uv = UncertainValue(samples_empirical)
```
The plot below compares the empirical histogram (here represented as a density plot) with our kernel density estimate.
```julia
using Plots, StatPlots, UncertainData

uv = UncertainValue(samples_empirical)
density(samples_empirical, label = "10000 mixture model (M) samples")
density!(rand(uv, Int(1e4)),
    label = "10000 samples from KDE estimate to M")
xlabel!("data value")
ylabel!("probability density")
```
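Beyond plotting, the KDE representation can be resampled for numerical summaries. A sketch comparing resampled moments to those of the original mixture sample (the tolerance below is a loose, illustrative choice):

```julia
using Distributions, Statistics, UncertainData

# Rebuild the mixture sample and its KDE estimate from above
M = MixtureModel([Normal(-3.0, 1.2), Normal(8.0, 1.2), Normal(0.0, 2.5)])
samples_empirical = rand(M, Int(1e4))
uv = UncertainValue(samples_empirical)

# Resample the KDE representation and compare summary statistics
kde_sample = rand(uv, Int(1e4))
mean(kde_sample)  # close to mean(samples_empirical)
std(kde_sample)   # close to std(samples_empirical)
```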
Constructor
UncertainData.UncertainValues.UncertainValue — Method

```julia
UncertainValue(data::Vector{T};
    kernel::Type{D} = Normal,
    npoints::Int = 2048) where {D <: Distributions.Distribution, T}
```

Construct an uncertain value by a kernel density estimate to `data`.

Fast Fourier transforms are used in the kernel density estimation, so the number of points should be a power of 2 (default = 2048).
Additional keyword arguments and examples
If the only argument to the `UncertainValue` constructor is a vector of values, the default behaviour is to represent the distribution by a kernel density estimate (KDE), i.e. `UncertainValue(data)`. Gaussian kernels are used by default. The syntax `UncertainValue(UnivariateKDE, data)` will also work if `KernelDensity.jl` is loaded.
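The two syntaxes side by side, as a sketch (`data` here is an arbitrary synthetic sample):

```julia
using UncertainData, KernelDensity

data = randn(500)

# Implicit KDE constructor: Gaussian kernels by default
uv1 = UncertainValue(data)

# Explicit KDE constructor: equivalent, but spells out the intent
uv2 = UncertainValue(UnivariateKDE, data)
```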