Kernel density estimated distributions

When your data's empirical distribution doesn't match any obvious theoretical distribution, you can represent the data by a kernel density estimate.

Generic constructor

UncertainData.UncertainValues.UncertainValueMethod
UncertainValue(values::Vector, probs::Union{Vector, AbstractWeights})

Construct a population whose members are given by values and whose sampling probabilities are given by probs. The elements of values can be either numeric or uncertain values of any type.

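As a brief sketch of this constructor (assuming UncertainData is loaded; the values and weights here are made up for illustration):

```julia
using UncertainData

# A population of three values, sampled with probabilities 0.2, 0.3 and 0.5.
pop = UncertainValue([1.0, 2.0, 3.0], [0.2, 0.3, 0.5])

# Sampling respects the weights, so 3.0 should appear most often.
samples = rand(pop, 1000)
```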
UncertainValue(data::Vector{T};
    kernel::Type{D} = Normal,
    npoints::Int=2048) where {D <: Distributions.Distribution, T}

Construct an uncertain value by fitting a kernel density estimate to data.

Fast Fourier transforms are used in the kernel density estimation, so the number of points should be a power of 2 (default = 2048).

UncertainValue(kerneldensity::Type{K}, data::Vector{T};
    kernel::Type{D} = Normal,
    npoints::Int=2048) where {K <: UnivariateKDE, D <: Distribution, T}

Construct an uncertain value by fitting a kernel density estimate to data.

Fast Fourier transforms are used in the kernel density estimation, so the number of points should be a power of 2 (default = 2048).

UncertainValue(empiricaldata::AbstractVector{T},
    d::Type{D}) where {D <: Distribution}

Constructor for empirical distributions.

Fit a distribution of type d to the data and use that as the representation of the empirical distribution. Calls Distributions.fit behind the scenes.

Arguments

  • empiricaldata: The data for which to fit the distribution.
  • d: A valid univariate distribution type from Distributions.jl.
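A minimal sketch of this constructor (the data here are simulated for illustration; assumes Distributions and UncertainData are loaded):

```julia
using Distributions, UncertainData

# Simulated measurements that look roughly normal.
data = rand(Normal(2.0, 0.5), 1000)

# Fit a Normal distribution to the data (Distributions.fit is called
# behind the scenes) and use the fitted distribution as the representation.
uv = UncertainValue(data, Normal)
```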

Type documentation

UncertainData.UncertainValues.UncertainScalarKDEType
UncertainScalarKDE(d::KernelDensity.UnivariateKDE, values::AbstractVector{T}, range, pdf)

An empirical value represented by a distribution estimated from actual data.

Fields

  • distribution: The UnivariateKDE estimate for the distribution of values.
  • values: The values from which distribution is estimated.
  • range: The values for which the pdf is estimated.
  • pdf: The values of the pdf at each point in range.
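A quick sketch of inspecting these fields on a KDE-based uncertain value (assuming the implicit KDE constructor and the field layout documented above):

```julia
using UncertainData

x = randn(1000)
uv = UncertainValue(x)  # implicit KDE constructor

uv.values  # the original data
uv.range   # the points at which the pdf is estimated
uv.pdf     # the density values at each point in `range`
```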

Examples

using Distributions, UncertainData

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the implicit KDE constructor to create the uncertain value
uv = UncertainValue(some_sample)

using Distributions, UncertainData, KernelDensity

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the explicit KDE constructor to create the uncertain value.
# This constructor follows the same convention as when fitting distributions
# to empirical data, so this is the recommended way to construct KDE estimates.
uv = UncertainValue(UnivariateKDE, some_sample)

using Distributions, UncertainData, KernelDensity

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the explicit KDE constructor to create the uncertain value, specifying
# that we want to use normal distributions as the kernel. The kernel can be
# any valid kernel from Distributions.jl, and the default is to use normal
# distributions.
uv = UncertainValue(UnivariateKDE, some_sample; kernel = Normal)

using Distributions, UncertainData, KernelDensity

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the explicit KDE constructor to create the uncertain value, specifying
# the number of points we want to use for the kernel density estimate. Fast
# Fourier transforms are used behind the scenes, so the number of points
# should be a power of 2 (the default is 2048 points).
uv = UncertainValue(UnivariateKDE, some_sample; npoints = 1024)

Extended example

Let's create a multimodal distribution by mixing three normal distributions, then sample 10000 values from it.

using Distributions

n1 = Normal(-3.0, 1.2)
n2 = Normal(8.0, 1.2)
n3 = Normal(0.0, 2.5)

# Use a mixture model of the three normals to create a multimodal distribution
M = MixtureModel([n1, n2, n3])

# Sample the mixture model.
samples_empirical = rand(M, Int(1e4));

It is not obvious which distribution to fit to such data.

A kernel density estimate, however, will usually represent such data well, because it doesn't assume a specific parametric form and adapts to the data values.
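To see why a KDE adapts to the data, here is a minimal Gaussian KDE written from scratch, using only the standard library (the real estimator in KernelDensity.jl is FFT-based and much faster; the bandwidth rule and grid below are illustrative choices):

```julia
using Statistics

# Evaluate a Gaussian kernel density estimate of `data` at the points `xs`.
# Bandwidth chosen by Silverman's rule of thumb.
function gaussian_kde(data::Vector{Float64}, xs::Vector{Float64})
    n = length(data)
    h = 1.06 * std(data) * n^(-1/5)       # Silverman's rule of thumb
    kern(u) = exp(-u^2 / 2) / sqrt(2π)    # standard normal kernel
    [sum(kern((x - d) / h) for d in data) / (n * h) for x in xs]
end

data = [randn(500) .- 3.0; randn(500) .+ 8.0]  # crude bimodal sample
xs = collect(range(-8.0, 13.0, length = 256))
dens = gaussian_kde(data, xs)
```

Each data point contributes a small Gaussian bump, so the estimate automatically places mass wherever the data are dense, with no parametric assumptions.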

To create a kernel density estimate, simply call the UncertainValue constructor with the vector containing the sample:

uv = UncertainValue(samples_empirical)

The plot below compares the empirical histogram (here represented as a density plot) with our kernel density estimate.

using Plots, StatPlots, UncertainData
uv = UncertainValue(samples_empirical)
density(samples_empirical, label = "10000 mixture model (M) samples")
density!(rand(uv, Int(1e4)),
    label = "10000 samples from KDE estimate to M")
xlabel!("data value")
ylabel!("probability density")


Additional keyword arguments and examples

If the only argument to the UncertainValue constructor is a vector of values, the default behaviour is to represent the distribution by a kernel density estimate (KDE), i.e. UncertainValue(data). Gaussian kernels are used by default. The syntax UncertainValue(UnivariateKDE, data) will also work if KernelDensity.jl is loaded.