Kernel density estimated distributions¶
When your data have an empirical distribution that doesn't follow any obvious theoretical distribution, the data may be represented by a kernel density estimate.
Generic constructor¶
UncertainData.UncertainValues.UncertainValue — Method

```julia
UncertainValue(values::Vector, probs::Union{Vector, AbstractWeights})
```

Construct a population whose members are given by `values` and whose sampling probabilities are given by `probs`. The elements of `values` can be either numeric or uncertain values of any type.
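For instance, a weighted population might be constructed as in the sketch below. The variable names and the particular values and weights are arbitrary illustrations, not part of the API.

```julia
using UncertainData

# Three candidate values and their relative sampling probabilities
vals = [1.2, 3.5, 7.0]
probs = [0.2, 0.3, 0.5]

# A population uncertain value; sampling it draws members of `vals`
# according to the weights in `probs`
pop = UncertainValue(vals, probs)
```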
```julia
UncertainValue(data::Vector{T};
    kernel::Type{D} = Normal,
    npoints::Int = 2048) where {D <: Distributions.Distribution, T}
```

Construct an uncertain value by a kernel density estimate to `data`.

Fast Fourier transforms are used in the kernel density estimation, so the number of points should be a power of 2 (default = 2048).
```julia
UncertainValue(kerneldensity::Type{K}, data::Vector{T};
    kernel::Type{D} = Normal,
    npoints::Int = 2048) where {K <: UnivariateKDE, D <: Distribution, T}
```

Construct an uncertain value by a kernel density estimate to `data`.

Fast Fourier transforms are used in the kernel density estimation, so the number of points should be a power of 2 (default = 2048).
```julia
UncertainValue(empiricaldata::AbstractVector{T}, d::Type{D}) where {D <: Distribution}
```

Constructor for empirical distributions.

Fit a distribution of type `d` to the data and use that as the representation of the empirical distribution. Calls `Distributions.fit` behind the scenes.

Arguments

- `empiricaldata`: The data for which to fit the `distribution`.
- `distribution`: A valid univariate distribution from `Distributions.jl`.
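As a sketch of how this fitting constructor might be used (the sample and the choice of `Gamma` below are just illustrative assumptions):

```julia
using Distributions, UncertainData

# Some positive-valued sample that is plausibly Gamma-distributed
empiricaldata = rand(Gamma(2.0, 3.0), 1000)

# Fit a Gamma distribution to the sample; the fitted distribution
# becomes the representation of the uncertain value
uv = UncertainValue(empiricaldata, Gamma)
```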
Type documentation¶
UncertainData.UncertainValues.UncertainScalarKDE — Type

```julia
UncertainScalarKDE
```
An empirical value represented by a distribution estimated from actual data.
Fields

- `distribution`: The `UnivariateKDE` estimate for the distribution of `values`.
- `values`: The values from which `distribution` is estimated.
- `range`: The values for which the pdf is estimated.
- `pdf`: The values of the pdf at each point in `range`.
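To illustrate, these fields can be inspected directly on a constructed value. This is only a sketch, assuming a sample drawn as in the examples below:

```julia
using Distributions, UncertainData, KernelDensity

some_sample = rand(Normal(), 1000)
uv = UncertainValue(UnivariateKDE, some_sample)

uv.distribution  # the UnivariateKDE estimate
uv.values        # the data the estimate was computed from
uv.range         # points at which the pdf is evaluated
uv.pdf           # pdf values at each point in `range`
```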
Examples¶
```julia
using Distributions, UncertainData

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the implicit KDE constructor to create the uncertain value
uv = UncertainValue(some_sample)
```
```julia
using Distributions, UncertainData, KernelDensity

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the explicit KDE constructor to create the uncertain value.
# This constructor follows the same convention as when fitting distributions
# to empirical data, so this is the recommended way to construct KDE estimates.
uv = UncertainValue(UnivariateKDE, some_sample)
```
```julia
using Distributions, UncertainData, KernelDensity

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the explicit KDE constructor to create the uncertain value, specifying
# that we want to use normal distributions as the kernel. The kernel can be
# any valid kernel from Distributions.jl, and the default is to use normal
# distributions.
uv = UncertainValue(UnivariateKDE, some_sample; kernel = Normal)
```
```julia
using Distributions, UncertainData, KernelDensity

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the explicit KDE constructor to create the uncertain value, specifying
# the number of points we want to use for the kernel density estimate. Fast
# Fourier transforms are used behind the scenes, so the number of points
# should be a power of 2 (the default is 2048 points).
uv = UncertainValue(UnivariateKDE, some_sample; npoints = 1024)
```
Extended example¶
Let's create a bimodal distribution, then sample 10000 values from it.
```julia
using Distributions

n1 = Normal(-3.0, 1.2)
n2 = Normal(8.0, 1.2)
n3 = Normal(0.0, 2.5)

# Use a mixture model to create a bimodal distribution
M = MixtureModel([n1, n2, n3])

# Sample the mixture model.
samples_empirical = rand(M, Int(1e4));
```
It is not obvious which distribution to fit to such data.
A kernel density estimate, however, will always be a decent representation of the data, because it doesn't follow a specific distribution and adapts to the data values.
To create a kernel density estimate, simply call the `UncertainValue(v::Vector{Number})` constructor with a vector containing the sample:
```julia
uv = UncertainValue(samples_empirical)
```
The plot below compares the empirical histogram (here represented as a density plot) with our kernel density estimate.
```julia
using Plots, StatsPlots, UncertainData

uv = UncertainValue(samples_empirical)

density(samples_empirical, label = "10000 mixture model (M) samples")
density!(rand(uv, Int(1e4)), label = "10000 samples from KDE estimate to M")
xlabel!("data value")
ylabel!("probability density")
```
Constructor¶
UncertainData.UncertainValues.UncertainValue — Method

```julia
UncertainValue(data::Vector{T};
    kernel::Type{D} = Normal,
    npoints::Int = 2048) where {D <: Distributions.Distribution, T}
```

Construct an uncertain value by a kernel density estimate to `data`.
Fast Fourier transforms are used in the kernel density estimation, so the number of points should be a power of 2 (default = 2048).
Additional keyword arguments and examples¶
If the only argument to the `UncertainValue` constructor is a vector of values, the default behaviour is to represent the distribution by a kernel density estimate (KDE), i.e. `UncertainValue(data)`. Gaussian kernels are used by default. The syntax `UncertainValue(UnivariateKDE, data)` will also work if `KernelDensity.jl` is loaded.
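A minimal sketch combining these defaults with the keyword arguments documented above (the sample, the kernel choice, and `npoints = 1024` are only illustrative):

```julia
using Distributions, UncertainData, KernelDensity

data = rand(Normal(), 1000)

# Implicit KDE constructor: Gaussian kernels and 2048 points by default
uv1 = UncertainValue(data)

# Explicit KDE constructor with keyword arguments
uv2 = UncertainValue(UnivariateKDE, data; kernel = Normal, npoints = 1024)
```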