Kernel density estimated distributions¶
When your data have an empirical distribution that doesn't follow any obvious theoretical distribution, the data may be represented by a kernel density estimate.
Generic constructor¶
UncertainData.UncertainValues.UncertainValue — Method

```julia
UncertainValue(values::Vector, probs::Union{Vector, AbstractWeights})
```

Construct a population whose members are given by `values` and whose sampling probabilities are given by `probs`. The elements of `values` can be either numeric or uncertain values of any type.
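For instance, a weighted population might be constructed as in the sketch below. The variable names and the particular values and weights are arbitrary illustrations, not part of the API.

```julia
using UncertainData

# Three candidate values and their relative sampling probabilities
vals = [1.2, 3.5, 7.0]
probs = [0.2, 0.3, 0.5]

# A population uncertain value; sampling it draws members of `vals`
# according to the weights in `probs`
pop = UncertainValue(vals, probs)
```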
```julia
UncertainValue(data::Vector{T};
    kernel::Type{D} = Normal,
    npoints::Int = 2048) where {D <: Distributions.Distribution, T}
```

Construct an uncertain value by a kernel density estimate to `data`.

Fast Fourier transforms are used in the kernel density estimation, so the number of points should be a power of 2 (default = 2048).
```julia
UncertainValue(kerneldensity::Type{K}, data::Vector{T};
    kernel::Type{D} = Normal,
    npoints::Int = 2048) where {K <: UnivariateKDE, D <: Distribution, T}
```

Construct an uncertain value by a kernel density estimate to `data`.

Fast Fourier transforms are used in the kernel density estimation, so the number of points should be a power of 2 (default = 2048).
```julia
UncertainValue(empiricaldata::AbstractVector{T}, d::Type{D}) where {D <: Distribution}
```

Constructor for empirical distributions.

Fit a distribution of type `d` to the data and use that as the representation of the empirical distribution. Calls `Distributions.fit` behind the scenes.

Arguments

- `empiricaldata`: The data for which to fit the `distribution`.
- `distribution`: A valid univariate distribution from `Distributions.jl`.
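As a sketch of how this fitting constructor might be used (the sample and the choice of `Gamma` below are just illustrative assumptions):

```julia
using Distributions, UncertainData

# Some positive-valued sample that is plausibly Gamma-distributed
empiricaldata = rand(Gamma(2.0, 3.0), 1000)

# Fit a Gamma distribution to the sample; the fitted distribution
# becomes the representation of the uncertain value
uv = UncertainValue(empiricaldata, Gamma)
```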
Type documentation¶
UncertainData.UncertainValues.UncertainScalarKDE — Type

```julia
UncertainScalarKDE
```
An empirical value represented by a distribution estimated from actual data.
Fields

- `distribution`: The `UnivariateKDE` estimate for the distribution of `values`.
- `values`: The values from which `distribution` is estimated.
- `range`: The values for which the pdf is estimated.
- `pdf`: The values of the pdf at each point in `range`.
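To illustrate, these fields can be inspected directly on a constructed value. This is only a sketch, assuming a sample drawn as in the examples below:

```julia
using Distributions, UncertainData, KernelDensity

some_sample = rand(Normal(), 1000)
uv = UncertainValue(UnivariateKDE, some_sample)

uv.distribution  # the UnivariateKDE estimate
uv.values        # the data the estimate was computed from
uv.range         # points at which the pdf is evaluated
uv.pdf           # pdf values at each point in `range`
```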
Examples¶
```julia
using Distributions, UncertainData

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the implicit KDE constructor to create the uncertain value
uv = UncertainValue(some_sample)
```
```julia
using Distributions, UncertainData, KernelDensity

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the explicit KDE constructor to create the uncertain value.
# This constructor follows the same convention as when fitting distributions
# to empirical data, so this is the recommended way to construct KDE estimates.
uv = UncertainValue(UnivariateKDE, some_sample)
```
```julia
using Distributions, UncertainData, KernelDensity

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the explicit KDE constructor to create the uncertain value, specifying
# that we want to use normal distributions as the kernel. The kernel can be
# any valid kernel from Distributions.jl, and the default is to use normal
# distributions.
uv = UncertainValue(UnivariateKDE, some_sample; kernel = Normal)
```
```julia
using Distributions, UncertainData, KernelDensity

# Create a normal distribution
d = Normal()

# Draw a 1000-point sample from the distribution.
some_sample = rand(d, 1000)

# Use the explicit KDE constructor to create the uncertain value, specifying
# the number of points we want to use for the kernel density estimate. Fast
# Fourier transforms are used behind the scenes, so the number of points
# should be a power of 2 (the default is 2048 points).
uv = UncertainValue(UnivariateKDE, some_sample; npoints = 1024)
```
Extended example¶
Let's create a bimodal distribution, then sample 10000 values from it.
```julia
using Distributions

n1 = Normal(-3.0, 1.2)
n2 = Normal(8.0, 1.2)
n3 = Normal(0.0, 2.5)

# Use a mixture model to create a bimodal distribution
M = MixtureModel([n1, n2, n3])

# Sample the mixture model.
samples_empirical = rand(M, Int(1e4));
```
It is not obvious which distribution to fit to such data.
A kernel density estimate, however, will always be a decent representation of the data, because it doesn't follow a specific distribution and adapts to the data values.
To create a kernel density estimate, simply call the `UncertainValue(v::Vector{Number})` constructor with a vector containing the sample:
```julia
uv = UncertainValue(samples_empirical)
```
The plot below compares the empirical histogram (here represented as a density plot) with our kernel density estimate.
```julia
using Plots, StatsPlots, UncertainData

uv = UncertainValue(samples_empirical)

density(samples_empirical, label = "10000 mixture model (M) samples")
density!(rand(uv, Int(1e4)), label = "10000 samples from KDE estimate to M")
xlabel!("data value")
ylabel!("probability density")
```
Constructor¶
UncertainData.UncertainValues.UncertainValue — Method

```julia
UncertainValue(data::Vector{T};
    kernel::Type{D} = Normal,
    npoints::Int = 2048) where {D <: Distributions.Distribution, T}
```

Construct an uncertain value by a kernel density estimate to `data`.
Fast Fourier transforms are used in the kernel density estimation, so the number of points should be a power of 2 (default = 2048).
Additional keyword arguments and examples¶
If the only argument to the `UncertainValue` constructor is a vector of values, the default behaviour is to represent the distribution by a kernel density estimate (KDE), i.e. `UncertainValue(data)`. Gaussian kernels are used by default. The syntax `UncertainValue(UnivariateKDE, data)` will also work if `KernelDensity.jl` is loaded.
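A minimal sketch combining these defaults with the keyword arguments documented above (the sample, the kernel choice, and `npoints = 1024` are only illustrative):

```julia
using Distributions, UncertainData, KernelDensity

data = rand(Normal(), 1000)

# Implicit KDE constructor: Gaussian kernels and 2048 points by default
uv1 = UncertainValue(data)

# Explicit KDE constructor with keyword arguments
uv2 = UncertainValue(UnivariateKDE, data; kernel = Normal, npoints = 1024)
```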