Binning scalar values
Bin values
UncertainData.bin - Method
bin(left_bin_edges::AbstractRange, xs, ys) -> Vector{Vector{T}} where T
Distribute the elements of ys into N-1 different bin vectors, based on how the values in xs are distributed among the bins defined by the N grid points in left_bin_edges. If xs[i] falls in the n-th bin interval, then ys[i] is assigned to the n-th bin vector. If xs[i] lies outside the grid, then the corresponding ys[i] is ignored. See also bin!.
Returns N-1 bin vectors.
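The assignment rule described above can be sketched in plain Julia. This is only an illustration under stated assumptions, not the package's implementation: bin_sketch is a hypothetical name, and the choice of half-open intervals [edge_n, edge_{n+1}) is an assumption that the text does not confirm.

```julia
# Hypothetical helper, not part of UncertainData: a sketch of the binning rule.
function bin_sketch(left_bin_edges::AbstractRange, xs, ys)
    nbins = length(left_bin_edges) - 1            # N grid points give N-1 bins
    bins = [Vector{eltype(ys)}() for _ in 1:nbins]
    for (x, y) in zip(xs, ys)
        # Index of the last edge that is <= x. Assumption: bin n covers the
        # half-open interval [edge_n, edge_{n+1}).
        n = searchsortedlast(left_bin_edges, x)
        # n == 0 means x lies below the grid, n > nbins means at or above the last edge.
        if 1 <= n <= nbins
            push!(bins[n], y)
        end
    end
    return bins
end

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.1, 2.5]
bins = bin_sketch(0.0:1.0:6.0, xs, ys)
```

Under these assumptions the value paired with xs = 7.1 is dropped because it lies outside the grid, and the first and last bins stay empty.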
Examples
Getting the values in each bin:
using UncertainData
xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.1, 2.5]
left_bin_edges = 0.0:1.0:6.0
bin(left_bin_edges, xs, ys)
# Some example data with unevenly spaced time indices
npts = 300
time, vals = sort(rand(1:1000, npts)), rand(npts)
# See which values fall in 25 time step wide time bins ranging
# from time indices 100 to 900.
left_bin_edges = 100:25:900
bin(left_bin_edges, time, vals)
UncertainData.InterpolationAndGrids.bin! - Method
bin!(bins::Vector{AbstractVector{T}}, left_bin_edges::AbstractRange{T}, xs, ys) where T
Distribute the elements of ys into N-1 different pre-allocated, empty bin vectors, based on how the values in xs are distributed among the bins defined by the N grid points in left_bin_edges. bins must be a vector of vector-like mutable containers.
If xs[i] falls in the n-th bin interval, then ys[i] is assigned to the n-th bin vector. If xs[i] lies outside the grid, the corresponding ys[i] is ignored.
See also bin(::AbstractRange).
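Pre-allocating the bins container deserves a short illustration, because Julia's parametric types are invariant. If the method is defined exactly as the signature reads, a Vector{Vector{Float64}} would not match Vector{AbstractVector{T}}. The sketch below uses only Base Julia and shows one way to build a container of the stated type. Whether the package also accepts other container types is not stated here.

```julia
left_bin_edges = 0.0:1.0:6.0
nbins = length(left_bin_edges) - 1   # N grid points give N-1 bins

# One empty, mutable vector per bin, stored in a container whose element
# type is the abstract type named in the signature.
bins = AbstractVector{Float64}[Float64[] for _ in 1:nbins]

# An untyped comprehension infers the concrete element type Vector{Float64}.
plain = [Float64[] for _ in 1:nbins]

matches_signature = bins isa Vector{AbstractVector{Float64}}
plain_matches = plain isa Vector{AbstractVector{Float64}}   # false: invariance
```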
Bin summaries
UncertainData.bin - Method
bin(f::Function, left_bin_edges::AbstractRange, xs, ys, args...; kwargs...) -> Vector{T} where T
Distribute the elements of ys into N-1 different bin vectors, based on how the values in xs are distributed among the bins defined by the N grid points in left_bin_edges. If xs[i] falls in the n-th bin interval, then ys[i] is assigned to the n-th bin vector. If xs[i] lies outside the grid, then the corresponding ys[i] is ignored. See also bin!.
Then apply the summary function f to each bin vector in turn, passing args and kwargs as its additional positional and keyword arguments. Empty bins are assigned NaN.
Returns N-1 bin summaries, one for each bin.
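The summary step can be sketched separately from the binning step. The helper below is hypothetical (summarise_bins is not a package function) and only mirrors the behaviour described above: f is called once per bin vector with the extra arguments forwarded, and empty bins become NaN.

```julia
using Statistics  # median and quantile

# Hypothetical helper: apply `f` to every non-empty bin, use NaN for empty bins.
summarise_bins(f, bins, args...; kwargs...) =
    [isempty(b) ? NaN : f(b, args...; kwargs...) for b in bins]

# Bin vectors as they might come out of a binning step.
bins = [Float64[], [4.2, 5.1], [6.5], [4.2], [3.2, 3.1], Float64[]]

medians = summarise_bins(median, bins)
# Extra positional arguments are passed on to the summary function.
upper_quartiles = summarise_bins(quantile, bins, 0.75)
```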
Examples
Applying a summary function to each bin
Any function that accepts a vector of values can be used in conjunction with bin.
using Statistics # provides median and quantile
xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.1, 2.5]
left_bin_edges = 0.0:1.0:6.0
bin(median, left_bin_edges, xs, ys)
Functions with additional arguments also work (arguments and keyword arguments must be supplied last in the function call):
xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.1, 2.5]
left_bin_edges = 0.0:1.0:6.0
bin(quantile, left_bin_edges, xs, ys, [0.1])
Fast bin summaries
UncertainData.InterpolationAndGrids.bin_mean - Function
bin_mean(left_bin_edges::AbstractRange, xs, ys)
Distribute the elements of ys into N-1 different bin vectors, based on how the values in xs are distributed among the bins defined by the N grid points in left_bin_edges. Then compute the mean of each bin.
If xs[i] falls in the n-th bin interval, then ys[i] is assigned to the n-th bin vector. Values that fall outside the grid are ignored (for example, if xs[i] < minimum(left_bin_edges), ys[i] is ignored). After the ys values have been assigned to bin vectors, the mean of each bin vector is computed. Empty bins yield NaN.
Returns N-1 mean values, one for each bin.
Examples
xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.2, 2.5]
left_bin_edges = 0.0:1.0:6.0
bin_mean(left_bin_edges, xs, ys)
# output
6-element Array{Float64,1}:
NaN
4.65
6.5
4.2
3.2
NaN
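One way to compute bin means without building intermediate bin vectors is to keep a running sum and count per bin. The sketch below is an assumption about how such a summary could be made fast, not a description of the package's code. bin_mean_sketch is a hypothetical name, and half-open bin intervals are assumed.

```julia
# Hypothetical helper: per-bin means from running sums and counts.
function bin_mean_sketch(left_bin_edges::AbstractRange, xs, ys)
    nbins = length(left_bin_edges) - 1
    sums = zeros(nbins)
    counts = zeros(Int, nbins)
    for (x, y) in zip(xs, ys)
        n = searchsortedlast(left_bin_edges, x)   # last edge <= x
        if 1 <= n <= nbins                        # skip values outside the grid
            sums[n] += y
            counts[n] += 1
        end
    end
    # Empty bins get NaN, matching the documented output format.
    return [c == 0 ? NaN : s / c for (s, c) in zip(sums, counts)]
end

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.2, 2.5]
means = bin_mean_sketch(0.0:1.0:6.0, xs, ys)
```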
Binning uncertain data
Bin values
UncertainData.bin - Method
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedResampling{RawValues}) -> Tuple(Vector, Vector{Vector})
Resample every element of x the number of times given by binning.n. After resampling, distribute the values according to their indices into the N bins given by binning.left_bin_edges.
Returns
A tuple containing the N bin centers and an N-length vector whose n-th element holds the resampled values whose resampled indices fall in the n-th bin.
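The resample-then-bin idea can be illustrated without the package. The sketch below makes several assumptions that the text does not confirm: every index and value is treated as an independent normal distribution with a shared standard deviation, bins are half-open intervals, and bin centers sit halfway between consecutive edges. resample_and_bin and its parameters are hypothetical names.

```julia
using Random

# Hypothetical helper: draw (index, value) pairs and bin the values by drawn index.
function resample_and_bin(rng, left_bin_edges::AbstractRange,
                          t_mu, t_sd, v_mu, v_sd, n_draws::Int)
    nbins = length(left_bin_edges) - 1
    bin_draws = [Float64[] for _ in 1:nbins]
    for i in eachindex(t_mu, v_mu)
        for _ in 1:n_draws
            t = t_mu[i] + t_sd * randn(rng)   # one draw of the uncertain index
            v = v_mu[i] + v_sd * randn(rng)   # one draw of the uncertain value
            n = searchsortedlast(left_bin_edges, t)
            if 1 <= n <= nbins                # draws outside the grid are dropped
                push!(bin_draws[n], v)
            end
        end
    end
    half = step(left_bin_edges) / 2
    bin_centers = [e + half for e in left_bin_edges[1:end-1]]
    return bin_centers, bin_draws
end

rng = MersenneTwister(1)
t_mu = [120.0, 260.0, 500.0, 880.0]
v_mu = [0.1, 0.4, 0.3, 0.8]
centers, draws = resample_and_bin(rng, 100:25:900, t_mu, 10.0, v_mu, 0.1, 1000)
```

Because the indices are themselves uncertain, draws from a single data point can land in several neighbouring bins.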
Example
# Some example data with unevenly spaced time indices
npts = 300
time, vals = sort(rand(1:1000, npts)), rand(npts)
# Add uncertainties to indices and values, and represent as
# UncertainIndexValueDataset
utime = [UncertainValue(Normal, t, 10) for t in time]
uvals = [UncertainValue(Normal, v, 0.1) for v in vals]
udata = UncertainIndexValueDataset(utime, uvals)
# Bin the data into 25 time step wide bins ranging from time
# index 100 to 900, and return a vector of raw values for each
# bin. Do this by resampling each uncertain data point 10000
# times and distributing those draws among the bins.
left_bin_edges = 100:25:900
n_draws = 10000
binning = BinnedResampling(RawValues, left_bin_edges, n_draws)
bin_centers, bin_draws = bin(udata, binning)
UncertainData.bin - Method
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedWeightedResampling{RawValues}) -> Tuple(Vector, Vector{Vector})
Resample every element of x a number of times. After resampling, distribute the values according to their indices into the N bins given by the (N+1)-element grid defined by binning.left_bin_edges. In total, length(x)*binning.n draws are distributed among the bins. The number of times x[i] is resampled is determined by binning.weights[i] (probability weights are always normalised to 1).
Returns
A tuple containing the N bin centers and an N-length vector whose n-th element holds the resampled values whose resampled indices fall in the n-th bin.
Example
using Plots, UncertainData
# Some example data with unevenly spaced time indices
function ar1(n::Int, x0 = 0.5, p = 0.3)
    vals = zeros(n)
    vals[1] = x0 # start the series at the initial value
    for i in 2:n
        vals[i] = vals[i - 1]*p + rand()*0.5
    end
    return vals
end
npts = 50
time, vals = sort(rand(1:1000, npts)), ar1(npts)
# Add uncertainties to indices and values, and represent as
# UncertainIndexValueDataset
utime = [UncertainValue(Normal, t, 5) for t in time]
uvals = [UncertainValue(Normal, v, 0.03) for v in vals]
udata = UncertainIndexValueDataset(utime, uvals)
# Bin the data into 40 time step wide bins ranging from time
# index 100 to 900, and return a vector of raw values for each
# bin. Do this by resampling each uncertain data point on
# average 5000 times and distributing those draws among the bins.
time_grid = 100:40:900
n_draws = 5000
# Let odd-indexed values be three times as likely to be
# sampled compared to even-indexed values.
wts = Weights([i % 2 == 0 ? 1 : 3 for i = 1:length(udata)])
binning = BinnedWeightedResampling(RawValues, time_grid, wts, n_draws)
bin_centers, bin_draws = bin(udata, binning);
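To make the role of the weights concrete, the arithmetic below works out the share of the total draws each point would receive for the weights used in this example. It uses only Base Julia. Whether the package rounds these shares or samples the counts at random is not stated here, so the numbers should be read as expected counts under the assumption that draws are allocated in proportion to the normalised weights.

```julia
npts = 50
n = 5000                                   # plays the role of binning.n
weights = [i % 2 == 0 ? 1 : 3 for i in 1:npts]

total_draws = npts * n                     # length(x) * binning.n
p = weights ./ sum(weights)                # weights normalised to sum to 1
expected_draws = total_draws .* p          # expected draws per data point
```

With these weights, each odd-indexed point is expected to receive three times as many draws as each even-indexed point, while the average over all points remains n.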
Bin summaries
UncertainData.bin - Method
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedResampling{UncertainScalarKDE}) -> UncertainIndexValueDataset
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedResampling{UncertainScalarPopulation}) -> UncertainIndexValueDataset
Resample every element of x the number of times given by binning.n. After resampling, distribute the values according to their indices into the bins given by binning.left_bin_edges.
Returns
An UncertainIndexValueDataset. Indices are assumed to be uniformly distributed within each bin, and are represented as CertainValues at the bin centers. The representation of the values depends on the type of binning:
- If binning isa BinnedResampling{UncertainScalarKDE}, then the values in each bin are represented by a kernel density estimate of the distribution of the resampled values whose resampled indices fall in that bin.
- If binning isa BinnedResampling{UncertainScalarPopulation}, then the values in each bin are represented by an equiprobable population consisting of the resampled values whose resampled indices fall in that bin.
UncertainData.bin - Method
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedWeightedResampling{UncertainScalarKDE}) -> UncertainIndexValueDataset
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedWeightedResampling{UncertainScalarPopulation}) -> UncertainIndexValueDataset
Resample every element of x a number of times. After resampling, distribute the values according to their indices into the N bins given by the (N+1)-element grid defined by binning.left_bin_edges. In total, length(x)*binning.n draws are distributed among the bins. The number of times x[i] is resampled is determined by binning.weights[i] (probability weights are always normalised to 1).
Returns
An UncertainIndexValueDataset. Indices are assumed to be uniformly distributed within each bin, and are represented as CertainValues at the bin centers. The representation of the values depends on the type of binning:
- If binning isa BinnedWeightedResampling{UncertainScalarKDE}, then the values in each bin are represented by a kernel density estimate of the distribution of the resampled values whose resampled indices fall in that bin.
- If binning isa BinnedWeightedResampling{UncertainScalarPopulation}, then the values in each bin are represented by an equiprobable population consisting of the resampled values whose resampled indices fall in that bin.