

Binning

Binning scalar values

Bin values

# UncertainData.bin — Method.

bin(left_bin_edges::AbstractRange, xs, ys) -> Vector{Vector{T}} where T

Distribute the elements of ys into N-1 different bin vectors, based on how the values in xs are distributed among the bins defined by the N grid points in left_bin_edges. If xs[i] falls in the n-th bin interval, then ys[i] is assigned to the n-th bin vector. If xs[i] lies outside the grid, then the corresponding ys[i] is ignored. See also bin!.

Returns N - 1 bin vectors.

Examples

Getting the values in each bin:

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.1, 2.5]
left_bin_edges = 0.0:1.0:6.0
bin(left_bin_edges, xs, ys)

# Some example data with unevenly spaced time indices
npts = 300
time, vals = sort(rand(1:1000, npts)), rand(npts)

# See which values fall in 25 time step wide time bins ranging 
# from time indices 100 to 900.
left_bin_edges = 100:25:900

bin(left_bin_edges, time, vals)
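
The binning rule described above can be sketched in plain Julia. This is only an illustration, not the package's implementation: the helper name sketch_bin is hypothetical, and the sketch assumes half-open bin intervals [edge_n, edge_n+1), which the docstring does not state explicitly.

```julia
# Hypothetical helper (not part of UncertainData): assign each ys[i] to the
# bin that xs[i] falls in. Assumes half-open intervals [edges[n], edges[n+1]).
function sketch_bin(edges::AbstractRange, xs, ys)
    nbins = length(edges) - 1                 # N grid points give N - 1 bins
    bins = [Vector{eltype(ys)}() for _ in 1:nbins]
    for (x, y) in zip(xs, ys)
        # Points outside the grid are ignored
        (x < first(edges) || x >= last(edges)) && continue
        n = searchsortedlast(edges, x)        # index of the last edge <= x
        push!(bins[n], y)
    end
    return bins
end

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.1, 2.5]
binned = sketch_bin(0.0:1.0:6.0, xs, ys)
```

With this data, the first and last bins stay empty, and the value paired with xs = 7.1 is dropped because it lies beyond the last grid point.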


# UncertainData.InterpolationAndGrids.bin! — Method.

bin!(bins::Vector{AbstractVector{T}}, left_bin_edges::AbstractRange{T}, xs, ys) where T

Distribute the elements of ys into N-1 different pre-allocated empty bin vectors, based on how the values in xs are distributed among the bins defined by the N grid points in left_bin_edges. bins must be a vector of vector-like mutable containers.

If xs[i] falls in the n-th bin interval, then ys[i] is assigned to the n-th bin vector. If xs[i] lies outside the grid, the corresponding ys[i] is ignored.
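
As a rough sketch of how the pre-allocated bins argument might be built to match the signature above: the container needs the abstract element type AbstractVector{T}, because Julia's type parameters are invariant. The call to bin! itself is left as a comment, since it requires the package to be loaded.

```julia
# The element type T is shared by the grid and the bin containers,
# as in the signature above.
left_bin_edges = 0.0:1.0:6.0
T = eltype(left_bin_edges)

# One empty, mutable container per bin (N - 1 bins for N grid points).
# The explicit AbstractVector{T} element type matters: a Vector{Vector{T}}
# is not a Vector{AbstractVector{T}}, so it would not match the signature.
bins = AbstractVector{T}[T[] for _ in 1:length(left_bin_edges) - 1]

# With UncertainData loaded, the call would then read (not executed here):
# bin!(bins, left_bin_edges, xs, ys)
```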

Bin summaries

# UncertainData.bin — Method.

bin(f::Function, left_bin_edges::AbstractRange, xs, ys, args...; kwargs...) -> Vector{T} where T

Distribute the elements of ys into N-1 different bin vectors, based on how the values in xs are distributed among the bins defined by the N grid points in left_bin_edges. If xs[i] falls in the n-th bin interval, then ys[i] is assigned to the n-th bin vector. If xs[i] lies outside the grid, then the corresponding ys[i] is ignored. See also bin!.

Then, apply the summary function f to each of the bin vectors, passing args and kwargs as additional positional and keyword arguments. This yields N-1 summary values, one for each bin. Empty bins are assigned NaN values.

Returns N-1 bin summaries.

Examples

Applying a summary function to each bin

Any function that accepts a vector of values can be used in conjunction with bin.

using Statistics  # provides median

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.1, 2.5]
left_bin_edges = 0.0:1.0:6.0
bin(median, left_bin_edges, xs, ys)

Functions with additional arguments also work (arguments and keyword arguments must be supplied last in the function call):

using Statistics  # provides quantile

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.1, 2.5]
left_bin_edges = 0.0:1.0:6.0
bin(quantile, left_bin_edges, xs, ys, [0.1])
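
The summary step can be illustrated on its own, assuming the values have already been distributed into bin vectors. The helper summarise_bins below is hypothetical and only mimics the documented behaviour (extra arguments forwarded to f, NaN for empty bins). It uses a scalar quantile level rather than the vector [0.1] from the example above, so that every bin yields a single number.

```julia
using Statistics

# Hypothetical helper (not part of UncertainData): apply f to every bin
# vector, forwarding extra arguments, and use NaN for empty bins.
function summarise_bins(f, bins, args...; kwargs...)
    return [isempty(b) ? NaN : f(b, args...; kwargs...) for b in bins]
end

# Bin vectors corresponding to the example data and grid above
bins = [Float64[], [4.2, 5.1], [6.5], [4.2], [3.2, 3.1], Float64[]]

medians = summarise_bins(median, bins)
q10 = summarise_bins(quantile, bins, 0.1)   # 0.1 is forwarded to quantile
```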


Fast bin summaries

# UncertainData.InterpolationAndGrids.bin_mean — Function.

bin_mean(left_bin_edges::AbstractRange, xs, ys)

Distribute the elements of ys into N - 1 different bin vectors, based on how the values in xs are distributed among the bins defined by the N grid points in left_bin_edges. Then compute the bin mean for each bin.

If xs[i] falls in the n-th bin interval, then ys[i] is assigned to the n-th bin vector. If xs[i] lies outside the grid, the corresponding ys[i] is ignored. Empty bins yield NaN, as in the example output below.

Returns N - 1 mean values, one for each bin.

Examples

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.2, 2.5]
left_bin_edges = 0.0:1.0:6.0
bin_mean(left_bin_edges, xs, ys)

# output
6-element Array{Float64,1}:
 NaN   
   4.65
   6.5 
   4.2 
   3.2 
 NaN
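
One way a bin mean can be computed without first materialising the bin vectors is to keep a running sum and count per bin. The sketch below is an assumption about how such a fast path could look, not a description of how bin_mean is implemented. The name sketch_bin_mean is hypothetical, and half-open bin intervals are assumed.

```julia
# Hypothetical helper (not part of UncertainData): accumulate a sum and a
# count per bin, then divide. Assumes half-open intervals
# [edges[n], edges[n+1]).
function sketch_bin_mean(edges::AbstractRange, xs, ys)
    nbins = length(edges) - 1
    sums = zeros(nbins)
    counts = zeros(Int, nbins)
    for (x, y) in zip(xs, ys)
        # Points outside the grid are ignored
        (x < first(edges) || x >= last(edges)) && continue
        n = searchsortedlast(edges, x)
        sums[n] += y
        counts[n] += 1
    end
    return sums ./ counts   # 0.0 / 0 evaluates to NaN, marking empty bins
end

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.2, 2.5]
means = sketch_bin_mean(0.0:1.0:6.0, xs, ys)
```

Up to floating-point rounding, this gives the same six values as the output listed above, with NaN in the first and last bins.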


Binning uncertain data

Bin values

# UncertainData.bin — Method.

bin(x::AbstractUncertainIndexValueDataset, binning::BinnedResampling{RawValues}) -> Tuple(Vector, Vector{Vector})

Resample every element of x the number of times given by binning.n. After resampling, distribute the values according to their indices, into the N bins given by binning.left_bin_edges.

Returns

Return a tuple containing the N different bin centers and an N-length vector of resampled values whose resampled indices fall in the N different bins.

Example

# Some example data with unevenly spaced time indices
npts = 300
time, vals = sort(rand(1:1000, npts)), rand(npts)

# Add uncertainties to indices and values, and represent as 
# UncertainIndexValueDataset 
utime = [UncertainValue(Normal, t, 10) for t in time]
uvals = [UncertainValue(Normal, v, 0.1) for v in vals]

udata = UncertainIndexValueDataset(utime, uvals)

# Bin the data into 25 time step wide time bins ranging 
# from time indices 100 to 900 and return a vector of raw 
# values for each bin. Do this by resampling each uncertain
# data point 10000 times and distributing those draws among 
# the bins.
left_bin_edges = 100:25:900
n_draws = 10000
binning = BinnedResampling(RawValues, left_bin_edges, n_draws)

bin_centers, bin_draws = bin(udata, binning)
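
The resample-then-bin idea can be sketched without the package, assuming every index and value is normally distributed with a common standard deviation. Everything here (the helper sketch_resample_bin, its arguments, the half-open bin intervals, and taking bin centers as left edge plus half a step) is an illustrative assumption rather than the package's actual code.

```julia
using Random

# Hypothetical helper (not part of UncertainData): draw n (index, value)
# pairs per data point from normal distributions and bin the drawn values
# by their drawn indices.
function sketch_resample_bin(edges::AbstractRange, t_mean, t_sd, v_mean, v_sd, n;
                             rng = Random.default_rng())
    nbins = length(edges) - 1
    draws = [Float64[] for _ in 1:nbins]
    for i in eachindex(t_mean)
        for _ in 1:n
            t = t_mean[i] + t_sd * randn(rng)   # resampled index
            v = v_mean[i] + v_sd * randn(rng)   # resampled value
            # Draws whose index falls outside the grid are ignored
            (t < first(edges) || t >= last(edges)) && continue
            push!(draws[searchsortedlast(edges, t)], v)
        end
    end
    centers = collect(edges[1:end-1] .+ step(edges) / 2)
    return centers, draws
end

t_mean = [120.0, 130.0, 510.0, 880.0]
v_mean = [0.2, 0.4, 0.6, 0.8]
centers, draws = sketch_resample_bin(100:25:900, t_mean, 10.0, v_mean, 0.1, 1000;
                                     rng = MersenneTwister(1))
```

Because each index is uncertain, the draws from a single data point typically spread over several neighbouring bins, and some draws near the grid boundaries are discarded.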


# UncertainData.bin — Method.

bin(x::AbstractUncertainIndexValueDataset, binning::BinnedWeightedResampling{RawValues}) -> Tuple(Vector, Vector{Vector})

Resample every element of x a number of times. After resampling, distribute the values, according to their resampled indices, into the N bins defined by the N+1 grid points in binning.left_bin_edges. In total, length(x)*binning.n draws are distributed among the bins. The number of times x[i] is resampled is determined by binning.weights[i] (probability weights are always normalised to sum to 1).

Returns

Return a tuple containing the N different bin centers and an N-length vector of resampled values whose resampled indices fall in the N different bins.

Example

using Plots, UncertainData
# Generate values from a simple AR(1)-like process
function ar1(n::Int, x0 = 0.5, p = 0.3)
    vals = zeros(n)
    vals[1] = x0  # start the series at the initial value
    for i = 2:n
        vals[i] = vals[i - 1]*p + rand()*0.5
    end
    return vals
end

# Some example data with unevenly spaced time indices
npts = 50
time, vals = sort(rand(1:1000, npts)), ar1(npts)

# Add uncertainties to indices and values, and represent as 
# UncertainIndexValueDataset 
utime = [UncertainValue(Normal, t, 5) for t in time]
uvals = [UncertainValue(Normal, v, 0.03) for v in vals]

udata = UncertainIndexValueDataset(utime, uvals)

# Bin the data into 40 time step wide time bins ranging 
# from time indices 100 to 900 and return a vector of raw 
# values for each bin. Do this by resampling each uncertain
# data point on average 5000 times and distributing those 
# draws among the bins. 
time_grid = 100:40:900
n_draws = 5000
# Let odd-indexed values be three times as likely to be 
# sampled compared to even-indexed values.
wts = Weights([i % 2 == 0 ? 1 : 3 for i = 1:length(udata)])
binning = BinnedWeightedResampling(RawValues, time_grid, wts, n_draws)

bin_centers, bin_draws = bin(udata, binning);
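
To make the weighting concrete, the arithmetic for the example above (50 points, n_draws = 5000, weights 3 and 1) can be worked through directly. This sketch computes expected draw counts under the assumption that draws are allocated in proportion to the normalised weights. Whether the package allocates them deterministically or by random sampling is not stated here.

```julia
npts, n_draws = 50, 5000

# Same weighting scheme as above: 3 for odd indices, 1 for even indices
w = [i % 2 == 0 ? 1 : 3 for i in 1:npts]

# Normalise the weights so that they sum to 1
p = w ./ sum(w)

# Expected number of draws per point, out of npts * n_draws draws in total
expected_draws = p .* (npts * n_draws)
```

Under this assumption, odd-indexed points are expected to receive about 7500 draws each and even-indexed points about 2500, which averages to n_draws = 5000 per point.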


Bin summaries

# UncertainData.bin — Method.

bin(x::AbstractUncertainIndexValueDataset, binning::BinnedResampling{UncertainScalarKDE}) -> UncertainIndexValueDataset
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedResampling{UncertainScalarPopulation}) -> UncertainIndexValueDataset

Resample every element of x the number of times given by binning.n. After resampling, distribute the values according to their indices, into the bins given by binning.left_bin_edges.

Returns

Returns an UncertainIndexValueDataset. Indices are assumed to be uniformly distributed within each bin, and are represented as CertainValues at the bin centers. Values of the dataset have different representations depending on what binning is:

If binning isa BinnedResampling{UncertainScalarKDE}, then the values in each bin are represented by a kernel density estimate of the distribution of the resampled values whose resampled indices fall in that bin.
If binning isa BinnedResampling{UncertainScalarPopulation}, then the values in each bin are represented by an equiprobable population consisting of the resampled values whose resampled indices fall in that bin.


# UncertainData.bin — Method.

bin(x::AbstractUncertainIndexValueDataset, binning::BinnedWeightedResampling{UncertainScalarKDE}) -> UncertainIndexValueDataset
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedWeightedResampling{UncertainScalarPopulation}) -> UncertainIndexValueDataset

Resample every element of x a number of times. After resampling, distribute the values, according to their resampled indices, into the N bins defined by the N+1 grid points in binning.left_bin_edges. In total, length(x)*binning.n draws are distributed among the bins. The number of times x[i] is resampled is determined by binning.weights[i] (probability weights are always normalised to sum to 1).

Returns

Returns an UncertainIndexValueDataset. Indices are assumed to be uniformly distributed within each bin, and are represented as CertainValues at the bin centers. Values of the dataset have different representations depending on what binning is:

If binning isa BinnedWeightedResampling{UncertainScalarKDE}, then the values in each bin are represented by a kernel density estimate of the distribution of the resampled values whose resampled indices fall in that bin.
If binning isa BinnedWeightedResampling{UncertainScalarPopulation}, then the values in each bin are represented by an equiprobable population consisting of the resampled values whose resampled indices fall in that bin.
