Binning
Binning scalar values
Bin values
UncertainData.bin (Method)
```julia
bin(left_bin_edges::AbstractRange, xs, ys) -> Vector{Vector{T}} where T
```
Distribute the elements of `ys` into `N-1` different bin vectors, based on how the values in `xs` are distributed among the bins defined by the `N` grid points in `left_bin_edges`. If `xs[i]` falls in the `n`-th bin interval, then `ys[i]` is assigned to the `n`-th bin vector. If `xs[i]` lies outside the grid, the corresponding `ys[i]` is ignored. See also `bin!`.

Returns `N-1` bin vectors.
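The following plain-Julia sketch illustrates the assignment rule and does not use UncertainData. The helper `bin_sketch` is hypothetical. It assumes each bin is the half-open interval `[left_bin_edges[n], left_bin_edges[n+1])`. The package may treat values that land exactly on an edge differently.

```julia
# Hypothetical helper illustrating the binning rule; not UncertainData's code.
# Assumption: bin n is the half-open interval [edges[n], edges[n+1]).
function bin_sketch(edges::AbstractRange, xs, ys)
    nbins = length(edges) - 1                 # N grid points -> N-1 bins
    bins = [eltype(ys)[] for _ in 1:nbins]    # one empty vector per bin
    for (x, y) in zip(xs, ys)
        n = searchsortedlast(edges, x)        # index of the last edge <= x
        if 1 <= n <= nbins                    # values outside the grid are skipped
            push!(bins[n], y)
        end
    end
    return bins
end

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.1, 2.5]
bins = bin_sketch(0.0:1.0:6.0, xs, ys)
```

`searchsortedlast` does a binary search on the sorted edges, so the sketch does not require `xs` to be sorted.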
Examples
Getting the values in each bin:
```julia
using UncertainData

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.1, 2.5]
left_bin_edges = 0.0:1.0:6.0
bin(left_bin_edges, xs, ys)
```
```julia
using UncertainData

# Some example data with unevenly spaced time indices
npts = 300
time, vals = sort(rand(1:1000, npts)), rand(npts)

# See which values fall in 25 time step wide time bins ranging
# from time indices 100 to 900.
left_bin_edges = 100:25:900
bin(left_bin_edges, time, vals)
```
UncertainData.InterpolationAndGrids.bin! (Method)
```julia
bin!(bins::Vector{AbstractVector{T}}, left_bin_edges::AbstractRange{T}, xs, ys) where T
```
Distribute the elements of `ys` into `N-1` different pre-allocated empty bin vectors, based on how the values in `xs` are distributed among the bins defined by the `N` grid points in `left_bin_edges`. `bins` must be a vector of vector-like mutable containers.

If `xs[i]` falls in the `n`-th bin interval, then `ys[i]` is assigned to the `n`-th bin vector. If `xs[i]` lies outside the grid, the corresponding `ys[i]` is ignored.

See also `bin(::AbstractRange)`.
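The following sketch shows the in-place variant, again assuming half-open bins. The name `bin_sketch!` and its length check are illustrative and are not part of UncertainData.

```julia
# Hypothetical in-place variant; not UncertainData's bin!.
# Assumption: bin n is the half-open interval [edges[n], edges[n+1]).
function bin_sketch!(bins::AbstractVector{<:AbstractVector}, edges::AbstractRange, xs, ys)
    nbins = length(edges) - 1
    length(bins) == nbins ||
        throw(ArgumentError("need $(nbins) bins for $(length(edges)) grid points"))
    for (x, y) in zip(xs, ys)
        n = searchsortedlast(edges, x)
        1 <= n <= nbins && push!(bins[n], y)   # values outside the grid are ignored
    end
    return bins
end

edges = 0.0:1.0:6.0
bins = [Float64[] for _ in 1:length(edges) - 1]   # pre-allocated, empty, mutable
bin_sketch!(bins, edges, [1.2, 1.7, 2.2, 7.1], [4.2, 5.1, 6.5, 2.5])
```

To reuse the same containers for another call, empty them first, for example with `foreach(empty!, bins)`.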
Bin summaries
UncertainData.bin (Method)
```julia
bin(f::Function, left_bin_edges::AbstractRange, xs, ys, args...; kwargs...) -> Vector{T} where T
```
Distribute the elements of `ys` into `N-1` different bin vectors, based on how the values in `xs` are distributed among the bins defined by the `N` grid points in `left_bin_edges`. If `xs[i]` falls in the `n`-th bin interval, then `ys[i]` is assigned to the `n`-th bin vector. If `xs[i]` lies outside the grid, the corresponding `ys[i]` is ignored. See also `bin!`.

Then apply the summary function `f` to each bin vector, passing `args` and `kwargs` as additional arguments and keyword arguments. This gives `N-1` summary values, one for each bin. Empty bins are assigned `NaN`.

Returns `N-1` bin summaries.
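The summary step can be sketched in plain Julia. The helper `bin_summary_sketch` is hypothetical and assumes half-open bins and `Float64` data. It mimics the described behaviour (extra `args` and `kwargs` are forwarded to `f`, and empty bins become `NaN`), but it is not the package's implementation.

```julia
using Statistics

# Hypothetical helper mirroring the described behaviour of bin(f, ...):
# collect ys per bin, apply f with extra args/kwargs, return NaN for empty bins.
# Assumptions: bin n is [edges[n], edges[n+1]); data are Float64.
function bin_summary_sketch(f, edges::AbstractRange, xs, ys, args...; kwargs...)
    nbins = length(edges) - 1
    bins = [Float64[] for _ in 1:nbins]
    for (x, y) in zip(xs, ys)
        n = searchsortedlast(edges, x)
        1 <= n <= nbins && push!(bins[n], y)
    end
    return [isempty(b) ? NaN : f(b, args...; kwargs...) for b in bins]
end

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.1, 2.5]
medians = bin_summary_sketch(median, 0.0:1.0:6.0, xs, ys)
q10 = bin_summary_sketch(quantile, 0.0:1.0:6.0, xs, ys, 0.1)   # extra positional arg
```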
Examples
Applying a summary function to each bin
Any function that accepts a vector of values can be used in conjunction with `bin`.
```julia
using UncertainData, Statistics

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.1, 2.5]
left_bin_edges = 0.0:1.0:6.0
bin(median, left_bin_edges, xs, ys)
```
Functions with additional arguments also work (arguments and keyword arguments must be supplied last in the function call):
```julia
using UncertainData, Statistics

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.1, 2.5]
left_bin_edges = 0.0:1.0:6.0
bin(quantile, left_bin_edges, xs, ys, [0.1])
```
Fast bin summaries
UncertainData.InterpolationAndGrids.bin_mean (Function)
```julia
bin_mean(left_bin_edges::AbstractRange, xs, ys)
```
Distribute the elements of `ys` into `N-1` different bin vectors, based on how the values in `xs` are distributed among the bins defined by the `N` grid points in `left_bin_edges`, then compute the mean of each bin.

If `xs[i]` falls in the `n`-th bin interval, then `ys[i]` is assigned to the `n`-th bin vector. Values that fall outside the grid are ignored (for example, if `xs[i] < minimum(left_bin_edges)`, then `ys[i]` is ignored). Empty bins are assigned `NaN`.

Returns `N-1` mean values, one for each bin.
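A fast bin mean can avoid building per-bin vectors by accumulating running sums and counts in a single pass. The following is a hypothetical sketch (`bin_mean_sketch`, with half-open bins assumed). It is not necessarily how `bin_mean` is implemented, but on the example data it reproduces the documented output.

```julia
# Hypothetical single-pass bin mean: accumulates sums and counts instead of
# building bin vectors. Not necessarily how bin_mean is implemented.
# Assumption: bin n is the half-open interval [edges[n], edges[n+1]).
function bin_mean_sketch(edges::AbstractRange, xs, ys)
    nbins = length(edges) - 1
    sums = zeros(nbins)
    counts = zeros(Int, nbins)
    for (x, y) in zip(xs, ys)
        n = searchsortedlast(edges, x)
        if 1 <= n <= nbins
            sums[n] += y
            counts[n] += 1
        end
    end
    # Empty bins give 0.0 / 0 == NaN.
    return sums ./ counts
end

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.2, 2.5]
means = bin_mean_sketch(0.0:1.0:6.0, xs, ys)
```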
Examples
```julia
using UncertainData

xs = [1.2, 1.7, 2.2, 3.3, 4.5, 4.6, 7.1]
ys = [4.2, 5.1, 6.5, 4.2, 3.2, 3.2, 2.5]
left_bin_edges = 0.0:1.0:6.0
bin_mean(left_bin_edges, xs, ys)

# output
6-element Array{Float64,1}:
 NaN
 4.65
 6.5
 4.2
 3.2
 NaN
```
Binning uncertain data
Bin values
UncertainData.bin (Method)
```julia
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedResampling{RawValues}) -> Tuple{Vector, Vector{Vector}}
```
Resample every element of `x` the number of times given by `binning.n`. After resampling, distribute the values according to their resampled indices into the `N` bins defined by `binning.left_bin_edges`.
Returns
Returns a tuple containing the `N` bin centers and an `N`-element vector whose `n`-th entry holds the resampled values whose resampled indices fall in the `n`-th bin.
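The resample-then-bin procedure can be sketched in plain Julia with normal uncertainties. Everything here is an illustrative assumption, not UncertainData's code. The hypothetical `binned_resampling_sketch` treats each index and value as independent normals `N(t[i], σt)` and `N(v[i], σv)`, mirroring the `UncertainValue(Normal, ...)` construction in the example. It uses half-open bins and places each bin center half a step to the right of its left edge.

```julia
using Random

# Hypothetical sketch of binned resampling with raw values; not UncertainData's code.
# Assumptions: index i ~ Normal(t[i], σt), value i ~ Normal(v[i], σv), drawn
# independently; bin n is [edges[n], edges[n+1]); centers are edge + step/2.
function binned_resampling_sketch(rng, t, σt, v, σv, edges::AbstractRange, n_draws)
    nbins = length(edges) - 1
    centers = collect(edges[1:end-1]) .+ step(edges) / 2
    draws = [Float64[] for _ in 1:nbins]
    for i in eachindex(t, v), _ in 1:n_draws
        idx = t[i] + σt * randn(rng)        # resampled index
        val = v[i] + σv * randn(rng)        # resampled value
        n = searchsortedlast(edges, idx)    # bin of the resampled index
        1 <= n <= nbins && push!(draws[n], val)
    end
    return centers, draws
end

rng = MersenneTwister(1234)
t = [120.0, 180.0, 260.0]
v = [0.2, 0.5, 0.9]
# σt = 0 keeps each point's draws in a single bin, which makes the result easy to check.
centers, draws = binned_resampling_sketch(rng, t, 0.0, v, 0.1, 100:25:300, 1000)
```

With `σt > 0`, draws from a point near a bin edge spill into neighbouring bins (or off the grid), so the number of draws per bin is no longer a fixed multiple of `n_draws`.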
Example
```julia
using UncertainData

# Some example data with unevenly spaced time indices
npts = 300
time, vals = sort(rand(1:1000, npts)), rand(npts)

# Add uncertainties to indices and values, and represent as
# UncertainIndexValueDataset
utime = [UncertainValue(Normal, t, 10) for t in time]
uvals = [UncertainValue(Normal, v, 0.1) for v in vals]
udata = UncertainIndexValueDataset(utime, uvals)

# Bin the data into 25 time step wide bins ranging from time index
# 100 to 900, and return a vector of raw values for each bin. Do this
# by resampling each uncertain data point 10000 times and distributing
# those draws among the bins.
left_bin_edges = 100:25:900
n_draws = 10000
binning = BinnedResampling(RawValues, left_bin_edges, n_draws)
bin_centers, bin_draws = bin(udata, binning)
```
UncertainData.bin (Method)
```julia
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedWeightedResampling{RawValues}) -> Tuple{Vector, Vector{Vector}}
```
Resample every element of `x` a number of times. After resampling, distribute the values according to their resampled indices into the `N` bins defined by the `N+1` grid points in `binning.left_bin_edges`. In total, `length(x)*binning.n` draws are distributed among the bins. The relative number of times `x[i]` is resampled is given by the probability weight `binning.weights[i]` (probability weights are always normalised to sum to 1).
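Here is a plain-Julia sketch of the weighted variant, under similar illustrative assumptions. The hypothetical `weighted_binned_resampling_sketch` makes `length(t) * n_draws` draws in total. For each draw, it picks point `i` with probability `w[i] / sum(w)` by inverting the cumulative weights, then resamples that point's index and value as independent normals. The weights `[3, 1]` mimic the odd/even weighting used in the example below. None of this is UncertainData's code.

```julia
using Random

# Hypothetical sketch of weighted binned resampling; not UncertainData's code.
# Assumptions: length(t) * n_draws draws in total; each draw picks point i with
# probability w[i] / sum(w), then resamples its index and value as independent
# normals; bin n is [edges[n], edges[n+1]).
function weighted_binned_resampling_sketch(rng, t, σt, v, σv, w, edges::AbstractRange, n_draws)
    nbins = length(edges) - 1
    draws = [Float64[] for _ in 1:nbins]
    cw = cumsum(w ./ sum(w))                      # normalised cumulative weights
    for _ in 1:(length(t) * n_draws)
        # Weighted pick; min guards against cw[end] rounding slightly below 1.
        i = min(searchsortedfirst(cw, rand(rng)), length(cw))
        idx = t[i] + σt * randn(rng)
        val = v[i] + σv * randn(rng)
        n = searchsortedlast(edges, idx)
        1 <= n <= nbins && push!(draws[n], val)
    end
    return draws
end

# Odd-indexed points three times as likely as even-indexed ones: w = [3, 1].
w = [i % 2 == 0 ? 1 : 3 for i in 1:2]
draws = weighted_binned_resampling_sketch(MersenneTwister(42), [110.0, 160.0], 0.0,
                                          [0.2, 0.8], 0.05, w, 100:40:900, 5000)
```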
Returns
Returns a tuple containing the `N` bin centers and an `N`-element vector whose `n`-th entry holds the resampled values whose resampled indices fall in the `n`-th bin.
Example
```julia
using UncertainData

# Some example data with unevenly spaced time indices and values
# from a simple AR(1) process
function ar1(n::Int, x0 = 0.5, p = 0.3)
    vals = zeros(n)
    vals[1] = x0
    for i = 2:n
        vals[i] = vals[i - 1]*p + rand()*0.5
    end
    return vals
end

npts = 50
time, vals = sort(rand(1:1000, npts)), ar1(npts)

# Add uncertainties to indices and values, and represent as
# UncertainIndexValueDataset
utime = [UncertainValue(Normal, t, 5) for t in time]
uvals = [UncertainValue(Normal, v, 0.03) for v in vals]
udata = UncertainIndexValueDataset(utime, uvals)

# Bin the data into 40 time step wide bins ranging from time index
# 100 to 900, and return a vector of raw values for each bin. Do this
# by resampling each uncertain data point on average 5000 times and
# distributing those draws among the bins.
time_grid = 100:40:900
n_draws = 5000

# Let odd-indexed values be three times as likely to be
# sampled compared to even-indexed values.
wts = Weights([i % 2 == 0 ? 1 : 3 for i = 1:length(udata)])
binning = BinnedWeightedResampling(RawValues, time_grid, wts, n_draws)
bin_centers, bin_draws = bin(udata, binning);
```
Bin summaries
UncertainData.bin (Method)
```julia
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedResampling{UncertainScalarKDE}) -> UncertainIndexValueDataset
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedResampling{UncertainScalarPopulation}) -> UncertainIndexValueDataset
```
Resample every element of `x` the number of times given by `binning.n`. After resampling, distribute the values according to their resampled indices into the bins defined by `binning.left_bin_edges`.
Returns
Returns an `UncertainIndexValueDataset`. Indices are assumed to be uniformly distributed within each bin, and are represented as `CertainValue`s at the bin centers. Values of the dataset are represented differently depending on the type of `binning`:

- If `binning isa BinnedResampling{UncertainScalarKDE}`, the values in each bin are represented by a kernel density estimate of the distribution of the resampled values whose resampled indices fall in that bin.
- If `binning isa BinnedResampling{UncertainScalarPopulation}`, the values in each bin are represented by an equiprobable population consisting of the resampled values whose resampled indices fall in that bin.
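The following minimal Gaussian KDE in plain Julia illustrates what a kernel density estimate of one bin's resampled values looks like. The helper `gaussian_kde_sketch` and its rule-of-thumb bandwidth `1.06 * std * n^(-1/5)` are assumptions for illustration. `UncertainScalarKDE` may use a different estimator, kernel, or bandwidth.

```julia
using Random, Statistics

# Hypothetical Gaussian kernel density estimate of one bin's resampled values.
# Assumption: rule-of-thumb bandwidth; UncertainScalarKDE may differ.
function gaussian_kde_sketch(draws::AbstractVector{<:Real})
    n = length(draws)
    h = 1.06 * std(draws) * n^(-1 / 5)          # rule-of-thumb bandwidth
    # Return the estimated density as a function of x.
    return x -> sum(exp(-((x - d) / h)^2 / 2) for d in draws) / (n * h * sqrt(2π))
end

rng = MersenneTwister(7)
bin_draws = 0.5 .+ 0.1 .* randn(rng, 2000)      # stand-in for one bin's draws
density = gaussian_kde_sketch(bin_draws)
grid = range(0.0, 1.0; length = 501)
mass = sum(density.(grid)) * step(grid)          # Riemann sum, should be close to 1
```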
UncertainData.bin (Method)
```julia
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedWeightedResampling{UncertainScalarKDE}) -> UncertainIndexValueDataset
bin(x::AbstractUncertainIndexValueDataset, binning::BinnedWeightedResampling{UncertainScalarPopulation}) -> UncertainIndexValueDataset
```
Resample every element of `x` a number of times. After resampling, distribute the values according to their resampled indices into the `N` bins defined by the `N+1` grid points in `binning.left_bin_edges`. In total, `length(x)*binning.n` draws are distributed among the bins. The relative number of times `x[i]` is resampled is given by the probability weight `binning.weights[i]` (probability weights are always normalised to sum to 1).
Returns
Returns an `UncertainIndexValueDataset`. Indices are assumed to be uniformly distributed within each bin, and are represented as `CertainValue`s at the bin centers. Values of the dataset are represented differently depending on the type of `binning`:

- If `binning isa BinnedWeightedResampling{UncertainScalarKDE}`, the values in each bin are represented by a kernel density estimate of the distribution of the resampled values whose resampled indices fall in that bin.
- If `binning isa BinnedWeightedResampling{UncertainScalarPopulation}`, the values in each bin are represented by an equiprobable population consisting of the resampled values whose resampled indices fall in that bin.
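An equiprobable population gives every resampled value in a bin the same weight, so drawing from it amounts to uniform sampling with replacement. The following minimal sketch is not `UncertainScalarPopulation`'s code. The helper name `sample_population_sketch` and the member values are hypothetical.

```julia
using Random

# Hypothetical sketch: every member of an equiprobable population has weight
# 1/length(members), so sampling is uniform sampling with replacement.
sample_population_sketch(rng, members::AbstractVector, k::Integer) = rand(rng, members, k)

members = [0.41, 0.47, 0.52, 0.58]   # stand-in for one bin's resampled values
rng = MersenneTwister(3)
samples = sample_population_sketch(rng, members, 10_000)
```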