Class SegmentsCostCache

java.lang.Object
org.apache.druid.server.coordinator.balancer.SegmentsCostCache

public class SegmentsCostCache extends Object
SegmentsCostCache provides faster way to calculate cost function proposed in CostBalancerStrategy. See https://github.com/apache/druid/pull/2972 for more details about the cost function. Joint cost for two segments (you can make formulas below readable by copy-pasting to https://www.codecogs.com/latex/eqneditor.php): cost(X, Y) = \int_{x_0}^{x_1} \int_{y_0}^{y_1} e^{-\lambda |x-y|}dxdy or cost(X, Y) = e^{y_0 + y_1} (e^{x_0} - e^{x_1})(e^{y_0} - e^{y_1}) (*) if x_0 <= x_1 <= y_0 <= y_1 (*) lambda coefficient is omitted for simplicity. For a group of segments {S_xi}, i = {0, n} total joint cost with segment S_y could be calculated as: cost(X, Y) = \sum cost(X_i, Y) = e^{y_0 + y_1} (e^{y_0} - e^{y_1}) \sum (e^{xi_0} - e^{xi_1}) if xi_0 <= xi_1 <= y_0 <= y_1 and cost(X, Y) = \sum cost(X_i, Y) = (e^{y_0} - e^{y_1}) \sum e^{xi_0 + xi_1} (e^{xi_0} - e^{xi_1}) if y_0 <= y_1 <= xi_0 <= xi_1 SegmentsCostCache stores pre-computed sums for a group of segments {S_xi}: 1) \sum (e^{xi_0} - e^{xi_1}) -> leftSum 2) \sum e^{xi_0 + xi_1} (e^{xi_0} - e^{xi_1}) -> rightSum so that calculation of joint cost function for segment S_y became a O(1 + m) complexity task, where m is the number of segments in {S_xi} that overlaps S_y. Segments are stored in buckets. Bucket is a subset of segments contained in SegmentsCostCache, so that startTime of all segments inside a bucket are in the same time interval (with some granularity): |------------------------|--------------------------|-----------------------|-------- .... t_0 t_0+D t_0 + 2D t0 + 3D .... S_x1 S_x2 S_x3 S_x4 S_x5 S_x6 S_x7 S_x8 S_x9 bucket1 bucket2 bucket3 Reasons to store segments in Buckets: 1) Cost function tends to 0 as distance between segments' intervals increases; buckets are used to avoid redundant 0 calculations for thousands of times 2) To reduce number of calculations when segment is added or removed from SegmentsCostCache 3) To avoid infinite values during exponents calculations