Groups small token ranges on the same server(s) in order to reduce task scheduling overhead. Useful mostly with virtual nodes, which may create lots of small token range splits. Each group will make a single Spark task.
Divides a set of token ranges into groups containing not more than maxRowCountPerGroup rows and not more than maxGroupSize token ranges. Each group will form a single CassandraPartition.

The algorithm is as follows:
1. Sort token ranges by endpoints lexicographically.
2. Take the highest possible number of token ranges from the beginning of the list, such that the sum of their rowCounts does not exceed maxRowCountPerGroup and they all share at least one common endpoint. If that is not possible, take at least one item. Those token ranges will make a group.
3. Repeat the previous step until no token ranges are left.
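
The grouping steps can be sketched roughly as follows. This is a minimal illustration, not the connector's actual implementation: SimpleRange, ClusterSketch, sortKey and group are hypothetical names, endpoints are modelled as plain strings, and the sort key (the sorted endpoint names joined together) is an assumption about what "lexicographically" means here.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, simplified stand-in for a token range (not the connector's own class).
final case class SimpleRange(id: Int, rowCount: Long, endpoints: Set[String])

object ClusterSketch {
  // Assumed sort key: the range's endpoint names, sorted and joined.
  private def sortKey(r: SimpleRange): String = r.endpoints.toSeq.sorted.mkString(",")

  def group(ranges: Seq[SimpleRange],
            maxRowCountPerGroup: Long,
            maxGroupSize: Int): List[List[SimpleRange]] = {
    require(maxGroupSize >= 1, "maxGroupSize must be at least 1")
    val groups = ListBuffer.empty[List[SimpleRange]]
    // Step 1: sort ranges by endpoints.
    var remaining = ranges.sortBy(sortKey).toList
    // Step 3: repeat until no ranges are left.
    while (remaining.nonEmpty) {
      // Step 2: always take at least one range, even if it exceeds the row limit alone.
      var current = List(remaining.head)
      var rows = remaining.head.rowCount
      var common = remaining.head.endpoints
      remaining = remaining.tail
      var growing = true
      while (growing && remaining.nonEmpty) {
        val next = remaining.head
        val shared = common.intersect(next.endpoints)
        val fits = current.size < maxGroupSize &&
          rows + next.rowCount <= maxRowCountPerGroup &&
          shared.nonEmpty
        if (fits) {
          current = next :: current
          rows += next.rowCount
          common = shared
          remaining = remaining.tail
        } else {
          growing = false
        }
      }
      groups += current.reverse
    }
    groups.toList
  }
}
```

With a limit of 100 rows per group, three 40-row ranges on one host and a 10-row range on another would form three groups under this sketch: the first two ranges together, the third alone (adding it to the first group would exceed the row limit, and it shares no endpoint with the fourth), and the fourth alone.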