Type Parameters:
K - type of the key in the pair
V - type of the value in the pair

public class HoodieListPairData<K,V> extends HoodieBaseListData<Pair<K,V>> implements HoodiePairData<K,V>
In-memory implementation of HoodiePairData that internally holds a Stream of Pairs.

HoodieListData can have either of two execution semantics: eager or lazy.

This is an in-memory counterpart of HoodieJavaPairRDD, and it strives to provide
semantics similar to the RDD container: all intermediate operations (non-terminal ones
that do not de-reference the stream, unlike "collect", "groupBy", etc.) are executed *lazily*.
This keeps compute and memory churn minimal, since only the necessary
computations are ultimately performed.

Please note, however, that while the RDD container allows the same collection to be
de-referenced more than once (i.e. a terminal operation invoked more than once),
HoodieListData allows that only when instantiated with eager execution semantics.

Fields inherited from class HoodieBaseListData: data, lazy

| Modifier and Type | Method and Description |
|---|---|
| List<Pair<K,V>> | collectAsList(): Collects results of the underlying collection into a List. This is a terminal operation. |
| long | count(): Returns number of held pairs. |
| Map<K,Long> | countByKey(): Counts the number of pairs, grouping them by key. |
| int | deduceNumPartitions() |
| static <K,V> HoodieListPairData<K,V> | eager(List<Pair<K,V>> data) |
| static <K,V> HoodieListPairData<K,V> | eager(Map<K,List<V>> data) |
| <W> HoodiePairData<K,W> | flatMapValues(SerializableFunction<V,Iterator<W>> func) |
| List<Pair<K,V>> | get() |
| HoodiePairData<K,Iterable<V>> | groupByKey(): Groups the values for each key in the dataset into a single sequence. |
| HoodieData<K> | keys(): Returns a HoodieData holding the key from every corresponding pair. |
| static <K,V> HoodieListPairData<K,V> | lazy(List<Pair<K,V>> data) |
| static <K,V> HoodieListPairData<K,V> | lazy(Map<K,List<V>> data) |
| <W> HoodiePairData<K,Pair<V,Option<W>>> | leftOuterJoin(HoodiePairData<K,W> other): Performs a left outer join of this dataset against other. |
| <O> HoodieData<O> | map(SerializableFunction<Pair<K,V>,O> func): Maps key-value pairs of this HoodiePairData container leveraging provided mapper. NOTE: this returns HoodieData and not HoodiePairData. |
| <L,W> HoodiePairData<L,W> | mapToPair(SerializablePairFunction<Pair<K,V>,L,W> mapToPairFunc) |
| <W> HoodiePairData<K,W> | mapValues(SerializableFunction<V,W> func): Maps values of this HoodiePairData container leveraging provided mapper. |
| void | persist(String cacheConfig): Persists the data (if applicable). |
| HoodiePairData<K,V> | reduceByKey(SerializableBiFunction<V,V,V> combiner, int parallelism): Reduces original sequence by de-duplicating the pairs with the same key, using provided binary operator combiner. |
| void | unpersist(): Un-persists the data (if applicable). |
| HoodieData<V> | values(): Returns a HoodieData holding the value from every corresponding pair. |
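
The execution semantics described in the class overview resemble the behaviour of a plain java.util.stream.Stream. The following is a minimal sketch that uses only the JDK and is not Hudi's implementation; the class name StreamReuseSketch and its methods are hypothetical. It shows that a lazily built stream pipeline can be de-referenced by only one terminal operation, whereas eagerly materialized results can be read any number of times.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-JDK illustration only; none of these names belong to Hudi.
class StreamReuseSketch {

  // "Lazy" analogue: the pipeline is only described here and runs when a
  // terminal operation is invoked. A JDK Stream can be consumed once.
  static boolean secondTerminalCallFails() {
    Stream<Integer> lazy = Stream.of(1, 2, 3).map(x -> x * 10); // nothing executed yet
    lazy.collect(Collectors.toList());                          // first terminal operation
    try {
      lazy.count();                                             // second terminal operation
      return false;
    } catch (IllegalStateException e) {
      return true; // the stream has already been operated upon
    }
  }

  // "Eager" analogue: results are materialized into a List right away,
  // so they can be de-referenced repeatedly.
  static long countTwiceFromList() {
    List<Integer> eager =
        Stream.of(1, 2, 3).map(x -> x * 10).collect(Collectors.toList());
    return eager.stream().count() + eager.stream().count();
  }
}
```

In this sketch the second terminal call on the stream raises IllegalStateException, while the materialized list can be counted twice without issue.
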
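Methods inherited from class HoodieBaseListData: asStream, isEmpty

public List<Pair<K,V>> get()
Specified by: get in interface HoodiePairData<K,V>

public void persist(String cacheConfig)
Description copied from interface: HoodiePairData
Persists the data (if applicable).
Specified by: persist in interface HoodiePairData<K,V>
Parameters:
cacheConfig - config value for caching.

public void unpersist()
Description copied from interface: HoodiePairData
Un-persists the data (if applicable).
Specified by: unpersist in interface HoodiePairData<K,V>

public HoodieData<K> keys()
Description copied from interface: HoodiePairData
Returns a HoodieData holding the key from every corresponding pair.
Specified by: keys in interface HoodiePairData<K,V>

public HoodieData<V> values()
Description copied from interface: HoodiePairData
Returns a HoodieData holding the value from every corresponding pair.
Specified by: values in interface HoodiePairData<K,V>

public Map<K,Long> countByKey()
Description copied from interface: HoodiePairData
Counts the number of pairs, grouping them by key.
Specified by: countByKey in interface HoodiePairData<K,V>

public HoodiePairData<K,Iterable<V>> groupByKey()
Description copied from interface: HoodiePairData
Groups the values for each key in the dataset into a single sequence.
Specified by: groupByKey in interface HoodiePairData<K,V>

public HoodiePairData<K,V> reduceByKey(SerializableBiFunction<V,V,V> combiner, int parallelism)
Description copied from interface: HoodiePairData
Reduces original sequence by de-duplicating the pairs with the same key, using provided
binary operator combiner. Returns an instance of HoodiePairData holding
the "de-duplicated" pairs, i.e. only pairs with unique keys.
Specified by: reduceByKey in interface HoodiePairData<K,V>
Parameters:
combiner - method to combine values of the pairs with the same key
parallelism - target parallelism (if applicable)

public <O> HoodieData<O> map(SerializableFunction<Pair<K,V>,O> func)
Description copied from interface: HoodiePairData
Maps key-value pairs of this HoodiePairData container leveraging provided mapper.
NOTE: this returns HoodieData and not HoodiePairData.
Specified by: map in interface HoodiePairData<K,V>

public <W> HoodiePairData<K,W> mapValues(SerializableFunction<V,W> func)
Description copied from interface: HoodiePairData
Maps values of this HoodiePairData container leveraging provided mapper.
Specified by: mapValues in interface HoodiePairData<K,V>

public <W> HoodiePairData<K,W> flatMapValues(SerializableFunction<V,Iterator<W>> func)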
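
The keyed operations documented above (countByKey, groupByKey, reduceByKey) follow the usual semantics of key-value collections. As an illustration only, the sketch below reproduces those semantics with JDK collectors over a list of java.util.Map.Entry pairs. It does not use Hudi classes: Map.Entry stands in for Pair, Integer::sum plays the role of the combiner, the parallelism argument is omitted, and the class and method names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-JDK illustration only; none of these names belong to Hudi.
class KeyedOpsSketch {

  // Hypothetical helper: builds a key-value pair (stand-in for Hudi's Pair).
  static <K, V> Map.Entry<K, V> pair(K k, V v) {
    return new SimpleEntry<>(k, v);
  }

  static List<Map.Entry<String, Integer>> sample() {
    return Arrays.asList(pair("a", 1), pair("b", 2), pair("a", 3));
  }

  // countByKey analogue: number of pairs per key.
  static Map<String, Long> countByKey(List<Map.Entry<String, Integer>> pairs) {
    return pairs.stream().collect(
        Collectors.groupingBy(Map.Entry::getKey, TreeMap::new, Collectors.counting()));
  }

  // groupByKey analogue: all values of a key gathered into one sequence.
  static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
    return pairs.stream().collect(
        Collectors.groupingBy(Map.Entry::getKey, TreeMap::new,
            Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
  }

  // reduceByKey analogue: values sharing a key are merged by a binary
  // combiner, leaving exactly one pair per unique key.
  static Map<String, Integer> reduceByKey(List<Map.Entry<String, Integer>> pairs) {
    return pairs.stream().collect(
        Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Integer::sum, TreeMap::new));
  }
}
```

With the sample pairs (a, 1), (b, 2), (a, 3), the three methods yield {a=2, b=1}, {a=[1, 3], b=[2]} and {a=4, b=2} respectively.
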
public <L,W> HoodiePairData<L,W> mapToPair(SerializablePairFunction<Pair<K,V>,L,W> mapToPairFunc)
Specified by: mapToPair in interface HoodiePairData<K,V>
Type Parameters:
L - new key type.
W - new value type.
Parameters:
mapToPairFunc - serializable map function to generate another pair.

public <W> HoodiePairData<K,Pair<V,Option<W>>> leftOuterJoin(HoodiePairData<K,W> other)
Description copied from interface: HoodiePairData
Performs a left outer join of this dataset against other.
For each element (k, v) in this, the resulting HoodiePairData will contain either all
pairs (k, (v, Some(w))) for every w in other with key k, or the pair (k, (v, None))
if no element in other has key k.
Specified by: leftOuterJoin in interface HoodiePairData<K,V>
Type Parameters:
W - value type of the other HoodiePairData
Parameters:
other - the other HoodiePairData

public long count()
Description copied from interface: HoodiePairData
Returns number of held pairs.
Specified by: count in interface HoodiePairData<K,V>
Overrides: count in class HoodieBaseListData<Pair<K,V>>

public List<Pair<K,V>> collectAsList()
Description copied from interface: HoodiePairData
Collects results of the underlying collection into a List<Pair<K,V>>.
This is a terminal operation.
Specified by: collectAsList in interface HoodiePairData<K,V>
Overrides: collectAsList in class HoodieBaseListData<Pair<K,V>>

public int deduceNumPartitions()
Specified by: deduceNumPartitions in interface HoodiePairData<K,V>

public static <K,V> HoodieListPairData<K,V> lazy(List<Pair<K,V>> data)

public static <K,V> HoodieListPairData<K,V> eager(List<Pair<K,V>> data)

public static <K,V> HoodieListPairData<K,V> lazy(Map<K,List<V>> data)

public static <K,V> HoodieListPairData<K,V> eager(Map<K,List<V>> data)
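
The leftOuterJoin contract documented earlier on this page can be illustrated with a small JDK-only sketch. This is an assumption-laden analogue, not Hudi's implementation: Map.Entry stands in for Pair, java.util.Optional stands in for Option, and the class LeftOuterJoinSketch with its helper methods is hypothetical. Every left pair appears at least once in the output, paired with each matching right value or with an empty Optional when there is no match.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Plain-JDK illustration only; none of these names belong to Hudi.
class LeftOuterJoinSketch {

  // Hypothetical helper: builds a key-value pair (stand-in for Hudi's Pair).
  static <K, V> Map.Entry<K, V> pair(K k, V v) {
    return new SimpleEntry<>(k, v);
  }

  static <K, V, W> List<Map.Entry<K, Map.Entry<V, Optional<W>>>> leftOuterJoin(
      List<Map.Entry<K, V>> left, List<Map.Entry<K, W>> right) {
    // Index the right side by key so each left pair can look up its matches.
    Map<K, List<W>> rightByKey = right.stream().collect(
        Collectors.groupingBy(Map.Entry::getKey,
            Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

    List<Map.Entry<K, Map.Entry<V, Optional<W>>>> out = new ArrayList<>();
    for (Map.Entry<K, V> l : left) {
      List<W> matches = rightByKey.getOrDefault(l.getKey(), Collections.emptyList());
      if (matches.isEmpty()) {
        // No right pair with this key: emit (k, (v, empty)).
        Optional<W> none = Optional.empty();
        Map.Entry<V, Optional<W>> joined = pair(l.getValue(), none);
        out.add(pair(l.getKey(), joined));
      } else {
        // One output pair (k, (v, w)) per matching right value w.
        for (W w : matches) {
          Map.Entry<V, Optional<W>> joined = pair(l.getValue(), Optional.of(w));
          out.add(pair(l.getKey(), joined));
        }
      }
    }
    return out;
  }

  static List<Map.Entry<String, Integer>> sampleLeft() {
    return Arrays.asList(pair("a", 1), pair("b", 2));
  }

  static List<Map.Entry<String, String>> sampleRight() {
    return Arrays.asList(pair("a", "x"), pair("a", "y"));
  }
}
```

For the sample inputs, key "a" matches two right values and so produces two output pairs, while key "b" has no match and produces a single pair carrying an empty Optional.
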
Copyright © 2024 The Apache Software Foundation. All rights reserved.