Type parameters: `K` - type of key; `V` - type of value.

`public interface HoodiePairData<K,V> extends Serializable`

| Modifier and Type | Method | Description |
|---|---|---|
| `List<Pair<K,V>>` | `collectAsList()` | Collects results of the underlying collection into a `List`. This is a terminal operation. |
| `long` | `count()` | Returns the number of held pairs. |
| `Map<K,Long>` | `countByKey()` | Counts the number of pairs, grouping them by key. |
| `int` | `deduceNumPartitions()` | |
| `Object` | `get()` | |
| `HoodiePairData<K,Iterable<V>>` | `groupByKey()` | Groups the values for each key in the dataset into a single sequence. |
| `HoodieData<K>` | `keys()` | Returns a `HoodieData` holding the key from every corresponding pair. |
| `<W> HoodiePairData<K,Pair<V,Option<W>>>` | `leftOuterJoin(HoodiePairData<K,W> other)` | Performs a left outer join of this dataset against `other`. |
| `<O> HoodieData<O>` | `map(SerializableFunction<Pair<K,V>,O> func)` | Maps key-value pairs of this `HoodiePairData` container using the provided mapper. Note that this returns `HoodieData`, not `HoodiePairData`. |
| `<L,W> HoodiePairData<L,W>` | `mapToPair(SerializablePairFunction<Pair<K,V>,L,W> mapToPairFunc)` | |
| `<W> HoodiePairData<K,W>` | `mapValues(SerializableFunction<V,W> func)` | Maps values of this `HoodiePairData` container using the provided mapper. |
| `void` | `persist(String cacheConfig)` | Persists the data (if applicable). |
| `HoodiePairData<K,V>` | `reduceByKey(SerializableBiFunction<V,V,V> combiner, int parallelism)` | Reduces the original sequence by de-duplicating the pairs with the same key, using the provided binary operator `combiner`. |
| `void` | `unpersist()` | Un-persists the data (if applicable). |
| `HoodieData<V>` | `values()` | Returns a `HoodieData` holding the value from every corresponding pair. |
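The summary distinguishes `map`, `mapValues`, and `mapToPair` mainly by what they return: `map` drops the pair structure, `mapValues` keeps the keys, and `mapToPair` may change both key and value types. A minimal sketch of that difference over a plain Java list follows. It is not the Hudi implementation: the class `PairTransforms` is hypothetical, `Map.Entry` stands in for Hudi's `Pair`, `java.util.function.Function` replaces the serializable function types, and Java 9+ is assumed for `List.of` and `Map.entry`.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical, list-backed illustration of how map, mapValues and mapToPair differ.
// Map.Entry stands in for Hudi's Pair; none of this is the Hudi API.
final class PairTransforms {

  // map: each pair collapses to one value, so the result is no longer keyed
  // (mirrors map returning HoodieData rather than HoodiePairData).
  static <K, V, O> List<O> map(List<Map.Entry<K, V>> pairs, Function<Map.Entry<K, V>, O> func) {
    return pairs.stream().map(func).collect(Collectors.toList());
  }

  // mapValues: keys pass through unchanged; only the values are transformed.
  static <K, V, W> List<Map.Entry<K, W>> mapValues(List<Map.Entry<K, V>> pairs, Function<V, W> func) {
    return pairs.stream()
        .map(p -> Map.entry(p.getKey(), func.apply(p.getValue())))
        .collect(Collectors.toList());
  }

  // mapToPair: each pair becomes a new pair, so key type L and value type W may both change.
  static <K, V, L, W> List<Map.Entry<L, W>> mapToPair(
      List<Map.Entry<K, V>> pairs, Function<Map.Entry<K, V>, Map.Entry<L, W>> func) {
    return pairs.stream().map(func).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> pairs = List.of(Map.entry("a", 1), Map.entry("b", 2));
    List<String> rendered = map(pairs, p -> p.getKey() + "=" + p.getValue());
    List<Map.Entry<String, Integer>> scaled = mapValues(pairs, v -> v * 10);
    List<Map.Entry<Integer, String>> swapped = mapToPair(pairs, p -> Map.entry(p.getValue(), p.getKey()));
    System.out.println(rendered); // [a=1, b=2]
    System.out.println(scaled);   // [a=10, b=20]
    System.out.println(swapped);  // [1=a, 2=b]
  }
}
```

Preferring `mapValues` over `mapToPair` when only values change makes it explicit at the call site that keys are untouched.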

`Object get()`

`void persist(String cacheConfig)`
Persists the data (if applicable).
Parameters: `cacheConfig` - config value for caching.

`void unpersist()`
Un-persists the data (if applicable).
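`persist` and `unpersist` only matter "if applicable", which suggests an engine that evaluates lazily and can cache intermediate results. A minimal sketch of the usual persist, reuse, unpersist pattern follows, under that assumption. The class `LazyPairs`, its recompute-unless-cached behaviour, and the `"MEMORY_ONLY"` config string are all hypothetical placeholders, not Hudi's implementation or a documented `cacheConfig` value.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical lazily evaluated pair container used only to illustrate the
// persist -> reuse -> unpersist pattern; it is not Hudi's implementation.
final class LazyPairs<K, V> {
  private final Supplier<List<Map.Entry<K, V>>> source;
  private List<Map.Entry<K, V>> cached; // non-null only while persisted
  int computations = 0;                 // how many times the source has been evaluated

  LazyPairs(Supplier<List<Map.Entry<K, V>>> source) {
    this.source = source;
  }

  // cacheConfig is accepted but ignored in this sketch.
  void persist(String cacheConfig) {
    cached = compute();
  }

  void unpersist() {
    cached = null;
  }

  long count() {
    return materialize().size();
  }

  List<Map.Entry<K, V>> collectAsList() {
    return materialize();
  }

  // Reuse the cached result when persisted; otherwise re-evaluate the source.
  private List<Map.Entry<K, V>> materialize() {
    return cached != null ? cached : compute();
  }

  private List<Map.Entry<K, V>> compute() {
    computations++;
    return source.get();
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> rows = List.of(Map.entry("a", 1), Map.entry("b", 2));
    LazyPairs<String, Integer> data = new LazyPairs<>(() -> rows);
    data.persist("MEMORY_ONLY"); // hypothetical config value
    try {
      System.out.println(data.count());         // 2
      System.out.println(data.collectAsList()); // [a=1, b=2]
    } finally {
      data.unpersist(); // release the cache even if an action throws
    }
    System.out.println(data.computations);      // 1: both actions reused the cached result
  }
}
```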

`HoodieData<K> keys()`
Returns a `HoodieData` holding the key from every corresponding pair.

`HoodieData<V> values()`
Returns a `HoodieData` holding the value from every corresponding pair.

`long count()`
Returns the number of held pairs.
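Reading "the key from every corresponding pair" literally, `keys()` yields one element per pair, so a key that appears in several pairs appears several times. A minimal list-backed sketch of `keys()`, `values()` and `count()` under that reading follows; the class `KeysValuesCount` is hypothetical, `Map.Entry` stands in for Hudi's `Pair`, and plain `List`s stand in for `HoodieData`.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical list-backed model of keys(), values() and count();
// Map.Entry stands in for Hudi's Pair and List for HoodieData.
final class KeysValuesCount {

  // One key per pair: duplicate keys are kept, not de-duplicated.
  static <K, V> List<K> keys(List<Map.Entry<K, V>> pairs) {
    return pairs.stream().map(Map.Entry::getKey).collect(Collectors.toList());
  }

  // One value per pair, in the same order as the keys.
  static <K, V> List<V> values(List<Map.Entry<K, V>> pairs) {
    return pairs.stream().map(Map.Entry::getValue).collect(Collectors.toList());
  }

  // Number of held pairs.
  static <K, V> long count(List<Map.Entry<K, V>> pairs) {
    return pairs.size();
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> pairs =
        List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
    System.out.println(keys(pairs));   // [a, b, a]
    System.out.println(values(pairs)); // [1, 2, 3]
    System.out.println(count(pairs));  // 3
  }
}
```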

`HoodiePairData<K,Iterable<V>> groupByKey()`
Groups the values for each key in the dataset into a single sequence.
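`groupByKey()` and `countByKey()` (listed in the summary) both partition pairs by key; one keeps every value per key, the other keeps only how many there were. A minimal sketch of both over a plain list follows. The class `GroupAndCount` is hypothetical, and returning a `java.util.Map` is a simplification: the real `groupByKey()` returns a `HoodiePairData<K,Iterable<V>>`. This sketch keeps first-seen key order for readable output; a distributed implementation need not preserve any order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical list-backed model of groupByKey() and countByKey();
// Map.Entry stands in for Hudi's Pair. Not the Hudi API.
final class GroupAndCount {

  // groupByKey: gather all values sharing a key into a single sequence.
  static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
    Map<K, List<V>> groups = new LinkedHashMap<>(); // first-seen key order
    for (Map.Entry<K, V> p : pairs) {
      groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
    }
    return groups;
  }

  // countByKey: number of pairs per key, without keeping the values.
  static <K, V> Map<K, Long> countByKey(List<Map.Entry<K, V>> pairs) {
    Map<K, Long> counts = new LinkedHashMap<>();
    for (Map.Entry<K, V> p : pairs) {
      counts.merge(p.getKey(), 1L, Long::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> pairs =
        List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
    System.out.println(groupByKey(pairs)); // {a=[1, 3], b=[2]}
    System.out.println(countByKey(pairs)); // {a=2, b=1}
  }
}
```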

`HoodiePairData<K,V> reduceByKey(SerializableBiFunction<V,V,V> combiner, int parallelism)`
Reduces the original sequence by de-duplicating the pairs with the same key, using the provided binary operator `combiner`. Returns an instance of `HoodiePairData` holding the "de-duplicated" pairs, i.e. only pairs with unique keys.
Parameters: `combiner` - method to combine values of the pairs with the same key; `parallelism` - target parallelism (if applicable).

`<O> HoodieData<O> map(SerializableFunction<Pair<K,V>,O> func)`
Maps key-value pairs of this `HoodiePairData` container using the provided mapper. Note that this returns `HoodieData`, not `HoodiePairData`.

`<W> HoodiePairData<K,W> mapValues(SerializableFunction<V,W> func)`
Maps values of this `HoodiePairData` container using the provided mapper.

`<L,W> HoodiePairData<L,W> mapToPair(SerializablePairFunction<Pair<K,V>,L,W> mapToPairFunc)`
Type parameters: `L` - new key type; `W` - new value type.
Parameters: `mapToPairFunc` - serializable map function to generate another pair.

`<W> HoodiePairData<K,Pair<V,Option<W>>> leftOuterJoin(HoodiePairData<K,W> other)`
Performs a left outer join of this dataset against `other`. For each element `(k, v)` in this dataset, the resulting `HoodiePairData` will contain either all pairs `(k, (v, Some(w)))` for every `w` in `other` with key `k`, or the single pair `(k, (v, None))` if no element of `other` has key `k`.
Type parameters: `W` - value type of the other `HoodiePairData`.
Parameters: `other` - the other `HoodiePairData`.

`List<Pair<K,V>> collectAsList()`
Collects results of the underlying collection into a `List`. This is a terminal operation.

`int deduceNumPartitions()`
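The `reduceByKey` and `leftOuterJoin` descriptions above carry the most semantics: the first leaves exactly one pair per key, the second emits one output per matching right-hand value, or a single "None" output when nothing matches. A minimal list-backed sketch of both follows, assuming only those documented semantics. The class `ReduceAndJoin` is hypothetical, `Map.Entry` stands in for Hudi's `Pair`, `java.util.Optional` stands in for Hudi's `Option` (`Optional.of` for `Some`, `Optional.empty()` for `None`), and the `parallelism` argument is omitted because a single in-memory list has nothing to parallelize.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BinaryOperator;

// Hypothetical list-backed model of reduceByKey and leftOuterJoin.
// Map.Entry stands in for Hudi's Pair, Optional for Hudi's Option. Not the Hudi API.
final class ReduceAndJoin {

  // reduceByKey: fold all values sharing a key with the combiner,
  // leaving exactly one pair per distinct key (first-seen key order here).
  static <K, V> List<Map.Entry<K, V>> reduceByKey(List<Map.Entry<K, V>> pairs, BinaryOperator<V> combiner) {
    Map<K, V> reduced = new LinkedHashMap<>();
    for (Map.Entry<K, V> p : pairs) {
      reduced.merge(p.getKey(), p.getValue(), combiner);
    }
    return new ArrayList<>(reduced.entrySet());
  }

  // leftOuterJoin: each left pair (k, v) yields (k, (v, Optional.of(w))) for every right
  // pair (k, w), or a single (k, (v, Optional.empty())) when no right pair has key k.
  static <K, V, W> List<Map.Entry<K, Map.Entry<V, Optional<W>>>> leftOuterJoin(
      List<Map.Entry<K, V>> left, List<Map.Entry<K, W>> right) {
    // Index the right side by key so each left pair can find all of its matches.
    Map<K, List<W>> rightByKey = new LinkedHashMap<>();
    for (Map.Entry<K, W> r : right) {
      rightByKey.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
    }
    List<Map.Entry<K, Map.Entry<V, Optional<W>>>> out = new ArrayList<>();
    for (Map.Entry<K, V> l : left) {
      List<W> matches = rightByKey.getOrDefault(l.getKey(), List.of());
      if (matches.isEmpty()) {
        out.add(Map.entry(l.getKey(), Map.entry(l.getValue(), Optional.<W>empty())));
      } else {
        for (W w : matches) {
          out.add(Map.entry(l.getKey(), Map.entry(l.getValue(), Optional.of(w))));
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> left =
        List.of(Map.entry("a", 1), Map.entry("a", 2), Map.entry("b", 5));
    List<Map.Entry<String, Integer>> deduped = reduceByKey(left, Integer::sum);
    System.out.println(deduped); // [a=3, b=5]

    List<Map.Entry<String, String>> right = List.of(Map.entry("a", "x"), Map.entry("a", "y"));
    System.out.println(leftOuterJoin(deduped, right));
    // [a=3=Optional[x], a=3=Optional[y], b=5=Optional.empty]
  }
}
```

Note that a key with several matches on the right multiplies the left pair, so the join output can be larger than the left input.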
Copyright © 2024 The Apache Software Foundation. All rights reserved.