Type Parameters:
T - Sub type of HoodieRecordPayload
I - Type of inputs
K - Type of keys
O - Type of outputs

public abstract class HoodieTable<T extends HoodieRecordPayload,I,K,O> extends Object implements Serializable
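The four type parameters let a concrete table fix its own input, key, and output collection types while the payload type stays generic. The sketch below is not Hudi code: `TableSketch`, `ListTableSketch`, the reduced method signatures, and the map-backed storage are hypothetical stand-ins, used only to illustrate how `I`, `K`, and `O` could be bound by a subclass.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in with the same four type parameters:
// T = payload type, I = inputs, K = keys, O = outputs.
abstract class TableSketch<T, I, K, O> {
    abstract O upsert(String instantTime, I records);
    abstract O delete(String instantTime, K keys);
}

// A single-JVM binding (assumption): inputs are key -> payload maps,
// keys are lists of strings, outputs are lists of status strings.
class ListTableSketch<T> extends TableSketch<T, Map<String, T>, List<String>, List<String>> {
    private final Map<String, T> store = new LinkedHashMap<>();

    @Override
    List<String> upsert(String instantTime, Map<String, T> records) {
        List<String> statuses = new ArrayList<>();
        for (Map.Entry<String, T> e : records.entrySet()) {
            // put() returns the previous value; with non-null payloads, null means the key was new.
            boolean existed = store.put(e.getKey(), e.getValue()) != null;
            statuses.add(instantTime + ":" + e.getKey() + (existed ? ":updated" : ":inserted"));
        }
        return statuses;
    }

    @Override
    List<String> delete(String instantTime, List<String> keys) {
        List<String> statuses = new ArrayList<>();
        for (String key : keys) {
            boolean removed = store.remove(key) != null;
            statuses.add(instantTime + ":" + key + (removed ? ":deleted" : ":missing"));
        }
        return statuses;
    }

    int size() {
        return store.size();
    }
}
```

An engine-specific table would presumably bind `I`, `K`, and `O` to that engine's distributed collection types in the same way; that mapping is an assumption here, not something this page states.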
| Modifier and Type | Field and Description |
|---|---|
| protected HoodieWriteConfig | config |
| protected HoodieEngineContext | context |
| protected HoodieIndex<T,?,?,?> | index |
| protected HoodieTableMetaClient | metaClient |
| protected TaskContextSupplier | taskContextSupplier |
| Modifier | Constructor and Description |
|---|---|
| protected | HoodieTable(HoodieWriteConfig config, HoodieEngineContext context, HoodieTableMetaClient metaClient) |
| Modifier and Type | Method and Description |
|---|---|
| abstract HoodieBootstrapWriteMetadata<O> | bootstrap(HoodieEngineContext context, Option<Map<String,String>> extraMetadata)<br>Perform metadata/full bootstrap of a Hudi table. |
| abstract HoodieWriteMetadata<O> | bulkInsert(HoodieEngineContext context, String instantTime, I records, Option<BulkInsertPartitioner<I>> bulkInsertPartitioner)<br>Bulk Insert a batch of new records into Hoodie table at the supplied instantTime. |
| abstract HoodieWriteMetadata<O> | bulkInsertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords, Option<BulkInsertPartitioner<I>> bulkInsertPartitioner)<br>Bulk Insert the given prepared records into the Hoodie table, at the supplied instantTime. |
| abstract HoodieCleanMetadata | clean(HoodieEngineContext context, String cleanInstantTime, boolean skipLocking)<br>Executes a new clean action. |
| abstract HoodieWriteMetadata<O> | cluster(HoodieEngineContext context, String clusteringInstantTime)<br>Execute Clustering on the table. |
| abstract HoodieWriteMetadata<O> | compact(HoodieEngineContext context, String compactionInstantTime)<br>Run Compaction on the table. |
| abstract HoodieWriteMetadata<O> | delete(HoodieEngineContext context, String instantTime, K keys) |
| abstract HoodieWriteMetadata | deletePartitions(HoodieEngineContext context, String instantTime, List<String> partitions)<br>Deletes all data of partitions. |
| void | finalizeWrite(HoodieEngineContext context, String instantTs, List<HoodieWriteStat> stats)<br>Finalize the written data onto storage. |
| HoodieActiveTimeline | getActiveTimeline() |
| String | getBaseFileExtension() |
| HoodieFileFormat | getBaseFileFormat() |
| TableFileSystemView.BaseFileOnlyView | getBaseFileOnlyView()<br>Get the base file only view of the file system for this table. |
| HoodieTimeline | getCleanTimeline()<br>Get clean timeline. |
| HoodieTimeline | getCompletedCleanTimeline()<br>Get only the completed (no-inflights) clean timeline. |
| HoodieTimeline | getCompletedCommitsTimeline()<br>Get only the completed (no-inflights) commit + deltacommit timeline. |
| HoodieTimeline | getCompletedCommitTimeline()<br>Get only the completed (no-inflights) commit timeline. |
| HoodieTimeline | getCompletedSavepointTimeline()<br>Get only the completed (no-inflights) savepoint timeline. |
| HoodieWriteConfig | getConfig() |
| static ConsistencyGuard | getConsistencyGuard(org.apache.hadoop.fs.FileSystem fs, ConsistencyGuardConfig consistencyGuardConfig)<br>Instantiate ConsistencyGuard based on configs. |
| HoodieEngineContext | getContext() |
| TableFileSystemView | getFileSystemView()<br>Get the view of the file system for this table. |
| org.apache.hadoop.conf.Configuration | getHadoopConf() |
| SyncableFileSystemView | getHoodieView()<br>Get complete view of the file system for this table with ability to force sync. |
| HoodieIndex<T,?,?,?> | getIndex()<br>Return the index. |
| protected abstract HoodieIndex<T,?,?,?> | getIndex(HoodieWriteConfig config, HoodieEngineContext context) |
| protected Set<String> | getInvalidDataPaths(WriteMarkers markers)<br>Returns the possibly invalid data file names for the given marker files. |
| HoodieLogBlock.HoodieLogBlockType | getLogDataBlockFormat() |
| HoodieFileFormat | getLogFileFormat() |
| HoodieTableMetaClient | getMetaClient() |
| Option<HoodieTableMetadataWriter> | getMetadataWriter(String triggeringInstantTimestamp)<br>Get Table metadata writer. |
| <T extends org.apache.avro.specific.SpecificRecordBase> Option<HoodieTableMetadataWriter> | getMetadataWriter(String triggeringInstantTimestamp, Option<T> actionMetadata)<br>Get Table metadata writer. |
| HoodieTimeline | getPendingCommitTimeline()<br>Get only the inflight (not completed) commit timeline. |
| HoodieTimeline | getRollbackTimeline()<br>Get rollback timeline. |
| List<String> | getSavepoints()<br>Get the list of savepoints in this table. |
| TableFileSystemView.SliceView | getSliceView()<br>Get the full view of the file system for this table. |
| TaskContextSupplier | getTaskContextSupplier() |
| abstract HoodieWriteMetadata<O> | insert(HoodieEngineContext context, String instantTime, I records)<br>Insert a batch of new records into Hoodie table at the supplied instantTime. |
| abstract HoodieWriteMetadata<O> | insertOverwrite(HoodieEngineContext context, String instantTime, I records)<br>Replaces all the existing records and inserts the specified new records into Hoodie table at the supplied instantTime, for the partition paths contained in input records. |
| abstract HoodieWriteMetadata<O> | insertOverwriteTable(HoodieEngineContext context, String instantTime, I records)<br>Deletes all the existing records of the Hoodie table and inserts the specified new records into Hoodie table at the supplied instantTime, for the partition paths contained in input records. |
| abstract HoodieWriteMetadata<O> | insertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords)<br>Inserts the given prepared records into the Hoodie table, at the supplied instantTime. |
| abstract boolean | isTableServiceAction(String actionType)<br>Check if action type is a table service. |
| protected void | reconcileAgainstMarkers(HoodieEngineContext context, String instantTs, List<HoodieWriteStat> stats, boolean consistencyCheckEnabled)<br>Reconciles WriteStats and marker files to detect and safely delete duplicate data files created because of Spark retries. |
| boolean | requireSortedRecords() |
| abstract HoodieRestoreMetadata | restore(HoodieEngineContext context, String restoreInstantTime, String instantToRestore)<br>Restore the table to the given instant. |
| abstract HoodieRollbackMetadata | rollback(HoodieEngineContext context, String rollbackInstantTime, HoodieInstant commitInstant, boolean deleteInstants, boolean skipLocking)<br>Rollback the (inflight/committed) record changes with the given commit time. |
| abstract void | rollbackBootstrap(HoodieEngineContext context, String instantTime)<br>Perform rollback of bootstrap of a Hudi table. |
| void | rollbackInflightCompaction(HoodieInstant inflightInstant)<br>Rollback failed compactions. |
| abstract HoodieSavepointMetadata | savepoint(HoodieEngineContext context, String instantToSavepoint, String user, String comment)<br>Create a savepoint at the specified instant, so that the table can be restored to this point-in-timeline later if needed. |
| abstract Option<HoodieCleanerPlan> | scheduleCleaning(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)<br>Schedule cleaning for the instant time. |
| abstract Option<HoodieClusteringPlan> | scheduleClustering(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)<br>Schedule clustering for the instant time. |
| abstract Option<HoodieCompactionPlan> | scheduleCompaction(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)<br>Schedule compaction for the instant time. |
| abstract Option<HoodieRollbackPlan> | scheduleRollback(HoodieEngineContext context, String instantTime, HoodieInstant instantToRollback, boolean skipTimelinePublish, boolean shouldRollbackUsingMarkers)<br>Schedule rollback for the instant time. |
| abstract void | updateMetadataIndexes(HoodieEngineContext context, List<HoodieWriteStat> stats, String instantTime)<br>Updates Metadata Indexes (like Z-Index). TODO rebase onto metadata table (post RFC-27) |
| abstract HoodieWriteMetadata<O> | upsert(HoodieEngineContext context, String instantTime, I records)<br>Upsert a batch of new records into Hoodie table at the supplied instantTime. |
| abstract HoodieWriteMetadata<O> | upsertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords)<br>Upserts the given prepared records into the Hoodie table, at the supplied instantTime. |
| void | validateInsertSchema() |
| void | validateUpsertSchema() |
protected final HoodieWriteConfig config
protected final HoodieTableMetaClient metaClient
protected final HoodieIndex<T extends HoodieRecordPayload,?,?,?> index
protected final TaskContextSupplier taskContextSupplier
protected final transient HoodieEngineContext context
protected HoodieTable(HoodieWriteConfig config, HoodieEngineContext context, HoodieTableMetaClient metaClient)
protected abstract HoodieIndex<T,?,?,?> getIndex(HoodieWriteConfig config, HoodieEngineContext context)
public abstract HoodieWriteMetadata<O> upsert(HoodieEngineContext context, String instantTime, I records)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
records - hoodieRecords to upsert

public abstract HoodieWriteMetadata<O> insert(HoodieEngineContext context, String instantTime, I records)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
records - hoodieRecords to insert

public abstract HoodieWriteMetadata<O> bulkInsert(HoodieEngineContext context, String instantTime, I records, Option<BulkInsertPartitioner<I>> bulkInsertPartitioner)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
records - hoodieRecords to bulk insert
bulkInsertPartitioner - User Defined Partitioner

public abstract HoodieWriteMetadata<O> delete(HoodieEngineContext context, String instantTime, K keys)

public abstract HoodieWriteMetadata deletePartitions(HoodieEngineContext context, String instantTime, List<String> partitions)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
partitions - List of partitions to be deleted

public abstract HoodieWriteMetadata<O> upsertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords)
This implementation requires that the input records are already tagged, and de-duped if needed.
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
preppedRecords - hoodieRecords to upsert

public abstract HoodieWriteMetadata<O> insertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords)
This implementation requires that the input records are already tagged, and de-duped if needed.
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
preppedRecords - hoodieRecords to insert

public abstract HoodieWriteMetadata<O> bulkInsertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords, Option<BulkInsertPartitioner<I>> bulkInsertPartitioner)
This implementation requires that the input records are already tagged, and de-duped if needed.
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for the action
preppedRecords - hoodieRecords to bulk insert
bulkInsertPartitioner - User Defined Partitioner

public abstract HoodieWriteMetadata<O> insertOverwrite(HoodieEngineContext context, String instantTime, I records)
Parameters:
context - HoodieEngineContext
instantTime - Instant time for the replace action
records - input records

public abstract HoodieWriteMetadata<O> insertOverwriteTable(HoodieEngineContext context, String instantTime, I records)
Parameters:
context - HoodieEngineContext
instantTime - Instant time for the replace action
records - input records

public abstract void updateMetadataIndexes(@Nonnull HoodieEngineContext context, @Nonnull List<HoodieWriteStat> stats, @Nonnull String instantTime) throws Exception
Parameters:
context - instance of HoodieEngineContext
instantTime - instant of the carried operation triggering the update
Throws:
Exception

public HoodieWriteConfig getConfig()
public HoodieTableMetaClient getMetaClient()
public org.apache.hadoop.conf.Configuration getHadoopConf()
public TableFileSystemView getFileSystemView()
public TableFileSystemView.BaseFileOnlyView getBaseFileOnlyView()
public TableFileSystemView.SliceView getSliceView()
public SyncableFileSystemView getHoodieView()
public HoodieTimeline getCompletedCommitsTimeline()
public HoodieTimeline getCompletedCommitTimeline()
public HoodieTimeline getPendingCommitTimeline()
public HoodieTimeline getCompletedCleanTimeline()
public HoodieTimeline getCleanTimeline()
public HoodieTimeline getRollbackTimeline()
public HoodieTimeline getCompletedSavepointTimeline()
public HoodieActiveTimeline getActiveTimeline()
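The timeline getters above differ in which actions they keep and whether they keep completed or pending instants. The following is a minimal sketch of that filtering idea only. `InstantSketch` and `TimelineSketch` are hypothetical classes, not `HoodieInstant` or `HoodieTimeline`, and the action sets are taken from the one-line descriptions on this page; the real implementation may cover more actions or states.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified instant; NOT the real HoodieInstant class.
class InstantSketch {
    final String timestamp;
    final String action;     // e.g. "commit", "deltacommit", "clean"
    final boolean completed; // false stands for a pending (inflight) instant

    InstantSketch(String timestamp, String action, boolean completed) {
        this.timestamp = timestamp;
        this.action = action;
        this.completed = completed;
    }
}

class TimelineSketch {
    private final List<InstantSketch> instants;

    TimelineSketch(List<InstantSketch> instants) {
        this.instants = instants;
    }

    // Keep the timestamps of instants whose action is listed and whose completion flag matches.
    private List<String> select(boolean completed, String... actions) {
        Set<String> wanted = new HashSet<>(Arrays.asList(actions));
        return instants.stream()
                .filter(i -> wanted.contains(i.action) && i.completed == completed)
                .map(i -> i.timestamp)
                .collect(Collectors.toList());
    }

    List<String> completedCommits() { return select(true, "commit", "deltacommit"); }
    List<String> completedCommit()  { return select(true, "commit"); }
    List<String> pendingCommit()    { return select(false, "commit"); }
    List<String> completedClean()   { return select(true, "clean"); }
}
```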
public HoodieIndex<T,?,?,?> getIndex()
public abstract Option<HoodieCompactionPlan> scheduleCompaction(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for scheduling compaction
extraMetadata - additional metadata to write into plan

public abstract HoodieWriteMetadata<O> compact(HoodieEngineContext context, String compactionInstantTime)
Parameters:
context - HoodieEngineContext
compactionInstantTime - Instant Time

public abstract Option<HoodieClusteringPlan> scheduleClustering(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for scheduling clustering
extraMetadata - additional metadata to write into plan

public abstract HoodieWriteMetadata<O> cluster(HoodieEngineContext context, String clusteringInstantTime)
Parameters:
context - HoodieEngineContext
clusteringInstantTime - Instant Time

public abstract HoodieBootstrapWriteMetadata<O> bootstrap(HoodieEngineContext context, Option<Map<String,String>> extraMetadata)
Parameters:
context - HoodieEngineContext
extraMetadata - Additional Metadata for storing in commit file.

public abstract void rollbackBootstrap(HoodieEngineContext context, String instantTime)
Parameters:
context - HoodieEngineContext

public abstract Option<HoodieCleanerPlan> scheduleCleaning(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for scheduling cleaning
extraMetadata - additional metadata to write into plan

public abstract HoodieCleanMetadata clean(HoodieEngineContext context, String cleanInstantTime, boolean skipLocking)
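Compaction, clustering, and cleaning each pair a schedule method that returns an optional plan with an execute method that takes an instant time. The sketch below illustrates that two-phase shape under stated assumptions: `CompactionServiceSketch` is hypothetical, `java.util.Optional` stands in for Hudi's `Option`, and the reading that an empty result means "nothing to schedule" is an assumption rather than something this page states.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical two-phase table service: schedule a plan first, execute it later.
class CompactionServiceSketch {
    private final List<String> pendingLogFiles = new ArrayList<>();
    private final Map<String, List<String>> plansByInstant = new LinkedHashMap<>();

    void addLogFile(String name) {
        pendingLogFiles.add(name);
    }

    // Phase 1: store a plan under the instant time, or return empty when there is no work.
    Optional<List<String>> schedule(String instantTime) {
        if (pendingLogFiles.isEmpty()) {
            return Optional.empty();
        }
        List<String> plan = new ArrayList<>(pendingLogFiles);
        pendingLogFiles.clear();
        plansByInstant.put(instantTime, plan);
        return Optional.of(plan);
    }

    // Phase 2: run the plan that was scheduled at the same instant time.
    int execute(String instantTime) {
        List<String> plan = plansByInstant.remove(instantTime);
        if (plan == null) {
            throw new IllegalStateException("No plan scheduled at " + instantTime);
        }
        return plan.size(); // number of files handled by this run
    }
}
```

Separating the two phases lets the plan be recorded under an instant time before any work starts, so a later call can pick it up by that same instant time.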
public abstract Option<HoodieRollbackPlan> scheduleRollback(HoodieEngineContext context, String instantTime, HoodieInstant instantToRollback, boolean skipTimelinePublish, boolean shouldRollbackUsingMarkers)
Parameters:
context - HoodieEngineContext
instantTime - Instant Time for scheduling rollback
instantToRollback - instant to be rolled back
shouldRollbackUsingMarkers - uses the marker-based rollback strategy when set to true; uses list-based rollback when false.

public abstract HoodieRollbackMetadata rollback(HoodieEngineContext context, String rollbackInstantTime, HoodieInstant commitInstant, boolean deleteInstants, boolean skipLocking)
Four steps: (1) atomically unpublish this commit; (2) clean indexing data; (3) clean newly generated parquet files; (4) finally, delete the .commit or .inflight file if deleteInstants = true.
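The ordering of the rollback steps listed above, and the way `deleteInstants` gates the last one, can be sketched as follows. `RollbackSketch` is hypothetical and only records step names; it performs no file or index operations and is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the step order; it only records what would happen.
class RollbackSketch {
    static List<String> rollback(String commitTime, boolean deleteInstants) {
        List<String> steps = new ArrayList<>();
        steps.add("unpublish " + commitTime);                  // (1) atomically unpublish the commit
        steps.add("clean index entries of " + commitTime);     // (2) clean indexing data
        steps.add("delete data files of " + commitTime);       // (3) clean newly generated parquet files
        if (deleteInstants) {
            steps.add("delete instant file of " + commitTime); // (4) remove the .commit or .inflight file
        }
        return steps;
    }
}
```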
public abstract HoodieSavepointMetadata savepoint(HoodieEngineContext context, String instantToSavepoint, String user, String comment)
public abstract HoodieRestoreMetadata restore(HoodieEngineContext context, String restoreInstantTime, String instantToRestore)
public void rollbackInflightCompaction(HoodieInstant inflightInstant)
Parameters:
inflightInstant - Inflight Compaction Instant

public void finalizeWrite(HoodieEngineContext context, String instantTs, List<HoodieWriteStat> stats) throws HoodieIOException
Parameters:
context - HoodieEngineContext
stats - List of HoodieWriteStats
Throws:
HoodieIOException - if some paths can't be finalized on storage

protected Set<String> getInvalidDataPaths(WriteMarkers markers) throws IOException
Throws:
IOException

protected void reconcileAgainstMarkers(HoodieEngineContext context, String instantTs, List<HoodieWriteStat> stats, boolean consistencyCheckEnabled) throws HoodieIOException
Parameters:
context - HoodieEngineContext
instantTs - Instant Timestamp
stats - Hoodie Write Stat
consistencyCheckEnabled - Consistency Check Enabled
Throws:
HoodieIOException

public static ConsistencyGuard getConsistencyGuard(org.apache.hadoop.fs.FileSystem fs, ConsistencyGuardConfig consistencyGuardConfig) throws IOException
Instantiate ConsistencyGuard based on configs.
Default consistencyGuard class is OptimisticConsistencyGuard.
Throws:
IOException

public TaskContextSupplier getTaskContextSupplier()

public void validateUpsertSchema() throws HoodieUpsertException
Throws:
HoodieUpsertException

public void validateInsertSchema() throws HoodieInsertException
Throws:
HoodieInsertException

public HoodieFileFormat getBaseFileFormat()
public HoodieFileFormat getLogFileFormat()
public HoodieLogBlock.HoodieLogBlockType getLogDataBlockFormat()
public String getBaseFileExtension()
public boolean requireSortedRecords()
public HoodieEngineContext getContext()
public final Option<HoodieTableMetadataWriter> getMetadataWriter(String triggeringInstantTimestamp)
Parameters:
triggeringInstantTimestamp - The instant that is triggering this metadata write

public abstract boolean isTableServiceAction(String actionType)
Parameters:
actionType - action type of interest.

public <T extends org.apache.avro.specific.SpecificRecordBase> Option<HoodieTableMetadataWriter> getMetadataWriter(String triggeringInstantTimestamp, Option<T> actionMetadata)
Note: Get the metadata writer for the conf. If the metadata table doesn't exist, this will trigger the creation of the table and the initial bootstrapping. Since this call is under the transaction lock, other concurrent writers are blocked from doing a similar initial metadata table creation and bootstrapping.
Parameters:
triggeringInstantTimestamp - The instant that is triggering this metadata write
Returns:
HoodieTableMetadataWriter

Copyright © 2022 The Apache Software Foundation. All rights reserved.