T - Sub type of HoodieRecordPayload
I - Type of inputs
K - Type of keys
O - Type of outputs
public abstract class HoodieTable<T extends HoodieRecordPayload,I,K,O> extends Object implements Serializable
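To make the roles of the four type parameters concrete, here is a minimal sketch of how a subclass might bind them. Every type in it (`Payload`, `SketchTable`, `StringPayload`, `ListTable`) is a hypothetical stand-in written for illustration and is not part of Hudi; real engine-specific tables bind `I`, `K` and `O` to their own collection types.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only; none of these types exist in Hudi.
final class TypeParamSketch {

    /** Plays the role of the payload bound on T. */
    interface Payload extends Serializable {}

    /** Mirrors the shape of the four type parameters: payload, inputs, keys, outputs. */
    abstract static class SketchTable<T extends Payload, I, K, O> implements Serializable {
        abstract O upsert(String instantTime, I records);
        abstract O delete(String instantTime, K keys);
    }

    static final class StringPayload implements Payload {
        final String key;
        StringPayload(String key) { this.key = key; }
    }

    /** In-memory binding: I = list of payloads, K = list of keys, O = list of status strings. */
    static final class ListTable
            extends SketchTable<StringPayload, List<StringPayload>, List<String>, List<String>> {

        @Override
        List<String> upsert(String instantTime, List<StringPayload> records) {
            List<String> out = new ArrayList<>();
            for (StringPayload p : records) {
                out.add("upsert " + p.key + "@" + instantTime);
            }
            return out;
        }

        @Override
        List<String> delete(String instantTime, List<String> keys) {
            List<String> out = new ArrayList<>();
            for (String k : keys) {
                out.add("delete " + k + "@" + instantTime);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ListTable table = new ListTable();
        System.out.println(table.upsert("001", Arrays.asList(new StringPayload("a"))));
        System.out.println(table.delete("002", Arrays.asList("a")));
    }
}
```

The point of the sketch is only that the write methods take `I` (or `K` for deletes) and produce `O`, so one abstract table can be specialised per execution engine without changing its method signatures.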
| Modifier and Type | Field and Description |
|---|---|
| `protected HoodieWriteConfig` | `config` |
| `protected HoodieEngineContext` | `context` |
| `protected HoodieIndex<?,?>` | `index` |
| `protected HoodieTableMetaClient` | `metaClient` |
| `protected TaskContextSupplier` | `taskContextSupplier` |
| Modifier | Constructor and Description |
|---|---|
| `protected` | `HoodieTable(HoodieWriteConfig config, HoodieEngineContext context, HoodieTableMetaClient metaClient)` |
| Modifier and Type | Method and Description |
|---|---|
| `abstract HoodieBootstrapWriteMetadata<O>` | `bootstrap(HoodieEngineContext context, Option<Map<String,String>> extraMetadata)` Perform metadata/full bootstrap of a Hudi table. |
| `abstract HoodieWriteMetadata<O>` | `bulkInsert(HoodieEngineContext context, String instantTime, I records, Option<BulkInsertPartitioner> bulkInsertPartitioner)` Bulk Insert a batch of new records into Hoodie table at the supplied instantTime. |
| `abstract HoodieWriteMetadata<O>` | `bulkInsertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords, Option<BulkInsertPartitioner> bulkInsertPartitioner)` Bulk Insert the given prepared records into the Hoodie table, at the supplied instantTime. |
| `abstract HoodieCleanMetadata` | `clean(HoodieEngineContext context, String cleanInstantTime, boolean skipLocking)` Executes a new clean action. |
| `abstract HoodieWriteMetadata<O>` | `cluster(HoodieEngineContext context, String clusteringInstantTime)` Execute Clustering on the table. |
| `abstract HoodieWriteMetadata<O>` | `compact(HoodieEngineContext context, String compactionInstantTime)` Run Compaction on the table. |
| `abstract HoodieWriteMetadata<O>` | `delete(HoodieEngineContext context, String instantTime, K keys)` |
| `void` | `deleteMetadataIndexIfNecessary()` Deletes the metadata partition if the writer disables any metadata index. |
| `abstract HoodieWriteMetadata` | `deletePartitions(HoodieEngineContext context, String instantTime, List<String> partitions)` Deletes all data of partitions. |
| `void` | `finalizeWrite(HoodieEngineContext context, String instantTs, List<HoodieWriteStat> stats)` Finalize the written data onto storage. |
| `HoodieActiveTimeline` | `getActiveTimeline()` |
| `String` | `getBaseFileExtension()` |
| `HoodieFileFormat` | `getBaseFileFormat()` |
| `TableFileSystemView.BaseFileOnlyView` | `getBaseFileOnlyView()` Get the base file only view of the file system for this table. |
| `HoodieTimeline` | `getCleanTimeline()` Get clean timeline. |
| `HoodieTimeline` | `getCompletedCleanTimeline()` Get only the completed (no-inflights) clean timeline. |
| `HoodieTimeline` | `getCompletedCommitsTimeline()` Get only the completed (no-inflights) commit + deltacommit timeline. |
| `HoodieTimeline` | `getCompletedCommitTimeline()` Get only the completed (no-inflights) commit timeline. |
| `HoodieTimeline` | `getCompletedSavepointTimeline()` Get only the completed (no-inflights) savepoint timeline. |
| `HoodieWriteConfig` | `getConfig()` |
| `static ConsistencyGuard` | `getConsistencyGuard(org.apache.hadoop.fs.FileSystem fs, ConsistencyGuardConfig consistencyGuardConfig)` Instantiate ConsistencyGuard based on configs. |
| `HoodieEngineContext` | `getContext()` |
| `TableFileSystemView` | `getFileSystemView()` Get the view of the file system for this table. |
| `org.apache.hadoop.conf.Configuration` | `getHadoopConf()` |
| `SyncableFileSystemView` | `getHoodieView()` Get complete view of the file system for this table with ability to force sync. |
| `HoodieIndex<?,?>` | `getIndex()` Return the index. |
| `protected abstract HoodieIndex<?,?>` | `getIndex(HoodieWriteConfig config, HoodieEngineContext context)` |
| `protected Set<String>` | `getInvalidDataPaths(WriteMarkers markers)` Returns the possible invalid data file name with given marker files. |
| `HoodieFileFormat` | `getLogFileFormat()` |
| `HoodieTableMetaClient` | `getMetaClient()` |
| `HoodieTableMetadata` | `getMetadata()` |
| `HoodieTableMetadata` | `getMetadataTable()` |
| `Option<HoodieTableMetadataWriter>` | `getMetadataWriter(String triggeringInstantTimestamp)` Get Table metadata writer. |
| `<R extends org.apache.avro.specific.SpecificRecordBase> Option<HoodieTableMetadataWriter>` | `getMetadataWriter(String triggeringInstantTimestamp, Option<R> actionMetadata)` Get Table metadata writer. |
| `Option<HoodieFileFormat>` | `getPartitionMetafileFormat()` |
| `HoodieTimeline` | `getPendingCommitTimeline()` Get only the inflights (no-completed) commit timeline. |
| `Runnable` | `getPreExecuteRunnable()` |
| `HoodieTimeline` | `getRestoreTimeline()` Get restore timeline. |
| `HoodieTimeline` | `getRollbackTimeline()` Get rollback timeline. |
| `List<String>` | `getSavepoints()` Get the list of savepoints in this table. |
| `TableFileSystemView.SliceView` | `getSliceView()` Get the full view of the file system for this table. |
| `HoodieStorageLayout` | `getStorageLayout()` |
| `protected HoodieStorageLayout` | `getStorageLayout(HoodieWriteConfig config)` |
| `TaskContextSupplier` | `getTaskContextSupplier()` |
| `abstract Option<HoodieIndexCommitMetadata>` | `index(HoodieEngineContext context, String indexInstantTime)` Execute requested index action. |
| `abstract HoodieWriteMetadata<O>` | `insert(HoodieEngineContext context, String instantTime, I records)` Insert a batch of new records into Hoodie table at the supplied instantTime. |
| `abstract HoodieWriteMetadata<O>` | `insertOverwrite(HoodieEngineContext context, String instantTime, I records)` Replaces all the existing records and inserts the specified new records into Hoodie table at the supplied instantTime, for the partition paths contained in input records. |
| `abstract HoodieWriteMetadata<O>` | `insertOverwriteTable(HoodieEngineContext context, String instantTime, I records)` Delete all the existing records of the Hoodie table and inserts the specified new records into Hoodie table at the supplied instantTime, for the partition paths contained in input records. |
| `abstract HoodieWriteMetadata<O>` | `insertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords)` Inserts the given prepared records into the Hoodie table, at the supplied instantTime. |
| `abstract boolean` | `isTableServiceAction(String actionType)` Check if action type is a table service. |
| `void` | `maybeDeleteMetadataTable()` Deletes the metadata table if the writer disables metadata table with hoodie.metadata.enable=false. |
| `protected void` | `reconcileAgainstMarkers(HoodieEngineContext context, String instantTs, List<HoodieWriteStat> stats, boolean consistencyCheckEnabled)` Reconciles WriteStats and marker files to detect and safely delete duplicate data files created because of Spark retries. |
| `boolean` | `requireSortedRecords()` |
| `abstract HoodieRestoreMetadata` | `restore(HoodieEngineContext context, String restoreInstantTime, String instantToRestore)` Restore the table to the given instant. |
| `abstract HoodieRollbackMetadata` | `rollback(HoodieEngineContext context, String rollbackInstantTime, HoodieInstant commitInstant, boolean deleteInstants, boolean skipLocking)` Rollback the (inflight/committed) record changes with the given commit time. |
| `abstract void` | `rollbackBootstrap(HoodieEngineContext context, String instantTime)` Perform rollback of bootstrap of a Hudi table. |
| `void` | `rollbackInflightCompaction(HoodieInstant inflightInstant)` |
| `void` | `rollbackInflightCompaction(HoodieInstant inflightInstant, Function<String,Option<HoodiePendingRollbackInfo>> getPendingRollbackInstantFunc)` Rollback failed compactions. |
| `abstract HoodieSavepointMetadata` | `savepoint(HoodieEngineContext context, String instantToSavepoint, String user, String comment)` Create a savepoint at the specified instant, so that the table can be restored to this point-in-timeline later if needed. |
| `abstract Option<HoodieCleanerPlan>` | `scheduleCleaning(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)` Schedule cleaning for the instant time. |
| `abstract Option<HoodieClusteringPlan>` | `scheduleClustering(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)` Schedule clustering for the instant time. |
| `abstract Option<HoodieCompactionPlan>` | `scheduleCompaction(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)` Schedule compaction for the instant time. |
| `abstract Option<HoodieIndexPlan>` | `scheduleIndexing(HoodieEngineContext context, String indexInstantTime, List<MetadataPartitionType> partitionsToIndex)` Schedules Indexing for the table to the given instant. |
| `abstract Option<HoodieRestorePlan>` | `scheduleRestore(HoodieEngineContext context, String restoreInstantTime, String instantToRestore)` Schedules Restore for the table to the given instant. |
| `abstract Option<HoodieRollbackPlan>` | `scheduleRollback(HoodieEngineContext context, String instantTime, HoodieInstant instantToRollback, boolean skipTimelinePublish, boolean shouldRollbackUsingMarkers)` Schedule rollback for the instant time. |
| `abstract HoodieWriteMetadata<O>` | `upsert(HoodieEngineContext context, String instantTime, I records)` Upsert a batch of new records into Hoodie table at the supplied instantTime. |
| `abstract HoodieWriteMetadata<O>` | `upsertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords)` Upserts the given prepared records into the Hoodie table, at the supplied instantTime. |
| `void` | `validateInsertSchema()` |
| `void` | `validateUpsertSchema()` |
protected final HoodieWriteConfig config
protected final HoodieTableMetaClient metaClient
protected final HoodieIndex<?,?> index
protected final TaskContextSupplier taskContextSupplier
protected final transient HoodieEngineContext context
protected HoodieTable(HoodieWriteConfig config, HoodieEngineContext context, HoodieTableMetaClient metaClient)
protected abstract HoodieIndex<?,?> getIndex(HoodieWriteConfig config, HoodieEngineContext context)
protected HoodieStorageLayout getStorageLayout(HoodieWriteConfig config)
public HoodieTableMetadata getMetadata()
public abstract HoodieWriteMetadata<O> upsert(HoodieEngineContext context, String instantTime, I records)
context - HoodieEngineContext
instantTime - Instant Time for the action
records - hoodieRecords to upsert
public abstract HoodieWriteMetadata<O> insert(HoodieEngineContext context, String instantTime, I records)
context - HoodieEngineContext
instantTime - Instant Time for the action
records - hoodieRecords to insert
public abstract HoodieWriteMetadata<O> bulkInsert(HoodieEngineContext context, String instantTime, I records, Option<BulkInsertPartitioner> bulkInsertPartitioner)
context - HoodieEngineContext
instantTime - Instant Time for the action
records - hoodieRecords to insert
bulkInsertPartitioner - User Defined Partitioner
public abstract HoodieWriteMetadata<O> delete(HoodieEngineContext context, String instantTime, K keys)
public abstract HoodieWriteMetadata deletePartitions(HoodieEngineContext context, String instantTime, List<String> partitions)
context - HoodieEngineContext
instantTime - Instant Time for the action
partitions - List of partitions to be deleted
public abstract HoodieWriteMetadata<O> upsertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords)
This implementation requires that the input records are already tagged, and de-duped if needed.
context - HoodieEngineContext
instantTime - Instant Time for the action
preppedRecords - hoodieRecords to upsert
public abstract HoodieWriteMetadata<O> insertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords)
This implementation requires that the input records are already tagged, and de-duped if needed.
context - HoodieEngineContext
instantTime - Instant Time for the action
preppedRecords - hoodieRecords to insert
public abstract HoodieWriteMetadata<O> bulkInsertPrepped(HoodieEngineContext context, String instantTime, I preppedRecords, Option<BulkInsertPartitioner> bulkInsertPartitioner)
This implementation requires that the input records are already tagged, and de-duped if needed.
context - HoodieEngineContext
instantTime - Instant Time for the action
preppedRecords - hoodieRecords to insert
bulkInsertPartitioner - User Defined Partitioner
public abstract HoodieWriteMetadata<O> insertOverwrite(HoodieEngineContext context, String instantTime, I records)
context - HoodieEngineContext
instantTime - Instant time for the replace action
records - input records
public abstract HoodieWriteMetadata<O> insertOverwriteTable(HoodieEngineContext context, String instantTime, I records)
context - HoodieEngineContext
instantTime - Instant time for the replace action
records - input records
public HoodieWriteConfig getConfig()
public HoodieTableMetaClient getMetaClient()
public org.apache.hadoop.conf.Configuration getHadoopConf()
public TableFileSystemView getFileSystemView()
public TableFileSystemView.BaseFileOnlyView getBaseFileOnlyView()
public TableFileSystemView.SliceView getSliceView()
public SyncableFileSystemView getHoodieView()
public HoodieTimeline getCompletedCommitsTimeline()
public HoodieTimeline getCompletedCommitTimeline()
public HoodieTimeline getPendingCommitTimeline()
public HoodieTimeline getCompletedCleanTimeline()
public HoodieTimeline getCleanTimeline()
public HoodieTimeline getRollbackTimeline()
public HoodieTimeline getRestoreTimeline()
public HoodieTimeline getCompletedSavepointTimeline()
public HoodieActiveTimeline getActiveTimeline()
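The timeline getters above differ in which actions and which states they keep. The following sketch reads the one-line descriptions in the method summary literally and applies them to a small in-memory list. The `Instant` class and the `filter` helper are hypothetical stand-ins, not Hudi's `HoodieInstant` or `HoodieTimeline`, and the action strings are used here only as illustrative labels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a timeline; not Hudi's HoodieTimeline API.
final class TimelineSketch {

    static final class Instant {
        final String time;
        final String action;
        final boolean completed;

        Instant(String time, String action, boolean completed) {
            this.time = time;
            this.action = action;
            this.completed = completed;
        }
    }

    /** Keeps instants whose action is in the given list and whose state matches. */
    static List<String> filter(List<Instant> all, List<String> actions, boolean completed) {
        List<String> times = new ArrayList<>();
        for (Instant i : all) {
            if (actions.contains(i.action) && i.completed == completed) {
                times.add(i.time);
            }
        }
        return times;
    }

    /** "completed (no-inflights) commit + deltacommit" */
    static List<String> completedCommits(List<Instant> all) {
        return filter(all, Arrays.asList("commit", "deltacommit"), true);
    }

    /** "completed (no-inflights) commit" */
    static List<String> completedCommit(List<Instant> all) {
        return filter(all, Arrays.asList("commit"), true);
    }

    /** "inflights (no-completed) commit"; restricting to "commit" is an assumption of this sketch. */
    static List<String> pendingCommit(List<Instant> all) {
        return filter(all, Arrays.asList("commit"), false);
    }

    public static void main(String[] args) {
        List<Instant> all = Arrays.asList(
                new Instant("001", "commit", true),
                new Instant("002", "deltacommit", true),
                new Instant("003", "commit", false),
                new Instant("004", "clean", true));
        System.out.println(completedCommits(all));
        System.out.println(completedCommit(all));
        System.out.println(pendingCommit(all));
    }
}
```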
public HoodieIndex<?,?> getIndex()
public HoodieStorageLayout getStorageLayout()
public abstract Option<HoodieCompactionPlan> scheduleCompaction(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)
context - HoodieEngineContext
instantTime - Instant Time for scheduling compaction
extraMetadata - additional metadata to write into plan
public abstract HoodieWriteMetadata<O> compact(HoodieEngineContext context, String compactionInstantTime)
context - HoodieEngineContext
compactionInstantTime - Instant Time
public abstract Option<HoodieClusteringPlan> scheduleClustering(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)
context - HoodieEngineContext
instantTime - Instant Time for scheduling clustering
extraMetadata - additional metadata to write into plan
public abstract HoodieWriteMetadata<O> cluster(HoodieEngineContext context, String clusteringInstantTime)
context - HoodieEngineContext
clusteringInstantTime - Instant Time
public abstract HoodieBootstrapWriteMetadata<O> bootstrap(HoodieEngineContext context, Option<Map<String,String>> extraMetadata)
context - HoodieEngineContext
extraMetadata - Additional Metadata for storing in commit file.
public abstract void rollbackBootstrap(HoodieEngineContext context, String instantTime)
context - HoodieEngineContext
public abstract Option<HoodieCleanerPlan> scheduleCleaning(HoodieEngineContext context, String instantTime, Option<Map<String,String>> extraMetadata)
context - HoodieEngineContext
instantTime - Instant Time for scheduling cleaning
extraMetadata - additional metadata to write into plan
public abstract HoodieCleanMetadata clean(HoodieEngineContext context, String cleanInstantTime, boolean skipLocking)
public abstract Option<HoodieRollbackPlan> scheduleRollback(HoodieEngineContext context, String instantTime, HoodieInstant instantToRollback, boolean skipTimelinePublish, boolean shouldRollbackUsingMarkers)
context - HoodieEngineContext
instantTime - Instant Time for scheduling rollback
instantToRollback - instant to be rolled back
shouldRollbackUsingMarkers - uses the marker-based rollback strategy when set to true; uses list-based rollback when false.
public abstract HoodieRollbackMetadata rollback(HoodieEngineContext context, String rollbackInstantTime, HoodieInstant commitInstant, boolean deleteInstants, boolean skipLocking)
Four steps: (1) Atomically unpublish this commit (2) clean indexing data (3) clean newly generated parquet files (4) finally, delete the .commit or .inflight file, if deleteInstants = true
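The numbered rollback steps above can be sketched as an ordered plan in which only the last step depends on `deleteInstants`. This is a toy model of the ordering only: the class and the step labels are hypothetical, and nothing here touches storage or reflects how an engine-specific table actually implements `rollback`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of step ordering; not Hudi's rollback implementation.
final class RollbackOrderSketch {

    /** Returns the step labels in the order listed in the rollback description. */
    static List<String> plan(boolean deleteInstants) {
        List<String> steps = new ArrayList<>();
        steps.add("unpublish commit");
        steps.add("clean indexing data");
        steps.add("clean newly generated parquet files");
        if (deleteInstants) {
            // Only performed when the caller asks for the instant files to be removed.
            steps.add("delete .commit or .inflight file");
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(plan(true));
        System.out.println(plan(false));
    }
}
```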
public abstract Option<HoodieIndexPlan> scheduleIndexing(HoodieEngineContext context, String indexInstantTime, List<MetadataPartitionType> partitionsToIndex)
context - HoodieEngineContext
indexInstantTime - Instant time for scheduling index action.
partitionsToIndex - List of MetadataPartitionType that should be indexed.
public abstract Option<HoodieIndexCommitMetadata> index(HoodieEngineContext context, String indexInstantTime)
context - HoodieEngineContext
indexInstantTime - Instant time for which index action was scheduled.
public abstract HoodieSavepointMetadata savepoint(HoodieEngineContext context, String instantToSavepoint, String user, String comment)
public abstract HoodieRestoreMetadata restore(HoodieEngineContext context, String restoreInstantTime, String instantToRestore)
public abstract Option<HoodieRestorePlan> scheduleRestore(HoodieEngineContext context, String restoreInstantTime, String instantToRestore)
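Several table services above come in schedule/execute pairs (`scheduleCompaction`/`compact`, `scheduleClustering`/`cluster`, `scheduleCleaning`/`clean`, `scheduleIndexing`/`index`, `scheduleRestore`/`restore`), where the schedule method returns an `Option` of a plan and the execute method takes an instant time. A minimal sketch of that two-phase shape follows. It uses `java.util.Optional` in place of Hudi's `Option`, and the class, its fields, and the rule that an empty result means "nothing to do" are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical two-phase service; not a Hudi class.
final class ScheduleThenExecuteSketch {

    private final Map<String, String> pendingPlans = new HashMap<>();
    private int pendingWork;

    ScheduleThenExecuteSketch(int pendingWork) {
        this.pendingWork = pendingWork;
    }

    /** Phase 1: record a plan at instantTime, or return empty when there is no work (assumed rule). */
    Optional<String> schedule(String instantTime) {
        if (pendingWork == 0) {
            return Optional.empty();
        }
        String plan = "plan for " + pendingWork + " file group(s)";
        pendingPlans.put(instantTime, plan);
        return Optional.of(plan);
    }

    /** Phase 2: execute the plan that was scheduled at instantTime. */
    String execute(String instantTime) {
        String plan = pendingPlans.remove(instantTime);
        if (plan == null) {
            throw new IllegalStateException("nothing scheduled at " + instantTime);
        }
        pendingWork = 0;
        return "executed " + plan;
    }

    public static void main(String[] args) {
        ScheduleThenExecuteSketch service = new ScheduleThenExecuteSketch(3);
        // Only execute when scheduling actually produced a plan.
        service.schedule("002").ifPresent(plan -> System.out.println(service.execute("002")));
    }
}
```

Separating the two phases lets a caller persist the plan first and run it later, possibly from a different process, which is why the execute side is keyed by instant time rather than by the plan object.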
public void rollbackInflightCompaction(HoodieInstant inflightInstant)
public void rollbackInflightCompaction(HoodieInstant inflightInstant, Function<String,Option<HoodiePendingRollbackInfo>> getPendingRollbackInstantFunc)
inflightInstant - Inflight Compaction Instant
public void finalizeWrite(HoodieEngineContext context, String instantTs, List<HoodieWriteStat> stats) throws HoodieIOException
context - HoodieEngineContext
stats - List of HoodieWriteStats
Throws: HoodieIOException - if some paths can't be finalized on storage
protected Set<String> getInvalidDataPaths(WriteMarkers markers) throws IOException
Throws: IOException
protected void reconcileAgainstMarkers(HoodieEngineContext context, String instantTs, List<HoodieWriteStat> stats, boolean consistencyCheckEnabled) throws HoodieIOException
context - HoodieEngineContext
instantTs - Instant Timestamp
stats - Hoodie Write Stat
consistencyCheckEnabled - Consistency Check Enabled
Throws: HoodieIOException
public static ConsistencyGuard getConsistencyGuard(org.apache.hadoop.fs.FileSystem fs, ConsistencyGuardConfig consistencyGuardConfig) throws IOException
Instantiate ConsistencyGuard based on configs.
Default consistencyGuard class is OptimisticConsistencyGuard.
Throws: IOException
public TaskContextSupplier getTaskContextSupplier()
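The marker reconciliation described for `reconcileAgainstMarkers` and `getInvalidDataPaths` above amounts to comparing the files that markers say were created with the files that the write stats report as committed. A minimal sketch of that comparison as a set difference is shown below. The class, method, and file names are hypothetical, and treating "invalid" as "marked but not in the write stats" is an assumption based on the method descriptions rather than on the implementation.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not Hudi's WriteMarkers API.
final class MarkerReconcileSketch {

    /** Paths that have a marker but are absent from the committed write stats. */
    static Set<String> invalidDataPaths(Collection<String> markerPaths,
                                        Collection<String> committedPaths) {
        Set<String> invalid = new TreeSet<>(markerPaths);
        invalid.removeAll(committedPaths);
        return invalid;
    }

    public static void main(String[] args) {
        // A retried task wrote f2 twice; only one copy made it into the write stats.
        Set<String> invalid = invalidDataPaths(
                Arrays.asList("p1/f1.parquet", "p1/f2.parquet", "p1/f2_retry.parquet"),
                Arrays.asList("p1/f1.parquet", "p1/f2.parquet"));
        System.out.println(invalid);
    }
}
```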
public void validateUpsertSchema() throws HoodieUpsertException
Throws: HoodieUpsertException
public void validateInsertSchema() throws HoodieInsertException
Throws: HoodieInsertException
public HoodieFileFormat getBaseFileFormat()
public HoodieFileFormat getLogFileFormat()
public Option<HoodieFileFormat> getPartitionMetafileFormat()
public String getBaseFileExtension()
public boolean requireSortedRecords()
public HoodieEngineContext getContext()
public final Option<HoodieTableMetadataWriter> getMetadataWriter(String triggeringInstantTimestamp)
triggeringInstantTimestamp - The instant that is triggering this metadata write
Returns: HoodieTableMetadataWriter
public abstract boolean isTableServiceAction(String actionType)
actionType - action type of interest.
public <R extends org.apache.avro.specific.SpecificRecordBase> Option<HoodieTableMetadataWriter> getMetadataWriter(String triggeringInstantTimestamp, Option<R> actionMetadata)
Note: Get the metadata writer for the conf. If the metadata table doesn't exist, this will trigger the creation of the table and the initial bootstrapping. Since this call is under the transaction lock, other concurrent writers are blocked from performing the same initial metadata table creation and bootstrapping.
triggeringInstantTimestamp - The instant that is triggering this metadata write
Returns: HoodieTableMetadataWriter
public void maybeDeleteMetadataTable()
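The note on `getMetadataWriter` above describes a create-on-first-use pattern that is safe because the call happens under the transaction lock. The sketch below illustrates that idea with a plain `ReentrantLock`. All names are hypothetical, and unlike the real method, which relies on its caller already holding the lock, this sketch acquires the lock itself to stay self-contained.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of lazy, lock-guarded initialisation; not Hudi code.
final class LazyMetadataInitSketch {

    private final ReentrantLock txnLock = new ReentrantLock();
    private final AtomicInteger bootstrapCount = new AtomicInteger();
    private boolean tableExists; // guarded by txnLock

    /** Returns a writer handle, bootstrapping the backing table on first use only. */
    String getWriter(String triggeringInstant) {
        txnLock.lock();
        try {
            if (!tableExists) {
                // Stand-in for table creation and initial bootstrapping.
                bootstrapCount.incrementAndGet();
                tableExists = true;
            }
            return "writer@" + triggeringInstant;
        } finally {
            txnLock.unlock();
        }
    }

    int bootstrapCount() {
        return bootstrapCount.get();
    }

    /** Starts the given number of concurrent callers and reports how many bootstraps ran. */
    static int bootstrapsAfter(int threads) {
        LazyMetadataInitSketch table = new LazyMetadataInitSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final String instant = "00" + i;
            workers[i] = new Thread(() -> table.getWriter(instant));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return table.bootstrapCount();
    }

    public static void main(String[] args) {
        System.out.println(bootstrapsAfter(8));
    }
}
```

Because the existence check and the bootstrap happen inside the same critical section, concurrent callers cannot both observe a missing table, which is the property the note relies on.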
public void deleteMetadataIndexIfNecessary()
public HoodieTableMetadata getMetadataTable()
public Runnable getPreExecuteRunnable()
Copyright © 2022 The Apache Software Foundation. All rights reserved.