public class ParquetUtils extends FileFormatUtils
Nested classes/interfaces inherited from class FileFormatUtils: FileFormatUtils.HoodieKeyIterator

| Constructor and Description |
|---|
| ParquetUtils() |

| Modifier and Type | Method and Description |
|---|---|
| ClosableIterator<Pair<HoodieKey,Long>> | fetchRecordKeysWithPositions(HoodieStorage storage, StoragePath filePath) Fetch HoodieKeys with row positions from the given parquet file. |
| ClosableIterator<Pair<HoodieKey,Long>> | fetchRecordKeysWithPositions(HoodieStorage storage, StoragePath filePath, Option<BaseKeyGenerator> keyGeneratorOpt, Option<String> partitionPath) Fetch HoodieKeys with row positions from the given parquet file. |
| Set<Pair<String,Long>> | filterRowKeys(HoodieStorage storage, StoragePath filePath, Set<String> filter) Read the rowKey list matching the given filter, from the given parquet file. |
| static org.apache.parquet.hadoop.metadata.CompressionCodecName | getCompressionCodecName(String codecName) |
| HoodieFileFormat | getFormat() |
| ClosableIterator<HoodieKey> | getHoodieKeyIterator(HoodieStorage storage, StoragePath filePath) |
| ClosableIterator<HoodieKey> | getHoodieKeyIterator(HoodieStorage storage, StoragePath filePath, Option<BaseKeyGenerator> keyGeneratorOpt, Option<String> partitionPath) Returns a closable iterator for reading the given parquet file. |
| long | getRowCount(HoodieStorage storage, StoragePath filePath) Returns the number of records in the parquet file. |
| List<org.apache.avro.generic.GenericRecord> | readAvroRecords(HoodieStorage storage, StoragePath filePath) NOTE: This literally reads the entire file contents, thus should be used with caution. |
| List<org.apache.avro.generic.GenericRecord> | readAvroRecords(HoodieStorage storage, StoragePath filePath, org.apache.avro.Schema schema) |
| org.apache.avro.Schema | readAvroSchema(HoodieStorage storage, StoragePath filePath) |
| List<HoodieColumnRangeMetadata<Comparable>> | readColumnStatsFromMetadata(HoodieStorage storage, StoragePath filePath, List<String> columnList) |
| Map<String,String> | readFooter(HoodieStorage storage, boolean required, StoragePath filePath, String... footerNames) |
| static org.apache.parquet.hadoop.metadata.ParquetMetadata | readMetadata(HoodieStorage storage, StoragePath parquetFilePath) |
| org.apache.parquet.schema.MessageType | readSchema(HoodieStorage storage, StoragePath parquetFilePath) Get the schema of the given parquet file. |
| ByteArrayOutputStream | serializeRecordsToLogBlock(HoodieStorage storage, List<HoodieRecord> records, org.apache.avro.Schema writerSchema, org.apache.avro.Schema readerSchema, String keyFieldName, Map<String,String> paramsMap) |
| void | writeMetaFile(HoodieStorage storage, StoragePath filePath, Properties props) |
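Several of the methods above return a ClosableIterator, which must be closed to release the underlying parquet reader even if iteration stops early. A minimal sketch of the usual consumption pattern, using a hypothetical stand-in type (the Hudi and Parquet classes are not assumed on the classpath here):

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Hudi's ClosableIterator: an Iterator that is
// also AutoCloseable, so it works with try-with-resources.
interface SimpleClosableIterator<T> extends Iterator<T>, AutoCloseable {
    @Override
    void close(); // narrowed signature: no checked exception
}

public class KeyIteratorDemo {
    // Wrap a list in a closable iterator, recording whether close() ran.
    static SimpleClosableIterator<String> over(List<String> keys, boolean[] closed) {
        Iterator<String> it = keys.iterator();
        return new SimpleClosableIterator<String>() {
            public boolean hasNext() { return it.hasNext(); }
            public String next() { return it.next(); }
            public void close() { closed[0] = true; }
        };
    }

    // Drain the iterator the way callers typically consume
    // getHoodieKeyIterator(...): try-with-resources guarantees close().
    static long countKeys(List<String> keys, boolean[] closed) {
        long n = 0;
        try (SimpleClosableIterator<String> it = over(keys, closed)) {
            while (it.hasNext()) {
                it.next();
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        boolean[] closed = {false};
        long n = countKeys(List.of("key1", "key2", "key3"), closed);
        System.out.println(n + " keys, closed=" + closed[0]);
    }
}
```

The same pattern applies to fetchRecordKeysWithPositions: forgetting close() leaks the open file handle behind the real iterator.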
Methods inherited from class FileFormatUtils: getColumnRangeInPartition, readBloomFilterFromMetadata, readMinMaxRecordKeys, readRowKeys

filterRowKeys

public Set<Pair<String,Long>> filterRowKeys(HoodieStorage storage, StoragePath filePath, Set<String> filter)

Read the rowKey list matching the given filter, from the given parquet file.
Specified by: filterRowKeys in class FileFormatUtils
Parameters:
storage - HoodieStorage instance.
filePath - The parquet file path.
filter - record keys filter

readMetadata

public static org.apache.parquet.hadoop.metadata.ParquetMetadata readMetadata(HoodieStorage storage, StoragePath parquetFilePath)
getCompressionCodecName

public static org.apache.parquet.hadoop.metadata.CompressionCodecName getCompressionCodecName(String codecName)

Parameters:
codecName - codec name in String.
Returns: CompressionCodecName Enum.

fetchRecordKeysWithPositions

public ClosableIterator<Pair<HoodieKey,Long>> fetchRecordKeysWithPositions(HoodieStorage storage, StoragePath filePath)

Fetch HoodieKeys with row positions from the given parquet file.
Specified by: fetchRecordKeysWithPositions in class FileFormatUtils
Parameters:
storage - HoodieStorage instance.
filePath - The parquet file path.
Returns: List of pairs of HoodieKey and row position fetched from the parquet file

getHoodieKeyIterator

public ClosableIterator<HoodieKey> getHoodieKeyIterator(HoodieStorage storage, StoragePath filePath)

Specified by: getHoodieKeyIterator in class FileFormatUtils

getHoodieKeyIterator

public ClosableIterator<HoodieKey> getHoodieKeyIterator(HoodieStorage storage, StoragePath filePath, Option<BaseKeyGenerator> keyGeneratorOpt, Option<String> partitionPath)

Returns a closable iterator for reading the given parquet file.
Specified by: getHoodieKeyIterator in class FileFormatUtils
Parameters:
storage - HoodieStorage instance.
filePath - The parquet file path
keyGeneratorOpt - instance of KeyGenerator
partitionPath - optional partition path for the file, if provided only the record key is read from the file
Returns: ClosableIterator of HoodieKeys for reading the parquet file

fetchRecordKeysWithPositions

public ClosableIterator<Pair<HoodieKey,Long>> fetchRecordKeysWithPositions(HoodieStorage storage, StoragePath filePath, Option<BaseKeyGenerator> keyGeneratorOpt, Option<String> partitionPath)

Fetch HoodieKeys with row positions from the given parquet file.
Specified by: fetchRecordKeysWithPositions in class FileFormatUtils
Parameters:
storage - HoodieStorage instance.
filePath - The parquet file path.
keyGeneratorOpt - instance of KeyGenerator.
partitionPath - optional partition path for the file, if provided only the record key is read from the file
Returns: List of pairs of HoodieKey and row position fetched from the parquet file

readSchema

public org.apache.parquet.schema.MessageType readSchema(HoodieStorage storage, StoragePath parquetFilePath)

Get the schema of the given parquet file.
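The fetchRecordKeysWithPositions variants pair each record key with its row position in the file, and callers typically drain the iterator into a key-to-position index. A sketch of that collection step, with plain JDK types standing in for Hudi's HoodieKey and Pair (which are not assumed here):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PositionIndexDemo {
    // Stand-in for draining ClosableIterator<Pair<HoodieKey,Long>>:
    // each entry maps a record key to its row position in the parquet file.
    static Map<String, Long> buildIndex(List<SimpleEntry<String, Long>> pairs) {
        Map<String, Long> index = new LinkedHashMap<>(); // preserve file order
        for (SimpleEntry<String, Long> p : pairs) {
            index.put(p.getKey(), p.getValue());
        }
        return index;
    }

    public static void main(String[] args) {
        List<SimpleEntry<String, Long>> pairs = List.of(
            new SimpleEntry<>("key-a", 0L),
            new SimpleEntry<>("key-b", 1L));
        // The resulting index answers "at which row does this key live?"
        System.out.println(buildIndex(pairs));
    }
}
```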
readFooter

public Map<String,String> readFooter(HoodieStorage storage, boolean required, StoragePath filePath, String... footerNames)

Specified by: readFooter in class FileFormatUtils

readAvroSchema

public org.apache.avro.Schema readAvroSchema(HoodieStorage storage, StoragePath filePath)

Specified by: readAvroSchema in class FileFormatUtils

readColumnStatsFromMetadata

public List<HoodieColumnRangeMetadata<Comparable>> readColumnStatsFromMetadata(HoodieStorage storage, StoragePath filePath, List<String> columnList)

Specified by: readColumnStatsFromMetadata in class FileFormatUtils

getFormat

public HoodieFileFormat getFormat()

Specified by: getFormat in class FileFormatUtils

readAvroRecords

public List<org.apache.avro.generic.GenericRecord> readAvroRecords(HoodieStorage storage, StoragePath filePath)

NOTE: This literally reads the entire file contents, thus should be used with caution.
Specified by: readAvroRecords in class FileFormatUtils

readAvroRecords

public List<org.apache.avro.generic.GenericRecord> readAvroRecords(HoodieStorage storage, StoragePath filePath, org.apache.avro.Schema schema)

Specified by: readAvroRecords in class FileFormatUtils

getRowCount

public long getRowCount(HoodieStorage storage, StoragePath filePath)

Returns the number of records in the parquet file.
Specified by: getRowCount in class FileFormatUtils
Parameters:
storage - HoodieStorage instance.
filePath - path of the file

writeMetaFile

public void writeMetaFile(HoodieStorage storage, StoragePath filePath, Properties props) throws IOException

Specified by: writeMetaFile in class FileFormatUtils
Throws: IOException

serializeRecordsToLogBlock

public ByteArrayOutputStream serializeRecordsToLogBlock(HoodieStorage storage, List<HoodieRecord> records, org.apache.avro.Schema writerSchema, org.apache.avro.Schema readerSchema, String keyFieldName, Map<String,String> paramsMap) throws IOException

Specified by: serializeRecordsToLogBlock in class FileFormatUtils
Throws: IOException

Copyright © 2025 The Apache Software Foundation. All rights reserved.