public class ParquetReaderUtility extends Object
| Modifier and Type | Class and Description |
|---|---|
| static class | ParquetReaderUtility.DateCorruptionStatus: For most recently created Parquet files, we can determine from the file metadata whether the dates are corrupted (see DRILL-4203). |
| static class | ParquetReaderUtility.NanoTimeUtils: Utilities for converting Parquet INT96 binary values (Impala and Hive timestamps) to date-time values. |
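To illustrate the kind of conversion NanoTimeUtils performs, here is a minimal sketch that decodes a 12-byte INT96 timestamp into milliseconds since the Unix epoch. It assumes the commonly used Impala/Hive layout (bytes 0-7: nanoseconds within the day as a little-endian long; bytes 8-11: Julian day number as a little-endian int). The class and method names are hypothetical and are not the NanoTimeUtils API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical INT96 decoder; not Drill's NanoTimeUtils. */
public class Int96TimestampSketch {
    // Julian day number of 1970-01-01 (same value as JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH).
    static final long JULIAN_DAY_OF_UNIX_EPOCH = 2440588L;
    static final long MILLIS_PER_DAY = 86_400_000L;

    /** Assumes the Impala/Hive layout: 8 bytes nanos-of-day, then 4 bytes Julian day, both little-endian. */
    static long int96ToEpochMillis(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 value must be 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong(); // bytes 0-7
        int julianDay = buf.getInt();    // bytes 8-11
        // Whole days since the Unix epoch, plus the time of day truncated to milliseconds.
        return (julianDay - JULIAN_DAY_OF_UNIX_EPOCH) * MILLIS_PER_DAY + nanosOfDay / 1_000_000L;
    }

    public static void main(String[] args) {
        // 1970-01-02T00:00:01Z: Julian day 2440589, one second into the day.
        byte[] value = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(1_000_000_000L)
                .putInt(2440589)
                .array();
        System.out.println(int96ToEpochMillis(value)); // 86401000
    }
}
```

Truncating to milliseconds mirrors the precision of typical date-time values; a reader that needs full nanosecond precision would keep `nanosOfDay` separately.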
| Modifier and Type | Field and Description |
|---|---|
| static long | CORRECT_CORRUPT_DATE_SHIFT: All old Parquet files (those whose metadata contains neither the "is.date.correct=true" nor the "parquet-writer.version" property) have a corrupt date shift of 4881176L days, i.e. 2 * 2440588L. |
| static int | DATE_CORRUPTION_THRESHOLD: The year 5000 (1106685 days after the Unix epoch) is chosen as the threshold for auto-detecting date corruption. |
| static int | DRILL_WRITER_VERSION_STD_DATE_FORMAT: Version 2 (and later) of the Drill Parquet writer uses the date format described in the Parquet spec. |
| static long | JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH: Number of days between the Julian day epoch (January 1, 4713 BC) and the Unix epoch (January 1, 1970). |
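The numeric values quoted in the field descriptions can be cross-checked with `java.time`. This sketch is not Drill code; it simply recomputes the Julian day number of 1970-01-01, doubles it to obtain the documented corrupt-date shift, and counts the days from the Unix epoch to January 1, 5000.

```java
import java.time.LocalDate;
import java.time.temporal.JulianFields;

/** Recomputes the documented date constants with java.time (a cross-check, not Drill code). */
public class DateConstantsCheck {

    /** Julian day number of 1970-01-01, i.e. JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH. */
    static long julianDayOfUnixEpoch() {
        return LocalDate.of(1970, 1, 1).getLong(JulianFields.JULIAN_DAY);
    }

    /** Documented as 2 * JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH. */
    static long corruptDateShift() {
        return 2 * julianDayOfUnixEpoch();
    }

    /** Days from the Unix epoch to 5000-01-01, i.e. DATE_CORRUPTION_THRESHOLD. */
    static long corruptionThreshold() {
        return LocalDate.of(5000, 1, 1).toEpochDay();
    }

    public static void main(String[] args) {
        System.out.println(julianDayOfUnixEpoch()); // 2440588
        System.out.println(corruptDateShift());     // 4881176
        System.out.println(corruptionThreshold());  // 1106685
    }
}
```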
| Constructor and Description |
|---|
| ParquetReaderUtility() |
| Modifier and Type | Method and Description |
|---|---|
| static int | autoCorrectCorruptedDate(int corruptedDate) |
| static void | checkDecimalTypeEnabled(OptionManager options) |
| static ParquetReaderUtility.DateCorruptionStatus | checkForCorruptDateValuesInStatistics(org.apache.parquet.hadoop.metadata.ParquetMetadata footer, List<SchemaPath> columns, boolean autoCorrectCorruptDates): Detect corrupt date values by looking at the min/max values in the metadata. |
| static void | correctBinaryInMetadataCache(Metadata.ParquetTableMetadataBase parquetTableMetadata): Assigns byte arrays to the min/max values obtained from the deserialized strings for BINARY columns. |
| static void | correctDatesInMetadataCache(Metadata.ParquetTableMetadataBase parquetTableMetadata) |
| static ParquetReaderUtility.DateCorruptionStatus | detectCorruptDates(org.apache.parquet.hadoop.metadata.ParquetMetadata footer, List<SchemaPath> columns, boolean autoCorrectCorruptDates): Check for corrupted dates in a Parquet file. |
| static Map<String,org.apache.parquet.format.SchemaElement> | getColNameToSchemaElementMapping(org.apache.parquet.hadoop.metadata.ParquetMetadata footer) |
| static int | getIntFromLEBytes(byte[] input, int start) |
public static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH
public static final long CORRECT_CORRUPT_DATE_SHIFT
public static final int DATE_CORRUPTION_THRESHOLD
public static final int DRILL_WRITER_VERSION_STD_DATE_FORMAT
See Also: CORRECT_CORRUPT_DATE_SHIFT
public static void checkDecimalTypeEnabled(OptionManager options)
public static int getIntFromLEBytes(byte[] input,
int start)
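As an illustration of what reading an int from little-endian bytes involves, here is a minimal sketch. It assumes getIntFromLEBytes reads exactly four bytes beginning at `start`, with the least significant byte first; the method name `intFromLEBytes` is hypothetical and this is not the Drill source.

```java
/** Hypothetical little-endian int reader; assumes four bytes starting at 'start'. */
public class LittleEndianSketch {

    static int intFromLEBytes(byte[] input, int start) {
        int out = 0;
        for (int i = 0; i < 4; i++) {
            // Mask to 0..255 so the sign bit of a negative byte does not spread into higher bits.
            out |= (input[start + i] & 0xFF) << (8 * i);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {0x7F, 0x01, 0x02, 0x03, 0x04};
        System.out.println(intFromLEBytes(data, 1)); // 67305985 (0x04030201)
    }
}
```

The result matches `ByteBuffer.wrap(data, start, 4).order(ByteOrder.LITTLE_ENDIAN).getInt()`; the explicit loop just makes the byte order visible. An input with fewer than `start + 4` bytes throws `ArrayIndexOutOfBoundsException`.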
public static Map<String,org.apache.parquet.format.SchemaElement> getColNameToSchemaElementMapping(org.apache.parquet.hadoop.metadata.ParquetMetadata footer)
public static int autoCorrectCorruptedDate(int corruptedDate)
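A minimal sketch of the correction, assuming autoCorrectCorruptedDate simply subtracts CORRECT_CORRUPT_DATE_SHIFT from a days-since-epoch value (an assumption based on the field descriptions, not the Drill source). The example corrupts a known date by adding the shift, shows that the result lies beyond the year-5000 threshold, and then restores it.

```java
import java.time.LocalDate;

/** Hypothetical re-implementation of the documented shift correction. */
public class AutoCorrectSketch {
    static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588L;
    static final long CORRECT_CORRUPT_DATE_SHIFT = 2 * JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH;
    static final int DATE_CORRUPTION_THRESHOLD = 1_106_685; // days from 1970-01-01 to 5000-01-01

    /** Assumed behavior: undo the shift on a days-since-epoch value. */
    static int autoCorrectCorruptedDate(int corruptedDate) {
        return (int) (corruptedDate - CORRECT_CORRUPT_DATE_SHIFT);
    }

    public static void main(String[] args) {
        int correct = (int) LocalDate.of(2017, 3, 15).toEpochDay();
        // An old writer would have stored the value shifted forward by CORRECT_CORRUPT_DATE_SHIFT days.
        int corrupted = (int) (correct + CORRECT_CORRUPT_DATE_SHIFT);
        System.out.println(corrupted > DATE_CORRUPTION_THRESHOLD); // true
        System.out.println(LocalDate.ofEpochDay(autoCorrectCorruptedDate(corrupted))); // 2017-03-15
    }
}
```

Because the shift is roughly 13,000 years, any realistic corrupted date lands far above the threshold, which is what makes the statistics-based detection workable.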
public static void correctDatesInMetadataCache(Metadata.ParquetTableMetadataBase parquetTableMetadata)
public static void correctBinaryInMetadataCache(Metadata.ParquetTableMetadataBase parquetTableMetadata)
Parameters:
parquetTableMetadata - table metadata that should be corrected
public static ParquetReaderUtility.DateCorruptionStatus detectCorruptDates(org.apache.parquet.hadoop.metadata.ParquetMetadata footer, List<SchemaPath> columns, boolean autoCorrectCorruptDates)
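The overall decision can be sketched as follows: trust explicit metadata first, then fall back to the min/max statistics only when the metadata is inconclusive and auto-correction is enabled. Everything here is hypothetical: the `Status` constants stand in for DateCorruptionStatus, the metadata keys are taken from the property names quoted in the field descriptions, and the statistics are passed as plain integers instead of a Parquet footer.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical outline of corrupt-date detection; not the Drill implementation. */
public class CorruptDateDetectionSketch {

    // Hypothetical statuses; the real DateCorruptionStatus constants may differ.
    enum Status { NO_CORRUPTION, CORRUPTION_DETECTED, UNCLEAR }

    static final int DATE_CORRUPTION_THRESHOLD = 1_106_685; // days from 1970-01-01 to 5000-01-01
    static final int DRILL_WRITER_VERSION_STD_DATE_FORMAT = 2;

    // Assumed metadata keys, based on the property names quoted on this page.
    static final String DATE_CORRECT_KEY = "is.date.correct";
    static final String WRITER_VERSION_KEY = "parquet-writer.version";

    /**
     * @param keyValueMetadata        file-level key/value metadata
     * @param dateMinMaxValues        min/max statistics (days since epoch) of DATE columns in the first row group
     * @param autoCorrectCorruptDates whether the statistics heuristic may be applied
     */
    static Status detect(Map<String, String> keyValueMetadata,
                         List<Integer> dateMinMaxValues,
                         boolean autoCorrectCorruptDates) {
        // 1. Explicit metadata says the dates are correct.
        if ("true".equals(keyValueMetadata.get(DATE_CORRECT_KEY))) {
            return Status.NO_CORRUPTION;
        }
        // 2. Writer version 2 or later uses the Parquet-spec date format.
        String version = keyValueMetadata.get(WRITER_VERSION_KEY);
        if (version != null && Integer.parseInt(version) >= DRILL_WRITER_VERSION_STD_DATE_FORMAT) {
            return Status.NO_CORRUPTION;
        }
        // 3. Metadata is inconclusive; without auto-correction or statistics, leave values alone.
        if (!autoCorrectCorruptDates || dateMinMaxValues.isEmpty()) {
            return Status.UNCLEAR;
        }
        // 4. Any statistic beyond the year 5000 is taken as evidence of the old shift.
        for (int days : dateMinMaxValues) {
            if (days > DATE_CORRUPTION_THRESHOLD) {
                return Status.CORRUPTION_DETECTED;
            }
        }
        return Status.NO_CORRUPTION;
    }

    public static void main(String[] args) {
        System.out.println(detect(Map.of(DATE_CORRECT_KEY, "true"), List.of(), true)); // NO_CORRUPTION
        System.out.println(detect(Map.of(), List.of(17_167, 4_898_343), true));        // CORRUPTION_DETECTED
        System.out.println(detect(Map.of(), List.of(17_167), true));                   // NO_CORRUPTION
    }
}
```

Returning `UNCLEAR` when auto-correction is disabled reflects the parameter description's warning that the heuristic can "correct" legitimate far-future dates into bad values; the caller would then leave the stored values untouched.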
public static ParquetReaderUtility.DateCorruptionStatus checkForCorruptDateValuesInStatistics(org.apache.parquet.hadoop.metadata.ParquetMetadata footer, List<SchemaPath> columns, boolean autoCorrectCorruptDates)
Detect corrupt date values by looking at the min/max values in the metadata. This should only be used when a file does not have enough metadata to determine whether the data was written by an external tool or by an older version of Drill (ParquetRecordWriter.WRITER_VERSION_PROPERTY < DRILL_WRITER_VERSION_STD_DATE_FORMAT).
This method only checks the first row group, because Drill has only ever written a single row group per file.
Parameters:
footer - the Parquet file footer to inspect
columns - the columns to check
autoCorrectCorruptDates - user setting that enables or disables auto-correction of corrupt dates. In some rare cases (dates stored thousands of years in the future by tools other than Drill), auto-correction would "correct" valid date values into bad ones.
Copyright © 2017 The Apache Software Foundation. All rights reserved.