package diffy
Type Members
- class AvroDiffy[T <: GenericRecord] extends Diffy[T]
Field level diff tool for Avro records.
- class BigDiffy[T] extends Serializable
Big diff between two data sets given a primary key.
- case class Delta(field: String, left: Option[Any], right: Option[Any], delta: DeltaValue) extends Product with Serializable
Delta of a single field between two records.
- field
"." separated field identifier
- left
Option(left hand side value), None if null
- right
Option(right hand side value), None if null
- delta
delta of numerical values
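As an illustration of "." separated field identifiers and the `Option(value), None if null` convention, here is a minimal sketch that diffs two nested maps. It does not use the library: `FieldDeltaSketch`, `FieldDelta`, and `diff` are hypothetical names, records are modelled as `Map[String, Any]`, and the numeric `delta` component is left out.

```scala
object FieldDeltaSketch {
  // Hypothetical stand-in for Delta, without the numeric delta value.
  final case class FieldDelta(field: String, left: Option[Any], right: Option[Any])

  /** Recursively compare two nested maps and report every leaf field that differs. */
  def diff(left: Map[String, Any], right: Map[String, Any], prefix: String = ""): Seq[FieldDelta] =
    (left.keySet ++ right.keySet).toSeq.sorted.flatMap { name =>
      // Build the "." separated path, e.g. "user.name".
      val path = if (prefix.isEmpty) name else s"$prefix.$name"
      // Option(v) maps a null value to None, matching "None if null".
      val l = left.get(name).flatMap(v => Option(v))
      val r = right.get(name).flatMap(v => Option(v))
      (l, r) match {
        case (Some(lm: Map[_, _]), Some(rm: Map[_, _])) =>
          // Both sides are nested records: descend with the extended path.
          diff(lm.asInstanceOf[Map[String, Any]], rm.asInstanceOf[Map[String, Any]], path)
        case _ if l == r => Seq.empty
        case _           => Seq(FieldDelta(path, l, r))
      }
    }
}
```

Under these assumptions, two records that differ only in a nested `name` field under `user` yield a single `FieldDelta` whose `field` is `"user.name"`.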
- case class DeltaStats(deltaType: DeltaType.Value, min: Double, max: Double, count: Long, mean: Double, variance: Double, stddev: Double, skewness: Double, kurtosis: Double) extends Product with Serializable
Delta level statistics, mean, and the four standardized moments.
- deltaType
one of NUMERIC, STRING, VECTOR
- min
minimum distance seen
- max
maximum distance seen
- count
number of differences seen
- mean
mean of all differences
- variance
squared deviation from the mean
- stddev
standard deviation from the mean
- skewness
measure of data asymmetry in all deltas
- kurtosis
measure of distribution sharpness and tail thickness in deltas
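To make the statistics above concrete, the following sketch computes them from a list of delta values using population (divide by n) moments and plain, non-excess kurtosis. These conventions are assumptions: the listing does not say whether the library uses sample or population variance, or plain or excess kurtosis. `MomentsSketch` and `Moments` are hypothetical names.

```scala
object MomentsSketch {
  // Hypothetical stand-in for the numeric fields of DeltaStats.
  final case class Moments(min: Double, max: Double, count: Long, mean: Double,
                           variance: Double, stddev: Double, skewness: Double, kurtosis: Double)

  def moments(deltas: Seq[Double]): Moments = {
    require(deltas.nonEmpty, "at least one delta is required")
    val n = deltas.length.toDouble
    val mean = deltas.sum / n
    // k-th central moment: average of (d - mean)^k over all deltas.
    def central(k: Int): Double = deltas.map(d => math.pow(d - mean, k)).sum / n
    val variance = central(2)
    val stddev = math.sqrt(variance)
    Moments(
      min = deltas.reduce((a, b) => math.min(a, b)),
      max = deltas.reduce((a, b) => math.max(a, b)),
      count = deltas.length.toLong,
      mean = mean,
      variance = variance,
      stddev = stddev,
      // Standardized third and fourth moments; both are NaN when all deltas are equal.
      skewness = central(3) / math.pow(stddev, 3),
      kurtosis = central(4) / (variance * variance)
    )
  }
}
```

A symmetric input such as 1, 2, 3, 4, 5 has zero skewness under this definition, which is a quick sanity check for the sign conventions.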
- sealed trait DeltaValue extends AnyRef
Delta value of a single node between two records.
- abstract class Diffy[T] extends Serializable
Field level diff tool.
- case class FieldStats(field: String, count: Long, fraction: Double, deltaStats: Option[DeltaStats]) extends Product with Serializable
Field level statistics.
- field
"." separated field identifier
- count
number of records with different values of the given field
- fraction
fraction over total number of keys with different records on both sides
- deltaStats
statistics of field value deltas
- case class GlobalStats(numTotal: Long, numSame: Long, numDiff: Long, numMissingLhs: Long, numMissingRhs: Long) extends Product with Serializable
Global level statistics.
- numTotal
number of total unique keys
- numSame
number of keys with same records on both sides
- numDiff
number of keys with different records on both sides
- numMissingLhs
number of keys with missing left hand side record
- numMissingRhs
number of keys with missing right hand side record
- case class KeyStats(keys: MultiKey, diffType: DiffType.Value, delta: Option[Delta]) extends Product with Serializable
Key-field level DiffType and delta.
If the DiffType is SAME, MISSING_LHS, or MISSING_RHS, the key appears once with no Delta. If the DiffType is DIFFERENT, there is one KeyStats for every field that differs for that key, each carrying that field's Delta.
- keys
primary key being compared
- diffType
how the two records of the given key compare
- delta
a single field's difference, including field name, values, and distance
- final case class MultiKey(keys: Seq[String]) extends AnyVal with Product with Serializable
- sealed trait OutputMode extends AnyRef
- class ProtoBufDiffy[T <: AbstractMessage] extends Diffy[T]
Field level diff tool for ProtoBuf records.
- class TableRowDiffy extends Diffy[TableRow]
Field level diff tool for TableRow records.
- case class TypedDelta(deltaType: DeltaType.Value, value: Double) extends DeltaValue with Product with Serializable
Delta value with a known type and computed difference.
Value Members
- case object BQ extends OutputMode with Product with Serializable
- object BigDiffy extends Command with Serializable
Big diff between two data sets given a primary key.
- object CosineDistance
Compute cosine distance between two vectors.
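For reference, cosine distance is commonly defined as 1.0 minus cosine similarity, which matches the VECTOR default described under DeltaType. The sketch below is an assumption about the shape of such a function, not the library's implementation; `CosineSketch.distance` is a hypothetical name and signature.

```scala
object CosineSketch {
  /** 1.0 - cosine similarity of two equal-length vectors. */
  def distance(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.length == y.length, "vectors must have the same length")
    val dot = x.zip(y).map { case (a, b) => a * b }.sum
    val normX = math.sqrt(x.map(a => a * a).sum)
    val normY = math.sqrt(y.map(b => b * b).sum)
    // A zero vector has no direction, so the result is NaN in that case.
    1.0 - dot / (normX * normY)
  }
}
```

Orthogonal vectors give a distance of 1.0, and vectors pointing in the same direction give a distance close to 0.0 regardless of their magnitudes.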
- object DeltaType extends Enumeration
Delta type of a single node between two records.
- UNKNOWN
unknown type, no numeric delta is computed
- NUMERIC
numeric type, e.g. Long, Double, default delta is numeric difference
- STRING
string type, default delta is Levenshtein edit distance
- VECTOR
repeated numeric type, default delta is 1.0 - cosine similarity
- object DiffType extends Enumeration
Diff type between two records of the same key.
- SAME
the two records are identical
- DIFFERENT
the two records are different
- MISSING_LHS
left hand side record is missing
- MISSING_RHS
right hand side record is missing
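The four diff types, and the GlobalStats counters derived from them, can be illustrated on small in-memory maps. This is a sketch under stated assumptions: the library works on distributed data sets, whereas here both sides are plain `Map`s keyed by primary key, and `GlobalStatsSketch`, `Diff`, `Global`, `classify`, and `summarize` are hypothetical names.

```scala
object GlobalStatsSketch {
  // Hypothetical stand-ins for the DiffType values.
  sealed trait Diff
  case object Same extends Diff
  case object Different extends Diff
  case object MissingLhs extends Diff
  case object MissingRhs extends Diff

  // Hypothetical stand-in for GlobalStats.
  final case class Global(numTotal: Long, numSame: Long, numDiff: Long,
                          numMissingLhs: Long, numMissingRhs: Long)

  /** Classify every key that appears on either side. */
  def classify[K, V](lhs: Map[K, V], rhs: Map[K, V]): Map[K, Diff] =
    (lhs.keySet ++ rhs.keySet).iterator.map { key =>
      val diff: Diff = (lhs.get(key), rhs.get(key)) match {
        case (Some(l), Some(r)) if l == r => Same
        case (Some(_), Some(_))           => Different
        case (None, _)                    => MissingLhs
        case (_, None)                    => MissingRhs
      }
      key -> diff
    }.toMap

  /** Count keys per diff type; numTotal is the number of unique keys. */
  def summarize(diffs: Iterable[Diff]): Global = {
    def count(d: Diff): Long = diffs.count(_ == d).toLong
    Global(diffs.size.toLong, count(Same), count(Different), count(MissingLhs), count(MissingRhs))
  }
}
```

With this shape, `numSame + numDiff + numMissingLhs + numMissingRhs` always equals `numTotal`, since every unique key receives exactly one diff type.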
- case object GCS extends OutputMode with Product with Serializable
- object Levenshtein
Compute Levenshtein edit distance between two strings. https://rosettacode.org/wiki/Levenshtein_distance#Scala
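For readers unfamiliar with the metric, here is a minimal sketch of the standard dynamic-programming formulation of Levenshtein distance, keeping only two rows of the table. It is not taken from the library or from the linked page; `LevenshteinSketch.distance` is a hypothetical name and signature.

```scala
object LevenshteinSketch {
  /** Minimum number of single-character insertions, deletions, or substitutions turning `a` into `b`. */
  def distance(a: String, b: String): Int = {
    // prev(j) is the distance between the already processed prefix of `a` and the first j chars of `b`.
    var prev = Array.range(0, b.length + 1)
    for (i <- 1 to a.length) {
      val curr = new Array[Int](b.length + 1)
      curr(0) = i // deleting the first i characters of `a`
      for (j <- 1 to b.length) {
        val substitution = prev(j - 1) + (if (a(i - 1) == b(j - 1)) 0 else 1)
        val insertion = curr(j - 1) + 1
        val deletion = prev(j) + 1
        curr(j) = math.min(substitution, math.min(insertion, deletion))
      }
      prev = curr
    }
    prev(b.length)
  }
}
```

This version uses memory proportional to the length of `b` and time proportional to the product of the two lengths.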
- object MultiKey extends Serializable
- object NumericDelta
Companion objects for TypedDelta.
- object StringDelta
- case object UnknownDelta extends DeltaValue with Product with Serializable
Delta value of unknown type.
- object VectorDelta