Class SparkDeserializerFactory

java.lang.Object
com.linkedin.feathr.common.featurizeddataset.SparkDeserializerFactory

public final class SparkDeserializerFactory extends Object
A converter from FDS (featurized dataset) Spark objects to TensorData. Instead of copying the data into new memory, it creates a thin wrapper (view) around the Spark structure. This reduces GC pressure, but some operations (e.g., cardinality or isEmpty) are more expensive than usual. To work with individual feature values, use getFeatureDeserializer(com.linkedin.feathr.common.tensor.TensorType). NOTE: this class operates directly on scala.collection.Seq values coming from a Spark DataFrame. An alternative would be to convert those values to java.util.List, but that was not chosen given the performance requirements of this class.
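The view-versus-copy trade-off described above can be illustrated with a minimal sketch. It is not Feathr code: ViewSketch and RowsView are hypothetical names, and a plain java.util.List stands in for the Spark structure. It only shows why a wrapper avoids allocation while making cardinality and isEmpty scan the backing data on each call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Feathr.
final class ViewSketch {

    /** Read-only view over a list of rows; the backing list is never copied. */
    static final class RowsView {
        private final List<List<Float>> rows;

        RowsView(List<List<Float>> rows) {
            this.rows = rows; // keep a reference, not a copy
        }

        /** Walks every row on each call, so the cost grows with the number of rows. */
        int cardinality() {
            int count = 0;
            for (List<Float> row : rows) {
                count += row.size();
            }
            return count;
        }

        /** Also scans, stopping at the first non-empty row. */
        boolean isEmpty() {
            for (List<Float> row : rows) {
                if (!row.isEmpty()) {
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        List<List<Float>> backing = new ArrayList<>();
        backing.add(new ArrayList<>(Arrays.asList(1.0f, 2.0f)));
        backing.add(new ArrayList<>());

        RowsView view = new RowsView(backing);
        System.out.println(view.cardinality()); // counts the two stored values

        backing.get(1).add(3.0f);               // change the backing data
        System.out.println(view.cardinality()); // the view sees the new value
    }
}
```

A copying converter would pay the allocation cost once and could then answer cardinality from a stored count; the view defers that cost to each query, which is the trade-off the class description refers to.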
  • Method Details

    • getFeatureDeserializer

      public static FeatureDeserializer getFeatureDeserializer(TensorType tensorType)
      Creates a converter from a Spark feature value to TensorData.
      Parameters:
      tensorType - the type of the converted values
      Returns:
      a converter from a Spark feature value to a TensorData of type tensorType