Class FeatureDependencyGraph

java.lang.Object
com.linkedin.feathr.common.FeatureDependencyGraph

@InternalApi public class FeatureDependencyGraph extends Object
A dependency graph for feature anchors and feature derivations. It serves two purposes.
Purpose 1: Given a list of features' dependencies and the set of anchored features, build a graph that can determine which features are reachable and can resolve features' transitive dependencies on demand.
Purpose 2: Given a list of features with entity key bindings (a.k.a. key tags), provide a fully expanded list of all transitive dependencies, including their entity key bindings. For example, if feature2 depends on feature1 and the request is for [ (a1):feature2, (a2):feature2 ], the expanded dependencies are: [ (a1):feature2, (a2):feature2, (a1):feature1, (a2):feature1 ]
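The Purpose 2 expansion can be pictured with a small, self-contained sketch. It does not use the feathr API: KeyedFeature, KeyedDependency and KeyTagExpansionSketch are hypothetical types, and the sketch assumes that each dependency edge records, for every key slot of the input feature, which of the parent's keys is bound to it. Under that assumption, a breadth-first walk from the requested features reproduces the expansion in the example above.

```java
import java.util.*;

// Hypothetical, simplified types for illustration; these are NOT the feathr classes.
// A requested feature bound to a list of entity keys (its key tags), e.g. (a1):feature2.
record KeyedFeature(List<String> keys, String feature) {
    @Override
    public String toString() {
        return "(" + String.join(",", keys) + "):" + feature;
    }
}

// Assumed edge model: the input feature name plus, for each of the input's key slots,
// the index of the parent's key that is bound to it.
record KeyedDependency(String feature, int[] keyIndices) {}

class KeyTagExpansionSketch {
    // Breadth-first expansion of a request into all transitive dependencies,
    // carrying entity key bindings from each feature down to its inputs.
    static List<KeyedFeature> expand(Map<String, List<KeyedDependency>> deps,
                                     List<KeyedFeature> request) {
        Set<KeyedFeature> seen = new LinkedHashSet<>(); // keeps first-visit order, drops duplicates
        Deque<KeyedFeature> work = new ArrayDeque<>(request);
        while (!work.isEmpty()) {
            KeyedFeature current = work.removeFirst();
            if (!seen.add(current)) {
                continue; // already expanded
            }
            for (KeyedDependency d : deps.getOrDefault(current.feature(), List.of())) {
                List<String> boundKeys = new ArrayList<>();
                for (int i : d.keyIndices()) {
                    boundKeys.add(current.keys().get(i)); // re-bind the parent's key to the input
                }
                work.addLast(new KeyedFeature(boundKeys, d.feature()));
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // feature2 depends on feature1, which uses the same (single) key as feature2.
        Map<String, List<KeyedDependency>> deps =
                Map.of("feature2", List.of(new KeyedDependency("feature1", new int[] {0})));
        List<KeyedFeature> request = List.of(
                new KeyedFeature(List.of("a1"), "feature2"),
                new KeyedFeature(List.of("a2"), "feature2"));
        System.out.println(expand(deps, request));
    }
}
```

Running main prints [(a1):feature2, (a2):feature2, (a1):feature1, (a2):feature1], matching the expansion described above. The LinkedHashSet both removes duplicates (a shared dependency requested under the same key is expanded once) and preserves breadth-first order.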
  • Constructor Details

    • FeatureDependencyGraph

      public FeatureDependencyGraph(Map<String,Set<ErasedEntityTaggedFeature>> dependencyFeatures, Collection<String> anchoredFeatures)
      Constructs a FeatureDependencyGraph for a given map of features' dependency relationships and a list of which features are anchored.
      Parameters:
      dependencyFeatures - Map from derived feature names to the sets of inputs they require (described as ErasedEntityTaggedFeature instances)
      anchoredFeatures - Collection of anchored feature names
  • Method Details

    • isDeclared

      public boolean isDeclared(String feature)
      Returns whether a given feature name is present in the dependency graph.
      Parameters:
      feature - the feature name to look up
      Returns:
      whether a given feature name is present in the dependency graph
    • isReachable

      @Deprecated public boolean isReachable(String feature)
      Deprecated.
      Returns whether a given feature is reachable.
      Parameters:
      feature - the feature name to check
      Returns:
      true if the feature is reachable, false otherwise
    • isReachableWithErrorMessage

      public com.linkedin.feathr.common.FeatureDependencyGraph.Pair<Boolean,String> isReachableWithErrorMessage(String feature)
      Returns whether a given feature is reachable, together with an error message if it is not.
      Parameters:
      feature - the feature name to check
      Returns:
      A Pair of Boolean and String. The Boolean indicates whether the feature is reachable, and the String holds the error message if it is not.
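The reachability check can be sketched as a recursive walk, assuming (the page above does not spell this out) that a feature is reachable when it is anchored, or when it is a declared derived feature whose inputs are all reachable. ReachabilitySketch and its Result record are hypothetical stand-ins for FeatureDependencyGraph and its Pair<Boolean,String>; inputs are plain feature names rather than ErasedEntityTaggedFeature values, and the error-message wording is invented for illustration.

```java
import java.util.*;

class ReachabilitySketch {
    // Hypothetical stand-in for Pair<Boolean,String>: error is null when reachable.
    record Result(boolean reachable, String error) {}

    private final Map<String, Set<String>> dependencies; // derived feature -> input names
    private final Set<String> anchored;

    ReachabilitySketch(Map<String, Set<String>> dependencies, Collection<String> anchored) {
        this.dependencies = dependencies;
        this.anchored = new HashSet<>(anchored);
    }

    // Assumed meaning of "declared": anchored, or listed as a derived feature.
    boolean isDeclared(String feature) {
        return anchored.contains(feature) || dependencies.containsKey(feature);
    }

    Result isReachableWithErrorMessage(String feature) {
        return check(feature, new HashSet<>());
    }

    private Result check(String feature, Set<String> onPath) {
        if (anchored.contains(feature)) {
            return new Result(true, null); // anchored features are directly available
        }
        if (!dependencies.containsKey(feature)) {
            return new Result(false, "Feature " + feature + " is neither anchored nor derived");
        }
        if (!onPath.add(feature)) {
            return new Result(false, "Dependency cycle at feature " + feature);
        }
        for (String input : dependencies.get(feature)) {
            Result r = check(input, onPath);
            if (!r.reachable()) {
                onPath.remove(feature);
                return new Result(false, feature + " -> " + r.error()); // prefix the failing path
            }
        }
        onPath.remove(feature);
        return new Result(true, null);
    }

    public static void main(String[] args) {
        ReachabilitySketch g = new ReachabilitySketch(
                Map.of("C", Set.of("A", "B"), "D", Set.of("C", "X")),
                List.of("A", "B"));
        System.out.println(g.isReachableWithErrorMessage("C")); // reachable
        System.out.println(g.isReachableWithErrorMessage("D")); // unreachable: X is undeclared
    }
}
```

The sketch does not memoize results, so it only illustrates the rule; a production graph would typically cache reachability per feature.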
    • getPlan

      public List<String> getPlan(Collection<String> features)
      Constructs a plan for procuring a group of features. For a given group of features, what are all the features (transitive dependencies) that must be procured in order to derive them? This method returns a complete, ordered list of features that is sufficient to derive the given group when evaluated in order.
      Parameters:
      features - the feature names to derive
      Returns:
      Ordered list of feature names that can be resolved in-sequence to produce a superset of the given group of features. The returned list is NOT just a re-ordering of the input features, and may contain other features that weren't specifically requested but are required as dependencies. The returned list will always contain all of the features provided in the input.
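The contract described for getPlan (every feature appears after all of its inputs, and every requested feature is included) is what a depth-first topological sort provides. The sketch below is one such implementation over plain feature names; PlanSketch is hypothetical, the real class may produce a different but equally valid order, and the sketch assumes a dependency cycle should be reported as an error.

```java
import java.util.*;

class PlanSketch {
    // Hypothetical: returns the requested features plus their transitive inputs,
    // ordered so that every input precedes the features that consume it.
    static List<String> getPlan(Map<String, List<String>> dependencies, Collection<String> features) {
        List<String> plan = new ArrayList<>();
        Set<String> done = new HashSet<>();       // features already placed in the plan
        Set<String> inProgress = new HashSet<>(); // features on the current DFS path
        for (String f : features) {
            visit(f, dependencies, done, inProgress, plan);
        }
        return plan;
    }

    private static void visit(String f, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress, List<String> plan) {
        if (done.contains(f)) {
            return;
        }
        if (!inProgress.add(f)) {
            throw new IllegalStateException("Dependency cycle at feature " + f);
        }
        for (String input : deps.getOrDefault(f, List.of())) {
            visit(input, deps, done, inProgress, plan); // place all inputs first
        }
        inProgress.remove(f);
        done.add(f);
        plan.add(f); // post-order: f is appended only after its inputs
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
                "B", List.of("A"),
                "C", List.of("B"),
                "D", List.of("A", "C"));
        System.out.println(getPlan(deps, List.of("D"))); // [A, B, C, D]
    }
}
```

Inputs are stored in Lists here only so that the printed order is deterministic; with Sets, the result would still be a valid plan but sibling inputs could appear in any order.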
    • getOrderedPlanForRequest

      @Deprecated public List<TaggedFeatureName> getOrderedPlanForRequest(Collection<TaggedFeatureName> request)
      Deprecated.
    • getOrderedPlanForFeatureUrns

      public List<TaggedFeatureName> getOrderedPlanForFeatureUrns(Collection<TaggedFeatureName> request)
      Returns an ordered list containing the requested features and their dependencies, in an order in which they can be executed. For example, if B depends on A, C depends on B, and D depends on both A and C (A->B, B->C, (A,C)->D), then one possible execution order is: A, B, C, D
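What makes an order a valid execution order can be stated as a check: every feature must appear after all of its inputs. ExecutionOrderCheck is a hypothetical helper over plain feature names (not part of FeatureDependencyGraph) that applies this rule to the example above.

```java
import java.util.*;

class ExecutionOrderCheck {
    // Hypothetical helper: an order is valid if every listed feature appears
    // after all of its inputs, and every input is itself present in the order.
    static boolean isValidExecutionOrder(Map<String, List<String>> deps, List<String> order) {
        Map<String, Integer> position = new HashMap<>();
        for (int i = 0; i < order.size(); i++) {
            position.putIfAbsent(order.get(i), i); // first occurrence wins
        }
        for (String feature : order) {
            int at = position.get(feature);
            for (String input : deps.getOrDefault(feature, List.of())) {
                Integer inputAt = position.get(input);
                if (inputAt == null || inputAt >= at) {
                    return false; // input missing or scheduled too late
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
                "B", List.of("A"), "C", List.of("B"), "D", List.of("A", "C"));
        System.out.println(isValidExecutionOrder(deps, List.of("A", "B", "C", "D"))); // true
        System.out.println(isValidExecutionOrder(deps, List.of("A", "C", "B", "D"))); // false
    }
}
```

In this particular example A, B, C, D is the only valid order, because A->B->C forms a chain that D sits at the end of; independent features would add freedom in where they can be placed.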
    • getComputationPipeline

      public List<Set<TaggedFeatureName>> getComputationPipeline(Collection<TaggedFeatureName> requestedFeatures)
      Returns a computation pipeline (an ordered list of stages, where the features within each stage can be computed simultaneously) for the requested features, by examining their dependencies and grouping features by depth. Every feature in a stage has its direct dependencies resolved by earlier stages, so all features in that stage can be computed simultaneously.
      For example, suppose features E and F are requested and the feature dependencies are:
        A -> C
        B -> D
        (C,D) -> E
        F (no dependencies)
      The result is then the ordered list [[A, B, F], [C, D], [E]], which represents:
        stage 1: A, B, F
        stage 2: C, D
        stage 3: E
      Disclaimer: the pipeline-based approach is one way to optimize the execution of features and their dependencies, but it is by no means optimal. In the example above, if A is much slower than B and F, computation of D is blocked until A is ready, even though D depends only on B. The optimal solution would compute each feature in isolation, all the way from its root dependencies.
      Parameters:
      requestedFeatures - the features to compute
      Returns:
      an ordered list of stages, where each stage is a set of features that can be computed simultaneously
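The depth-based grouping can be sketched by assigning each feature a depth (0 for features with no inputs, otherwise one more than the deepest input) and collecting features of equal depth into a stage. PipelineSketch is hypothetical and works on plain feature names; it assumes the real method uses the same depth rule, which the example above is consistent with.

```java
import java.util.*;

class PipelineSketch {
    // Hypothetical: groups the requested features and their transitive inputs
    // into stages by depth; stage i holds every feature of depth i.
    static List<Set<String>> getComputationPipeline(Map<String, List<String>> deps,
                                                    Collection<String> requested) {
        Map<String, Integer> depth = new HashMap<>();
        for (String f : requested) {
            depthOf(f, deps, depth, new HashSet<>());
        }
        List<Set<String>> stages = new ArrayList<>();
        for (Map.Entry<String, Integer> e : depth.entrySet()) {
            while (stages.size() <= e.getValue()) {
                stages.add(new TreeSet<>()); // sorted only for readable output
            }
            stages.get(e.getValue()).add(e.getKey());
        }
        return stages;
    }

    // Memoized depth: 0 with no inputs, else 1 + the maximum depth of any input.
    private static int depthOf(String f, Map<String, List<String>> deps,
                               Map<String, Integer> memo, Set<String> onPath) {
        Integer known = memo.get(f);
        if (known != null) {
            return known;
        }
        if (!onPath.add(f)) {
            throw new IllegalStateException("Dependency cycle at feature " + f);
        }
        int d = 0;
        for (String input : deps.getOrDefault(f, List.of())) {
            d = Math.max(d, depthOf(input, deps, memo, onPath) + 1);
        }
        onPath.remove(f);
        memo.put(f, d);
        return d;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
                "C", List.of("A"),
                "D", List.of("B"),
                "E", List.of("C", "D"));
        System.out.println(getComputationPipeline(deps, List.of("E", "F"))); // [[A, B, F], [C, D], [E]]
    }
}
```

Because a feature at depth d always has an input at depth d - 1, no stage in the result is empty. Executing stage by stage means each stage waits for the whole previous stage, which is the trade-off the disclaimer describes.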
    • toString

      public String toString()
      Overrides:
      toString in class Object