Class IterativeSimilarityAggregation


  • public final class IterativeSimilarityAggregation
    extends java.lang.Object
    Iterative similarity aggregation for named entity recognition and set expansion, based on the paper "SEISA: Set Expansion by Iterative Similarity Aggregation". The "ner" in the package name stands for "named entity recognition".
    Author:
    thomas.jungblut
    • Constructor Summary

      Constructors 
      Constructor Description
      IterativeSimilarityAggregation​(java.lang.String[] seedTokens, de.jungblut.math.tuple.Tuple<java.lang.String[],​de.jungblut.math.DoubleMatrix> bipartiteGraph)
      Constructs the similarity aggregation by seed tokens to expand and a given bipartite graph.
      IterativeSimilarityAggregation​(java.lang.String[] seedTokens, de.jungblut.math.tuple.Tuple<java.lang.String[],​de.jungblut.math.DoubleMatrix> bipartiteGraph, double alpha, DistanceMeasurer distance)
      Constructs the similarity aggregation by seed tokens to expand and a given bipartite graph.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      java.lang.String[] startStaticThresholding​(double similarityThreshold, int maxIterations, boolean verbose)
      Starts the static thresholding algorithm and returns the expanded set of newly found related tokens.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • IterativeSimilarityAggregation

        public IterativeSimilarityAggregation​(java.lang.String[] seedTokens,
                                              de.jungblut.math.tuple.Tuple<java.lang.String[],​de.jungblut.math.DoubleMatrix> bipartiteGraph)
        Constructs the similarity aggregation by seed tokens to expand and a given bipartite graph. The bipartite graph is represented as a two-tuple: the first item holds the vertices (the candidate terms or entities), and the second item holds the edges as an N×M matrix, where N is the number of entity tokens and M is the number of context vertices. Alpha is set to 0.5 and the cosine distance is used.
      • IterativeSimilarityAggregation

        public IterativeSimilarityAggregation​(java.lang.String[] seedTokens,
                                              de.jungblut.math.tuple.Tuple<java.lang.String[],​de.jungblut.math.DoubleMatrix> bipartiteGraph,
                                              double alpha,
                                              DistanceMeasurer distance)
        Constructs the similarity aggregation by seed tokens to expand and a given bipartite graph. As the signature shows, the bipartite graph is represented as a two-tuple: the first item holds the vertices (the candidate terms or entities), and the second item holds the edges to the context vertices as an N×M matrix, where N is the number of entity tokens and M is the number of context vertices. Alpha is the constant weighting factor used throughout the paper (usually 0.5). The distance measurer to use must also be specified.
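To make the N×M layout described above concrete, the following is a minimal sketch that uses plain Java arrays instead of the library's Tuple and DoubleMatrix types. The class name BipartiteGraphSketch, the sample terms, and the edge weights are hypothetical, and the zero-row handling is an assumption; it only illustrates how a cosine distance between two entity rows can be computed.

```java
final class BipartiteGraphSketch {

  // Cosine distance between two rows of the N×M edge matrix:
  // 1 minus the cosine similarity of the two context-weight vectors.
  static double cosineDistance(double[] a, double[] b) {
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0.0 || normB == 0.0) {
      // Assumption: a row without any context edge is treated as maximally distant.
      return 1.0;
    }
    return 1.0 - dot / Math.sqrt(normA * normB);
  }

  public static void main(String[] args) {
    // First tuple item: the entity vertices (N = 3), hypothetical sample data.
    String[] terms = {"berlin", "paris", "apple"};
    // Second tuple item: the N×M edge weights to M = 3 context vertices.
    double[][] edges = {
      {1, 1, 0},
      {1, 1, 0},
      {0, 0, 1}
    };
    System.out.println(terms[0] + " vs " + terms[1] + ": "
        + cosineDistance(edges[0], edges[1]));
    System.out.println(terms[0] + " vs " + terms[2] + ": "
        + cosineDistance(edges[0], edges[2]));
  }
}
```

Rows that share the same contexts end up with a distance near 0, while rows with no context in common end up with a distance of 1.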
    • Method Detail

      • startStaticThresholding

        public java.lang.String[] startStaticThresholding​(double similarityThreshold,
                                                          int maxIterations,
                                                          boolean verbose)
        Starts the static thresholding algorithm and returns the expanded set of newly found related tokens.
        Parameters:
        maxIterations - if > 0, the algorithm stops after maxIterations iterations.
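The following is a simplified sketch of what a static thresholding loop of this kind might look like. It is not the library's implementation. It assumes that a term's score is alpha times its average similarity to the seeds plus (1 - alpha) times its average similarity to the current expanded set, that iteration stops once the set no longer changes, and that seeds are excluded from the result. The class name, method names, the safety cap of 1000 iterations, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class StaticThresholdingSketch {

  static double cosineSimilarity(double[] a, double[] b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
  }

  // Average similarity of the row at index `term` to every row index in `set`.
  static double averageSimilarity(double[][] edges, int term, Set<Integer> set) {
    if (set.isEmpty()) {
      return 0.0;
    }
    double sum = 0.0;
    for (int other : set) {
      sum += cosineSimilarity(edges[term], edges[other]);
    }
    return sum / set.size();
  }

  static String[] expand(String[] terms, double[][] edges, int[] seedIndices,
      double alpha, double similarityThreshold, int maxIterations) {
    Set<Integer> seeds = new LinkedHashSet<>();
    for (int s : seedIndices) {
      seeds.add(s);
    }
    // Assumption: a non-positive maxIterations means "until convergence";
    // the cap of 1000 is a hypothetical guard against oscillation.
    int limit = maxIterations > 0 ? maxIterations : 1000;
    Set<Integer> current = new LinkedHashSet<>(seeds);
    for (int iteration = 0; iteration < limit; iteration++) {
      Set<Integer> next = new LinkedHashSet<>();
      for (int t = 0; t < terms.length; t++) {
        double score = alpha * averageSimilarity(edges, t, seeds)
            + (1.0 - alpha) * averageSimilarity(edges, t, current);
        if (score >= similarityThreshold) {
          next.add(t);
        }
      }
      if (next.equals(current)) {
        break; // converged: the expanded set no longer changes
      }
      current = next;
    }
    // Assumption: only the newly found terms are returned, not the seeds.
    List<String> result = new ArrayList<>();
    for (int t : current) {
      if (!seeds.contains(t)) {
        result.add(terms[t]);
      }
    }
    return result.toArray(new String[0]);
  }

  public static void main(String[] args) {
    String[] terms = {"berlin", "paris", "rome", "apple"};
    double[][] edges = {
      {1, 1, 0, 0},
      {1, 1, 0, 0},
      {1, 0.5, 0, 0},
      {0, 0, 1, 1}
    };
    String[] expanded = expand(terms, edges, new int[] {0}, 0.5, 0.8, 10);
    System.out.println(String.join(", ", expanded));
  }
}
```

Keeping the seed term in the score at every iteration is what stops the expanded set from drifting away from the original seeds; alpha controls how strongly that anchor is weighted.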