Package de.jungblut.ner
Class IterativeSimilarityAggregation
- java.lang.Object
  - de.jungblut.ner.IterativeSimilarityAggregation
public final class IterativeSimilarityAggregation extends java.lang.Object

Iterative similarity aggregation for named entity recognition and set expansion, based on the paper "SEISA: Set Expansion by Iterative Similarity Aggregation". The "ner" in the package name stands for "named entity recognition".

Author:
thomas.jungblut
Constructor Summary
Constructors
- IterativeSimilarityAggregation(java.lang.String[] seedTokens, de.jungblut.math.tuple.Tuple<java.lang.String[],de.jungblut.math.DoubleMatrix> bipartiteGraph)
  Constructs the similarity aggregation by seed tokens to expand and a given bipartite graph.
- IterativeSimilarityAggregation(java.lang.String[] seedTokens, de.jungblut.math.tuple.Tuple<java.lang.String[],de.jungblut.math.DoubleMatrix> bipartiteGraph, double alpha, DistanceMeasurer distance)
  Constructs the similarity aggregation by seed tokens to expand and a given bipartite graph.
Method Summary
Methods
- java.lang.String[] startStaticThresholding(double similarityThreshold, int maxIterations, boolean verbose)
  Starts the static thresholding algorithm and returns the expanded set of newly found related tokens.
Constructor Detail
IterativeSimilarityAggregation
public IterativeSimilarityAggregation(java.lang.String[] seedTokens, de.jungblut.math.tuple.Tuple<java.lang.String[],de.jungblut.math.DoubleMatrix> bipartiteGraph)

Constructs the similarity aggregation from the seed tokens to expand and a given bipartite graph. The bipartite graph is represented as a two-tuple: the first item holds the vertices (called (candidate) terms or entities), and the second item holds the edges between those vertices and the context vertices as an N×M matrix, where N is the number of entity tokens and M is the number of context vertices. Alpha is set to 0.5 and the cosine distance is used.
IterativeSimilarityAggregation
public IterativeSimilarityAggregation(java.lang.String[] seedTokens, de.jungblut.math.tuple.Tuple<java.lang.String[],de.jungblut.math.DoubleMatrix> bipartiteGraph, double alpha, DistanceMeasurer distance)

Constructs the similarity aggregation from the seed tokens to expand and a given bipartite graph. As the parameter type shows, the bipartite graph is passed as a two-tuple: the first item holds the vertices (called (candidate) terms or entities), and the second item holds the edges between those vertices and the context vertices as an N×M matrix, where N is the number of entity tokens and M is the number of context vertices. Alpha is the constant weighting factor used throughout the paper (usually 0.5). The distance measurer to use must also be specified.
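To make the graph layout described above more concrete, here is a minimal, self-contained sketch. It is not this class's implementation and not the full SEISA procedure: it uses a plain double[][] in place of DoubleMatrix, cosine similarity in place of a DistanceMeasurer, and a single pass with no alpha weighting. The names BipartiteGraphSketch, cosineSimilarity and expandOnce are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of de.jungblut.ner.
public class BipartiteGraphSketch {

  // Cosine similarity of two equally long vectors; 0 if either has zero norm.
  static double cosineSimilarity(double[] a, double[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0) {
      return 0;
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  // Single simplified expansion pass (an assumption, not the library's logic):
  // returns every non-seed term whose mean cosine similarity to the seed rows
  // is at least the given threshold.
  static String[] expandOnce(String[] terms, double[][] edges,
      String[] seeds, double threshold) {
    Set<String> seedSet = new HashSet<>(Arrays.asList(seeds));
    List<Integer> seedRows = new ArrayList<>();
    for (int i = 0; i < terms.length; i++) {
      if (seedSet.contains(terms[i])) {
        seedRows.add(i);
      }
    }
    List<String> expanded = new ArrayList<>();
    if (seedRows.isEmpty()) {
      return new String[0];
    }
    for (int i = 0; i < terms.length; i++) {
      if (seedSet.contains(terms[i])) {
        continue; // seeds are not reported as newly found tokens
      }
      double sum = 0;
      for (int s : seedRows) {
        sum += cosineSimilarity(edges[i], edges[s]);
      }
      if (sum / seedRows.size() >= threshold) {
        expanded.add(terms[i]);
      }
    }
    return expanded.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // N = 4 entity terms (rows), M = 3 context vertices (columns).
    String[] terms = {"berlin", "paris", "banana", "rome"};
    double[][] edges = {
      {1, 1, 0},
      {1, 1, 0},
      {0, 0, 1},
      {1, 0, 0}
    };
    String[] result = expandOnce(terms, edges, new String[] {"berlin"}, 0.9);
    System.out.println(Arrays.toString(result)); // prints [paris]
  }
}
```

Row i of the matrix belongs to terms[i], which mirrors the pairing of the String[] and the N×M matrix in the tuple. How the real class combines scores across iterations and applies alpha is not shown here.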
Method Detail
startStaticThresholding
public java.lang.String[] startStaticThresholding(double similarityThreshold, int maxIterations, boolean verbose)

Starts the static thresholding algorithm and returns the expanded set of newly found related tokens.

Parameters:
maxIterations - if greater than 0, the algorithm stops once maxIterations iterations have been reached.