Package de.julielab.jcore.utility.index
Class TermGenerators
java.lang.Object
    de.julielab.jcore.utility.index.TermGenerators
-
public class TermGenerators extends Object
This class offers a range of predefined term generators (to be used as a constructor argument to JCoReMapAnnotationIndex) that might be useful in a range of applications.
- Author:
- faessler
-
Nested Class Summary
Nested Classes
- static class TermGenerators.LongOffsetIndexTermGenerator
-
Constructor Summary
Constructors
- TermGenerators()
-
Method Summary
All Methods (all are static, concrete methods)
- static IndexTermGenerator<String> edgeNGramTermGenerator(int n): Generates all prefixes of the covered text of an annotation, from length 1 up to min(n, annotation.getCoveredText().length()).
- static IndexTermGenerator<String> exactPrefixTermGenerator(int length): Generates as a search term the prefix of the covered text of an annotation with exactly the given length.
- static IndexTermGenerator<String> exactSuffixTermGenerator(int length): Generates as a search term the suffix of the covered text of an annotation with exactly the given length.
- static TermGenerators.LongOffsetIndexTermGenerator longOffsetTermGenerator()
- static IndexTermGenerator<String> nGramTermGenerator(int n): Creates strict n-grams of the covered text of an annotation.
- static IndexTermGenerator<String> prefixTermGenerator(int maxLength): Generates as a search term the prefix of the covered text of an annotation, up to length maxLength.
- static IndexTermGenerator<String> suffixTermGenerator(int maxLength): Generates as a search term the suffix of the covered text of an annotation, up to length maxLength.
-
Method Detail
-
nGramTermGenerator
public static IndexTermGenerator<String> nGramTermGenerator(int n)
Creates strict n-grams of the covered text of an annotation. Returned terms are always of length n. Annotations shorter than n will not return any terms.
- Parameters:
- n - The n-gram size.
- Returns:
- The n-gram index terms.
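The strict n-gram behavior described above can be illustrated with a minimal plain-Java sketch. It does not use the JCoRe API: the class StrictNGramSketch and the method strictNGrams are hypothetical names, and the sketch works on a plain String standing in for an annotation's covered text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of JCoRe.
final class StrictNGramSketch {

    // Returns every substring of coveredText that has exactly length n.
    // Texts shorter than n yield an empty list, mirroring the documented behavior.
    static List<String> strictNGrams(String coveredText, int n) {
        List<String> terms = new ArrayList<>();
        if (n <= 0) {
            return terms; // assumption: non-positive sizes produce no terms
        }
        for (int start = 0; start + n <= coveredText.length(); start++) {
            terms.add(coveredText.substring(start, start + n));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(strictNGrams("gene", 3)); // prints [gen, ene]
        System.out.println(strictNGrams("ab", 3));   // prints []
    }
}
```

Presumably the real generator would be obtained with TermGenerators.nGramTermGenerator(3) and handed to the index constructor rather than called on strings directly.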
-
edgeNGramTermGenerator
public static IndexTermGenerator<String> edgeNGramTermGenerator(int n)
Generates all prefixes of the covered text of an annotation, from length 1 up to min(n, annotation.getCoveredText().length()).
- Parameters:
- n - The maximum prefix length.
- Returns:
- An index term generator producing edge n-grams up to a maximum length of n.
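As a rough illustration of edge n-grams, the following plain-Java sketch lists the prefixes of a string. It assumes the upper bound is the smaller of n and the text length, since n is documented as the maximum prefix length. EdgeNGramSketch and edgeNGrams are hypothetical names, not JCoRe API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of JCoRe.
final class EdgeNGramSketch {

    // Returns the prefixes of coveredText with lengths 1, 2, ..., min(n, text length).
    static List<String> edgeNGrams(String coveredText, int n) {
        List<String> terms = new ArrayList<>();
        int maxLength = Math.min(n, coveredText.length()); // assumed upper bound
        for (int length = 1; length <= maxLength; length++) {
            terms.add(coveredText.substring(0, length));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("gene", 3)); // prints [g, ge, gen]
        System.out.println(edgeNGrams("ab", 3));   // prints [a, ab]
    }
}
```

Indexing every prefix in this way is what makes prefix lookups of varying length possible with a single map.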
-
prefixTermGenerator
public static IndexTermGenerator<String> prefixTermGenerator(int maxLength)
Generates as a search term the prefix of the covered text of an annotation, up to length maxLength. If the annotation text is shorter than maxLength, the whole annotation text is returned.
- Parameters:
- maxLength - The maximum prefix length.
- Returns:
- The annotation text prefix of maximum length maxLength.
-
suffixTermGenerator
public static IndexTermGenerator<String> suffixTermGenerator(int maxLength)
Generates as a search term the suffix of the covered text of an annotation, up to length maxLength. If the annotation text is shorter than maxLength, the whole annotation text is returned.
- Parameters:
- maxLength - The maximum suffix length.
- Returns:
- The annotation text suffix of maximum length maxLength.
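The "up to" semantics of the prefix and suffix generators can be sketched on plain strings as follows. BoundedAffixSketch, prefixUpTo and suffixUpTo are hypothetical names chosen for illustration, and the sketch assumes a non-negative maxLength.

```java
// Hypothetical illustration only; not part of JCoRe.
final class BoundedAffixSketch {

    // Prefix of at most maxLength characters; shorter texts are returned whole.
    static String prefixUpTo(String coveredText, int maxLength) {
        int length = Math.min(maxLength, coveredText.length());
        return coveredText.substring(0, length);
    }

    // Suffix of at most maxLength characters; shorter texts are returned whole.
    static String suffixUpTo(String coveredText, int maxLength) {
        int length = Math.min(maxLength, coveredText.length());
        return coveredText.substring(coveredText.length() - length);
    }

    public static void main(String[] args) {
        System.out.println(prefixUpTo("interleukin", 5)); // prints inter
        System.out.println(suffixUpTo("interleukin", 3)); // prints kin
        System.out.println(prefixUpTo("IL", 5));          // prints IL
    }
}
```

Unlike the exact variants, every annotation yields a term here, so short annotations remain findable in the index.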
-
exactPrefixTermGenerator
public static IndexTermGenerator<String> exactPrefixTermGenerator(int length)
Generates as a search term the prefix of the covered text of an annotation with exactly the given length. If the annotation text is shorter than length, no terms are generated.
- Parameters:
- length - The prefix length.
- Returns:
- The annotation text prefix of exactly the given length.
-
exactSuffixTermGenerator
public static IndexTermGenerator<String> exactSuffixTermGenerator(int length)
Generates as a search term the suffix of the covered text of an annotation with exactly the given length. If the annotation text is shorter than length, no terms are generated.
- Parameters:
- length - The suffix length.
- Returns:
- The annotation text suffix of exactly the given length.
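The exact variants differ from the "up to" variants only for short texts. A minimal plain-Java sketch of that difference is shown below. ExactAffixSketch, exactPrefix and exactSuffix are hypothetical names, and an empty Optional stands in for "no terms are generated".

```java
import java.util.Optional;

// Hypothetical illustration only; not part of JCoRe.
final class ExactAffixSketch {

    // Prefix of exactly `length` characters, or empty if the text is too short.
    static Optional<String> exactPrefix(String coveredText, int length) {
        if (length < 0 || coveredText.length() < length) {
            return Optional.empty();
        }
        return Optional.of(coveredText.substring(0, length));
    }

    // Suffix of exactly `length` characters, or empty if the text is too short.
    static Optional<String> exactSuffix(String coveredText, int length) {
        if (length < 0 || coveredText.length() < length) {
            return Optional.empty();
        }
        return Optional.of(coveredText.substring(coveredText.length() - length));
    }

    public static void main(String[] args) {
        System.out.println(exactPrefix("kinase", 3)); // prints Optional[kin]
        System.out.println(exactSuffix("kinase", 3)); // prints Optional[ase]
        System.out.println(exactPrefix("IL", 3));     // prints Optional.empty
    }
}
```

Because short annotations produce no term at all, they would simply be absent from an index built with an exact generator.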
-
longOffsetTermGenerator
public static TermGenerators.LongOffsetIndexTermGenerator longOffsetTermGenerator()