Package de.julielab.jules.ae.genemapping
Class CandidateFilter
- java.lang.Object
-
- de.julielab.jules.ae.genemapping.CandidateFilter
-
public class CandidateFilter extends java.lang.Object
-
-
Field Summary
Fields (Modifier and Type, Field):
static java.lang.String AMINO_ACIDS
java.lang.String DOMAIN_FAMILIES
static java.lang.String[] GREEK
static java.lang.String GREEK_REGEX
static java.util.Map<java.lang.String,java.lang.String> greekAbbrMap
static java.lang.String[] LAT_NUM
static java.lang.String LAT_NUM_REGEX
java.util.regex.Matcher matcherNonDesc
java.util.regex.Matcher matcherUnspecifieds
static java.lang.String MODIFIER
java.lang.String NON_DESC
static java.lang.String NON_DESCRIPTIVE
java.util.regex.Pattern patternDomainFamilies
java.util.regex.Pattern patternNonDesc
java.util.regex.Pattern patternPreMods
java.util.regex.Pattern patternUnspecifieds
java.lang.String PREMODS
static java.lang.String SUB_GREEK
java.lang.String UNSPECIFIEDS
-
Constructor Summary
Constructors:
CandidateFilter()
-
Method Summary
Methods (Modifier and Type, Method, Description):
static java.lang.String expendGreek(java.lang.String s)
    Looks for single letters that could be interpreted as the short form of a Greek character, such as a -> alpha, b -> beta, and returns a string with expanded Greek letters.
boolean filterOut(java.lang.String searchTerm, java.lang.String foundTerm)
    Filters out some hits according to rules. Rule 1: the overlap consists only of numbers.
static com.google.common.collect.Multiset<java.lang.String> getContentTokens(java.lang.String[] tokens)
static com.google.common.collect.Multiset<java.lang.String> getNumberOfCommonTokens(java.lang.String normalizedMention, java.lang.String synonym)
static com.google.common.collect.Multiset<java.lang.String> getNumbers(java.lang.String[] tokens)
static com.google.common.collect.Multiset<java.lang.String> getSingleSymbols(java.lang.String[] tokens)
    Single characters, numbers, Greek characters.
boolean hasContradictingGreek(java.lang.String s1, java.lang.String s2)
void initPreModifiers()
void initUnspecifieds()
boolean isNonDescriptive(java.lang.String word)
static boolean isNumberCompatible(java.lang.String normalizedMention, java.lang.String synonym)
boolean isUnspecified(java.lang.String word)
static void main(java.lang.String[] args)
-
-
-
Field Detail
-
GREEK
public static final java.lang.String[] GREEK
-
LAT_NUM
public static final java.lang.String[] LAT_NUM
-
GREEK_REGEX
public static java.lang.String GREEK_REGEX
-
LAT_NUM_REGEX
public static java.lang.String LAT_NUM_REGEX
-
greekAbbrMap
public static final java.util.Map<java.lang.String,java.lang.String> greekAbbrMap
-
SUB_GREEK
public static final java.lang.String SUB_GREEK
- See Also:
- Constant Field Values
-
MODIFIER
public static java.lang.String MODIFIER
-
NON_DESCRIPTIVE
public static java.lang.String NON_DESCRIPTIVE
-
AMINO_ACIDS
public static java.lang.String AMINO_ACIDS
-
NON_DESC
public java.lang.String NON_DESC
-
patternNonDesc
public java.util.regex.Pattern patternNonDesc
-
matcherNonDesc
public java.util.regex.Matcher matcherNonDesc
-
DOMAIN_FAMILIES
public java.lang.String DOMAIN_FAMILIES
-
patternDomainFamilies
public java.util.regex.Pattern patternDomainFamilies
-
UNSPECIFIEDS
public java.lang.String UNSPECIFIEDS
-
patternUnspecifieds
public java.util.regex.Pattern patternUnspecifieds
-
matcherUnspecifieds
public java.util.regex.Matcher matcherUnspecifieds
-
PREMODS
public java.lang.String PREMODS
-
patternPreMods
public java.util.regex.Pattern patternPreMods
-
-
Method Detail
-
main
public static void main(java.lang.String[] args) throws java.io.IOException
- Throws:
- java.io.IOException
-
filterOut
public boolean filterOut(java.lang.String searchTerm, java.lang.String foundTerm)
Filters out some hits according to a set of rules. Rule 1: a hit is filtered out if the overlap between the two terms consists only of numbers.
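Rule 1 above can be illustrated with a minimal sketch. This is not the actual `filterOut` implementation: the class name `NumericOverlapSketch`, the method `overlapIsOnlyNumbers`, the whitespace tokenization, and the treatment of an empty overlap are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of rule 1; names and tokenization are assumptions.
class NumericOverlapSketch {

    // Returns true when the two terms share at least one token and every
    // shared token consists of digits only.
    static boolean overlapIsOnlyNumbers(String searchTerm, String foundTerm) {
        Set<String> common =
            new HashSet<>(Arrays.asList(searchTerm.trim().split("\\s+")));
        common.retainAll(
            new HashSet<>(Arrays.asList(foundTerm.trim().split("\\s+"))));

        // Assumption: an empty overlap is not covered by rule 1.
        if (common.isEmpty()) {
            return false;
        }
        for (String token : common) {
            if (!token.matches("\\d+")) {
                return false; // a non-numeric token is shared
            }
        }
        return true;
    }
}
```

Under these assumptions, "interleukin 2" and "caspase 2" share only the token "2", so the hit would be filtered out, whereas "interleukin 2" and "interleukin 2 receptor" also share "interleukin" and would be kept.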
-
initUnspecifieds
public void initUnspecifieds() throws java.io.IOException
- Throws:
- java.io.IOException
-
initPreModifiers
public void initPreModifiers() throws java.io.IOException
- Throws:
- java.io.IOException
-
hasContradictingGreek
public boolean hasContradictingGreek(java.lang.String s1, java.lang.String s2)
-
expendGreek
public static java.lang.String expendGreek(java.lang.String s)
Looks for single letters that could be interpreted as the short form of a Greek character, such as a -> alpha, b -> beta, and returns a string with expanded Greek letters. When several Greek letters share the same initial, the first one is always used, e.g. e -> epsilon. Thus, eta will never be returned.
- Parameters:
- s - The string in which to expand Greek abbreviation characters.
- Returns:
- The string with expanded Greek letters.
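The expansion and its collision rule can be sketched as follows. This is an illustration only, not the library's code: the class `GreekExpansionSketch`, the method name `expandGreek`, the letter list, and the whitespace tokenization are assumptions, and the real method may rely on `GREEK`, `greekAbbrMap`, or `GREEK_REGEX` instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch; not the actual CandidateFilter implementation.
class GreekExpansionSketch {

    // Assumption: Greek letter names in their usual alphabetical order.
    static final String[] GREEK_NAMES = {
        "alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta",
        "iota", "kappa", "lambda", "mu", "nu", "xi", "omicron", "pi", "rho",
        "sigma", "tau", "upsilon", "phi", "chi", "psi", "omega"
    };

    static final Map<String, String> ABBREVIATIONS = new LinkedHashMap<>();
    static {
        for (String name : GREEK_NAMES) {
            // putIfAbsent keeps the first name per initial, so "e" maps to
            // "epsilon" and "eta" is never produced.
            ABBREVIATIONS.putIfAbsent(name.substring(0, 1), name);
        }
    }

    // Replaces every single-letter token that abbreviates a Greek letter.
    // Assumption: tokens are separated by whitespace and are lowercase.
    static String expandGreek(String s) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : s.trim().split("\\s+")) {
            out.add(ABBREVIATIONS.getOrDefault(token, token));
        }
        return out.toString();
    }
}
```

With this sketch, `expandGreek("tnf a")` yields "tnf alpha", and a lone "e" expands to "epsilon" rather than "eta", matching the collision rule described above.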
-
isNumberCompatible
public static boolean isNumberCompatible(java.lang.String normalizedMention, java.lang.String synonym)
-
getNumbers
public static com.google.common.collect.Multiset<java.lang.String> getNumbers(java.lang.String[] tokens)
-
getSingleSymbols
public static com.google.common.collect.Multiset<java.lang.String> getSingleSymbols(java.lang.String[] tokens)
Single characters, numbers, Greek characters.
- Parameters:
- tokens
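As a rough illustration of what collecting "single characters, numbers, Greek characters" might look like, the sketch below counts such tokens. It is not the actual method: the class `SingleSymbolsSketch`, the shortened Greek name list, and the digit-only definition of a number are assumptions, and a plain `Map` of counts stands in for Guava's `Multiset` so that the example needs only the standard library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; a Map of counts stands in for a Guava Multiset.
class SingleSymbolsSketch {

    // Assumption: a shortened list of Greek letter names for illustration.
    static final Set<String> GREEK_NAMES = new HashSet<>(
        Arrays.asList("alpha", "beta", "gamma", "delta", "epsilon"));

    // Counts tokens that are a single character, a digit-only number,
    // or a Greek letter name; all other tokens are ignored.
    static Map<String, Integer> singleSymbols(String[] tokens) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : tokens) {
            boolean isSymbol = token.length() == 1
                || token.matches("\\d+")
                || GREEK_NAMES.contains(token);
            if (isSymbol) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For the tokens `il`, `2`, `receptor`, `alpha`, `2`, `a`, this sketch counts "2" twice and "alpha" and "a" once each, while "il" and "receptor" are left out.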
-
getContentTokens
public static com.google.common.collect.Multiset<java.lang.String> getContentTokens(java.lang.String[] tokens)
-
getNumberOfCommonTokens
public static com.google.common.collect.Multiset<java.lang.String> getNumberOfCommonTokens(java.lang.String normalizedMention, java.lang.String synonym)
-
isUnspecified
public boolean isUnspecified(java.lang.String word)
-
isNonDescriptive
public boolean isNonDescriptive(java.lang.String word)
-
-