Package de.julielab.jules.ae.genemapping
Class GeneMapping
- java.lang.Object
-
- de.julielab.jules.ae.genemapping.GeneMapping
-
public class GeneMapping extends java.lang.Object
-
-
Field Summary
- static int JAROWINKLER_SCORER
- static boolean LEGACY_INDEX_SUPPORT
- static int LEVENSHTEIN_SCORER
- static int LUCENE_SCORER
- static java.lang.String MAPPING_CORE: Deprecated.
- static int MAXENT_SCORER
- static int SIMPLE_SCORER
- static java.lang.String SOURCE_DEFINITION
- static int TFIDF: Deprecated.
- static int TOKEN_JAROWINKLER_SCORER
-
Constructor Summary
- GeneMapping(GeneMappingConfiguration configuration)
- GeneMapping(java.io.File propertiesFile): Main constructor for the GeneMapper; reads its settings from a properties file.
-
Method Summary
- GeneMappingConfiguration getConfiguration()
- MappingCore getMappingCore()
- DocumentMappingResult map(GeneDocument document)
- MentionMappingResult map(GeneMention searchTerm, org.apache.lucene.search.BooleanQuery contextQuery, java.lang.String documentContext): Actual mapping method.
- java.util.List<SynHit> map(java.lang.String searchTerm, org.apache.lucene.search.BooleanQuery contextQuery): A wrapper to the main mapping function.
- MentionMappingResult map(java.lang.String term, org.apache.lucene.search.BooleanQuery contextQuery, java.lang.String documentContext): Convenience method mostly used for tests.
- java.util.ArrayList<SynHit> mapTopN(java.lang.String searchTerm, int topN): This mapping returns a list of SynHits.
- static java.lang.String removeDomainFamilies(java.lang.String normalizedSearchTerm)
- static java.lang.String removeModifiers(java.lang.String normalizedSearchTerm)
- static java.lang.String removeNondescriptives(java.lang.String normalizedSearchTerm)
- static java.lang.String removePremodifiers(java.lang.String normalizedSearchTerm)
- static java.lang.String removeUnspecifieds(java.lang.String normalizedSearchTerm)
- void setMappingCore(MappingCore mappingCore)
-
-
-
Field Detail
-
LEGACY_INDEX_SUPPORT
public static final boolean LEGACY_INDEX_SUPPORT
- See Also:
- Constant Field Values
-
SOURCE_DEFINITION
public static final java.lang.String SOURCE_DEFINITION
- See Also:
- Constant Field Values
-
SIMPLE_SCORER
public static final int SIMPLE_SCORER
- See Also:
- Constant Field Values
-
TOKEN_JAROWINKLER_SCORER
public static final int TOKEN_JAROWINKLER_SCORER
- See Also:
- Constant Field Values
-
MAXENT_SCORER
public static final int MAXENT_SCORER
- See Also:
- Constant Field Values
-
JAROWINKLER_SCORER
public static final int JAROWINKLER_SCORER
- See Also:
- Constant Field Values
-
LEVENSHTEIN_SCORER
public static final int LEVENSHTEIN_SCORER
- See Also:
- Constant Field Values
-
TFIDF
public static final int TFIDF
Deprecated. This was a test using SecondString TFIDF for scoring, but it was not used in the end. Lucene's scoring is very good.
- See Also:
- Constant Field Values
-
LUCENE_SCORER
public static final int LUCENE_SCORER
- See Also:
- Constant Field Values
-
MAPPING_CORE
@Deprecated public static final java.lang.String MAPPING_CORE
Deprecated.
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
GeneMapping
public GeneMapping(java.io.File propertiesFile) throws java.io.IOException, GeneMappingException
Main constructor for the GeneMapper; reads its settings from a properties file.
- Parameters:
propertiesFile -
- Throws:
java.io.IOException
GeneMappingException
org.apache.lucene.index.CorruptIndexException
-
GeneMapping
public GeneMapping(GeneMappingConfiguration configuration) throws java.io.IOException, GeneMappingException
- Throws:
java.io.IOException
GeneMappingException
-
-
Method Detail
-
mapTopN
public java.util.ArrayList<SynHit> mapTopN(java.lang.String searchTerm, int topN) throws java.io.IOException, GeneCandidateRetrievalException
This mapping returns a list of SynHits. No semantic disambiguation is done here. The topN hits with the highest (Lucene) scores are returned. It is not needed for the actual mapping but is used to generate training material for MaxEntScorer.
- Parameters:
searchTerm - the term to be mapped
topN - number of hits to be returned
- Throws:
GeneCandidateRetrievalException
java.io.IOException
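The top-N selection described for mapTopN can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `TopNSketch`, `Hit`, and the `topN` helper are hypothetical names, and `Hit` only stands in for SynHit as an id plus a score. The sketch assumes candidates are already scored and simply keeps the highest-scoring ones, without any disambiguation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopNSketch {

    /** Hypothetical stand-in for SynHit: a candidate id plus a retrieval score. */
    static final class Hit {
        final String id;
        final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Returns at most topN hits, ordered by descending score. */
    static ArrayList<Hit> topN(List<Hit> candidates, int topN) {
        // Copy first so the caller's list is left untouched.
        ArrayList<Hit> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        // Clamp the limit so negative or oversized topN values are safe.
        int limit = Math.max(0, Math.min(topN, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Hit> candidates = new ArrayList<>();
        candidates.add(new Hit("A", 0.4));
        candidates.add(new Hit("B", 0.9));
        candidates.add(new Hit("C", 0.7));
        for (Hit h : topN(candidates, 2)) {
            System.out.println(h.id + " " + h.score);
        }
    }
}
```

Returning a fresh ArrayList mirrors the documented return type of mapTopN; how the real method obtains and scores its candidates is not stated here.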
-
map
public java.util.List<SynHit> map(java.lang.String searchTerm, org.apache.lucene.search.BooleanQuery contextQuery) throws GeneMappingException
A wrapper to the main mapping function. This one does not require an organism to be specified and therefore performs a completely organism-agnostic search (currently used mainly for backward compatibility with the BC evaluation).
- Parameters:
searchTerm -
contextQuery -
- Returns:
- the SynHits that apply to the given searchTerm
- Throws:
java.lang.Exception
GeneMappingException
-
map
public MentionMappingResult map(GeneMention searchTerm, org.apache.lucene.search.BooleanQuery contextQuery, java.lang.String documentContext) throws GeneMappingException
Actual mapping method. This mapping function also performs semantic disambiguation. First it checks for general, organism-specific hits (getCandidates). If organisms are given (i.e. the list is neither null nor empty), semantic disambiguation is performed with this organism list.
- Parameters:
searchTerm - the term to do the mapping for
contextQuery - the term's context (i.e. the document/abstract in which it was found)
documentContext -
- Returns:
- ArrayList with SynHits
- Throws:
java.lang.Exception
GeneMappingException
-
map
public DocumentMappingResult map(GeneDocument document) throws GeneMappingException
- Throws:
GeneMappingException
-
removeModifiers
public static java.lang.String removeModifiers(java.lang.String normalizedSearchTerm)
- Parameters:
normalizedSearchTerm -
- Returns:
- the normalizedSearchTerm with all modifiers removed
-
removeUnspecifieds
public static java.lang.String removeUnspecifieds(java.lang.String normalizedSearchTerm)
-
removeNondescriptives
public static java.lang.String removeNondescriptives(java.lang.String normalizedSearchTerm)
-
removeDomainFamilies
public static java.lang.String removeDomainFamilies(java.lang.String normalizedSearchTerm)
-
removePremodifiers
public static java.lang.String removePremodifiers(java.lang.String normalizedSearchTerm)
-
getMappingCore
public MappingCore getMappingCore()
-
setMappingCore
public void setMappingCore(MappingCore mappingCore)
-
map
public MentionMappingResult map(java.lang.String term, org.apache.lucene.search.BooleanQuery contextQuery, java.lang.String documentContext) throws GeneMappingException
Convenience method mostly used for tests. The term will be wrapped into a GeneMention. However, no offset information or other data about the original gene mention will be known.
- Parameters:
term -
contextQuery -
documentContext -
- Returns:
- Throws:
java.lang.Exception
GeneMappingException
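The wrapping behaviour of this convenience overload can be pictured with a small, self-contained sketch. Everything in it is an assumption made for illustration: `MentionWrapSketch` and `Mention` are hypothetical stand-ins rather than the real GeneMention, and using -1 to mark unknown offsets is a convention chosen here, not something this page documents.

```java
public class MentionWrapSketch {

    /** Hypothetical stand-in for GeneMention; -1 marks an unknown offset. */
    static final class Mention {
        final String text;
        final int begin;
        final int end;

        Mention(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }

        /** Wraps a bare term; nothing is known about its position in a document. */
        static Mention ofTerm(String term) {
            return new Mention(term, -1, -1);
        }

        boolean hasOffsets() {
            return begin >= 0 && end >= begin;
        }
    }

    /** Stand-in for the mention-based overload that does the actual work. */
    static String map(Mention mention) {
        String position = mention.hasOffsets()
                ? mention.begin + "-" + mention.end
                : "unknown";
        return mention.text + " @" + position;
    }

    /** Convenience overload: wraps the term and delegates to the overload above. */
    static String map(String term) {
        return map(Mention.ofTerm(term));
    }

    public static void main(String[] args) {
        System.out.println(map("IL-2"));
        System.out.println(map(new Mention("IL-2", 10, 14)));
    }
}
```

The point of the pattern is that the string overload adds no mapping logic of its own, so results differ from the mention-based call only where offsets or other mention data would matter.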
-
getConfiguration
public GeneMappingConfiguration getConfiguration()
-
-