Class HunmorphLanguageHandler
java.lang.Object
com.github.szgabsz91.morpher.languagehandlers.hunmorph.impl.HunmorphLanguageHandler
- All Implemented Interfaces:
com.github.szgabsz91.morpher.core.io.IConvertable<HunmorphLanguageHandlerMessage>, com.github.szgabsz91.morpher.core.io.ISavable, com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>, AutoCloseable
@Qualifier("com.github.szgabsz91.morpher.languagehandlers.hunmorph.HunmorphLanguageHandler")
public class HunmorphLanguageHandler
extends Object
implements com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
Hunmorph-Ocamorph based language handler implementation.
-
Constructor Summary
Constructor - Description
HunmorphLanguageHandler() - Default constructor that initializes the language handler state.
-
Method Summary
Modifier and Type, Method - Description
com.github.szgabsz91.morpher.languagehandlers.api.model.LanguageHandlerResponse analyze(com.github.szgabsz91.morpher.core.model.FrequencyAwareWord word) - Analyzes the given word, updates the internal state and returns new word pairs.
com.github.szgabsz91.morpher.languagehandlers.api.model.LanguageHandlerResponse analyze(Set<com.github.szgabsz91.morpher.core.model.FrequencyAwareWord> words) - Analyzes the given set of words, updates the internal state and returns new word pairs.
List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult> analyzeInternally(com.github.szgabsz91.morpher.core.model.FrequencyAwareWord word) - Returns a list of AnnotationTokenizerResult for the given Word.
List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult> analyzeInternally(com.github.szgabsz91.morpher.core.model.FrequencyAwareWord word, boolean guess) - Returns a list of AnnotationTokenizerResult for the given Word.
com.github.szgabsz91.morpher.languagehandlers.api.model.AffixTypeChain calculateProbabilities(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes) - Creates a new AffixTypeChain containing the given affix types.
void close() - Closes the internal HunmorphWordProcessor instance.
void fromMessage - Loads the state from the given message.
void fromMessage(com.google.protobuf.Any message) - Loads the state from the given message.
List<com.github.szgabsz91.morpher.languagehandlers.api.model.ProbabilisticAffixType> getAnalysisCandidates(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes) - Returns the list of affix type candidates for the given affix types, sorted by relative frequency, in descending order.
Map<String,List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>> getAnnotationTokenizerResultMap() - Returns the map of annotation tokenizer results.
com.github.szgabsz91.morpher.languagehandlers.api.model.ProbabilisticAffixType getEndingAnalysisCandidate(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes) - Returns a ProbabilisticAffixType that has an ending affix type and the conditional probability of ending the given chain.
getLemmaMap - Returns the map of lemmas.
getMarkovModel - Returns the underlying IMarkovModel instance to use for inflection.
List<com.github.szgabsz91.morpher.languagehandlers.api.model.ProbabilisticAffixType> getPOSCandidates(com.github.szgabsz91.morpher.core.model.Word lemma) - Returns the part of speech tag candidates of the given lemma.
getReversedMarkovModel - Returns the underlying IMarkovModel instance to use for morphological analysis.
List<com.github.szgabsz91.morpher.core.model.AffixType> getSupportedAffixTypes - Returns the list of supported affix types.
boolean isAffixTypeChainValid(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypeChain) - Returns whether the given affix type chain is valid.
boolean isPOS(com.github.szgabsz91.morpher.core.model.AffixType affixType) - Returns whether the given affix type is a part of speech tag.
void learnAffixTypeChains(Set<List<com.github.szgabsz91.morpher.core.model.AffixType>> affixTypeChains) - Learns the given AffixType chains.
void learnAnnotationTokenizerResults(Map<String, List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>> annotationTokenizerResults) - Learns the given AnnotationTokenizerResults.
void learnLemmas(com.github.szgabsz91.morpher.languagehandlers.api.model.LemmaMap lemmaMap) - Adds the words in the given map to the set of known lemmas.
void loadFrom - Loads the language handler from the given file.
void saveTo - Saves the language handler state to the given file.
void setAnnotationTokenizerResultMap(Map<String, List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>> annotationTokenizerResultMap) - Sets the map of annotation tokenizer results.
void setLemmaMap(Map<String, Set<com.github.szgabsz91.morpher.core.model.AffixType>> lemmaMap) - Sets the map of lemmas.
void setMarkovModel(IMarkovModel markovModel) - Sets the underlying IMarkovModel instance to use for inflection.
void setReversedMarkovModel(IMarkovModel reversedMarkovModel) - Sets the underlying IMarkovModel instance to use for morphological analysis.
List<com.github.szgabsz91.morpher.languagehandlers.api.model.AffixTypeChain> sortAffixTypes(com.github.szgabsz91.morpher.core.model.Word lemma, Set<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes) - Sorts the given set of affix types based on the previously learnt probabilities.
toMessage - Converts the state to a message.
-
Constructor Details
-
HunmorphLanguageHandler
public HunmorphLanguageHandler()
Default constructor that initializes the language handler state.
-
-
Method Details
-
getAnnotationTokenizerResultMap
public Map<String,List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>> getAnnotationTokenizerResultMap()
Returns the map of annotation tokenizer results.
- Returns:
- the map of annotation tokenizer results
-
setAnnotationTokenizerResultMap
public void setAnnotationTokenizerResultMap(Map<String, List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>> annotationTokenizerResultMap)
Sets the map of annotation tokenizer results.
- Parameters:
annotationTokenizerResultMap - the map of annotation tokenizer results
-
getMarkovModel
Returns the underlying IMarkovModel instance to use for inflection.
- Returns:
- the underlying IMarkovModel instance to use for inflection
-
setMarkovModel
Sets the underlying IMarkovModel instance to use for inflection.
- Parameters:
markovModel - the underlying IMarkovModel instance to use for inflection
-
getReversedMarkovModel
Returns the underlying IMarkovModel instance to use for morphological analysis.
- Returns:
- the underlying IMarkovModel instance to use for morphological analysis
-
setReversedMarkovModel
Sets the underlying IMarkovModel instance to use for morphological analysis.
- Parameters:
reversedMarkovModel - the underlying IMarkovModel instance to use for morphological analysis
-
getLemmaMap
Returns the map of lemmas.
- Returns:
- the map of lemmas
-
setLemmaMap
public void setLemmaMap(Map<String, Set<com.github.szgabsz91.morpher.core.model.AffixType>> lemmaMap)
Sets the map of lemmas.
- Parameters:
lemmaMap - the map of lemmas
-
close
public void close()
Closes the internal HunmorphWordProcessor instance.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
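Because the class implements AutoCloseable, a handler can be scoped with try-with-resources so that close() runs even if the work inside the block throws. The following is a minimal sketch of that pattern only; StubHandler and CloseSketch are hypothetical stand-ins, used here because constructing the real handler requires the Morpher library on the classpath.

```java
// Hypothetical stand-in for HunmorphLanguageHandler; only the AutoCloseable contract is modelled.
class StubHandler implements AutoCloseable {
    private boolean closed = false;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        // The real close() is documented to close the internal HunmorphWordProcessor.
        closed = true;
    }
}

class CloseSketch {
    // Uses the handler inside try-with-resources and returns it so the caller can inspect its state.
    static StubHandler useAndClose() {
        StubHandler handler = new StubHandler();
        try (StubHandler scoped = handler) {
            // Work with the handler here; close() is called automatically when the block exits.
        }
        return handler;
    }
}
```

With the real class, the same shape would apply to `new HunmorphLanguageHandler()` in the resource position.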
-
learnAnnotationTokenizerResults
public void learnAnnotationTokenizerResults(Map<String, List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>> annotationTokenizerResults)
Learns the given AnnotationTokenizerResults.
- Specified by:
learnAnnotationTokenizerResults in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
- Parameters:
annotationTokenizerResults - the map of AnnotationTokenizerResults for each affix type
-
learnAffixTypeChains
public void learnAffixTypeChains(Set<List<com.github.szgabsz91.morpher.core.model.AffixType>> affixTypeChains)
Learns the given AffixType chains.
- Specified by:
learnAffixTypeChains in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
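This page does not describe how learnt chains are stored in the underlying IMarkovModel. As an illustration of what learning a set of chains could involve, the sketch below counts first-order transitions between neighbouring affix types. The class ChainCountSketch, the START and END markers, the use of plain strings for affix types and the first-order assumption are all hypothetical and not taken from the library.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ChainCountSketch {
    // Hypothetical boundary markers; the real model may represent chain boundaries differently.
    static final String START = "<START>";
    static final String END = "<END>";

    // Counts how often each affix type follows another across all chains (first-order transitions).
    static Map<String, Map<String, Integer>> countTransitions(Set<List<String>> chains) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (List<String> chain : chains) {
            String previous = START;
            for (String affixType : chain) {
                counts.computeIfAbsent(previous, k -> new HashMap<>()).merge(affixType, 1, Integer::sum);
                previous = affixType;
            }
            // Record that the chain ended after its last affix type.
            counts.computeIfAbsent(previous, k -> new HashMap<>()).merge(END, 1, Integer::sum);
        }
        return counts;
    }
}
```

Counting the same chains in reverse order would give the kind of statistics a reversed model for analysis might use, though the page does not confirm that this is how the reversed model is built.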
-
learnLemmas
public void learnLemmas(com.github.szgabsz91.morpher.languagehandlers.api.model.LemmaMap lemmaMap)
Adds the words in the given map to the set of known lemmas.
- Specified by:
learnLemmas in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
- Parameters:
lemmaMap - map containing the lemmas and their part of speech tags
-
analyzeInternally
public List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult> analyzeInternally(com.github.szgabsz91.morpher.core.model.FrequencyAwareWord word)
Returns a list of AnnotationTokenizerResult for the given Word.
- Parameters:
word - the Word to analyze
- Returns:
- a list of AnnotationTokenizerResult
-
analyzeInternally
public List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult> analyzeInternally(com.github.szgabsz91.morpher.core.model.FrequencyAwareWord word, boolean guess)
Returns a list of AnnotationTokenizerResult for the given Word.
- Parameters:
word - the Word to analyze
guess - flag that indicates if unknown words should be analyzed or not
- Returns:
- a list of AnnotationTokenizerResult
-
analyze
public com.github.szgabsz91.morpher.languagehandlers.api.model.LanguageHandlerResponse analyze(com.github.szgabsz91.morpher.core.model.FrequencyAwareWord word)
Analyzes the given word, updates the internal state and returns new word pairs.
- Specified by:
analyze in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
- Parameters:
word - the word to analyze
- Returns:
- the response that contains a map, where the keys are affix types and values are word pair sets
-
analyze
public com.github.szgabsz91.morpher.languagehandlers.api.model.LanguageHandlerResponse analyze(Set<com.github.szgabsz91.morpher.core.model.FrequencyAwareWord> words)
Analyzes the given set of words, updates the internal state and returns new word pairs.
- Specified by:
analyze in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
- Parameters:
words - the words to analyze
- Returns:
- the response that contains a map, where the keys are affix types and values are word pair sets
-
getAnalysisCandidates
public List<com.github.szgabsz91.morpher.languagehandlers.api.model.ProbabilisticAffixType> getAnalysisCandidates(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes)
Returns the list of affix type candidates for the given affix types, sorted by relative frequency, in descending order.
- Specified by:
getAnalysisCandidates in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
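The ordering described above (candidates sorted by relative frequency, descending) can be sketched as follows. This is only an illustration of the ordering, not the library's implementation: CandidateSortSketch and its nested Candidate class are hypothetical stand-ins for ProbabilisticAffixType, and the input is assumed to be a map of raw follower counts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class CandidateSortSketch {
    // Hypothetical stand-in for ProbabilisticAffixType: an affix type label and its relative frequency.
    static final class Candidate {
        final String affixType;
        final double probability;

        Candidate(String affixType, double probability) {
            this.affixType = affixType;
            this.probability = probability;
        }

        double getProbability() {
            return probability;
        }
    }

    // Converts raw counts to relative frequencies and sorts them in descending order.
    static List<Candidate> rank(Map<String, Integer> counts) {
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        List<Candidate> candidates = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            candidates.add(new Candidate(entry.getKey(), (double) entry.getValue() / total));
        }
        candidates.sort(Comparator.comparingDouble(Candidate::getProbability).reversed());
        return candidates;
    }
}
```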
-
getEndingAnalysisCandidate
public com.github.szgabsz91.morpher.languagehandlers.api.model.ProbabilisticAffixType getEndingAnalysisCandidate(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes)
Returns a ProbabilisticAffixType that has an ending affix type and the conditional probability of ending the given chain.
- Specified by:
getEndingAnalysisCandidate in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
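One way to read "the conditional probability of ending the given chain" is as the share of observed continuations that are an end marker. The sketch below computes that ratio from follower counts. It is an assumption about the calculation, not the documented algorithm; EndingProbabilitySketch and the END marker are hypothetical names.

```java
import java.util.Map;

class EndingProbabilitySketch {
    // Hypothetical end-of-chain marker.
    static final String END = "<END>";

    // Returns count(END) / count(all followers), or 0.0 when nothing has been observed.
    static double endingProbability(Map<String, Integer> followerCounts) {
        int total = followerCounts.values().stream().mapToInt(Integer::intValue).sum();
        if (total == 0) {
            return 0.0;
        }
        return (double) followerCounts.getOrDefault(END, 0) / total;
    }
}
```

For example, if a chain was followed by the end marker once and by another affix type three times, this estimate is 1 / 4 = 0.25.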
-
getPOSCandidates
public List<com.github.szgabsz91.morpher.languagehandlers.api.model.ProbabilisticAffixType> getPOSCandidates(com.github.szgabsz91.morpher.core.model.Word lemma)
Returns the part of speech tag candidates of the given lemma.
- Specified by:
getPOSCandidates in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
-
isPOS
public boolean isPOS(com.github.szgabsz91.morpher.core.model.AffixType affixType)
Returns whether the given affix type is a part of speech tag.
- Specified by:
isPOS in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
-
isAffixTypeChainValid
public boolean isAffixTypeChainValid(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypeChain)
Returns whether the given affix type chain is valid.
- Specified by:
isAffixTypeChainValid in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
-
sortAffixTypes
public List<com.github.szgabsz91.morpher.languagehandlers.api.model.AffixTypeChain> sortAffixTypes(com.github.szgabsz91.morpher.core.model.Word lemma, Set<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes)
Sorts the given set of affix types based on the previously learnt probabilities.
- Specified by:
sortAffixTypes in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
-
calculateProbabilities
public com.github.szgabsz91.morpher.languagehandlers.api.model.AffixTypeChain calculateProbabilities(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes)
Creates a new AffixTypeChain containing the given affix types.
- Specified by:
calculateProbabilities in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
-
getSupportedAffixTypes
Returns the list of supported affix types.
- Specified by:
getSupportedAffixTypes in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
-
toMessage
Converts the state to a message.
- Specified by:
toMessage in interface com.github.szgabsz91.morpher.core.io.IConvertable<HunmorphLanguageHandlerMessage>
- Returns:
- the message
-
fromMessage
Loads the state from the given message.
- Specified by:
fromMessage in interface com.github.szgabsz91.morpher.core.io.IConvertable<HunmorphLanguageHandlerMessage>
- Parameters:
message - the message
-
fromMessage
public void fromMessage(com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
Loads the state from the given message.
- Specified by:
fromMessage in interface com.github.szgabsz91.morpher.core.io.IConvertable<HunmorphLanguageHandlerMessage>
- Parameters:
message - the message
- Throws:
com.google.protobuf.InvalidProtocolBufferException - if the provided message is not a HunmorphLanguageHandlerMessage
-
saveTo
Saves the language handler state to the given file.
- Specified by:
saveTo in interface com.github.szgabsz91.morpher.core.io.ISavable
- Parameters:
file - the file to save the state to
- Throws:
IOException - if the file cannot be written
-
loadFrom
Loads the language handler from the given file.
- Specified by:
loadFrom in interface com.github.szgabsz91.morpher.core.io.ISavable
- Parameters:
file - the file with the state of the language handler
- Throws:
IOException - if the file cannot be read
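The saveTo and loadFrom pair suggests a round trip: state written by one handler can be restored into another. The sketch below shows that pattern with a hypothetical stand-in, SavableSketch. The type of the file parameter is not shown on this page, so java.nio.file.Path is an assumption, and the plain text format used here is illustrative only; the real class may serialize its state differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a savable handler; its whole state is a list of known lemmas.
class SavableSketch {
    private List<String> lemmas = new ArrayList<>();

    void learn(String lemma) {
        lemmas.add(lemma);
    }

    List<String> lemmas() {
        return lemmas;
    }

    // Writes one lemma per line; throws IOException if the file cannot be written.
    void saveTo(Path file) throws IOException {
        Files.write(file, lemmas, StandardCharsets.UTF_8);
    }

    // Replaces the current state with the file contents; throws IOException if the file cannot be read.
    void loadFrom(Path file) throws IOException {
        lemmas = new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    // Saves the given lemmas with one instance and restores them into a fresh instance.
    static List<String> roundTrip(List<String> input) {
        try {
            Path file = Files.createTempFile("handler-state", ".txt");
            try {
                SavableSketch original = new SavableSketch();
                input.forEach(original::learn);
                original.saveTo(file);
                SavableSketch restored = new SavableSketch();
                restored.loadFrom(file);
                return restored.lemmas();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```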
-