java.lang.Object
com.github.szgabsz91.morpher.languagehandlers.hunmorph.impl.HunmorphLanguageHandler
All Implemented Interfaces:
com.github.szgabsz91.morpher.core.io.IConvertable<HunmorphLanguageHandlerMessage>, com.github.szgabsz91.morpher.core.io.ISavable, com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>, AutoCloseable

@Qualifier("com.github.szgabsz91.morpher.languagehandlers.hunmorph.HunmorphLanguageHandler") public class HunmorphLanguageHandler extends Object implements com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
Hunmorph-Ocamorph based language handler implementation.
  • Constructor Summary

    Constructors
    Constructor
    Description
    HunmorphLanguageHandler()
    Default constructor that initializes the language handler state.
  • Method Summary

    Modifier and Type
    Method
    Description
    com.github.szgabsz91.morpher.languagehandlers.api.model.LanguageHandlerResponse
    analyze(com.github.szgabsz91.morpher.core.model.FrequencyAwareWord word)
    Analyzes the given word, updates the internal state and returns new word pairs.
    com.github.szgabsz91.morpher.languagehandlers.api.model.LanguageHandlerResponse
    analyze(Set<com.github.szgabsz91.morpher.core.model.FrequencyAwareWord> words)
    Analyzes the given set of words, updates the internal state and returns new word pairs.
    List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>
    analyzeInternally(com.github.szgabsz91.morpher.core.model.FrequencyAwareWord word)
    Returns a list of AnnotationTokenizerResult for the given Word.
    List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>
    analyzeInternally(com.github.szgabsz91.morpher.core.model.FrequencyAwareWord word, boolean guess)
    Returns a list of AnnotationTokenizerResult for the given Word.
    com.github.szgabsz91.morpher.languagehandlers.api.model.AffixTypeChain
    calculateProbabilities(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes)
    Creates a new AffixTypeChain containing the given affix types.
    void
    close()
    Closes the internal HunmorphWordProcessor instance.
    void
    fromMessage(HunmorphLanguageHandlerMessage message)
    Loads the state from the given message.
    void
    fromMessage(com.google.protobuf.Any message)
    Loads the state from the given message.
    List<com.github.szgabsz91.morpher.languagehandlers.api.model.ProbabilisticAffixType>
    getAnalysisCandidates(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes)
    Returns the list of affix type candidates for the given affix types, sorted by relative frequency, in descending order.
    Map<String,List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>>
    getAnnotationTokenizerResultMap()
    Returns the map of annotation tokenizer results.
    com.github.szgabsz91.morpher.languagehandlers.api.model.ProbabilisticAffixType
    getEndingAnalysisCandidate(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes)
    Returns a ProbabilisticAffixType that has an ending affix type and the conditional probability of ending the given chain.
    Map<String,Set<com.github.szgabsz91.morpher.core.model.AffixType>>
    getLemmaMap()
    Returns the map of lemmas.
    IMarkovModel
    getMarkovModel()
    Returns the underlying IMarkovModel instance to use for inflection.
    List<com.github.szgabsz91.morpher.languagehandlers.api.model.ProbabilisticAffixType>
    getPOSCandidates(com.github.szgabsz91.morpher.core.model.Word lemma)
    Returns the list of part of speech candidates for the given lemma.
    IMarkovModel
    getReversedMarkovModel()
    Returns the underlying IMarkovModel instance to use for morphological analysis.
    List<com.github.szgabsz91.morpher.core.model.AffixType>
    getSupportedAffixTypes()
    Returns the list of supported affix types.
    boolean
    isAffixTypeChainValid(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypeChain)
    Returns whether the given affix type chain is valid.
    boolean
    isPOS(com.github.szgabsz91.morpher.core.model.AffixType affixType)
    Returns whether the given affix type is a part of speech tag.
    void
    learnAffixTypeChains(Set<List<com.github.szgabsz91.morpher.core.model.AffixType>> affixTypeChains)
    Learns the given AffixType chains.
    void
    learnAnnotationTokenizerResults(Map<String,List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>> annotationTokenizerResults)
    Learns the given AnnotationTokenizerResults.
    void
    learnLemmas(com.github.szgabsz91.morpher.languagehandlers.api.model.LemmaMap lemmaMap)
    Adds the words in the given map to the set of known lemmas.
    void
    loadFrom(Path file)
    Loads the language handler from the given file.
    void
    saveTo(Path file)
    Saves the language handler state to the given file.
    void
    setAnnotationTokenizerResultMap(Map<String,List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>> annotationTokenizerResultMap)
    Sets the map of annotation tokenizer results.
    void
    setLemmaMap(Map<String,Set<com.github.szgabsz91.morpher.core.model.AffixType>> lemmaMap)
    Sets the map of lemmas.
    void
    setMarkovModel(IMarkovModel markovModel)
    Sets the underlying IMarkovModel instance to use for inflection.
    void
    setReversedMarkovModel(IMarkovModel reversedMarkovModel)
    Sets the underlying IMarkovModel instance to use for morphological analysis.
    List<com.github.szgabsz91.morpher.languagehandlers.api.model.AffixTypeChain>
    sortAffixTypes(com.github.szgabsz91.morpher.core.model.Word lemma, Set<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes)
    Sorts the given set of affix types based on the previously learnt probabilities.
    HunmorphLanguageHandlerMessage
    toMessage()
    Converts the state to a message.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • HunmorphLanguageHandler

      public HunmorphLanguageHandler()
      Default constructor that initializes the language handler state.
  • Method Details

    • getAnnotationTokenizerResultMap

      public Map<String,List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>> getAnnotationTokenizerResultMap()
      Returns the map of annotation tokenizer results.
      Returns:
      the map of annotation tokenizer results
    • setAnnotationTokenizerResultMap

      public void setAnnotationTokenizerResultMap(Map<String,List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>> annotationTokenizerResultMap)
      Sets the map of annotation tokenizer results.
      Parameters:
      annotationTokenizerResultMap - the map of annotation tokenizer results
    • getMarkovModel

      public IMarkovModel getMarkovModel()
      Returns the underlying IMarkovModel instance to use for inflection.
      Returns:
      the underlying IMarkovModel instance to use for inflection
    • setMarkovModel

      public void setMarkovModel(IMarkovModel markovModel)
      Sets the underlying IMarkovModel instance to use for inflection.
      Parameters:
      markovModel - the underlying IMarkovModel instance to use for inflection
    • getReversedMarkovModel

      public IMarkovModel getReversedMarkovModel()
      Returns the underlying IMarkovModel instance to use for morphological analysis.
      Returns:
      the underlying IMarkovModel instance to use for morphological analysis
    • setReversedMarkovModel

      public void setReversedMarkovModel(IMarkovModel reversedMarkovModel)
      Sets the underlying IMarkovModel instance to use for morphological analysis.
      Parameters:
      reversedMarkovModel - the underlying IMarkovModel instance to use for morphological analysis
    • getLemmaMap

      public Map<String,Set<com.github.szgabsz91.morpher.core.model.AffixType>> getLemmaMap()
      Returns the map of lemmas.
      Returns:
      the map of lemmas
    • setLemmaMap

      public void setLemmaMap(Map<String,Set<com.github.szgabsz91.morpher.core.model.AffixType>> lemmaMap)
      Sets the map of lemmas.
      Parameters:
      lemmaMap - the map of lemmas
    • close

      public void close()
      Closes the internal HunmorphWordProcessor instance.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
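Because the class implements AutoCloseable, a handler can be scoped with try-with-resources so that close() runs even if analysis throws. The sketch below is only an illustration of that contract: StandInHandler is a hypothetical stand-in, used so the snippet compiles without the Morpher library on the classpath, and it models nothing except the closed flag.

```java
class CloseSketch {
    // Hypothetical stand-in for HunmorphLanguageHandler; only the AutoCloseable contract is modeled.
    static final class StandInHandler implements AutoCloseable {
        private boolean closed;

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            // The documented class closes its internal HunmorphWordProcessor at this point.
            closed = true;
        }
    }

    // Uses the handler inside try-with-resources and reports whether it was closed afterwards.
    static boolean useAndClose() {
        StandInHandler handler = new StandInHandler();
        try (StandInHandler scoped = handler) {
            // With the real handler, calls such as analyze(...) would go here.
            if (scoped.isClosed()) {
                throw new IllegalStateException("handler must be open inside the block");
            }
        }
        return handler.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("closed after try block: " + useAndClose());
    }
}
```

Since close() is declared here without a throws clause, the try block needs no catch.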
    • learnAnnotationTokenizerResults

      public void learnAnnotationTokenizerResults(Map<String,List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult>> annotationTokenizerResults)
      Learns the given AnnotationTokenizerResults.
      Specified by:
      learnAnnotationTokenizerResults in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
      Parameters:
      annotationTokenizerResults - the map of AnnotationTokenizerResults for each affix type
    • learnAffixTypeChains

      public void learnAffixTypeChains(Set<List<com.github.szgabsz91.morpher.core.model.AffixType>> affixTypeChains)
      Learns the given AffixType chains.
      Specified by:
      learnAffixTypeChains in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
    • learnLemmas

      public void learnLemmas(com.github.szgabsz91.morpher.languagehandlers.api.model.LemmaMap lemmaMap)
      Adds the words in the given map to the set of known lemmas.
      Specified by:
      learnLemmas in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
      Parameters:
      lemmaMap - map containing the lemmas and their part of speech tags
    • analyzeInternally

      public List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult> analyzeInternally(com.github.szgabsz91.morpher.core.model.FrequencyAwareWord word)
      Returns a list of AnnotationTokenizerResult for the given Word.
      Parameters:
      word - the Word to analyze
      Returns:
      a list of AnnotationTokenizerResult
    • analyzeInternally

      public List<com.github.szgabsz91.morpher.languagehandlers.api.model.AnnotationTokenizerResult> analyzeInternally(com.github.szgabsz91.morpher.core.model.FrequencyAwareWord word, boolean guess)
      Returns a list of AnnotationTokenizerResult for the given Word.
      Parameters:
      word - the Word to analyze
      guess - flag indicating whether unknown words should also be analyzed
      Returns:
      a list of AnnotationTokenizerResult
    • analyze

      public com.github.szgabsz91.morpher.languagehandlers.api.model.LanguageHandlerResponse analyze(com.github.szgabsz91.morpher.core.model.FrequencyAwareWord word)
      Analyzes the given word, updates the internal state and returns new word pairs.
      Specified by:
      analyze in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
      Parameters:
      word - the word to analyze
      Returns:
      the response that contains a map, where the keys are affix types and values are word pair sets
    • analyze

      public com.github.szgabsz91.morpher.languagehandlers.api.model.LanguageHandlerResponse analyze(Set<com.github.szgabsz91.morpher.core.model.FrequencyAwareWord> words)
      Analyzes the given set of words, updates the internal state and returns new word pairs.
      Specified by:
      analyze in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
      Parameters:
      words - the words to analyze
      Returns:
      the response that contains a map, where the keys are affix types and values are word pair sets
    • getAnalysisCandidates

      public List<com.github.szgabsz91.morpher.languagehandlers.api.model.ProbabilisticAffixType> getAnalysisCandidates(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes)
      Returns the list of affix type candidates for the given affix types, sorted by relative frequency, in descending order.
      Specified by:
      getAnalysisCandidates in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
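The ordering described above (relative frequency, descending) can be sketched with plain collections. This is not the library's implementation: the Candidate class is a hypothetical stand-in for ProbabilisticAffixType, the counts are made-up sample data, and the affix type labels are illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CandidateSortSketch {
    // Hypothetical stand-in for ProbabilisticAffixType: a label plus its relative frequency.
    static final class Candidate {
        final String affixType;
        final double probability;

        Candidate(String affixType, double probability) {
            this.affixType = affixType;
            this.probability = probability;
        }
    }

    // Converts raw counts to relative frequencies and sorts them in descending order.
    static List<Candidate> rank(Map<String, Integer> counts) {
        long total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        List<Candidate> candidates = new ArrayList<>();
        if (total == 0) {
            return candidates; // nothing learnt yet, so there are no candidates
        }
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            candidates.add(new Candidate(entry.getKey(), entry.getValue() / (double) total));
        }
        candidates.sort(Comparator.comparingDouble((Candidate c) -> c.probability).reversed());
        return candidates;
    }

    // Made-up sample counts of affix types observed after some chain.
    static Map<String, Integer> sampleCounts() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("<CAS<ACC>>", 3);
        counts.put("<PLUR>", 6);
        counts.put("<CAS<INE>>", 1);
        return counts;
    }

    public static void main(String[] args) {
        for (Candidate candidate : rank(sampleCounts())) {
            System.out.println(candidate.affixType + " " + candidate.probability);
        }
    }
}
```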
    • getEndingAnalysisCandidate

      public com.github.szgabsz91.morpher.languagehandlers.api.model.ProbabilisticAffixType getEndingAnalysisCandidate(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes)
      Returns a ProbabilisticAffixType that has an ending affix type and the conditional probability of ending the given chain.
      Specified by:
      getEndingAnalysisCandidate in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
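One way to read "the conditional probability of ending the given chain" is as a frequency ratio: of all learnt chains that begin with the given affix types, the share that stop exactly there. The sketch below computes that ratio over made-up sample chains. It is an assumption for illustration only; the documented class delegates to an IMarkovModel whose estimation method is not described here, and the string labels stand in for AffixType values.

```java
import java.util.Arrays;
import java.util.List;

class EndingProbabilitySketch {
    // Estimates P(end | chain) as: chains equal to `chain` / chains starting with `chain`.
    static double endingProbability(List<List<String>> learntChains, List<String> chain) {
        int startingWithChain = 0;
        int endingAtChain = 0;
        for (List<String> learnt : learntChains) {
            boolean hasPrefix = learnt.size() >= chain.size()
                    && learnt.subList(0, chain.size()).equals(chain);
            if (hasPrefix) {
                startingWithChain++;
                if (learnt.size() == chain.size()) {
                    endingAtChain++;
                }
            }
        }
        // An unseen chain gets probability 0 in this sketch.
        return startingWithChain == 0 ? 0.0 : (double) endingAtChain / startingWithChain;
    }

    // Made-up sample data; the labels are illustrative, not taken from the library.
    static List<List<String>> sampleChains() {
        return Arrays.asList(
                Arrays.asList("/NOUN", "<PLUR>"),
                Arrays.asList("/NOUN", "<PLUR>", "<CAS<ACC>>"),
                Arrays.asList("/NOUN", "<PLUR>", "<CAS<ACC>>"),
                Arrays.asList("/NOUN", "<CAS<ACC>>"));
    }

    public static void main(String[] args) {
        List<String> chain = Arrays.asList("/NOUN", "<PLUR>");
        System.out.println(endingProbability(sampleChains(), chain));
    }
}
```

In the sample data, three chains start with the queried chain and one of them ends there, so the estimate is one third.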
    • getPOSCandidates

      public List<com.github.szgabsz91.morpher.languagehandlers.api.model.ProbabilisticAffixType> getPOSCandidates(com.github.szgabsz91.morpher.core.model.Word lemma)
      Returns the list of part of speech candidates for the given lemma.
      Specified by:
      getPOSCandidates in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
    • isPOS

      public boolean isPOS(com.github.szgabsz91.morpher.core.model.AffixType affixType)
      Returns whether the given affix type is a part of speech tag.
      Specified by:
      isPOS in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
    • isAffixTypeChainValid

      public boolean isAffixTypeChainValid(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypeChain)
      Returns whether the given affix type chain is valid.
      Specified by:
      isAffixTypeChainValid in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
    • sortAffixTypes

      public List<com.github.szgabsz91.morpher.languagehandlers.api.model.AffixTypeChain> sortAffixTypes(com.github.szgabsz91.morpher.core.model.Word lemma, Set<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes)
      Sorts the given set of affix types based on the previously learnt probabilities.
      Specified by:
      sortAffixTypes in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
    • calculateProbabilities

      public com.github.szgabsz91.morpher.languagehandlers.api.model.AffixTypeChain calculateProbabilities(List<com.github.szgabsz91.morpher.core.model.AffixType> affixTypes)
      Creates a new AffixTypeChain containing the given affix types.
      Specified by:
      calculateProbabilities in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
    • getSupportedAffixTypes

      public List<com.github.szgabsz91.morpher.core.model.AffixType> getSupportedAffixTypes()
      Returns the list of supported affix types.
      Specified by:
      getSupportedAffixTypes in interface com.github.szgabsz91.morpher.languagehandlers.api.ILanguageHandler<HunmorphLanguageHandlerMessage>
    • toMessage

      public HunmorphLanguageHandlerMessage toMessage()
      Converts the state to a message.
      Specified by:
      toMessage in interface com.github.szgabsz91.morpher.core.io.IConvertable<HunmorphLanguageHandlerMessage>
      Returns:
      the message
    • fromMessage

      public void fromMessage(HunmorphLanguageHandlerMessage message)
      Loads the state from the given message.
      Specified by:
      fromMessage in interface com.github.szgabsz91.morpher.core.io.IConvertable<HunmorphLanguageHandlerMessage>
      Parameters:
      message - the message
    • fromMessage

      public void fromMessage(com.google.protobuf.Any message) throws com.google.protobuf.InvalidProtocolBufferException
      Loads the state from the given message.
      Specified by:
      fromMessage in interface com.github.szgabsz91.morpher.core.io.IConvertable<HunmorphLanguageHandlerMessage>
      Parameters:
      message - the message
      Throws:
      com.google.protobuf.InvalidProtocolBufferException - if the provided message is not a HunmorphLanguageHandlerMessage
    • saveTo

      public void saveTo(Path file) throws IOException
      Saves the language handler state to the given file.
      Specified by:
      saveTo in interface com.github.szgabsz91.morpher.core.io.ISavable
      Parameters:
      file - the file to save the state to
      Throws:
      IOException - if the file cannot be written
    • loadFrom

      public void loadFrom(Path file) throws IOException
      Loads the language handler from the given file.
      Specified by:
      loadFrom in interface com.github.szgabsz91.morpher.core.io.ISavable
      Parameters:
      file - the file with the state of the language handler
      Throws:
      IOException - if the file cannot be read
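The saveTo and loadFrom pair is typically used as a round trip: train one handler, persist it, then restore the state into a fresh instance. The sketch below shows that pattern with a hypothetical StandInHandler whose state is just a list of lemmas written as plain text. The documented class persists its own message format instead, so only the call pattern and the IOException handling carry over.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SaveLoadSketch {
    // Hypothetical stand-in that mirrors the saveTo/loadFrom signatures of the documented class.
    static final class StandInHandler {
        private List<String> lemmas = new ArrayList<>();

        void learn(String lemma) {
            lemmas.add(lemma);
        }

        List<String> lemmas() {
            return lemmas;
        }

        void saveTo(Path file) throws IOException {
            Files.write(file, lemmas, StandardCharsets.UTF_8);
        }

        void loadFrom(Path file) throws IOException {
            lemmas = new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
    }

    // Saves a trained stand-in to a temporary file and restores it into a fresh instance.
    static List<String> roundTrip(List<String> lemmas) {
        try {
            Path file = Files.createTempFile("handler-state", ".txt");
            try {
                StandInHandler trained = new StandInHandler();
                for (String lemma : lemmas) {
                    trained.learn(lemma);
                }
                trained.saveTo(file);

                StandInHandler restored = new StandInHandler();
                restored.loadFrom(file);
                return restored.lemmas();
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            // Wrapped so that callers of this sketch need no checked-exception handling.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Arrays.asList("alma", "kutya")));
    }
}
```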