Class OptimaizeLangDetector


  • public class OptimaizeLangDetector
    extends org.apache.tika.language.detect.LanguageDetector
    Implementation of the LanguageDetector API backed by the Optimaize language-detector library (https://github.com/optimaize/language-detector).
    • Field Detail

      • DEFAULT_MAX_CHARS_FOR_DETECTION

        public static final int DEFAULT_MAX_CHARS_FOR_DETECTION
        See Also:
        Constant Field Values
      • DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION

        public static final int DEFAULT_MAX_CHARS_FOR_SHORT_DETECTION
        See Also:
        Constant Field Values
    • Constructor Detail

      • OptimaizeLangDetector

        public OptimaizeLangDetector()
      • OptimaizeLangDetector

        public OptimaizeLangDetector(int maxCharsForDetection)
    • Method Detail

      • loadModels

        public org.apache.tika.language.detect.LanguageDetector loadModels()
        Specified by:
        loadModels in class org.apache.tika.language.detect.LanguageDetector
      • loadModels

        public org.apache.tika.language.detect.LanguageDetector loadModels(Set<String> languages)
                                                                    throws IOException
        Specified by:
        loadModels in class org.apache.tika.language.detect.LanguageDetector
        Throws:
        IOException
      • hasModel

        public boolean hasModel(String language)
        Specified by:
        hasModel in class org.apache.tika.language.detect.LanguageDetector
      • setPriors

        public org.apache.tika.language.detect.LanguageDetector setPriors(Map<String, Float> languageProbabilities)
                                                                   throws IOException
        Specified by:
        setPriors in class org.apache.tika.language.detect.LanguageDetector
        Throws:
        IOException
      • reset

        public void reset()
        Specified by:
        reset in class org.apache.tika.language.detect.LanguageDetector
      • addText

        public void addText(char[] cbuf,
                            int off,
                            int len)
        Specified by:
        addText in class org.apache.tika.language.detect.LanguageDetector
      • detectAll

        public List<org.apache.tika.language.detect.LanguageResult> detectAll()
        Specified by:
        detectAll in class org.apache.tika.language.detect.LanguageDetector
        Returns:
        the list of detected languages
        Throws:
        IllegalStateException - if no models have been loaded with loadModels() or loadModels(java.util.Set)
      • hasEnoughText

        public boolean hasEnoughText()
        Overrides:
        hasEnoughText in class org.apache.tika.language.detect.LanguageDetector
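
The addText(char[] cbuf, int off, int len) and hasEnoughText() signatures above suggest a streaming pattern: forward text in chunks and stop once the detector reports it has enough. The following is a minimal sketch of that loop, not the library's own code. TextSink and CountingSink are hypothetical stand-ins introduced only so the example runs without Tika on the classpath, and the assumption that hasEnoughText() turns true once a character limit is reached is inferred from the maxCharsForDetection constructor parameter, not confirmed by this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class AddTextSketch {

    /** Hypothetical stand-in mirroring the two detector methods used here. */
    interface TextSink {
        void addText(char[] cbuf, int off, int len);
        boolean hasEnoughText();
    }

    /**
     * Reads the reader in fixed-size chunks and forwards each chunk to the
     * sink, stopping early once the sink reports that it has enough text.
     * Returns the number of characters forwarded.
     */
    static int feed(Reader reader, TextSink sink, int bufferSize) throws IOException {
        char[] buf = new char[bufferSize];
        int total = 0;
        int n;
        while (!sink.hasEnoughText() && (n = reader.read(buf, 0, buf.length)) != -1) {
            sink.addText(buf, 0, n);
            total += n;
        }
        return total;
    }

    /** Toy sink (an assumption, not Tika code) that keeps at most maxChars characters. */
    static final class CountingSink implements TextSink {
        private final int maxChars;
        private final StringBuilder text = new StringBuilder();

        CountingSink(int maxChars) {
            this.maxChars = maxChars;
        }

        @Override
        public void addText(char[] cbuf, int off, int len) {
            // Append only as much as still fits under the limit.
            int room = maxChars - text.length();
            if (room > 0) {
                text.append(cbuf, off, Math.min(len, room));
            }
        }

        @Override
        public boolean hasEnoughText() {
            return text.length() >= maxChars;
        }

        String collected() {
            return text.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        CountingSink sink = new CountingSink(10);
        int forwarded = feed(new StringReader("The quick brown fox jumps"), sink, 4);
        System.out.println(forwarded + " chars forwarded, kept: \"" + sink.collected() + "\"");
    }
}
```

With a real detector, the same loop would call the detector's own addText and hasEnoughText in place of the stand-in sink, after models have been loaded with loadModels(), and then call detectAll() to obtain the results.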