Class GeneMentionFilter


  • public class GeneMentionFilter
    extends Object
    • Constructor Detail

      • GeneMentionFilter

        public GeneMentionFilter()
    • Method Detail

      • preFilter

        public static void preFilter(de.julielab.geneexpbase.genemodel.GeneDocument document)
      • intersectionFilter

        public static void intersectionFilter(de.julielab.geneexpbase.genemodel.GeneDocument document,
                                              de.julielab.geneexpbase.TermNormalizer normalizer,
                                              boolean onlySortOutEmptyCandidateLists)
                                       throws IOException

        Must be called after all gene mentions have received their candidates. In preliminary experiments, this filter slightly improved precision but reduced recall by a somewhat larger margin, so the overall F-score on IGN train was slightly lower with the filter enabled.

        A variant is to perform the filtering but, in the end, reject only those mentions whose candidate list is left completely empty, leaving all other lists untouched. This variant helps slightly on IGN train.

        Parameters:
        document - The gene document.
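        normalizer - The term normalizer in use, taken from the mapping core or the disambiguation instance.
        onlySortOutEmptyCandidateLists - Use the second strategy described above: reject only those mentions whose candidate list ends up empty and leave all other lists untouched.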
        Throws:
        IOException
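
The difference between the two strategies described for intersectionFilter can be illustrated with a minimal, self-contained sketch. It does not use GeneDocument or TermNormalizer and is not the actual implementation: the class IntersectionFilterSketch, the method filter, the mention-to-candidates map, and the set of admissible IDs are all hypothetical stand-ins chosen for illustration. What the real method intersects the candidates with is not stated here, and the sketch additionally assumes that the strict strategy also drops mentions whose list becomes empty.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IntersectionFilterSketch {

    /**
     * Hypothetical stand-in for the filtering step (not the library's API).
     *
     * @param candidatesByMention mention text mapped to its candidate IDs (assumed data shape)
     * @param admissibleIds       IDs that survive the intersection (assumed input)
     * @param onlySortOutEmptyCandidateLists if true, keep original lists and only
     *                            reject mentions whose intersected list is empty
     */
    static Map<String, List<String>> filter(Map<String, List<String>> candidatesByMention,
                                            Set<String> admissibleIds,
                                            boolean onlySortOutEmptyCandidateLists) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : candidatesByMention.entrySet()) {
            // Always compute the intersection first.
            List<String> intersected = new ArrayList<>(entry.getValue());
            intersected.retainAll(admissibleIds);
            if (intersected.isEmpty()) {
                // Assumption: both strategies reject a mention with no surviving candidate.
                continue;
            }
            // Strict strategy keeps the reduced list; the variant keeps the original list.
            result.put(entry.getKey(),
                    onlySortOutEmptyCandidateLists ? new ArrayList<>(entry.getValue()) : intersected);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> candidates = new LinkedHashMap<>();
        candidates.put("p53", List.of("7157", "22059"));
        candidates.put("cat", List.of("847"));
        Set<String> admissible = Set.of("7157");

        System.out.println(filter(candidates, admissible, false)); // prints {p53=[7157]}
        System.out.println(filter(candidates, admissible, true));  // prints {p53=[7157, 22059]}
    }
}
```

In both runs the mention "cat" is rejected because none of its candidates survive. Only the strict run shortens the candidate list of "p53", which fits the reported trade-off: the strict strategy removes more candidates and so costs more recall.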