Class MutationFinder


  • public class MutationFinder
    extends MutationExtractor
    This is the Java implementation of MutationFinder (original version in Python by J. Gregory Caporaso).
    Version:
    1.0
    Author:
    William A. Baumgartner, Jr.
    william.baumgartner@uchsc.edu
    • Constructor Detail

      • MutationFinder

        public MutationFinder​(String fileName)
        Initialization of MutationFinder requires a set of regular expressions that will be used to detect mutations. This constructor loads the regular expressions from a file designated by the filename input parameter.

        Parameters:
        fileName - the path of the file containing the regular expressions. Because MutationFinder was originally developed in Python, the file contains Python-specific regular expressions: they use Python's named-group syntax, (?P<name>...), which Java's regular expression engine does not accept. The expressions must therefore be converted before use in the Java implementation. This constructor handles the conversion.
      • MutationFinder

        public MutationFinder​(File file)
        Initialization of MutationFinder requires a set of regular expressions that will be used to detect mutations. This constructor loads the regular expressions from a file designated by the Java File input parameter.

        Parameters:
        file - the file containing the regular expressions. Because MutationFinder was originally developed in Python, the file contains Python-specific regular expressions: they use Python's named-group syntax, (?P<name>...), which Java's regular expression engine does not accept. The expressions must therefore be converted before use in the Java implementation. This constructor handles the conversion.
      • MutationFinder

        public MutationFinder​(Set<String> unprocessed_python_regexes)
        Initialization of MutationFinder requires a set of regular expressions that will be used to detect mutations. This constructor loads the regular expressions from a Set of Strings representing the regular expressions.

        Parameters:
        unprocessed_python_regexes - the regular expressions, in their original Python form. Because MutationFinder was originally developed in Python, the expressions use Python's named-group syntax, (?P<name>...), which Java's regular expression engine does not accept. They must therefore be converted before use in the Java implementation. This constructor handles the conversion.
    • Method Detail

      • extractMappingsFromPythonRegex

        public static Map<String,​Integer> extractMappingsFromPythonRegex​(String pythonRegex)
      • removeTagsFromPythonRegex

        public static String removeTagsFromPythonRegex​(String regexStr)
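
        The two static helpers above carry no description. Going by the constructor notes, one plausible reading is that the first records which capturing-group index each Python group name refers to, and the second strips the names so that Java can compile the pattern. The following is a minimal sketch of that idea, not the library's actual implementation. The class name NamedGroupSketch, its method names, and the group names wt_res, pos, and mut_res are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not part of MutationFinder.
final class NamedGroupSketch {

    /** Maps each Python-style group name to its 1-based capturing-group index. */
    static Map<String, Integer> groupIndexes(String pythonRegex) {
        Map<String, Integer> indexes = new LinkedHashMap<>();
        int groupCount = 0;
        boolean inCharClass = false;
        for (int i = 0; i < pythonRegex.length(); i++) {
            char c = pythonRegex.charAt(i);
            if (c == '\\') { i++; continue; }           // skip the escaped character
            if (inCharClass) {                          // '(' inside [...] is a literal
                if (c == ']') inCharClass = false;
                continue;
            }
            if (c == '[') { inCharClass = true; continue; }
            if (c != '(') continue;
            if (pythonRegex.startsWith("(?P<", i)) {    // named capturing group
                int end = pythonRegex.indexOf('>', i);
                indexes.put(pythonRegex.substring(i + 4, end), ++groupCount);
            } else if (!pythonRegex.startsWith("(?", i)) {
                groupCount++;                           // plain capturing group
            }                                           // "(?:", "(?=", ... do not capture
        }
        return indexes;
    }

    /** Rewrites every (?P<name>...) opener as a plain capturing "(". */
    static String stripNames(String pythonRegex) {
        return pythonRegex.replaceAll("\\(\\?P<\\w+>", "(");
    }

    public static void main(String[] args) {
        String pythonRegex = "(?P<wt_res>[A-Z])(?P<pos>[1-9][0-9]*)(?P<mut_res>[A-Z])";
        Map<String, Integer> indexes = groupIndexes(pythonRegex);
        Matcher m = Pattern.compile(stripNames(pythonRegex)).matcher("A42G");
        if (m.matches()) {
            // Look each named group up through its recorded index.
            System.out.println(m.group(indexes.get("wt_res")) + " at "
                    + m.group(indexes.get("pos")) + " becomes "
                    + m.group(indexes.get("mut_res")));
        }
    }
}
```

        The sketch makes simplifying assumptions: it does not handle a literal ']' placed first inside a character class, and stripNames does not check whether the opening parenthesis is escaped.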
      • main

        public static void main​(String[] args)
        The main method demonstrates the execution of MutationFinder. Three input arguments are required: the regular expression file used by MutationFinder, an input file to process, and an output file to which the generated results are written.

        The input file in this case contains one document per line, where each line takes the format:

        documentID<tab>documentText

        The output file will contain the mutations found for each document (one document per line) where each line takes the format:

        documentID<tab>mutation<tab>mutation<tab>mutation...
        Parameters:
        args - args[0] - the regular expression file
        args[1] - the input file containing text to process
        args[2] - the output file
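
        The line formats described for main can be sketched as a pair of helpers. This is a minimal illustration that assumes tab-separated fields. The class LineFormatSketch and its methods are hypothetical and not part of MutationFinder, and the mutation strings are placeholders.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: assumes fields are separated by a single tab.
final class LineFormatSketch {

    /** Splits an input line into {documentID, documentText} at the first tab. */
    static String[] parseInputLine(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("no tab separator in line: " + line);
        }
        return new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }

    /** Joins a document ID and its mutations into one output line. */
    static String formatOutputLine(String documentId, List<String> mutations) {
        StringBuilder sb = new StringBuilder(documentId);
        for (String mutation : mutations) {
            sb.append('\t').append(mutation);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] fields = parseInputLine("doc1\tWe constructed A42G and L22G.");
        // Placeholder mutation strings; the real output format of each mutation
        // is determined by MutationFinder itself.
        System.out.println(formatOutputLine(fields[0], Arrays.asList("A42G", "L22G")));
    }
}
```

        Splitting at the first tab only means that any further tabs stay inside the document text instead of truncating it.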
      • extractMutations

        public Map<Mutation,​Set<int[]>> extractMutations​(String rawText)
                                                        throws MutationException
        Extracts point mutation mentions from rawText and returns them in a map.

        The result of this method is a mapping of PointMutation objects to a set of spans (int arrays of size 2) where they were identified. Spans are presented in the form of character-offsets in text.

        Example result:
        raw_text: 'We constructed A42G and L22G, and crystalized A42G.'
        result = {PointMutation(42,'A','G'):[(15,19),(46,50)],
        PointMutation(22,'L','G'):[(24,28)]}

        Note that the spans won't necessarily be in increasing order, due to the order of processing regular expressions.

        Specified by:
        extractMutations in class MutationExtractor
        Parameters:
        rawText - the text to be processed
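        Returns:
        a map from each identified mutation to the set of spans (int arrays of size 2, given as character offsets into rawText) at which it was found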
        Throws:
        MutationException
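
        The span convention used by extractMutations can be illustrated without the library. The sketch below uses a single, deliberately simple stand-in pattern (an assumption, far narrower than MutationFinder's regular expression set) and plain strings in place of Mutation objects. It collects (start, end) character offsets per mention and sorts them, since the real method does not guarantee span order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: not part of MutationFinder.
final class SpanMapSketch {

    // Stand-in pattern: wild-type residue, position, mutant residue (e.g. A42G).
    private static final Pattern SIMPLE =
            Pattern.compile("\\b([A-Z])([1-9][0-9]*)([A-Z])\\b");

    /** Maps each matched mention to its {start, end} offsets, sorted by start. */
    static Map<String, List<int[]>> findSpans(String text) {
        Map<String, List<int[]>> result = new LinkedHashMap<>();
        Matcher m = SIMPLE.matcher(text);
        while (m.find()) {
            result.computeIfAbsent(m.group(), k -> new ArrayList<>())
                  .add(new int[] {m.start(), m.end()});
        }
        for (List<int[]> spans : result.values()) {
            spans.sort(Comparator.comparingInt((int[] s) -> s[0]));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "We constructed A42G and L22G, and crystalized A42G.";
        findSpans(text).forEach((mutation, spans) -> {
            StringBuilder sb = new StringBuilder(mutation).append(':');
            for (int[] s : spans) {
                sb.append(" (").append(s[0]).append(',').append(s[1]).append(')');
            }
            System.out.println(sb);
        });
        // Prints:
        // A42G: (15,19) (46,50)
        // L22G: (24,28)
    }
}
```

        The sketch stores spans in a List because Java arrays compare by identity: a Set<int[]>, as returned by extractMutations, does not treat two arrays holding the same offsets as equal, so callers should compare spans element by element.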