Package edu.uchsc.ccp.nlp.ei.mutation
Class MutationFinder
- java.lang.Object
-
- edu.uchsc.ccp.nlp.ei.mutation.MutationExtractor
-
- edu.uchsc.ccp.nlp.ei.mutation.MutationFinder
-
public class MutationFinder extends MutationExtractor
This is the Java implementation of MutationFinder (original version in Python by J. Gregory Caporaso).
- Version:
- 1.0
- Author:
- William A. Baumgartner, Jr.
william.baumgartner@uchsc.edu
-
-
Constructor Summary
Constructors
MutationFinder(File file)
Initialization of MutationFinder requires a set of regular expressions that will be used to detect mutations.
MutationFinder(InputStream is)
MutationFinder(String fileName)
Initialization of MutationFinder requires a set of regular expressions that will be used to detect mutations.
MutationFinder(Set<String> unprocessed_python_regexes)
Initialization of MutationFinder requires a set of regular expressions that will be used to detect mutations.
-
Method Summary
Modifier and Type, Method, Description
static Map<String,Integer> extractMappingsFromPythonRegex(String pythonRegex)
Map<Mutation,Set<int[]>> extractMutations(String rawText)
Extract point mutation mentions from rawText and return them in a map.
static void main(String[] args)
The main method demonstrates the execution of MutationFinder.
static String removeTagsFromPythonRegex(String regexStr)
-
Methods inherited from class edu.uchsc.ccp.nlp.ei.mutation.MutationExtractor
error, populateAminoAcidNameToOneLookupMap, populateAminoAcidThreeToOneLookupMap, warn
-
Field Detail
-
MUT_RES
protected static final String MUT_RES
- See Also:
- Constant Field Values
-
WT_RES
protected static final String WT_RES
- See Also:
- Constant Field Values
-
POS
protected static final String POS
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
MutationFinder
public MutationFinder(String fileName)
Initialization of MutationFinder requires a set of regular expressions that will be used to detect mutations. This constructor loads the regular expressions from a file designated by the filename input parameter.
- Parameters:
fileName - the name of the regular expression file. Because MutationFinder was originally developed in Python, the file contains Python-specific regular expressions: they use Python's named-group syntax, (?P<name>...), which Java's regex engine does not accept. The expressions must therefore be converted before use in the Java implementation; this constructor performs the conversion.
-
MutationFinder
public MutationFinder(InputStream is) throws IOException
- Throws:
IOException
-
MutationFinder
public MutationFinder(File file)
Initialization of MutationFinder requires a set of regular expressions that will be used to detect mutations. This constructor loads the regular expressions from a file designated by the Java File input parameter.
- Parameters:
file - the regular expression file. Because MutationFinder was originally developed in Python, the file contains Python-specific regular expressions: they use Python's named-group syntax, (?P<name>...), which Java's regex engine does not accept. The expressions must therefore be converted before use in the Java implementation; this constructor performs the conversion.
-
MutationFinder
public MutationFinder(Set<String> unprocessed_python_regexes)
Initialization of MutationFinder requires a set of regular expressions that will be used to detect mutations. This constructor loads the regular expressions from a Set of Strings representing the regular expressions.
- Parameters:
unprocessed_python_regexes - the regular expressions, in their original Python form. Because MutationFinder was originally developed in Python, the expressions use Python's named-group syntax, (?P<name>...), which Java's regex engine does not accept. They must therefore be converted before use in the Java implementation; this constructor performs the conversion.
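The conversion described above can be illustrated with a minimal sketch. It is not the MutationFinder source: the class PythonRegexSketch, its two methods, and the group names wt_res, pos and mut_res are hypothetical, chosen only to show one way of recording which capturing-group index each Python group name refers to and then stripping the names so that Java can compile the pattern. It assumes well-formed input and does not handle parentheses inside character classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of MutationFinder.
final class PythonRegexSketch {

    /** Maps each Python group name to its 1-based capturing-group index. */
    static Map<String, Integer> groupIndexes(String pythonRegex) {
        Map<String, Integer> indexes = new LinkedHashMap<>();
        int groupCount = 0;
        for (int i = 0; i < pythonRegex.length(); i++) {
            char c = pythonRegex.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, e.g. "\("
                continue;
            }
            if (c != '(') {
                continue;
            }
            if (pythonRegex.startsWith("(?P<", i)) {
                int close = pythonRegex.indexOf('>', i);
                if (close < 0) {
                    throw new IllegalArgumentException("Unterminated group name");
                }
                groupCount++;
                indexes.put(pythonRegex.substring(i + 4, close), groupCount);
                i = close;
            } else if (!pythonRegex.startsWith("(?", i)) {
                groupCount++; // plain capturing group; "(?:" and similar do not capture
            }
        }
        return indexes;
    }

    /** Rewrites every "(?P<name>" as "(" so that Java can compile the pattern. */
    static String stripNames(String pythonRegex) {
        return pythonRegex.replaceAll("\\(\\?P<\\w+>", "(");
    }

    public static void main(String[] args) {
        // Illustrative pattern; the group names are assumptions.
        String py = "(?P<wt_res>[A-Z])(?P<pos>[1-9][0-9]*)(?P<mut_res>[A-Z])";
        Map<String, Integer> idx = groupIndexes(py);
        Matcher m = Pattern.compile(stripNames(py)).matcher("A42G");
        if (m.matches()) {
            // prints "A 42 G"
            System.out.println(m.group(idx.get("wt_res")) + " "
                    + m.group(idx.get("pos")) + " "
                    + m.group(idx.get("mut_res")));
        }
    }
}
```

Keeping the name-to-index map next to the stripped pattern lets calling code keep referring to groups by name even though the compiled Java pattern only has numbered groups.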
-
-
Method Detail
-
extractMappingsFromPythonRegex
public static Map<String,Integer> extractMappingsFromPythonRegex(String pythonRegex)
-
main
public static void main(String[] args)
The main method demonstrates the execution of MutationFinder. Three input arguments are required: the regular expression file used by MutationFinder, an input file to process, and an output file to which the generated results are written.
The input file in this case contains one document per line, where each line takes the format:
documentIDdocumentText
The output file will contain the mutations found for each document (one document per line) where each line takes the format:
documentIDmutation mutation mutation...
- Parameters:
args - args[0] - the regular expression file
args[1] - the input file containing text to process
args[2] - the output file
-
extractMutations
public Map<Mutation,Set<int[]>> extractMutations(String rawText) throws MutationException
Extract point mutation mentions from rawText and return them in a map. The result of this method is a mapping of PointMutation objects to a set of spans (int arrays of size 2) where they were identified. Spans are given as character offsets into the text.
Example result:
raw_text: 'We constructed A42G and L22G, and crystalized A42G.'
result = {PointMutation(42,'A','G'):[(15,19),(46,50)],
PointMutation(22,'L','G'):[(24,28)]}
Note that the spans won't necessarily be in increasing order, due to the order of processing regular expressions.
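The span convention in the example result (start offset inclusive, end offset exclusive) can be reproduced with a minimal sketch. This is not the MutationFinder implementation: the class SpanSketch, the method findSpans, and the single pattern it uses are hypothetical stand-ins for the full set of regular expressions, and mutations are keyed by plain strings rather than PointMutation objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not part of MutationFinder.
final class SpanSketch {

    // Assumed pattern: wild-type letter, position, mutant letter (e.g. "A42G").
    private static final Pattern SIMPLE =
            Pattern.compile("\\b([A-Z])([1-9][0-9]*)([A-Z])\\b");

    /** Returns each mention keyed by its text, with [start, end) character offsets. */
    static Map<String, List<int[]>> findSpans(String rawText) {
        Map<String, List<int[]>> result = new LinkedHashMap<>();
        Matcher m = SIMPLE.matcher(rawText);
        while (m.find()) {
            String key = m.group(1) + m.group(2) + m.group(3);
            // Matcher.end() is the offset just past the match, so the span is half-open.
            result.computeIfAbsent(key, k -> new ArrayList<>())
                  .add(new int[] {m.start(), m.end()});
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "We constructed A42G and L22G, and crystalized A42G.";
        for (Map.Entry<String, List<int[]>> e : findSpans(text).entrySet()) {
            for (int[] span : e.getValue()) {
                System.out.println(e.getKey() + " " + span[0] + "-" + span[1]);
            }
        }
    }
}
```

The sketch collects spans in a List rather than a Set because Java arrays compare by identity: two int[] instances holding the same offsets are distinct elements of a HashSet. Callers iterating a Set<int[]> should compare span contents with Arrays.equals rather than relying on Set.contains.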
- Specified by:
extractMutations in class MutationExtractor
- Parameters:
rawText - the text to be processed
- Returns:
- Throws:
MutationException
-
-