public class EngDocumentParser extends Object implements DocumentParser
Document Parser for English Text
Copyright: Copyright (c) 2005
Company: IST, Drexel University
| Modifier and Type | Field and Description |
|---|---|
| static String | defParaDelimitor |
| static String | defSentDelimitor |
| static String | defWordDelimitor |
| protected String | paraDelimitor |
| static String | punctuations |
| protected String | sentDelimitor |
| protected String | wordDelimitor |
| Constructor and Description |
|---|
| EngDocumentParser() |
| EngDocumentParser(String wordDelimitor) |
| Modifier and Type | Method and Description |
|---|---|
| protected int | isApostrophesAsWord(int apoPos, int startPos, String context) |
| protected boolean | isNumber(String str) |
| protected boolean | isPeriodAsToken(int periodPos, int startPos, String context) |
| protected boolean | isPeriodAsWord(int periodPos, int startPos, String context) |
| protected boolean | isSentencePeriod(int pos, String context) |
| Document | parse(String doc): Parses a text into a Document object. |
| Paragraph | parseParagraph(String paragraph): Parses a text into a Paragraph object. |
| Sentence | parseSentence(String sentence): Parses a text into a Sentence object. |
| ArrayList | parseTokens(String content): Parses a text into a sequence of tokens. |
| protected Word | parseWord(String content) |
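The summary above suggests a layered design: a document is split into paragraphs, paragraphs into sentences, and sentences into words, each level using its own delimiter set. The following is a minimal, self-contained sketch of that idea and is not the EngDocumentParser implementation. The class name `HierarchicalSplitSketch`, its methods, and the three delimiter constants are all hypothetical; the actual values of `defParaDelimitor`, `defSentDelimitor`, and `defWordDelimitor` are not given on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: illustrates layered delimiter splitting only.
// It does not use or reproduce the EngDocumentParser code.
class HierarchicalSplitSketch {
    // Assumed delimiter sets; the real default delimiter values
    // are not documented on this page.
    static final String PARA_DELIMS = "\n";
    static final String SENT_DELIMS = ".!?";
    static final String WORD_DELIMS = " \t";

    // Splits text at any character contained in delims and drops
    // pieces that are empty after trimming.
    static List<String> split(String text, String delims) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (delims.indexOf(c) >= 0) {
                addIfNotBlank(parts, current);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(parts, current);
        return parts;
    }

    // Flushes the buffer into parts (if non-blank) and resets it.
    private static void addIfNotBlank(List<String> parts, StringBuilder current) {
        String piece = current.toString().trim();
        if (!piece.isEmpty()) {
            parts.add(piece);
        }
        current.setLength(0);
    }

    // Document -> paragraphs -> sentences -> words, as nested lists.
    static List<List<List<String>>> parse(String doc) {
        List<List<List<String>>> paragraphs = new ArrayList<>();
        for (String para : split(doc, PARA_DELIMS)) {
            List<List<String>> sentences = new ArrayList<>();
            for (String sent : split(para, SENT_DELIMS)) {
                sentences.add(split(sent, WORD_DELIMS));
            }
            paragraphs.add(sentences);
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        // Prints [[[Dogs, bark], [Cats, meow]], [[Birds, sing]]]
        System.out.println(parse("Dogs bark. Cats meow!\nBirds sing."));
    }
}
```

A naive splitter like this breaks on every period, including those in decimals and abbreviations, which is presumably why the class also declares checks such as `isSentencePeriod` and `isPeriodAsWord`.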
public static final String defParaDelimitor
public static final String defSentDelimitor
public static final String defWordDelimitor
public static final String punctuations
protected String wordDelimitor
protected String paraDelimitor
protected String sentDelimitor
public EngDocumentParser()
public EngDocumentParser(String wordDelimitor)
public Document parse(String doc)
Specified by: parse in interface DocumentParser
Parameters: doc - the text for parsing
public Paragraph parseParagraph(String paragraph)
Specified by: parseParagraph in interface DocumentParser
Parameters: paragraph - the text for parsing
public Sentence parseSentence(String sentence)
Specified by: parseSentence in interface DocumentParser
Parameters: sentence - the text for parsing
public ArrayList parseTokens(String content)
Specified by: parseTokens in interface DocumentParser
Parameters: content - the text for parsing
protected boolean isPeriodAsWord(int periodPos,
int startPos,
String context)
protected boolean isPeriodAsToken(int periodPos,
int startPos,
String context)
protected int isApostrophesAsWord(int apoPos,
int startPos,
String context)
protected boolean isSentencePeriod(int pos,
String context)
protected boolean isNumber(String str)
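The protected helpers above are undocumented, so their exact rules are unknown. As an illustration of the kind of check that signatures such as `isSentencePeriod(int pos, String context)` and `isNumber(String str)` could perform, here is a hedged sketch. The class `PeriodHeuristicSketch` is hypothetical, and the rules in it are assumptions chosen for the example, not the behaviour of EngDocumentParser.

```java
// Hypothetical sketch: method names mirror the signatures listed above,
// but the rules are assumptions, not the library's actual logic.
class PeriodHeuristicSketch {
    // Assumed rule: only digits, with at most one decimal point
    // and at least one digit.
    static boolean isNumber(String str) {
        boolean seenDigit = false;
        boolean seenPoint = false;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.isDigit(c)) {
                seenDigit = true;
            } else if (c == '.' && !seenPoint) {
                seenPoint = true;
            } else {
                return false;
            }
        }
        return seenDigit;
    }

    // Assumed rule: the period at pos ends a sentence when it is the last
    // character, is followed only by whitespace, or is followed by
    // whitespace and then an uppercase letter.
    static boolean isSentencePeriod(int pos, String context) {
        if (pos < 0 || pos >= context.length() || context.charAt(pos) != '.') {
            return false;
        }
        int next = pos + 1;
        if (next == context.length()) {
            return true;
        }
        if (!Character.isWhitespace(context.charAt(next))) {
            return false;
        }
        while (next < context.length() && Character.isWhitespace(context.charAt(next))) {
            next++;
        }
        return next == context.length() || Character.isUpperCase(context.charAt(next));
    }

    public static void main(String[] args) {
        String text = "Pi is 3.14 today. Next one.";
        System.out.println(isSentencePeriod(text.indexOf('.'), text));  // false: period inside 3.14
        System.out.println(isSentencePeriod(text.indexOf(". "), text)); // true: period after "today"
        System.out.println(isNumber("3.14"));                           // true
    }
}
```

This rule set is deliberately simple: it would wrongly treat the period in "Dr. Smith" as a sentence end, a case that a production parser would need an abbreviation check to handle.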
Copyright © 2018 JULIE Lab, Germany. All rights reserved.