public class SgmArticleParser extends Object implements ArticleParser
Parser for SGM-styled news articles.
Copyright: Copyright (c) 2005
Company: IST, Drexel University
| Modifier and Type | Field and Description |
|---|---|
| protected SortedArray | tagList |
| Constructor and Description |
|---|
| SgmArticleParser() |
| Modifier and Type | Method and Description |
|---|---|
| String | assemble(Article article): Assemble an article into a sequence of text which could be saved in files for future use. |
| protected SortedArray | collectTagInformation(String content) |
| protected String | extractAbstract(String rawText) |
| protected String | extractBody(String rawText) |
| protected Date | extractDate(String rawText) |
| protected String | extractDocNo(String rawText) |
| protected int | extractLength(String rawText) |
| protected String | extractMeta(String rawText) |
| protected String | extractTitle(String rawText) |
| protected Token | getAbstractTag() |
| protected Token | getBodyTag() |
| protected Token | getDocNoTag() |
| protected Token | getMetaTag() |
| protected String | getTagContent(String rawText, String tagName, boolean preprocess) |
| protected int | getTagContent(String content, String tag, int start, StringBuffer out) |
| protected String | getTagContent(String rawText, Token tag, boolean preprocess) |
| protected Token | getTitleTag() |
| Article | parse(String content): Parse a sequence of text into an article. |
| protected String | removeTag(String content) |
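The method names above suggest that parse(String) pulls individual fields out of the raw SGM text and strips leftover markup from them. The following is only a minimal stand-alone sketch of that idea, not the library's implementation: the class SgmParseSketch, the tag names DOCNO, TITLE and TEXT, and the use of a Map in place of Article are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; this is not the SgmArticleParser source. */
final class SgmParseSketch {

    /** Returns the trimmed text between <tag> and </tag>, or null if the pair is absent. */
    static String tagContent(String rawText, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int from = rawText.indexOf(open);
        if (from < 0) {
            return null;
        }
        from += open.length();
        int to = rawText.indexOf(close, from);
        return to < 0 ? null : rawText.substring(from, to).trim();
    }

    /** Replaces any markup with spaces and collapses whitespace (assumed behaviour of a removeTag step). */
    static String removeTag(String content) {
        return content.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    /** Collects a few fields into a map; the tag names are assumptions, not taken from the class. */
    static Map<String, String> parse(String rawText) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("docNo", tagContent(rawText, "DOCNO"));
        fields.put("title", tagContent(rawText, "TITLE"));
        String body = tagContent(rawText, "TEXT");
        fields.put("body", body == null ? null : removeTag(body));
        return fields;
    }

    public static void main(String[] args) {
        String raw = "<DOC><DOCNO> X-9 </DOCNO><TITLE>Hello</TITLE>"
                   + "<TEXT><P>One</P> <P>two</P></TEXT></DOC>";
        System.out.println(parse(raw)); // prints {docNo=X-9, title=Hello, body=One two}
    }
}
```

In the real class the tags are presumably supplied by getDocNoTag(), getTitleTag() and related methods, so a subclass could override them for a different SGM dialect; the sketch hard-codes them only to stay self-contained.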
protected SortedArray tagList
public String assemble(Article article)
Specified by: assemble in interface ArticleParser
Parameters: article - the article for assembling
public Article parse(String content)
Specified by: parse in interface ArticleParser
Parameters: content - the sequence of text
protected int extractLength(String rawText)
protected Token getDocNoTag()
protected Token getTitleTag()
protected Token getAbstractTag()
protected Token getMetaTag()
protected Token getBodyTag()
protected int getTagContent(String content, String tag, int start, StringBuffer out)
protected SortedArray collectTagInformation(String content)
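The four-argument getTagContent(String content, String tag, int start, StringBuffer out) has the shape of a cursor-style scanner: it takes a start offset, writes into a buffer, and returns an int. The page does not document what the return value means, so the sketch below assumes it is the index just past the closing tag, with -1 when no complete pair is found. The class TagScanSketch and the helper allTagContents are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not the SgmArticleParser source. */
final class TagScanSketch {

    /**
     * Appends the text between the first <tag>...</tag> pair at or after start to out.
     * Returns the index just past the closing tag, or -1 if no complete pair is found
     * (an assumed convention, not documented on this page).
     */
    static int getTagContent(String content, String tag, int start, StringBuffer out) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int from = content.indexOf(open, start);
        if (from < 0) {
            return -1;
        }
        from += open.length();
        int to = content.indexOf(close, from);
        if (to < 0) {
            return -1;
        }
        out.append(content, from, to);
        return to + close.length();
    }

    /** Walks the whole input, feeding each returned offset back in as the next start. */
    static List<String> allTagContents(String content, String tag) {
        List<String> results = new ArrayList<>();
        int pos = 0;
        while (true) {
            StringBuffer buf = new StringBuffer();
            pos = getTagContent(content, tag, pos, buf);
            if (pos < 0) {
                break;
            }
            results.add(buf.toString().trim());
        }
        return results;
    }

    public static void main(String[] args) {
        String sgm = "<DOC><DOCNO> A-1 </DOCNO><TEXT>first</TEXT></DOC>"
                   + "<DOC><DOCNO> A-2 </DOCNO><TEXT>second</TEXT></DOC>";
        System.out.println(allTagContents(sgm, "DOCNO")); // prints [A-1, A-2]
    }
}
```

Returning the next offset lets a caller iterate over repeated tags in a multi-document file without re-scanning from the beginning, which may be why the documented signature takes both a start position and an output buffer.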
Copyright © 2018 JULIE Lab, Germany. All rights reserved.