public class CoarseSgmArticleParser extends SgmArticleParser
A coarse parser for SGM-style news articles.
It extracts the document number as the key and treats all remaining tags as the body of the article. All tags are removed and some special characters are replaced.
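The coarse strategy described above can be illustrated with a small, self-contained sketch. It is not the JULIE Lab implementation. The class name CoarseSgmSketch, the assumption that the document number sits in a <DOCNO> element, and the short entity table are all hypothetical choices made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of coarse SGM parsing; not the library's implementation.
 * Assumes the document number is wrapped in <DOCNO>...</DOCNO>.
 */
final class CoarseSgmSketch {

    // Assumed element holding the document number (the article key).
    private static final Pattern DOCNO =
            Pattern.compile("<DOCNO>\\s*(.*?)\\s*</DOCNO>", Pattern.DOTALL);

    // Matches any SGML-like tag such as <TEXT> or </P>.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Minimal key/body pair standing in for an Article object. */
    static final class Result {
        private final String docNo;
        private final String body;

        Result(String docNo, String body) {
            this.docNo = docNo;
            this.body = body;
        }

        String docNo() { return docNo; }
        String body() { return body; }
    }

    private CoarseSgmSketch() {}

    static Result parse(String content) {
        // 1. Extract the document number as the key (null if absent).
        Matcher m = DOCNO.matcher(content);
        String docNo = m.find() ? m.group(1) : null;

        // 2. Treat everything except the DOCNO element as body text.
        String rest = DOCNO.matcher(content).replaceFirst(" ");

        // 3. Remove all tags, replacing each with a space so words do not merge.
        String text = TAG.matcher(rest).replaceAll(" ");

        // 4. Replace a few special-character entities (assumed set);
        //    &amp; is decoded last so "&amp;lt;" is decoded only once.
        text = text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");

        // 5. Collapse runs of whitespace left behind by tag removal.
        return new Result(docNo, text.replaceAll("\\s+", " ").trim());
    }
}
```

For example, parsing <DOC><DOCNO> NEWS-0001 </DOCNO><TEXT>Shares rose &amp; fell.</TEXT></DOC> yields the key NEWS-0001 and the body "Shares rose & fell.". Tags are stripped before entities are decoded, so a decoded "<" is never mistaken for the start of a tag.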
Copyright: Copyright (c) 2005
Company: IST, Drexel University
Inherited field: tagList

| Constructor and Description |
|---|
| CoarseSgmArticleParser() |
| Modifier and Type | Method and Description |
|---|---|
| protected String | getBodyContent(String rawText, int start): One can override this method to exclude some noisy tags. |
| Article | parse(String content): Parse a sequence of text into an article. |
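The summary describes getBodyContent as a hook that subclasses can override to exclude noisy tags. The following sketch shows that override pattern with a hypothetical stand-in base class, because the real parser's internals are not documented here. The class names BaseCoarseParser and NoiseFilteringParser, the helper method body, the assumption that start is the offset where the body begins, and the choice of <BYLINE> and <NOTE> as noisy tags are all illustrative.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in for the library parser; only the hook's signature mirrors the doc. */
class BaseCoarseParser {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Hook: returns the raw body text, assumed to begin at offset start. */
    protected String getBodyContent(String rawText, int start) {
        return rawText.substring(start);
    }

    /** Hypothetical helper: strips tags from whatever the hook returns. */
    final String body(String rawText, int start) {
        String kept = getBodyContent(rawText, start);
        return TAG.matcher(kept).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }
}

/** Drops whole <BYLINE> and <NOTE> elements (assumed to be noise) before tag removal. */
class NoiseFilteringParser extends BaseCoarseParser {
    // \\1 makes the closing tag match the opening tag's name.
    private static final Pattern NOISY =
            Pattern.compile("<(BYLINE|NOTE)>.*?</\\1>", Pattern.DOTALL);

    @Override
    protected String getBodyContent(String rawText, int start) {
        String raw = super.getBodyContent(rawText, start);
        return NOISY.matcher(raw).replaceAll(" ");
    }
}
```

With the input <DOC><BYLINE>By A. Reporter</BYLINE><TEXT>Rates fell.</TEXT><NOTE>ends</NOTE></DOC>, the base class keeps "By A. Reporter Rates fell. ends", while the subclass keeps only "Rates fell.". Filtering inside the hook works because element boundaries are still visible there; once tags are stripped, the noisy text can no longer be told apart from the body.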
Inherited methods: assemble, collectTagInformation, extractAbstract, extractBody, extractDate, extractDocNo, extractLength, extractMeta, extractTitle, getAbstractTag, getBodyTag, getDocNoTag, getMetaTag, getTagContent (three overloads), getTitleTag, removeTag

public Article parse(String content)
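Description copied from interface: ArticleParser
Parse a sequence of text into an article.
Specified by: parse in interface ArticleParser
Overrides: parse in class SgmArticleParser
Parameters: content - the sequence of text

Copyright © 2018 JULIE Lab, Germany. All rights reserved.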