Package de.jungblut.crawl.extraction
Interface Extractor<T extends FetchResult>
All Known Implementing Classes:
ArticleContentExtrator, HtmlExtrator, OutlinkExtractor
public interface Extractor<T extends FetchResult>
Simple extraction logic interface for a site and a result.
Author: thomas.jungblut
Method Summary
Modifier and Type    Method                            Description
T                    extract(java.lang.String site)    Extracts all the needed content from a given URL and returns it.
Method Detail
extract
T extract(java.lang.String site)
Extracts all the needed content from a given URL and returns it. Returns null if nothing should be returned or if the content could not be parsed.
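The contract above (return a result, or null when nothing should be returned or could be parsed) can be illustrated with a minimal sketch. The `FetchResult` and `Extractor` types below are simplified stand-ins declared locally so the example compiles on its own. The real `FetchResult` class may carry different fields, and `SchemeCheckingExtractor` and `ExtractorExample` are hypothetical names that are not part of the package. The sketch does no network access; it only shows how an implementation and its caller might handle the null case.

```java
import java.util.Locale;

// Simplified stand-in for the library's FetchResult (assumption: the real
// class may hold more than the URL).
class FetchResult {
    private final String url;

    FetchResult(String url) {
        this.url = url;
    }

    String getUrl() {
        return url;
    }
}

// Local copy of the interface documented above, so the sketch is self-contained.
interface Extractor<T extends FetchResult> {
    T extract(String site);
}

// Hypothetical implementation: accepts only http/https URLs and performs no
// actual fetching. Returns null when the input cannot be handled.
class SchemeCheckingExtractor implements Extractor<FetchResult> {
    @Override
    public FetchResult extract(String site) {
        if (site == null) {
            return null;
        }
        String trimmed = site.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (!(lower.startsWith("http://") || lower.startsWith("https://"))) {
            return null; // nothing that could be parsed
        }
        return new FetchResult(trimmed);
    }
}

public class ExtractorExample {
    public static void main(String[] args) {
        Extractor<FetchResult> extractor = new SchemeCheckingExtractor();
        for (String site : new String[] {"https://example.com/", "not a url"}) {
            FetchResult result = extractor.extract(site);
            // Callers must check for null before using the result.
            if (result == null) {
                System.out.println("skipped: " + site);
            } else {
                System.out.println("extracted: " + result.getUrl());
            }
        }
    }
}
```

Because `extract` signals "nothing to return" with null rather than an exception, every call site needs an explicit null check like the one in `main`.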