public class WebPageDateExtractor extends Object

Nested Class Summary
| Modifier and Type | Class and Description |
|---|---|
| static class | WebPageDateExtractor.DateSource |
| static class | WebPageDateExtractor.ExtractionException |
| static class | WebPageDateExtractor.WebPageDate |

Constructor Summary
| Constructor and Description |
|---|
| WebPageDateExtractor() |

Method Summary
| Modifier and Type | Method and Description |
|---|---|
| static WebPageDateExtractor.WebPageDate | extractModifiedDate(org.jsoup.nodes.Document dom): Extract the likely modification date from a parsed document. |
| static WebPageDateExtractor.WebPageDate | getModifiedDate(String url, org.jsoup.nodes.Document document, Long httpModifiedTime, org.apache.hadoop.mapreduce.Mapper.Context context) |
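
The summary says only that a "likely" modification date is chosen. This page does not document how that choice is made. The stdlib-only Java sketch below shows one plausible approach, but it rests on assumptions that do not come from this API. It reads an `article:modified_time` meta tag (from the Open Graph article vocabulary) out of raw HTML with a regular expression, in place of a jsoup `Document`. It treats `httpModifiedTime` as nullable epoch milliseconds. It uses the hypothetical types `ModifiedDateSketch`, `PageDate`, and `Source` as stand-ins for `WebPageDate` and `DateSource`. It also ignores the `url` and `Mapper.Context` parameters. Treat it as an illustration, not as the library's actual heuristic.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only: NOT the real WebPageDateExtractor logic.
public class ModifiedDateSketch {

    // Hypothetical stand-in for WebPageDateExtractor.DateSource.
    enum Source { META_TAG, HTTP_HEADER }

    // Hypothetical stand-in for WebPageDateExtractor.WebPageDate.
    static final class PageDate {
        final Instant date;
        final Source source;

        PageDate(Instant date, Source source) {
            this.date = date;
            this.source = source;
        }

        @Override
        public String toString() {
            return date + " (" + source + ")";
        }
    }

    // Assumed convention: <meta property="article:modified_time" content="ISO-8601 with offset">.
    // Only matches the property-then-content attribute order; a real HTML parser is more robust.
    private static final Pattern MODIFIED_META = Pattern.compile(
        "<meta\\s+property=[\"']article:modified_time[\"']\\s+content=[\"']([^\"']+)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Returns the first parseable modified_time value found in the raw HTML, if any.
    static Optional<Instant> fromHtml(String html) {
        Matcher m = MODIFIED_META.matcher(html);
        while (m.find()) {
            try {
                return Optional.of(OffsetDateTime.parse(m.group(1)).toInstant());
            } catch (DateTimeParseException e) {
                // Skip malformed values and keep scanning.
            }
        }
        return Optional.empty();
    }

    // Prefers the in-page date, then falls back to the HTTP time (assumed nullable epoch millis).
    // Returns null when neither source yields a date.
    static PageDate modifiedDate(String html, Long httpModifiedTime) {
        Optional<Instant> fromPage = fromHtml(html);
        if (fromPage.isPresent()) {
            return new PageDate(fromPage.get(), Source.META_TAG);
        }
        if (httpModifiedTime != null) {
            return new PageDate(Instant.ofEpochMilli(httpModifiedTime), Source.HTTP_HEADER);
        }
        return null;
    }

    public static void main(String[] args) {
        String html = "<html><head><meta property=\"article:modified_time\" "
            + "content=\"2017-03-01T12:00:00+00:00\"></head></html>";
        System.out.println(modifiedDate(html, 0L));                       // in-page date wins
        System.out.println(modifiedDate("<html></html>", 1488369600000L)); // HTTP fallback
        System.out.println(modifiedDate("<html></html>", null));           // no date found
    }
}
```

Preferring the in-page date over the HTTP header is also an assumption. Servers often report the time a page was generated rather than the time its content last changed, and that is one reason a document-level date might be tried first.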

Method Detail

public static WebPageDateExtractor.WebPageDate extractModifiedDate(org.jsoup.nodes.Document dom) throws InterruptedException
Parameters:
dom - DOM tree of an HTML document
Throws:
InterruptedException

public static WebPageDateExtractor.WebPageDate getModifiedDate(String url, org.jsoup.nodes.Document document, Long httpModifiedTime, org.apache.hadoop.mapreduce.Mapper.Context context) throws InterruptedException
Throws:
InterruptedException

Copyright © 2017. All rights reserved.