| Package | Description |
|---|---|
| de.jungblut.crawl | |
| de.jungblut.crawl.extraction | |

| Modifier and Type | Class and Description |
|---|---|
| class | `ConsoleResultWriter<T extends FetchResult>`<br>Simple writer that prints results to the console. |
| interface | `Crawler<T extends FetchResult>`<br>Basic crawler interface. All implementations should implicitly provide a constructor with the same arguments as `setup` and redirect the call to it. |
| class | `FetchResultPersister<T extends FetchResult>`<br>Asynchronous persister thread that takes a `ResultWriter` and handles the logic behind asynchronous writing to disk or to an arbitrary sink implemented by the `ResultWriter`. |
| class | `FetchThread<T extends FetchResult>` |
| class | `MultithreadedCrawler<T extends FetchResult>`<br>Fast multithreaded crawler. Starts a fixed thread pool of 32 threads, each of which is fed 10 URLs at a time. |
| interface | `ResultWriter<T extends FetchResult>`<br>Result writing interface. |
| class | `ResultWriterAdapter<T extends FetchResult>`<br>Empty adapter class for a `ResultWriter`. |
| class | `SequenceFileResultWriter<T extends FetchResult>`<br>Writes the results into a sequence file at "files/crawl/result.seq". |
| class | `SequentialCrawler<T extends FetchResult>`<br>Sequential crawler, mainly for debugging or development. |
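
The `MultithreadedCrawler` entry above describes a fixed pool of 32 threads that are each fed 10 URLs at once. The following is a minimal sketch of that batching idea using only `java.util.concurrent`; it is not the library's implementation. The class `BatchedPoolSketch`, its methods, and the `fakeFetch` placeholder (which performs no network I/O) are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch; not part of de.jungblut.crawl. */
final class BatchedPoolSketch {

  static final int POOL_SIZE = 32;  // pool size quoted in the class description
  static final int BATCH_SIZE = 10; // URLs handed to a worker at once

  /** Splits the URL list into consecutive batches of at most batchSize. */
  static List<List<String>> partition(List<String> urls, int batchSize) {
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < urls.size(); i += batchSize) {
      int end = Math.min(i + batchSize, urls.size());
      batches.add(new ArrayList<>(urls.subList(i, end)));
    }
    return batches;
  }

  /** Placeholder for a real fetch: returns a marker string, no network I/O. */
  static String fakeFetch(String url) {
    return "fetched:" + url;
  }

  /** Submits one task per batch and collects results in submission order. */
  static List<String> crawl(List<String> urls) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (List<String> batch : partition(urls, BATCH_SIZE)) {
        Callable<List<String>> task = () -> {
          List<String> out = new ArrayList<>();
          for (String url : batch) {
            out.add(fakeFetch(url));
          }
          return out;
        };
        futures.add(pool.submit(task));
      }
      List<String> results = new ArrayList<>();
      for (Future<List<String>> future : futures) {
        results.addAll(future.get()); // blocks until that batch is done
      }
      return results;
    } finally {
      pool.shutdown(); // let worker threads exit once all tasks are finished
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> urls = new ArrayList<>();
    for (int i = 0; i < 25; i++) {
      urls.add("http://example.com/page" + i);
    }
    System.out.println(crawl(urls).size() + " pages fetched");
  }
}
```

Collecting the futures in submission order keeps the output deterministic even though the batches run concurrently. A real crawler would more likely stream results to a writer as they arrive.
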

| Modifier and Type | Method and Description |
|---|---|
| void | `SequenceFileResultWriter.write(FetchResult result)` |
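
The writer types listed on this page follow two familiar patterns: an empty adapter (`ResultWriterAdapter`) and a background thread that writes results asynchronously (`FetchResultPersister`). The sketch below shows both patterns with hypothetical stand-in types. The nested `Writer` interface, its `open`/`write`/`close` methods, and the sentinel-based shutdown are assumptions made for illustration; the real `ResultWriter` may declare different methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-ins; names and signatures are NOT the library's. */
final class AsyncWriterSketch {

  interface Writer<T> {
    void open();
    void write(T result);
    void close();
  }

  /** Empty adapter: no-op defaults so subclasses override only what they need. */
  static class WriterAdapter<T> implements Writer<T> {
    @Override public void open() { }
    @Override public void write(T result) { }
    @Override public void close() { }
  }

  /** Collects written results in memory; overrides only write(). */
  static final class CollectingWriter<T> extends WriterAdapter<T> {
    final List<T> written = new ArrayList<>();
    @Override public void write(T result) { written.add(result); }
  }

  /** Drains a queue on its own thread and forwards each item to the writer. */
  static final class Persister<T> implements Runnable {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final Writer<T> writer;
    private final T poison; // sentinel object that tells the thread to stop

    Persister(Writer<T> writer, T poison) {
      this.writer = writer;
      this.poison = poison;
    }

    /** Called by producers; enqueues and returns without waiting for the write. */
    void add(T result) { queue.add(result); }

    /** Signals that no more results will arrive. */
    void stop() { queue.add(poison); }

    @Override public void run() {
      writer.open();
      try {
        while (true) {
          T next = queue.take();
          if (next == poison) { // identity check on the sentinel
            break;
          }
          writer.write(next);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        writer.close();
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    CollectingWriter<String> writer = new CollectingWriter<>();
    String poison = new String("STOP"); // distinct object, compared by identity
    Persister<String> persister = new Persister<>(writer, poison);
    Thread thread = new Thread(persister);
    thread.start();
    persister.add("page-1");
    persister.add("page-2");
    persister.stop();
    thread.join(); // after join, the writer's list is safe to read
    System.out.println(writer.written);
  }
}
```

The adapter keeps small writers short, and the queue decouples fetch threads from slow sinks such as disk.
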

| Modifier and Type | Interface and Description |
|---|---|
| interface | `Extractor<T extends FetchResult>`<br>Simple extraction logic interface for a site and a result. |

| Modifier and Type | Class and Description |
|---|---|
| static class | `ArticleContentExtrator.ContentFetchResult`<br>Article content fetch result. |
| static class | `HtmlExtrator.HtmlFetchResult`<br>Article content fetch result. |

| Modifier and Type | Method and Description |
|---|---|
| FetchResult | `OutlinkExtractor.extract(String realUrl)` |

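
To give a rough idea of what an outlink extractor produces, here is a minimal sketch that pulls absolute `http`/`https` links out of an HTML string with a regular expression. It is an illustration only: `OutlinkSketch` and `LinkResult` are hypothetical names, the sketch takes the HTML as a parameter instead of fetching `realUrl` itself, and the library's own `OutlinkExtractor` may well rely on a proper HTML parser.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the library's OutlinkExtractor. */
final class OutlinkSketch {

  /** Hypothetical result type: the page URL plus the links found in it. */
  static final class LinkResult {
    final String url;
    final Set<String> outlinks;

    LinkResult(String url, Set<String> outlinks) {
      this.url = url;
      this.outlinks = outlinks;
    }
  }

  // Matches href="..." or href='...' with absolute http(s) targets only;
  // relative links and fragments are ignored in this simplified version.
  private static final Pattern HREF = Pattern.compile(
      "href\\s*=\\s*[\"'](https?://[^\"'#\\s]+)", Pattern.CASE_INSENSITIVE);

  /** Collects distinct outlinks in the order they first appear in the HTML. */
  static LinkResult extract(String url, String html) {
    Set<String> links = new LinkedHashSet<>();
    Matcher matcher = HREF.matcher(html);
    while (matcher.find()) {
      links.add(matcher.group(1));
    }
    return new LinkResult(url, links);
  }

  public static void main(String[] args) {
    String html = "<a href=\"http://example.com/a\">a</a>"
        + "<a href='https://example.org/b'>b</a>"
        + "<a href=\"/relative\">skipped</a>";
    LinkResult result = extract("http://example.com/", html);
    System.out.println(result.outlinks);
  }
}
```

A `LinkedHashSet` removes duplicate links while preserving discovery order, which keeps a crawl frontier stable between runs.
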
Copyright © 2016. All rights reserved.