| Interface | Description |
|---|---|
| Crawler<T extends FetchResult> | Basic crawler interface. All implementations should implicitly provide a constructor with the same arguments as setup and redirect the call to it. |
| ResultWriter<T extends FetchResult> | Result writing interface. |
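
The constructor convention described for Crawler can be sketched as follows. This is a minimal illustration and not the library's actual API: the `setup(int, ResultWriter)` signature, the `write` method, the fields, and the `NoOpCrawler` class are all assumptions introduced here, and the nested `FetchResult` is only a stand-in for the real class.

```java
import java.util.ArrayList;
import java.util.List;

public class CrawlerContractSketch {

  // Hypothetical stand-in for FetchResult: origin URL plus its outlinks.
  static class FetchResult {
    final String url;
    final List<String> outlinks;

    FetchResult(String url, List<String> outlinks) {
      this.url = url;
      this.outlinks = outlinks;
    }
  }

  // Assumed shape of the result writing interface; the real methods may differ.
  interface ResultWriter<T extends FetchResult> {
    void write(T result);
  }

  // Assumed shape of the crawler interface; the real setup arguments may differ.
  interface Crawler<T extends FetchResult> {
    Crawler<T> setup(int maxFetches, ResultWriter<T> writer);
  }

  // The constructor takes the same arguments as setup and redirects to it.
  static class NoOpCrawler implements Crawler<FetchResult> {
    int maxFetches;
    ResultWriter<FetchResult> writer;

    NoOpCrawler(int maxFetches, ResultWriter<FetchResult> writer) {
      setup(maxFetches, writer);
    }

    // Declared final so that calling it from the constructor is safe.
    @Override
    public final Crawler<FetchResult> setup(int maxFetches,
        ResultWriter<FetchResult> writer) {
      this.maxFetches = maxFetches;
      this.writer = writer;
      return this;
    }
  }

  public static void main(String[] args) {
    List<String> seen = new ArrayList<>();
    NoOpCrawler crawler = new NoOpCrawler(100, r -> seen.add(r.url));
    crawler.writer.write(new FetchResult("http://example.com", new ArrayList<>()));
    System.out.println(crawler.maxFetches + " " + seen);
  }
}
```

Keeping the constructor as a thin redirect means a crawler can be configured either at construction time or later through setup, with a single place holding the configuration logic.
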

| Class | Description |
|---|---|
| ConsoleResultWriter<T extends FetchResult> | Simple class that outputs to the console. |
| FetchResult | Fetch result class. Contains the origin URL and its outlinks for further crawling. |
| FetchResultPersister<T extends FetchResult> | Asynchronous persister thread that takes a ResultWriter and handles the logic behind asynchronous writing to disk or to an arbitrary sink implemented by the ResultWriter. |
| FetchThread<T extends FetchResult> | |
| MultithreadedCrawler<T extends FetchResult> | Fast multithreaded crawler. Starts a fixed thread pool of 32 threads, each of which is fed 10 URLs at a time. |
| ResultWriterAdapter<T extends FetchResult> | Empty adapter class for a ResultWriter. |
| SequenceFileResultWriter<T extends FetchResult> | Writes the result into the sequence file "files/crawl/result.seq". |
| SequentialCrawler<T extends FetchResult> | Sequential crawler, mainly for debugging or development. |
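
The threading scheme described for MultithreadedCrawler (a fixed pool of 32 threads, each fed 10 URLs at a time) might look roughly like the sketch below. It is not the class's actual implementation: `partition`, `fakeFetch`, and `crawl` are hypothetical names, and `fakeFetch` performs no network I/O. Only the pool size and batch size come from the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedPoolSketch {

  static final int POOL_SIZE = 32;  // thread count quoted in the table
  static final int BATCH_SIZE = 10; // URLs handed to a thread at once

  // Splits the URL list into consecutive batches of at most batchSize.
  static List<List<String>> partition(List<String> urls, int batchSize) {
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < urls.size(); i += batchSize) {
      int end = Math.min(i + batchSize, urls.size());
      batches.add(new ArrayList<>(urls.subList(i, end)));
    }
    return batches;
  }

  // Placeholder for a real fetch; it only tags the URL (assumption).
  static String fakeFetch(String url) {
    return "fetched:" + url;
  }

  // Submits one task per batch and collects results in submission order.
  static List<String> crawl(List<String> urls) {
    ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
    try {
      List<Future<List<String>>> futures = new ArrayList<>();
      for (List<String> batch : partition(urls, BATCH_SIZE)) {
        futures.add(pool.submit(() -> {
          List<String> out = new ArrayList<>();
          for (String url : batch) {
            out.add(fakeFetch(url));
          }
          return out;
        }));
      }
      List<String> results = new ArrayList<>();
      for (Future<List<String>> future : futures) {
        results.addAll(future.get());
      }
      return results;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("batch failed", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    List<String> urls = new ArrayList<>();
    for (int i = 0; i < 25; i++) {
      urls.add("http://example.com/" + i);
    }
    System.out.println(partition(urls, BATCH_SIZE).size() + " batches, "
        + crawl(urls).size() + " results");
  }
}
```

Handing each thread a batch rather than a single URL reduces task scheduling overhead, at the cost of coarser load balancing when some URLs are much slower than others.
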
Copyright © 2016. All rights reserved.