public final class MultithreadedCrawler<T extends FetchResult> extends Object implements Crawler<T>
| Constructor and Description |
|---|
| MultithreadedCrawler(int fetches, Extractor<T> extractor, ResultWriter<T> writer)<br>Constructs a new multithreaded crawler with 32 threads, each working on batches of 10 URLs at a time. |
| MultithreadedCrawler(int threadPoolSize, int batchSize, int fetches, Extractor<T> extractor, ResultWriter<T> writer)<br>Constructs a new multithreaded crawler. |
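The batching that the constructor parameters describe can be sketched as follows. This is a minimal illustration, not the library's implementation: the class `BatchSketch`, the helper `partition`, and the example URLs are hypothetical names introduced here, and the sketch assumes that a batch is simply a consecutive slice of at most `batchSize` URLs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MultithreadedCrawler.
final class BatchSketch {

    /** Splits urls into consecutive batches of at most batchSize elements. */
    static List<List<String>> partition(List<String> urls, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < urls.size(); i += batchSize) {
            int end = Math.min(i + batchSize, urls.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(urls.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            urls.add("http://example.com/page" + i); // made-up URLs
        }
        // With the documented default batch size of 10: batches of 10, 10 and 5.
        List<List<String>> batches = partition(urls, 10);
        System.out.println(batches.size()); // prints 3
    }
}
```

Under this assumption, a larger `batchSize` means fewer, longer tasks per thread, while a smaller one spreads the work more evenly across the pool.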
| Modifier and Type | Method and Description |
|---|---|
| static void | main(String[] args) |
| void | process(String... seedUrls)<br>Starts the crawler from the given seed URLs. |
| void | setup(int fetches, Extractor<T> extractor, ResultWriter<T> writer)<br>Sets up this crawler. |
public MultithreadedCrawler(int threadPoolSize,
int batchSize,
int fetches,
Extractor<T> extractor,
ResultWriter<T> writer)
throws IOException
Parameters:
threadPoolSize - the number of threads to use.
batchSize - the number of URLs each thread's batch should contain.
fetches - the number of URLs to fetch.
extractor - the extraction logic.
writer - the writer.
Throws:
IOException

public MultithreadedCrawler(int fetches,
Extractor<T> extractor,
ResultWriter<T> writer)
throws IOException
Parameters:
fetches - the number of URLs to fetch.
extractor - the extraction logic.
writer - the writer.
Throws:
IOException

public final void setup(int fetches,
Extractor<T> extractor,
ResultWriter<T> writer)
throws IOException
Sets up this crawler.
Specified by:
setup in interface Crawler<T extends FetchResult>
Parameters:
fetches - the maximum number of fetches to perform.
extractor - the Extractor used to extract a FetchResult.
writer - the ResultWriter that writes the result to a sink.
Throws:
IOException

public final void process(String... seedUrls) throws InterruptedException, ExecutionException
Starts the crawler from the given seed URLs.
Specified by:
process in interface Crawler<T extends FetchResult>
Throws:
InterruptedException
ExecutionException

public static void main(String[] args) throws InterruptedException, ExecutionException, IOException
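InterruptedException and ExecutionException are the checked exceptions thrown by java.util.concurrent.Future.get(), which suggests that process waits on tasks submitted to a thread pool. The sketch below shows how those exceptions typically reach a caller. It is an assumption about the general pattern, not the crawler's actual code: `PoolSketch` and `processAll` are hypothetical names, and the "fetch" is replaced by counting the URLs in each batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration, not part of MultithreadedCrawler.
final class PoolSketch {

    /** Runs one task per batch on a fixed pool and returns the number of URLs handled. */
    static int processAll(List<List<String>> batches, int threadPoolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threadPoolSize);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (List<String> batch : batches) {
                // Stand-in for real work: a crawler task would fetch and extract each URL.
                tasks.add(batch::size);
            }
            int handled = 0;
            // invokeAll blocks until all tasks finish; it throws InterruptedException
            // if the waiting thread is interrupted.
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                // get() wraps any exception thrown inside a task in ExecutionException.
                handled += result.get();
            }
            return handled;
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("http://example.com/a", "http://example.com/b"),
                Arrays.asList("http://example.com/c"));
        System.out.println(processAll(batches, 4)); // prints 3
    }
}
```

A caller of process would handle the two exceptions in the same way: treat InterruptedException as a request to stop, and inspect getCause() on ExecutionException to find the failure raised inside a worker task.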
Copyright © 2016. All rights reserved.