| Package | Description |
|---|---|
| de.jungblut.crawl | |
| de.jungblut.crawl.extraction | |

| Modifier and Type | Method and Description |
|---|---|
| void | SequentialCrawler.setup(int fetches, Extractor<T> extractor, ResultWriter<T> writer) |
| void | MultithreadedCrawler.setup(int fetches, Extractor<T> extractor, ResultWriter<T> writer) |
| void | Crawler.setup(int fetches, Extractor<T> extractor, ResultWriter<T> writer)<br>Sets up this crawler. |

| Constructor and Description |
|---|
| FetchThread(List<String> url, Extractor<T> extractor) |
| MultithreadedCrawler(int fetches, Extractor<T> extractor, ResultWriter<T> writer)<br>Constructs a new MultithreadedCrawler with 32 threads working on batches of 10 URLs at a time. |
| MultithreadedCrawler(int threadPoolSize, int batchSize, int fetches, Extractor<T> extractor, ResultWriter<T> writer)<br>Constructs a new MultithreadedCrawler. |
| SequentialCrawler(int fetches, Extractor<T> extractor, ResultWriter<T> writer) |
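
The two MultithreadedCrawler constructors differ only in whether the thread pool size and batch size are given explicitly. The following is a minimal sketch of how such a pair could relate, assuming the shorter constructor simply delegates with the documented defaults (32 threads, batches of 10 URLs). `SketchCrawler`, `SketchExtractor`, `SketchResultWriter`, and `toBatches` are hypothetical stand-ins invented for illustration; they are not the library's types, and the real interfaces may declare different methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the library's Extractor<T>; the real method names may differ.
interface SketchExtractor<T> {
    T extract(String url);
}

// Hypothetical stand-in for the library's ResultWriter<T>.
interface SketchResultWriter<T> {
    void write(T result);
}

// Illustrative shell only, not the library's MultithreadedCrawler.
class SketchCrawler<T> {
    // Defaults taken from the constructor description above.
    static final int DEFAULT_THREAD_POOL_SIZE = 32;
    static final int DEFAULT_BATCH_SIZE = 10;

    final int threadPoolSize;
    final int batchSize;
    final int fetches;
    final SketchExtractor<T> extractor;
    final SketchResultWriter<T> writer;

    // Convenience constructor: assumed to delegate with the default sizes.
    SketchCrawler(int fetches, SketchExtractor<T> extractor,
                  SketchResultWriter<T> writer) {
        this(DEFAULT_THREAD_POOL_SIZE, DEFAULT_BATCH_SIZE, fetches, extractor, writer);
    }

    // Full constructor: every tuning parameter is explicit.
    SketchCrawler(int threadPoolSize, int batchSize, int fetches,
                  SketchExtractor<T> extractor, SketchResultWriter<T> writer) {
        if (threadPoolSize <= 0 || batchSize <= 0) {
            throw new IllegalArgumentException("threadPoolSize and batchSize must be positive");
        }
        this.threadPoolSize = threadPoolSize;
        this.batchSize = batchSize;
        this.fetches = fetches;
        this.extractor = extractor;
        this.writer = writer;
    }

    // Splits the URL list into consecutive batches of at most batchSize entries.
    List<List<String>> toBatches(List<String> urls) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < urls.size(); i += batchSize) {
            int end = Math.min(i + batchSize, urls.size());
            batches.add(new ArrayList<>(urls.subList(i, end)));
        }
        return batches;
    }
}
```

With the default batch size, a list of 25 URLs would be split into three batches of 10, 10, and 5 entries. How the actual crawler schedules its batches is not stated on this page.
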

| Modifier and Type | Class and Description |
|---|---|
| class | ArticleContentExtrator<br>Extractor for news articles. |
| class | HtmlExtrator<br>Extractor for raw html. |
| class | OutlinkExtractor<br>Outlink extractor, parses a page just for its outlinks. |

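
To illustrate what "parsing a page just for its outlinks" can look like, here is a rough sketch that pulls absolute `href` targets out of an HTML string. `OutlinkSketch` and `extractOutlinks` are hypothetical names, and the regular expression is a simplification chosen to keep the example dependency-free; this page does not say how OutlinkExtractor is implemented, and a real crawler would more likely use an HTML parser.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the library's OutlinkExtractor.
class OutlinkSketch {
    // Matches href="..." or href='...'; group 2 holds the link target.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns the distinct absolute http(s) links in document order.
    static Set<String> extractOutlinks(String html) {
        Set<String> links = new LinkedHashSet<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            String link = matcher.group(2).trim();
            // Skip fragments, relative paths, and other schemes in this sketch.
            if (link.startsWith("http://") || link.startsWith("https://")) {
                links.add(link);
            }
        }
        return links;
    }
}
```

Relative links and fragments are dropped here only to keep the sketch short; resolving them against the page URL would be a natural extension.
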
Copyright © 2016. All rights reserved.