public class OOSpider<T> extends Spider
@TargetUrl("http://my.oschina.net/flashsword/blog/\\d+")
public class OschinaBlog {
    @ExtractBy("//title")
    private String title;

    @ExtractBy(value = "div.BlogContent", type = ExtractBy.Type.Css)
    private String content;

    @ExtractBy(value = "//div[@class='BlogTags']/a/text()", multi = true)
    private List<String> tags;
}
And start the spider by:
OOSpider.create(Site.me().addStartUrl("http://my.oschina.net/flashsword/blog"),
        new JsonFilePageModelPipeline(), OschinaBlog.class).run();
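The page model idea above can be illustrated without the library. The following is a minimal sketch, not WebMagic's implementation: `UrlPattern`, `ExtractRegex`, `BlogModel` and `PageModelSketch` are hypothetical names, regular expressions stand in for the XPath and CSS selectors, and the full-match URL check is an assumption about how a target pattern might be applied.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-ins for @TargetUrl and @ExtractBy; not WebMagic's annotations.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface UrlPattern { String value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ExtractRegex { String value(); }

// Toy page model: the first capture group of each field's regex fills that field.
@UrlPattern("http://my\\.oschina\\.net/flashsword/blog/\\d+")
class BlogModel {
    @ExtractRegex("<title>(.*?)</title>")
    String title;

    @ExtractRegex("<div class=\"BlogContent\">(.*?)</div>")
    String content;
}

class PageModelSketch {
    // Returns a populated model, or null when the URL does not match the class-level pattern.
    static <T> T extract(Class<T> type, String url, String html) {
        UrlPattern target = type.getAnnotation(UrlPattern.class);
        if (target == null || !Pattern.matches(target.value(), url)) {
            return null;
        }
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T model = ctor.newInstance();
            for (Field field : type.getDeclaredFields()) {
                ExtractRegex rule = field.getAnnotation(ExtractRegex.class);
                if (rule == null) {
                    continue; // field carries no extraction rule
                }
                Matcher m = Pattern.compile(rule.value(), Pattern.DOTALL).matcher(html);
                if (m.find()) {
                    field.setAccessible(true);
                    field.set(model, m.group(1));
                }
            }
            return model;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot build " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><title>Hello</title><div class=\"BlogContent\">Body</div></html>";
        BlogModel post = extract(BlogModel.class, "http://my.oschina.net/flashsword/blog/1", html);
        System.out.println(post.title + " / " + post.content); // prints "Hello / Body"
    }
}
```

In this sketch the blog list URL itself (without a numeric id) yields `null`, which mirrors the intent of the target pattern: only individual post pages are turned into model objects.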
Nested classes inherited from class Spider: Spider.Status

| Modifier and Type | Field and Description |
|---|---|
| private ModelPageProcessor | modelPageProcessor |
| private ModelPipeline | modelPipeline |
| private List<Class> | pageModelClasses |
| private PageModelPipeline | pageModelPipeline |
Fields inherited from class Spider: destroyWhenExit, downloader, executorService, exitWhenComplete, logger, pageProcessor, pipelines, scheduler, site, spawnUrl, startRequests, stat, STAT_INIT, STAT_RUNNING, STAT_STOPPED, threadNum, threadPool, uuid

| Modifier | Constructor and Description |
|---|---|
| protected | OOSpider(ModelPageProcessor modelPageProcessor) |
| public | OOSpider(PageProcessor pageProcessor) |
| public | OOSpider(Site site, PageModelPipeline pageModelPipeline, Class... pageModels) Creates a spider. |

| Modifier and Type | Method and Description |
|---|---|
| OOSpider | addPageModel(PageModelPipeline pageModelPipeline, Class... pageModels) |
| static OOSpider | create(Site site, Class... pageModels) |
| static OOSpider | create(Site site, PageModelPipeline pageModelPipeline, Class... pageModels) |
| protected CollectorPipeline | getCollectorPipeline() |
| OOSpider | setIsExtractLinks(boolean isExtractLinks) |

Methods inherited from class Spider: addPipeline, addRequest, addUrl, checkIfRunning, clearPipeline, close, create, downloader, extractAndAddRequests, get, getAll, getPageCount, getScheduler, getSite, getSpiderListeners, getStartTime, getStatus, getThreadAlive, getUUID, initComponent, isExitWhenComplete, isSpawnUrl, onError, onSuccess, pipeline, run, runAsync, scheduler, setDownloader, setEmptySleepTime, setExecutorService, setExitWhenComplete, setPipelines, setScheduler, setSpawnUrl, setSpiderListeners, setUUID, sleep, start, startRequest, startUrls, stop, test, thread, thread

private ModelPageProcessor modelPageProcessor
private ModelPipeline modelPipeline
private PageModelPipeline pageModelPipeline
protected OOSpider(ModelPageProcessor modelPageProcessor)
public OOSpider(PageProcessor pageProcessor)
public OOSpider(Site site, PageModelPipeline pageModelPipeline, Class... pageModels)
Parameters:
site - site
pageModelPipeline - pageModelPipeline
pageModels - pageModels
protected CollectorPipeline getCollectorPipeline()
Overrides: getCollectorPipeline in class Spider
public static OOSpider create(Site site, PageModelPipeline pageModelPipeline, Class... pageModels)
public OOSpider addPageModel(PageModelPipeline pageModelPipeline, Class... pageModels)
public OOSpider setIsExtractLinks(boolean isExtractLinks)
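Because addPageModel and setIsExtractLinks return OOSpider, calls can be chained. The sketch below illustrates that registration shape with a hypothetical `ModelRegistrySketch` class, using a plain `Consumer<Object>` in place of PageModelPipeline. It is an assumption about how per-class routing could work, not a description of OOSpider's internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical miniature of the registration API; it is not OOSpider itself.
class ModelRegistrySketch {
    // One handler per page model class, playing the role of a PageModelPipeline.
    private final Map<Class<?>, Consumer<Object>> handlers = new HashMap<>();

    // Registers the same handler for every given class and returns this for chaining,
    // mirroring the shape of addPageModel(PageModelPipeline, Class...).
    ModelRegistrySketch addPageModel(Consumer<Object> handler, Class<?>... pageModels) {
        for (Class<?> pageModel : pageModels) {
            handlers.put(pageModel, handler);
        }
        return this;
    }

    // Routes an extracted object to the handler registered for its exact class.
    // Returns false when no handler was registered for that class.
    boolean dispatch(Object model) {
        Consumer<Object> handler = handlers.get(model.getClass());
        if (handler == null) {
            return false;
        }
        handler.accept(model);
        return true;
    }

    public static void main(String[] args) {
        List<Object> saved = new ArrayList<>();
        ModelRegistrySketch registry = new ModelRegistrySketch()
                .addPageModel(saved::add, String.class)
                .addPageModel(o -> System.out.println("number: " + o), Integer.class, Long.class);
        registry.dispatch("a blog post"); // stored in the list
        registry.dispatch(42);            // handled by the printing handler
        System.out.println(saved);
    }
}
```

Returning `this` from the registration method is what allows several page model groups, each with its own handler, to be attached in one expression.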
Copyright © 2021. All rights reserved.