public class PhantomJSDownloader extends AbstractDownloader
| Modifier and Type | Field and Description |
|---|---|
| private static String | crawlJsPath |
| private static org.slf4j.Logger | logger |
| private static String | phantomJsCommand |
| private int | retryNum |
| private int | threadNum |
| Constructor and Description |
|---|
| PhantomJSDownloader() |
| PhantomJSDownloader(String phantomJsCommand): adds a constructor that accepts a custom phantomjs command. Examples: phantomjs.exe (for Windows environments); phantomjs --ignore-ssl-errors=yes (ignores certain errors when the target URL uses https); /usr/local/bin/phantomjs (absolute path to the command, which avoids an IOException caused by system environment variables). |
| PhantomJSDownloader(String phantomJsCommand, String crawlJsPath): adds a constructor that accepts a custom crawl.js path, because when another project depends on this jar, runtime.exec() cannot use the crawl.js packaged inside the jar when it runs the phantomjs command. |
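As a rough illustration of why the command string matters, the sketch below shows one way a command such as `phantomjs --ignore-ssl-errors=yes` could be combined with a crawl.js path and a target URL before being handed to a process launcher. `PhantomCommandSketch` and `buildArgs` are hypothetical names that are not part of this class, and splitting the command on whitespace is an assumption that breaks for executable paths containing spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration only; not part of PhantomJSDownloader. */
final class PhantomCommandSketch {
    private PhantomCommandSketch() {}

    /**
     * Splits the configured command into tokens (executable plus flags),
     * then appends the script path and the URL as separate arguments.
     * Assumption: the command contains no quoted or space-containing paths.
     */
    static List<String> buildArgs(String phantomJsCommand, String crawlJsPath, String url) {
        List<String> args = new ArrayList<>(Arrays.asList(phantomJsCommand.trim().split("\\s+")));
        args.add(crawlJsPath);
        args.add(url);
        return args;
    }
}
```

The resulting list could be passed to `new ProcessBuilder(args).start()`. Keeping each argument as its own list element avoids re-parsing a single concatenated string.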
| Modifier and Type | Method and Description |
|---|---|
| Page | download(Request request, Task task) |
| protected String | getPage(Request request) |
| int | getRetryNum() |
| private void | initPhantomjsCrawlPath() |
| PhantomJSDownloader | setRetryNum(int retryNum) |
| void | setThread(int threadNum) |
Methods inherited from class AbstractDownloader: download, download, onError, onSuccess
private static org.slf4j.Logger logger
private static String crawlJsPath
private static String phantomJsCommand
private int retryNum
private int threadNum
public PhantomJSDownloader()
public PhantomJSDownloader(String phantomJsCommand)
Parameters: phantomJsCommand - phantomJsCommand
public PhantomJSDownloader(String phantomJsCommand, String crawlJsPath)
crawl.js start --
var system = require('system');
var url = system.args[1];
var page = require('webpage').create();
page.settings.loadImages = false;
page.settings.resourceTimeout = 5000;
page.open(url, function (status) {
    if (status != 'success') {
        console.log("HTTP request failed!");
    } else {
        console.log(page.content);
    }
    page.close();
    phantom.exit();
});
-- crawl.js end
In your own project, you can copy the JavaScript above and use it.
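Because phantomjs needs the script as a real file on disk, one option is to write it out when the application starts. The sketch below is a minimal, assumed approach rather than anything this class does itself: `CrawlJsExtractor` and `writeToTempFile` are hypothetical names, and the script text is copied from the listing above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration only; not part of PhantomJSDownloader. */
final class CrawlJsExtractor {
    private CrawlJsExtractor() {}

    /** The crawl.js source, copied from the listing in this page. */
    static final String CRAWL_JS = String.join("\n",
            "var system = require('system');",
            "var url = system.args[1];",
            "var page = require('webpage').create();",
            "page.settings.loadImages = false;",
            "page.settings.resourceTimeout = 5000;",
            "page.open(url, function (status) {",
            "    if (status != 'success') {",
            "        console.log(\"HTTP request failed!\");",
            "    } else {",
            "        console.log(page.content);",
            "    }",
            "    page.close();",
            "    phantom.exit();",
            "});",
            "");

    /** Writes the script to a temporary .js file and returns its path. */
    static Path writeToTempFile() {
        try {
            Path script = Files.createTempFile("crawl", ".js");
            Files.write(script, CRAWL_JS.getBytes(StandardCharsets.UTF_8));
            script.toFile().deleteOnExit(); // clean up when the JVM exits
            return script;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The string form of the returned path can then be supplied as the crawlJsPath argument.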
Example:
new PhantomJSDownloader("/your/path/phantomjs", "/your/path/crawl.js");
Parameters: phantomJsCommand - phantomJsCommand
crawlJsPath - crawlJsPath
private void initPhantomjsCrawlPath()
public void setThread(int threadNum)
public int getRetryNum()
public PhantomJSDownloader setRetryNum(int retryNum)
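The class exposes a retry count through getRetryNum() and setRetryNum(int), and crawl.js prints "HTTP request failed!" when a page fails to load. The sketch below shows one plausible way those two pieces could fit together. It is not taken from the library source: `RetrySketch`, `fetchWithRetry`, and the choice to treat retryNum as the number of additional attempts after the first one are all assumptions.

```java
import java.util.function.Supplier;

/** Hypothetical helper for illustration only; not part of PhantomJSDownloader. */
final class RetrySketch {
    private RetrySketch() {}

    /** Marker text printed by crawl.js when the page could not be opened. */
    static final String FAILURE_MARKER = "HTTP request failed";

    /**
     * Calls fetch once, then up to retryNum more times while the output is
     * null or contains the failure marker. Returns the first good output,
     * or null if every attempt failed.
     */
    static String fetchWithRetry(Supplier<String> fetch, int retryNum) {
        for (int attempt = 0; attempt <= retryNum; attempt++) {
            String content = fetch.get();
            if (content != null && !content.contains(FAILURE_MARKER)) {
                return content;
            }
        }
        return null;
    }
}
```

Because setRetryNum returns the downloader itself, the retry count can be set in a chained call such as new PhantomJSDownloader("/your/path/phantomjs", "/your/path/crawl.js").setRetryNum(3).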
Copyright © 2021. All rights reserved.